
Subject: Re: Shallow parse trees

From: Matthew MacKenzie <matt@xmlglobal.com>
To: "William J. Kammerer" <wkammerer@foresightcorp.com>, "ebXML-Architecture List" <ebXML-Architecture@lists.oasis-open.org>
Date: Sat, 11 Mar 2000 08:08:25 -0800

At 09:05 PM 3/10/00 -0500, William J. Kammerer wrote:
>Matthew MacKenzie asked why don't we just settle on "keep the parse tree
>as shallow as possible for the benefit of processing applications"?
>
>I'm not convinced that it would be all that beneficial,
>performance-wise, though I have few metrics to prove otherwise.

I have written more parser applications in C, Perl, and Java than I can
count, and informal timing of many of them left me with the impression
that deeply hierarchical (I hope I am using the right term :]) XML
*with* dense content and attributes definitely takes longer to parse than
the equivalent in a flatter hierarchy sans attributes.  I have also
noticed, since I am the primary architect of an XML search engine, that
flatter XML is faster to search through because the element relationships
are less complex.  Your mileage may vary.
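
As a rough illustration of the kind of informal timing described above,
here is a minimal Python sketch (not the original C, Perl, or Java
tests, which are not shown in this message). The generators make_deep
and make_flat, the element names, and the record counts are all
hypothetical. Note that the deep document is also larger in bytes, so
this measures the combined cost of nesting, attributes, and size rather
than depth alone.

```python
import time
import xml.etree.ElementTree as ET


def make_deep(records, depth):
    """Wrap each value in `depth` nested elements, each with two attributes."""
    parts = ["<doc>"]
    for i in range(records):
        for level in range(depth):
            parts.append(f'<n{level} id="{i}" kind="k{level}">')
        parts.append(f"value{i}")
        for level in reversed(range(depth)):
            parts.append(f"</n{level}>")
    parts.append("</doc>")
    return "".join(parts)


def make_flat(records):
    """Hold the same values as direct children of the root, no attributes."""
    parts = ["<doc>"]
    for i in range(records):
        parts.append(f"<item>value{i}</item>")
    parts.append("</doc>")
    return "".join(parts)


def max_depth(elem):
    """Depth of the parse tree rooted at elem (the root alone counts as 1)."""
    return 1 + max((max_depth(child) for child in elem), default=0)


def time_parse(xml_text, repeats=20):
    """Best-of-N wall-clock time, in seconds, for parsing xml_text."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        ET.fromstring(xml_text)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    deep = make_deep(records=2000, depth=8)
    flat = make_flat(records=2000)
    print("deep:", time_parse(deep), "s")
    print("flat:", time_parse(flat), "s")
```

The absolute numbers will depend on the parser and the machine, so
treat any result as an impression, not a benchmark.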

>But it definitely is more natural to build up more complicated
>structures from smaller ones.   EDIFACT (and X12 to a lesser extent),
>with its copious use of qualifiers serving as adjectives, mimics the way
>our brains actually use language by building up the meaning, or
>"semantics," one piece at a time.   Trying to compress all the semantics
>of a business data unit into one XML tag, requiring the use of a
>complicated repository to extract its meaning, might seem
>counter-intuitive.

Ack!  I would never suggest counter-intuitive XML or shoving all of the
data into one element!  Notice I said that the data should be kept as
shallow as possible?  Well, the "as possible" part is key.  If some
sections have to be a deep hierarchy (e.g. Betty Harvey's comment about
product databases which need to be very descriptive of the data, using a
deeper hierarchy to mimic table linkages in an RDBMS), please, lay off
the attributes!  Little trade-offs here and there will add up to a nice
performance gain.

How about:

"Keep the XML as shallow _as possible_ for the benefit of processing
applications.  It is acknowledged that deep, hierarchical XML is the
most efficient way to represent certain data; in those cases, the
fragment in question should make sparing use of attributes, and the
hierarchy should go only as deep as is logically necessary."
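
One way to check a fragment against wording like this is to measure
its shape. The following is a minimal Python sketch, assuming that
maximum nesting depth and total attribute count are reasonable proxies
for "shallow" and "sparing use of attributes". The function name
shape_report and the sample order documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def shape_report(xml_text):
    """Return (max_depth, element_count, attribute_count) for a document."""
    root = ET.fromstring(xml_text)
    max_depth = 0
    elements = 0
    attributes = 0
    # Iterative walk so very deep documents do not hit the recursion limit.
    stack = [(root, 1)]
    while stack:
        elem, depth = stack.pop()
        elements += 1
        attributes += len(elem.attrib)
        max_depth = max(max_depth, depth)
        stack.extend((child, depth + 1) for child in elem)
    return max_depth, elements, attributes


# Hypothetical sample data: the same order line, once with attributes
# and once with child elements.
attribute_heavy = (
    '<order><line sku="A1" qty="2" currency="USD" price="9.50"/></order>'
)
element_only = (
    "<order><line><sku>A1</sku><qty>2</qty>"
    "<currency>USD</currency><price>9.50</price></line></order>"
)

print(shape_report(attribute_heavy))
print(shape_report(element_only))
```

The two samples show the trade-off: moving the attributes into child
elements removes every attribute but adds one level of depth, so the
two halves of the guideline have to be weighed against each other.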

--
Matthew MacKenzie
XML Global Technologies, Inc.
--
Matthew MacKenzie
CTO/VP R&D
XML Global Technologies, Inc.
