Subject: Re: Shallow parse trees

At 09:05 PM 3/10/00 -0500, William J. Kammerer wrote:
>Matthew MacKenzie asked why don't we just settle on "keep the parse tree
>as shallow as possible for the benefit of processing applications"?
>
>I'm not convinced that it would be all that beneficial,
>performance-wise, though I have few metrics to prove otherwise.

I have written more parser applications in C, Perl, and Java than I can
count, and informal timing of many of them left me with the impression
that deeply hierarchical (I hope I am using the right term :]) XML *with*
dense content and attributes definitely takes longer to parse than the
equivalent in a flatter hierarchy sans attributes. I have also noticed,
as the primary architect of an XML search engine, that flatter XML is
faster to search through because the element relationships are less
complex. Your mileage may vary.

>But it definitely is more natural to build up more complicated
>structures from smaller ones. EDIFACT (and X12 to a lesser extent),
>with its copious use of qualifiers serving as adjectives, mimics the way
>our brains actually use language by building up the meaning, or
>"semantics," one piece at a time. Trying to compress all the semantics
>of a business data unit into one XML tag, requiring the use of a
>complicated repository to extract its meaning, might seem
>counter-intuitive.

Ack! I would never suggest counter-intuitive XML or shoving all of the
data into one element! Notice I said that the data should be kept as
shallow as possible? Well, the "as possible" part is key. If some
sections are going to need a deep hierarchy (e.g. Betty Harvey's comment
about product databases, which need to be very descriptive of the data
and use a deeper hierarchy to mimic table linkages in an RDBMS), please
lay off the attributes! Little trade-offs here and there will add up to
a nice performance gain.

How about:

"Keep the XML as shallow _as possible_ for the benefit of processing
applications. It is acknowledged that deep, hierarchical XML is the most
efficient way to represent certain data, so when doing so, the actual
structure of the fragment in question should make sparing use of
attributes - and the hierarchy should go only as deep as is logically
necessary."

--
Matthew MacKenzie
CTO/VP R&D
XML Global Technologies, Inc.
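
As a rough illustration of the guideline proposed above, here is a minimal Python sketch that measures the "shape" of two fragments: nesting depth, element count, and attribute count. It is not from the thread and it is not a benchmark. The two fragments, their element and attribute names, and the helper `shape` are all invented for the example; it only shows how one might quantify "shallow" and "sparing use of attributes" before timing a real parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the element and attribute names are invented
# for illustration and do not come from any real schema.
DEEP = """
<order id="42" status="open">
  <parties>
    <party role="buyer">
      <name type="legal">
        <text lang="en">Acme Corp</text>
      </name>
    </party>
  </parties>
</order>
"""

FLAT = """
<order>
  <id>42</id>
  <status>open</status>
  <buyer>Acme Corp</buyer>
</order>
"""

def shape(xml_text):
    """Return (max_depth, element_count, attribute_count) for a document."""
    root = ET.fromstring(xml_text)
    max_depth = elements = attributes = 0
    stack = [(root, 1)]  # (element, depth), with the root at depth 1
    while stack:
        node, depth = stack.pop()
        elements += 1
        attributes += len(node.attrib)
        max_depth = max(max_depth, depth)
        # Visit every child one level deeper than its parent.
        stack.extend((child, depth + 1) for child in node)
    return max_depth, elements, attributes

deep_shape = shape(DEEP)
flat_shape = shape(FLAT)
print("deep:", deep_shape)
print("flat:", flat_shape)
```

These numbers describe structure only. Whether the flatter form actually parses or searches faster depends on the parser and the data, so any real comparison would need timing on representative documents.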