Subject: Versioning
Hello all,

I know I am new to the list, but I have been "forced" to get involved due to the OTA's decision to accept whatever comes out of ebXML. I have been pushing for the OTA to accept "parsing friendly" versioning. I believe they have. However, some keep referring to the "way that ebXML does it".

So... Please find attached my analysis of different versioning options and their impact on different parsing techniques.

Also note that, IMO, the use of namespaces for messaging is a BAD idea! It does not bring sufficient value for the pain it inflicts (bad ROI).

Thanks,
George Smith
Highwire
206-812-4614 x 228
XML Parsing Techniques/Tools
by George Smith
28 Sept 00

The primary XML Parsing Techniques/Tools:
  - Custom, 100% Application based
  - SAX, or XML event driven
  - DOM based with referenced schema validation
  - Schema Compiler

Details for each of the above:

Custom - A program or class custom developed to "deal" with a particular XML document.
  Pros:
    - Can be VERY fast.
    - Supports direct conversion to "internal" desired representation.
  Cons:
    - Development is incredibly labor intensive. Therefore, the need to support many messages usually "dissuades" this approach.
    - All validation is the application's responsibility.
    - Any default or fixed values, or enumerations specified in the schema must be duplicated in the code.

SAX - A parsing system that parses "tags", and generates events on XML "parts".
  Pros:
    - Very Fast.
    - Supports direct conversion to "internal" desired representation.
  Cons:
    - Only validates tag structure. Does not even validate that the XML is "well-formed".
    - Development for complex messages is labor intensive.
    - Almost all validation is the application's responsibility.
    - Any default or fixed values, or enumerations specified in the schema must be duplicated in the code.
  Note: Since "most" XML messaging is currently "trusted", this approach is currently the most popular.

DOM - A parsing system that validates that the XML document "conforms" to the schema indicated "within" (at the top of) the XML message.
  Pros:
    - Creates a DOM tree, that can be "queried".
    - If you trust the sender then the XML is guaranteed to be well-formed, and properly structured to the schema.
    - Any default or fixed values, or enumerations specified in the schema are automatically "handled".
  Cons:
    - The DOM access API is "painful" to use.
    - DOM is "claimed" to be memory heavy. This is, IMO, usually due to the fact that the DOM tree is converted to a desired "internal" representation. This means that almost every node/data element is in memory twice (once for the DOM supporting Object and once for the developer's supporting Object).
    - Since the schema is "interpreted", this form of validation is considered "too slow" by many development shops.
    - Since "most" DOM validating parsers only support DTDs, the "level" of "automatic" validation is limited to XML structure. Therefore, all data validation is the application's responsibility.
    - If the sender is NOT "trusted", then how can you trust that they have indicated to validate against the correct DTD? To solve this problem, the DTD reference MUST be either at a "neutral" public location, or at the sender's site. This would allow the receiver to verify that the validation is against the correct DTD. This presumes that the parser provides access to the DTD reference.

Schema Compiler - A system that takes a "public" schema (and possibly an augmentation file) and generates a program, class, or class tree to parse (and validate?) an XML message.
  There are a number of these systems "out there". Some of the companies using, developing, or offering these are (or the product names): Oracle, SUN, ConXtra, DXML (product name), & jDOM (product name). Some of these "solve" the DOM tree "access" problem. Some solve the "untrusted" validation problem. Some solve BOTH problems.
  Pros:
    - Creates a "DOM like" tree, that can be "queried".
    - May solve the DOM tree "access" problem.
    - May solve the "untrusted" validation problem. If it does, then it does it "fast".
    - Some support either augmented schema files, or schema augmentation files. So..., these can raise the validation level to include data typing, and possibly some biz rules (like no past dates). This "minimizes" the remaining validation that is the application's responsibility.
    - Any default or fixed values, or enumerations specified in the schema are automatically "handled".
  Cons:
    - If they don't solve the DOM tree "access" problem, the DOM access API is "painful" to use.
    - If the resulting "DOM like" tree needs to be converted to a desired "internal" representation, then almost every node/data element is in memory twice (once for the "DOM like" supporting Object and once for the developer's supporting Object). This can be memory heavy.
    - If the receiver expects to receive multiple messages (or versions), then it must be "easy" to incorporate the "switch" into the resulting generated code/program. This is often done by creating a "master" schema, that indicates the "switch" via element names. This requires that the "switch" NOT be based on data content!
    - With some of these products, validation is problematic, and hence these should NOT be used for "untrusted" message processing.
  Note: Picking (or creating) the right Schema Compiler can dramatically reduce the effort to support multiple messages with multiple versions. And all this comes with little to NO performance penalty!

Message differentiation & Versioning:

Any form of versioning of messages should not favor, and more importantly, not preclude any of the above tool sets. The ONLY form of versioning that is guaranteed to work with ALL of the above tool sets is element "name" based versions. This same logic applies to major functional differences (actions) which might be represented by different schemas. Furthermore, the closer that this versioned element or "action" id is to the root of an XML document, the "friendlier" the multiple message specification is to the Schema Compiler option.

Note: For a graphical perspective on the Two Stages of XML Message Validation, please print the Validation.gif (or view the Validation.vsd) file.

-The End-
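As a rough illustration of why element "name" based versions are easy for an event-driven (SAX style) parser, here is a minimal Python sketch. It is not taken from the OTA or ebXML specifications: the element names (ProfileCreate_v1, ProfileCreate_v2) and the handler table are hypothetical, chosen only to show that the "switch" can be made on the first start-tag event, without looking at any data content.

```python
import xml.sax


class RootNameHandler(xml.sax.ContentHandler):
    """Remembers the name of the first element seen, which is the root."""

    def __init__(self):
        super().__init__()
        self.root_name = None

    def startElement(self, name, attrs):
        # Only the first start-tag event matters for the version "switch".
        if self.root_name is None:
            self.root_name = name


def root_element_name(xml_bytes):
    """Return the root element name of an XML message given as bytes."""
    handler = RootNameHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.root_name


# Hypothetical versioned element names mapped to hypothetical handlers.
HANDLERS = {
    "ProfileCreate_v1": lambda: "create, version 1",
    "ProfileCreate_v2": lambda: "create, version 2",
}


def dispatch(xml_bytes):
    """Pick the handler from the root element name alone."""
    name = root_element_name(xml_bytes)
    if name not in HANDLERS:
        raise ValueError("unsupported message or version: %r" % name)
    return HANDLERS[name]()


if __name__ == "__main__":
    message = b"<ProfileCreate_v2><Name>example</Name></ProfileCreate_v2>"
    print(dispatch(message))
```

The same lookup key (the element name) is what a "master" schema would use for its choice, so the approach does not depend on which of the four tool sets is in use.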
Pros & Cons of different versioning options...
by George Smith
29 Sept 00

Intro:

If there is a need to implement a server to handle "just" the OTA profile messages, then there are currently four messages that must be supported. These are: Create, Read, Update, & Delete. To both validate, and "do something", someplace in the XML, there must be an indication of the action desired (and also the "data" to perform this action on). The OTA currently does this with its "action" verb elements (tag names).

Now if in addition:
  - this server needs to support more than one client, AND
  - it needs to support these clients for "years", AND
  - the specification of "what" makes up a profile changes, AND
  - the clients can NOT be expected to change to use the new specifications "simultaneously",
THEN the server must support the "simultaneous" (more or less) receipt of multiple versions.

If the server only supports three versions "simultaneously" (at any one time), then the four "simple" messages generate the need to actually support twelve (12) messages. Now most likely, this server will need to support both: more than three versions "simultaneously", and more than just the four OTA profile messages. For example, if there are fifteen messages, and four versions, then the server is really supporting sixty (60) messages.

To select both the appropriate validation and action from these 10s to 1000s of messages, a combination of version identification and action identification needs to be present. This version identification is currently present in the OTA's root element name. The exact location of the (or more than one) versioned element name is flexible, provided that it is above (closer to the root than) any change related to the version change.

The question has been raised of NOT using versioning in the element names. Where else could versioning be, and what are the consequences (Pros & Cons)?
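The arithmetic above (messages times versions) can be checked with a small Python sketch. The naming scheme used here (an action name followed by "_v" and a version number) is an assumption made for illustration only; it is not the OTA's actual element naming.

```python
from itertools import product


def versioned_names(actions, versions):
    """One hypothetical element name per (action, version) pair."""
    return ["%s_v%d" % (action, version)
            for action, version in product(actions, versions)]


# The four OTA profile actions, with three versions live at the same time.
profile_actions = ["ProfileCreate", "ProfileRead",
                   "ProfileUpdate", "ProfileDelete"]
profile_messages = versioned_names(profile_actions, [1, 2, 3])

# Fifteen placeholder message names, with four versions live at the same time.
fifteen_actions = ["Message%d" % i for i in range(1, 16)]
larger_set = versioned_names(fifteen_actions, [1, 2, 3, 4])

if __name__ == "__main__":
    print(len(profile_messages))  # 4 actions x 3 versions
    print(len(larger_set))        # 15 messages x 4 versions
```

Each generated name stands for one distinct combination of validation and action that the server has to be able to select.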
Background:

There are four basic XML parsing techniques (please see XMLparsingTechniques.txt for the details). These are:
  - Custom (CUSTOM), 100% Application based
  - SAX (SAX), or XML event driven
  - DOM (DOM) based with referenced schema validation
  - Schema Compiler (SCHEMA-COMP)

Assumption:

The version identification method should NOT preclude the use of any of these techniques. In addition, it would be "nice" if the version identification method was "easy" to use with all of these techniques.

Version Options:

The version MUST be available as part of the message, or "in something" attached to the message. Some of the options are:
  1) In the schema file (referred to by the !DOCTYPE SYSTEM field).
  2) In the schema file name (from the !DOCTYPE SYSTEM field).
  3) In the schema file "path" (from the !DOCTYPE SYSTEM field).
  4) In the !DOCTYPE PUBLIC field.
  5) In an Attribute, as a value.
  6) In an Element, as a value.
  7) As part of a NameSpace URI.
  8) As part of an Attribute's name.
  9) As part of an Element's name.

Pros, Cons, & Opinion:

Option 1) In the schema file.
  Pros: There is NO version "dirtiness" in the XML.
  Cons: Only the "DOM" parser "bothers" to fetch the schema.
  Opin: This option is a non-starter!

Option 2 & 3 & 4) In the !DOCTYPE.
  Pros: There is NO version "dirtiness" in the XML except in the !DOCTYPE.
  Cons: Many communicating systems that do not use a "DOM" parser do not "bother" to send the !DOCTYPE field. It becomes difficult to specify, if a private schema reference is desired.
  Opin: These options should be non-starters!

Option 5) In an Attribute, as a value.
  Pros: This is a common practice. It is VERY easy to use with the CUSTOM, "SAX", & "DOM" parsers. It can be specified as a REQUIRED attribute with an enumeration list of "one" option. (It should NOT be FIXED, as that technically makes it optional.)
  Cons: This option prohibits anyone from creating a "master" (or complete) schema by combining "all" the individual versions' schemas. This is because "choices" in a schema can ONLY be made against element names! This "master" schema is then used with the SCHEMA-COMP to produce a single "complete" validation & parsing sub-system. If this option is used with the SCHEMA-COMP option, then the resulting "compiled" code must be "adjusted" to handle the version "switch" outside of normal schema "choice" processes. Some SCHEMA-COMPs may not allow this "adjustment".
  Opin: This option, due to its popularity, would be my second choice!

Option 6) In an Element, as a value.
  Pros: It is VERY easy to use with the CUSTOM, "SAX", & "DOM" parsers.
  Cons: In addition to all the Cons of Option 5, if the schema is a DTD, then it is NOT possible to specify/enforce the version.
  Opin: This option should be a non-starter!

Option 7) As part of a NameSpace URI.
  Pros: The NameSpace as a URI "can be" both globally explicit, and descriptive.
  Cons: Very few parsers of ANY type are currently NameSpace "friendly"; this could eliminate many "DOM" options. DTDs and NameSpaces are (if you follow the rules for NameSpaces) basically incompatible! Only some SCHEMA-COMP parsers "could" handle this option, with the same concerns as those of Option 5.
  Opin: Since NameSpace use for messaging is "EVIL", this option should be a non-starter!

Option 8) As part of an Attribute's name.
  Pros: It is VERY easy to use with the CUSTOM, "SAX", & "DOM" parsers.
  Cons: What would be the value of the attribute? No one (that I am aware of) uses this option.
  Opin: This option should be a non-starter!

Option 9) As part of an Element's name.
  Pros: This is a common practice (including the OTA ver 1). It is VERY easy to use with ALL the parser techniques. It allows the creation of a "master" schema.
  Cons: NONE!
  Opin: This option is my first choice!

Conclusions:

The only "reasonable" options are the two in current common practice:
  Option 5) In an Attribute, as a value. &
  Option 9) As part of an Element's name.

Option 9) As part of an Element's name, gets my vote for two reasons:
  1) It is what the OTA is already "using" (and agreed to), &
  2) It is "friendly" to ALL parsing techniques.

-The End-
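To make the difference between Option 5 and Option 9 concrete, here is a small Python sketch using the standard library's ElementTree. The element name "ProfileCreate", the attribute name "version", and the "_v2" suffix are all assumptions for illustration; neither the OTA nor ebXML spelling is implied. With Option 5 the receiver needs both the element name and a piece of data (the attribute value) to pick a handler, while with Option 9 the element name alone is the key, which is the only thing a schema "choice" can switch on.

```python
import xml.etree.ElementTree as ET


def key_option_5(xml_text):
    """Option 5: the version is the VALUE of a (hypothetical) 'version' attribute."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    if version is None:
        raise ValueError("message carries no version attribute")
    # The selection key needs the element name AND a data value.
    return (root.tag, version)


def key_option_9(xml_text):
    """Option 9: the version is part of the element NAME."""
    # The element name alone is the selection key.
    return ET.fromstring(xml_text).tag


if __name__ == "__main__":
    print(key_option_5('<ProfileCreate version="2"><Name>x</Name></ProfileCreate>'))
    print(key_option_9('<ProfileCreate_v2><Name>x</Name></ProfileCreate_v2>'))
```

Both keys are easy to obtain in hand-written code, which matches the Pros listed for each option; the difference only shows up when the "switch" has to be expressed inside a schema rather than in application code.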