Subject: Versioning
Hello all,

I know I am new to the list, but I have been "forced" to get involved due to the OTA's decision to accept whatever comes out of ebXML. I have been pushing for the OTA to accept "parsing friendly" versioning, and I believe they have. However, some keep referring to the "way that ebXML does it".

So... please find attached my analysis of different versioning options and their impact on different parsing techniques.

Also note that, IMO, the use of namespaces for messaging is a BAD idea! It does not bring sufficient value for the pain it inflicts (bad ROI).

Thanks,
George Smith
Highwire
206-812-4614 x 228
XML Parsing Techniques/Tools
by
George Smith
28 Sept 00
The primary XML Parsing Techniques/Tools:
Custom 100% Application based,
SAX, or XML event driven,
DOM based with referenced schema validation, &
Schema Compiler.
Details for each of the above:
Custom - A program or class custom developed to "deal"
with a particular XML document.
Pros: Can be VERY fast.
Supports direct conversion to "internal"
desired representation.
Cons: Development is incredibly labor intensive.
Therefore, the need to support many messages
usually "dissuades" this approach.
All validation is application's responsibility.
Any default or fixed values, or enumerations
specified in the schema must be duplicated in
the code.
SAX - A parsing system that parses "tags", and generates
events on XML "parts".
Pros: Very Fast.
Supports direct conversion to "internal"
desired representation.
Cons: Checks only that the XML is "well-formed" (tag
structure). Does not validate against a schema.
Development for complex messages is labor
intensive.
Almost all validation is application's
responsibility.
Any default or fixed values, or enumerations
specified in the schema must be duplicated in
the code.
Note: Since "most" XML messaging is currently
"trusted", this approach is currently the
most popular.
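
As a rough illustration of the SAX style described above, here is a minimal sketch using Python's standard xml.sax module. The element names (CreateProfile_v2, Name, City) and the flat dict used as the "internal" representation are assumptions made up for this example; they are not taken from the OTA specification.

```python
import xml.sax


class ProfileHandler(xml.sax.ContentHandler):
    """Builds a plain dict directly from SAX events (no intermediate tree)."""

    def __init__(self):
        super().__init__()
        self.root_name = None  # first element seen (hypothetical versioned name)
        self.fields = {}       # the application's "internal" representation
        self._current = None   # name of the element currently open
        self._text = []        # text chunks collected for that element

    def startElement(self, name, attrs):
        if self.root_name is None:
            self.root_name = name
        self._current = name
        self._text = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks.
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        # Only leaf elements with text are stored; any checking of the
        # values themselves is left to the application.
        if name == self._current and text:
            self.fields[name] = text
        self._current = None
        self._text = []


def parse_profile(xml_bytes):
    handler = ProfileHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.root_name, handler.fields


SAMPLE = (b"<CreateProfile_v2><Name>Pat Doe</Name>"
          b"<City>Seattle</City></CreateProfile_v2>")
root, fields = parse_profile(SAMPLE)
print(root, fields)
```

Note that nothing in the handler checks required fields, data types, or enumerations; in this style all of that remains hand-written application code.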
DOM - A parsing system that validates that the XML
document "conforms" to the schema indicated
"within" (at the top of) the XML message.
Pros: Creates a DOM tree, that can be "queried".
If you trust the sender then the XML is
guaranteed to be well-formed, and properly
structured to the schema.
Any default or fixed values, or enumerations
specified in the schema are automatically
"handled".
Cons: The DOM access API is "painful" to use.
DOM is "claimed" to be memory heavy. This is
IMO, usually due to the fact that the DOM tree
is converted to a desired "internal"
representation. This means that almost every
node/data element is in memory twice (once for
the DOM supporting Object and once for the
developer's supporting Object).
Since the schema is "interpreted", this form of
validation is considered "too slow" by many
development shops.
Since "most" DOM validating parsers only support
DTDs, the "level" of "automatic" validation is
limited to XML structure. Therefore, all data
validation is the application's responsibility.
If the sender is NOT "trusted", then how can you
trust that they have indicated to validate
against the correct DTD? To solve this problem,
the DTD reference MUST be either at a "neutral"
public location, or at the sender's site. This
allows the receiver to confirm that the message is
validated against the correct DTD. This presumes
that the parser provides access to the DTD
reference.
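
The "untrusted sender" check described above might look something like the following sketch, which uses Python's standard xml.dom.minidom. minidom does not validate; it is used here only because it exposes the DOCTYPE reference. The DTD URLs and the allow-list approach are assumptions for illustration only.

```python
from xml.dom import minidom

# Hypothetical DTD locations that the receiver has agreed to accept.
TRUSTED_DTDS = {
    "http://schemas.example.org/profile/v1/profile.dtd",
    "http://schemas.example.org/profile/v2/profile.dtd",
}


def dtd_reference(xml_text):
    """Return the SYSTEM identifier from the DOCTYPE, or None if absent."""
    doc = minidom.parseString(xml_text)
    return doc.doctype.systemId if doc.doctype is not None else None


def is_acceptable(xml_text):
    """Accept a message only if it names a DTD the receiver trusts."""
    return dtd_reference(xml_text) in TRUSTED_DTDS


GOOD = ('<!DOCTYPE Profile SYSTEM '
        '"http://schemas.example.org/profile/v2/profile.dtd"><Profile/>')
BAD = ('<!DOCTYPE Profile SYSTEM '
       '"http://sender.example.com/lenient.dtd"><Profile/>')
NO_DOCTYPE = '<Profile/>'

print(is_acceptable(GOOD), is_acceptable(BAD), is_acceptable(NO_DOCTYPE))
```

A message with no DOCTYPE at all is rejected by this sketch, since there is nothing to compare against the allow list.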
Schema Compiler - A system that takes a "public" schema (and possibly
an augmentation file) and generates a program, class,
or class tree to parse (and validate?) an XML message.
There are a number of these systems "out there".
Some of the companies using, developing, or offering
these are (or the product names):
Oracle,
SUN,
ConXtra,
DXML (product name), &
jDOM (product name).
Some of these "solve" the DOM tree "access" problem.
Some solve the "untrusted" validation problem.
Some solve BOTH problems.
Pros: Creates a "DOM like" tree, that can be "queried".
May solve the DOM tree "access" problem.
May solve the "untrusted" validation problem.
If it does, then it does it "fast".
Some either support augmented schema files, or
schema augmentation files. So..., these can
raise the validation level to include data
typing, and possibly some biz rules (like no
past dates). This "minimizes" the remaining
validation that is the application's
responsibility.
Any default or fixed values, or enumerations
specified in the schema are automatically
"handled".
Cons: If they don't solve the DOM tree "access"
problem, the DOM access API is "painful" to use.
If the resulting "DOM like" tree needs to be
converted to a desired "internal"
representation, then almost every node/data
element is in memory twice (once for the "DOM
like" supporting Object and once for the
developer's supporting Object). This can be
memory heavy.
If the receiver expects to receive multiple
messages (or versions), then it must be "easy"
to incorporate the "switch" into the resulting
generated code/program. This is often done by
creating a "master" schema, that indicates the
"switch" via element names. This requires that
the "switch" NOT be based on data content!
With some of these products, validation is
problematical, and hence these should NOT be
used for "untrusted" message processing.
Note: Picking (or creating) the right Schema Compiler
can dramatically reduce the effort to support
multiple messages with multiple versions. And
all this comes with little to NO performance
penalties!
Message differentiation & Versioning:
Any form of versioning of messages should not favor, and more
importantly, not preclude any of the above tool sets. The ONLY form of
versioning that is guaranteed to work with ALL of the above tool sets
is element "name" based versions. This same logic applies to major
functional differences (actions) which might be represented by different
schema. Furthermore, the closer that this versioned element or "action"
id is to the root of an XML document, the "friendlier" the multiple
message specification is to the Schema Compiler option.
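
To make the element "name" based approach concrete, here is a small sketch using Python's standard xml.etree.ElementTree, in which the root element name alone selects both the action and the version. The element names and the handler functions are hypothetical, invented for this example.

```python
import xml.etree.ElementTree as ET


# Hypothetical handlers; each one knows the layout of exactly one
# message/version combination.
def create_v1(root):
    return ("create", 1, root.findtext("Name"))


def create_v2(root):
    # Assumed layout change in v2: Name moved under a Person element.
    return ("create", 2, root.findtext("Person/Name"))


def read_v1(root):
    return ("read", 1, root.findtext("Id"))


# The "switch" is a lookup on the root element name only, never on
# attribute values or element content.
HANDLERS = {
    "CreateProfile_v1": create_v1,
    "CreateProfile_v2": create_v2,
    "ReadProfile_v1": read_v1,
}


def dispatch(xml_text):
    root = ET.fromstring(xml_text)
    try:
        handler = HANDLERS[root.tag]
    except KeyError:
        raise ValueError(f"unsupported message: {root.tag}") from None
    return handler(root)


result = dispatch(
    "<CreateProfile_v2><Person><Name>Pat</Name></Person></CreateProfile_v2>"
)
print(result)
```

Because the name is the first thing any parser sees, the same lookup could be made from a SAX startElement event or from hand-written code without reading further into the message.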
Note: For a graphical perspective on the Two Stages of XML Message
Validation, please print the Validation.gif (or view the
Validation.vsd) file.
-The End-
Pros & Cons of different versioning options...
by
George Smith
29 Sept 00
Intro:
If there is a need to implement a server to handle "just" the OTA
profile messages, then there are currently four messages that must be
supported. These are:
Create,
Read,
Update, &
Delete.
To both validate, and "do something", someplace in the XML, there must
be an indication of the action desired (and also the "data" to perform
this action on). The OTA currently does this with its "action" verb
elements (tag names).
Now if in addition: this server needs to support more than one client,
AND, it needs to support these clients for "years", AND the
specification of "what" makes up a profile changes, AND the clients
can NOT be expected to change to use the new specifications
"simultaneously", THEN the server must support the "simultaneous"
(more or less) receipt of multiple versions.
If the server only supports three versions "simultaneously" (at any
one time), then the four "simple" messages, generate the need to
actually support twelve (12) messages. Now most likely, this server
will need to support both: more than three versions "simultaneously",
and more than just the four OTA profile messages. For example, if
there are fifteen messages, and four versions, then the server is
really supporting sixty (60) messages.
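
The arithmetic above is simply (number of messages) x (number of versions). A quick sketch, with made-up message and version names, that enumerates the combinations a server would have to recognize:

```python
from itertools import product


def message_names(messages, versions):
    """Every message/version pair, written as one versioned name."""
    return [f"{m}_{v}" for m, v in product(messages, versions)]


# Hypothetical names for the four profile messages and three versions.
PROFILE_MESSAGES = ["CreateProfile", "ReadProfile",
                    "UpdateProfile", "DeleteProfile"]
names = message_names(PROFILE_MESSAGES, ["v1", "v2", "v3"])
print(len(names))  # 4 messages x 3 versions

# Fifteen placeholder messages and four versions.
larger = message_names([f"Message{i}" for i in range(15)],
                       ["v1", "v2", "v3", "v4"])
print(len(larger))  # 15 messages x 4 versions
```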
To select both the appropriate validation and action from these 10s to
1000s of messages, a combination of version identification and action
identification needs to be present. This version identification is
currently present in the OTA's root element name. The exact location
of the versioned element name (or names) is flexible, provided
that it is closer to the root than any content affected by the
version change.
The question has been raised of NOT using versioning in the element
names. Where else could versioning be, and what are the consequences
(Pros & Cons)?
Background:
There are four basic XML parsing techniques (please see
XMLparsingTechniques.txt for the details). These are:
Custom (CUSTOM) 100% Application based,
SAX, (SAX) or XML event driven,
DOM (DOM) based with referenced schema validation, &
Schema Compiler (SCHEMA-COMP).
Assumption: The version identification method should NOT preclude the
use of any of these techniques. In addition, it would be "nice" if
the version identification method was "easy" to use with all of these
techniques.
Version Options:
The version MUST be available as part of the message, or "in
something" attached to the message. Some of the options are:
1) In the schema file (referred to by the !DOCTYPE SYSTEM field).
2) In the schema file name (from the !DOCTYPE SYSTEM field).
3) In the schema file "path" (from the !DOCTYPE SYSTEM field).
4) In the !DOCTYPE PUBLIC field.
5) In an Attribute, as a value.
6) In an Element, as a value.
7) As part of a NameSpace URI.
8) As part of an Attribute's name.
9) As part of an Element's name.
Pros, Cons, & Opinion:
Option 1) In the schema file.
Pros:
There is NO version "dirtiness" in the XML.
Cons:
Only the "DOM" parser "bothers" to fetch the schema.
Opin:
This option is a non-starter!
Options 2, 3, & 4) In the !DOCTYPE.
Pros:
There is NO version "dirtiness" in the XML except in the
!DOCTYPE.
Cons:
Many communicating systems that do not use a "DOM" parser do
not "bother" to send the !DOCTYPE field.
It becomes difficult to specify, if a private schema
reference is desired.
Opin:
These options should be non-starters!
Option 5) In an Attribute, as a value.
Pros:
This is a common practice.
It is VERY easy to use with the CUSTOM, "SAX", & "DOM"
parsers.
It can be specified as a REQUIRED attribute with an
enumeration list of "one" option. (It should NOT be FIXED,
as that technically makes it optional)
Cons:
This option prohibits anyone from creating a "master" (or
complete) schema by combining "all" the individual version's
schema. This is because, "choices" in schema can ONLY be
made against element names! This "master" schema is then
used with the SCHEMA-COMP to produce a single "complete"
validation & parsing sub-system.
If this option is used with the SCHEMA-COMP option, then the
resulting "compiled" code must be "adjusted" to handle the
version "switch" outside of normal schema "choice"
processes. Some SCHEMA-COMPs may not allow this
"adjustment".
Opin:
This option, due to its popularity, would be my second
choice!
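
For comparison, a sketch of what Option 5 looks like to the receiving code, again using Python's standard xml.etree.ElementTree. The attribute name "version", the element name, and the supported set are assumptions for illustration. In a DTD, the REQUIRED single-value enumeration mentioned above would be declared along the lines of <!ATTLIST CreateProfile version (2) #REQUIRED>.

```python
import xml.etree.ElementTree as ET

# Hypothetical (element name, version value) pairs this receiver supports.
SUPPORTED = {("CreateProfile", "1"), ("CreateProfile", "2")}


def identify(xml_text):
    """Return (element name, version) read from the root element."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    if version is None:
        raise ValueError("missing required 'version' attribute")
    key = (root.tag, version)
    # The "switch" here depends on an attribute VALUE, i.e. on data,
    # so it has to be made in application code.
    if key not in SUPPORTED:
        raise ValueError(f"unsupported message/version: {key}")
    return key


key = identify('<CreateProfile version="2"><Name>Pat</Name></CreateProfile>')
print(key)
```

This is easy to write by hand, which matches the Pros above; the difficulty described in the Cons only shows up when the switch has to be expressed inside a single combined schema.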
Option 6) In an Element, as a value.
Pros:
It is VERY easy to use with the CUSTOM, "SAX", & "DOM"
parsers.
Cons:
In addition to all the Cons of Option 5, if the schema is a
DTD, then it is NOT possible to specify/enforce the version.
Opin:
This option should be a non-starter!
Option 7) As part of a NameSpace URI.
Pros:
The NameSpace as a URI, "can be" both globally explicit, and
descriptive.
Cons:
Very few parsers of ANY type are currently NameSpace
"friendly", this could eliminate many "DOM" options.
DTDs and NameSpaces are (if you follow the rules for
NameSpaces) basically incompatible!
Only some SCHEMA-COMP parsers "could" handle this option
with the same concerns as those of Option 5.
Opin:
Since NameSpace use for messaging is "EVIL", this option
should be a non-starter!
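
For readers who have not seen Option 7 in practice, here is a sketch of pulling a version out of a NameSpace URI with Python's standard xml.etree.ElementTree, which reports qualified names in the form "{uri}localname". The URI layout (version as the last path segment) and all names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


def version_from_namespace(xml_text):
    root = ET.fromstring(xml_text)
    uri, local = split_tag(root.tag)
    if uri is None:
        raise ValueError("root element is not namespace-qualified")
    # Assumption: the version is the last path segment of the URI.
    return local, uri.rstrip("/").rsplit("/", 1)[-1]


MSG = ('<p:CreateProfile xmlns:p="http://schemas.example.org/profile/v2">'
       '<p:Name>Pat</p:Name></p:CreateProfile>')
print(version_from_namespace(MSG))
```

The receiver has to agree in advance on how the version is encoded inside the URI, and every element lookup below the root must also be namespace-qualified.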
Option 8) As part of an Attribute's name.
Pros:
It is VERY easy to use with the CUSTOM, "SAX", & "DOM"
parsers.
Cons:
What would be the value of the attribute?
No one (that I am aware of) uses this option.
Opin:
This option should be a non-starter!
Option 9) As part of an Element's name.
Pros:
This is a common practice (including the OTA ver 1).
It is VERY easy to use with ALL the parser techniques.
It allows the creation of "master" schema.
Cons:
NONE!
Opin:
This option is my first choice!
Conclusions:
The only "reasonable" options are the two in current common practice:
Option 5) In an Attribute, as a value. &
Option 9) As part of an Element's name.
Option 9) As part of an Element's name, gets my vote for two reasons:
1) It is what the OTA is already "using" (and agreed to), &
2) It is "friendly" to ALL parsing techniques.
-The End-