ebxml-core message

Subject: RE: Tag Languages, UID's etc.

From: "Arnold, Curt" <Curt.Arnold@hyprotech.com>
To: "'ebxml-core@lists.ebxml.org'" <ebxml-core@lists.ebxml.org>
Date: Thu, 25 Jan 2001 12:42:59 -0700

I would actually hope the working groups are only paying lip service (I had already typed that term before Mary Blantz's comment) to the UID concept, since it seems antithetical to several key XML
design principles. If ebXML is really going to be UID driven, then the name should be changed to eb[something other than XML].

1. The tag name becomes just a comment.

All the other XML infrastructure uses the namespace qualified tag name as the primary means of declaring meaning and allowable structure. There is no mechanism, for example, in XML schema to match a
content model to a specific value of an arbitrary attribute such as UID.

2. Interpretation requires either:

a) fetching an external resource

Fetching an arbitrary DTD to provide the UID's to enable a message to be interpreted is unacceptable. All sorts of denial of service attacks could be launched by throwing messages with spurious DTD's
at a server (per David Megginson's "When XML turns Ugly" talk at XTech 2000).

If you don't dynamically fetch a DTD, then you are creating an interoperability problem, since servers would each have their own catalog of known DTD's used to provide tag name <-> UID matching, and
messages in less prominent languages would not be universally accepted.

b) Using the internal subset

Which few processors will build. For example, XSLT won't build an internal subset.

c) Putting the UID explicitly on each element

In this case, if you have:

<Invoice UID="{208AA0C4-8612-4327-823C-784278F0D0BE}"/>

Why not just format the UID so that it is a valid XML name and do:

<_208AA0C4-8612-4327-823C-784278F0D0BE ...>

Then at least you can do schema-based validation.
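
As a rough sketch of that reformatting (the helper name uid_to_name is hypothetical, and Python's standard xml.etree.ElementTree is only assumed here as a convenient well-formedness check), stripping the braces and prefixing an underscore is enough to turn the UID into a legal element name:

```python
import xml.etree.ElementTree as ET


def uid_to_name(uid: str) -> str:
    """Hypothetical helper: drop the surrounding braces and prefix an
    underscore, since an XML name may not start with a digit."""
    return "_" + uid.strip("{}")


name = uid_to_name("{208AA0C4-8612-4327-823C-784278F0D0BE}")

# The parser accepts the result as an element name, so a schema or DTD
# could declare a content model for it like for any other tag.
doc = ET.fromstring(f"<{name}/>")
```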

3. Locale favoritism is attacked at the cost of making the messages hard to comprehend in all locales.

One of the design goals for XML was that it should support human legible documents. For a human to interpret a UID-based document, you have to 1) read the tag name, 2) look up the tag name in the
DTD to find the UID, and 3) look up the UID to find the meaning.

The combination of a URI and an XML name ("http://www.ebxml.org/Namespace/Purchasing" + "invoice") is sufficient to provide the link to a "wealth of semantic information" using RDF and other existing
technologies.
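
To illustrate how a namespace-aware processor already exposes that pair (a minimal sketch, assuming Python's standard xml.etree.ElementTree; the prefix p is arbitrary), the parser reports the tag as the URI plus the local name, with no external lookup:

```python
import xml.etree.ElementTree as ET

NS = "http://www.ebxml.org/Namespace/Purchasing"

# The prefix "p" is arbitrary; only the URI and the local name identify the element.
doc = ET.fromstring(f'<p:invoice xmlns:p="{NS}"/>')

# ElementTree stores the qualified name as "{uri}local".
uri, local = doc.tag[1:].split("}")
```

The (uri, local) pair is then usable directly as a key into whatever semantic description is published for that namespace.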

The value of localizing machine-to-machine communications is lost on me, since the processing infrastructure (programming languages, operating systems, etc.) already has a strong English bias and
programmers will already have some familiarity with English.

However, as a native English speaker, I would much rather have the tag name for an invoice be <ebxml:Rechnung>, <ebxml:Facture>, or <ebxml:factura> than <ebxml:_208AA0C4-8612-4327-823C-784278F0D0BE>.
Just pick some human language as the canonical representation. Then XSLT transforms can be used, when needed, to convert the document to a locale-specific form for human analysis. But the processing
systems shouldn't have to be burdened with having 100+ synonyms for every concept.
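
A sketch of that display-time conversion, written in Python with xml.etree.ElementTree rather than as an XSLT stylesheet, might look like the following. The LOCAL_NAMES table and the localize function are hypothetical, and English is assumed as the canonical form purely for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: canonical tag name -> locale-specific name.
LOCAL_NAMES = {
    "de": {"Invoice": "Rechnung"},
    "fr": {"Invoice": "Facture"},
    "es": {"Invoice": "factura"},
}


def localize(root: ET.Element, locale: str) -> ET.Element:
    """Rename canonical tags in place for human reading; unknown tags are kept."""
    table = LOCAL_NAMES[locale]
    for element in root.iter():
        element.tag = table.get(element.tag, element.tag)
    return root


# The message on the wire stays canonical; only the copy shown to a reader is renamed.
german = localize(ET.fromstring("<Invoice><Total/></Invoice>"), "de")
```

The processing systems only ever see the canonical names, so the synonym table lives in the presentation layer and nowhere else.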

Follow-Ups:
- Re: Tag Languages, UID's etc.
  - From: Martin Bryan <mtbryan@sgml.u-net.com>