ebxml-core message

Subject: RE: Tag Languages, UID's etc.
From: "Blantz, Mary Kay" <mblantz@netfish.com>
To: "'Arnold, Curt'" <Curt.Arnold@hyprotech.com>,"'ebxml-core@lists.ebxml.org'" <ebxml-core@lists.ebxml.org>
Date: Thu, 25 Jan 2001 13:13:37 -0800
Arnold,

I guess I wasn't clear.  I don't think (and Hartmut can correct me if I'm
wrong) that we will be using the ID as a tag.  The tag, in my opinion, must
be human readable.  I also think it should be clear and short.  The ID would
be in the dictionary, or lexicon, or catalog, or whatever we are calling it
these days.

For example, if the ID were 1234567, the English dictionary name might be
FinancialAccountIdentifier.  It would be something different in French,
Japanese, German, etc., but the ID would still be 1234567.  We X12/EWG types
could also figure out which of our data elements was 1234567.  And folks
from OAGI or RosettaNet or any other industry consortium could also match to
the ID, based on the definition.  I think that's what syntax neutrality is
all about.
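
A minimal sketch of that idea in Python, assuming the dictionary is just a
lookup keyed by ID.  Only the ID 1234567, the English name, and the tag
'BankAcct' come from this message; the French and German names and both
table names are made-up placeholders.

```python
# Sketch of a dictionary (lexicon, catalog) keyed by ID.
# Only the ID, the English name and the tag 'BankAcct' come from the
# message; everything else here is a placeholder for illustration.
DICTIONARY = {
    "1234567": {
        "en": "FinancialAccountIdentifier",
        "fr": "IdentifiantCompteFinancier",  # placeholder, not an agreed name
        "de": "Finanzkontokennung",          # placeholder, not an agreed name
    },
}

# Short, human-readable tags point at the ID rather than replacing it.
TAG_TO_ID = {"BankAcct": "1234567"}


def dictionary_name(tag, language):
    """Return the dictionary name for a tag in the given language."""
    return DICTIONARY[TAG_TO_ID[tag]][language]
```

The tag stays short and readable, the ID stays the same in every language,
and any other vocabulary could be matched to it by adding its own
tag-to-ID entries.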

I have no idea what the 'tag' would be, but it should be something
meaningful like 'BankAcct'.  The business people from the financial arena
are probably best at choosing that.

If what we want is to bring the small, non-EDI enterprises into the fold (so
to speak) we have to realize that many of them will start out just viewing
the XML on their computers.  A meaningless tag, or a very long one, will be
useless to them.

I bet this will start another round, huh?

MK

-----Original Message-----
From: Arnold, Curt [mailto:Curt.Arnold@hyprotech.com]
Sent: Thursday, January 25, 2001 2:43 PM
To: 'ebxml-core@lists.ebxml.org'
Subject: RE: Tag Languages, UID's etc.


I would actually hope the working groups are only paying lip service (I had
already typed that term before Mary Blantz's comment) to the UID concept,
since it seems antithetical to several key XML design principles.  If ebXML
is really going to be UID driven, then the name should be changed to
eb[something other than XML].

1. The tag name becomes just a comment.

All the other XML infrastructure uses the namespace-qualified tag name as
the primary means of declaring meaning and allowable structure.  There is no
mechanism in XML Schema, for example, to match a content model to a specific
value of an arbitrary attribute such as UID.

2. Interpretation requires either:

a) fetching an external resource

Fetching an arbitrary DTD to provide the UIDs that enable a message to be
interpreted is unacceptable.  All sorts of denial-of-service attacks could
be launched by throwing messages with spurious DTDs at a server (per David
Megginson's "When XML turns Ugly" talk at XTech 2000).

If you don't dynamically fetch a DTD, then you are creating an
interoperability problem, since each server would have its own catalog of
known DTDs used to provide tag name <-> UID matching, and messages in less
prominent languages would not be universally accepted.

b) Using the internal subset

Which few processors will build.  For example, XSLT won't build an internal
subset.

c) Putting the UID explicitly on each element

In this case, if you have:

<!--  Invoice is just a comment  -->
<Invoice UID="{208AA0C4-8612-4327-823C-784278F0D0BE}"/>

Why not just format the UID so that it is a valid name and do:

<!-- Invoice  -->
<_208AA0C4-8612-4327-823C-784278F0D0BE ...>

Then at least you can do schema-based validation.
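
A rough sketch of the difference, using Python's standard ElementTree
parser as a stand-in for any tool that keys on the element name.  The
helper uid_of is hypothetical; the two element forms are the ones shown
above.

```python
import xml.etree.ElementTree as ET

UID = "208AA0C4-8612-4327-823C-784278F0D0BE"

# Form 1: readable tag name, UID carried in an attribute.
attr_form = ET.fromstring('<Invoice UID="{%s}"/>' % UID)

# Form 2: the UID, prefixed with "_" to make it a valid XML name,
# is itself the element name.
name_form = ET.fromstring("<_%s/>" % UID)


def uid_of(elem):
    """Recover the UID from either form (hypothetical helper)."""
    if "UID" in elem.attrib:
        return elem.attrib["UID"].strip("{}")
    return elem.tag.lstrip("_")
```

Both forms carry the same UID, but a tool that dispatches on the element
name sees "Invoice" in the first form and the UID-derived name in the
second, so only the second gives name-based tools the UID to work with.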

3. Locale favoritism is attacked at the cost of making the messages hard to
comprehend in all locales.

One of the design goals for XML was that it should support human-legible
documents.  For a human to interpret a UID-based document, you have to 1)
read the tag name, 2) look up the tag name in the DTD to find the UID, and
3) look up the UID to find the meaning.

The combination of a URI and an XML name
("http://www.ebxml.org/Namespace/Purchasing" + "invoice") is sufficient to
provide the link to a "wealth of semantic information" using RDF and other
existing technologies.
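
As an illustration, Python's standard ElementTree parser already exposes
that pair as a single expanded name.  The prefix "p" and the helper
split_expanded_name are assumptions for this sketch; the URI and local name
are the ones quoted above.

```python
import xml.etree.ElementTree as ET

NS = "http://www.ebxml.org/Namespace/Purchasing"

# The prefix "p" is arbitrary; only the namespace URI matters.
doc = ET.fromstring('<p:invoice xmlns:p="%s"/>' % NS)


def split_expanded_name(tag):
    """Split ElementTree's "{uri}local" form into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag  # element is in no namespace


uri, local = split_expanded_name(doc.tag)
```

Here uri and local come back as the namespace URI and "invoice", which
together form a globally unique key that other metadata can hang from.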

The value of localizing machine-to-machine communications is lost on me,
since the processing infrastructure (programming languages, operating
systems, etc.) already has a strong English bias and programmers will
already have some familiarity with English.

However, as a native English speaker, I would much rather have the tag name
for an invoice be <ebxml:Rechnung>, <ebxml:Facture>, or <ebxml:factura> than
<ebxml:_208AA0C4-8612-4327-823C-784278F0D0BE>.  Just pick some human
language as the canonical representation.  Then XSLT transforms can be
used, when needed, to convert the document to a locale-specific form for
human analysis.  But the processing systems shouldn't have to be burdened
with having 100+ synonyms for every concept.
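
A minimal sketch of that conversion step.  It uses plain Python tag
renaming in place of XSLT (the Python standard library has no XSLT engine),
leaves out the namespace for brevity, and assumes English "Invoice" as the
canonical name.  The child element Total and the table LOCAL_NAMES are
hypothetical.

```python
import xml.etree.ElementTree as ET

# Canonical name -> locale-specific name, for display only.
LOCAL_NAMES = {
    "de": {"Invoice": "Rechnung"},
    "fr": {"Invoice": "Facture"},
    "es": {"Invoice": "factura"},
}


def localize(root, language):
    """Rename known canonical tags in place; unknown tags are kept."""
    names = LOCAL_NAMES[language]
    for elem in root.iter():
        elem.tag = names.get(elem.tag, elem.tag)
    return root


doc = ET.fromstring("<Invoice><Total>10</Total></Invoice>")
localized = ET.tostring(localize(doc, "de"), encoding="unicode")
```

The processing systems only ever see the canonical names; the renamed
copy is produced on demand for a human reader.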