ebxml-core message

Subject: Re: Tag Languages, UID's etc.
From: Martin Bryan <mtbryan@sgml.u-net.com>
To: "Arnold, Curt" <Curt.Arnold@hyprotech.com>, ebxml-core@lists.ebxml.org
Date: Fri, 26 Jan 2001 08:33:55 +0000
Curt


> I would actually hope the working groups are only paying lip service (I
> had already typed that term before Mary Blantz's comment) to the UID
> concept, since it seems antithetical to several key XML design
> principles.  If ebXML is really going to be UID driven, then the name
> should be changed to eb[something other than XML]

Whilst ebXML should not be UID driven, it should be paying more than lip
service to the idea.

> 1. The tag name becomes just a comment.
>
> All the other XML infrastructure uses the namespace qualified tag name as
> the primary means of declaring meaning and allowable structure.  There is no
> mechanism, for example, in XML schema to match a content model to a
> specific value of an arbitrary attribute such as UID.

The tag name becomes the "locally meaningful term", where "locally" is
defined by the contexts in which the core component is being used. For
example, for the insurance industry claims process in the US region under
Federal Code XYZ the names of the parties may need to be insurer and
insuree. The fact that both of these map to the same UID, that of the
generic PartyType defined in the ebXML core component library, shows that
they are similar in structure and role to the buyah and sellah elements
defined by the Electronic Goods industry ordering process in Bangladesh
under Penal Code 1234.
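
A minimal sketch of that mapping, written as a plain lookup table. The
vocabulary labels and the UID value are hypothetical stand-ins, not entries
from the ebXML core component library.

```python
# Hypothetical UID standing in for the generic PartyType core component.
PARTY_TYPE_UID = "1234124"

# Each locally agreed vocabulary maps its own tag names to core component
# UIDs. The vocabulary keys and the UID value are illustrative assumptions.
LOCAL_VOCABULARIES = {
    "us-insurance-claims": {
        "insurer": PARTY_TYPE_UID,
        "insuree": PARTY_TYPE_UID,
    },
    "bd-electronic-goods-ordering": {
        "buyah": PARTY_TYPE_UID,
        "sellah": PARTY_TYPE_UID,
    },
}


def uid_for(vocabulary, tag_name):
    """Return the core component UID behind a locally meaningful tag name."""
    return LOCAL_VOCABULARIES[vocabulary][tag_name]


def same_component(first, second):
    """True when two (vocabulary, tag name) pairs derive from the same UID."""
    return uid_for(*first) == uid_for(*second)
```

Here same_component(("us-insurance-claims", "insuree"),
("bd-electronic-goods-ordering", "sellah")) is true even though the two
industries share no tag names.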

We cannot mandate that all applications use a single tag set. We can tell
people that if they refer to the ebXML:UID property assigned to the element
in the DTD/Schema they can find a formal definition of the semantics on
which the element in the message has been based.

> 2. Interpretation requires either:
>
> a) fetching an external resource
>
> Fetching an arbitrary DTD to provide the UID's to enable a message to be
> interpreted is unacceptable.  All sorts of denial of service attacks could
> be launched by throwing messages with spurious DTD's at a server (per
> David Megginson's "When XML turns Ugly" talk at XTech 2000).
>
> If you don't dynamically fetch a DTD, then you are creating an
> interoperability problem since servers would each have their catalog of
> known DTD's used to provide tag name <-> UID matching and messages in
> less prominent languages would not be universally accepted.

In a B2B (as opposed to B2C) scenario interpretation is preceded by
agreement. During the agreement the names to be assigned by the partners to
message components are formally defined by exchange of DTD/Schema. This
DTD/Schema is then mapped to local naming conventions at each end. The only
time you need to reference the UIDs is at the time you are trying to
determine this mapping. Individual messages refer to the previously
exchanged DTD, which is locally cached. In this scenario there is no chance
of "denial of validation service".
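
One way to picture the cached lookup, as a rough sketch only. The system
identifier, the table contents and the UnknownSchemaError name are all
hypothetical. The point is simply that a message naming a DTD outside the
agreement is rejected rather than fetched.

```python
class UnknownSchemaError(Exception):
    """Raised when a message names a schema that was never agreed."""


# Filled in once, at agreement time, from the exchanged DTD/Schema.
# The system identifier and the tag-to-UID entries are hypothetical.
AGREED_SCHEMAS = {
    "urn:example:insurance-claims.dtd": {
        "insurer": "1234124",
        "insuree": "1234124",
    },
}


def tag_to_uid_map(system_id):
    """Return the cached tag-to-UID mapping for a message's schema.

    Only the local cache is consulted; nothing is fetched from the network.
    """
    try:
        return AGREED_SCHEMAS[system_id]
    except KeyError:
        # A schema outside the agreement is refused instead of being fetched.
        raise UnknownSchemaError(system_id) from None
```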

> b) Using the internal subset
>
> Which few processors will build.  For example, XSLT won't build an
> internal subset.

No need to. But what if you build your XSLT so that what it matches is the
fixed ebXML:UID attribute of the element, which the DOM provides by
reference to the DTD/Schema? If my template starts:
<xslt:template match="*[@ebxml:UID='1234124']"> then I can match all
applications of the core component, irrespective of the names assigned to
them locally, using a single template.
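
The same selection can be sketched outside XSLT with Python's standard
xml.etree.ElementTree module. The namespace URI, the sample message and the
second UID value are assumptions made up for the illustration. The UID
attributes are written out explicitly here so that the sketch does not
depend on a parser supplying fixed attribute values from the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI bound to the ebxml prefix.
EBXML_NS = "urn:example:ebxml"
UID_ATTR = "{%s}UID" % EBXML_NS

# In practice the fixed attribute would come from the agreed DTD/Schema;
# it is spelled out on each element here to keep the sketch self-contained.
MESSAGE = """
<order xmlns:ebxml="urn:example:ebxml">
  <buyah ebxml:UID="1234124">Acme</buyah>
  <sellah ebxml:UID="1234124">Widgets Ltd</sellah>
  <item ebxml:UID="9999999">Radio</item>
</order>
"""


def elements_with_uid(root, uid):
    """Return every element whose ebxml:UID equals uid, whatever its tag."""
    return [el for el in root.iter() if el.get(UID_ATTR) == uid]


root = ET.fromstring(MESSAGE)
parties = elements_with_uid(root, "1234124")
print([el.tag for el in parties])
```

Both party elements are selected by the one UID test, while the item
element, which carries a different UID, is left out.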

> c) Putting the UID explicitly on each element
>
> In this case, if you have:
>
> <!--  Invoice is just a comment  -->
> <Invoice UID="{208AA0C4-8612-4327-823C-784278F0D0BE}"/>
>
> Why not just format uid so that it is a valid name and do:
>
> <!-- Invoice  -->
> <_208AA0C4-8612-4327-823C-784278F0D0BE ...>
>
> Then at least you can do schema-based validation.

Again, this is not necessary. The DTD is known to the partners of the
agreement. It provides a fixed attribute for all ebXML-derived elements. My
templates can be based on this. I don't need a transformation. I simply need
to validate my XML file as part of the XSLT process.

> 3. Locale favoritism is attacked at the cost of making the messages hard
> to comprehend in all locales.

Another mistaken concept. Why would an Insurance application need to be able
to comprehend an Electronic Goods purchasing application? Messages are
context dependent. Each message is only comprehensible in a specific set of
locales that share a vocabulary.

> One of the design goals for XML was that it should support human legible
> documents.  For a human to interpret a UID based document, you have to 1)
> read the tag name, 2) look up the tag name in the DTD to find the UID,
> 3) look up the UID to find the meaning.
>
> The combination of a URI and an XML name
> ("http://www.ebxml.org/Namespace/Purchasing" + "invoice") is sufficient to
> provide the link to a "wealth of semantic information" using RDF and other
> existing technologies.
>
> The value of localizing machine-to-machine communications is lost on me
> since the processing infrastructure (programming languages, operating
> systems, etc) already has a strong English bias and programmers will
> already have some familiarity with English.
>
> However, as a native English speaker, I would much rather have the tag
> name for an invoice be <ebxml:Rechnung>, <ebxml:Facture>, <ebxml:factura>
> than <ebxml:_208AA0C4-8612-4327-823C-784278F0D0BE>.
> Just pick some human language as the canonical representation.  Then XSLT
> transforms can be used, when needed, to convert the document to a
> locale-specific form for human analysis.  But the processing systems
> shouldn't have to be burdened with having 100+ synonyms for every concept.

No, but those working on them should be able to identify them. They should
be able to use easily available tools, such as multilingual dictionaries, to
do so, and not have to rely on having on-line access to an ebXML repository
that may be under denial-of-service attack to find the information. By
fixing the reference points in pre-exchanged DTDs/Schemas we can ensure that
message validation and processing can be fully divorced from server
availability.

Martin Bryan