ebxml-transport message

Subject: RE: XML Syntax/Semantics
From: "Miller, Robert (GEIS)" <Robert.Miller@geis.ge.com>
To: "'ebXML-Transport@lists.oasis-open.org'" <ebXML-Transport@lists.oasis-open.org>
Date: Thu, 30 Mar 2000 10:54:47 -0500
Robert,

I strongly agree with your point that we need to define semantic identifiers
within the ebXML framework.
However, putting them in a DTD or Schema is not going to do the job. We need
to have a repository to store those identifiers. The ideal situation would
be to have a single namespace for those identifiers, where we store the
globally agreed semantic identifiers, together with multilingual
explanations of the identified semantics. 

MILR: IMO, only the semantic ID itself need be in the DTD/Schema referenced
by the document.  And yes, we expect that even that DTD will typically be
registered and stored in a repository.  I would expect that the bulk of the
semantic information would be accessible from a repository using the
Semantic ID as an access key.

MILR:  Having a single namespace for identifiers is a nice dream, but IMO is
not practical.  For example, X12 and EDIFACT provide two rich sources for
identifying semantic entities, and we haven't been able to get those two
merged.  Please understand, I very much would like to see just one namespace
having no duplication of semantic entities.  To that end, I would suggest we
need to provide some means by which the semantic ID lookup process might in
some cases yield a substitute semantic ID.  
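
A minimal sketch of such a substitute lookup, in Python.  Everything in it
is an assumption made for illustration: the owner prefixes, the identifiers,
the two tables and the resolve() function are hypothetical and do not come
from any registry.  The idea is only that a lookup may return a pointer to
another semantic ID, which is followed until a defined entry is reached.

```python
from typing import Dict, Optional

# Hypothetical data: owner prefixes and identifiers are invented.
SUBSTITUTES: Dict[str, str] = {
    "ownerB:PartyId": "ownerA:PartyIdentifier",
    "ownerC:Party": "ownerB:PartyId",
}
DEFINITIONS: Dict[str, str] = {
    "ownerA:PartyIdentifier": "Identifier of a trading party",
}


def resolve(semantic_id: str, max_hops: int = 10) -> Optional[str]:
    """Follow substitute links until an ID that has a definition is reached.

    Returns None if the ID is unknown, the chain loops, or it is too long.
    """
    seen = set()
    current = semantic_id
    while current not in DEFINITIONS:
        if current in seen or len(seen) >= max_hops:
            return None  # cycle, or more hops than we are willing to follow
        seen.add(current)
        if current not in SUBSTITUTES:
            return None  # neither defined nor substituted
        current = SUBSTITUTES[current]
    return current
```

With the tables above, resolve("ownerC:Party") follows two substitutions and
ends at "ownerA:PartyIdentifier".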

My first submission on this topic may have suggested to some that the
assignment of semantic ID was a simple thing.  It is not always so.
In X12 (and likely in EDIFACT, though I did not look for same before writing
this) there are examples in which the first entity value identifies which of
several code lists the qualifier comes from, which qualifier in turn
provides the semantic meaning to the qualified  entity, which in turn is a
code list entry to the actual semantic entity.  That first qualifier code is
just a mechanism needed to access the governing code list.  In X12, DE66 /
DE67 is such a pair.  If DE66 contains the code value '2', then the value in
DE67 represents a Standard Carrier Alpha Code (SCAC), which in turn
identifies a party.  Had DE66 contained a '1', then the value in DE67 would
have represented a D-U-N-S number, which in turn identifies a party (perhaps
even the same party identified using an SCAC).
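
The DE66 / DE67 pair can be sketched as a small lookup, again in Python.
Only the qualifier values '1' and '2' and their meanings are taken from the
paragraph above; the semantic ID strings and the function name are invented
placeholders, not registered identifiers.

```python
from typing import Dict, Tuple

# Qualifier values '1' and '2' are the ones cited in the text above.
# The semantic ID strings on the right are hypothetical placeholders.
DE66_TO_SEMANTIC_ID: Dict[str, str] = {
    "1": "x12:DE67/DUNS",
    "2": "x12:DE67/SCAC",
}


def semantic_id_for(de66: str, de67: str) -> Tuple[str, str]:
    """Return (semantic ID, value) for a DE66 / DE67 pair.

    DE66 carries no business meaning of its own here; it only selects
    which code list gives DE67 its meaning.
    """
    try:
        return DE66_TO_SEMANTIC_ID[de66], de67
    except KeyError:
        raise ValueError(f"unknown DE66 qualifier: {de66!r}") from None
```

Both results identify a party, so a mapping step further downstream could
still treat the two semantic IDs as alternative keys to the same kind of
entity.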


The 0.60 requirements doc already mentions this in Section 3.7 (RegRep).
The real challenge with XML based document exchange is to map the semantics
from one application/XML framework to another. Being able to identify
semantics is a key factor in this process. A repository with agreed semantic
identifiers, which can be referenced from Schemas or DTDs, will be helpful.

MILR: Probably more than just 'helpful', as in downright necessary.

The problem is maintenance and quality control: how are we going to ensure
that the contents of the repository are perfectly understandable (to
*everyone* around the globe), and that duplication of semantics is
minimized?

MILR: Of course, we (ebXML) are not charged with that 'content'
responsibility.  But we are probably charged with establishing some controls
on the kinds of content required, and on setting rules for maintenance and
quality control.  For example, we can require that every entry include a
'type' specification, one or more 'name' and 'description' specifications
(for one or more languages), etc.  
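
One way to picture such a content rule is as a check applied to each entry
before it is accepted.  The sketch below is Python and assumes a record
layout of my own choosing (semantic_id, entry_type, and name / description
maps keyed by language code); none of these field names is specified by
ebXML.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepositoryEntry:
    # All field names are hypothetical; only the requirements they encode
    # (a type, plus names and descriptions per language) come from the text.
    semantic_id: str
    entry_type: str = ""
    names: Dict[str, str] = field(default_factory=dict)         # lang -> name
    descriptions: Dict[str, str] = field(default_factory=dict)  # lang -> text


def validate(entry: RepositoryEntry) -> List[str]:
    """Return a list of rule violations; an empty list means acceptable."""
    problems = []
    if not entry.entry_type:
        problems.append("missing type")
    if not entry.names:
        problems.append("at least one name is required")
    if not entry.descriptions:
        problems.append("at least one description is required")
    return problems
```

A check like this covers only the form of an entry.  Whether a description
is actually understandable remains a matter for human review.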

This problem is not new; we hit upon it in the relatively controlled
environment of UN/EDIFACT as well. IMO, a purely technical solution is
impossible, as it takes human expertise to control the semantics.
All frameworks use semantics, whether they are explicitly identified, or
implicitly (through document structure and human-interpretable tags). The
trick is to enable interoperability by allowing/demanding these frameworks
to identify their semantics with universal identifiers.

MILR: Amen.  We won't fully solve this problem. But we can transform the
problem into a set of problems we can solve and address those.  That will
help some.  And maybe those who follow us will find a few more useful
transformations.  And don't forget that in this new world we can even
associate processes with these semantic units, which processes may provide
built-in 'semantic smarts'.

Several attempts have been made to set up such repositories. One of them is
ISO BSR, another is the UN-XML Repository (a demo is available at
http://xml-edi.tie.nl). Although the UN-XML Repository bears the name of the
UN, it is not sanctioned by UN/CEFACT in any way. The name comes from the
way it retrieves its content: all semantic identifiers are extracted from
UN/EDIFACT directories.



Gait Boxman
TIE Product Development
The Netherlands


-----Original Message-----
From: Miller, Robert (GEIS) [mailto:Robert.Miller@geis.ge.com]
Sent: woensdag 29 maart 2000 21:27
To: Ebxml Transport
Subject: XML Syntax/Semantics


There has been little if any discussion to date in TR&P on XML
syntax/semantic construction, which seems to be one of our responsibilities
to define.  It is a topic in which I have much interest, and is a source of
considerable concern to our client base due to its potential to impact
interoperability.  My personal observation of several existing consortium
attempts to define XML representation of business data is that they assume
all parties will use only the syntax/semantics they define, in which case
interoperability is a non-problem.

Of course, ebXML could make the same assumption, though surely that would
short-circuit the intent of the interoperability requirement we agree
exists.  Clearly ebXML must support interoperability between traditional EDI
and ebXML-based EDI, another stated goal of ebXML, and among multiple
nationalities whose native language differs.  Such support requires keen
attention to XML syntax/semantic detail.  It is not acceptable for example
to simply equate 'tag names' (XML or otherwise) in two systems and assume
the intersection represents the interoperability domain.

IMO, there is a pressing need to:

1) Adopt the use of unique semantic ID's for each semantic information item.
2) Support multiple name-owners for such semantic ID's, in recognition of
the current absence of a single registry source for such ID's
3) Require the identification of the semantic ID for each semantically
distinct information item in a conformant ebXML document.  
4) Specify common semantic attributes that associate to a semantically
distinct information item, and specify the XML representation of those
common semantic attributes.
5) Specify the means by which semantic ID's are conveyed through the XML
syntactic constructs we define for ebXML messages, and the means by which
the semantic attribute values associated with the semantic ID's are
conveyed.

I offer some recommendations with respect to these needs:

1) Require in the ebXML DTD/Schema for each XML element the definition of a
Semantic ID attribute, suggested attribute name being 'ID' in the ebXML
namespace (or schema namespace when/if the W3C Schema work group should
choose to define such an attribute.)
2) Define the semantic ID value as consisting of an owner part and an
identifier part, akin to the definition of XML element names which use the
NAMESPACE capability.  Note that the content of an XML attribute is not
subject to XML NAMESPACE interpretation;  we must define the syntax rules
for construction of Semantic ID's.  I would point out that the XML syntax
rules seem a good choice, as they provide both a shorthand representation of
ownership and a URI owner identifier.  As we define ebXML documents as part
of the ebXML infrastructure, we may need to specify ebXML as owner and
assignor of some semantic ID's for XML elements we define, at least until
such time as more formal naming agencies adopt and formalize semantic ID's.
3) An XML document which includes some information elements for which no
source of semantic information is available would seem to me incomplete.
4) The W3C Schema work group has done some work in this area, defining for
example type and length/range attributes.
5) I recognize that a set of 'semantic' attributes could be defined and
populated in an ebXML DTD (or Schema).  But its presence there may result in
unnecessary repeated processing costs.  I suggest that such information be
made available instead through a query process using the Semantic ID and any
supplemental keys such as document source/version as may be appropriate, on
a demand basis (Big Schema / Little Schema so to speak).  The semantic ID
for simple XML elements can be initialized in the DTD/Schema.  Code List
elements are a little more complicated as each code value represents a
unique semantic element in the context of the code list element of which it
is a value.  There may also be good reason to represent some code list
entries as individual XML elements in lieu of or in addition to their
representation as values of a code list element.
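
As an illustration of recommendation 2, here is a minimal Python sketch of
splitting a Semantic ID value into an owner part and an identifier part,
with the owner shorthand expanded to a URI much as an XML namespace prefix
would be.  The "prefix:identifier" syntax, the example.org URI and the
function name are all assumptions; the syntax rules themselves are still to
be defined.

```python
from typing import Dict, Tuple


def expand_semantic_id(value: str, owners: Dict[str, str]) -> Tuple[str, str]:
    """Split 'prefix:identifier' and expand the prefix to its owner URI.

    `owners` maps a shorthand owner prefix to a URI identifying the owner.
    """
    prefix, sep, identifier = value.partition(":")
    if not sep or not prefix or not identifier:
        raise ValueError(f"not of the form prefix:identifier: {value!r}")
    if prefix not in owners:
        raise ValueError(f"undeclared owner prefix: {prefix!r}")
    return owners[prefix], identifier


# Hypothetical owner declaration; the URI is a placeholder.
OWNERS = {"x12": "http://example.org/owners/x12"}
```

Because attribute content is not subject to XML NAMESPACE interpretation,
the application has to supply the prefix-to-URI table itself, as the
`owners` argument does here.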

Cheers,
       Bob