ebxml-core message

Subject: RE: Units of Measure
From: "Miller, Robert (GXS)" <Robert.Miller@gxs.ge.com>
To: ebxml-core@lists.ebxml.org
Date: Mon, 10 Jul 2000 16:19:27 -0400

Hi All,

It's nice to see such a lively discussion on the list.  But I fear a point I
was trying to make in my posting has been overlooked.  I was not trying to
recommend a syntax for representing codes; I was trying to point out that
some codes serve a different semantic function than other codes do.

Personally, I would not have coded Hispanic as a data element value, as it
makes it harder to hang a semantic ID on the value.  But I might have coded
the value as <Hispanic/> (where Hispanic was in some default namespace, as I
do feel that all element names saved in a repository should be contained
within some namespace).  And I could live with someone coding Hispanic as
the value of an element, provided the element definition gave access to
a process that knew how to use the value to yield its semantics.

Syntax must lead in some manner to understandable semantics.  In X12, the
trail leads through paper documents.  In ebXML, the trail must lead through
XML-processable snippets.

I fully expect that a given semantic entity will often find itself
represented in more than one manner across the available XML syntax
alternatives.  I do not expect 'code list' values always to be represented
by their 'code', whatever the syntax.

I believe a code list like Unit of Measurement will be torn apart, and the
units it represents will be related to the individual entities they modify,
so that the syntax prevents things like '<length>15<FeetPerSecond/></length>'.
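
To illustrate, here is a minimal Python sketch of tying units to the entities
they modify.  It is only an assumption about how such a check might look: the
ALLOWED_UNITS table and the unit_is_valid function are hypothetical names, and
in practice the restriction would more likely live in the DTD or schema
content model than in a runtime check.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: each measured entity lists only the units that make
# sense for it, instead of sharing one global Unit of Measurement code list.
ALLOWED_UNITS = {
    "length": {"Feet", "Metres"},
    "speed": {"FeetPerSecond", "MetresPerSecond"},
}


def unit_is_valid(xml_text):
    """Return True if the element's single child names a unit allowed for it."""
    element = ET.fromstring(xml_text)
    children = list(element)
    if len(children) != 1:
        # Exactly one unit element is expected inside the measured entity.
        return False
    return children[0].tag in ALLOWED_UNITS.get(element.tag, set())


print(unit_is_valid("<length>15<Feet/></length>"))
print(unit_is_valid("<length>15<FeetPerSecond/></length>"))
```

With this table, a length in Feet is accepted and a length in FeetPerSecond is
rejected, because FeetPerSecond is listed only under speed.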


I suspect it will often prove convenient to specify code list values as
attribute values, which means the parent element will need to provide a
process to identify from the code list value the semantic entity the code
represents.  On the other hand, when some other representation proves more
(useful|concise|readable|whatever) in a given environment, that's fine - so
long as the semantic entity represented by the code is identifiable.
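
As a rough sketch of what "a process to identify the semantic entity" could
mean, the Python below resolves a code carried in an attribute through a
registry keyed by the parent element name.  Everything here is assumed for
illustration: the CODE_LISTS registry, the resolve_semantics function, the
'unit:...' identifiers, and the two sample codes are not taken from any ebXML
specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: element name -> {code value -> semantic identifier}.
# The entries are illustrative only.
CODE_LISTS = {
    "Weight": {"LBR": "unit:pound", "KGM": "unit:kilogram"},
}


def resolve_semantics(xml_text, attribute="Code"):
    """Map the code in the given attribute to the semantic entity it names."""
    element = ET.fromstring(xml_text)
    code_list = CODE_LISTS.get(element.tag)
    if code_list is None:
        raise LookupError(f"no code list registered for <{element.tag}>")
    code = element.get(attribute)
    if code not in code_list:
        # The code is present but nothing says what it means: the situation
        # the recipient must never be left in.
        raise LookupError(f"code {code!r} is not defined for <{element.tag}>")
    return code_list[code]


print(resolve_semantics('<Weight Code="LBR">10</Weight>'))
```

The point of the sketch is the failure path: a code with no route to its
semantics raises an error instead of being passed along silently.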

What worries me is the thought that someone will use a code list value
(e.g., from an X12 code list) as an attribute value of an XML element, not
provide a means to get at the semantics the code represents (the XML
representation of the semantics, of course), and think that the job is done.
Indeed, I frequently see code lists represented without thought given to how
the recipient processor is going to be able to identify the semantics
associated with the code value.

I forgot to mention it, but code list values also (conceptually) reside in
(rather restricted) namespaces.  That will likely keep to a minimum the set
of semantic code values an implementation might choose to represent outside
an attribute list, lest it risk name collisions in whatever namespace it
uses to define its messages.

Cheers,
         Bob


-----Original Message-----
From: William J. Kammerer [mailto:wkammerer@foresightcorp.com]
Sent: Friday, July 07, 2000 3:07 PM
To: ebxml-core@lists.oasis-open.org
Subject: Re: Units of Measure


Bob Miller brought up interesting aspects of codification, especially
those derived from X12, and means of representing aspects of coded
semantics in XML.

Bob says some code lists perform a 'text alias' service - e.g., X12 DE
1109 Race or Ethnicity Code has values 7 (Not Provided), C (Caucasian),
H (Hispanic), etc.  He says the same code could be represented in XML as
<Ethnicity>Hispanic</Ethnicity>.  I think there are a couple of problems
with this technique of spelling out the value (of what used to be a
simple code).  Though I'll admit it's easier for a human reader to
discern the ethnicity in Bob's example, it will be harder for programs
to process.  Especially considering all the attendant problems of
capitalization (is 'Hispanic' the same as 'hispanic'?) and misspellings.
Data categorized by D.E. 1109 will often be mechanically sorted and
collated by ethnic and racial categories - small discrepancies in the
element value will make this difficult.  This is unlike misspellings and
the like in street addresses, which are generally read by humans only
(it's usually only the 9 digit ZIP which has to be exact for data
processing needs).

It would have been easier if the OMB had devised a definitive code list
for racial and ethnic classifications in its directives, which could
then be used unchanged for data processing purposes.  Instead, the OMB
just rambles on and talks about various types of ethnicity and racial
classifications, leaving it up to X12 to come up with a code list and to
deal with the various ambiguities (maybe this isn't a problem in EDIFACT
since other countries aren't as obsessed with "classifying" their
subjects).

Besides the problems with capitalization and misspellings, you would
also have to deal with the complete redesignation of categories,
depending on political whim and correctness.  What was a "Caucasian"
yesterday may be a "White" today, and "Anglo" tomorrow (regardless of how
absurd this sounds since almost all English-speaking Euro-Americans
aren't of English descent at all, and it's probably offensive to Irish
and German Americans).  How would a program deal with these renamings -
keep a table of all the text synonyms?  No, I think classification by
codes, effective in EDI, is needed just as much in XML based core
components.

Then there's the translatability problem with the so-called "Textual"
codes.  Certainly country and currency codes would fall into Bob's
category of 'text alias'.  Maybe <Country>Germany</Country> might be
more understandable by the casual reader, but <Country>DE</Country>
using the ISO country code is easier to process by programs, and is far
more standardized.  And <Country>Deutschland</Country> is just as
readable by English speakers. And is it "United States" or "United
States of America," or is it "Bundesrepublik Deutschland" instead of
"Germany"?  Are we going to expect our programs to do on-the-fly
translations, or to maintain complicated synonym tables?  - remember, I
have to know the country so I can prepare the proper customs papers,
calculate shipping, etc. etc.  Codes were invented to remove ambiguity -
they're just as necessary for unambiguous interpretation of XML data as
with EDI.
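
The synonym-table burden described above can be sketched in Python as
follows.  This is only an illustration under stated assumptions: the
KNOWN_CODES set, the NAME_TO_CODE table and the country_code function are
hypothetical, and a real receiver would need far more entries.

```python
# Hypothetical subset of two-letter ISO country codes the receiver accepts.
KNOWN_CODES = {"DE", "US"}

# Hypothetical synonym table needed as soon as free-text names are allowed.
NAME_TO_CODE = {
    "germany": "DE",
    "deutschland": "DE",
    "bundesrepublik deutschland": "DE",
    "united states": "US",
    "united states of america": "US",
}


def country_code(value):
    """Return the country code for a code or a known name, else None."""
    text = value.strip()
    if text in KNOWN_CODES:
        # A code needs only an exact match.
        return text
    # A spelled-out name needs case folding plus a maintained synonym table.
    return NAME_TO_CODE.get(text.casefold())


print(country_code("DE"))
print(country_code("Deutschland"))
print(country_code("Federal Republic of Germany"))
```

The last call returns None: any spelling missing from the table is
unprocessable, while the coded form never needs the table at all.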

And for those codes which perform a 'reference' service, as unit of
measure in Bob's example <Weight Code='Pounds'>10</Weight>, I don't know
why there would be any doubt we would use the standard UN/ECE
recommendation 20 for UOM.  What kind of "Pounds" is Bob talking
about? - I'm sure there are different types, like dry pounds.  And
"Pounds" is plural - what if I have only 1 pound????  Does my program
have to account for the 's' somehow?

Are we trying to make "readable" XML for idiots, or are we trying to
find a better way to perform automated B2B interoperability?  Let's just
use the standardized codes which were invented for practical reasons
pre-dating even Traditional EDI.

William J. Kammerer
FORESIGHT Corp.
4950 Blazer Memorial Pkwy.
Dublin, OH USA 43017-3305
+1 614 791-1600

Visit FORESIGHT Corp. at http://www.foresightcorp.com/
"Commerce for a New World"



Follow-Ups:
- Re: Units of Measure
  - From: Martin Bryan <mtbryan@sgml.u-net.com>