ebxml-core message

Subject: Dotted-name Tags (was RE: Long Tags Codes etc. again)
From: John McClure <hypergrove@olympus.net>
To: "Probert, Sue" <Sue.Probert@commerceone.com>,'David RR Webber' <Gnosis_@compuserve.com>,"CRAWFORD, Mark" <MCRAWFORD@lmi.org>
Date: Tue, 17 Apr 2001 16:20:48 -0700
All,
I am at this moment writing the specification for the DCN's toolkit of objects,
and would like to share a practitioner's perspective on this thread. To get
to the point, I have concluded that the "dotted-name" is the most practical
answer to the question of whether to have element-type names like
<PurchaserFirstName> or a set of nested elements like
<Purchaser><FirstName>, or the more DCN-bludgeoned solution:

      <Role xsi:type='ActorRole' name='Purchaser'>
        <list xsi:type='actors' name='factorActors'>
        <Actor xsi:type='Person'>
          <list xsi:type='entries' name='actorEntries'>
          <Entry xsi:type='Memo' name='FirstName'>
             <content>John</content>
          </Entry>
          </list>
        </Actor>
        <Actor xsi:type='Person'>
          <list xsi:type='entries' name='actorEntries'>
          <Entry xsi:type='Memo' name='FirstName'>
             <content>Pamela</content>
          </Entry>
          </list>
        </Actor>
        </list>
      </Role>

I can report first-hand that two problems exist in the DCN version. When an
element's taxonomy is embedded in a tag (via its tag name, xsi:type, and
name), people sense that's a good thing: it reflects the underlying model,
and it's a good fit if you're trying to create object-oriented software to
handle the mess. Yet they are uncomfortable with its seeming lack of
elegance, even though an RDFS-based parser reduces the snippet to

      <Purchaser>
        <factorActors>
        <Person>
          <actorEntries>
          <FirstName><content>John</content></FirstName>
          </actorEntries>
        </Person>
        <Person>
          <actorEntries>
          <FirstName><content>Pamela</content></FirstName>
          </actorEntries>
        </Person>
        </factorActors>
      </Purchaser>

This still creates discomfort because of the apparent redundancy of the two
list tags which are used to identify the relationship between the role and a
set of actors, and an actor and its set of attributes.
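
To make the reduction concrete, here is a minimal Python sketch. It is not
the DCN toolkit and does not consult an RDF Schema; it assumes a simple
renaming rule (use the 'name' attribute if present, else the xsi:type value,
else the original tag), which happens to reproduce the reduced snippet above.
The function name compact() and the single-actor input are illustrative only.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# A one-actor version of the verbose snippet, with the xsi prefix declared.
VERBOSE = f"""
<Role xmlns:xsi="{XSI}" xsi:type="ActorRole" name="Purchaser">
  <list xsi:type="actors" name="factorActors">
    <Actor xsi:type="Person">
      <list xsi:type="entries" name="actorEntries">
        <Entry xsi:type="Memo" name="FirstName"><content>John</content></Entry>
      </list>
    </Actor>
  </list>
</Role>
"""

def compact(elem):
    """Copy elem, renaming each element by the ASSUMED rule:
    'name' attribute, else xsi:type value, else the original tag."""
    tag = elem.get("name") or elem.get(f"{{{XSI}}}type") or elem.tag
    out = ET.Element(tag)
    # Keep real text, drop whitespace-only indentation.
    out.text = elem.text if elem.text and elem.text.strip() else None
    for child in elem:
        out.append(compact(child))
    return out

compact_root = compact(ET.fromstring(VERBOSE))
compact_xml = ET.tostring(compact_root, encoding="unicode")
print(compact_xml)
```

A real RDFS-based parser would presumably derive the renaming from the
dictionary rather than from a fixed rule like this one.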

Now, the toolkit contains a class named DCNInterpreter, a CORBA'ed
DOMImplementation class, one that can execute remotely from requester client
applications (or be linked directly into the requester's process).
Obviously, it would be embarrassing to roam a DOM tree across even a LAN, so
the primary function of the DCN Interpreter is to resolve a (list of)
logical attribute name(s) to information present in a DOM tree representing
a DCN document, returning values encoded as strings. This capability would
be a heckuva value-add for the membership of the Data Consortium because
many developers won't need to learn the mechanics and shape of the
DOM tree that represents a DCN document.
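
As a rough illustration of that resolution step, the Python sketch below
looks up dotted names in a small tree and returns strings. Everything here
is an assumption made for the example: the function names resolve() and
resolve_all(), the <Agreement> wrapper element, the reading of a dotted name
as Role[.index].Entry with a 1-based index, and the omission of xsi:type
attributes for brevity. The actual DCNInterpreter interface is defined in IDL
and is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the verbose snippet wrapped in an assumed root.
DOC = """
<Agreement>
  <Role name="Purchaser">
    <list name="factorActors">
      <Actor>
        <list name="actorEntries">
          <Entry name="FirstName"><content>John</content></Entry>
        </list>
      </Actor>
      <Actor>
        <list name="actorEntries">
          <Entry name="FirstName"><content>Pamela</content></Entry>
        </list>
      </Actor>
    </list>
  </Role>
</Agreement>
"""

def resolve(root, dotted_name):
    """Resolve 'Role.Entry' or 'Role.N.Entry' to a string, or None."""
    parts = dotted_name.split(".")
    role_name, entry_name = parts[0], parts[-1]
    index = int(parts[1]) if len(parts) == 3 else 1  # assumed default: first actor
    role = root.find(f".//Role[@name='{role_name}']")
    if role is None:
        return None
    actors = role.findall("./list/Actor")
    if not 1 <= index <= len(actors):
        return None
    content = actors[index - 1].find(f"./list/Entry[@name='{entry_name}']/content")
    return None if content is None else content.text

def resolve_all(root, names):
    """Resolve a list of dotted names in one call, as the text describes."""
    return [resolve(root, name) for name in names]

root = ET.fromstring(DOC)
values = resolve_all(root, ["Purchaser.1.FirstName", "Purchaser.2.FirstName"])
print(values)
```

The caller passes names and gets strings back, and never touches the two
intermediate <list> elements.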

The dotted-name syntax works very well for naming resource attributes, and
most intriguingly, it is quite similar to JavaScript syntax. If the
Interpreter can correlate dotted-name syntax with nodes in the DOM tree,
then it can create those nodes in a DOM tree when the element-type name is a
dotted-name, e.g., <Purchaser.FirstName>. The Interpreter of course is
guided by the DC Dictionary (DCD) which defines the <rdfs:Class>es
corresponding to the values of the 'name' attribute present on each of the
15 resource-types defined by the DCN. Thus the Interpreter is able to make
semantic judgments about what the input stream contains.

For <Purchaser.FirstName>, the Interpreter learns from the DCD that
'FirstName' is a subclass of 'Memo' and 'Entry', and is valid for resources
of type 'Person'. It also learns that 'Purchaser' is a subclass of
'ActorRole' and 'Role'. Put these facts together, and the appropriate DOM
tree can now be generated with little problem.
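
A sketch of how those facts might be combined, in Python. The DCD dict below
is a hypothetical stand-in holding only the facts stated above; the real
dictionary is expressed as <rdfs:Class> definitions. The function name
expand(), the key names subclass_of and valid_for, and the hard-coded list
names (taken from the verbose snippet earlier in this message) are all
assumptions for illustration.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSI_TYPE = f"{{{XSI}}}type"
ET.register_namespace("xsi", XSI)  # serialize the attribute as xsi:type

# Hypothetical dictionary entries: [immediate type, base resource-type].
DCD = {
    "Purchaser": {"subclass_of": ["ActorRole", "Role"]},
    "FirstName": {"subclass_of": ["Memo", "Entry"], "valid_for": "Person"},
}

def expand(dotted_name, value):
    """Build the verbose tree for a two-part dotted name such as
    'Purchaser.FirstName', using only facts found in DCD."""
    role_name, entry_name = dotted_name.split(".")
    role_type, role_tag = DCD[role_name]["subclass_of"]
    entry_type, entry_tag = DCD[entry_name]["subclass_of"]
    actor_type = DCD[entry_name]["valid_for"]

    role = ET.Element(role_tag, {XSI_TYPE: role_type, "name": role_name})
    actors = ET.SubElement(role, "list", {XSI_TYPE: "actors", "name": "factorActors"})
    actor = ET.SubElement(actors, "Actor", {XSI_TYPE: actor_type})
    entries = ET.SubElement(actor, "list", {XSI_TYPE: "entries", "name": "actorEntries"})
    entry = ET.SubElement(entries, entry_tag, {XSI_TYPE: entry_type, "name": entry_name})
    ET.SubElement(entry, "content").text = value
    return role

tree = expand("Purchaser.FirstName", "John")
print(ET.tostring(tree, encoding="unicode"))
```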

There are enough other pieces to this that it is appropriate to call this
kind of markup a DCN-specific <!NOTATION>, a subtype of the XML notation
embedded in parsers today. This is in part because the notation generates a
DOM tree that is DCN-specific, using elements from a DTD that is expected to
change very little over time, and in part because there is no difference
(the same DOM fragment is generated) between

<PurchaseAgreement>
   <Purchaser.1>
      <FirstName>John</FirstName>
   </Purchaser.1>
   <Purchaser.2>
      <FirstName>Pamela</FirstName>
   </Purchaser.2>
</PurchaseAgreement>

and

<PurchaseAgreement>
   <Purchaser.1.FirstName>John</Purchaser.1.FirstName>
   <Purchaser.2.FirstName>Pamela</Purchaser.2.FirstName>
</PurchaseAgreement>
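
The equivalence of the two forms can be checked with a short Python sketch.
It does not build the DCN DOM fragment; instead it assumes that joining
nested element names with dots is a fair proxy for what the Interpreter
does, and compares the resulting name-to-value mappings. The function name
flatten() is illustrative.

```python
import xml.etree.ElementTree as ET

NESTED = """<PurchaseAgreement>
   <Purchaser.1><FirstName>John</FirstName></Purchaser.1>
   <Purchaser.2><FirstName>Pamela</FirstName></Purchaser.2>
</PurchaseAgreement>"""

FLAT = """<PurchaseAgreement>
   <Purchaser.1.FirstName>John</Purchaser.1.FirstName>
   <Purchaser.2.FirstName>Pamela</Purchaser.2.FirstName>
</PurchaseAgreement>"""

def flatten(elem, prefix=""):
    """Map each leaf element below elem to a dotted path and its text."""
    result = {}
    for child in elem:
        path = f"{prefix}.{child.tag}" if prefix else child.tag
        if len(child):  # has child elements: keep descending
            result.update(flatten(child, path))
        else:           # leaf: record its text under the dotted path
            result[path] = (child.text or "").strip()
    return result

nested_paths = flatten(ET.fromstring(NESTED))
flat_paths = flatten(ET.fromstring(FLAT))
print(nested_paths == flat_paths)
```

Both documents reduce to the same two dotted names, which is the sense in
which the notation treats nesting and dotting as interchangeable.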

The DCN approach therefore includes standards for
(1) the construction of dotted resource and resource-property names
(2) a notation for XML documents that preserves 'just enough' metadata
(3) an in-memory DOM tree that is based on a *SINGLE* DTD/XSD
(4) a taxonomy of resource-types explicitly coordinated with o-o classes
(5) an IDL interface for extracting resource/attribute information from a
document

I also note that UIDs can be easily substituted for the strings in the
dotted-name; however, I would make UID support a requirement of the notation
and the dictionary, definitely not of the DTD for the standardized document
held within the standardized DOM tree. IMHO, semantically meaningless UIDs
make sense when exchanging documents across national borders, and that's
about it. I must admit to remaining puzzled over why UIDs seem so attractive
in the face of URIs and XML Namespaces. Mark Crawford's beliefs seem rooted
in an XML Schema and DTD world, one that I hope fervently is left behind by
the RDF Schema world coming upon us.

Finally, I suggest that Rule 11 in the CC Naming Specification (Lines
172-185) be reviewed for elimination of the requirement for a trailing space
during the construction of names. That doesn't parse when applied to
element-type names, and I see no supportive rationale for such a naming rule
as the names are applicable to other contexts.

John "DD" McClure
Hypergrove Engineering
211 Taylor Street, Suite 32-A
Port Townsend, WA 98368
360-379-3838 (land)

For a discussion group about the Data Consortium Namespace, please see
http://groups.yahoo.com/group/DCNArchitecture/join

For the latest Data Consortium Namespace Specification, please see
http://www.dataconsortium.org/namespace/DCN150.DTD.pdf or
http://www.dataconsortium.org/namespace/DCN150.DTD.doc or
http://www.dataconsortium.org/namespace/DCN150.DTD.htm

For the latest Data Consortium Dictionary, please see
http://www.dataconsortium.org/namespace/DCD100.pdf or
http://www.dataconsortium.org/namespace/DCD100.xml (IE5)



> -----Original Message-----
> From: Probert, Sue [mailto:Sue.Probert@commerceone.com]
> Sent: Tuesday, April 17, 2001 1:10 PM
> To: 'David RR Webber'; CRAWFORD, Mark
> Cc: 'ebXML Core' (E-mail)
> Subject: RE: RE: Long Tags Codes etc. again
>
>
> Thanks David.
>
> You put together exactly the same reply as I was about to try to pen. I
> absolutely agree that the UID is the key whether directly or indirectly
> referenced in whatever way suits individual syntax or business solutions.
> Isn't that the whole point of the CC exercise - individually well defined
> and labelled semantics?
>
> regards
>
> Sue
>
> Sue Probert
> Director, Document Engineering
> Commerce One
> Tel: +44 1332 342080
> www.commerceone.com
>
>
> -----Original Message-----
> From: David RR Webber [mailto:Gnosis_@compuserve.com]
> Sent: 17 April 2001 21:04
> To: CRAWFORD, Mark
> Cc: 'ebXML Core' (E-mail)
> Subject: UID: RE: Long Tags Codes etc. again
>
>
> The problem is that BOTH are correct - Martin and Mark,
> and this is all about CONTEXT.
>
> Notice:    SAMPLE 'A'
>
> <BuyerParty>
>     <Name>Mark Crawford</Name>
>     <Address/>
> </BuyerParty>
> <SellerParty>
>    <Name>Martin Bryan</Name>
>    <Address>
>       <County>Gloucestershire</County>
>    </Address>
> </SellerParty>
>
> and then notice:
>
> <PaymentReceipt>
>    <AmountPaid curr="CENTS">2</AmountPaid>
>    <BuyerPartyName>Mark Crawford</BuyerPartyName>
>    <BuyerReference>0023553</BuyerReference>
> </PaymentReceipt>
>
>
> But further notice that none of this is going to ever work - when
> you use names to denote things - then machine processing
> and machine interpretable logic flies out the window and is gone.
> And then the French want different tags names anyway.
>
> At the risk of being a broken record on this UID's, UID's and UID's.
> Use them, embed them, put them in the Registry, build schema
> fragments around them, think UID's, assemble UID based DTD and
> Schema fragments, use UID's to denote context, in fact put UID's
> everywhere - nouns, verb, templates, forms, programs, stylesheets,
> classifications, business processes, CPP's....
>
> Here's what this then brings to the table:
>
> Notice:  SAMPLE 'B'
>
> <BuyerParty UID='EBX001010'>
>     <Name UID='EBX005070'>Mark Crawford</Name>
>     <Address UID='EBX005080'/>
> </BuyerParty>
> <SellerParty UID='EBX001020'>
>    <Name UID='EBX005070'>Martin Bryan</Name>
>    <Address UID='EBX005080'>
>       <County UID='EBX005085'>Gloucestershire</County>
>    </Address>
> </SellerParty>
>
> and then notice:
>
> <PaymentReceipt UID='EBX002010'>
>    <AmountPaid curr="CENTS" UID='EBX002011'>2</AmountPaid>
>    <BuyerPartyName UID='EBX001010:V011'>Mark Crawford</BuyerPartyName>
>    <BuyerReference UID='EBX001080'>0023553</BuyerReference>
> </PaymentReceipt>
>
> Notice that ALL these UID references go in the DTD or Schema; NOT in the
> well-formed XML - so this document actually looks like A, but
> derives as B,
> when the parser reads both the wellformed instance and the ebXML
> compliant DTD or Schema into memory.
>
> Now you know that BuyerPartyName is a sub-version of the component
> BuyerParty, and when you look it up in the Registry - you discover an
> associative link to EBX005070, the Name component.
>
> I've been telling the W3C for two years that Schema cannot do this, and
> that Registry is the way to store and reference extended semantics.
> Fortunately James Clark and James Tauber understand this, and now
> we have the TREX work - which cleanly separates structure from all
> the other mess.
>
> As part of the work XMLGlobal is doing with NIST we are now demonstrating
> this kind of use of Registry interactions and storing Component
> definitions - using the ebXML Registry API and IE5.0 to show how
> this plays
> out.
>
> I look forward to being able to demonstrate this to interested parties in
> Vienna - and also to sharing lessons learned on actually storing and
> retrieving XML representations of Core Components to/from the Registry.
>
> Thanks, DW.
>
> ------------------------------------------------------------------
> To unsubscribe from this elist send a message with the single word
> "unsubscribe" in the body to: ebxml-core-request@lists.ebxml.org
>
Follow-Ups:
- Re: Dotted-name Tags (was RE: Long Tags Codes etc. again)
  - From: "William J. Kammerer" <wkammerer@foresightcorp.com>
References:
- RE: RE: Long Tags Codes etc. again
  - From: "Probert, Sue" <Sue.Probert@commerceone.com>