Subject: RE: XML Syntax/Semantics
Robert, I strongly agree with your point that we need to define semantic identifiers within the ebXML framework. However, putting them in a DTD or Schema is not going to do the job. We need to have a repository to store those identifiers. The ideal situation would be to have a single namespace for those identifiers, where we store the globally agreed semantic identifiers, together with multilingual explanations of the identified semantics.

MILR: IMO, only the semantic ID itself need be in the DTD/Schema referenced by the document. And yes, we expect that even that DTD will typically be registered and stored in a repository. I would expect that the bulk of the semantic information would be accessible from a repository using the Semantic ID as an access key.

MILR: Having a single namespace for identifiers is a nice dream, but IMO is not practical. For example, X12 and EDIFACT provide two rich sources for identifying semantic entities, and we haven't been able to get those two merged. Please understand, I very much would like to see just one namespace having no duplication of semantic entities. To that end, I would suggest we need to provide some means by which the semantic ID lookup process might in some cases yield a substitute semantic ID. My first submission on this topic may have suggested to some that the assignment of semantic ID was a simple thing. It is not always so. In X12 (and likely in EDIFACT, though I did not look for same before writing this) there are examples in which the first entity value identifies which of several code lists the qualifier comes from, which qualifier in turn provides the semantic meaning to the qualified entity, which in turn is a code list entry to the actual semantic entity. That first qualifier code is just a mechanic needed to access the governing code list. In X12, DE66 / DE67 is such a pair. If DE66 contains the code value '2', then the value in DE67 represents a Standard Alpha Carrier Code, which in turn identifies a party.
Had DE66 contained a '1', then the value in DE67 would have represented a D-U-N-S number, which in turn identifies a party (perhaps even the same party identified using an SCAC).

The 0.60 requirements doc already mentions this in Section 3.7 (RegRep). The real challenge with XML-based document exchange is to map the semantics from one application/XML framework to another. Being able to identify semantics is a key factor in this process. A repository with agreed semantic identifiers, which can be referenced from Schemas or DTDs, will be helpful.

MILR: Probably more than just 'helpful', as in downright necessary.

The problem is maintenance and quality control: how are we going to ensure that the contents of the repository are perfectly understandable (to *everyone* around the globe), and that duplication of semantics is minimized?

MILR: Of course, we (ebXML) are not charged with that 'content' responsibility. But we are probably charged with establishing some controls on the kinds of content required, and on setting rules for maintenance and quality control. For example, we can require that every entry include a 'type' specification, one or more 'name' and 'description' specifications (for one or more languages), etc.

This problem is not new; we hit upon it in the relatively controlled environment of UN/EDIFACT as well. IMO, a purely technical solution is impossible, as it takes human expertise to control the semantics. All frameworks use semantics, whether they are identified explicitly or implicitly (through document structure and human-interpretable tags). The trick is to enable interoperability by allowing/demanding these frameworks to identify their semantics with universal identifiers.

MILR: Amen. We won't fully solve this problem. But we can transform the problem into a set of problems we can solve and address those. That will help some. And maybe those who follow us will find a few more useful transformations.
And don't forget that in this new world we can even associate processes with these semantic units, which processes may provide built-in 'semantic smarts'.

Several attempts have been made to set up such repositories. One of them is ISO BSR, another is the UN-XML Repository (a demo is available at http://xml-edi.tie.nl). Although the UN-XML Repository bears the name of the UN, it is not sanctioned by UN/CEFACT in any way. The name comes from the way it retrieves its content: all semantic identifiers are extracted from UN/EDIFACT directories.

Gait Boxman
TIE Product Development
The Netherlands

-----Original Message-----
From: Miller, Robert (GEIS) [mailto:Robert.Miller@geis.ge.com]
Sent: Wednesday, 29 March 2000 21:27
To: Ebxml Transport
Subject: XML Syntax/Semantics

There has been little if any discussion to date in TR&P on XML syntax/semantic construction, which seems to be one of our responsibilities to define. It is a topic in which I have much interest, and is a source of considerable concern to our client base due to its potential to impact interoperability.

My personal observation of several existing consortium attempts to define XML representation of business data is that they assume all parties will use only the syntax/semantics they define, in which case interoperability is a non-problem. Of course, ebXML could make the same assumption, though surely that would short-circuit the intent of the interoperability requirement we agree exists.

Clearly ebXML must support interoperability between traditional EDI and ebXML-based EDI, another stated goal of ebXML, and among multiple nationalities whose native language differs. Such support requires keen attention to XML syntax/semantic detail. It is not acceptable, for example, to simply equate 'tag names' (XML or otherwise) in two systems and assume the intersection represents the interoperability domain.

IMO, there is a pressing need to:
1) Adopt the use of unique semantic ID's for each semantic information item.
2) Support multiple name-owners for such semantic ID's, in recognition of the current absence of a single registry source for such ID's
3) Require the identification of the semantic ID for each semantically distinct information item in a conformant ebXML document.
4) Specify common semantic attributes that associate to a semantically distinct information item, and specify the XML representation of those common semantic attributes.
5) Specify the means by which semantic ID's are conveyed through the XML syntactic constructs we define for ebXML messages, and the means by which the semantic attribute values associated with the semantic ID's are conveyed.

I offer some recommendations with respect to these needs:
1) Require in the ebXML DTD/Schema for each XML element the definition of a Semantic ID attribute, suggested attribute name being 'ID' in the ebXML namespace (or schema namespace when/if the W3C Schema work group should choose to define such an attribute.)
2) Define the semantic ID value as consisting of an owner part and an identifier part, akin to the definition of XML element names which use the NAMESPACE capability. Note that the content of an XML attribute is not subject to XML NAMESPACE interpretation; we must define the syntax rules for construction of Semantic ID's. I would point out that the XML syntax rules seem a good choice, as they provide both a shorthand representation of ownership and a URI owner identifier. As we define ebXML documents as part of the ebXML infrastructure, we may need to specify ebXML as owner and assignor of some semantic ID's for XML elements we define, at least until such time as more formal naming agencies adopt and formalize semantic ID's.
3) An XML document which includes some information elements for which no source of semantic information is available would seem to me incomplete.
4) The W3C Schema work group has done some work in this area, defining for example type and length/range attributes.
5) I recognize that a set of 'semantic' attributes could be defined and populated in an ebXML DTD (or Schema). But its presence there may result in unnecessary repeated processing costs. I suggest that such information be made available instead through a query process using the Semantic ID and any supplemental keys such as document source/version as may be appropriate, on a demand basis (Big Schema / Little Schema so to speak).

The semantic ID for simple XML elements can be initialized in the DTD/Schema. Code List elements are a little more complicated as each code value represents a unique semantic element in the context of the code list element of which it is a value. There may also be good reason to represent some code list entries as individual XML elements in lieu of or in addition to their representation as values of a code list element.

Cheers,
Bob
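
To make the owner/identifier idea, the repository query, and the DE66/DE67 qualifier case discussed in this thread a little more concrete, here is a minimal Python sketch. Everything named in it is an assumption made for illustration only: the prefixes, the `urn:example:` owner URIs, the identifier strings, the `Entry` fields, and the in-memory `REPOSITORY` are hypothetical and are not part of any ebXML, X12, or EDIFACT specification. Only the meaning of DE66 values '1' (D-U-N-S number) and '2' (Standard Alpha Carrier Code) is taken from the message above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical prefix -> owner URI table, playing the role that namespace
# declarations play for XML element names.
OWNERS: Dict[str, str] = {
    "x12": "urn:example:x12",
    "edifact": "urn:example:un-edifact",
}


@dataclass
class Entry:
    """One repository record (illustrative fields only)."""
    type: str
    names: Dict[str, str]              # language code -> human-readable name
    substitute: Optional[str] = None   # semantic ID to use in place of this one


def parse_semantic_id(sid: str) -> Tuple[str, str]:
    """Split 'prefix:identifier' into (owner URI, identifier)."""
    prefix, sep, ident = sid.partition(":")
    if not sep or not ident or prefix not in OWNERS:
        raise ValueError(f"not a valid semantic ID: {sid!r}")
    return OWNERS[prefix], ident


# Hypothetical in-memory stand-in for a registry/repository.
REPOSITORY: Dict[Tuple[str, str], Entry] = {
    ("urn:example:x12", "DE67.SCAC"): Entry(
        "party-id", {"en": "Standard Alpha Carrier Code"}),
    ("urn:example:x12", "DE67.DUNS"): Entry(
        "party-id", {"en": "D-U-N-S number"}),
    # Made-up entry showing a lookup that yields a substitute semantic ID.
    ("urn:example:un-edifact", "carrier-code-demo"): Entry(
        "party-id", {"en": "Carrier code (demo)"}, substitute="x12:DE67.SCAC"),
}


def lookup(sid: str, max_hops: int = 5) -> Tuple[str, Entry]:
    """Query by semantic ID, following substitute IDs up to max_hops times."""
    for _ in range(max_hops):
        entry = REPOSITORY[parse_semantic_id(sid)]
        if entry.substitute is None:
            return sid, entry
        sid = entry.substitute
    raise LookupError("substitute chain too long")


# The DE66 qualifier value selects which semantic ID applies to DE67.
DE66_TO_SEMANTIC_ID: Dict[str, str] = {
    "1": "x12:DE67.DUNS",
    "2": "x12:DE67.SCAC",
}


def semantic_id_for_de67(de66_value: str) -> str:
    """Return the (hypothetical) semantic ID of DE67 for a given DE66 code."""
    return DE66_TO_SEMANTIC_ID[de66_value]
```

A caller would first resolve the qualifier, for example `semantic_id_for_de67("2")`, and then pass the resulting ID to `lookup` to fetch the type and multilingual names on demand, in the spirit of the 'Big Schema / Little Schema' suggestion. The hop limit in `lookup` is simply a guard against circular substitute chains.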