ebxml-core message

Subject: Re: Syntax Free Models
From: "Martin Bryan" <mtbryan@sgml.u-net.com>
To: "Arofan Gregory" <arofan.gregory@commerceone.com>, <Keith.Finkelde@btfinancialgroup.com>, <ebxml-core@lists.oasis-open.org>
Date: Thu, 2 Mar 2000 08:20:21 -0000
Arofan
>
> The point of ebXML core components is to define (as closely as possible)
> the semantic definitions of reusable data structures, while also defining
> a way of describing "context" (within industry, region, business process,
> etc. - you understand this bit!) that can allow for local extensions.

Your "while also" is why I have been busy trying to separate the messaging
sequence from the information set, as these are two separate things that
need to be handled separately rather than being mixed together as they are
today in EDI message syntax and XML.
>
> Semantics do not, in my opinion, require a concept of sequence. (Not that
> you can't use one, but sequence is used for many other purposes, as I will
> describe).

Agreed
>
> An example would be an address:
>
> An address could consist of Line1, Line2, and Line3. In this case, you
> have (a) failed to capture the most useful semantic relationship; and (b)
> hard-coded sequence into your model. You are making an assumption about
> the context (i.e., that the address exists to be printed on an envelope,
> etc.)
>
> Instead, we could say that there is an object, Address, that has basic
> properties of: IndividualID, StreetAddress, City, State/Province, Country
> (to use an incomplete example).

I use exactly the same example to explain why the Information Sets in
Information Units must be unordered.
>
> If you understand that these are child properties of Address, and you
> understand what each of them is, then you do not need a sequence in your
> data model - you can sequence them as desired for any purpose (determined
> by the application of the model).

You do not need a sequence at Information Set level, but it helps if you
have a sequence of Information Sets.
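
To make the distinction concrete, here is a minimal sketch in Python. It is
only an illustration: the property values, the two orderings and the function
names are assumptions of mine and are not taken from any ebXML deliverable.
The Address is held as an unordered set of named properties, and each
application imposes its own sequence only when it renders the data.

```python
# An Address as an unordered Information Set: named properties, no sequence.
# All values and helper names below are hypothetical, for illustration only.
address = {
    "IndividualID": "J. Smith",
    "StreetAddress": "1 Example Road",
    "City": "Exampleton",
    "State/Province": "Exampleshire",
    "Country": "UK",
}

# Each application chooses its own sequence when it needs one.
ENVELOPE_ORDER = ["IndividualID", "StreetAddress", "City",
                  "State/Province", "Country"]
ROUTING_ORDER = ["Country", "State/Province", "City",
                 "StreetAddress", "IndividualID"]


def render(info_set, order):
    """Return the property values in the sequence a given application wants."""
    return [info_set[name] for name in order]


def same_information(a, b):
    """Two Information Sets are equal regardless of the order of properties."""
    return a == b  # dict equality ignores insertion order


# The same properties supplied in the opposite order.
reordered = dict(reversed(list(address.items())))

# A message, by contrast, is an ordered sequence of Information Sets.
message = [address, reordered]
```

The model itself never records a sequence; `render` applies one on request,
and the list at the end is the only place where order carries meaning.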
>
> In EDI syntax and XML syntax both, data is represented in a linear form
> that implies hierarchical relationships through relative positioning.
> This is an unavoidable aspect of syntax - you have to agree what structure
> is implied by position within the linear data stream. In the case of XML,
> the position of tags within the sequence of tokens describes hierarchy
> through containership. If <A> comes before <B>, then <B> is a child of
> <A>. EDI does much the same thing, but uses loops, which imply hierarchy
> in a very different way, but one still relying on sequence.

To validate either an EDI syntax or an XML syntax you need to be able to
check Information Sets conform to a pre-agreed sequence/hierarchy.
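
As a rough illustration of the two kinds of check, the Python sketch below
parses two XML fragments that carry the same address properties in different
orders. The element names, the agreed sequence and the function names are
hypothetical. One check tests the children against a pre-agreed sequence (what
a message-level validator needs); the other tests only that the right
properties are present (all an Information Set requires).

```python
import xml.etree.ElementTree as ET

# Hypothetical pre-agreed sequence for the children of an Address element.
AGREED_SEQUENCE = ["StreetAddress", "City", "Country"]


def child_names(xml_text):
    """Return the child element names in the order they appear in the stream."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


def valid_as_sequence(xml_text):
    """Message-level check: children must appear in the agreed order."""
    return child_names(xml_text) == AGREED_SEQUENCE


def valid_as_information_set(xml_text):
    """Information Set check: the right children are present, in any order."""
    return sorted(child_names(xml_text)) == sorted(AGREED_SEQUENCE)


doc_a = ("<Address><StreetAddress>1 Example Road</StreetAddress>"
         "<City>Exampleton</City><Country>UK</Country></Address>")
doc_b = ("<Address><Country>UK</Country><City>Exampleton</City>"
         "<StreetAddress>1 Example Road</StreetAddress></Address>")
```

Both fragments pass the Information Set check, but only the first passes the
sequence check, which is why the sequence belongs with the message definition
rather than with the semantics.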
>
> As you have seen, the released W3C Schema Rec (hooray, it's finally here!)
> does not contain some of the features that many expected/hoped for. I
> doubt that it is the final word, and believe it will undergo some
> refinement as it is implemented - that is the stated purpose of this draft.

Wrong: the latest incomplete spec claims to be "feature complete" when it is
patently not so. In fact this week's version is a disaster waiting to
happen. It is so self-contradictory as to be almost unusable. If you don't
believe me look at the description of the all element in the new Part 0 and
in the XML definition given in Part 1. Part 0 says that this element can
only be used at topmost level, and cannot contain nested groups. Part 1 says
it can be nested within choice, all or element definitions. Henry Thompson
claims that the restrictions defined in Part 0 are stated in the text of
Part 1, but this is so obscurely written that I have been unable to identify
it, or to work out how the constraints can possibly be squared with the data
models provided for the XML representation of Schemas. For the time being
no-one can rely on being able to use Schemas to validate messages!

> I think ebXML can
> provide significant requirements as a result of what we find necessary to
> implement what we build (hence the value of a reference implementation).

What I am trying to do is to identify a small (but hopefully "safe from
future attack") subset of the Schema specs that could be buried within the
definitions, using transformations to "build a schema" from the definitions.
[I'm not conformant with this week's draft if the Part 0 definition of All
is correct :-( ]
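
The general idea of "building a schema" by transformation can be sketched as
follows. This is not the mechanism in my paper: it is a simplified Python
illustration with hypothetical names, and it emits DTD element declarations
rather than W3C Schema constructs, precisely because the DTD syntax is stable
while the Schema drafts are not. The sequence-free definition supplies the
property names; the sequence is supplied separately, at the point where a
syntax-specific declaration is generated.

```python
def dtd_element_declarations(name, properties, order):
    """Build DTD declarations for one Information Set.

    name       -- element name for the Information Set (hypothetical)
    properties -- unordered set of property names from the definition
    order      -- the sequence chosen for this particular message
    """
    # The chosen order must use exactly the defined properties.
    if sorted(order) != sorted(properties):
        raise ValueError("order must list each defined property exactly once")
    content_model = ", ".join(order)
    lines = ["<!ELEMENT %s (%s)>" % (name, content_model)]
    # Each property is treated as simple character data in this sketch.
    lines += ["<!ELEMENT %s (#PCDATA)>" % prop for prop in order]
    return "\n".join(lines)


# Sequence-free definition: just the set of property names.
address_properties = {"StreetAddress", "City", "Country"}

# Sequence chosen for one message; another message could choose differently.
declarations = dtd_element_declarations(
    "Address", address_properties, ["StreetAddress", "City", "Country"])
```

Swapping the string templates for a different target syntax would leave the
definitions untouched, which is the property I am after.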

> Regardless, Schema gives us one syntax for describing applications of
> ebXML models - XML. We cannot rely on the stability of any single
> meta-language, however, and should not design our models to match the
> limitations of the implementation tools. In the case of EDI message
> syntax, we have no formal meta-language whatsoever.

Agreed - this is why I was trying to come up with a segmented meta-language
that would work with any schema syntax. (Though I don't claim to have
rigorously tested this at this stage. What I am hoping to do is to provoke
comment from experts in this field.)
>
> Syntaxes will determine some meaningful sequencing within a message as
> described in that syntax. This will vary, at the level of the message and
> below. Process descriptions - because they model movement in time - are
> tightly bound up with sequence.

Agreed wholeheartedly, which is why sequences are required at message level,
but not at Information Set level. Note, however, that there needs to be some
formal mechanism for describing messages that is computer and human
understandable. This should be a cross between a commented DTD and a MIG.
This is what my Message definitions are designed to provide. (The lack of a
computer-interpretable format for MIGs has been a problem to the EDI
industry for years.)

> Data descriptions, because they are not
> bound up by time (other than indirectly, by referencing a sequence
> description as part of identifying their context), should be left
> sequence-free.

Agreed wholeheartedly, which is why sequences are not allowed at this level
(though nesting of Information Sets to create hierarchies is).

> Syntaxes need to be free to use sequence as a way of
> describing our semantic models for a particular implementation.

Models used to validate the use of information within a specific instance
need this, but this information does not need to be part of the definitions
of the semantics.

> All we need capture are the hierarchical relationships inherent in the
> semantics of the data structures we are describing, and the name-value
> pairs that constitute their children, in a fashion that allows both to be
> strongly typed.

Agreed. I added such a mechanism to my paper yesterday. (I even managed to
add a mechanism for typed property attributes that I think will be
compatible with both DTDs and Schemas. See the last two or three screens
of http://www.sgml.u-net.com/neutral.html for details.)

> We don't
> need sequence in our models to do this, and we benefit by not requiring a
> particular sequence at any point before syntax-specific implementation.

Again I agree with you if by models you mean reusable information sets of
the type being defined by the Core Components group rather than the Messages
that need to be defined for particular business processes.
> I hope this clarifies my earlier statements.

It does, and thanks for taking the time to make the clarification.

Martin
References:
- RE: Syntax Free Models
  - From: Arofan Gregory <arofan.gregory@commerceone.com>