ebxml-core message

Subject: RE: Syntax Free Models
From: Arofan Gregory <arofan.gregory@commerceone.com>
To: "'Martin Bryan'" <mtbryan@sgml.u-net.com>, Arofan Gregory <arofan.gregory@commerceone.com>, Keith.Finkelde@btfinancialgroup.com, ebxml-core@lists.oasis-open.org
Date: Wed, 1 Mar 2000 07:30:29 -0800
Martin:

I will do my best to clarify, although I think I may have misunderstood your
position somewhat initially.

The point of ebXML core components is to define (as closely as possible) the
semantic definitions of reusable data structures, while also defining a way
of describing "context" (within industry, region, business process, etc. -
you understand this bit!) that can allow for local extensions.

Semantics do not, in my opinion, require a concept of sequence. (Not that
you can't use one, but sequence is used for many other purposes, as I will
describe).

An example would be an address:

An address could consist of Line1, Line2, and Line3. In this case, you have
(a) failed to capture the most useful semantic relationship; and (b)
hard-coded sequence into your model. You are making an assumption about the
context (i.e., that the address exists to be printed on an envelope, etc.)

Instead, we could say that there is an object, Address, that has basic
properties of: IndividualID, StreetAddress, City, State/Province, Country
(to use an incomplete example).

If you understand that these are child properties of Address, and you
understand what each of them is, then you do not need a sequence in your
data model - you can sequence them as desired for any purpose (determined by
the application of the model).
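
As a rough illustration of the point, here is a minimal Python sketch. The
class name, the field names, the two orderings, and the sample values are all
assumptions made up for this example; none of them come from an ebXML
specification. The model only names the child properties of Address, and each
application supplies its own sequence when it needs one.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Address:
    # Named child properties. Instances are built with keywords below,
    # so nothing in the example depends on the order of these fields.
    individual_id: str
    street_address: str
    city: str
    state_province: str
    country: str


def render(address, order):
    """Emit the property values in whatever sequence the caller asks for."""
    values = asdict(address)
    return [values[name] for name in order]


# Two hypothetical applications, each imposing its own sequence.
ENVELOPE_ORDER = ["individual_id", "street_address", "city",
                  "state_province", "country"]
SORTING_ORDER = ["country", "state_province", "city",
                 "street_address", "individual_id"]

addr = Address(
    country="US",
    city="Springfield",
    individual_id="J. Smith",
    state_province="IL",
    street_address="1 Main St",
)

envelope_lines = render(addr, ENVELOPE_ORDER)
sorting_key = render(addr, SORTING_ORDER)
```

Both lists carry the same five values; only the sequence chosen by the
application differs.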

In EDI syntax and XML syntax both, data is represented in a linear form that
implies hierarchical relationships through relative positioning. This is an
unavoidable aspect of syntax - you have to agree what structure is implied
by position within the linear data stream. In the case of XML, the position
of tags within the sequence of tokens describes hierarchy through
containership. If <B> opens after <A> and before </A>, then <B> is a child
of <A>. EDI does much the same thing, but uses loops, which imply hierarchy
in a very different way, but one still relying on sequence.
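
To make the XML case concrete, here is a small Python sketch using the
standard library's xml.etree.ElementTree. The element names A and B and the
wrapper R are invented for the example. It shows that the same two tags yield
a different hierarchy depending on where they sit in the token stream, in
particular on whether <B> opens before or after </A>.

```python
import xml.etree.ElementTree as ET

# <B/> appears before </A>, so B is contained in A.
nested = ET.fromstring("<A><B/></A>")

# <A/> is closed before <B/> opens, so A and B are siblings under R.
siblings = ET.fromstring("<R><A/><B/></R>")


def child_tags(element):
    """Return the tag names of an element's direct children, in document order."""
    return [child.tag for child in element]


nested_children = child_tags(nested)                    # children of A
sibling_children = child_tags(siblings)                 # children of R
a_children_when_sibling = child_tags(siblings.find("A"))  # children of A under R
```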

As you have seen, the released W3C Schema Rec (hooray, it's finally here!)
does not contain some of the features that many expected/hoped for. I doubt
that it is the final word, and believe it will undergo some refinement as it
is implemented - that is the stated purpose of this draft. I think ebXML can
provide significant requirements as a result of what we find necessary to
implement what we build (hence the value of a reference implementation).

Regardless, Schema gives us one syntax for describing applications of ebXML
models - XML. We cannot rely on the stability of any single meta-language,
however, and should not design our models to match the limitations of the
implementation tools. In the case of EDI message syntax, we have no formal
meta-language whatsoever. 

Syntaxes will determine some meaningful sequencing within a message as
described in that syntax. This will vary, at the level of the message and
below. Process descriptions - because they model movement in time - are
tightly bound up with sequence. Data descriptions, because they are not
bound up by time (other than indirectly, by referencing a sequence
description as part of identifying their context), should be left
sequence-free. Syntaxes need to be free to use sequence as a way of
describing our semantic models for a particular implementation.  All we need
capture are the hierarchical relationships inherent in the semantics of the
data structures we are describing, and the name-value pairs that constitute
their children, in a fashion that allows both to be strongly typed. We don't
need sequence in our models to do this, and we benefit by not requiring a
particular sequence at any point before syntax-specific implementation.

I hope this clarifies my earlier statements.

Cheers,

Arofan Gregory

-----Original Message-----
From: Martin Bryan [mailto:mtbryan@sgml.u-net.com]
Sent: Tuesday, February 29, 2000 5:15 AM
To: Arofan Gregory; Keith.Finkelde@btfinancialgroup.com;
ebxml-core@lists.oasis-open.org
Subject: Re: Syntax Free Models


Arofan
>
> I hope I don't speak out of turn here, but it seems to me that sequence is
> absolutely an aspect of syntax that we *must* avoid if we are to create an
> information model that is capable of being expressed freely in various
> syntaxes. Hierarchy, on the other hand, is something that exists within
> specific contexts and can (hopefully) be modelled effectively across
> syntaxes.

You can never speak out of turn, but you need to speak a little more clearly
as I do not understand your comments.

What I have heard to date suggests that there is a confusion as to what
"sequences" are meant to do. What I am trying to do in my draft paper is to
restrict where "sequences" can be defined. I see them as a statement of the
ordering of information sets within a specific type of message. Different
user communities can define different orders without having to redefine the
information sets themselves. What I am trying to achieve is a split from the
current trend of trying to do everything in a single hierarchical DTD to
being able to split out the part of the definition that applies to reusable
things (the information sets and the information units) and those that are
specific to a particular application domain (the messages and their
sequences). This is a problem that has in the past afflicted both the XML
and EDI communities.

There is a large difference between this problem and the other part of the
equation, which is concerned with the relationships between business-level
processes, which is where Keith is, I think, coming from. What I was also
trying to point out was that there is a need for hierarchy at the
business-level as well as within a message. We need to know where the
information in our messages is coming from. Without recording this in our
business models we will not be able to maintain messages properly as we will
have no way of knowing what the implications are for any changes we make to
a message that is the source of information used elsewhere in the business
process chain.

What I am unclear about from your comments is whether you are against
sequences within a message, or simply between business processes. To claim
that sequences should not be defined as part of a message definition seems
to me wrong. Can you explain why you seem to think they are not needed?

> I think there is a real danger of confounding hierarchy - which is highly
> variable in many cases - and sequence. These are tightly tied in both XML
> and EDI syntaxes today, but in markedly different ways. It is handy to be
> able to express a hierarchy (or context) with a fixed sequence, but it would
> severely limit the utility of the models we create if we are deterministic
> about this up front.

I don't see how a "hierarchy" can be a "sequence"! Let me clarify what I
mean by hierarchy, sequences and context, to see if we are talking about the
same things. A hierarchy is something where you can tell from your own
parentage what is related to you and what is not. Therefore you can define
paths from yourself to any other member of the hierarchy. A hierarchical
model restricts the sets of paths that can occur in a given hierarchy. A
sequence can only occur at a single level in a hierarchy. It pre-defines the
order in which siblings are permitted to occur at that level in a
hierarchy. Context, on the other hand, is not constrained by the need for
direct path access - it can provide links between trees. The "parent" of a
derived information set/unit can be seen to be from another hierarchy. In
other words it can be expressed as an XML pointer with a URL in front of the
XML path.  I agree that one man's hierarchy is another man's poison, and the
same applies for sequences. Hence the need to define sets in a way that says
as little as possible about sequence, but with the possibility to allow a
hierarchy between sets to be defined where relevant. Similarly you need to
allow people to define messages as relevant sequences of sets, without
allowing them to define hierarchies for the sets themselves.
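
One way to picture the split between reusable sets and message-specific
sequences is the following Python sketch. It is only an illustration under
assumed names: the set names, member names, and message names are
hypothetical and are not taken from the draft paper mentioned above. The
information sets say nothing about order, while each message is an ordered
list of references to those sets.

```python
# Reusable information sets: each names its members and implies no order.
INFORMATION_SETS = {
    "Party": frozenset({"Name", "Identifier"}),
    "Address": frozenset({"StreetAddress", "City", "Country"}),
}

# Messages: each user community fixes its own order of sets
# without redefining the sets themselves.
MESSAGE_SEQUENCES = {
    "OrderCommunityA": ["Party", "Address"],
    "OrderCommunityB": ["Address", "Party"],
}


def undefined_sets(message_name):
    """Return the sets a message refers to that have no definition."""
    return [name for name in MESSAGE_SEQUENCES[message_name]
            if name not in INFORMATION_SETS]


def same_sets(first, second):
    """True when two messages use the same sets, whatever their order."""
    return set(MESSAGE_SEQUENCES[first]) == set(MESSAGE_SEQUENCES[second])
```

Here the two communities order the sets differently, yet both messages draw
on the same set definitions.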

I know I am not there in clearly defining all these things yet. In fact
yesterday's release of a revised XML Schema spec with restrictions on set
definition has severely dented my ability to define what I would like to be
able to do. How this will work in with defining inter-message relationships
needs to be seen.

Martin