ebxml-core message

Subject: Re: Syntax Free Models

From: "Martin Bryan" <mtbryan@sgml.u-net.com>
To: <Keith.Finkelde@BTFinancialgroup.com>, <ebxml-core@lists.oasis-open.org>
Date: Mon, 28 Feb 2000 19:32:46 -0000

Keith

In response to your helpful comments I've tried to correct some of the
errors relating to Information Sequence naming that you kindly pointed out
in the earlier draft of my paper on a Syntax-Neutral Definition of Business
Semantics, and to clarify why I feel sequences are a vital component of the
model. Your views on these clarifications would be much welcomed, as would
anyone else's.

I have also taken this opportunity of extending the paper with a first draft
on the way I would like to see definitions of Information Messages,
Information Sets and Information Units exchanged. Whilst I intend, when I
get some free time, to extend this by adding ISO 11179-based neutral
definitions, the initial XML models and examples should help you to see
where I am coming from.

I still need to work out models for recording the relationships between
Information Units (and maybe even Information Sets) and for recording the
relationships between Information Units and datatypes, but I would like to
start getting some initial reaction to what I am proposing before completing
these sections.

I still have doubts about the relevance of using state to record the
relationship between information units, as you propound. To me there is a
fundamental difference between what goes in the definitions and what goes in
the messages. In the definitions you need to tell people that the intention
is that the contents of one type of information unit should be derived from
another type of information unit. For a message instance you need to state
that this specific information unit was derived from that specific
information unit. What is needed then is some way of checking that the
derivation in the instance conforms to the derivation rules in the model. To
date I have not had time to work out a good way to do the latter part of the
process. I am wondering if XSLT templates might provide a guide, but am
unsure yet how to declare this in a syntax neutral manner. An alternative
could be to use some of the techniques for requesting the monitoring of
events that were proposed for ISENS, etc, but at present I cannot see a way
to make this work over the long periods required for message definition
maintenance.  I have yet to find time to try to read up on the UML State
Machine work. Is there a simpleton's explanation of this available that
would be understandable to someone without a background in UML? Any
suggestions on how to approach these problems would be most welcome.

Martin Bryan
Technical Manager, The Diffuse Project
Project Manager, CEN/ISSS EC Workshop DAMSAD project group
----------------------------------------------------------------------------
The SGML Centre, 29 Oldbury Orchard, Churchdown, Glos GL3 2PU, UK
Phone/Fax: +44 1452 714029  E-mail: mtbryan@diffuse.org

For details of The SGML Centre visit http://www.sgml.u-net.com

For details of the EU-funded DIFFUSE project visit http://www.diffuse.org

Title: The Components of an Electronic Message

Syntax-Neutral Definition of Business Semantics

Martin Bryan, The SGML Centre
(mtbryan@sgml.u-net.com)

This paper suggests a syntax-neutral way for describing the component parts of electronic messages designed for business-to-business data interchange, and the processes that are used to create and utilize such messages.

To set the scene the paper starts with a restatement of the purpose of messages within business processes, and the way in which messages inter-relate.

What is the purpose of a Business Information Message?

A Business Information Message provides a set of answers to a set of pre-agreed questions. For example, a purchase order may need to provide answers to the following set of questions:

How is this Order to be identified?
When was it issued?
Who issued it?
Who is it to be executed by?
Who else is to be informed?
What is to be supplied?
When and where is it to be delivered?
Who authorized the Order?

Business Information Messages form the interface between business processes. Business processes form chains and hierarchies, as illustrated in Figure 1:

At the topmost level you have the overall business processes, e.g. Manufacture, Product Distribution and Retailing. Note that these form a chain of dependent processes. For example, until goods have been manufactured and distributed they cannot normally be retailed. Different industries will have different chains of business processes, and different organizations within a particular industry may require their own definition of their business processes.

At the second level Figure 1 shows a set of subprocesses for retailing to business customers based on the Simpl-EDI business chain. Again there is a chain of dependent processes. An Order must precede an Order Response and Despatch Advice, which must in turn precede a Receipt message and an Invoice, which must precede Payment. Again the sequence of subprocesses is dependent on the industry involved and the trading procedures of a particular business community.

Between some of these subprocesses there may need to be other subprocesses. For example, between despatch and receipt of retailed goods there may be a need to book third party transport services using industry-specific message structures for Collection and Delivery Notification. These processes may in turn have their own subprocesses, for example when a carrier uses containers to move goods between destinations.

Any mechanism for describing the context in which a Business Information Message is used should provide a structured method for describing the relationship of the message to other messages in the process chain, and for identifying the level in the business process hierarchy at which the message is to be used.

Message Relationships

Messages are related because they share answers to the same questions, though not necessarily in the same contexts. For example, an Order Response could contain the answers for the following questions:

How is the Response to be identified?
When was it issued?
Who issued it?
Which Order does it refer to?
Who is to receive the Response?
Who else is to be informed?
What is to be supplied?
What cannot be supplied in the timespan requested?
When and where is it to be delivered?
Who authorized the Response?

Note that many of these questions are identical to those in the original order, though the answers to them may be different. Also notice that the same answer may be required as a response to different questions throughout the process chain. For example, the answer to the "Who issued it?" question in the Order Response will typically be the same as the answer to the "Who is it to be executed by?" question in the Order.

What, therefore, needs to be identified during the process of modeling process chains is which answers need new information and which ones rely on identifying the existing answer that originally defined the source of the relevant data.

The Components of a Business Information Message

Figure 2 illustrates the components of an electronic message:

For the purposes of this paper a business information message consists of a number of named information units which contain data that is being transferred between systems. A named group of related information units that provides the information needed to answer a question is called an information set. (Information sets can contain nested information sets as well as information units.) The (unnamed) sequence of related information sets that make up a message, or part of a message, is known as an information sequence. An information message consists of one or more information sequences together with information about the way the information is to be (or has been) exchanged.

The name of an information unit should reflect the general purpose of the data it contains, without any reference to the context in which that data is being used. The name of the information set containing the information units, or one of its associated properties, should uniquely identify the context in which the information units it contains are being applied. It should not, however, identify the process of which the information unit is a part, only the question that is being answered at the time.

For example, the information units that may be permitted within an address could include the following:

Person
Company
Building
Street
Place
Area
Country
PostCode

A number of information sets could contain these units, each of which must be uniquely named:

Sender (Who issued it?)
Recipient (Who is it to be executed by?)
DeliveryPoint (Where is it to be delivered?)

Each of these information sets could contain any number of the permitted information units, in any order required, provided that each information unit occurs only once within the information set. This means, for example, that the PostCode could either precede or follow the Place or Area information units, depending on the way that addresses are expressed in the country concerned.
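The rule above can be checked mechanically: units may appear in any order, each unit may appear at most once, and only permitted units may appear. The following Python sketch is illustrative only. The function name check_information_set, and the choice to represent an information set as a list of unit names, are assumptions and not part of the proposal.

```python
from collections import Counter

# Permitted units, taken from the address example above.
ADDRESS_UNITS = {"Person", "Company", "Building", "Street",
                 "Place", "Area", "Country", "PostCode"}


def check_information_set(units, permitted=ADDRESS_UNITS):
    """Return a list of problems; an empty list means the set is acceptable.

    Hypothetical helper: units may appear in any order, but each permitted
    unit may occur at most once and unlisted units are rejected.
    """
    problems = []
    for name, count in Counter(units).items():
        if name not in permitted:
            problems.append(f"{name} is not a permitted information unit")
        elif count > 1:
            problems.append(f"{name} occurs {count} times")
    return problems


# PostCode may precede or follow Place, depending on national conventions.
postcode_last = ["Person", "Street", "Place", "PostCode", "Country"]
postcode_first = ["Person", "Street", "PostCode", "Place", "Country"]
print(check_information_set(["Street", "Street"]))  # ['Street occurs 2 times']
```

Both postcode_last and postcode_first are accepted, because the check ignores order and only counts occurrences.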

Other properties of the information set than its name could be used to uniquely identify it, as the following examples suggest:

Party[Role="Sender"]
Party[Role="Recipient"]
Party[Role="DeliveryPoint"]

The information required to create a valid information message should be defined in terms of sequences of permitted information sets, for example:

Order = MessageId, Sender, Recipient, DeliveryPoint, Item+

or by using a more generalized model such as:

Order = MessageId, Party+, Item+

Note that the above examples use + to indicate that an information set is both required and repeatable. Other options would be required to indicate information sets that are "optional and repeatable" (e.g. *), "optional but not repeatable" (e.g. ?) and "required to occur between a minimum and maximum number of times" (e.g. [m,n]).

Information sequences can be nested. More than one option for an information sequence could exist. For example, it may be required to use a Reference in place of the full details of the sender or recipient of the data. For such cases the model for the information message has to be able to indicate a choice at certain points in the sequence, e.g.:

Order = MessageId, (Sender|SenderRef), (Recipient|RecipientRef), DeliveryPoint, Item+
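The models above largely use the notation of SGML/XML content models. One way to test whether a sequence of information sets conforms to such a model is therefore to translate the model into a regular expression. The Python sketch below assumes exactly the comma, |, +, *, ? and [m,n] notation shown above. The function names are hypothetical, and "&" groups are not handled.

```python
import re


def model_to_regex(model):
    """Translate a model such as 'A, (B|C), D+' into a compiled regex.

    Each name becomes the token 'Name;', commas (sequence) are dropped,
    parentheses become non-capturing groups, '|', '+', '*' and '?' carry
    over unchanged, and '[m,n]' becomes the repetition '{m,n}'.
    """
    # Rewrite [m,n] first so its comma is not read as a sequence separator.
    model = re.sub(r"\[(\d+),(\d+)\]", r"{\1,\2}", model)
    pieces = []
    for token in re.findall(r"\w+|\{\d+,\d+\}|[(),|+*?]", model):
        if token == ",":
            continue
        if token == "(":
            pieces.append("(?:")
        elif token in {")", "|", "+", "*", "?"} or token.startswith("{"):
            pieces.append(token)
        else:
            pieces.append(f"(?:{re.escape(token)};)")
    return re.compile("".join(pieces))


def conforms(model, information_sets):
    """True if the ordered list of information set names matches the model."""
    text = "".join(f"{name};" for name in information_sets)
    return model_to_regex(model).fullmatch(text) is not None


ORDER = ("MessageId, (Sender|SenderRef), (Recipient|RecipientRef), "
         "DeliveryPoint, Item+")
print(conforms(ORDER, ["MessageId", "SenderRef", "Recipient",
                       "DeliveryPoint", "Item", "Item"]))  # True
print(conforms(ORDER, ["MessageId", "Sender", "DeliveryPoint", "Item"]))  # False
```

Each name is terminated with ";" in the matched text, so a name such as Sender cannot accidentally match the start of SenderRef.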

Information messages should be named. The name should identify the process or subprocess that originated the message. This name can be used to uniquely identify the information sequences required to complete the message, which in turn identify the information sets required to record and manage transmission of the message.

Any information unit in an information message should be uniquely identifiable using a sequence of names listed in the order:

InformationMessage/InformationSet/InformationUnit

An alternative way of expressing this would be:

Process/Question/ResponseUnit

Where nested information sequences or information sets are used these components of the unique name should be repeated for each level of nesting. Where properties of information sets have been used to uniquely identify objects these properties should form part of the unique name, e.g.:

Order/Party[Role="Sender"]/Country
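A unique name of this kind can be generated mechanically from the path through the message. In this hypothetical Python sketch, each step is either a plain name or a (name, properties) pair. Nested sets simply add further steps. The function name and the tuple representation are assumptions made for illustration.

```python
def unique_name(message, *steps):
    """Build a path such as Order/Party[Role="Sender"]/Country.

    Each step is either a plain name or a (name, properties) pair whose
    properties are rendered as [key="value"] qualifiers.
    """
    parts = [message]
    for step in steps:
        if isinstance(step, tuple):
            name, properties = step
            qualifiers = "".join(f'[{key}="{value}"]'
                                 for key, value in properties.items())
            parts.append(name + qualifiers)
        else:
            parts.append(step)
    return "/".join(parts)


print(unique_name("Order", ("Party", {"Role": "Sender"}), "Country"))
# Order/Party[Role="Sender"]/Country
```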

Figure 3 illustrates the types of process associated with information messages.

A subprocess is responsible for capturing or otherwise processing the information objects that make up an information set, i.e. the set of information needed to answer the question adequately. A process is responsible for capturing or otherwise processing the information sets that form an information sequence (a subprocess chain). A process chain describes, but does not necessarily directly control, the way in which information sets in one information message determine the contents of information sets within related messages.

When a subprocess is creating an information set it may require the performance of a number of operations, which could include one or more of the following:

retrieving information objects from a master database
retrieving information objects from an existing message
requesting data from a user.

In many cases the contents of existing information sets will need to be subsetted and/or reordered to create a valid information set within the new message.

A process defines a set of subprocesses that create information sets that conform to a specific information sequence. The relevant subprocesses do not need to be performed sequentially, but their results need to be positioned relative to each other in such a way that the end result is a sequence that conforms to the information sequence model of the information message being created.
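The point that subprocesses may finish in any order, while their results must still be positioned according to the information sequence, can be sketched as follows. This is a hypothetical Python illustration: the function name assemble, and the use of a dictionary to hold finished results, are assumptions.

```python
def assemble(sequence, results):
    """Order subprocess results so that they conform to an information sequence.

    Hypothetical helper: `sequence` lists information set names in the order
    the model requires; `results` maps each name to the data produced by its
    subprocess, in whatever order the subprocesses happened to finish.
    """
    missing = [name for name in sequence if name not in results]
    if missing:
        raise ValueError(f"subprocesses not yet complete: {missing}")
    return [(name, results[name]) for name in sequence]


# Subprocesses finished out of order; the assembled message is still in sequence.
finished = {"Item": ["widget"], "MessageId": "M-001",
            "DeliveryPoint": {"Place": "Gloucester"}}
print(assemble(["MessageId", "DeliveryPoint", "Item"], finished))
```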

A process chain identifies the sequence in which messages must be created if processes are to be able to reference information within existing messages in a logical sequence. For example, a typical process chain for retailing could be defined as:

Order → Order Response → Despatch Advice → Receipt Advice → Invoice → Payment Advice
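If a linear process chain is treated as an ordered list, it becomes simple to check whether one message may take answers from another. The Python sketch below is only an illustration. The list representation and the function name can_reference are hypothetical.

```python
RETAIL_CHAIN = ["Order", "Order Response", "Despatch Advice",
                "Receipt Advice", "Invoice", "Payment Advice"]


def can_reference(chain, referring, referenced):
    """True if `referenced` is created before `referring` in the chain,
    so that `referring` can take answers from it."""
    return chain.index(referenced) < chain.index(referring)


print(can_reference(RETAIL_CHAIN, "Invoice", "Order"))  # True
print(can_reference(RETAIL_CHAIN, "Order", "Invoice"))  # False
```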

In many cases the production of an information message at one level of the process chain will be dependent on the availability of uniquely defined information sets in preceding messages in the chain. These relationships can be defined using rules such as:

DespatchAdvice/Party[Role="DeliveryPoint"] ← Order/DeliveryPoint
DespatchAdvice/Party[Role="DeliveryPoint"]/Place ← Order/DeliveryPoint/Town

Note that the above examples deliberately indicate that there need not be a one-to-one correspondence between the names of the information sets, or of the information units they contain. It is anticipated that process chain definitions will include references to subprocess-to-subprocess mapping rules, which will normally be maintained as a separate reusable resource.
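A minimal sketch of how such mapping rules might be applied follows. It assumes that message instances are flattened into dictionaries keyed by the unique path names described earlier. The rule table, the function name derive and the sample values are all hypothetical. Only unit-level rules are shown; set-level rules would map whole groups of paths.

```python
# Hypothetical mapping rules: path in the new message -> path in an earlier
# message. Paths follow the naming convention described above.
MAPPING_RULES = {
    'DespatchAdvice/Party[Role="DeliveryPoint"]/Place':
        "Order/DeliveryPoint/Town",
    'DespatchAdvice/Party[Role="DeliveryPoint"]/PostCode':
        "Order/DeliveryPoint/PostCode",
}


def derive(rules, earlier_message):
    """Copy each mapped value from an earlier message into a new message.

    Both messages are represented, for illustration only, as flat
    dictionaries keyed by unique path. Unmapped source values are ignored.
    """
    return {target: earlier_message[source]
            for target, source in rules.items()
            if source in earlier_message}


order = {"Order/DeliveryPoint/Town": "Gloucester",
         "Order/DeliveryPoint/PostCode": "GL1 1AA",
         "Order/Item": "widget"}
despatch_advice = derive(MAPPING_RULES, order)
```

Keeping the rule table separate from the code that applies it reflects the suggestion above that mapping rules be maintained as a reusable resource.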

In some cases parallel or alternate processes will need to be spawned. These could be indicated using groups within process chain definitions, e.g.:

Product Data → Price List → Order → (Order Response | Despatch Advice) → Receipt Advice → (Invoice & Tax Control Message) → Payment Advice
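One possible reading of these groups is that "|" selects exactly one alternative and "&" requires every member, in either order. Under that reading, the runs a chain permits can be enumerated. The nested-tuple representation and the function name valid_runs in this Python sketch are hypothetical.

```python
from itertools import permutations

CHAIN = ["Product Data", "Price List", "Order",
         ("|", ["Order Response", "Despatch Advice"]),
         "Receipt Advice",
         ("&", ["Invoice", "Tax Control Message"]),
         "Payment Advice"]


def valid_runs(chain):
    """List every sequence of messages the chain permits.

    Assumption: '|' means exactly one alternative occurs, and '&' means
    every member occurs, in any order.
    """
    runs = [[]]
    for step in chain:
        if isinstance(step, str):
            options = [[step]]
        elif step[0] == "|":
            options = [[alternative] for alternative in step[1]]
        else:  # "&"
            options = [list(order) for order in permutations(step[1])]
        runs = [run + option for run in runs for option in options]
    return runs


print(len(valid_runs(CHAIN)))  # 4
```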

But what do these names really mean?

So far I have mostly considered the naming of information objects, rather than defining their specific meaning. However, without a formal definition of what a name means it cannot be safely used in messages being moved between environments, particularly between multilingual environments.

Figure 5 shows, in a simplified form, the structure of an ISO 11179 data element definition.

What does this tell us? Concepts, which are described using class definitions, are the ISO 11179 equivalent of processes and subprocesses. The individual properties of the class are, unfortunately, seen in a more traditional computer programming sense, which looks at name/value pairs, rather than the "bag" of data units that form an information set.

In an 11179-based semantic repository, representation and meaning are more typically used to define the meaning of a particular value of a property of the class. For example, if a "country code" property uses ISO 3166 codes to identify countries then "GB" is the representation of the semantic whose meaning is "The United Kingdom of Great Britain and Northern Ireland".

Note: The XML equivalents of these four units are element name, attribute name, attribute value (or element content) and definition of attribute values.

How does this fit with the need to develop a syntax-neutral set of definitions of information units, sets, sequences and messages? We need to be able to assign meanings to each name used to identify a message component. We also need to be able to assign meanings to any set of values that are to be used to constrain the particular values of data containers.

More particularly we need to be able to identify:

Multiple language descriptions of the meaning
Names which share a meaning (e.g. UK in one list and GB in another)
Each of the processes and subprocesses that use a particular set of names to indicate a particular meaning.
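The three requirements above can be held together in a single registry entry per concept. This Python sketch is purely illustrative. The Meaning class, the helper names and the code list called "ListA" (standing in for a list that uses UK rather than GB) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Meaning:
    """Hypothetical registry entry for one concept."""
    descriptions: dict                          # language code -> text
    names: set = field(default_factory=set)     # (code list, code) pairs
    used_by: set = field(default_factory=set)   # processes using it


UNITED_KINGDOM = Meaning(
    descriptions={
        "EN": "The United Kingdom of Great Britain and Northern Ireland",
        "FR": "Le Royaume-Uni de Grande-Bretagne et d'Irlande du Nord",
    },
    # "ListA" is a hypothetical code list that uses UK rather than GB.
    names={("ISO 3166", "GB"), ("ListA", "UK")},
    used_by={"Order", "Despatch Advice"},
)


def share_meaning(name_a, name_b, registry):
    """True if some entry in the registry lists both names."""
    return any(name_a in m.names and name_b in m.names for m in registry)


def description_of(name, language, registry):
    """Return the description of a name in the requested language."""
    for m in registry:
        if name in m.names:
            return m.descriptions[language]
    raise KeyError(name)
```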

How could this be achieved? For any particular "name" used to identify an information unit we need to know:

its meaning in language X (and Y and ...)
other names that share the same meaning (e.g. State/County/Länder)
a list of the information sets the information unit forms a part of
if the data unit is one with a constrained set of values represented by codes, a list of the permitted codes, each of which has a statement of the meaning in each relevant language.

For each name assigned to an information set we need to know:

the role of the information set (i.e. what question is it designed to answer)
any other names assigned to the same question
a list of the information units that can form part or all of the information set, together with details of which units cannot be omitted from the information set
a list of the processes that the information set is being used within.

For each process we need to know:

the purpose of the process (described in multiple languages)
any other names assigned to the same process
the set of information sets to be answered before the next process can proceed
the sequence in which information sets should be exchanged
which processes generated any answers that are derived from preceding steps in the process (i.e. the dependencies between this process and the preceding processes in a chain)
which process chains require this process, and which processes within these chains require that the process be completed before they can be activated.

For each process chain we need to know:

the purpose of the chain (described in multiple languages)
any other names assigned to the chain (e.g. purchasing or stock control)
the sequence of processes that form the chain
which sub-chains can be used between any two steps in the process (i.e. any sub-hierarchy of the chain)
which processes are "responsible" for generating which answers when the same answer is used for more than one information set within the chain.

How to represent these information sets

This section suggests ways of representing the information required to generate an Information Message using extensions to the techniques defined in ISO/IEC 11179 and ISO/IEC 15452 which can be encoded using XML that conforms to a set of DTDs.

Note: For the time being the ISO/IEC 11179 extensions have not been fully documented. For the purposes of this exercise I have formally modelled the data using a series of XML DTDs, which are defined below. The examples of the use of each DTD given below need to be aligned with real data - at present they are simplistic examples to show how the process works, rather than realistic examples of an actual encoding.

The ISO/IEC 11179 definition of an Information Message is:

To follow

The following XML declarations can be used to manage records that encode the definition of an Information Message:

<!ELEMENT InformationMessage (GlobalID, Name+, Description*,
                              InformationSequence+) >
<!ATTLIST InformationMessage ID ID #REQUIRED>
<!ELEMENT GlobalID (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ATTLIST Name xml:lang CDATA "EN">
<!ELEMENT Description (#PCDATA)>
<!ATTLIST Description xml:lang CDATA "EN">
<!ELEMENT InformationSequence    (InformationSetReference|
                                  InformationSetOrGroup|
                                  InformationSetAndGroup)+ >
<!ELEMENT InformationSetOrGroup  ((InformationSetReference|
                                   InformationSetAndGroup),
                                  (InformationSetReference|
                                   InformationSetAndGroup)+)>
<!ELEMENT InformationSetAndGroup ((InformationSetReference|
                                   InformationSetOrGroup),
                                  (InformationSetReference|
                                   InformationSetOrGroup)+)>
<!ELEMENT InformationSetReference (#PCDATA)>
<!--InformationSetReference contains a URL that references the 
    documentation of an Information Set.-->

An example of the use of this DTD to exchange information relating to an Information Message is:

<InformationMessage ID="DespatchNote">
  <GlobalID>LGMS12345</GlobalID>
  <Name>Despatch Note</Name>
  <Name xml:lang="FR">Not de Despatch</Name>
  <!--Forgive my poor French!-->
  <Description xml:lang="EN">Notification of goods despatched
                             from supplier</Description>
  <InformationSequence>
    <InformationSetReference>
      http://www.ebxml.org/InfoSets?MessageIdentification
    </InformationSetReference>
    <InformationSetReference>
      http://www.ebxml.org/InfoSets?SupplierIdentification
    </InformationSetReference>
    <InformationSetReference>
      http://www.ebxml.org/InfoSets?DeliveryPointIdentification
    </InformationSetReference>
    <InformationSetReference>
      http://www.ebxml.org/InfoSets?GoodsIdentification
    </InformationSetReference>
  </InformationSequence>
  <InformationSequence>
    <InformationSetReference>
      http://www.ebxml.org/InfoSets?MessageIdentification
    </InformationSetReference>
    <InformationSetAndGroup>
      <InformationSetReference>
         http://www.ebxml.org/InfoSets?SupplierIdentification
      </InformationSetReference>
      <InformationSetOrGroup>
        <InformationSetReference>
          http://www.ebxml.org/InfoSets?DeliveryPointIdentification
        </InformationSetReference>
        <InformationSetReference>
          http://www.ebxml.org/InfoSets?NameAndAddress
        </InformationSetReference>
      </InformationSetOrGroup>
    </InformationSetAndGroup>
    <InformationSetReference>
      http://www.ebxml.org/InfoSets?GoodsIdentification
    </InformationSetReference>
  </InformationSequence>
</InformationMessage>
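As a rough check that records of this kind can be processed, the Python sketch below uses the standard xml.etree.ElementTree module to read a cut-down copy of the second sequence above and render it back as a content model. Two points are assumptions. First, an InformationSetAndGroup is read as "all of" (written &). Second, the local ID is taken from the query part of each reference URL. ElementTree does not apply DTD defaults, so the "EN" default for xml:lang is supplied in code. The French name in the sample is the sketch's own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# ElementTree reports xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<InformationMessage ID="DespatchNote">
  <GlobalID>LGMS12345</GlobalID>
  <Name>Despatch Note</Name>
  <Name xml:lang="FR">Avis d'expédition</Name>
  <InformationSequence>
    <InformationSetReference>http://www.ebxml.org/InfoSets?MessageIdentification</InformationSetReference>
    <InformationSetAndGroup>
      <InformationSetReference>http://www.ebxml.org/InfoSets?SupplierIdentification</InformationSetReference>
      <InformationSetOrGroup>
        <InformationSetReference>http://www.ebxml.org/InfoSets?DeliveryPointIdentification</InformationSetReference>
        <InformationSetReference>http://www.ebxml.org/InfoSets?NameAndAddress</InformationSetReference>
      </InformationSetOrGroup>
    </InformationSetAndGroup>
    <InformationSetReference>http://www.ebxml.org/InfoSets?GoodsIdentification</InformationSetReference>
  </InformationSequence>
</InformationMessage>"""


def names(message):
    """Map language -> name, supplying the DTD default of "EN"."""
    return {n.get(XML_LANG, "EN"): n.text.strip()
            for n in message.findall("Name")}


def render(node):
    """Render a sequence or group as a content-model string."""
    if node.tag == "InformationSetReference":
        # The local ID is carried in the query part of the reference URL.
        return urlsplit(node.text.strip()).query
    separator = {"InformationSequence": ", ",
                 "InformationSetAndGroup": " & ",
                 "InformationSetOrGroup": " | "}[node.tag]
    inner = separator.join(render(child) for child in node)
    return inner if node.tag == "InformationSequence" else f"({inner})"


message = ET.fromstring(SAMPLE)
print(names(message))
for sequence in message.findall("InformationSequence"):
    print(render(sequence))
```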

The ISO/IEC 11179 definition of an Information Set is:

To follow

The following XML declarations can be used to manage records that encode the definition of an Information Set:

<!ELEMENT InformationSet (GlobalID, Name+, Description*,
                          InformationUnitReference+, UsedIn*)>
<!ATTLIST InformationSet ID ID #REQUIRED>
<!ELEMENT GlobalID (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ATTLIST Name xml:lang CDATA "EN">
<!ELEMENT Description (#PCDATA)>
<!ATTLIST Description xml:lang CDATA "EN">
<!ELEMENT InformationUnitReference (#PCDATA)>
<!--InformationUnitReference contains a URL that references 
    the documentation of an Information Unit.-->
<!ELEMENT UsedIn (#PCDATA) >
<!--UsedIn contains a URL that references the documentation of an 
    InformationMessage that the InformationSet is used in.-->

An example of the use of this DTD is:

<InformationSet ID="NameAndAddress">
  <GlobalID>ABCD12345</GlobalID>
  <Name xml:lang="EN">Name and Address</Name>
  <Name xml:lang="FR">Nom et adresse</Name>
  <Description>Name and address of otherwise unidentified 
               party in transaction</Description>
  <InformationUnitReference>
     http://www.ebxml.org/CoreComponents?PartyName
  </InformationUnitReference>
  <InformationUnitReference>
     http://www.ebxml.org/CoreComponents?BuildingIdentifier
  </InformationUnitReference>
  <InformationUnitReference>
     http://www.ebxml.org/CoreComponents?Street
  </InformationUnitReference>
  <InformationUnitReference>
     http://www.ebxml.org/CoreComponents?Town
  </InformationUnitReference>
  <InformationUnitReference>
     http://www.ebxml.org/CoreComponents?AdministrativeArea
  </InformationUnitReference>
  <InformationUnitReference>
     http://www.ebxml.org/CoreComponents?Country
  </InformationUnitReference>
  <InformationUnitReference>
     http://www.ebxml.org/CoreComponents?PostCode
  </InformationUnitReference>
  <UsedIn>http://www.ebxml.org/Messages?DespatchNote</UsedIn>
</InformationSet>
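Cross-references are recorded in both directions: a message lists the information sets it uses, and each information set lists the messages it is UsedIn. The two directions can drift out of step. The Python sketch below flags references that are missing their counterpart. The dictionary representation and the function names are hypothetical. As in the examples above, the local ID is assumed to be the query part of each URL.

```python
from urllib.parse import urlsplit


def local_id(url):
    """Local ID carried in the query part of a reference URL."""
    return urlsplit(url.strip()).query


def missing_back_references(messages, sets):
    """List (message, set) pairs where the set's UsedIn omits the message.

    `messages` maps a message ID to the information set URLs it references;
    `sets` maps an information set ID to the message URLs in its UsedIn.
    """
    problems = []
    for message_id, set_urls in messages.items():
        for url in set_urls:
            set_id = local_id(url)
            used_in = {local_id(u) for u in sets.get(set_id, [])}
            if message_id not in used_in:
                problems.append((message_id, set_id))
    return problems


messages = {"DespatchNote": [
    "http://www.ebxml.org/InfoSets?NameAndAddress",
    "http://www.ebxml.org/InfoSets?GoodsIdentification"]}
sets = {"NameAndAddress": ["http://www.ebxml.org/Messages?DespatchNote"],
        "GoodsIdentification": []}
print(missing_back_references(messages, sets))
# [('DespatchNote', 'GoodsIdentification')]
```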

The ISO/IEC 11179 definition of an Information Unit is:

To follow

The following XML Declarations can be used to manage records that encode the definition of an Information Unit:

<!ELEMENT InformationUnit (GlobalID, Name+, Description*, 
                           EmbeddedUnit*, UsedIn*)>
<!ATTLIST InformationUnit ID ID #REQUIRED>
<!ELEMENT GlobalID (#PCDATA)>
<!ELEMENT Name (#PCDATA)>
<!ATTLIST Name xml:lang CDATA "EN">
<!ELEMENT Description (#PCDATA)>
<!ATTLIST Description xml:lang CDATA "EN">
<!ELEMENT EmbeddedUnit (#PCDATA)>
<!--EmbeddedUnit contains a URL that references the documentation of 
    the embedded InformationUnit.-->
<!ELEMENT UsedIn (#PCDATA) >
<!--UsedIn contains a URL that references the documentation of an 
    InformationSet that the InformationUnit is used in.-->

An example of the use of this DTD is:

<InformationUnit ID="Street">
  <GlobalID>MTB12345</GlobalID>
  <Name xml:lang="EN">Street</Name>
  <Name xml:lang="FR">Rue</Name>
  <Name xml:lang="DE">Strasse</Name>
  <Description xml:lang="EN">The name used to identify a sequence
                             of buildings that bound a 
                             transportation route</Description>
  <UsedIn>http://www.ebxml.org/CoreComponents?NameAndAddress</UsedIn>
</InformationUnit>

Question: Do we need to define a mechanism for defining properties of Information Units?

The following elements can be used to encode records about the way in which one information unit derives from another information unit:

To follow

The following elements can be used to encode records about the meaning of values of Information Units.

To follow - based on XML Schema Part 2: Datatypes, with some rules for adding Global IDs, etc.

Note: These models need to be separated from the Information Unit definition so that datatypes and coded lists can be used in conjunction with different Information Units without needing to be redeclared. A technique for cross-referencing between this model and the information unit definitions needs to be worked out.
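The Python sketch below shows one possible cross-referencing technique. Each datatype or code list gets its own URL, and any number of Information Units point at that URL rather than redeclaring it. Everything here is hypothetical: the registry, the example.org URLs, the small country-code extract and the 35-character text limit.

```python
import re

# Hypothetical shared declarations, each identified by its own URL.
COUNTRY_CODES = {"DE", "FR", "GB"}  # a small extract, for illustration
DATATYPES = {
    "http://www.example.org/Datatypes?CountryCode":
        lambda value: value in COUNTRY_CODES,
    "http://www.example.org/Datatypes?ShortText":
        lambda value: re.fullmatch(r"[^\n]{1,35}", value) is not None,
}

# Several Information Units reference the same declaration instead of
# redeclaring it.
UNIT_DATATYPE = {
    "Country": "http://www.example.org/Datatypes?CountryCode",
    "OriginCountry": "http://www.example.org/Datatypes?CountryCode",
    "Street": "http://www.example.org/Datatypes?ShortText",
    "Town": "http://www.example.org/Datatypes?ShortText",
}


def valid_value(unit, value):
    """Check a value against the datatype referenced by the unit."""
    return DATATYPES[UNIT_DATATYPE[unit]](value)
```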

Why use Global Identifiers?

The difference between a Global Identifier and the local ID assigned to the record is that while local IDs are resolved with respect to a local URL, global IDs are resolved with respect to a wider scheme. It is expected that the same local ID would need to be applied by different user communities to make their own documentation and examples more readable. The Global ID should allow similarly named Information Messages, Information Sets and Information Units to be distinguished without requiring that they change the name by which they are locally referenced.

Why use URLs for cross-referencing rather than Global Identifiers?

It is anticipated that, for the time being, the standard method of access to information from sites maintaining industry-specific sets of definitions will be through references resolved using CGI scripts rather than a local database engine. In the longer term it is expected that such links will be replaced by XQL requests, but it has already been stated that these must be expressible as URLs. Therefore an approach based on local resolution of a URL, rather than a special resolution mechanism based on the Global ID such as is used for Digital Object Identifiers, would seem to be the most flexible one. This approach has the further advantage that the locally meaningful ID is used, rather than the less meaningful Global ID.
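A minimal sketch of local resolution follows. The base URL selects the catalogue that a CGI script (or later an XQL service) would consult, and the query string carries the locally meaningful ID. The same local ID can then identify different records on different sites, each distinguished by its Global ID. The catalogue contents, the example.org site and its Global ID are hypothetical. MTB12345 is the Global ID from the Street example above.

```python
from urllib.parse import urlsplit

# Hypothetical per-site catalogues: each site resolves its own local IDs.
CATALOGUES = {
    "http://www.ebxml.org/CoreComponents": {"Street": {"GlobalID": "MTB12345"}},
    "http://www.example.org/TransportTerms": {"Street": {"GlobalID": "TRN00042"}},
}


def resolve(url):
    """Resolve a reference URL locally.

    The base URL picks the catalogue; the query string is the locally
    meaningful ID within that catalogue.
    """
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return CATALOGUES[base][parts.query]


print(resolve("http://www.ebxml.org/CoreComponents?Street")["GlobalID"])
# MTB12345
```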

