ebxml-architecture message

Subject: Re: Questions on XMI
From: sbrodsky@us.ibm.com
To: Terry Allen <tallen@bolt.sonic.net>, ebxml-architecture@lists.oasis-open.org
Date: Fri, 5 May 2000 01:49:55 -0600


Terry,

Thanks for the writeup.  I have some responses which I think should clarify
your questions:

1) The <XMI> tag serves two purposes:  It marks the start of XMI within an
XML or HTML stream (for example, XMI messages embedded in SOAP protocol
packages - see SOAP4J) and also provides a standard place for declaring
Namespace, URI, and Version information.  You need to be able to check the
URIs and Versions before reading a message in order to understand what the
message is.  This mechanism has to be generic for a wide range of messages,
and the simplest way of satisfying these requirements is to have a standard
outer tag that contains the message.
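
As a rough sketch of what such an outer tag can look like (the namespace
URI and the metamodel name and version below are placeholders chosen for
illustration, not values taken from any specification), a reader can
inspect the XMI start tag and header before touching the content:

```xml
<!-- Illustrative only: the xmlns:UML value and the metamodel
     name/version are placeholders. -->
<XMI xmi.version="1.1" xmlns:UML="http://example.org/UML">
  <XMI.header>
    <!-- Tells the reader which metamodel and version to expect. -->
    <XMI.metamodel xmi.name="UML" xmi.version="1.3"/>
  </XMI.header>
  <XMI.content>
    <!-- The message itself goes here. -->
  </XMI.content>
</XMI>
```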

2) Since DTDs have the well-known restriction that order and multiplicity
must both be specified, experience has shown that it's counter-productive
to impose ordering requirements on models that do not specify an order.
(An example is three classes with associations between them.)  Imposing an
order means that valid models would need to be modified to permanently
incorporate an order before DTDs could be generated - otherwise there would
be no guarantee that generated DTDs from the model would always be the
same.  The choice is either to modify valid models by adding a new order
and finding a way to reincorporate that order back into the model, or to
reduce the ordering requirements.  Since this is an issue only within DTDs,
it makes little sense to modify the models.
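
To make the trade-off concrete, here is a small sketch with made-up
element names (Package, A, B and C are hypothetical and stand for three
classes whose model specifies no order).  The two declarations are
alternatives, not meant to appear in the same DTD:

```xml
<!-- Hypothetical names.  The model says only that a Package owns
     an A, a B and a C; it says nothing about their order. -->

<!-- Alternative 1: multiplicity is preserved, but the generator had
     to invent the order A, B, C, which is not in the model. -->
<!ELEMENT Package (A, B, C)>

<!-- Alternative 2: any order is accepted, at the cost of no longer
     checking that each child occurs exactly once. -->
<!ELEMENT Package (A | B | C)*>
```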

3) The Doctype is one mechanism for identifying versions and models, but
the declaration of the Namespaces and URIs in the XMI element provides
complete information and is more aligned with the overall XML direction.

4) The XMI extension mechanism has proven to be extremely valuable when
sharing information in integration scenarios.  Fundamental operations such
as round-trip exchange and difference/merge between existing software
applications are very difficult without extensions to provide a mechanism
for bridging the application's internal view back to the external DTD
representation.  Since extensions may always be ignored by the document
reader, they are easy to set aside when they are not wanted.

The entity extension mechanism below is similar to the one used in the CWM
(data warehouse) XMI DTDs.  In XMI DTDs, you can always declare entities as
long as the fully expanded form is correct.

I think these 4 questions are fairly minor, especially when you consider
what you get in return: an open and supported standard for general object
interchange with wide applicability to some of the most important software
domains: designs (UML, ER), languages (Java, IDL), and databases (CWM).
The list is growing quickly because the cost of adding a new DTD is so low.
Draw a UML model, write an interface, or import a database schema, and you
can generate an XMI DTD with freeware tools.

XMI is directly applicable to the ebXML work being produced in several
domains.  As ebXML groups create models, such as business process models,
they can generate the XMI DTD directly from the specification.
The hard part is getting the model right - after that the XMI DTD falls
right into place.

Thanks,

-Steve

Stephen A. Brodsky, Ph.D.
Software Architect
Notes Address:  Stephen Brodsky/Santa Teresa/IBM@IBMUS
Internet Address:  sbrodsky@us.ibm.com
Phone:  408.463.5659


Terry Allen <tallen@sonic.net> on 02/09/2000 01:41:56 PM

To:   Stephen Brodsky/Santa Teresa/IBM@IBMUS
cc:   "Iyengar, Sridhar" <Sridhar.Iyengar2@UNISYS.com>,
      scott.nieman@norstanconsulting.com, Terry Allen
      <tallen@bolt.sonic.net>
Subject:  Re: FW: Questions on MOF examples for XMI



Here's the write-up I promised; ignore the stuff about Docbook
if you're pressed for time.

Difficulties in using XMI 1.1 for OASIS and EBXML Registry Documents

Disclaimer:  This document is concerned with the generation of DTDs to
define document types for which document instances may be composed
independently of any UML or MOF implementation.  That is not the
goal of XMI 1.1, which is intended as a transfer syntax.  So the problems
noted here arise from the use (or prospective use) of XMI outside its
intended domain of applicability.  And I really do appreciate the work
that's gone into XMI!


Problem Statement

OASIS's Registry and Repository TC has developed a specification for
implementing an ISO/IEC 11179 registry for a repository of SGML- and
XML-related entities.  This specification includes DTDs for specific
documents required in the registration process.  These DTDs have been
composed by hand, and reflect the present 11179, without the proposed
X3.285 revision, which is specified using UML models.

EBXML's Registry and Repository WG is developing a UML model of use cases
for a registry and repository for commerce-related UML models and related
entities (including XML-related entities).  It is desired to extend the
OASIS specification to meet EBXML's needs.

Hence it would be desirable to replace the hand-composed OASIS DTDs
with DTDs generated from EBXML's UML model of use cases.  (Just how
to generate the specific documents required is an issue I won't consider
here.)

To add to this mix, it appears that X3.285 may be reaching the point at
which it could be put into practice; if that were possible, it might be
that much of the work of the OASIS and EBXML groups has been done already,
and we could just generate the DTDs we need, using XMI's DTD Production
Rules.

Unfortunately, DTDs generated using XMI's DTD Production Rules are not
sufficiently constrained to allow validation of documents composed outside
of an environment that enforces the UML model they (partially) express -
and for OASIS, at least, we cannot assume such an environment.

Put another way, any number of documents can be composed that will
validate against a DTD generated using XMI's DTD Production Rules,
but that do not contain the information required by the model (contra
XMI 1.1, 6.2, second para).


Specifics

1.  The XMI DTD (I used ad/99-10-05, XMI 1.1 RTF UML DTD) is too loose
to begin with.  Even the degenerate case
     <XMI>
     </XMI>

is valid, as none of the XMI element's contents is required.  (Even the
CWM DTD, which requires XMI.header, does not require anything else,
including all the real content, although it is an invalid DTD due to
ambiguity.)

In addition, the liberal use of the ANY keyword, while well intentioned,
permits nonsensical combinations of declared elements.  (See below for
remarks on a better extension mechanism for DTD syntax.)

2.  The generated DTDs are much looser than the models because of
the design decision to eschew the + and ? repetition operators (e.g.
those in Appendix C of the MOF 1.3 document).
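
A minimal sketch of the difference, using a made-up model (Registration,
Submitter, Item and Comment are hypothetical names, and the loose form is
only an approximation of the generated style).  The two declarations are
alternatives, not meant to appear in the same DTD:

```xml
<!-- Hypothetical model: a Registration has exactly one Submitter,
     one or more Items, and an optional Comment. -->

<!-- Tight: the + and ? operators carry the multiplicities. -->
<!ELEMENT Registration (Submitter, Item+, Comment?)>

<!-- Loose: only * is used, so an empty <Registration/> element
     validates even though the model requires content. -->
<!ELEMENT Registration (Submitter | Item | Comment)*>
```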

3.  For OASIS and probably for EBXML purposes, it is not useful for
the doctype of instances always to be XMI.  What is wanted is a DTD
at the level of content shown in the Letter DTD of MOF 1.3 C-153, for
use independent of the XMI.content element.  (Invariably, programmers
want to identify the document type from the root element; the references
to the model, metamodel, and anything else could be conveyed by FIXED
attributes within the generated content DTD if needed.)
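
A sketch of what that could look like, assuming a hypothetical
Registration document type (the element names, attribute names and
attribute values are all invented for illustration):

```xml
<!-- Hypothetical content DTD whose root element is the document
     type itself rather than XMI. -->
<!ELEMENT Registration (Submitter, Item+)>

<!-- FIXED attributes carry the model references; an instance need
     not spell them out, and may not supply a different value. -->
<!ATTLIST Registration
          model      CDATA #FIXED "ExampleRegistryModel"
          metamodel  CDATA #FIXED "ExampleMetamodel">
```

An instance would then begin with a DOCTYPE declaration naming
Registration, not XMI, as its document element.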

4.  The extension mechanism, using the XMI.extension element, is
much better thought out than such mechanisms generally are, but
presents certain difficulties in itself:

 - It probably shouldn't be present in generated DTDs at all.
 - When added to a PCDATA content model it produces mixed
     content, which SGMLlers can live with but programmers
     new to XML find all but impossible.
 - As the content model of XMI.extension is ANY, an extension
     declared for use in a specific context can appear anywhere,
     within an XMI.extension element.
 - Most XMLlers would rather not have to deal with the XMI.extension
     container element around the actual extension.
 - BTW, what is the difference between XMI.extension and XMI.extensions?
     The XMI 1.1 spec isn't terribly clear.
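
To illustrate the mixed-content point, a sketch with a made-up Title
element (only XMI.extension and its ANY content model come from the
discussion above).  The two Title declarations are alternatives, not
meant to appear in the same DTD:

```xml
<!-- Before: character data only; a program can treat the element's
     content as a single string. -->
<!ELEMENT Title (#PCDATA)>

<!-- After: mixed content; text and XMI.extension elements may be
     interleaved in any order, any number of times. -->
<!ELEMENT Title (#PCDATA | XMI.extension)*>

<!-- ANY admits every declared element, in whatever context the
     XMI.extension container itself is allowed. -->
<!ELEMENT XMI.extension ANY>
```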


So, What To Do?

We could generate the desired DTDs, extract the parts that aren't
boilerplate XMI, and rework the content models (perhaps by algorithm
and Perl) to tighten them up to the point they reflect the model.

We could define a different set of DTD Production Rules and build
software to implement them.

We could go back to XMI 1.0 and see if that works better.

We could require that OASIS and EBXML registry documents be
produced by software that enforces the rules of the model rather
than of the generated DTD, and that they be consumed by software
that acts similarly.

We can state the problem and invite other solutions (which is
what I'm doing).


Another Extension Mechanism, Really Just FYI

Now, in XML Schema different extension mechanisms will be available,
and once we have XML Schema we may not care about these DTD syntax
matters.  However, if an extension mechanism meeting the above
objections is desired, the Docbook DTD mechanism, using parameter
entities, might be considered (credit Eve Maler of Sun for much of this
parameterization).

You may not want to go here, but just for example (in SGML syntax,
easily converted to XML without effect on the issues discussed here):

<!ENTITY % term.module "INCLUDE">
<![ %term.module; [
<!ENTITY % local.term.attrib "">
<!ENTITY % term.role.attrib "%role.attrib;">

<!ENTITY % term.element "INCLUDE">
<![ %term.element; [
<!ELEMENT Term - O ((%para.char.mix;)+)>
<!--end of term.element-->]]>

<!ENTITY % term.attlist "INCLUDE">
<![ %term.attlist; [
<!ATTLIST Term
                %common.attrib;
                %term.role.attrib;
                %local.term.attrib;
>
<!--end of term.attlist-->]]>
<!--end of term.module-->]]>

The outer INCLUDE parameter entities can be overridden by an
exterior DTD subset so that substitutes can be provided;
%term.role.attrib; is a user-customizable attribute; the important
part here for attributes is %local.term.attrib;, which is defined
as empty but can be redefined in an exterior subset.

For the element's content model, %para.char.mix; is declared as

<!ENTITY % local.para.char.mix "">
<!ENTITY % para.char.mix
                "#PCDATA
                |%xref.char.class;      |%gen.char.class;
                |%link.char.class;      |%tech.char.class;
                |%base.char.class;      |%docinfo.char.class;
                |%other.char.class;     |%inlineobj.char.class;
                |%synop.class;
                |%ndxterm.class;
                %local.para.char.mix;">

Note the %local.para.char.mix; parameter entity; if declared in
an external subset it would have to begin with a | to make the
syntax work correctly.

A document employing a customization layer, as we call it
for Docbook, can include that layer in an internal subset (sometimes
useful but may cause difficulties for an uninformed recipient)
or can reference that layer as an external subset in the DOCTYPE
declaration; the external subset must then reference the Docbook DTD
(makes clear that the document conforms to an altered version of the
DTD).
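
A sketch of such a customization layer, written in XML syntax for
brevity (the file names and the MyInline element are hypothetical; only
local.para.char.mix comes from the declarations shown earlier):

```xml
<!-- mylayer.dtd: hypothetical customization layer. -->

<!-- The replacement text begins with | so that it slots into the
     choice group of para.char.mix. -->
<!ENTITY % local.para.char.mix "|MyInline">
<!ELEMENT MyInline (#PCDATA)>

<!-- Pull in the unmodified DTD last.  The first declaration of a
     parameter entity is the one that binds, so the override above
     takes effect and the empty default in the DTD is ignored. -->
<!ENTITY % docbook SYSTEM "docbook.dtd">
%docbook;
```

A document would then name the layer, not the Docbook DTD itself, in
its DOCTYPE declaration, for example as SYSTEM "mylayer.dtd".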

Unfortunately, you can't play this trick with sequences for
syntactic reasons:

<!DOCTYPE foo [
<!ENTITY % foo.extension "proper.content">
<!ELEMENT proper.content (#PCDATA)>
<!ELEMENT foo (proper.content, (%foo.extension;)?)>
]>
<foo>
<proper.content>bar</proper.content>
</foo>

is valid in SGML, but not in XML, and either way, if foo.extension
is declared as

<!ENTITY % foo.extension "">

the result is invalid.  In Docbook we found that anything
that needed extension could either be overridden or has a *'d
content model that can be extended.
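
For contrast, a sketch of the starred form, assuming the declarations
sit in an external subset (foo.mix.extension and extra are hypothetical
names):

```xml
<!-- Default: nothing added.  The content model expands to
     (proper.content)*, which is still legal. -->
<!ENTITY % foo.mix.extension "">

<!ELEMENT proper.content (#PCDATA)>
<!ELEMENT foo (proper.content %foo.mix.extension;)*>

<!-- A customization layer could declare, ahead of the lines above,
       <!ENTITY % foo.mix.extension "| extra">
       <!ELEMENT extra (#PCDATA)>
     giving (proper.content | extra)*. -->
```

The empty default is harmless here because a choice group may have a
single member, whereas in the sequence example it leaves behind an
empty ()? group.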

regards, Terry