ebxml-transport message

Subject: Almost-everywhere XML Packaging for ebXML: strawman for discussion
From: "Dale Moberg" <Dale_Moberg@stercomm.com>
To: ebXML-Transport@lists.oasis-open.org
Date: Mon, 13 Mar 2000 10:26:38 -0500

[This draft shows one
way to do XML packaging
primarily within XML--at
a price discussed in section 5.]


XML Packaging Recipe Draft 1.0


Because MIME packaging is already well
understood and defined, it is useful
to develop an initial XML packaging
scheme as a constructive recipe based
on MIME packaging. Also, because it is
unlikely that XML has the best solution
for every packaging task, MIME is likely
to be needed, so it will probably be useful to
at least have similar semantics of packaging
even if the syntax differs.

It is also assumed that each transport
will define a MIME content type for the payload,
whether the payload is MIME packaged
or XML packaged.  (So there will be an SMTP or
HTTP header defining the content-type
of the XML package, even if that information
is effectively repeated in the XML
packaging recipe below...)

Here is a first attempt at the recipe:

1. The overall XML package has its own
element start and stop tag, called here
XMLPackage. So the outside
(minus the prolog; see later discussion)
is:

<XMLPackage>

<!-- This is a comment indicating lots of stuff omitted here.-->

</XMLPackage>

Add to this top-level format as needed, e.g., by choosing more
informative tag names or adding attributes.

2. The MIME (internal body part) headers
are structured headers and the headers
always have the string "Content-" as a prefix.
The common headers are:
"Content-type", "Content-disposition",
"Content-id", and "Content-length".
(In MIME, if these are omitted, default
types are assumed.

Issue: within the
XML package should these have defaults?
Should they also be case-insensitive?)

The recipe idea is to turn each one of these
headers into an element, giving a sequence of header
elements for each packaged "unit". The header name
is the element tag, the header value
(other than comments or parameters)
is the character data of the element, comments are
omitted, parameters are treated as attribute
names, parameter values as attribute
values, semicolons are omitted, and the
"boundary" parameter can be omitted
(not generally used anyway for XML
packaging).

So for the headers (illustrative purposes only),

Content-type: multipart/related; type="ebxml"
Content-disposition: attachment
Content-length: 540000
Content-id: mrebxml

we would obtain:

<Content-type type="ebxml">
multipart/related
</Content-type>
<Content-disposition>
attachment
</Content-disposition>

<Content-length>
540000
</Content-length>
<Content-id>
mrebxml
</Content-id>
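
The mapping in step 2 is mechanical enough to sketch in a few lines of Python. This is only an illustration, not part of the proposal: the name header_to_element is made up here, the sketch leans on the standard library's email.message.Message.get_params to split off parameters, and it does not attempt to strip header comments.

```python
from email.message import Message
import xml.etree.ElementTree as ET


def header_to_element(name, value):
    """Map one MIME header to an XML element, following the recipe above.

    header_to_element is a hypothetical helper name, not part of any spec.
    Header comments are not stripped; that step is left out of this sketch.
    """
    msg = Message()
    msg[name] = value
    # get_params returns [(main_value, ''), (param, value), ...]
    params = msg.get_params(header=name)
    main_value, attrs = params[0][0], params[1:]

    element = ET.Element(name)         # header name becomes the element tag
    element.text = main_value          # header value becomes character data
    for key, val in attrs:
        if key.lower() == "boundary":  # boundaries are not needed in XML packaging
            continue
        element.set(key, val)          # parameter becomes an attribute
    return element


if __name__ == "__main__":
    for header, value in [
        ("Content-type", 'multipart/related; type="ebxml"'),
        ("Content-disposition", "attachment"),
        ("Content-id", "mrebxml"),
    ]:
        print(ET.tostring(header_to_element(header, value), encoding="unicode"))
```

Unlike the hand-written example, the serializer emits no line breaks around the character data; whether such whitespace is significant is one more thing the recipe would have to settle.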

3. The headers probably should be grouped
with the message body parts that
they pertain to. Some start and stop tag
conventions need to be created. For
example, we can derive them from the value
of the content-type:

<multipart-related>

<Content-type type="ebxml">
multipart/related
</Content-type>
<Content-disposition>
attachment
</Content-disposition>
<Content-length>
540000
</Content-length>
<Content-id>
mrebxml
</Content-id>

<!-- Body parts in multipart related go here. -->

</multipart-related>


In effect, these start and stop tags will
replace the function of MIME boundaries
in showing where to start and stop.
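
As a rough sketch of step 3, the wrapper tag can be computed from the content-type value and the header elements grouped under it. The helper names wrapper_tag and wrap_part are invented for this illustration, and the rule for characters other than "/" (such as "+") is an assumption of the sketch, since the recipe only shows the slash case.

```python
import re
import xml.etree.ElementTree as ET


def wrapper_tag(content_type):
    """Derive a start/stop tag name from a content-type value (sketch only).

    "multipart/related" -> "multipart-related". Any character that is not a
    letter, digit, '.', '_' or '-' is replaced by '-'; that rule is an
    assumption made here so that types like "image/svg+xml" give a usable name.
    """
    return re.sub(r"[^A-Za-z0-9._-]", "-", content_type.strip().lower())


def wrap_part(content_type, content_id, children=()):
    """Group the header elements with the body parts they pertain to."""
    part = ET.Element(wrapper_tag(content_type))
    ET.SubElement(part, "Content-type").text = content_type
    ET.SubElement(part, "Content-id").text = content_id
    part.extend(children)  # nested body parts or payload elements
    return part


if __name__ == "__main__":
    manifest = wrap_part("application/ebxml-manifest", "ebxml-manifest")
    order = wrap_part("application/xml", "ebxml-purchase-order")
    package = ET.Element("XMLPackage")
    package.append(wrap_part("multipart/related", "mrebxml", [manifest, order]))
    print(ET.tostring(package, encoding="unicode"))
```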

For a multipart/related of an ebXML manifest
and application/xml body parts,
we might have as the inner structure something like:

<application-ebxml-manifest>

<Content-type type="ebxml" charset="utf-8" >
application/ebxml-manifest
</Content-type>
<Content-id>
ebxml-manifest
</Content-id>

<!--first body part payload with no prolog allowed -->

</application-ebxml-manifest>

<application-xml>

<Content-type type="purchase order" charset="utf-8" >
application/xml
</Content-type>
<Content-id>
ebxml-purchase-order
</Content-id>

<!--second body part payload no prolog allowed-->

</application-xml>

<!-- Replace the comment containing "Body
parts in multipart related go here." by the above material
to get a fully expanded example. -->

4. Clash of data types or problem of "binary" data.
(By "binary" I mean to indicate
any data stream that would clash with the
character set encoding used for the xml
of the package.)

There are two general solutions: one is to
use a content-transfer-encoding to "hide" the
sequence of unsigned octets from the XML
parser. The other is to use some variety
of virtual containment: for example, put
the data into a second body part, wrap the
XMLpackage and the data into a multipart/related,
and use URIs, URNs, or similar
to point to the data. For instance, use the unparsed external
entity mechanism of XML 1.0
and let the XML application figure out
where the data is and how to obtain it.

I think the MIME mechanism is probably more widely
used for similar problems in W3C drafts,
unless the amount of data is very small: then
there are various other escape mechanisms.
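
The first solution, hiding the octets behind a content-transfer-encoding, might look roughly like the following Python sketch. The "Content-transfer-encoding" and "Body" element names and the helper functions are assumptions made for illustration; the recipe itself does not define them.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(content_type, content_id, data):
    """Package raw octets as base64 text so the XML parser never sees them.

    The "Content-transfer-encoding" and "Body" element names are hypothetical.
    """
    part = ET.Element(content_type.replace("/", "-"))
    ET.SubElement(part, "Content-type").text = content_type
    ET.SubElement(part, "Content-id").text = content_id
    ET.SubElement(part, "Content-transfer-encoding").text = "base64"
    ET.SubElement(part, "Body").text = base64.b64encode(data).decode("ascii")
    return part


def unwrap_binary(part):
    """Recover the original octets from a part built by wrap_binary."""
    if part.findtext("Content-transfer-encoding") != "base64":
        raise ValueError("unsupported content-transfer-encoding")
    return base64.b64decode(part.findtext("Body"))


if __name__ == "__main__":
    raw = bytes(range(256))  # every octet value, including ones illegal in XML
    xml_text = ET.tostring(wrap_binary("image/gif", "logo", raw), encoding="unicode")
    assert unwrap_binary(ET.fromstring(xml_text)) == raw
```

The cost is the usual base64 expansion of about one third, which is part of why pointing at a separate MIME body part is attractive for large payloads.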

5. Validity checking of the XMLpackage.

IMO, this is a big unsolved issue for XMLpackaging. That is,
suppose we have a schema or DTD
that defines the validity of two (or more) separate
XML documents. We then package the XML documents
into an XML package document and the result
is well-formed. I believe that to avoid multiplying
DTDs and schemas beyond necessity, it would
be nice if the validity of the XML package could
be defined in terms of the validity of the packaged
XML documents. This amounts to a
distributive rule for validity over the operation
of packaging; that is something like:

Validityof(XMLPackageof(XMLdoc1, ...XMLdocN) ) =
                    Validityof(XMLdoc1) and ... and Validityof(XMLdocN)
                                 and Packaging_was_OK.

I think this property would be nice to have, but
no current validating parsers (that I know about)
are capable of doing this kind of thing. It also
seems to clash with the one-prolog constraint
within an XML document, if you think about it.
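
To make the distributive rule concrete, here is a small Python sketch of what an application-level check could look like. It does not solve the parser problem; it only assumes that the caller can supply a per-document validity check and a packaging check as plain functions (validators and packaging_ok are hypothetical hooks), and that each child of the package is one part holding Content-* headers plus a single payload root.

```python
import xml.etree.ElementTree as ET


def payload_roots(package):
    """Yield the payload root of each part.

    Assumes (for this sketch only) that every child of the package element is
    one part, and that a part holds Content-* header elements plus the
    payload document's root element.
    """
    for part in package:
        for child in part:
            if not child.tag.startswith("Content-"):
                yield child


def package_is_valid(package, validators, packaging_ok):
    """Validityof(package) = every document valid and Packaging_was_OK.

    validators maps a payload root tag to a callable returning True/False.
    Both validators and packaging_ok are supplied by the caller; no
    validating parser is assumed here.
    """
    if not packaging_ok(package):
        return False
    return all(
        doc.tag in validators and validators[doc.tag](doc)
        for doc in payload_roots(package)
    )
```

A payload whose root tag has no registered check is treated as invalid, which is a design choice of the sketch rather than something the rule requires.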

The alternatives:

1. Just forget about validity for the package, and
treat the XML package as a bit of preprocessing.
Pull out the contained documents and somehow
figure out (there won't be a prolog for each doc!!)
how to check the validity of the documents.
2. Write out a separate DTD or schema for
each packaged possibility. (Ugh.)
3. Forget about validity for ebXML packaging
and just go
with well-formed XML.
Put the validity and semantic checks
elsewhere in the processing.
(Given the interest in DTD/schema
for ebXML, will this have
any supporters?)
4. There are surely others
but I leave these for the
list, conference calls,
and meetings.
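
For alternative 1, the preprocessing step could be sketched in Python as follows. The function name extract_documents and the doctypes mapping are hypothetical: the idea is only that the receiver pulls each payload out of the package and re-attaches a prolog it already knows, since the payload travels without one. The "order.dtd" system identifier in the usage lines is likewise made up.

```python
import copy
import xml.etree.ElementTree as ET

XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>\n'


def extract_documents(package, doctypes=None):
    """Return {content_id: standalone document text} for each packaged part.

    Assumes each child of the package is a part holding Content-* header
    elements plus one payload root. doctypes is a hypothetical mapping from a
    payload root tag to a DOCTYPE declaration string known to the receiver.
    The returned text should be written out as UTF-8 to match its declaration.
    """
    doctypes = doctypes or {}
    documents = {}
    for part in package:
        content_id = (part.findtext("Content-id") or "").strip()
        for child in part:
            if child.tag.startswith("Content-"):
                continue  # header element, not payload
            root = copy.deepcopy(child)
            root.tail = None  # drop whitespace that followed it in the package
            prolog = XML_DECL + doctypes.get(root.tag, "")
            documents[content_id] = prolog + ET.tostring(root, encoding="unicode")
    return documents


if __name__ == "__main__":
    package = ET.fromstring(
        "<XMLPackage><application-xml>"
        "<Content-id>ebxml-purchase-order</Content-id>"
        "<Order><Item>1</Item></Order>"
        "</application-xml></XMLPackage>"
    )
    docs = extract_documents(package, {"Order": '<!DOCTYPE Order SYSTEM "order.dtd">\n'})
    print(docs["ebxml-purchase-order"])
```

Each extracted text can then be handed to whatever validating parser the application already uses for that document type.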


Summary:

The above recipe (suitably corrected for dumb
mistakes and slips) shows that a well-formed XMLpackage
could be constructed and also that it could
be constructed by simply following an XML-ized version
of how MIME packages up body parts.

I think it might be useful to reorder some
of the elements and possibly make
the content-id element the first
element for each "document part".
Details and style issues like this can be hashed
out in the Dallas meeting.

Also the recipe shows that we might want
to use MIME to handle packages
for mixed data types (XML and binary).

Finally, attention has been called to one
area deserving greater
clarification, comment, and discussion
from people who are interested
and informed in XML theory issues:
how should XMLpackage validity
be understood?

If you were going to write a DTD
or schema for
the packaging recipe given above,
it would quickly become apparent
that no easy way to proceed
is currently available. The problem
isn't with the particular way that
the package is created (MIME shows
that this can be done in an automated
deterministically recognizable way).
The problem is with the idea of
combining separately valid
document types into another one!
XML 1.0, with one prolog preceding
the document, is really not geared
for joining trees of XML
whose validity is independently
defined.