Subject: Almost-everywhere XML Packaging for ebXML: strawman for discussion
[This draft shows one way to do XML packaging primarily within XML, at a price discussed in section 5.]

XML Packaging Recipe Draft 1.0

Because MIME packaging is already well understood and defined, it is useful to develop an initial XML packaging scheme as a constructive recipe based on MIME packaging. Also, because it is unlikely that XML has the best solution for every packaging task, MIME is likely to be needed as well, so it will probably be useful to have at least similar packaging semantics even if the syntax differs. It is also assumed that each transport will define a MIME content type for the payload, whether the payload is MIME packaged or XML packaged. (So there will be an SMTP or HTTP header defining the content-type of the XML package, even if that information is effectively repeated in the XML packaging recipe below...)

Here is a first attempt at the recipe:

1. The overall XML package has its own element start and stop tags, called here XMLPackage. So the outside (minus the prolog; see later discussion) is:

    <XMLPackage>
      <!-- This is a comment indicating lots of stuff omitted here. -->
    </XMLPackage>

Add to this top-level format as needed, e.g. choose more informative tag names or add attributes.

2. The MIME (internal body part) headers are structured headers, and the header names always have the string "Content-" as a prefix. The common headers are: "Content-type", "Content-disposition", "Content-id", and "Content-length". (In MIME, if these are omitted, default types are assumed. Issue: within the XML package should these have defaults? Should they also be case-insensitive?)

The recipe idea is to turn these headers into a sequence of elements, one element per header, for each packaged "unit". The header name is the element tag; the header value (other than comments or parameters) is the character data of the element; comments are omitted; parameters become attribute names and parameter values become attribute values; semicolons are omitted; and the "boundary" parameter can be omitted (it is not generally used for XML packaging anyway). So for the headers (illustrative purposes only)

    Content-type: multipart/related; type="ebxml"
    Content-disposition: attachment
    Content-length: 54000
    Content-id: mrebxml

we would obtain:

    <Content-type type="ebxml"> multipart/related </Content-type>
    <Content-disposition> attachment </Content-disposition>
    <Content-length> 54000 </Content-length>
    <Content-id> mrebxml </Content-id>

3. The headers probably should be grouped with the message body parts that they pertain to, so some start and stop tag conventions need to be created. For example, we can derive them from the value of the content-type:

    <multipart-related>
      <Content-type type="ebxml"> multipart/related </Content-type>
      <Content-disposition> attachment </Content-disposition>
      <Content-length> 54000 </Content-length>
      <Content-id> mrebxml </Content-id>
      <!-- Body parts in multipart related go here. -->
    </multipart-related>

In effect, these start and stop tags replace the function of MIME boundaries in showing where each part starts and stops. For a multipart/related consisting of an ebXML manifest and application/xml body parts, we might have as the inner structure something like:

    <application-ebxml-manifest>
      <Content-type type="ebxml" charset="utf-8"> application/ebxml-manifest </Content-type>
      <Content-id> ebxml-manifest </Content-id>
      <!-- first body part payload, with no prolog allowed -->
    </application-ebxml-manifest>
    <application-xml>
      <Content-type type="purchase order" charset="utf-8"> application/xml </Content-type>
      <Content-id> ebxml-purchase-order </Content-id>
      <!-- second body part payload, with no prolog allowed -->
    </application-xml>
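The header-to-element mapping of steps 2 and 3 is mechanical enough to sketch in code. The following Python fragment is only an illustration, not part of the draft: the function name headers_to_element is made up here, the parsing is delegated to the standard email package, and it assumes header and parameter names happen to be legal XML names.

```python
import xml.etree.ElementTree as ET
from email import message_from_string


def headers_to_element(mime_headers: str) -> ET.Element:
    """Turn a block of MIME body-part headers into a wrapper element.

    Hypothetical helper illustrating the recipe; not defined by the draft.
    """
    msg = message_from_string(mime_headers)
    # Step 3: derive the wrapper tag from the content-type value,
    # e.g. "multipart/related" -> "multipart-related".
    wrapper = ET.Element(msg.get_content_type().replace("/", "-"))
    for name in msg.keys():
        if not name.lower().startswith("content-"):
            continue  # the recipe only covers "Content-" headers
        # First tuple holds the bare header value, the rest are parameters.
        params = msg.get_params(header=name)
        value, attrs = params[0][0], dict(params[1:])
        attrs.pop("boundary", None)  # boundaries are replaced by tags
        child = ET.SubElement(wrapper, name, attrs)
        child.text = value
    return wrapper


if __name__ == "__main__":
    example = (
        'Content-type: multipart/related; type="ebxml"\n'
        "Content-disposition: attachment\n"
        "Content-length: 54000\n"
        "Content-id: mrebxml\n"
    )
    print(ET.tostring(headers_to_element(example), encoding="unicode"))
```

The sketch leaves the open issues of step 2 untouched: it applies no defaults for missing headers and keeps header names in whatever case they arrive in.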
(To get a fully expanded example, replace the comment containing "Body parts in multipart related go here." in the multipart-related element with the application-ebxml-manifest and application-xml elements.)

4. Clash of data types, or the problem of "binary" data. (By "binary" I mean any data stream that would clash with the character set encoding used for the XML of the package.) There are two general solutions. One is to use a content-transfer-encoding to "hide" the sequence of unsigned octets from the XML parser. The other is to use some variety of virtual containment: for example, put the data into a second body part, wrap the XMLpackage and the data into a multipart/related, and use URIs, URNs, or similar to point to the data. One could, for instance, use the unparsed external entity reference mechanism of XML 1.0 and let the XML application figure out where the data is and how to obtain it. I think the MIME mechanism is probably more widely used for similar problems in W3C drafts, unless the amount of data is very small, in which case there are various other escape mechanisms.

5. Validity checking of the XMLpackage. IMO, this is a big unsolved issue for XML packaging. Suppose we have schemas or DTDs that define the validity of two (or more) separate XML documents. We then package the XML documents into an XML package document, and the result is well-formed. I believe that, to avoid multiplying DTDs and schemas beyond necessity, it would be nice if the validity of the XML package could be defined in terms of the validity of the packaged XML documents. This amounts to a distributive rule for validity over the operation of packaging, something like:

    Validityof(XMLPackageof(XMLdoc1, ..., XMLdocN)) =
        Validityof(XMLdoc1) and ... and Validityof(XMLdocN) and Packaging_was_OK

I think this property would be nice to have, but no current validating parsers (that I know about) are capable of doing this kind of thing. It also seems to clash with the one-prolog constraint within an XML document, if you think about it.

The alternatives:

   1. Just forget about validity for the package, and treat the XML package as a bit of preprocessing. Pull out the contained documents and somehow figure out (there won't be a prolog for each doc!!) how to check the validity of the documents.

   2. Write out a separate DTD or schema for each packaged possibility. (Ugh.)

   3. Forget about validity for ebXML packaging and just go with well-formed XML. Put the validity and semantic checks elsewhere in the processing. (Given the interest in DTD/schema for ebXML, will this have any supporters?)

   4. There are surely others, but I leave these for the list, conference calls, and meetings.

Summary: The above recipe (suitably corrected for dumb mistakes and slips) shows that a well-formed XMLpackage could be constructed, and also that it could be constructed by simply following an XML-ized version of how MIME packages up body parts. I think it might be useful to reorder some of the elements and possibly make the content-id element the first element for each "document part". Details and style issues like this can be hashed out in the Dallas meeting. The recipe also shows that we might want to use MIME to handle packages of mixed data types (XML and binary).

Finally, attention has been called to one area deserving greater clarification, comment, and discussion from people who are interested and informed in XML theory issues: how should XMLpackage validity be understood? If you were going to write a DTD or schema for the packaging recipe given above, it would quickly become apparent that no easy way to proceed is currently available. The problem isn't with the particular way that the package is created (MIME shows that this can be done in an automated, deterministically recognizable way). The problem is with the idea of combining separately valid document types into another one! XML 1.0, with one prolog preceding the document, is really not geared for joining trees of XML whose validity is independently defined.
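To make the distributive validity rule of section 5 a little more concrete, here is a rough Python sketch along the lines of alternative 1. It is not a proposal for how ebXML should do this. The function names and the validators table are invented for illustration, each validator is a caller-supplied stand-in for real DTD or schema checking (which the standard library does not provide), and "Packaging_was_OK" is reduced to "each leaf body part carries exactly one payload element of a known content type".

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

# Hypothetical table: content type -> function that validates one payload.
Validators = Dict[str, Callable[[ET.Element], bool]]


def split_part(part: ET.Element) -> Tuple[Dict[str, str], List[ET.Element]]:
    """Separate the Content-* header elements from everything else."""
    headers: Dict[str, str] = {}
    payload: List[ET.Element] = []
    for child in part:
        if child.tag.lower().startswith("content-"):
            headers[child.tag.lower()] = (child.text or "").strip()
        else:
            payload.append(child)
    return headers, payload


def package_valid(node: ET.Element, validators: Validators) -> bool:
    """Validity of a package = validity of every packaged document
    and the packaging itself being acceptable."""
    headers, payload = split_part(node)
    ctype = headers.get("content-type")
    if ctype is None or ctype.startswith("multipart/"):
        # Container (XMLPackage or a multipart wrapper): recurse into parts.
        return all(package_valid(child, validators) for child in payload)
    # Leaf body part: exactly one payload document of a known type.
    check = validators.get(ctype)
    packaging_ok = len(payload) == 1 and check is not None
    return packaging_ok and check(payload[0])
```

A caller would parse the package with ET.fromstring (which already enforces well-formedness) and pass a table such as {"application/xml": some_check}. Note that the sketch sidesteps the hard question raised above: because the packaged documents have no prolog, the package itself has no way to say which DTD or schema applies to each part, so that knowledge has to live in the validators table.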