RE: MS 021c: Content-Type: multipart/related;charset clarification, please

ebxml-transport message

Subject: RE: MS 021c: Content-Type: multipart/related;charset clarification, please

From: "Patil, Sanjay" <Spatil@netfish.com>

To: 'Gait Boxman' <gait.boxman@tie.nl>, ebxml-transport@lists.ebxml.org

Date: Tue, 10 Oct 2000 18:49:05 -0700

This topic was brought up before. I am cut-pasting excerpts from a previous discussion.

My understanding was that the "charset" attribute at the multipart/related level would be removed, for the same reasons you pointed out.

Dick and others, please comment.

thanks,
Sanjay Patil
----------------------------------------------------------------------------------------------------------
Work Phone: 408 350 9619 http://www.netfish.com

on Fri, 28 Jul 2000 23:32:20 -0400 (EDT) ....

[Prasad Yendluri]

3. BTW, what does a charset specification at the message envelope level (multipart/related) mean? Does it imply that the specification applies to all parts in the message (header + payload)? I mean, what is the purpose of specifying it at that level, as opposed to on individual parts (e.g. application/xml, which already supports an encoding attribute)?

[Dick Brooks]

Several people have provided feedback that a charset parameter would make life easier on implementers. I investigated the use of this attribute in both MIME (RFC 2046 section 4.1.2) and HTTP (RFC 2616, section 3.4). Both strongly encourage its use, especially for text entities. The HTTP spec states:

"Applications SHOULD limit their use of character sets to those defined by the IANA registry."

RFC 2387, the multipart/related spec, does not include a "charset" parameter and use of this parameter at the ebXML message envelope level appears to be a non-standard use of the multipart/related media type.

The character set of the payload body part is determined at runtime, and cannot be "set" within the context of the ebXML packaging spec. I suppose the packaging spec could "RECOMMEND" that payload envelopes include the charset attribute when appropriate. I'll be happy to include this "suggestion" under the section describing the payload, if the group thinks this would help. Do I hear a Yea or a Nay?

With regard to the ebXML header document, it is a given that this document is expressed in XML, and all XML-compliant processors must be capable of handling both UTF-8 and UTF-16 (ref section 2.2 of the XML 1.0 spec). The XML declaration in the prolog contains an attribute (called "encoding") to identify the character set used in the document. It seems to me the information regarding character set is of most value to the XML processor, and for this reason the character set should be stated within the XML prolog as opposed to the MIME envelope. By placing the encoding into the XML prolog, the XML parser has access to the information and can invoke proper processing. If the character encoding were placed into the MIME envelope, then the program used to parse the XML document would have to be made "aware" of the encoding, either programmatically or by altering the XML document prolog to include the encoding attribute, set to the character set that was identified in the Content-type header. BUT making such alterations is risky, especially when dealing with signed documents, because any change to the signed bytes invalidates the signature.
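A minimal sketch of the point above, assuming Python's standard xml.etree.ElementTree parser: the parser is handed raw bytes only and learns the encoding from the XML declaration (plus the byte order mark for UTF-16), with no help from a MIME header. The element names "Header" and "From" and the sample text are hypothetical, not taken from the ebXML spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical header document; the element names are illustrative only.
TEMPLATE = (
    '<?xml version="1.0" encoding="{enc}"?>'
    "<Header><From>Caf\u00e9 GmbH</From></Header>"
)


def parse_with_declared_encoding(encoding: str) -> str:
    """Serialize the document in `encoding`, then parse the raw bytes.

    No charset is passed to the parser out of band: it relies on the
    XML declaration (and, for UTF-16, the byte order mark) alone.
    """
    raw = TEMPLATE.format(enc=encoding).encode(encoding)
    return ET.fromstring(raw).findtext("From")


if __name__ == "__main__":
    for enc in ("UTF-8", "UTF-16", "ISO-8859-1"):
        print(enc, parse_with_declared_encoding(enc))
```

The same non-ASCII text is recovered for each of the three encodings, and the document bytes are never rewritten along the way, which matters if they are signed.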

The bottom line is this: it appears the character set encoding for the ebXML header should be identified within the XML prolog of the ebXML header document. The ebXML payload envelope Content-type is dynamic and germane to the type of object contained in the payload; in other words, the Content-type is determined by the implementer at runtime. It would appear the BEST we can do within the packaging spec is RECOMMEND that implementers specify the charset attribute within the Content-type of all payload body parts whenever appropriate.
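A sketch of what this recommendation could look like in practice, assuming Python's standard email package. The function name build_message, the choice of application/xml for the header part, the type="text/xml" value, and the sample content are all assumptions made for illustration; none of them come from the packaging spec.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(header_xml: str, payload_text: str,
                  payload_charset: str) -> MIMEMultipart:
    """Build a multipart/related envelope with no envelope-level charset."""
    # Envelope: only the RFC 2387 "type" parameter (value assumed here)
    # plus the boundary that the library generates on serialization.
    envelope = MIMEMultipart("related", type="text/xml")

    # Header part: sent as opaque bytes with no charset parameter, so the
    # XML declaration inside the document is the only encoding statement.
    envelope.attach(MIMEApplication(header_xml.encode("utf-8"), "xml"))

    # Payload part: a text entity, so charset goes on this part's own
    # Content-Type, chosen by the sender at runtime.
    envelope.attach(MIMEText(payload_text, "plain", payload_charset))
    return envelope


if __name__ == "__main__":
    msg = build_message(
        '<?xml version="1.0" encoding="UTF-8"?><Header/>',
        "Gr\u00fc\u00dfe",
        "iso-8859-1",
    )
    print(msg.as_string())
```

The printed envelope Content-Type carries no charset, while the text payload part states its own.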

I feel like I'm rambling, but hopefully this diatribe will help you understand my conclusions.

-----Original Message-----
From: Gait Boxman [mailto:gait.boxman@tie.nl]
Sent: Tuesday, October 10, 2000 7:24 AM
To: ebxml-transport@lists.ebxml.org
Subject: MS 021c: Content-Type: multipart/related;charset clarification, please

Hi,

Can someone clarify section 7.2.1 of Messaging Service Specification V.021c for me?

It says that the Content-Type header for the multipart has a charset attribute, and I'm puzzled about its use.

Why would there be a charset specification on a multipart which can contain multiple bodyparts using different encodings and charsets?

I could think of only two reasons:

1. to specify an alternate character set for the multipart header and boundary data,

2. to specify a default character set for the contained body parts.

However, RFC 2046 states (at the bottom of page 18) that all header fields and boundary delimiters are represented in 7bit US-ASCII, which rules out reason one. Furthermore, RFC 2046 states (at the bottom of page 17/top of page 18) the default RFC 822 content-type for body parts with no headers, so a default for the contained body parts already exists, which makes reason two unnecessary.

So far, I have found that the charset attribute is applicable for text content types, but not for multiparts.
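As a rough illustration of these two observations, assuming Python's standard email parser: the sample message below is hypothetical, with one body part that has no headers at all and one that declares its own charset. Note that the library reports the default content type (text/plain) for the header-less part but returns None for its charset, whereas RFC 2046 describes the default as text/plain; charset=US-ASCII.

```python
import email

# Hypothetical two-part message; the boundary and content are made up.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/related; boundary="b1"; type="text/xml"\n'
    "\n"
    "--b1\n"
    "\n"
    "part with no headers at all\n"
    "--b1\n"
    'Content-Type: text/xml; charset="utf-8"\n'
    "\n"
    "<Header/>\n"
    "--b1--\n"
)


def describe_parts(raw: str):
    """Return (content_type, charset) for the envelope and each body part."""
    msg = email.message_from_string(raw)
    rows = [(msg.get_content_type(), msg.get_content_charset())]
    for part in msg.get_payload():
        rows.append((part.get_content_type(), part.get_content_charset()))
    return rows


if __name__ == "__main__":
    for content_type, charset in describe_parts(RAW):
        print(content_type, charset)
```

The envelope itself needs no charset to be parsed; each part is interpreted from its own headers or from the default.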

Did I miss something in the RFCs, or is there some other source I did not check? Or was it put in by accident?

Thanks, Gait Boxman.