Disposition of comments received on draft-ietf-appsawg-xml-mediatypes-07

This document contains all the comments received wrt draft-ietf-appsawg-xml-mediatypes-07, together with my response. They are divided into two sections, the first for more substantive comments, the second for more editorial ones. Ones where my response has been more-or-less negative are sorted to the end of their respective sections, and shown with a pink background. Where a comment includes a quote (actual or suggested) from the draft, this is shown with a green background.

Substantive/contentious comments

Minor/editorial/overtaken/duplicate comments

Comment on section 2.2, Bray

[UNICODE] defines three "encoding forms", which are independent of serialization

What does "independent of serialization" mean? I think the UTF-* are actually serializations of Unicode codepoints. I suppose UTF-16 is sort-of semi-independent of serialization, but UTF-8 never is.

Response: Changed this to read as follows

[UNICODE] defines three "encoding forms", namely UTF-8, UTF-16, and UTF-32. As UTF-8 can only be serialized in one way, the only possible label for UTF-8-encoded documents when serialised into MIME entities is "utf-8". UTF-16 XML documents, however, can be serialised into MIME entities in one of two ways: either big-endian, labelled (optionally) "utf-16" or "utf-16be", or little-endian, labelled (optionally) "utf-16" or "utf-16le".

Response: and added the following (removing the earlier version from 3.1), per my reply to SM:

UTF-32 has four potential serializations, of which only two (UTF-32BE and UTF-32LE) are given names in [UNICODE]. Support for the various serializations varies widely, and security concerns about their use have been raised. The use of UTF-32 is NOT RECOMMENDED for XML MIME entities.
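
To make the serialization point concrete, here is a small Python sketch (not part of the draft) that encodes one short document under each label. The sample string and the dictionary name are illustrative assumptions; the codecs are Python's standard ones, and the explicit big-endian and little-endian codecs do not prepend a BOM.

```python
# Illustrative only: one tiny document, serialized five ways.
doc = "<a/>"

serializations = {
    "utf-8": doc.encode("utf-8"),
    "utf-16be": doc.encode("utf-16-be"),
    "utf-16le": doc.encode("utf-16-le"),
    "utf-32be": doc.encode("utf-32-be"),
    "utf-32le": doc.encode("utf-32-le"),
}

# Print each label next to its bytes in hex so the byte order is visible.
for label, data in serializations.items():
    print(f"{label:9} {data.hex(' ')}")
```

UTF-8 yields a single byte sequence, whereas the UTF-16 and UTF-32 results differ only in the order of the bytes within each code unit, which is why a label or a BOM is needed to tell them apart.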

Comment on section 3, Rushforth

First off, this is supposed to be the registration of XML for use in the IETF environment. As such it needs to instruct us on how the three different authorities of character encoding metadata are being applied:

  1. W3C/XML encoding declaration
  2. IETF/MIME charset parameter
  3. UNICODE BOM

So, does the advice given by the draft conform to Postel's law? What I would hope for would be advice that if followed over time, would result in a simpler situation.

Response: Without actually using Postel's name, I've expanded the introduction to section 3 to try to make clear that this is exactly my goal in the spec.
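
For readers who want to see the three sources side by side, the following Python sketch pulls each one out of a single entity. It is an illustration only, not text from the draft: the function names are hypothetical, UTF-32 BOMs are deliberately ignored, and the encoding-declaration check only works for ASCII-compatible encodings.

```python
import codecs
import re
from email.message import Message

# UTF-32 BOMs are left out on purpose: UTF-32 is NOT RECOMMENDED, and the
# UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
)

# Matches an XML declaration at the start of the body and captures the
# encoding name. Assumes an ASCII-compatible encoding.
_DECL = re.compile(rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def bom_encoding(body: bytes):
    """Return the encoding implied by a leading BOM, or None."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    return None


def charset_parameter(content_type: str):
    """Return the lower-cased MIME charset parameter, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_encoding(body: bytes):
    """Return the lower-cased XML encoding declaration, or None."""
    if body.startswith(codecs.BOM_UTF8):
        body = body[len(codecs.BOM_UTF8):]
    match = _DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else None


def encoding_sources(content_type: str, body: bytes):
    """Collect what each of the three authorities says about one entity."""
    return {
        "bom": bom_encoding(body),
        "charset": charset_parameter(content_type),
        "declaration": declared_encoding(body),
    }
```

A producer-side check could call `encoding_sources` and warn whenever the values that are present disagree with one another.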

Comment on section 3.1, Bray

2nd last para of 3.1, beginning "XML MIME producers are RECOMMENDED to provide means for XML MIME entity authors to determine what value" baffles me. I just read it 3 times and I don't get it. Could we have an example or something? I also think I disagree with my guess as what it's trying to say. I tend to think the tools are going to do a better job of figuring out the right charset labeling than your typical document author.

Ah, I got it finally; as the other thread said, what this is *really* talking about is configuring your web server and so on. So this is OK, except for I think the word "author" is misleading since document authors shouldn't be expected to understand Unicode encodings or webserver considerations. So maybe something like:

XML MIME producers are RECOMMENDED to provide means to control what value, if any, is given to charset parameters for XML MIME entities, for example by enabling Web server configuration of filename-to-Content-Type-header mappings on a file-by-file or suffix basis.

Response: That looks good to me---adopted, with slight modifications
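
As a rough illustration of what file-by-file or suffix-based control might look like, here is a minimal Python sketch. Everything in it is hypothetical: the table names, the sample paths, and the `content_type_for` function are assumptions made for the example and do not describe any particular web server.

```python
import posixpath

# Hypothetical tables a server administrator might maintain.
# A value of None means "send no charset parameter at all".
CHARSET_BY_FILE = {"/legacy/report.xml": "iso-8859-1"}
CHARSET_BY_SUFFIX = {".xml": None, ".xhtml": "utf-8"}


def content_type_for(path: str, media_type: str = "application/xml") -> str:
    """Build a Content-Type value, preferring a per-file entry over a per-suffix one."""
    if path in CHARSET_BY_FILE:
        charset = CHARSET_BY_FILE[path]
    else:
        suffix = posixpath.splitext(path)[1]
        charset = CHARSET_BY_SUFFIX.get(suffix)
    if charset is None:
        return media_type
    return f"{media_type}; charset={charset}"
```

The point of the design is that omitting the charset parameter is an explicit, configurable choice rather than something a server-wide default overrides.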

Comment on section 3.1, SM

XML-unaware MIME producers MUST NOT supply a charset parameter with an XML MIME entity unless the entity's character encoding is reliably known.

The title of that section is "XML MIME producers". The above states a (RFC 2119) requirement for a "XML-unaware MIME producer". Can an agent which is not capable of processing XML MIME entities and detecting the XML encoding declaration follow the requirement, and does the requirement even apply given that the requirements being specified are for XML MIME producers?

Response: I've tried to clarify this in the intro to section 3

Comment on section 3.1, SM

As a nit, the unless clause in the [above] requirement (sentence) seems odd. It may be simpler to set requirements for "XML-aware" MIME producers only.

Response: Maybe, as for the following para., this is just too much of an in-crowd thing. What it means to those of us who have struggled with this situation for the last 10 years is "Sysadmins: do not configure your apache servers to serve XML and/or XHTML with a charset param of iso-8859-1 by default unless you really know what your users are shipping"

Response: I've attempted to clarify this by adding a note about server defaults being bad

Comment on section 3.1, SM

XML MIME producers are RECOMMENDED to provide means for XML MIME entity authors to determine what value, if any, is given to charset parameters for their entities, for example by enabling user-level configuration of filename-to-Content-Type-header mappings on a file-by-file or suffix basis.

The "entity authors" in the above is not clear. Is it a person or an agent?

Response: See above for reply to similar query by Tim Bray.

Comment on section 3.1, SM

The use of UTF-32 is NOT RECOMMENDED for XML MIME entities

I suggest having a short explanation about why the use of UTF-32 is not recommended instead of only saying that it is not recommended.

Response: Added a short explanation.

Comment on section 3.1, SM

XML-aware consumers MUST follow the requirements in section 4.3.3 of [XML] that directly address this case.

This is a requirement by reference to an external specification. I am listing this as it is unusual.

Response: I agree it's unusual, but justified in this case I think as we're really sitting on the boundary between XML as such, and MIME.

Comment on section 3.2, Bray

The crucial "this specification sets the priority as follows:" indented para in 3.2. I think a little more is needed. The crucial corner case is when you've got a MIME-header charset that is just wrong but an XML-aware receiver can in fact sort things out based on the encoding declaration. What does being 'authoritative' mean concretely? Is it the RFC's recommendation that the receiver SHOULD refuse to parse the XML even though it could? If so, we should say so explicitly.

Response: I'll clarify that by 'authoritative' is meant 'do it this way', while acknowledging that this will not (cannot) always do the 'right' thing.
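
One way to read 'do it this way' is as a fixed lookup order. The Python sketch below assumes that order is: a BOM if present, otherwise the charset parameter, and the encoding declaration only when both are absent. The function name is hypothetical, and the final fallback to UTF-8 is a simplification of the rules in section 4.3.3 of [XML], not a quotation from the draft.

```python
from typing import Optional


def authoritative_encoding(
    bom: Optional[str],
    charset: Optional[str],
    declared: Optional[str],
) -> str:
    """Pick the encoding a consumer would use under the assumed priority order."""
    if bom is not None:
        return bom          # a BOM, when present, wins
    if charset is not None:
        return charset      # otherwise the MIME charset parameter
    if declared is not None:
        return declared     # otherwise the XML encoding declaration
    return "utf-8"          # simplified default when nothing is stated
```

Note that the function never inspects the bytes to see whether the chosen encoding is plausible: with a charset of iso-8859-1 and a declared encoding of UTF-8 it returns iso-8859-1, which is exactly the corner case where following the rule may not do the 'right' thing.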

Comment on section 3.2, SM

XML-unaware MIME consumers SHOULD NOT assume a default encoding in this case.

Would a XML-unaware MIME consumer be following this specification?

Response: See above

Comment on section 3.3, Bray

[T]ypo, "thatUTF-16", space needed, also "entitiesnot" in the same sentence.

Response: Fixed, silently

Comment on section 4.2, SM

Interoperability considerations: XML has proven to be interoperable across both generic and task-specific applications and for import and export from multiple XML authoring and editing tools. Validating processors provide maximum interoperability. Although non-validating processors may be more efficient, they are not required to handle all features of XML. For further information, see sub-section 2.9 "Standalone Document Declaration" and section 5 "Conformance" of [XML].

The paragraph is about interoperability considerations. The text comes out as saying that "XML is great". :-) Are there any interoperability issues to consider? That's what the reader might wish to know.

Response: Fair enough. This is very old prose, which I hadn't touched. I've repeated the UTF-8 advice, and pointed back to section 3.

Comment on section 8.1, Bray

[S]ome spacing problems in the NOTE

Response: Fixed, silently

Comment on section 8.1, SM

Media subtypes that do not represent XML MIME entities MUST NOT be allowed to register with a '+xml' suffix.

It would be easier to say that the "+xml" suffix can only be registered for media subtypes that represent XML MIME entities.

Section 8.1 is about IANA registrations. I would read it as guidance for IANA and people requesting a registration. I suggest phrasing the relevant text from that perspective and moving that text into the IANA Considerations section.

Response: Done -- I've left an introductory bit, so that 8.2 doesn't come completely out of the blue. Double negation removed. 8.3 moved as well.

Comment on section 9.8, Bray

all processors will treat the enclosed entity as iso-8859-1 encoded. That is, the 'UTF-8' encoding declaration will be ignored.

Is this really true in practice? I suspect not; so perhaps you should say "all processors which conform to this specification will". Hm, or perhaps the real issue is that in this case, you can't predict what will happen; some implementations will ignore the MIME header, others will drop-kick the XML because of the inconsistency.

Response: Changed to 'conformant processors'. Also elsewhere in section 9.
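
To show what is at stake when the label and the declaration disagree, here is a short Python sketch. The sample document is an invented example, not the one from section 9.8 of the draft: it is UTF-8 on the wire, and is decoded once as the (wrong) iso-8859-1 label says and once as its own declaration says.

```python
# An invented entity: the bytes are UTF-8 and the declaration says so.
body = '<?xml version="1.0" encoding="UTF-8"?><p>café</p>'.encode("utf-8")

# Decoding as the charset parameter directs (iso-8859-1) never fails,
# because every byte value is valid in that encoding, but the text is wrong.
as_labelled = body.decode("iso-8859-1")

# Decoding as the encoding declaration says recovers the intended text.
as_declared = body.decode("utf-8")

print(as_labelled)
print(as_declared)
```

The first decode silently turns the two-byte sequence for "é" into two separate characters, which is the kind of quiet interoperability failure that consistent labelling is meant to prevent.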

Comment on section 3, Rushforth

Also, advice crafted for the internet environment, wherein self-describing messages are the standard. So my thinking is that two of the three authorities should be deprecated for XML MIME entities, that is, XML messages in the internet environment i.e. the IETF authority should be favoured.

Response: We simply can't do that in the face of widespread deployed software (all major web-browsers) which give precedence to the BOM over the charset param. Again, the emphasis is on ensuring that all the sources of encoding information present are consistent, and on using UTF-8, to minimize the chances of interop failure.

Comment on section 8.3, SM

Registrations for new XML-based media types which do not use the '+xml' suffix SHOULD, in specifying the charset parameter and encoding considerations, define them as: "Same as [charset parameter / encoding considerations] of application/xml as specified in RFC XXXX."

Why is this a RFC 2119 "should"?

Response: This sub-section is largely unchanged from section 7 of 3023, which uses 2119 throughout for obligations on registration. I think that makes sense, so I've left this alone.

Comment on section 8.3, SM

These registrations SHOULD also make reference to RFC XXXX in specifying magic numbers, base URIs, and use of the BOM.

I suggest rephrasing this as guidance and not as a RFC 2119 recommendation. Please note that I do not have a strong opinion about this as it may be a matter of style.

Response: As above, left alone

Comment on section 8.3, SM

These registrations MAY reference the application/xml registration in RFC XXXX in specifying interoperability and fragment identifier considerations, if these considerations are not overridden by issues specific to that media type.

Why is this a RFC 2119 "may"?

Response: As above

Comment on section 9, SM

I suggest moving the examples in Section 9 to an appendix.

Response: They were in a main section in 3023, and I'd rather not