This document contains all the comments received wrt draft-ietf-appsawg-xml-mediatypes-06, together with my response. They are divided into two sections, the first for more substantive comments, the second for more editorial ones. Ones where my response has been more-or-less negative are sorted to the end of their respective sections, and shown with a pink background. Where a comment includes a quote (actual or suggested) from the draft, this is shown with a green background.
If the future is UTF-8, UTF-8, UTF-8, then the two documents should say so, right at the beginning.
Going forward, XML producers SHOULD use UTF-8 exclusively, without any BOM. For compatibility with existing implementations, processing rules are given.
[Ned Freed adds:]
I believe the future is UTF-8, including but not limited to its use in XML, and we should do what we can to promote it. But beliefs about the future don't necessarily belong in an RFC.
Moreover, is this the right place and the right organization to make such a statement about XML? The IETF doesn't own the XML specification, the W3C does. And this is a document about how to register XML media types, not about how to use XML.
Mind you, given my own beliefs I don't personally object to such a statement if there is consensus to include it. I just wonder if it is appropriate.
[Larry Masinter adds:]
> I believe the future is UTF-8, including but not limited to its use in XML, and we should do what we can to promote it. But beliefs about the future don't necessarily belong in an RFC.
We have many documents that give guidelines.
> Moreover, is this the right place and the right organization to make such a statement about XML? The IETF doesn't own the XML specification, the W3C does. And this is a document about how to register XML media types, not about how to use XML.
Organization: yes. We're giving guidelines for communicating text using MIME, and the interaction of the 'charset' parameter in the metadata with other sources of encoding information.
Place: Partly. Guidelines for future text media types belong in a MIME BCP.
> Mind you, given my own beliefs I don't personally object to such a statement if there is consensus to include it. I just wonder if it is appropriate.
Although the general policy doesn't belong in appsawg-xml-mediatypes, it would be appropriate to advise senders of text/xml and application/xml to send UTF-8 and not to include a BOM, and to recommend whether or not to use a charset="UTF-8" parameter and whether or not to include an internal charset declaration, even when receivers recognize and interpret other encodings.
This is an interoperability consideration for a revised specification of a widely deployed protocol, based on the belief that UTF-8 is increasingly becoming the default in text-based MIME types.
Forward-interoperability requirements of this kind seem to be common in protocols: a SHOULD introduces a new policy to correct past interoperability difficulties.
[John Cowan adds:]
[I'm] concerned that this is put on producers rather than transmitters. [I]t's perfectly reasonable for producers to produce other encodings locally. For example, there are many pages produced in Windows encodings.
Response: Added "The use of UTF-8, without a BOM, is RECOMMENDED for all XML MIME entities." as a one-line paragraph in section 3. It is short and sweet, and by restricting itself to XML MIME entities I hope it avoids turf-war issues and confusion about XML intended for internal consumption elsewhere (ref. Freed and Cowan).
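As an illustration of the RECOMMENDED form, here is a minimal Python sketch of a sender serializing an XML MIME entity as UTF-8, without a BOM and with an XML encoding declaration. The function name is hypothetical, and the choice of application/xml and of including a charset parameter are assumptions for illustration only; whether to send charset at all is one of the open questions Masinter raises above.

```python
import xml.etree.ElementTree as ET

UTF8_BOM = b"\xef\xbb\xbf"

def serialize_xml_entity(root: ET.Element) -> tuple[str, bytes]:
    """Return (Content-Type value, body bytes) for an XML MIME entity.

    The body is UTF-8 with an XML encoding declaration and no BOM.
    The media type and charset parameter are illustrative assumptions.
    """
    # xml_declaration=True (Python 3.8+) writes <?xml ... encoding='UTF-8'?>.
    # Python's "UTF-8" codec never writes a BOM ("utf-8-sig" would).
    body = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
    if body.startswith(UTF8_BOM):
        raise ValueError("unexpected BOM in XML MIME entity body")
    return 'application/xml; charset="utf-8"', body
```

For an element tree containing the text "café", the returned body begins with the XML declaration and carries "é" as the two UTF-8 bytes C3 A9, with nothing preceding the "<?xml".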
XML MIME producers are RECOMMENDED to provide means for XML MIME entity authors to determine what value, if any, is given to charset parameters for their entities, for example by enabling user-level configuration of filename-to-Content-Type-header mappings on a file-by-file or suffix basis.
The example really needs a recommendation not just for producers, but also for consumers that store received XML in a file system. The chain is:
Receiver -> file system
  File system -> sender
and you would recommend that the "file system" link be enhanced to remember the charset if it isn't UTF-8.
For this purpose you could recommend that the file suffix ".xml" be reserved for UTF-8 encoded XML, and that separate file extensions be used for XML encoded in any other charset.
[Mark Baker adds:]
A problem I see with that is that it's lossy; in many (most?) cases a +xml type will be inbound, and .xml is associated with application/xml.
Response: I don't want to go where you want me to here. It's out of scope, and even starting to address it would require a substantial analysis by cases, to say nothing of the fact that such a convention is not supported anywhere.
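The mechanism the draft text recommends, user-level configuration of filename-to-Content-Type mappings on a file-by-file or suffix basis, might look like the following minimal Python sketch. The table contents, paths and the function name content_type_for are hypothetical; per-file entries are assumed to take priority over per-suffix ones, so that, for example, a legacy file in a Windows encoding can be labelled correctly while other .xml files default to UTF-8.

```python
from pathlib import PurePath

# Hypothetical user-level configuration; names and values are illustrative.
SUFFIX_MAP = {
    ".xml": 'application/xml; charset="utf-8"',
    ".svg": "image/svg+xml",
}
FILE_MAP = {
    "legacy/catalog.xml": 'application/xml; charset="windows-1252"',
}

def content_type_for(path: str,
                     file_map: dict = FILE_MAP,
                     suffix_map: dict = SUFFIX_MAP,
                     default: str = "application/octet-stream") -> str:
    """Pick a Content-Type value: per-file override, then suffix, then default."""
    if path in file_map:
        return file_map[path]
    # Suffix lookup is case-insensitive, so "a.XML" maps like "a.xml".
    suffix = PurePath(path).suffix.lower()
    return suffix_map.get(suffix, default)
```

Note that the sketch only decides what the sender labels; it does not address the receiver-side storage question raised above, which the response declines as out of scope.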
XML MIME producers
"XML MIME producers": those generating a MIME body (who SHOULD encode the XML as UTF-8 without a BOM, and SHOULD also include a UTF-8 encoding declaration)
"XML MIME wrappers": those that are not re-encoding an XML body and just want to deliver it via MIME (who can follow the guidelines in 3.1)
I don't really understand what a "XML-unaware MIME producer" is. If they're "unaware", why are they reading this spec?
Response: As I said in email, "it means any MIME producer who respects media type registrations as far as possible, including this one, but doesn't have XML processing capabilities, so in particular can't detect or make use of XML encoding declarations." I've added a few words to try to clarify this at the top of section 3, where the terminology is set out.
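To make the distinction concrete, the capability an XML-unaware MIME producer lacks is roughly the following. This minimal Python sketch (the function name declared_encoding is hypothetical) reads the encoding name from an XML declaration using the XML 1.0 EncName production. It assumes an ASCII-compatible encoding and skips a UTF-8 BOM; UTF-16 and other encodings would need the autodetection described in Appendix F of the XML specification.

```python
import re

# XML 1.0: EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
_DECL = re.compile(
    rb"""<\?xml\s+version\s*=\s*(['"])1\.[0-9]+\1"""
    rb"""\s+encoding\s*=\s*(['"])([A-Za-z][A-Za-z0-9._-]*)\2"""
)

def declared_encoding(body: bytes):
    """Return the encoding name from the XML declaration, or None.

    Only ASCII-compatible encodings are handled (e.g. UTF-8,
    ISO-8859-1, windows-1252); a leading UTF-8 BOM is skipped.
    """
    if body.startswith(b"\xef\xbb\xbf"):
        body = body[3:]
    m = _DECL.match(body)
    return m.group(3).decode("ascii") if m else None
```

An XML-aware producer could compare this value against any charset parameter it is about to send; an XML-unaware one, by definition, cannot.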
The reference to Appendix B is too broad -- the documents listed there do not give application semantics at all.
Response: Differentiated between language-level semantics (which you get) and application-level semantics (for which you get pointers).
The examples in security considerations, inherited from the previous document, seem contrived. Henry suggested privately that perhaps mentioning entity spoofing of various sorts might be in order?
Response: In fact, various sorts of entity spoofing are already discussed at some length; I'm happy to get specific suggestions for additions or removals.