Disposition of comments received on draft-ietf-appsawg-xml-mediatypes-02

This document contains all the comments received on draft-ietf-appsawg-xml-mediatypes-02, together with my responses. They are divided into two sections: the first for more substantive comments, the second for more editorial ones. Comments where my response has been more or less negative are sorted to the end of their respective sections and shown with a pink background. Where a comment includes a quote from the draft, the quote is shown with a green background.

Substantive/contentious comments

Comment on section 1, 3, SM

The first paragraph of Section 1 and the first paragraph of Section 3 could be merged instead of having "this specification standardizes" repeated.

Response: Done, and other aspects of the intro to 3 simplified

Comment on section 3, HST

Fix the reference to section 3.1 in the 'reasons' discussion to point to 3.6/3.6.1, probably

Response: Done, and most of the history removed

Comment on section 4, HST

Add something about the UTF-8 BOM?

Response: Done

Comment on section 6, Erik Wilde

The media type dependent mechanism for embedding the base URI in a MIME entity of type application/xml, text/xml, application/xml-external-parsed-entity or text/xml-external-parsed-entity is to use the xml:base attribute described in detail in [XBase].

not that i think this matters in practice, but this means that it is impossible to use anything other than XML Base for this, right? XML itself really does not say anything about it, so it seems that text/xml goes a little bit in the direction of the infoset set, specifying additional (but fewer) constraints. is the intention to make this a MUST? if so, wouldn't it be appropriate to use normative language? and if not, wouldn't it be good to say that using XML Base probably is a Really Good Idea, but not actually required?

Response: Ah, thanks -- added XML Base to 'Additional information' for the relevant type registrations in section 3, pointing to section 6, and reworded section 6 accordingly, to say that "An XML ... entity MAY use ... xml:base [to set the base URI]"
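
As an illustration of the optional use of xml:base described in this response, here is a minimal Python sketch. It assumes that only the document element's xml:base is consulted and that the retrieval URI is the fallback; the function name base_uri and the example URIs are hypothetical and do not come from the draft.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# ElementTree exposes the predeclared "xml:" prefix under this namespace URI.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def base_uri(document: bytes, retrieval_uri: str) -> str:
    """Return the base URI for the document element (illustrative only).

    If the document element carries xml:base, resolve it against the
    retrieval URI; otherwise fall back to the retrieval URI itself.
    """
    root = ET.fromstring(document)
    xml_base = root.get(XML_BASE)
    if xml_base:
        return urljoin(retrieval_uri, xml_base)
    return retrieval_uri
```

A real processor would also have to honour xml:base on descendant elements and any default supplied by a DTD, which this sketch ignores.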

Comment on section 6, Erik Wilde

Note that the base URI may be embedded in a different MIME entity, since the default value for the xml:base attribute may be specified in an external DTD subset or external parameter entity.

so this means that the base URI changes depending on whether the used XML processor is validating or not, right? maybe it would be worth spelling this out, because it may come as a surprise to some.

Response: Added a sentence to that effect.

Comment on section 9, HST

Add another 'inconsistent' example with e.g. utf-8 charset param and UTF-16 BOM?

Response: Done

Comment on section 9.4, HST

"Since the charset parameter is not provided in the Content-Type header, in this case XML processors MUST treat the "utf-16" encoding and/or the BOM as authoritative."

May be misleading if we decide that inband is always authoritative.

Response: Modified this and other examples to take account of the BOM>charset>decl rules
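
The BOM > charset > declaration ordering mentioned in this response can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name effective_encoding is hypothetical, UTF-32 BOMs are ignored, the declaration is only sniffed for ASCII-compatible encodings, and the final UTF-8 fallback follows the XML default for entities with neither BOM nor encoding declaration.

```python
import codecs
import re

# BOM signatures the sketch recognises (UTF-32 is deliberately left out).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches the encoding pseudo-attribute of an XML declaration in
# ASCII-compatible encodings only.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def effective_encoding(body: bytes, charset_param=None) -> str:
    """Pick an encoding: BOM first, then charset parameter, then declaration."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    if charset_param:
        return charset_param.lower()
    match = _DECL.match(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

For instance, a body that starts with a UTF-16 BOM but arrives with charset=utf-8 is treated as UTF-16 by this sketch, which is the kind of inconsistent case the examples in section 9 cover.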

Comment on section References, Björn Höhrmann

It seems odd that UTF-16 is a normative reference but UTF-8 is not. The UTF-16 reference should probably be non-normative.

Response: This could go either way. I judged that this sentence:

As described in [RFC2781], the UTF-16 family MUST NOT be used with media types under the top-level type "text" except over HTTP or HTTPS (see section 19.4.1 of [RFC2616] for details).

amounted to a normative appeal to RFC2781, but that there was nothing like that for ISO-8859 (or ASCII). But I don't feel strongly about this either way -- anyone else?

BH: The RFC 2119 keywords should only be used when removing them would also remove a requirement. That does not seem to be the case in this text, it just recites a requirement spelled out in RFC 2781. So this should not be a "MUST NOT" (make it lowercase or replace with cannot or "is not usable" or something).

SM: The "MUST NOT" could be changed to lower case as it describes a requirement from RFC 2781. The reference would still be normative.

Response: I'll go with that

Comment on section Meta, Erik Wilde

Updates: 4289, 6839 (if approved)

i guess i have the same question here as julian: why would it update RFC 6839, instead of just referencing it? if this is just the technicality of RFC 6839 referencing this draft, then that would probably answer my question.

BH: The update to RFC 6839 is that RFC 6839 no longer defines the `+xml` structured suffix, draft-ietf-appsawg-xml-mediatypes does that once approved. Note that RFC 6839 took that role from RFC 3023 which it updates for the same reason.

JR: Yes and no. RFC 6838 established an IANA registry for this, so from now on it's not necessary anymore to do the "update/obsolete" dance.

BH: I do not know about "necessary", but if RFC 6839 contained only the `+xml` registration I would expect draft-ietf-appsawg-xml-mediatypes to obsolete RFC 6839 just like it obsoletes RFC 3023; but since RFC 6839 contains more than `+xml` it can only "partially obsolete" it, and "Updates" conveys that more clearly than saying nothing.

JR: The point I was trying to make is that we now have a registry. The registry is authoritative; if we move the definition of "+xml", it's sufficient to simply update the registry.

Response: I don't agree. If we don't include 'Updates: 6839', then 6839 will not get an 'Updated by [3023bis]', and anyone reading 6839 will be free to conclude that it defines +xml, which would be false. So I'm going to leave it.

Comment on section 3.6, XML Core WG

In Section 3.6 there is another problematic "only" and a bad comma: for "The charset parameter MUST only be used, when the charset is reliably known and agrees with the in-band XML encoding declaration" read "The charset parameter MUST NOT be used if the charset is not reliably known. If it is used, it MUST agree with the XML encoding declaration."

Response: Overtaken, and in part we disagree. The para. in question now reads:

"The charset parameter MUST NOT be used unless the charset is reliably known. This information will be used by all processors to determine authoritatively the charset of the XML MIME entity in the absence of a BOM."

Comment on section 4, XML Core WG

In Section 4, the sentence "Similarly, when converting from another encoding into "utf-16", the BOM MUST be added after conversion is complete" is incorrect; it should read, "Similarly, when converting from another encoding into "utf-16", either an appropriate encoding declaration MUST be added or modified, or a BOM MUST be added."

Response: I wrote it that way because 4.3.3 in the XML spec [1] says
    "Entities encoded in UTF-16 *must* . . .begin with the Byte Order Mark"
So I don't think a change is appropriate.
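
To illustrate the point about the BOM after conversion, here is a small Python sketch. It relies on the fact that Python's generic "utf-16" codec prepends a BOM on encoding, whereas the explicit "utf-16-le" and "utf-16-be" codecs do not; the helper names are hypothetical.

```python
import codecs

def to_utf16_with_bom(text: str) -> bytes:
    # The generic "utf-16" codec writes a BOM followed by the text in the
    # platform's native byte order.
    return text.encode("utf-16")

def starts_with_utf16_bom(data: bytes) -> bool:
    # True if the octet stream begins with either UTF-16 byte order mark.
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```

A converter that used "utf-16-le" directly would produce output without a BOM, and would have to add one itself to satisfy the rule quoted from the XML spec.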

Comment on section 5, Julian Reschke

If [XPointerFramework] and [XPointerElement] are inappropriate for some XML-based media type, it SHOULD NOT follow the naming convention '+xml'.

Really? Why not? What about application/xhtml+xml?

Response: How is its syntax or semantics for fragments inconsistent with XPointer?

Minor/editorial/overtaken/duplicate comments

Comment on section Meta, Björn Höhrmann

I note it says "Updates: 4289" but does not reference RFC 4289 anywhere.

Response: Ah, another orphan. Thanks, I'll get rid of that, and the entry in the Informative References section.

Comment on section Meta, SM

The draft updates RFC 4289. There isn't any explanation about what is being updated.

Response: I've removed the mention of 4289, as it is no longer referenced.

Comment on section Meta, Julian Reschke

"Updates: 4289, 6839 (if approved)"

Really?

Response: Not 4289, removed. But 6839 yes -- it has a sort-of interim registration of +xml, which this does update.

Comment on section Abstract, Julian Reschke

Major differences from [RFC3023] are alignment of charset handling for text/xml and text/xml-external-parsed-entity with application/xml, the addition of XPointer and XML Base as fragment identifiers and base URIs, respectively, mention of the XPointer Registry, and updating of many references.

I don't think this needs to be in the Abstract. Also, references are discouraged here because the abstract should be usable stand-alone. So maybe move into the Introduction.

Response: Moved as suggested.

Comment on section Abstract, Erik Wilde

This specification also standardizes a convention (using the suffix '+xml') for naming media types outside of these five types when those media types represent XML MIME entities.

isn't that convention standardized by RFC 6839 already? i guess the problem is that instead of defining a registry that can be updated, any change to the suffixes needs to update RFC 6839? maybe rephrase this to say "updates the convention", because when reading this abstract, people knowing RFC 6839 may be wondering what's going on.

BH: It seems s/standardizes/defines/ would address this comment.

Response: Reword to make clear it's not establishing the suffix convention, just using it

Comment on section Abstract, Erik Wilde

Major differences from [RFC3023] are alignment of charset handling for text/xml and text/xml-external-parsed-entity with application/xml, the addition of XPointer and XML Base as fragment identifiers and base URIs, respectively, mention of the XPointer Registry, and updating of many references.

agree with julian that this should not be part of the abstract. it's useful and probably could be easily moved to someplace else.

Response: Done

Comment on section 3, SM

However, developers of such media types are STRONGLY RECOMMENDED to use this specification as a basis for their registration.

The sentence needs an editorial pass, i.e. it is RECOMMENDED. I suggest dropping the "STRONGLY" instead of having to explain the uppercase.

Response: Done

Comment on section 3, SM

I suggest referencing draft-ietf-httpbis-p3-payload-18 instead of discussing about RFC 2616 and the HTTPbis drafts.

Response: There's no mention of charset in p3-payload? In any case per other comments I've changed this ref to p2-semantics, where the default removal is discussed.

Comment on section 3, Dave Cridland

Top of page 6, inside §3, there's a slightly mangled paragraph:

The top-level media type "text" has some restrictions on MIME entities and they are described in [RFC2045] and [RFC2046]. In particular, for transports other than HTTP [RFC2616] or HTTPS (which uses a MIME-like mechanism). the UTF-16 family, UCS-4, and UTF-32 are not allowed However, section 4.3.3 of [XML] says:

Beyond the apparent mangling, I'm pretty sure that nothing prevents the use of UTF-16 in a text/xml within email, as long as it has a suitable Content-Transfer-Encoding for the transmission path (ie, base64, probably). The encoding considerations in §3.1, which §3.2 points to, seem to agree with me.

I think what it's trying to say is that an XML document using UTF-16 encoding will by definition have NUL octets, and these are not allowed unencoded along 7- or 8-bit transmission paths, and therefore must be encoded using Base64 on non-binary paths.

BH: It's partly about http://tools.ietf.org/html/rfc2046#section-4.1.1.

Response: The whole enclosing bulleted list, being of historical interest only, has now been removed.
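
The observation in this thread about NUL octets can be checked with a short Python sketch. It only demonstrates that a UTF-16 encoding of a typical XML document contains NUL octets and that base64 yields a form without them; the helper names are hypothetical and nothing here is taken from the draft.

```python
import base64

def contains_nul(data: bytes) -> bool:
    # NUL octets are what make raw UTF-16 unsuitable for 7-bit or
    # 8-bit (non-binary) transmission paths.
    return b"\x00" in data

def encode_for_non_binary_path(data: bytes) -> bytes:
    # base64 output uses only printable ASCII, so it contains no NUL octets.
    return base64.b64encode(data)

document = '<?xml version="1.0" encoding="utf-16"?><a/>'
as_utf16 = document.encode("utf-16")
as_utf8 = document.encode("utf-8")
```

Here contains_nul(as_utf16) holds while contains_nul(as_utf8) does not, and the base64 form round-trips back to the original octets with base64.b64decode.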

Comment on section 3, Julian Reschke

document entities The media types application/xml or text/xml MAY be used

s/used/used./

Response: Done (silently)

Comment on section 3, Julian Reschke

Application/xml and application/xml-external-parsed-entity are recommended. Compared to [RFC2376] or [RFC3023], this specification alters the charset handling of text/xml and text/xml-external-parsed-entity, treating them no differently from the respective application/ types. The reasons are as follows:

s/Application/application/

Also, avoid lowercase "recommended" if it's not a "RECOMMENDED".

Response: Restructured and moved to RECOMMENDED

Comment on section 3, Julian Reschke

Conflicting specifications regarding the character encoding have caused confusion. On the one hand, [RFC2046] specifies "The default character set, which must be assumed in the absence of a charset parameter, is US-ASCII.", [RFC2616] Section 3.7.1, defines that "media subtypes of the 'text' type are defined to have a default charset value of 'ISO-8859-1'", and [RFC2376] as well as [RFC3023] specify the default charset is US-ASCII.

I think this just repeats history already captured in RFC 6657. Do we really need to repeat it over here?

Response: Removed the whole bulleted list in favour of a much simpler summary based on the existing 3rd bullet

Comment on section 3, Julian Reschke

The current situation, reflected in this specification, has been simplified by [RFC6657] updating [RFC2046] to remove the US-ASCII default. Furthermore, in accordance with [RFC6657]'s other recommendations, [HTTPbis] changes [RFC2616] by removing the ISO-8859-1 default and not defining any default at all.

This is a bit misleading as the change in httpbis predates RFC6657 significantly.

Response: Inverted

Comment on section 3, Julian Reschke

The top-level media type "text" has some restrictions on MIME entities and they are described in [RFC2045] and [RFC2046]. In particular, for transports other than HTTP [RFC2616] or HTTPS (which uses a MIME-like mechanism). the UTF-16 family, UCS-4, and

It would be helpful if the reference to 2045/6 would be a bit more specific.

I'd also prefer to get rid of all RFC2616 references except when referring to the specification's history.

Response: Per previous+1 response, whole bullet removed

Comment on section 3, Julian Reschke

However, developers of such media types are STRONGLY RECOMMENDED to use this specification as a basis for their registration. In particular, the charset parameter, if used, MUST agree with the in-band XML encoding of the XML entity, as described in Section 3.6, in order to enhance interoperability.

There's no "STRONGLY" keyword. In general, I'd avoid to use BCP14 keywords for recommendations to people.

Response: removed STRONGLY, but left RECOMMENDED as this is about a registration, for which normative language is appropriate

Comment on section 3, Erik Wilde

The top-level media type "text" has some restrictions on MIME entities and they are described in [RFC2045] and [RFC2046]. In particular, for transports other than HTTP [RFC2616] or HTTPS (which uses a MIME-like mechanism). the UTF-16 family, UCS-4, and UTF-32 are not allowed However, section 4.3.3 of [XML] says:

s/mechanism). the/mechanism), the/

i am not quite understanding the parenthesis after HTTPS. isn't HTTPS doing exactly the same as HTTP, only over a safe transport? what does "which uses a MIME-like mechanism" refer to?

Response: This whole bulleted list is gone

Comment on section 3, Erik Wilde

XML provides a general framework for defining sequences of structured data. In some cases, it may be desirable to define new media types that use XML but define a specific application of XML, perhaps due to domain-specific display, editing, security considerations or runtime information.

the "perhaps" part seems a bit modest. there's quite a large set of web service designers thinking that unless you are exposing generic XML facilities (such as an XML database), you shouldn't be using a generic XML media type. so i would not make this sound as restricted as it is sounding now.

Response: Reworded to make it stronger

Comment on section 3, XML Core WG

In Section 3, the sentence beginning "Thus, although all XML processors" says "(except for HTTP)", but should say "(except for HTTP, HTTPS, and other protocols if base64 transport encoding is in use)". This is spelled out later in the document.

Response: Overtaken: the enclosing list has been deleted

Comment on section 3, XML Core WG

"Furthermore, such media types may allow UTF-8 or UTF-16 only and prohibit other charsets" has a misplaced "only" which makes it unclear; it should read "Furthermore, such media types may forbid charsets other than UTF-8 (or other than UTF-8 or UTF-16)."

Response: Changed (silently) to "may allow only UTF-8 and/or UTF-16 and prohibit..."

Comment on section 3.1, Julian Reschke

Encoding considerations: This media type MAY be encoded as appropriate for the charset and the capabilities of the underlying MIME transport. For 7-bit transports, data in either UTF-8 or

I don't understand the "MAY" here.

Response: This text is unchanged from 3023 itself. But I agree it reads oddly. I've tried to reword, but please review carefully: it's now longer, but I hope more explicit.

Comment on section 3.1, Julian Reschke

Published specification: Extensible Markup Language (XML) 1.0 (Fifth Edition) [XML], Extensible Markup Language (XML) 1.1 (Second Edition) [XML1.1].

OK, so I can use the same media type for both XML 1.0 and 1.1. However, the way this is phrased makes it appear as if XML 1.1 is somehow more ... recent when in fact it was a dead-end.

I recommend dropping the references about 1.1 from everywhere, and just have a single place that points out that what's said about 1.0 is also true for 1.1.

Response: Did this, more or less, in the course of considerably tightening up section 7, q.v.

Comment on section 3.1, Erik Wilde

Interoperability considerations: XML has proven to be interoperable across both generic and task-specific applications and for import and export from multiple XML authoring and editting tools. For

s/editting/editing/

Response: Done

Comment on section 3.1, Erik Wilde

Applications that use this media type: XML is device-, platform-, and vendor-neutral and is supported by a wide range of generic XML tools (editors, parsers, Web agents, ...) and task-specific applications.

i am not sure this needs to be spelled out, but in between generic XML and task-specific applications, there's a large set of generic XML-based formats (such as Atom) which does not really seem to fit well into this characterization of applications?

Response: Added a clause

Comment on section 3.1, Erik Wilde

Although no byte sequences can be counted on to always be present, XML MIME entities in ASCII-compatible charsets

s/charsets/character sets/

Response: Done, passim

Comment on section 3.5, Julian Reschke

Interoperability considerations: XML DTDs have proven to be interoperable by DTD authoring tools and XML browsers, among others.

What is an "XML browser"? If this is about web browsers I really have my doubts that they work interoperably :-)

Response: Changed to "XML validator" passim

Comment on section 3.6, Julian Reschke

The charset parameter MUST only be used, when the charset is reliably known and agrees with the in-band XML encoding declaration. This

s/used,/used/

Also, what if there is no in-band declaration?

Response: Overtaken -- this section has been rewritten -- please review carefully!

Comment on section 3.6, Julian Reschke

authoritatively the charset of the XML MIME entity. The charset parameter can also be used to provide protocol-specific operations, such as charset-based content negotiation in HTTP.

That's misleading. charset-based content negotiation happens by use of Accept-Encoding, not the charset parameter.

Response: I've removed that whole sentence -- if you think there was something of value there to be rescued, let me know.

Comment on section 3.6.1, Julian Reschke

There are several reasons that the charset parameter is optionally allowed. First, recent web servers have been improved so that users

That text is 12 years old. We may want to drop or rephrase it :-)

Response: This whole subsection is gone now.

Comment on section 3.6.1, Julian Reschke

can specify the charset parameter. Second, [RFC2130] (informative) specifies that the recommended specification scheme is the "charset" parameter.

That refers to a document from 1996. Is this really relevant here?

Response: This whole subsection is gone now.

Comment on section 3.6.1, Julian Reschke

On the other hand, it has been argued that the charset parameter should be omitted and the mechanism described in Appendix F of [XML] (which is non-normative) should be solely relied on. This approach would allow users to avoid configuration of the charset parameter; an XML document stored in a file is likely to contain a correct encoding declaration or BOM (if necessary), since the operating system does not typically provide charset information for files. If users would like to rely on the in-band XML encoding declaration or BOM and/or to conceal charset information from non-XML processors, they can omit the parameter.

This now is really the recommended approach, no? Maybe the whole of 3.6.1 should be removed then.

Response: This whole subsection is gone now.

Comment on section 5, Julian Reschke

Uniform Resource Identifiers (URIs) may contain fragment identifiers (see Section 3.5 of [RFC3986]). Likewise, Internationalized Resource Identifiers (IRIs) [RFC3987] may contain fragment identifiers.

s/may/can/

Also, the reference to RFC3987 really doesn't add anything useful here.

Response: Done and removed, respectively.

Comment on section 5, Julian Reschke

See Section 8.1 for additional rquirements which apply when an XML-based MIME media type follows the naming convention '+xml'.

s/rquirements/requirements/

Response: Done

Comment on section 5, Julian Reschke

When a URI has a fragment identifier, it is encoded by a limited subset of the repertoire of US-ASCII [ASCII] characters, as defined in [RFC3986]. When an IRI contains a fragment identifier, it is encoded by a much wider repertoire of characters. The conversion between IRI fragment identifiers and URI fragment identifiers is presented in Section 7 of [RFC3987].

I recommend to drop the IRI specific part. This is not specific to XML types.

Response: Done

Comment on section 5, Erik Wilde

The syntax and semantics of fragment identifiers for the XML media types defined in this specification are based on the [XPointerFramework] W3C Recommendation. It allows simple names, and more complex constructions based on named schemes. When the syntax of a fragment identifier part of any URI or IRI with a retrieved media type governed by this specification conforms to the syntax specified in [XPointerFramework], conformant applications MUST

not sure: s/conformant/conforming/

Response: Done

Comment on section 5, Erik Wilde

interpret such fragment identifiers as designating that part of the retrieved representation specified by [XPointerFramework] and whatever other specifications define any XPointer schemes used. Conformant applications MUST support the 'element' scheme as defined

again, not sure: s/Conformant/Conforming/

Response: Done

Comment on section 5, Erik Wilde

See Section 8.1 for additional rquirements which apply when an XML-

s/rquirements/requirements/

Response: Done

Comment on section 5, Erik Wilde

When a URI has a fragment identifier, it is encoded by a limited subset of the repertoire of US-ASCII [ASCII] characters, as defined in [RFC3986]. When an IRI contains a fragment identifier, it is

this reads a bit odd. what about saying:

"URIs [RFC3986] are encoded in a limited subset of the repertoire of US-ASCII [ASCII], and therefore this encoding applies to fragment identifier parts of URIs as well."

Response: Overtaken

Comment on section 6, Julian Reschke

Note that the base URI may be embedded in a different MIME entity, since the default value for the xml:base attribute may be specified in an external DTD subset or external parameter entity.

s/may/might/ s/may/can/

Response: Done

Comment on section 6, Erik Wilde

Section 5.1 of [RFC3986] specifies that the semantics of a relative URI reference embedded in a MIME entity is dependent on the base URI. The base URI is either (1) the base URI embedded in context, (2) the base URI from the encapsulating entity, (3) the base URI from the Retrieval URI, or (4) the default base URI, where (1) has the highest precedence.

s/where (1) has the highest precedence./sorted by declining precedence./

Response: Reworded to follow 3986 more closely

Comment on section 6, Erik Wilde

The media type dependent mechanism for embedding the base URI in a MIME entity of type application/xml, text/xml, application/xml-external-parsed-entity or text/xml-external-parsed-entity is to use the xml:base attribute described in detail in [XBase].

maybe rename the reference to [XMLBase] to reflect the name of the spec?

Response: Done

Comment on section 7, Julian Reschke

application/xml, application/xml-external-parsed-entity, and application/xml-dtd, text/xml and text/xml-external-parsed-entity are to be used with [XML] In all examples herein where version="1.0" is

s/[XML]/[XML]./

Response: Done

Comment on section 8, Julian Reschke

This specification recommends the use of a naming convention (a suffix of '+xml') for identifying XML-based MIME media types,

s/MIME// (there may be more instances of this)

Response: Done (found one other)

Comment on section 8, Julian Reschke

whatever their particular content may represent, in line with the

What is the "whatever their particular content may represent" about?

Response: Removed

Comment on section 8, Julian Reschke

When a new media type is introduced for an XML-based format, the name of the media type SHOULD end with '+xml'. This convention will allow

Which may be in conflict with the SHOULD NOT I complained about earlier on :-)

Response: Added a (somewhat rebarbative) clarification

Comment on section 8, Julian Reschke

NOTE: Section 14.1 of HTTP [RFC2616] does not support Accept headers of the form "Accept: */*+xml" and so this header MUST NOT be used in this way. Instead, content negotiation [RFC2703] could potentially be used if an XML-based MIME type were needed.

Please cite HTTPbis P2. Also, content negotiation is defined by HTTP, not RFC 2703.

Response: HTTP citation updated. 2703 sentence removed, as it appears to have not been widely adopted.

Comment on section 8, Julian Reschke

XML generic processing is not always appropriate for XML-based media types. For example, authors of some such media types may wish that the types remain entirely opaque except to applications that are specifically designed to deal with that media type. By NOT following the naming convention '+xml', such media types can avoid XML-generic processing. Since generic processing will be useful in many cases, however -- including in some situations that are difficult to predict ahead of time -- those registering media types SHOULD use the '+xml' convention unless they have a particularly compelling reason not to.

I recommend to avoid the use of SHOULD here. Just explain the pros and cons.

Response: Reworded

Comment on section 8, Julian Reschke

The registration process for specific '+xml' media types is described in [RFC6838] and [RFC6839]. The registrar for the IETF tree will

Just RFC6838, as far as I can tell.

Response: Removed, and vice versa in Section 10

Comment on section 8, Erik Wilde

MIME entities by comparing the subtype to the pattern '*/*+xml'. (Of course, 4 of the 5 media types defined in this specification -- text/xml, application/xml, text/xml-external-parsed-entity, and application/xml-external-parsed-entity -- also represent XML MIME entities while not conforming to the '*/*+xml' pattern.)

maybe: s/Of Course/For historical reasons/

Response: replaced with "However note that"
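
The check described in the quoted text can be sketched in Python as follows: the '*/*+xml' pattern plus the four generic types that represent XML MIME entities without matching it. The function name is_xml_mime_entity is hypothetical, and the parameter handling (splitting at the first ';') is a simplification rather than a full Content-Type parser.

```python
# The four types named in the quoted text that represent XML MIME entities
# but do not match the '*/*+xml' pattern.
GENERIC_XML_TYPES = {
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
}

def is_xml_mime_entity(content_type: str) -> bool:
    """Return True if the media type denotes an XML MIME entity."""
    # Drop any parameters such as charset and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in GENERIC_XML_TYPES:
        return True
    return "/" in media_type and media_type.endswith("+xml")
```

Note that application/xml-dtd, the fifth type defined by the specification, is deliberately not matched, since a DTD is not itself an XML MIME entity.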

Comment on section 8, Erik Wilde

The registration process for specific '+xml' media types is described in [RFC6838] and [RFC6839].

just [RFC6838], i think.

Response: Done

Comment on section 8.1, Julian Reschke

The use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the charset of the XML MIME entity. If there are some reasons not to follow this advice, they SHOULD be included as part of the registration. As shown above, two such reasons are "UTF-8 only" or "UTF-8 or UTF-16 only".

That's misleading. People may read it as saying that the *presence* of the charset parameter is RECOMMENDED.

Response: Changed to "Enabling the..."

Comment on section 8.2, Julian Reschke

In practice these constraints imply that for a fragment identifier addressed to an instance of a specific "xxx/yyy+xml" type, there are three cases: For fragment identifiers matching the syntax defined in Section 5, where the fragment identifier resolves per the rules specified there, then process as specified there;

Section 5 does not define the syntax (other than referencing XPointer). So this is a bit hard to process.

Response: Pointed directly to XPointer...

Comment on section 8.2, Julian Reschke

For fragment identifiers _not_ matching the syntax defined in Section 5, then process as specified in "xxx/yyy+xml".

What would be an example for this case?

Response: Added a media frag. example

Comment on section 9, Julian Reschke

All the examples below apply to all five media types declared above in Section 3, as well as to any media types declared using the '+xml' convention. See the XML MIME entities table (Section 3, Paragraph 2)

Well, unless that type does not define the charset parameter, right?

Response: Added a caveat

Comment on section 9, Julian Reschke

This section is non-normative. In particular, note that all "MUST" language herein reproduces or summarizes the consequences of normative statement already made above, and have no independent normative force.

Can we avoid the use of MUST here, then? :-)

Response: See resolution to Björn Höhrmann's comment on References above

Comment on section 9.1, Julian Reschke

printable or base64. For an 8-bit clean transport (e.g., 8BITMIME ESMTP or NNTP), or a binary clean transport (e.g., HTTP), no content-transfer-encoding is necessary.

...as HTTP does not even define content-transfer-encoding. (same applies to parts below)

Response: Added qualification in various places

Comment on section 9.12, sm

If sent using a 7-bit transport (e.g., SMTP) or an 8-bit clean transport (e.g., 8BITMIME ESMTP or NNTP), the XML MIME entity MUST be encoded in quoted-printable or base64.

The RFC 2119 requirements are in several places (if I recall correctly) in the draft. I suggest having the requirement in one place.

Response: There's a para at the top of the examples section (9) which says that the MUST etc. language is a recap of normative statements elsewhere -- isn't it better to leave them as pretty much exact copies, rather than confusingly reword them? I've just changed to lower case -- see below

Comment on section 9.2, Julian Reschke

As described in [RFC2781], the UTF-16 family MUST NOT be used with media types under the top-level type "text" except over HTTP or HTTPS (see section 19.4.1 of [RFC2616] for details). Hence this example is

Not sure how that section of 2616 is relevant here.

Response: Good catch -- this was copied from 2376!, where it was a reference to 2068 -- updated

Comment on section 9.2, Erik Wilde

For application... cases, if sent using a 7-bit transport (e.g.,

s/application.../application\/.../

Response: Done (silently)

Comment on section 9.4, Julian Reschke

Omitting the charset parameter is NOT RECOMMENDED for application/... when used with transports other than HTTP or HTTPS---text/... SHOULD NOT be used for 16-bit MIME with transports other than HTTP or HTTPS (see discussion above (Section 9.2, Paragraph 6)).

Please avoid uppercasing not-BCP14 keywords :-)

Response: See above

Comment on section 9.4, Erik Wilde

Omitting the charset parameter is NOT RECOMMENDED for application/... when used with transports other than HTTP or HTTPS---text/... SHOULD NOT be used for 16-bit MIME with transports other than HTTP or HTTPS (see discussion above (Section 9.2, Paragraph 6)).

s/HTTPS---text/HTTPS. text/

Response: Done (silently)

Comment on section 10, Julian Reschke

As described in Section 8, this specification updates the [RFC6838] and [RFC6839] registration process for XML-based MIME types.

My understanding is that the registration process is defined in 6838 only.

Response: Corrected this to be about the registration of +xml and thus 6839 only

Comment on section 11, Julian Reschke

the most dangerous option available to crackers is redefining default

s/crackers/attackers/

Response: Done

Comment on section 11, Erik Wilde

XML MIME entities contain information which may be parsed and further processed by the recipient's XML system.

s/XML system/system/

Response: deleted "'s XML system" altogether.

Comment on section 11, Erik Wilde

These entities may contain and such systems may permit explicit system level commands to be executed while processing the data. To the extent that an XML system

s/XML system/XML-based system/

i guess i am struggling here to understand what "XML system" is supposed to mean. just the XML processing parts? or the complete system that works with XML-based data? clarifying this upfront might help.

Response: Reworded to clarify, and remove the phrase "XML system" altogether

Comment on section 11, Erik Wilde

will execute arbitrary command strings, recipients of XML MIME entities may be a risk. In general, it may be possible to specify commands that perform unauthorized file operations or make changes to the display processor's environment that affect subsequent operations.

where do these two rather specific command types (file access, display processor) come from? probably from resolving references within XML content, and trying to render it, but maybe either make that a little more explicit, or keep the warning more general?

Response: Removed the last sentence altogether -- the preceding bit has already said all that's necessary.

Comment on section 11, Erik Wilde

The simplest attack involves adding declarations that break validation. Adding extraneous declarations to a list of character XML-entities can effectively "break the contract" used by documents. A tiny change that produces a fatal error in a DTD could halt XML processing on a large scale. Extraneous declarations are fairly obvious, but more sophisticated tricks, like changing attributes from being optional to required, can be difficult to track down. Perhaps the most dangerous option available to crackers is redefining default values for attributes: e.g., if developers have relied on defaulted attributes for security, a relatively small change might expose enormous quantities of information.

this of course only matters if the processing model actually uses a schema language that supports default values. that in itself is something that has been discussed for a long time in terms of benefits and risks. maybe it would be worth pointing out that the security problem only exists when a certain processing model (in this case defined by the choice of schema language) is chosen.

Response: Added a small qualification

Comment on section 11, Erik Wilde

Apart from the structural possibilities, another option, "XML-entity spoofing," can be used to insert text into documents, vandalizing and perhaps conveying an unintended message. Because XML permits multiple XML-entity declarations, and the first declaration takes precedence, it's possible to insert malicious content where an XML-

s/it's/it is/

Response: Done, silently

Comment on section 12, Julian Reschke

Checking reference nits using http://greenbytes.de/tech/webdav/rfc2629xslt/rfc2629xslt.html#checking-references: ...

For all the W3C documents, I recommend to pull in the <reference> elements from http://greenbytes.de/tech/webdav/rfc2629xslt/w3c-references.html -- this makes the format consistent and enables automatic up-to-date checking.

Response: Done

Comment on section Appendix B, Julian Reschke

Fourth, many references are updated, and the existence and relevance of XML 1.1 acknowledged. Finally, a number of justifications and

As far as I can tell, XML 1.1 is totally irrelevant...

Response: Clarified, not perhaps to your taste :-)

Comment on section Meta, Erik Wilde

[M]aybe this is simply out of scope, but since i recently noticed that XSD never got around to defining its own media type: what about including a registration for XSD in this draft? DTDs are covered, RNG has its own types, and it would be useful in general if XSD had a well-defined media type as well.

Response: I agree, it is out of scope :-)

Comment on section Meta, Erik Wilde

3. XML Media Types

This specification standardizes three media types related to XML MIME entities: application/xml (with text/xml as an alias), application/xml-external-parsed-entity (with text/xml-external-parsed-entity as an alias), and application/xml-dtd. Registration information for these media types is described in the sections below.

it would be useful to add application/xsd+xml in the updated spec, since XSD does not have its own media type.

Response: Duplicate! Not going to happen in this spec. this time, sorry

Comment on section Meta, Erik Wilde

[A]n editorial note: sometime, the text is written like this:

- "For more information, see Appendix F of [XML]"

and sometimes like this:

- "a limited subset of the repertoire of US-ASCII [ASCII]"

maybe make the entire text consistent by either using references as a noun or not.

Response: But sometimes it is one and sometimes the other. Sometimes the sentence is about the spec., and sometimes it's about something else, which is in the spec.

Comment on section 3.1, Erik Wilde

(including UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C ("<?xml"), and those in UTF-16 often begin with hexadecimal FE FF 00 3C 00 3F 00 78 00 6D 00 6C or FF FE 3C 00 3F 00 78 00 6D 00 6C 00 (the Byte Order Mark (BOM) followed by "<?xml"). For more information, see Appendix F of [XML].

maybe turn the reference [XML] into [XML1.0], so that it's always easy to see which version you're referencing?

Response: Consensus of other comments is to go in the other direction, i.e. to make clear that [XML] is generic
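
For readers who want the byte sequences in the quoted text in concrete form, here is a minimal Python sketch. It is not taken from the draft: the function name sniff_xml_prefix and its return labels are hypothetical, and it only checks the exact prefixes quoted above (which the draft says "often", not always, appear). Appendix F of [XML] describes the fuller detection procedure.

```python
# Byte prefixes as quoted from section 3.1 of the draft.
UTF16_BE_PREFIX = bytes.fromhex("FEFF003C003F0078006D006C")  # BOM + "<?xml", big-endian
UTF16_LE_PREFIX = bytes.fromhex("FFFE3C003F0078006D006C00")  # BOM + "<?xml", little-endian
ASCII_COMPATIBLE_PREFIX = bytes.fromhex("3C3F786D6C")        # "<?xml" in UTF-8 and similar


def sniff_xml_prefix(data: bytes) -> str:
    """Hypothetical helper: classify an entity by the prefixes quoted above.

    Returns one of "utf-16be", "utf-16le", "ascii-compatible", or "unknown".
    An "unknown" result does not mean the entity is not XML; the XML
    declaration and the BOM are both optional in many cases.
    """
    if data.startswith(UTF16_BE_PREFIX):
        return "utf-16be"
    if data.startswith(UTF16_LE_PREFIX):
        return "utf-16le"
    if data.startswith(ASCII_COMPATIBLE_PREFIX):
        return "ascii-compatible"
    return "unknown"
```

The UTF-16 prefixes are tested first only for clarity; none of the three prefixes is a prefix of another, so the order does not affect the result.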

Comment on section 5, SM

Document authors SHOULD NOT use unregistered schemes. Scheme authors SHOULD register their schemes.

I suggest turning the above two sentences into "it is RECOMMENDED to only use the schemes listed in the XPointer Scheme Registry" and add a pointer to how to register a scheme.

Response: This language was the result of lengthy discussion some years ago, and I'm reluctant to change it. The division into two sentences corresponds to the division of responsibility between document authors and scheme authors. I've added a reference to the registration process.

Comment on section 8.2, Erik Wilde

8.2. +xml Structured Syntax Suffix Registration

maybe a formality, but does it need to be registered if it has been an integral part of RFC 6839, and will be updated by RFC 3032bis? doesn't this update mean it exists without having to be registered?

Response: Just as we register application/xml above in order to replace 3023, so we register +xml to update 6839.

Comment on section 9, SM

I suggest moving Section 9 to an appendix as it is non-normative.

Response: I'd rather not -- it was a main section in 3023 and 2376, and I note that other RFCs have extended non-normative examples in mainline sections...

Comment on section 9.1, Julian Reschke

Content-type charset: charset="utf-8"

Maybe it would be less confusing to say: "charset specified in content-type:"

Response: I think it's clear enough -- it's introduced at the top of the whole section 9

Comment on section 9.9, Julian Reschke

Since the charset parameter is provided in the Content-Type header and differs from the XML encoding declaration, MIME and XML processors will not interoperate. MIME processors will treat the enclosed entity as UTF-8 encoded. That is, the "iso-8859-1" encoding will be ignored. XML processors on the other hand will ignore the charset parameter and treat the XML entity as encoded in iso-8859-1.

Do we have a definition of "MIME processor"?

Response: No -- most of its uses are copied from 3023, or even 2376. Is one really necessary? Where would you put it?
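
The disagreement described in the section 9.9 text quoted above can be sketched in Python. This is an illustration under stated assumptions, not anything the draft specifies: the helper names (charset_param, declared_encoding, charset_conflicts) are hypothetical, the regular expression only handles entities in an ASCII-compatible encoding, and the comparison is a plain case-insensitive string match that does not resolve charset aliases.

```python
import re
from email.message import Message

# Matches an encoding declaration at the very start of an ASCII-compatible entity.
ENCODING_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def charset_param(content_type: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_encoding(body: bytes):
    """Return the lowercased encoding from the XML declaration, or None."""
    match = ENCODING_DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else None


def charset_conflicts(content_type: str, body: bytes) -> bool:
    """True when both labels are present and name different encodings."""
    external = charset_param(content_type)
    internal = declared_encoding(body)
    return external is not None and internal is not None and external != internal
```

With the values from the quoted example, charset_conflicts('text/xml; charset="utf-8"', b'<?xml version="1.0" encoding="iso-8859-1"?><a/>') reports a conflict, which is the situation in which the two kinds of processor read the same bytes differently.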

Comment on section 11, Dave Cridland

I don't know if it's worthwhile discussing XML based attacks such as the Billion Laughs attack; I don't think it'd hurt.

Response: The last paragraph of section 11 does exactly this.