Internet Media Type registration, consistency of use

TAG Finding 3 June 2002 (Revised 4 September 2002)

Editor: Tim Bray


Internet Media Types are an important part of the Web architecture. This finding discusses three aspects of Internet Media Types: registration by W3C Working Groups, consistency between Internet Media Type and content, and consistency in the communication of character encoding information.

Status of this document

This document has been produced by the W3C Technical Architecture Group (TAG). This version includes changes, not yet approved by the TAG, regarding (1) registration requirements and (2) charset header information.

The TAG approved the previous draft of this finding at its 3 June 2002 teleconference. The TAG originally reached consensus on this issue at its 28 Jan 2002 teleconference and, after its 20 May 2002 teleconference, announced this to www-tag. The TAG notes that Tantek Çelik expressed dissent about this finding. At its 16 Dec 2002 teleconference, the TAG agreed to add a publication date to this document, consistent with the TAG's expectation that findings no longer be modified in place.

These findings were derived from discussion of TAG issues w3cMediaType-1, customMediaType-2, and nsMediaType-3, but in some cases extend beyond the specifics of the issues that were raised.

Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.

The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with RFC 2119 [RFC2119].

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org.


  1. Registration of Media Types by W3C Working Groups
  2. Consistency of Media Types and Message Contents
  3. Consistency in Communicating Character Encoding
  4. References

1. Registration of Media Types by W3C Working Groups

W3C Working Groups engaged in defining a language SHOULD arrange for the registration of an Internet Media Type (defined in RFC 2046 [RFC2046]) for that language; see [IANAREG] for registration instructions. The IETF registration forms MUST be available for review along with the specification no later than Candidate Recommendation (or no later than last call, if the Working Group expects to advance directly to Proposed Recommendation). The IETF registration forms SHOULD be available for review no later than last call.

The conventions and framework established by RFC 3023 [RFC3023] SHOULD be followed when registering an Internet Media Type for a language that uses XML syntax.
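RFC 3023 registers a small set of generic XML media types and establishes the "+xml" suffix convention (for example image/svg+xml), which lets software recognise an XML-based type without knowing the specific language. The Python sketch below shows one way a consumer might apply that convention; the function name is hypothetical, and it covers only the types registered in RFC 3023 plus the suffix rule.

```python
# Generic XML media types registered by RFC 3023.
RFC3023_TYPES = {
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
    "application/xml-dtd",
}


def is_xml_media_type(content_type: str) -> bool:
    """Return True if content_type is an RFC 3023 type or uses the "+xml" suffix.

    Parameters such as "; charset=..." are ignored, and the comparison is
    case-insensitive because media type names are case-insensitive.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if "/" not in media_type:
        return False
    if media_type in RFC3023_TYPES:
        return True
    subtype = media_type.split("/", 1)[1]
    # "+xml" suffix convention: require a non-empty name before the suffix.
    return len(subtype) > len("+xml") and subtype.endswith("+xml")
```

For instance, both "image/svg+xml" and "Application/XML; charset=utf-8" are recognised, while "text/html" is not.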

2. Consistency of Media Types and Message Contents

The architecture of the Web depends on applications making dispatching and security decisions for resources based on their Internet Media Types and other MIME headers. It is a serious error for the response body to be inconsistent with the assertions made about it by the MIME headers. Web software SHOULD NOT attempt to recover from such errors by guessing, but SHOULD report the error to the user to allow intelligent corrective action.

An example of incorrect and dangerous behavior is a user agent that reads some part of the body of a response and decides to treat it as HTML based on its containing a <!DOCTYPE declaration or <title> tag, when it was served as text/plain or some other non-HTML type.
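The Python sketch below illustrates the recommended behavior: the handler is chosen from the declared media type alone, a text/plain body is displayed as text even if it contains markup, and a body that contradicts its declared type raises an error for the application to report rather than being re-dispatched on a guess. The handler table, the UTF-8 decoding of text bodies, and the MediaTypeMismatchError class are illustrative assumptions, not part of any specification.

```python
class MediaTypeMismatchError(Exception):
    """Raised when a response body contradicts its declared media type."""


PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def render_text(body: bytes) -> str:
    # text/plain is shown verbatim, even if it contains "<!DOCTYPE" or "<title>".
    # (UTF-8 decoding is an assumption made for this sketch.)
    return "TEXT: " + body.decode("utf-8", errors="replace")


def render_html(body: bytes) -> str:
    # Placeholder for a real HTML renderer.
    return "HTML: " + body.decode("utf-8", errors="replace")


def render_png(body: bytes) -> str:
    # Verify the body against the declared type instead of guessing another one.
    if not body.startswith(PNG_SIGNATURE):
        raise MediaTypeMismatchError("served as image/png, but the body is not a PNG image")
    return f"PNG: {len(body)} bytes"


HANDLERS = {
    "text/plain": render_text,
    "text/html": render_html,
    "image/png": render_png,
}


def dispatch(content_type: str, body: bytes) -> str:
    """Select a handler from the declared media type alone; never sniff the body."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    handler = HANDLERS.get(media_type)
    if handler is None:
        raise ValueError(f"unsupported media type: {media_type}")
    return handler(body)
```

A caller would catch MediaTypeMismatchError and show the message to the user, leaving the corrective action to them instead of retrying with a type inferred from the content.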

Examples of such inconsistencies that have been observed on the Web include:

3. Consistency in Communicating Character Encoding

The first example in the preceding section is a particularly troublesome case. Section 7.1 of [RFC3023] states:

The use of the charset parameter is STRONGLY RECOMMENDED, since this information can be used by XML processors to determine authoritatively the charset of the XML MIME entity.

and states that, when present, the charset parameter is always authoritative. However, a receiving application can, with very high reliability, determine the encoding of an XML document by reading it, without reference to any external headers; RFC 3023 reflects this in the following sections:

Thus there is no ambiguity when the charset is omitted, and the STRONGLY RECOMMENDED injunction to use the charset is misplaced for application/xml and for non-text "+xml" types. Consequently, for XML representations, server-side applications SHOULD supply a charset parameter only when there is complete certainty as to the encoding in use. Otherwise, an incorrect charset parameter can cause a perfectly usable representation to be rejected by an architecturally sound client, that is, one that treats the charset parameter as authoritative.
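To illustrate why the encoding can be recovered from the document itself, here is a Python sketch of the autodetection procedure described in the non-normative Appendix F of the XML 1.0 Recommendation: look for a byte order mark, then for the byte pattern of "<" or "<?" in the UTF-16 and UTF-32 families, then read the encoding declaration. The function name is hypothetical, and EBCDIC and the unusual UCS-4 byte orders are deliberately omitted.

```python
import re

# Byte order marks, checked longest first so a UTF-32 BOM is not read as UTF-16.
BOMS = [
    (b"\x00\x00\xfe\xff", "UTF-32-BE"),
    (b"\xff\xfe\x00\x00", "UTF-32-LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16-BE"),
    (b"\xff\xfe", "UTF-16-LE"),
]

# How "<" (UTF-32) or "<?" (UTF-16) appears at the start of a document with no BOM.
NO_BOM_PREFIXES = [
    (b"\x00\x00\x00\x3c", "UTF-32-BE"),
    (b"\x3c\x00\x00\x00", "UTF-32-LE"),
    (b"\x00\x3c\x00\x3f", "UTF-16-BE"),
    (b"\x3c\x00\x3f\x00", "UTF-16-LE"),
]

# Encoding declaration inside an ASCII-compatible XML declaration.
ENCODING_DECL = re.compile(
    rb"<\?xml[^>]*?\sencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def detect_xml_encoding(data: bytes) -> str:
    """Determine the encoding of an XML entity from its first bytes alone."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for prefix, name in NO_BOM_PREFIXES:
        if data.startswith(prefix):
            # A full implementation would now read the encoding declaration
            # in this encoding family; the family is enough for this sketch.
            return name
    # ASCII-compatible encodings: the encoding declaration names the encoding.
    match = ENCODING_DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # No BOM and no encoding declaration: XML 1.0 requires UTF-8.
    return "UTF-8"
```

No external header is consulted at any step, which is the property the argument above relies on.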

We recommend that section 7.1 of [RFC3023] be amended to something like the following:

The use of the charset parameter, when the charset is reliably known and agrees with the encoding declaration, is RECOMMENDED, since this information can be used by non-XML processors to determine authoritatively the charset of the XML MIME entity.
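A server-side policy in the spirit of the proposed wording might look like the following Python sketch: the charset parameter is added only when the server reliably knows the encoding and that encoding agrees with the document's own encoding declaration; otherwise it is omitted. The function and parameter names are hypothetical, the body is assumed to be in an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, and encoding names are compared through Python's codecs registry so that aliases such as latin-1 and ISO-8859-1 agree.

```python
import codecs
import re
from typing import Optional

# Encoding declaration in an ASCII-compatible XML declaration (assumption: the
# body is in an ASCII-compatible encoding such as UTF-8 or ISO-8859-1).
ENCODING_DECL = re.compile(
    rb"<\?xml[^>]*?\sencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def _same_encoding(a: str, b: str) -> bool:
    """Compare encoding names through Python's codec registry, so aliases match."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        # Unknown name: treat as a disagreement, which omits the charset.
        return False


def xml_content_type(body: bytes, known_charset: Optional[str],
                     media_type: str = "application/xml") -> str:
    """Build a Content-Type value, adding charset only when it is safe to do so."""
    if known_charset is None:
        # Encoding not reliably known: omit charset and let the document speak.
        return media_type
    match = ENCODING_DECL.match(body)
    # Without a readable encoding declaration, assume the XML default of UTF-8
    # (valid only under the ASCII-compatible assumption above).
    declared = match.group(1).decode("ascii") if match else "UTF-8"
    if not _same_encoding(declared, known_charset):
        # The server's belief contradicts the document; a wrong charset would be
        # authoritative for the client, so it is safer to send none at all.
        return media_type
    return f"{media_type}; charset={known_charset}"
```

With these rules, a server that is unsure of the encoding simply sends the bare media type, and a recipient determines the encoding from the document itself.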

4. References

"RFC2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", N. Freed and N. Borenstein, November 1996. Available at http://www.ietf.org/rfc/rfc2046.txt.
"RFC2119: Key words for use in RFCs to Indicate Requirement Levels", S. Bradner, March 1997. Available at http://www.ietf.org/rfc/rfc2119.txt
"RFC3023: XML Media Types", M. Murata, S. St. Laurent, D. Kohn, January 2001. Available at http://www.ietf.org/rfc/rfc3023.txt.
"How to Register a Media Type with IANA". This is an informal document intended to capture best practice for requests that a Mime Type defined by a W3C Recommendation be registered in the IANA registry. This document may change as W3C learns from experience or as processes in the various organizations evolve. This document is available at http://www.w3.org/2002/06/registering-mediatype.

Last modified: 2002/12/17 13:06:11 by ijacobs. Revision 1.33.