Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
The architecture of the Web depends on applications having a shared understanding of the messages exchanged between agents (clients, servers, intermediaries, etc.) and a shared expectation of how the payload of the message -- a representation -- will be interpreted by the recipient. The Web architecture uses representation metadata to indicate the sender's intentions to the recipient whenever the protocols used for communication allow such metadata to be communicated. In particular, dispatching and security-related decisions regarding the processing of a message are often based on values provided in representation metadata fields, such as the "Content-Type" field of HTTP and MIME. In this finding, we review the architectural design choice that metadata provided by an origin server be authoritative. We also examine why client behavior that misrepresents the user or resource provider is harmful. Finally, we consider how specification authors should incorporate these points into their work.
This document has been developed for discussion by the W3C Technical Architecture Group. This draft finding addresses issue contentTypeOverride-24 and partly addresses issue errorHandling-20. The TAG finding "Internet Media Type registration, consistency of use" also includes material related to this issue.
Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.
Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.
The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with [RFC2119].
Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org.
1 Summary of Key Points
2 Scenarios
3 Why metadata from the resource provider is authoritative
    3.1 Multiple interpretations possible; only one authoritative
    3.2 Metadata and efficiency
    3.3 Self-describing data
4 Why agent behavior that misrepresents the user or resource provider is harmful
    4.1 Examples of inconsistencies
    4.2 Client handling of inconsistencies
5 Hints in specifications
6 Handling misconfigured servers
7 Conclusion
8 Future Work
9 References
10 Acknowledgments
The following are the key architectural points of this finding:
For example, when an HTTP message contains a representation as its payload (within its message body), the HTTP header field "Content-Type", if present, defines the Internet media type of that representation. The recipient is not allowed to "guess" the media type unless no such metadata was provided with the representation.
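The rule in the preceding example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `media_type_for`, the lower-cased `headers` mapping, and the `guess` callback are hypothetical and are not defined by HTTP or by any library.

```python
from typing import Callable, Mapping, Tuple


def media_type_for(
    headers: Mapping[str, str],
    body: bytes,
    guess: Callable[[bytes], str],
) -> Tuple[str, str]:
    """Return (media_type, source) for a received representation.

    Assumption: `headers` maps lower-cased field names to values.
    `guess` is a hypothetical content-inspection callback; it is
    consulted only when the sender supplied no Content-Type at all.
    """
    declared = headers.get("content-type", "").strip()
    if declared:
        # Metadata is present, so it is authoritative: no guessing.
        media_type = declared.split(";", 1)[0].strip().lower()
        return media_type, "declared"
    # No metadata was provided; only now may the recipient guess.
    return guess(body), "guessed"
```

Returning the source alongside the media type lets a caller tell the user whether the interpretation was declared by the sender or inferred locally.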
Section 2 presents scenarios in which these points have been ignored and asks what has been ignored and by whom. Section 3 discusses the motivation for a Web architecture where origin metadata is authoritative. Section 4 examines the potential harm caused by agents that misrepresent the user or silently disregard authoritative headers. Section 5 discusses the interaction between metadata hints in format specifications and protocol headers. Section 6 suggests ways in which server management can alleviate problems due to inconsistencies between provided content and configured metadata.
Scenario 1: Stuart runs his own Web server at "http://www.example.org/". He creates an HTML page and means to serve it as "text/html", but misconfigures the origin server so that the content is served via HTTP/1.1 [RFC2616] as "text/plain". Tim's browser looks inside the page, detects some markup that suggests that this is an HTML document (e.g., a <!DOCTYPE declaration or <title> element), and quietly renders it according to the HTML specification rather than as plain text. Janet's browser displays the content as plain text.
Which party has neglected a principle of Web architecture: Stuart for the server misconfiguration, Tim's browser for silently overriding the server's headers, or Janet's browser for not detecting that the content looked like HTML?
Answer 1: By silently overriding authoritative metadata, Tim's browser did not respect Web architecture principles that promote shared understanding.
Scenario 2: Norm publishes an XHTML document that includes:
<link href="cool-style" type="text/css" rel="stylesheet"/>
Norm's "cool-style" is an XSLT style sheet, but Norm has set type to "text/css". Stuart has configured the origin server so that "cool-style" is served via HTTP/1.1 as "application/xslt+xml". With a user agent that understands XSLT but not CSS, Janet requests the content that includes this link element. As it loads the page, Janet's user agent reads the type hint and does not fetch "cool-style".
Which party is responsible for the fact that Janet did not receive content she should have: Stuart for the server configuration, Norm for stating that the "cool-style" sheet is served as "text/css" when in fact it's served with a different media type, or Janet's user agent for not double-checking the media type with the server?
Answer 2: Though not a violation of principles of Web architecture, Norm's mislabeling of content deprived Janet of content she should have received.
In the sections below, we explore these answers in more detail.
Successful communication between two parties using a piece of information relies on shared understanding of the meaning of the information. Arbitrary numbers of independent parties can identify and communicate about a Web resource. To give these parties the confidence that they are all talking about the same thing when they refer to "the resource identified by the following URI ..." the design choice for the Web is, in general, that the provider of a resource assigns the authoritative interpretation of representations of the resource. A representation is an octet sequence that consists logically of two parts:
In terms of Web architecture, the authoritative interpretation of representations is communicated as follows:
Interpretation of bits on the Internet is governed by protocol specifications. The HTTP/1.1 specification, for example, delegates assignment of the interpretation for a message entity (the representation enclosed within a message) to the header fields "Content-Encoding" and "Content-Type", where the latter's value is defined by an Internet media type (e.g., "text/html" or "image/jpeg") that, in turn, identifies a registered data format specification (e.g., XHTML, CSS, PNG, XLink, or RDF/XML). The IANA registry maps media types to data format specifications.
For instance, in the IANA registry, the content type "text/html" is associated with [RFC2854], which in turn states that:
Thus, by serving a representation with media type "text/html", the resource provider declares that the HTML 4.01 Recommendation governs the authoritative interpretation. By serving representation data (even HTML data) with media type "text/plain", the resource provider declares that [RFC2046] and [RFC2646] govern the authoritative interpretation. This is the first part of the explanation of why Tim's browser in scenario 1 is the culprit.
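The chain from header field to media type to governing specification can be sketched as follows. The `GOVERNING_SPECS` table and the function name are assumptions made for illustration; the table holds only the two entries mentioned in the text and is not a copy of the IANA registry. Parsing relies on Python's standard `email.message.Message`.

```python
from email.message import Message
from typing import List

# Illustrative excerpt only: these two entries restate the text above.
# A real client would consult the IANA media type registry.
GOVERNING_SPECS = {
    "text/html": ["RFC2854"],
    "text/plain": ["RFC2046", "RFC2646"],
}


def governing_specs(content_type_value: str) -> List[str]:
    """Return the specifications that govern a declared media type."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # get_content_type() strips parameters and lower-cases "type/subtype".
    # Note: for a syntactically invalid value it falls back to "text/plain".
    media_type = msg.get_content_type()
    return GOVERNING_SPECS.get(media_type, [])
```

Under these assumptions, an HTML page served as "text/plain" resolves to [RFC2046] and [RFC2646] regardless of what its bytes look like, which is the point made about scenario 1.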
The fact that there is one authoritative interpretation of representation data does not imply that there is only one possible interpretation. The Web's model is designed to enable global understanding by having parties agree to follow a small set of rules for interpreting bits (starting with the media type). Parties may reach local agreements independently, but they do not change the authoritative interpretation of the representation data.
One benefit of using metadata to guide processing is improved efficiency. For instance, when a server sends XML data and proper metadata, a client can determine the media type after rapid inspection of a short string. It is considerably more expensive in processing time to start up an XML parser to guess the media type.
Data is "self-describing" if it includes enough information to allow two parties to establish a consistent interpretation without additional clues. If the author intends for the data to be interpreted in a manner other than what is self-described (e.g., "treat this XML content as plain text"), then clarifying metadata is required (e.g., in protocol headers). Providing redundant metadata for data that is self-describing can lead to inconsistencies, however. Thus, for example, server managers SHOULD NOT in general specify the character encoding for XML data in protocol headers since the data is self-describing.
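The kind of inconsistency that redundant metadata can introduce is easy to detect mechanically. The sketch below compares the "charset" parameter of a "Content-Type" value with the encoding named in an XML declaration. The function names are hypothetical, and the declaration parsing is deliberately simplified: it assumes an ASCII-compatible encoding and no byte order mark.

```python
import re
from email.message import Message
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration at the
# very start of the data (simplified; not a full XML parser).
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_xml_encoding(body: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if any."""
    match = _XML_DECL_ENCODING.match(body)
    return match.group(1).decode("ascii").lower() if match else None


def charset_conflict(content_type_value: str, body: bytes) -> bool:
    """True when the header charset and the XML declaration disagree."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    header_charset = msg.get_content_charset()  # None if no charset parameter
    body_encoding = declared_xml_encoding(body)
    return (
        header_charset is not None
        and body_encoding is not None
        and header_charset != body_encoding
    )
```

When the header carries no "charset" parameter, no conflict is possible, which illustrates why omitting redundant metadata for self-describing data avoids this class of error.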
Below we examine appropriate client behavior when inconsistencies are detected between what the resource provider declares the media type to be through metadata and any type information available by inspection of the representation data itself.
A user agent represents the user for interactions with servers. Misrepresentation may lead to violations of privacy, security holes, and just plain confusion. User agent behavior that misrepresents the user or resource provider ultimately undermines trust on the Web and is thus considered harmful. Some examples of potential security violations include:
A client that ignores authoritative metadata without the user's consent undermines the goal of creating a shared information space.
In scenario 1, in terms of Web architecture, Stuart is innocent; misconfiguration of the server is not an architectural error, just a human one. Instead, Tim's browser is the culprit since it misrepresents the resource provider by ignoring the authoritative metadata without Tim's consent. Janet's browser respects the "Content-Type" header field and, by doing so, helps Janet and Stuart detect the server misconfiguration.
Examples of inconsistencies between headers and representation data that have been observed on the Web include:
Clients should detect such inconsistencies but should not resolve them without involving the user (e.g., by securing permission or at least providing notification).
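A client in the position of Tim's browser in scenario 1 could follow this guidance roughly as sketched below. Everything here is an assumption for illustration: the marker pattern is a crude heuristic, and `ask_user` stands in for whatever notification or permission mechanism a real user agent provides.

```python
import re
from typing import Callable

# Crude heuristic for "looks like HTML"; real detection would be more careful.
_HTML_MARKERS = re.compile(
    rb"<!DOCTYPE\s+html|<html[\s>]|<title[\s>]", re.IGNORECASE
)


def looks_like_html(body: bytes) -> bool:
    """Inspect only the first kilobyte of the representation data."""
    return _HTML_MARKERS.search(body[:1024]) is not None


def choose_rendering(
    declared: str, body: bytes, ask_user: Callable[[str], bool]
) -> str:
    """Return the media type to render as.

    The declared type is kept unless an inconsistency is detected AND
    the user explicitly agrees to override it.
    """
    if declared == "text/plain" and looks_like_html(body):
        question = (
            "This content was served as text/plain but looks like HTML. "
            "Render it as HTML?"
        )
        if ask_user(question):
            return "text/html"
    return declared
```

The design choice is that the override path always passes through `ask_user`; there is no branch in which the client changes the interpretation silently.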
Another form of inconsistency is when the client expects to receive metadata that includes media type information and the server does not provide this information. For instance, HTTP/1.1 [RFC2616], section 7.2.1 describes client behavior in the case when the server sends no media type:
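Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type header field defining the media type of that body. If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource. If the media type remains unknown, the recipient SHOULD treat it as type application/octet-stream.
This excerpt is consistent with the principle that metadata from the origin server is authoritative when present. HTTP/1.1 allows a client to guess when no "Content-Type" header is present; in this case, representation data that is self-describing is likely to lead to a consistent interpretation among multiple parties.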
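The guessing that HTTP/1.1 permits when no "Content-Type" header is present might look like the following sketch. The function name and the two byte signatures are illustrative assumptions; the extension lookup uses Python's standard `mimetypes` module, whose results can vary with the local type map.

```python
import mimetypes


def guess_media_type(uri: str, body: bytes) -> str:
    """Guess a media type for data that arrived WITHOUT a Content-Type.

    This must only be called when the sender declared no media type.
    """
    # 1. Inspection of the content: a couple of well-known signatures.
    if body.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if body.startswith(b"%PDF-"):
        return "application/pdf"
    # 2. The name extension of the URI used to identify the resource.
    guessed, _content_encoding = mimetypes.guess_type(uri)
    if guessed:
        return guessed
    # 3. Still unknown: treat the data as opaque octets.
    return "application/octet-stream"
```

A client that wants to do more than guess and silently proceed could report the result of this function to the user together with the fact that it was inferred rather than declared.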
In the absence of metadata from the origin server, a flexible client would do even more than merely guess and silently proceed. For instance, in different configurations the client could:
In Scenario 2, Norm is responsible for Janet not having access to content she was meant to receive. The HTML 4.01 Recommendation states that "Authors who use [the type] attribute take responsibility to manage the risk that it may become inconsistent with the content available at the link target address." Janet's client could have done more than merely read the type hint and decide to skip "cool-style". Users benefit from clients that allow different configurations for handling hints, including:
It is not a violation of Web architecture when a client overrides server metadata and processes representation data in a non-authoritative manner, as long as the client is not misrepresenting the user or resource provider. For instance, an application does not violate Web architecture when it receives a "Content-Type" header of "text/html" and, rather than following the HTML 4.01 Recommendation, provides the service of validating the HTML, detecting broken links, converting it to another format, or rendering it as plain text. The problem arises when the user agent engages in non-authoritative behavior without the user's awareness or consent.
Some format specifications allow authors to include in content "hints" for servers and clients. For instance, the http-equiv attribute of the HTML meta element was intended for servers (not clients). In HTML 2.0 [RFC1866], section 5.2.5, the attribute is specified as follows:
The HTML 4.01 attribute type for the link element (used in Scenario 2) gives clients a hint about what the media type of a representation of the linked resource is likely to be.
A format specification that includes hints for clients should make clear that when these hints interact with server metadata, they are advisory only. Format specifications SHOULD NOT include requirements for clients to override server metadata without user consent. An architecturally sound description of an advisory attribute might read:
The W3C Recommendation SMIL 2.0 [SMIL20] is not consistent with the current finding in this regard since the definition of the type attribute (section 7.3.1) specifies circumstances in which type is supposed to take precedence over server metadata.
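The advisory reading of a type hint, as opposed to one that takes precedence over server metadata, can be sketched as follows. The function name and its return shape are assumptions for illustration and do not come from any format specification.

```python
from typing import Optional, Tuple


def resolve_linked_type(
    type_hint: Optional[str], server_type: Optional[str]
) -> Tuple[Optional[str], bool]:
    """Return (media_type, hint_was_wrong) for a linked resource.

    `type_hint` is the author's in-content hint (e.g. a link element's
    type attribute); `server_type` is the media type the server declared,
    or None if the resource has not been requested yet.
    """
    if server_type is None:
        # Nothing authoritative is known yet; the hint is all there is.
        return (type_hint.lower() if type_hint else None), False
    authoritative = server_type.lower()
    # The hint never overrides the server; it can only turn out to be wrong.
    mismatch = type_hint is not None and type_hint.lower() != authoritative
    return authoritative, mismatch
```

Applied to Scenario 2, a hint of "text/css" and a server type of "application/xslt+xml" yield "application/xslt+xml" with the mismatch flag set, which a client could surface to the user or author rather than act on silently.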
The rationale frequently provided by specification designers for why the author should be able to override server metadata in content is to work around misconfigured servers. In many environments, authors do not have sufficient access to server managers to affect server configuration. The TAG does not believe that author-specified overrides are the proper solution to this problem (for the reasons cited above, including security risks and masking of the problem).
Server managers can help reduce the risk of error through careful assignment of representation metadata (especially that which applies across representations). In particular: