Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
The architecture of the Web depends on applications making dispatching and security decisions for resources based on their Internet Media Types and other MIME headers. In this finding, we review the architectural design choice that MIME headers be authoritative. We also examine why client behavior that misrepresents the user or server is harmful. Finally, we consider how specification authors should incorporate these points into their work.
This draft incorporates comments from Norm Walsh, Rob Lanphier, Tim Berners-Lee, Stuart Williams and discussion from the TAG's 7 July 2003 teleconference.
This document has been developed for discussion by the W3C Technical Architecture Group. This finding addresses issue contentTypeOverride-24 and partly addresses issue errorHandling-20. The TAG finding "Internet Media Type registration, consistency of use" also includes material related to this issue.
Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.
Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.
The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with [RFC2119].
Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).
1 Summary of Key Points
2 Scenarios
3 Why MIME headers are authoritative
3.1 Self-describing data
4 Why user agent behavior that misrepresents the user is harmful
4.1 Client handling of inconsistencies
5 Hints in specifications
6 Handling misconfigured servers
7 Conclusion
8 Future Work
9 References
10 Acknowledgments
The following are the key architectural points of this finding:
Section 2 presents scenarios where these points have been ignored and asks what has been ignored and by whom. Section 3 discusses the motivation for a Web architecture where MIME headers are authoritative. Section 4 examines the potential harm caused by user agents that misrepresent the user or silently disregard authoritative headers. Section 5 discusses the interaction between metadata hints in format specifications and protocol headers. Section 6 suggests ways in which server management can alleviate some header/content inconsistencies.
Scenario 1: Stuart runs his own Web server at http://www.example.org/. He creates an HTML page and means to serve it as text/html, but misconfigures the server so that the content is served via HTTP/1.1 [RFC2616] as text/plain rather than as text/html. Tim's browser looks inside the page, detects some markup that suggests that this is an HTML document (e.g., a <!DOCTYPE declaration or <title> element), and quietly renders it according to the HTML specification rather than as plain text. Janet's browser displays the content as plain text.
Which party has neglected a principle of Web architecture: Stuart for the server misconfiguration, Tim's browser for silently overriding the server's headers, or Janet's browser for not detecting that the content looked like HTML?
Answer 1: By silently overriding authoritative headers, Tim's browser did not respect Web architecture principles that promote shared understanding.
Scenario 2: Norm publishes an XHTML document that includes:
<link href="cool-style" type="text/css" rel="stylesheet"/>
Norm's "cool-style" is an XSLT style sheet, but Norm has set type to text/css. Stuart has configured the server so that "cool-style" is served via HTTP/1.1 as application/xslt+xml. With a user agent that understands XSLT but not CSS, Janet requests the content that includes this link element. As it loads the page, Janet's user agent reads the type hint and does not fetch "cool-style."
Which party is responsible for the fact that Janet did not receive content she should have: Stuart for the server configuration, Norm for stating that the "cool-style" sheet is served as text/css when in fact it's served with a different content type, or Janet's user agent for not double-checking the content type with the server?
Answer 2: Though not a violation of principles of Web architecture, Norm's mislabeling of content deprived Janet of content she should have received.
In the sections below, we explore these answers in more detail.
Successful communication between two parties about a piece of information relies on shared understanding of the meaning of the information. On the Web, thousands of independent parties can identify and communicate about a Web resource. To give these parties the confidence that they are all talking about the same thing when they refer to "the resource identified by the following URI ..." the design choice for the Web is, in general, that the owner of a resource assigns the authoritative interpretation of its representations. In terms of Web architecture, the authoritative interpretation of representations is communicated as follows:
Generally the interpretation of bits on the Internet is governed by a protocol specification (e.g., HTTP/1.1 and FTP). In the case of HTTP, that specification delegates the interpretation of the message entity to a format specification (e.g., XHTML, CSS, PNG, XLink, RDF/XML, and SMIL animation), identified by MIME type.
There are benefits to allowing different interpretations of a bag of bits depending on context. For flexibility, some protocols like HTTP/1.1 allow resource owners to direct the interpretation of a bag of bits by sending metadata along with the bits. In HTTP/1.1, a response from the server can include a bag of bits (the "entity body") and metadata about those bits (the "entity headers", including Content-Type, Content-Language, and Content-Encoding). In Web architecture terms, a bag of bits plus metadata is called a representation of a resource. In practice, the MIME mechanism defined in [RFC2046] is used to associate a bag of bits with metadata. MIME headers are key to understanding the authoritative interpretation for a bag of bits.
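As a rough illustration of "a bag of bits plus metadata," the following Python sketch splits a raw HTTP/1.1 response into entity headers and entity body. The Representation class and the parse_response function are hypothetical names introduced here for illustration only, and the sketch assumes a simple response with CRLF line endings and no transfer coding.

```python
from dataclasses import dataclass
from email.parser import BytesParser


@dataclass
class Representation:
    """Hypothetical container: metadata (headers) plus a bag of bits (body)."""
    headers: dict
    body: bytes


def parse_response(raw: bytes) -> Representation:
    # Separate the header section from the entity body at the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    # Drop the status line; what remains is a MIME-style header block.
    _status_line, _, header_block = head.partition(b"\r\n")
    msg = BytesParser().parsebytes(header_block + b"\r\n\r\n", headersonly=True)
    # Header field names are case-insensitive, so normalise them to lower case.
    headers = {name.lower(): value for name, value in msg.items()}
    return Representation(headers=headers, body=body)


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/plain\r\n"
       b"Content-Language: en\r\n"
       b"\r\n"
       b"<title>Hello</title>")
rep = parse_response(raw)
print(rep.headers["content-type"], len(rep.body))
```

The body here happens to contain markup, but nothing in the sketch looks at it: the metadata travels alongside the bits and is read first.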
This model does not imply that a given set of bits can only be interpreted as the author intended. The model is designed to enable global understanding by having parties agree to follow a small set of rules for interpreting bits (starting with the MIME type). Parties may reach local agreements independently, but they do not change the authoritative interpretation of the bits.
Another benefit of separating metadata that guides interpretation from data is improved efficiency. For instance, when a server sends XML data and labels the data correctly through MIME headers, a client can dispatch processing after rapid inspection of the metadata (typically short strings). It is much more expensive if the client has to start up an XML parser to guess the content type.
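The efficiency argument can be sketched in Python as a dispatch table keyed on the media type alone. The handler descriptions and the names media_type_of, HANDLERS, and dispatch are assumptions made for this example; a real client would register actual processors.

```python
from email.message import Message


def media_type_of(content_type_header: str) -> str:
    """Return the lower-cased type/subtype, ignoring parameters such as charset."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # Note: get_content_type() falls back to text/plain for a malformed value.
    return msg.get_content_type()


# Hypothetical dispatch table: media type -> what the client does with the bits.
HANDLERS = {
    "text/html": "render as HTML",
    "text/plain": "display as plain text",
    "application/xml": "hand to XML processor",
}


def dispatch(content_type_header: str) -> str:
    # Only a short string is inspected; the entity body is never parsed here.
    return HANDLERS.get(media_type_of(content_type_header), "offer to save to disk")


print(dispatch("text/plain; charset=us-ascii"))
```

No parser is started until the handler has been chosen, which is the cost saving the paragraph above describes.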
A particularly important piece of metadata is the content type header, which instructs a client on which specification to follow first in order to interpret a bag of bits; that specification may invoke others recursively. For convenience, the MIME mechanism includes a registry of content type/specification bindings maintained by the Internet Assigned Numbers Authority [IANA]. For instance, in the IANA registry, the content type text/html is associated with [RFC2854], which in turn states that:
Thus, by serving a bag of bits with content type text/html, the resource owner declares that the HTML 4.01 Recommendation governs the authoritative interpretation. By serving a bag of bits (even HTML bits) with content type text/plain, the resource owner declares that [RFC2046] and [RFC2646] govern the authoritative interpretation. This is the first part of the explanation of why Tim's browser in scenario 1 is the culprit.
A sequence of bits is "self-describing" if it includes enough information to allow two parties to figure out how to interpret it the same way without additional clues. If the author intends for the data to be interpreted in a manner other than what is self-described (e.g., "treat this XML content as plain text"), then clarifying metadata is required (e.g., in protocol headers).
Below we examine appropriate client behavior when inconsistencies are detected between what the server declares the content type to be through metadata and any type information available by inspection of the data itself.
A user agent represents the user for interactions with servers. User agent behavior that misrepresents the user or misrepresents the server ultimately undermines trust on the Web and is thus considered harmful. Misrepresentation may lead to violations of privacy, security holes, and just plain confusion. Some examples of potential security violations include:
[...] text/plain header, detects that the content is a shell script, and executes the script on the user's machine without the user's knowledge.
[...] text/whatever. A client inside the firewall receives a piece of content labeled text/plain, detects that it's really text/whatever, and interprets it according to the text/whatever specification. The client thus violates the firewall configuration.
A client that ignores authoritative server headers without informing the user undermines the goal of creating a shared information space.
In scenario 1, in terms of Web architecture, Stuart is innocent; misconfiguration of the server is not an architectural error, it's just a human error. Instead, Tim's browser is the culprit since it misrepresents the server by ignoring the authoritative headers, without informing Tim. Janet's browser respected the text/plain header and, by doing so, helped Janet and Stuart detect a server misconfiguration.
Examples of inconsistencies between headers and a bag of bits that have been observed on the Web include:
Clients should detect such inconsistencies but should not resolve them without involving the user (e.g., by securing permission or at least providing notification).
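One way a client might follow this advice is sketched below in Python, using Scenario 1 as the test case. The functions looks_like_html and choose_interpretation, and the confirm callback standing in for a user prompt, are hypothetical; the inspection is deliberately crude and serves only to notice a possible mismatch, never to resolve it silently.

```python
def looks_like_html(body: bytes) -> bool:
    """Crude inspection of the first bytes; used only to notice a mismatch."""
    head = body[:512].lstrip().lower()
    return head.startswith((b"<!doctype html", b"<html")) or b"<title" in head


def choose_interpretation(declared_type: str, body: bytes, confirm) -> str:
    """Return the media type to use; the declared type wins unless the user agrees.

    `confirm` is a caller-supplied function that shows a question to the user
    and returns True or False.
    """
    if declared_type == "text/plain" and looks_like_html(body):
        question = "Served as text/plain but looks like HTML. Render as HTML?"
        if confirm(question):
            return "text/html"
    return declared_type


page = b"<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01//EN'><title>Hi</title>"
# A user who declines sees the content exactly as the server labeled it.
print(choose_interpretation("text/plain", page, lambda question: False))
```

The design choice is that the override path exists but always passes through the user, so a misconfiguration is surfaced rather than masked.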
Another form of inconsistency is when the client expects a MIME header and the server doesn't send one. For instance, HTTP/1.1 [RFC2616], section 7.2.1 describes client behavior in the case when the server sends no content type header:
[...] application/octet-stream.
This excerpt is consistent with the principle that the content type header, when present, is authoritative. HTTP/1.1 allows a client to guess when no content type is present; in this case, content that is self-describing is likely to lead to a coherent interpretation.
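The rule that guessing is permitted only when the header is absent can be sketched as follows. This is a minimal Python illustration, not a complete implementation of HTTP/1.1: the function name effective_media_type is hypothetical, and guessing is limited here to the URI's name extension via the standard mimetypes module.

```python
import mimetypes


def effective_media_type(content_type_header, uri: str) -> str:
    """Pick the media type for a response, guessing only if no header was sent."""
    if content_type_header:
        # A header is present, so it is authoritative; just drop any parameters.
        return content_type_header.split(";", 1)[0].strip().lower()
    # No header: the client may guess, here from the URI's name extension.
    guessed, _encoding = mimetypes.guess_type(uri)
    # If the type is still unknown, treat the content as opaque bytes.
    return guessed or "application/octet-stream"


print(effective_media_type("text/plain", "http://www.example.org/page.html"))
print(effective_media_type(None, "http://www.example.org/page.html"))
```

In the first call the .html extension is ignored because the server stated a type; in the second the extension is the only clue available.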
For this reason, servers should only supply a character encoding header when there is complete certainty as to the encoding in use. Otherwise, an error will cause a perfectly usable representation to be rejected by an architecturally sound client. Section 7.1 of [RFC3023] states:
However, a receiving application can, with very high reliability, determine the character encoding of an XML document by reading it, without reference to any external headers, and this is reflected by RFC 3023 in sections 8.9, 8.10, and 8.11. Thus there is no ambiguity when the character encoding header is omitted, and the STRONGLY RECOMMENDED injunction to use the character encoding header is misplaced for application/xml and for non-text +xml types.
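To make the claim about reading the encoding from the document itself more concrete, here is a simplified Python sketch. It checks only for a UTF-8 or UTF-16 byte order mark and for an encoding pseudo-attribute in the XML declaration; it is an assumption-laden illustration and not the full detection procedure described in the XML specification (for example, it does not handle UTF-32 or UTF-16 without a byte order mark). The function name sniff_xml_encoding is hypothetical.

```python
import codecs
import re

# Matches an XML declaration at the very start and captures the encoding name.
_XML_DECL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_xml_encoding(data: bytes) -> str:
    """Return a lower-cased encoding name found by inspecting the document."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # With no byte order mark and no declared encoding, XML defaults to UTF-8.
    return "utf-8"


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'))
```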
We recommend that section 7.1 of [RFC3023] be amended to something like the following:
In the absence of header information, a flexible client would do even more than merely guess and silently proceed. For instance, in different configurations the client could:
In Scenario 2, Norm is responsible for Janet not having access to content she was meant to receive. The HTML 4.01 Recommendation states that "Authors who use [the type] attribute take responsibility to manage the risk that it may become inconsistent with the content available at the link target address." Janet's client could have done more than merely read the type hint and decide to skip the "cool-style." Users benefit from clients that allow different configurations for handling hints, including:
It is not a violation of Web architecture when a client overrides server headers and processes a bag of bits in a non-authoritative manner, as long as the client is not misrepresenting the user or server. For instance, an application does not violate Web architecture when it receives a content header of text/html and, rather than following the HTML 4.01 Recommendation, provides the service of validating the HTML, detecting broken links, converting it to another format, or rendering it as plain text. The problem arises when the user agent engages in non-authoritative behavior without the user's awareness or consent.
Some format specifications allow authors to include in content "hints" for servers and clients. For instance, the http-equiv attribute of the HTML meta element is intended for servers (not clients). In HTML 2.0 [RFC1866], section 5.2.5, the attribute is specified as follows:
The HTML 4.01 attribute type for the link element (used in Scenario 2) gives clients a hint about what the content type of the linked resource is likely to be.
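A client that treats the type attribute as advisory might be configured along the lines of the Python sketch below, which replays Scenario 2. The function should_fetch and its ask_server callback (standing in for something like an HTTP HEAD request that returns the server's content type) are hypothetical names used only for illustration.

```python
def should_fetch(hint, supported, ask_server=None) -> bool:
    """Decide whether to retrieve a linked resource given an advisory type hint.

    hint       -- value of the link element's type attribute, or None
    supported  -- set of media types this user agent can process
    ask_server -- optional callable returning the content type the server
                  declares for the resource (an assumed configuration option)
    """
    if hint is None or hint in supported:
        # Nothing discourages fetching; the server's header still decides
        # how the retrieved bits are interpreted.
        return True
    if ask_server is not None:
        # The hint says "unsupported", but it is only a hint: double-check.
        return ask_server() in supported
    # Configured to trust hints: skip the resource, as Janet's user agent did.
    return False


janet_supports = {"application/xslt+xml"}
print(should_fetch("text/css", janet_supports))
print(should_fetch("text/css", janet_supports, lambda: "application/xslt+xml"))
```

With hint checking enabled, the mislabeled style sheet from Scenario 2 is retrieved after all, at the cost of one extra round trip to the server.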
A format specification that includes hints for clients should make clear that when these hints interact with server headers, they are advisory only. Format specifications should not include requirements for clients to override server headers without user consent. An architecturally sound description of an advisory attribute might read:
The W3C Recommendation SMIL 2.0 [SMIL20] is outmoded in this regard since the definition of the type attribute (section 7.3.1) specifies circumstances in which type is supposed to take precedence over server headers.
The rationale frequently provided by specification designers for why the author should be able to override server headers in content is to work around misconfigured servers. In many environments, authors do not have sufficient access to server managers to request that the server be configured for a new or special MIME type. The TAG does not believe that author-specified overrides are the proper solution to this problem (for the reasons cited above, including security risks and masking of the problem). Instead, the TAG recommends the following (in addition to suggested client behavior):
[...] text/plain or application/octet-stream) when the content type is unknown.