Client Handling of Authoritative Metadata

DRAFT TAG Finding 27 January 2004

This version:
Latest version:
http://www.w3.org/2001/tag/doc/mime-respect (XML)
Previous versions:
10 December 2003 Draft, 9 July 2003 Draft, 25 June 2003 Draft
Ian Jacobs, W3C


The architecture of the Web depends on applications having a shared understanding of the messages exchanged between agents (for example, clients, servers, and intermediaries) and a shared expectation of how the payload of the message -- a representation -- will be interpreted by the recipient. The Web architecture uses representation metadata to indicate the sender's intentions to the recipient whenever the protocols used for communication allow such metadata to be communicated. In particular, dispatching and security-related decisions regarding the processing of a message are often based on values provided in representation metadata fields, such as the "Content-Type" field of HTTP and MIME. In this finding, we review the architectural design choice that metadata provided by a sender be authoritative. We also examine why client behavior that misrepresents the user or representation provider is harmful. Finally, we consider how specification authors should incorporate these points into their work.

Status of this Document

This document has been developed for discussion by the W3C Technical Architecture Group. This draft finding addresses issue contentTypeOverride-24 and partly addresses issue errorHandling-20. The TAG finding "Internet Media Type registration, consistency of use" also includes material related to this issue.

This 27 January 2004 draft of the finding incorporates some suggestions from Roy Fielding and Stuart Williams. The editor also tried to condense the finding and make it more linear and readable.

Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.

Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.

The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with [RFC2119].

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

Table of Contents

1 Summary of Key Points
2 Scenarios
3 Why the representation provider is the authoritative source of representation metadata
    3.1 Interpretation of Representation Data
    3.2 Interpretation of Representation Metadata
4 Inconsistency between representation data and metadata
    4.1 Self-describing data and Risk of Inconsistency
    4.2 Reducing the Risk of Inconsistency
    4.3 Recipient Handling of Inconsistency
5 Metadata Hints in Specifications
6 Future Work
7 References
8 Acknowledgments

1 Summary of Key Points

The following are the key architectural points of this finding:

  1. A representation provider is the authoritative source of representation metadata, including the Internet media type.
  2. It is an error for an agent to ignore or override authoritative metadata without the consent of the party the agent represents.
  3. Inconsistency between representation data and metadata is an error.
  4. Format specifications SHOULD NOT include requirements for clients to override server metadata without user consent.

Section 2 presents scenarios where these points have been ignored and asks what has been ignored and by whom. Section 3 discusses the motivation for a Web architecture where the representation provider is the authoritative source of metadata. Section 4 examines what happens when an inconsistency between representation metadata and data arises and how to handle it. Section 5 discusses how specifications may include metadata hints for clients and servers.

2 Scenarios

Scenario 1: Stuart runs his own Web server at "http://www.example.org/". He creates an HTML page and means to serve it as "text/html", but misconfigures the Web server so that the content is served via HTTP/1.1 [RFC2616] as "text/plain". Tim's browser looks inside the page, detects some markup that suggests that this is an HTML document (e.g., a <!DOCTYPE declaration or <title> element), and quietly renders it according to the HTML specification rather than as plain text. Janet's browser displays the content as plain text.

Which party has neglected a principle of Web architecture: Stuart for the server misconfiguration, Tim's browser for silently overriding the server's headers, or Janet's browser for not detecting that the content looked like HTML?

Answer 1: By silently overriding metadata from the representation provider, Tim's browser did not respect Web architecture principles that promote shared understanding and security.

Scenario 2: Norm publishes an XHTML document that includes:

<link href="cool-style" type="text/css" rel="stylesheet"/>

Norm's "cool-style" is an XSLT style sheet, but Norm has set type to "text/css". Stuart has configured the Web server so that "cool-style" is served via HTTP/1.1 as "application/xslt+xml". With a user agent that understands XSLT but not CSS, Janet requests the content that includes this link element. As it loads the page, Janet's user agent reads the type hint and does not fetch "cool-style."

Which party is responsible for the fact that Janet did not receive content she should have: Stuart for the server configuration, Norm for stating that the "cool-style" sheet is served as "text/css" when in fact it's served with a different media type, or Janet's user agent for not double-checking the media type with the server?

Answer 2: Though not a violation of principles of Web architecture, Norm's mislabeling of content deprived Janet of content she should have received.

In the sections below, we explore these answers in more detail.

3 Why the representation provider is the authoritative source of representation metadata

Successful communication between two parties using a piece of information relies on shared understanding. In the Web architecture, agents identify resources with URIs and they communicate resource state information by exchanging representations. A representation of resource state is an octet sequence that consists logically of two parts:

  1. Representation data: electronic data, expressed in one or more formats (e.g., XHTML, SVG, PNG) used separately or in combination.
  2. Representation metadata, including the Internet media type (e.g., "text/html" or "image/jpeg") of the representation data.

Arbitrary numbers of independent parties can use a URI to identify and communicate about a Web resource. To make it possible for these parties to interpret representations of the same resource in a consistent manner (according to a common set of specifications), the design choice for the Web is that the representation provider is the authoritative source of representation metadata. Thus, if the representation provider asserts (through protocols) that "the following representation data has the Internet media type text/html", that assertion is authoritative. In the HTTP/1.1 protocol, the "Content-Type" header field is used to communicate an Internet media type.

Separating representation metadata from data to guide processing provides a number of benefits, including improved efficiency. For instance, when a representation provider sends XML data and proper metadata, a recipient can determine the media type after rapid inspection of a short string. It is considerably more expensive in processing time to start up an XML parser to guess the media type.
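As an illustration of how little work this inspection involves, the following Python sketch extracts the media type from a "Content-Type" field value. The function name media_type_of is hypothetical and not drawn from any specification, and the sketch simply discards parameters such as charset.

```python
def media_type_of(content_type_value: str) -> str:
    """Return the bare media type from a Content-Type field value.

    Parameters (anything after the first ';') are dropped, and the
    result is lowercased because media type names are case-insensitive.
    """
    return content_type_value.split(";", 1)[0].strip().lower()


if __name__ == "__main__":
    print(media_type_of("application/xml; charset=UTF-8"))
```

No parser for the representation data itself is started; the decision rests on a short string supplied by the representation provider.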

3.1 Interpretation of Representation Data

An Internet media type asserts "this representation data is X" where X is a short string such as "text/html" or "image/png". The IANA media type registry maps these short strings to data format specifications (e.g., XHTML, CSS, PNG, XLink, RDF/XML, etc.) via intermediate media type registrations. For instance, in the IANA registry, the content type "text/html" is associated with [RFC2854], which in turn states that:

The text/html media type is now defined by W3C Recommendations; the latest published version is [HTML401].

Once an agent knows how the representation provider has identified the representation data, the agent may process it in a number of ways. For instance, if the representation provider identifies representation data as having Internet media type "text/html", a recipient might, depending on application context:

  • Render the HTML content in a graphical user agent.
  • Check the validity of the markup.
  • Check for broken links.
  • Spell check the document.
  • Render the HTML as synthesized speech.
  • Transform the HTML into some other data format.

Note that some of these applications may rely on the fact that the representation data really is HTML, while others (e.g., a spell checker) may not.

3.2 Interpretation of Representation Metadata

A user agent represents the user for protocol-level interactions with representation providers. A user agent that does not respect protocol specifications can violate user privacy, produce security holes, and otherwise create confusion. Some examples of potential security violations include:

  • A user agent ignores a "Content-Type" header with value "text/plain", detects that the content is a shell script, and executes the script on the user's machine without the user's knowledge.
  • A firewall is configured to keep out content of type "text/*". An agent inside the firewall receives a representation with Internet media type "text/plain", detects that the representation data is encoded in a format "F" and interprets the representation according to the "F" specification. The agent thus violates the firewall configuration.

Because of risks such as these, it is an error for an agent to ignore or override authoritative metadata without the consent of the party the agent represents. For instance, the HTTP/1.1 specification states, "If and only if the media type is not given by a "Content-Type" field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource."
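The quoted HTTP/1.1 rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, and the content sniffer is deliberately tiny and recognizes only two HTML prefixes.

```python
from typing import Mapping, Optional


def guess_from_content(body: bytes) -> Optional[str]:
    """Hypothetical, deliberately tiny sniffer.

    It is consulted only when no media type was given at all.
    """
    start = body[:256].lstrip().lower()
    if start.startswith((b"<!doctype html", b"<html")):
        return "text/html"
    return None


def effective_media_type(headers: Mapping[str, str], body: bytes) -> Optional[str]:
    """Return the media type the agent should act on.

    A media type given in "Content-Type" is authoritative. Inspection
    of the content is attempted if and only if no media type is given.
    """
    for name, value in headers.items():
        if name.lower() == "content-type" and value.strip():
            return value.split(";", 1)[0].strip().lower()
    return guess_from_content(body)
```

Applied to Scenario 1, this sketch behaves like Janet's browser: the served "text/plain" is returned even though the body begins with a DOCTYPE declaration.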

In scenario 1, in terms of Web architecture, Stuart is innocent; misconfiguration of the server is not an architectural error, it's just a human error. Instead, Tim's browser is the culprit since it misrepresents the resource provider by ignoring the authoritative metadata, without Tim's consent. Janet's browser respected the "Content-Type" header field, and by doing so, helps Janet and Stuart detect a server misconfiguration.

Note the difference between an agent taking authoritative metadata into account and an agent ignoring the metadata without the consent of the user. The first scenario below is an error; the second is not:

  • The agent silently ignores the authoritative Internet media type "text/plain" and renders (without user consent) the representation data as though the Internet media type were "text/html".
  • The agent recognizes the authoritative Internet media type of "text/plain" and pretty-prints the content as though it were HTML ("text/html") because that is what the user has chosen to do with the content.

4 Inconsistency between representation data and metadata

Although there are benefits to separating representation metadata from data, there are risks as well. In particular, the representation provider may create inconsistencies by misassigning metadata. Inconsistency between representation data and metadata is an error. Examples of inconsistencies between headers and representation data that have been observed on the Web include:

Recipients SHOULD detect inconsistencies between representation data and metadata but MUST NOT resolve them without the consent of the user (e.g., by securing permission or at least providing notification).

4.1 Self-describing data and Risk of Inconsistency

Data is "self-describing" if it includes enough information to allow two parties to establish a consistent interpretation without additional clues. If the representation provider intends for the data to be interpreted in a manner other than what is self-described (e.g., "treat this XML content as plain text"), then clarifying metadata is required (e.g., in protocol headers). As illustrated above, providing redundant metadata for data that is self-describing can lead to inconsistencies.

Representation providers SHOULD NOT in general specify the character encoding for XML data in protocol headers since the data is self-describing.
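To illustrate how redundant metadata can conflict with self-describing data, the following Python sketch compares a charset parameter in a "Content-Type" value with the encoding declaration inside the XML data. All function names are hypothetical, and the pattern only handles XML declarations written in an ASCII-compatible encoding; it is not a conforming XML encoding detector.

```python
import re
from typing import Optional

# Matches an XML declaration at the very start of the data and captures
# the value of its encoding pseudo-attribute, if present.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)


def xml_declared_encoding(data: bytes) -> Optional[str]:
    """Return the lowercased encoding named in the XML declaration, if any."""
    match = _XML_DECL.match(data[:200])
    return match.group(1).decode("ascii").lower() if match else None


def header_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return None


def charset_conflict(content_type: str, data: bytes) -> bool:
    """True when both sources name an encoding and the names differ."""
    external = header_charset(content_type)
    internal = xml_declared_encoding(data)
    return external is not None and internal is not None and external != internal
```

When the header carries no charset parameter, no conflict of this kind can arise, which is the motivation for the recommendation above.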

4.2 Reducing the Risk of Inconsistency

Representation providers can help reduce the risk of inconsistency through careful assignment of representation metadata (especially that which applies across representations). In particular:

  • Server software designers SHOULD NOT specify a default Internet media type in the default configuration shipped with the server.
  • Server managers SHOULD be wary of specifying a default Internet media type.
  • Server managers SHOULD NOT specify an arbitrary Internet media type (e.g., "text/plain" or "application/octet-stream") when the Internet media type is unknown.
  • Content authors SHOULD inform server managers of metadata misconfigurations.
  • Server managers SHOULD provide authors with a means to override a metadata configuration when it is inappropriate for a specific representation. This does not mean that the representation data should override the representation metadata, only that the author should have a way to supply correct metadata.
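The points above might be reflected in server-side logic along the following lines. This Python sketch is an assumption-laden illustration: the extension table, the function names, and the author_overrides mapping are all hypothetical and do not describe any particular server product.

```python
import posixpath
from typing import Dict, Mapping, Optional

# Hypothetical site configuration: extensions the server manager has
# deliberately mapped. There is intentionally no catch-all default.
EXTENSION_TYPES: Dict[str, str] = {
    ".html": "text/html",
    ".css": "text/css",
    ".png": "image/png",
}


def content_type_for(path: str, author_overrides: Mapping[str, str]) -> Optional[str]:
    """Return the media type to send for a path, or None if it is unknown.

    An author-supplied override for the specific resource wins over the
    site-wide extension table.
    """
    if path in author_overrides:
        return author_overrides[path]
    extension = posixpath.splitext(path)[1].lower()
    return EXTENSION_TYPES.get(extension)


def response_headers(path: str, author_overrides: Mapping[str, str]) -> Dict[str, str]:
    """Build headers, omitting Content-Type rather than inventing a value."""
    headers: Dict[str, str] = {}
    media_type = content_type_for(path, author_overrides)
    if media_type is not None:
        headers["Content-Type"] = media_type
    return headers
```

In this sketch the author corrects the metadata the server sends; the representation data itself never overrides that metadata.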

4.3 Recipient Handling of Inconsistency

Once an agent has detected an inconsistency between representation data and metadata, application context and user preferences guide a range of error handling behaviors. For instance, in different configurations a user agent might:

  • Remain silent when forced to guess, or
  • Inform the user that a guess has been made, or
  • Allow the user to direct the client's processing of the representation data (e.g., by invoking a particular handler or saving to disk).
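The configurations listed above could be modeled as a small policy switch. The following Python sketch is one conservative reading, not a prescribed design: the Policy names and the notify and ask callbacks are hypothetical, and the sketch assumes the inconsistency has already been detected. Only the ASK policy can lead to processing under a media type other than the declared one, and then only because the user chose it.

```python
from enum import Enum
from typing import Callable


class Policy(Enum):
    SILENT = "silent"  # say nothing to the user
    NOTIFY = "notify"  # tell the user what was observed
    ASK = "ask"        # let the user direct the processing


def media_type_to_use(
    declared: str,
    apparent: str,
    policy: Policy,
    notify: Callable[[str], None],
    ask: Callable[[str, str], str],
) -> str:
    """Decide which media type to process the representation data as.

    declared is the authoritative media type from the provider;
    apparent is what the data looks like to the agent.
    """
    if policy is Policy.ASK:
        # The user decides; the callback returns the chosen media type.
        return ask(declared, apparent)
    if policy is Policy.NOTIFY:
        notify(f"Served as {declared}, but the content looks like {apparent}.")
    # Without an explicit user choice, the authoritative metadata stands.
    return declared
```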

In Scenario 2, Norm is responsible for Janet not having access to representation data she was meant to receive. The HTML 4.01 Recommendation states that "Authors who use [the type] attribute take responsibility to manage the risk that it may become inconsistent with the content available at the link target address." Janet's client could have done more than merely read the type hint and decide to skip the "cool-style." Users benefit from clients that allow different configurations for handling hints, including:

  • Query the server, and when there is an inconsistency, choose the (authoritative) server metadata, or
  • Query the server, and when there is an inconsistency, prompt the user for instructions on how to proceed.
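The two configurations above can be sketched as a single function. This Python example is illustrative only: the function name and the optional prompt callback are hypothetical, and the server's media type is assumed to have been obtained already (for example, from the "Content-Type" field of a response).

```python
from typing import Callable, Optional


def resolve_linked_type(
    hint: Optional[str],
    server_type: Optional[str],
    prompt: Optional[Callable[[str, str], str]] = None,
) -> Optional[str]:
    """Reconcile an author's type hint with the server's media type.

    If the server gave no media type, the hint is all there is to go on.
    If both agree (or there is no hint), the server's value is used.
    On disagreement, the user is consulted when a prompt callback is
    configured; otherwise the authoritative server metadata is chosen.
    """
    if server_type is None:
        return hint
    if hint is None or hint.lower() == server_type.lower():
        return server_type
    if prompt is not None:
        return prompt(hint, server_type)
    return server_type
```

With the values from Scenario 2, a hint of "text/css" and a server media type of "application/xslt+xml" resolve to "application/xslt+xml" when no prompt is configured, so a user agent that understands XSLT would not skip the style sheet.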

5 Metadata Hints in Specifications

Some format specifications allow content authors to provide metadata hints for servers and clients. For instance, the http-equiv attribute of the HTML meta element was intended for servers (not clients). In HTML 2.0 [RFC1866], section 5.2.5, the attribute is specified as follows:

HTTP servers may read the content of the document <head> to generate header fields corresponding to any elements defining a value for the attribute HTTP-EQUIV.

The HTML 4.01 attribute type for the link element (used in Scenario 2) gives clients a hint about what the media type of a representation of the linked resource is likely to be.

A format specification that includes metadata hints for clients should make clear that when these hints interact with server metadata, they are advisory only. Format specifications SHOULD NOT include requirements for clients to override server metadata without user consent. An architecturally sound description of an advisory attribute might read:

The author may provide a hint to the client about the likely Internet media type of representations of the designated resource. Although the client MUST treat server metadata (including that provided by the file system) as authoritative, the client MAY use the hint in a number of ways, including as a preference when negotiating with the server, as input to a decision to retrieve a representation, or to recover from a misconfigured server. However, the client MUST NOT override the server's headers (by using the hint or any other mechanism) without the consent of the user.
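As one example of using a hint as a negotiation preference, a client might place the hinted media type first in an HTTP "Accept" header while still accepting anything else at a lower quality value. The function name and the specific quality value 0.1 in this Python sketch are assumptions chosen for illustration.

```python
from typing import Optional


def accept_header(hint: Optional[str]) -> str:
    """Build an Accept header value that prefers the hinted media type.

    The hint only expresses a preference in the request. Whatever
    media type the server then declares in its response remains
    authoritative.
    """
    if hint:
        return f"{hint}, */*;q=0.1"
    return "*/*"


if __name__ == "__main__":
    print(accept_header("text/css"))
```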

Specification designers contend that content authors should be able to override server metadata through representation data in order to work around misconfigured servers. In many environments, authors do not have sufficient access to server managers to affect server configuration. The TAG does not believe that author-specified overrides in representation data are the proper solution to this problem (for the reasons cited above, including security risks and masking of the problem).

Section 2.2.2 of the W3C Proposed Recommendation Speech Recognition Grammar Specification Version 1.0 [SRGS10] describes agent behavior that is consistent with this finding.

The W3C Recommendation SMIL 2.0 [SMIL20] is inconsistent with the current finding in this regard since the definition of the type attribute (section 7.3.1) specifies circumstances in which type is supposed to take precedence over server metadata.

6 Future Work

  1. The TAG is working with the authors of [RFC3023] to revise section 7.1 of that RFC, which suggests behavior regarding character encoding metadata that is inconsistent with the current finding.
  2. Reviewers of this finding asked whether similar architectural principles apply to headers sent from the client to the server. This is the TAG's issue putMediaType-38: "Relation of HTTP PUT to GET, and whether client headers to server are authoritative."

7 References

[IANA] Internet Assigned Numbers Authority (IANA). (See http://www.iana.org/.)
[RFC1866] T. Berners-Lee, D. Connolly, Hypertext Markup Language - 2.0, RFC1866, November 1995. (See http://www.ietf.org/rfc/rfc1866.)
[RFC2046] N. Freed, N. Borenstein, Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, RFC2046, November 1996. (See http://www.ietf.org/rfc/rfc2046.txt.)
[RFC2119] S. Bradner, Key words for use in RFCs to Indicate Requirement Levels, RFC2119, March 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
[RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, Hypertext Transfer Protocol -- HTTP/1.1, RFC2616, June 1999. (See http://www.ietf.org/rfc/rfc2616.txt.)
[RFC2646] The Text/Plain Format Parameter, RFC2646, August 1999. (See http://www.ietf.org/rfc/rfc2646.txt.)
[RFC2854] D. Connolly, L. Masinter, The 'text/html' Media Type, RFC2854, June 2000. (See http://www.ietf.org/rfc/rfc2854.txt.)
[RFC3023] M. Murata, S. St. Laurent, D. Kohn, XML Media Types, RFC3023, January 2001. (See http://www.ietf.org/rfc/rfc3023.txt.)
[SMIL20] J. Ayars et al., Synchronized Multimedia Integration Language (SMIL 2.0), W3C Recommendation, 7 August 2001. (See http://www.w3.org/TR/2001/REC-smil20-20010807/.)
[SRGS10] A. Hunt, S. McGlashan, eds., Speech Recognition Grammar Specification Version 1.0, W3C Proposed Recommendation, 18 December 2003. (See http://www.w3.org/TR/2003/PR-speech-grammar-20031218/.)

8 Acknowledgments

Dan Connolly generously provided significant input to this finding. Martin Dürst, Roy Fielding, Philipp Hoschka, Rob Lanphier, Stuart Williams, and Norm Walsh also provided valuable input. Many thanks to all reviewers for their contributions to this finding.