Copyright © 2005 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
In Web architecture, communication between agents consists of exchanging messages with predefined syntax and semantics: a shared expectation of how each message's control data and payload (representation data and metadata) will be interpreted by the recipient. When supported by the communication protocol, the Web architecture uses representation metadata to indicate the sender's intentions regarding how the recipient should interpret the representation data. For example, HTTP and MIME use the value of the "Content-Type" header field to indicate the Internet media type of the representation, which influences the dispatching of handlers and security-related decisions made by recipients of the message. In this finding, we review the architectural design choice that metadata provided in a received message be considered authoritative. We examine why recipient behavior that fails to respect authoritative metadata can be harmful and under what conditions (user consent) such behavior is allowed. Finally, we consider how specification authors should incorporate these design constraints into their work.
This DRAFT document has been developed for discussion by the W3C Technical Architecture Group as a finding to address the TAG issues contentTypeOverride-24, putMediaType-38, and portions of errorHandling-20. It is an update to the previously approved finding of 25 February 2004. Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).
Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. Additional TAG findings, both approved and in draft state, are also available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.
The terms MUST, SHOULD, and SHOULD NOT are used in this document in accordance with [RFC2119].
1 Summary of key points
2 Defining authoritative metadata
3 Why metadata from an encapsulating container is authoritative
3.1 Role of Internet Media Types
3.2 Why embedded metadata is less authoritative
3.3 Why external reference metadata is least authoritative
3.4 What to do when there is no authoritative metadata
4 Overriding authoritative metadata
4.1 Inconsistency between representation data and metadata
4.2 Reducing inconsistency
4.3 Avoiding silent recovery
4.4 Obtaining user consent
5 Metadata hints in specifications
6 Scenarios
6.1 Bad server configuration
6.2 Good server configuration
6.3 Misconfiguration and metadata hints
6.4 Conflicting metadata during distributed authoring
7 Future Work
8 References
9 Acknowledgments
The following are the key architectural points of this finding:
Representation metadata received in an encapsulating container, such as within the header fields of a message, is authoritative in defining the nature of the representation received.
Inconsistency between representation data and metadata is an error that should be discovered and corrected rather than silently ignored.
It is an error for an agent to ignore or override authoritative metadata without the consent of the party the agent represents.
Specifications MUST NOT work against the Web architecture by requiring or suggesting that a recipient override authoritative metadata without user consent.
The sequence of numbers "324033" might be a license plate number in the state of Arkansas or an old-style telephone number in Italy. Although there do exist some self-descriptive data formats, we generally rely on context to define the purpose, format, and meaning of data. One way to provide a context for interpretation is metadata.
Metadata is simply defined as data about other data. Metadata can be expressed while referencing data externally, while encapsulating data in a container, and by embedding metadata within the data being described. The following table provides examples of how various forms of metadata can be expressed during Web interactions:
| metadata describes | how | where | example |
|---|---|---|---|
| resource | external reference | message fields | HTTP's "Allow" header field in a response message describes the request methods allowed by the resource for which the response was generated. |
| resource | external reference | data format | Link relationship values (rel/rev attributes) are often used to describe metadata relationships between resources. |
| resource | external reference | other sources | RDF can associate metadata with a resource by reference to its URI. |
| message | encapsulating | layers | Protocols are often implemented as a stack of layered protocols, with each lower-layer protocol providing context for higher layers. |
| message | embedded | message syntax | HTTP's response messages begin with "HTTP/" and a version number. |
| message | embedded | message fields | HTTP's "Date" header field describes the clock time at the origin when the message was generated. |
| representation | external reference | identifiers | Schemes based on old (non-metadata) protocols, such as gopher and ftp, include or imply metadata information about the representation as part of the identifier. |
| representation | external reference | data format | Type attributes are sometimes used to express expectations about representation types for pre-access content selection. |
| representation | encapsulating | message fields | HTTP and MIME use the value of the "Content-Type" header field to indicate the representation's media type. |
| representation | encapsulating | archival formats | Archives often include catalog data that associates metadata with parts of the archive. |
| representation | embedded | data format | Magic numbers, DOCTYPEs, and XML namespaces are all means for making data formats self-descriptive. HTML's "META" elements and RDF/XML assertions can describe metadata about the enclosing representation. |
The table above demonstrates that the same metadata may be expressed in various forms. The representation media type [RFC2046], in particular, plays such an important role in the Web architecture that its value can be described in many different locations. Given multiple sources of metadata and the possibility that those sources may be inconsistent, the architect must decide what source of metadata has the highest priority and thus shall be considered authoritative in determining the desired behavior of the recipient. Furthermore, given the presence of self-descriptive data formats, a decision must be made on whether to respect the declared metadata over whatever might be learned by inspecting the data itself.
For Web architecture, a design choice has been made that metadata received in an encapsulating container MUST be considered authoritative and used in preference to metadata found by inspection of the data, declared by embedded metadata, or provided by external reference. Although this design choice is generally applicable to any container format, including archival formats that encapsulate other data, the most significant interpretation for Web architecture is that representation metadata found within the header fields of a received message shall be considered authoritative for the representation encapsulated within that message.
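The precedence described above can be illustrated with a minimal sketch. The source labels and the function name `select_media_type` are hypothetical and are not defined by this finding or by any protocol; the sketch only shows that metadata from the encapsulating container is consulted first, and that other sources are considered only when the container supplies none.

```python
from typing import Dict, Optional, Tuple

# Metadata sources ordered from most to least authoritative, following the
# finding. The labels themselves are illustrative, not standard terms in code.
PRECEDENCE = ("encapsulating", "embedded", "external_reference")


def select_media_type(
    declared: Dict[str, Optional[str]]
) -> Optional[Tuple[str, str]]:
    """Return (source, media_type) from the most authoritative source present."""
    for source in PRECEDENCE:
        media_type = declared.get(source)
        if media_type:
            return source, media_type
    # No metadata from any source: the recipient is left to guess.
    return None
```

For instance, a call with "text/plain" from the message header and "text/html" from an embedded DOCTYPE would select the header's value, even though the data claims otherwise.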
Representation metadata does not constrain the receiving agent to process the representation data in one particular way. What it does is allow the sender of a representation to express its intentions regarding how the data should be interpreted by a recipient. A recipient can then choose, based on its own purpose, design, and configuration, how it will react to those intentions on behalf of the party employing the agent. For example, a browser traversing a link may behave differently depending on how the link was selected, a maintenance spider may ignore a data format's rendering instructions, and an editor may treat every representation as a source for editing rather than display.
This treatment of authoritative metadata applies equally to clients, servers, and intermediaries. A server receiving a representation MUST respect the client's expressed intentions regarding the metadata for that representation and either act in accord with those intentions or respond with an appropriate redirection or error message.
The rationale for our choice of authoritative metadata is difficult to describe using abstractions. Let's consider a specific example of the media type of a received representation and explain why each of the other sources of metadata is not considered authoritative.
An Internet media type [RFC2046] is a short name, such as "text/html", that is associated with a data format specification and processing model through registration in the IANA media type registry. For example, "text/html" in the IANA registry is associated with [RFC2854], which in turn states that:
The text/html media type is now defined by W3C Recommendations; the latest published version is [HTML401].
The media type indicates the intended processing model for a representation, including such issues as whether the data should be rendered, stored, or executed. In practice, media types are thus usable for selecting handlers to implement those functions. A media type, therefore, is not simply an indication of data format; it also refers to a standardized interpretation of that data format. In fact, many different media types share a single data format, while others represent a superset of formats.
If the authoritative media type of a representation were to be determined by inspection of embedded metadata in a self-descriptive format, then a sender could not choose different interpretations for a single representation based on the declared media type. For example, an owner might want to provide links to separate resources that differ only in how a given HTML representation should be rendered. A message containing the header field "Content-Type: text/html" would indicate that standard HTML processing is desired, whereas the header field "Content-Type: text/plain" would indicate that the data should be viewed as plain text without HTML rendering. Since the representation data in both messages are identical, this functionality is only possible if metadata of the containing message is considered more authoritative in describing the data than whatever could be learned from inspection of the data itself.
Placing authoritative metadata in message fields also enables more efficient processing of messages. It is far easier to dispatch behavior on the basis of inspecting metadata (typically a short string) than it is to invoke a generic document parser and try to divine the purpose of data by inspecting the data itself (with no guarantee of success).
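As an illustration of dispatching on a short metadata string, the following sketch selects a handler from the "Content-Type" value alone and never inspects the body. The handler functions and the `HANDLERS` table are hypothetical stand-ins for a real recipient's rendering components.

```python
from typing import Callable, Dict


def media_type_of(content_type: str) -> str:
    """Return the lowercase type/subtype, ignoring parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


def render_html(body: str) -> str:
    # Stand-in for a real HTML rendering component.
    return "[HTML renderer] " + body


def show_text(body: str) -> str:
    # Stand-in for a plain text viewer.
    return "[plain text] " + body


# Hypothetical handler table keyed by media type.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "text/html": render_html,
    "text/plain": show_text,
}


def dispatch(content_type: str, body: str) -> str:
    """Select a handler from the declared media type without parsing the body."""
    media_type = media_type_of(content_type)
    handler = HANDLERS.get(media_type)
    if handler is None:
        raise LookupError("no handler registered for " + media_type)
    return handler(body)
```

The same body passed with "text/html" and with "text/plain" reaches different handlers, which is the behavior the sender relies on when it labels identical data in two ways.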
If the authoritative media type of a representation were to be determined by external reference, then resources could be prevented from evolving independently from their references. For example, standards for hypermedia data formats evolve over time, whereas it is preferred that URIs remain persistent over time. If metadata guessed by inspecting the identifier were to be considered authoritative, then references would break when the representation media type changes. Similarly, a type attribute provided with a reference would suffer the same problem if it were considered authoritative.
Intermediaries (i.e., proxies and gateways) perform significant functions in Web architecture, such as encapsulating legacy services, enhancing client functionality, and moderating the risk of interactions across firewalls. Those functions can only be performed correctly if the semantics of a given message are expressed within that message. In contrast, metadata associated by an external reference is only visible to the user agent that selects the reference: intermediaries are not aware of that context. If a message recipient treats external metadata as authoritative over that found in the message, then the intermediaries are effectively bypassed and their functionality is lost.
Finally, external references are usually made by third parties: people who are neither the resource owner nor the user. Allowing a third party to override the intent of the sender of a message means that the client must trust both the resource owner and the supplier of the reference, introducing yet another attack vector and its associated complications to secure configuration and monitoring.
There are, of course, times when a representation is provided without any containing metadata, such as when the sender is not certain of the intended metadata or when the protocol being used does not support metadata. That is why the HTTP/1.1 specification [RFC2616] states:
"If and only if the media type is not given by a "Content-Type" field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource."
In other words, when there is no authoritative metadata, the receiving agent MAY attempt to guess the appropriate metadata based on inspection of the data and/or the reference, though such guessing should be limited to media types that are safe to use in that context.
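A minimal sketch of this fallback, assuming a recipient that guesses only from the URI's name extension and accepts only guesses from a small allowlist. The `SAFE_GUESSES` set is a hypothetical policy chosen for illustration; `mimetypes.guess_type` is the Python standard library's extension-based lookup.

```python
import mimetypes
from typing import Optional

# Hypothetical allowlist of media types considered safe to assume by guessing.
SAFE_GUESSES = {"text/plain", "text/html", "image/png", "image/jpeg"}


def effective_media_type(content_type: Optional[str], url: str) -> Optional[str]:
    """Use the declared media type if present; otherwise guess conservatively."""
    if content_type:
        # Authoritative metadata is present, so no guessing takes place.
        return content_type.split(";", 1)[0].strip().lower()
    guessed, _encoding = mimetypes.guess_type(url)
    if guessed in SAFE_GUESSES:
        return guessed
    # Unknown or potentially unsafe type: decline to guess.
    return None
```

Note that the guess is consulted if and only if no "Content-Type" value was received; a declared "text/html" is returned unchanged even when the URI ends in ".txt".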
Recognition of authoritative metadata is important because it influences the default processing behavior for Web interactions. However, representation metadata is also susceptible to misconfiguration, and user agents frequently try to "simplify" the Web by automatically "correcting" perceived "errors" in those configurations.
Choosing to ignore or override authoritative metadata is only allowed within the Web architecture when the user has given consent. Recipients SHOULD detect inconsistencies between representation data and metadata but MUST NOT resolve them without the consent of the user.
Although there are benefits to separating representation metadata from data, there are risks as well. In particular, the resource owner may create inconsistencies by misconfiguring resources or by failing to reassign metadata after a change of representation. Inconsistency between representation data and metadata is an error. Examples of inconsistencies between metadata and representation data that are frequently observed on the Web include:
The character encoding of text-based content being inconsistent with metadata about the character encoding. For some formats, such as XML, such inconsistencies can be quickly detected.
Server-wide default metadata being incorrectly assigned to new or rarely-used media types or content encodings.
Superset media types being used when a more specific media type is intended, such as the use of "application/xml" when there exists a more specific media type corresponding to the root element.
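The first kind of inconsistency listed above can be detected mechanically for XML. The sketch below compares the charset parameter of a "Content-Type" value with the encoding named in the XML declaration. It assumes the declaration is readable as ASCII-compatible bytes (so it does not handle UTF-16), and all function names are hypothetical. It only reports the conflict; it does not resolve it.

```python
import codecs
import re
from typing import Optional

# Matches the encoding pseudo-attribute of an XML declaration at the start
# of the data. Assumes an ASCII-compatible encoding (not UTF-16).
XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def normalize(label: str) -> str:
    """Map encoding aliases (e.g. latin1, ISO-8859-1) to one canonical name."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return label.lower()


def charset_param(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, if any."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return normalize(value.strip().strip('"'))
    return None


def declared_xml_encoding(data: bytes) -> Optional[str]:
    """Return the encoding named in the XML declaration, if any."""
    match = XML_DECL_ENCODING.match(data)
    return normalize(match.group(1).decode("ascii")) if match else None


def charset_conflict(content_type: str, data: bytes) -> bool:
    """True when both sources name an encoding and the names disagree."""
    header = charset_param(content_type)
    embedded = declared_xml_encoding(data)
    return header is not None and embedded is not None and header != embedded
```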
Web software developers, webmasters, and resource owners can help reduce inconsistency through careful assignment of representation metadata. In particular:
Server software designers SHOULD NOT specify default representation metadata, such as media type, character encoding, or content language, within the standard configuration shipped with the server.
Server software designers SHOULD provide a means to set representation metadata at the same level of granularity and permission that is needed to author those representations.
Server managers SHOULD NOT specify an arbitrary Internet media type (e.g., "text/plain" or "application/octet-stream") when the representation media type is unknown.
Server managers SHOULD provide each author with the means and permission to set the configuration of metadata for any representations under the author's control.
Resource owners SHOULD test for correct metadata and inform server managers of metadata misconfigurations.
Authoritative metadata SHOULD NOT be provided external to the representation if it does not add clarity to that communication. For example, the character encoding of XML data formats is self-descriptive within the data and SHOULD NOT be included in a charset parameter of the media type unless that distinction is significant to the resource (e.g., for comparison during content negotiation of multiple XML representations that differ only by character encoding).
As described above, inconsistency between representation data and metadata is an error. However, the tendency for some agents to attempt silent recovery from such errors is also an error. Silent recovery from error perpetuates what could be easily fixed if the resource owner is simply informed of that error during their own testing of the resource.
Web agents SHOULD have a configuration option that enables the display or logging of detected errors. Such a display need not be disruptive of the user experience; for example, a graphical browser might display a small "bug" button in the user interface to indicate a detected error so that an interested user (i.e., the resource owner) can select the button, inspect the error, and perhaps modify the agent's choice on how to recover from that error.
Some applications of the Web cannot tolerate error. For example, medical information systems must be designed so as to detect errors that might cause relevant information to be rendered invisible. In general, it is better to design Web systems that are capable of fulfilling more stringent requirements, even if their default configuration is to be lenient.
A user agent represents the user for protocol-level interactions with resource providers. A user agent that does not respect the Web protocol specifications can violate user privacy, introduce security holes, and otherwise create confusion. For example, a broken user agent could trigger a security failure by ignoring a received "Content-Type" header with value "text/plain", guessing that representation data is a shell script, and then executing the script on the user's machine without the user's awareness. The other agents in the system (origin server and intermediaries) have sent or forwarded the message with the expectation that the user agent will not attempt to execute the script, at least not without some additional action deliberately chosen by the user. If the user agent violates those expectations, it violates the protections that may have been put in place for the user's self-protection.
Because of those risks, it is an error for an agent to ignore or override authoritative metadata without the consent of the party employing the agent.
Consent does not imply that the receiving agent must interrupt the user and require selection of one option or another. User consent may be achieved in the form of pre-selected configuration options, modes, or selectable user interface toggles, with appropriate reporting to the user when the agent detects an error. Naturally, the appropriate consent mechanism will be unique to each type of receiving agent and application context. It is therefore beyond the scope of this finding to anticipate the range of possible errors and ways in which interface designers might obtain user feedback to address them.
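One way to picture consent through a pre-selected configuration option is the following sketch. The `Preferences` class and its `allow_override` flag are hypothetical; the sketch assumes the agent has already compared the declared media type with what the data appears to be. The inconsistency is always reported, and the override happens only when the user has opted in.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("metadata")


@dataclass
class Preferences:
    # Hypothetical user-controlled option; off unless the user enables it.
    allow_override: bool = False


def choose_media_type(
    declared: str, sniffed: Optional[str], prefs: Preferences
) -> str:
    """Honor the declared type unless the user has consented to overriding it."""
    if sniffed is None or sniffed == declared:
        return declared
    # Report the inconsistency rather than recovering from it silently.
    log.warning("declared %s but data looks like %s", declared, sniffed)
    if prefs.allow_override:
        return sniffed
    return declared
```

With the default preferences, a "text/plain" representation that looks like HTML is still treated as "text/plain", and the mismatch appears in the log for an interested user to inspect.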
Some format specifications allow content authors to provide metadata hints for servers and clients. For instance, the http-equiv attribute of the HTML meta element was intended for servers (not clients). In HTML 2.0 [RFC1866], section 5.2.5, the attribute is specified as follows:
HTTP servers may read the content of the document <head> to generate header fields corresponding to any elements defining a value for the attribute HTTP-EQUIV.
The HTML 4.01 link element has an attribute type that gives clients a hint about the likely media type if one were to retrieve a representation of the identified resource.
The MyFormat specification defines a type attribute on external references that supposedly takes precedence over any other media type received as authoritative metadata. When type is present, receiving agents are instructed to use its value and ignore any conflicting metadata provided by the sender.
The MyFormat specification designers' rationale for this departure from Web architecture is that such a definition of the type attribute allows content authors to work around misconfigured servers. They contend that this is necessary because, in many environments, content authors may not have sufficient access to the server configuration to assign the correct media type where it belongs.
Should the MyFormat specification designers be allowed to ignore a principle of Web architecture and define type in this way just to remedy a potential configuration problem?
Answer: Errors involving inconsistent metadata cannot be "fixed" by adding metadata to external references, because the metadata is inconsistent for all recipients of the message, not just the user agent. An agent that silently overrides server-provided metadata can create security risks and prevent errors from being detected and corrected.
A format specification that includes metadata hints for clients must make clear that, when these hints interact with server metadata, they are advisory only. Format specifications MUST NOT include requirements for clients to override server metadata without user consent.
An architecturally sound description of an advisory attribute might read:
The author may provide a hint to the client about the likely Internet media type of representations of the designated resource. Although the client MUST treat server metadata (including that provided by the file system) as authoritative, the client MAY use the hint in a number of ways, including as a preference when negotiating with the server, as input to a decision to retrieve a representation, or to recover from a misconfigured server. However, the client MUST NOT override the server's authoritative metadata without the consent of the user.
A good example of such a description can be found in the W3C Recommendation Speech Recognition Grammar Specification Version 1.0 [SRGS10], which describes agent behavior that is consistent with this finding in section 2.2.2.
In contrast, the W3C Recommendation Synchronized Multimedia Integration Language (SMIL 2.0) [SMIL20] is inconsistent with this finding. The definition of the type attribute in section 7.3.1 specifies that the value of type takes precedence over authoritative metadata for some protocols. The specification is in error. Under no circumstances can a format specification change the meaning of protocol interaction on the Web. Implementers MUST disregard that statement in SMIL 2.0 and treat the type attribute as merely a means for content selection or for when authoritative metadata is unavailable.
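The advisory role of a type hint discussed in this section can be sketched as two separate decisions. The function names are hypothetical. The hint informs whether to retrieve a representation at all and serves as a fallback when the server supplies no media type, but it never replaces a media type the server did supply.

```python
from typing import Optional, Set


def should_fetch(type_hint: Optional[str], supported: Set[str]) -> bool:
    """Use the author's hint only for pre-access content selection."""
    return type_hint is None or type_hint.strip().lower() in supported


def media_type_after_fetch(
    type_hint: Optional[str], content_type: Optional[str]
) -> Optional[str]:
    """Server-provided metadata is authoritative; the hint is only a fallback."""
    if content_type:
        return content_type.split(";", 1)[0].strip().lower()
    return type_hint.strip().lower() if type_hint else None
```

A client that wants to avoid being misled by a stale hint could skip `should_fetch` and always query the server, at the cost of an extra request.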
The scenarios in this section illustrate some issues that arise when the architectural points described in this finding are ignored.
Stuart runs his own Web server at "http://www.example.org/". He creates an HTML page and means to serve it as "text/html", but misconfigures the Web server so that the content is served via HTTP/1.1 [RFC2616] as "text/plain". Janet's browser retrieves the page and displays the content as plain text. Tim's browser retrieves the page, detects some markup that suggests it is an HTML document (e.g., a <!DOCTYPE declaration or <html> element) and, without informing Tim, proceeds as though the content was declared to be "text/html", rendering it according to the HTML and CSS specifications.
Which party has neglected a principle of Web architecture: Stuart for the server misconfiguration, Tim's browser for silently overriding the HTTP headers from the server, or Janet's browser for not detecting that the content looked like HTML?
Answer: By silently overriding metadata from the representation provider in the HTTP headers, Tim's browser did not respect Web architecture principles that promote shared understanding and security.
Misconfiguration of the server is a fixable error. If Stuart were using Janet's browser, he would see that error immediately and fix it. However, if Stuart uses the same browser as Tim for his testing, Stuart would not be informed of the error. Tim's browser is the culprit here because it misrepresents the resource owner by ignoring the authoritative metadata without Tim's consent. Janet's browser respected the "Content-Type" header field and, in doing so, helps Janet detect a server misconfiguration.
Stuart runs his own Web server at "http://www.example.org/". He creates a text page that describes an example of a security vulnerability in a client-side scripting language using sample code. Since Stuart wants users to read the code, not execute it, he assigns the media type "text/plain" to the representation. Janet's browser retrieves the page and displays the content as plain text. Tim's browser retrieves the page, detects the script language, and executes it, promptly sending a rude message to everyone on Tim's address list (including Tim's mom).
Which party has neglected a principle of Web architecture: Stuart for serving content about a vulnerability or Tim's browser for silently overriding the HTTP headers from the server?
Answer: By silently overriding metadata from the representation provider in the HTTP headers, Tim's browser did not respect Web architecture principles that promote shared understanding and security.
Authoritative metadata is an important aspect of Web architecture. Agents that ignore authoritative metadata are broken, sometimes dangerously so, and should not be used. Software cannot assume that a configuration is wrong just because it is unusual.
Norm publishes an XHTML document that includes this link:
<link href="cool-style" type="text/css" rel="stylesheet"/>
Although the link refers to an XSLT style sheet, Norm has set the type attribute to "text/css". Stuart has configured the Web server so that the style sheet is served via HTTP/1.1 as "application/xslt+xml". With a user agent that understands XSLT but not CSS, Janet requests the content that includes this link. As it interprets the representation data, Janet's user agent reads the type hint and does not fetch the style sheet.
Which party is responsible for the fact that Janet did not receive content she should have: Stuart for the server configuration, Norm for stating that the style sheet is served as "text/css" when in fact it's served with a different media type, or Janet's user agent for not double-checking the media type with the server?
Answer: Norm's mislabeling of content deprived Janet of content she should have received.
Norm is responsible for Janet not having access to representation data she was meant to receive. The HTML 4.01 Recommendation states that "Authors who use [the type] attribute take responsibility to manage the risk that it may become inconsistent with the content available at the link target address." Janet's client could have done more than merely read the type hint and decide to skip the style sheet. Users benefit from clients that allow different configurations for handling hints, including:
Query the server, and when there is an inconsistency, choose the authoritative metadata, or
Query the server, and when there is an inconsistency, prompt the user for instructions on how to proceed.
[unfinished]
The meaning of any HTTP message is defined by the contents of that message as interpreted according to the HTTP standard. If a client requests that a server store a representation at a given URI and the server's configuration states that the given URI implies metadata inconsistent with what has been provided by the client, then the server should reject the request using an appropriate HTTP status code.
In other words, if a WebDAV client performs a
    PUT /something.html HTTP/1.1
    Host: example.org
    Content-type: application/pdf
    ...
and example.org knows that it has been configured such that all resources with identifiers ending in ".html" are represented in the "text/html" format, then the server has four choices:
ignore the "application/pdf" metadata provided by the client, store the representation as-is, and serve it later as "text/html".
change the configuration such that future 200 responses to GET /something.html will be served as "application/pdf", thus preserving the client's stated intent.
accept the request only in the sense of it being a requested change of resource state, resulting in the PDF representation being "converted" to HTML for later responses.
respond with "415 Unsupported Media Type" and a message stating why the request is inconsistent with the resource.
(1) is clearly a bad idea because the inconsistency is an error and failing to report an error is bad design.
(2) may be feasible on some HTTP servers that combine configuration for both authoring and read-only services, but most production HTTP servers do not work that way, and automatically overriding a server configuration is more likely to hide pilot error than to do what the user actually wants.
(3) is a complicated option that preserves REST semantics but not those of a dumb filesystem. It is one of those server-side magic tricks that tends to annoy people who think HTTP is a file protocol, which suits me just fine provided that it isn't mandatory.
(4) properly informs the user of the inconsistency (enabling them to choose the right workaround), works in all cases, but wastes some bandwidth.
Answer: (1) is a bug, (2) is bad implementation, (3) is a nifty feature when the user is making an informed request, and (4) is the right answer in all other cases.
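Choice (4) might be sketched as follows. The `CONFIGURED_TYPES` mapping stands in for a server configuration that ties name extensions to media types, and `check_put` is a hypothetical helper rather than part of any real server's API. The 201 status in the accepting branch assumes that the request creates a new resource.

```python
import posixpath
from typing import Tuple

# Hypothetical server configuration: extension -> media type it implies.
CONFIGURED_TYPES = {".html": "text/html"}


def check_put(path: str, content_type: str) -> Tuple[int, str]:
    """Return (status, reason) for a PUT, rejecting inconsistent metadata."""
    declared = content_type.split(";", 1)[0].strip().lower()
    _, extension = posixpath.splitext(path)
    expected = CONFIGURED_TYPES.get(extension.lower())
    if expected is not None and declared != expected:
        # Tell the client why the request conflicts with the resource.
        reason = (
            f"{path} is configured as {expected}, "
            f"but the request declares {declared}"
        )
        return 415, reason
    # Assumes the target does not exist yet, hence 201 (Created).
    return 201, "Created"
```

Applied to the request above, `check_put("/something.html", "application/pdf")` yields status 415 together with a reason the client can show to the user.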
The TAG is working with the authors of [RFC3023] to revise section 7.1 of that RFC, which suggests behavior regarding character encoding metadata that is inconsistent with this finding.
The first edition of this finding was edited by Ian Jacobs and included substantial input from Roy T. Fielding, Stuart Williams, and Dan Connolly. Martin Dürst, Philipp Hoschka, Rob Lanphier, and Norman Walsh provided reviews of prior drafts that improved this finding. This second edition has additionally benefited from the comments of Noah Mendelsohn.