<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type='text/xsl' href='versioning.xsl'?>
<!DOCTYPE spec PUBLIC
    "-//W3C//DTD Specification V2.10//EN"
    "http://www.w3.org/2002/xmlspec/dtd/2.10/xmlspec.dtd" [
  <!-- ================================================================ -->
  <!ENTITY draft.day "12">
  <!ENTITY draft.month "04">
  <!ENTITY draft.monthname "April">
  <!ENTITY draft.year "2006">
  <!ENTITY iso6.doc.date "&draft.year;-&draft.month;-&draft.day;">
  <!ENTITY http-ident "http://www.w3.org/2001/tag/doc/mime-respect">
]>
<spec w3c-doctype='other' other-doctype='TAG Finding'>
<?CVS $Id: mime-respect.xml,v 1.54 2006/04/12 14:08:17 vquint Exp $?>
<header>
<title>Authoritative Metadata</title>
<w3c-designation>&http-ident;-&iso6.doc.date;</w3c-designation>
<w3c-doctype>TAG Finding</w3c-doctype>
<pubdate><day>&draft.day;</day>
<month>&draft.monthname;</month>
<year>&draft.year;</year>
</pubdate>
<publoc>
<loc href="&http-ident;-&draft.year;&draft.month;&draft.day;">&http-ident;-&draft.year;&draft.month;&draft.day;</loc>
</publoc>
<latestloc><loc href="&http-ident;">&http-ident;</loc></latestloc>
<prevlocs>
<loc href="&http-ident;.xml">XML source</loc>,
<loc href="http://www.w3.org/2001/tag/doc/mime-respect-20060307">07 March
2006</loc>,
<loc href="http://www.w3.org/2001/tag/doc/mime-respect-20051205">05 December 2005</loc> (draft),
<loc href="http://www.w3.org/2001/tag/doc/mime-respect-20040225">25 February 2004</loc>
</prevlocs>
<authlist>
<author>
 <name>Roy T. Fielding</name>
 <affiliation>Day Software</affiliation>
</author>
<author>
 <name>Ian Jacobs</name>
 <affiliation>W3C</affiliation>
</author>
</authlist>
<copyright>
<p>
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</loc> &#xA9; &draft.year;
<loc href="http://www.w3.org/">W3C</loc><sup>&#xAE;</sup>
(<loc href="http://www.lcs.mit.edu/">MIT</loc>,
<loc href="http://www.ercim.org/">ERCIM</loc>,
<loc href="http://www.keio.ac.jp/">Keio</loc>),
All Rights Reserved. W3C
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</loc>,
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</loc>,
<loc href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</loc>, and
<loc href="http://www.w3.org/Consortium/Legal/copyright-software">software licensing</loc>
rules apply.
</p></copyright>

<abstract>

<p>In Web architecture, communication between agents consists of exchanging
messages with predefined syntax and semantics: a shared expectation of
how each message's control data and payload (representation data and
metadata) will be interpreted by the recipient.  When supported by the
communication protocol, the Web architecture uses representation metadata
to indicate the sender's intentions regarding how the recipient should
interpret the representation data.  For example, HTTP and MIME use the value
of the &quot;Content-Type&quot; header field to indicate the Internet
media type of the representation, which influences the dispatching
of handlers and security-related decisions made by recipients of the
message.  In this finding, we review the architectural design choice
that metadata provided in an encapsulating container, such as the metadata
provided in the header fields of a received message, be considered
authoritative.  We examine why recipient behavior that fails to respect
authoritative metadata can be harmful and under what conditions such
behavior is allowed.  Finally, we consider how specification authors and
implementers should incorporate these design constraints into their work.</p>
</abstract>

<status>
<p>This document has been developed by the <loc
href="/2001/tag/">W3C Technical Architecture Group</loc> as a finding to
address the TAG issues
<loc href="http://www.w3.org/2001/tag/ilist#contentTypeOverride-24">contentTypeOverride-24</loc>,
<loc href="http://www.w3.org/2001/tag/ilist#putMediaType-38">putMediaType-38</loc>,
<loc href="http://www.w3.org/2001/tag/ilist.html#RFC3023Charset-21">RFC3023Charset-21</loc>,
and portions of
<loc href="http://www.w3.org/2001/tag/ilist.html#errorHandling-20">errorHandling-20</loc>.
It is an update to the
<loc href="http://www.w3.org/2004/02/23-tag-summary.html#contentTypeOverride-24">previously approved</loc> finding of
<loc href="http://www.w3.org/2001/tag/doc/mime-respect-20040225">25 February 2004</loc>.
Please send comments on this finding to the publicly archived TAG
mailing list <loc href="mailto:www-tag@w3.org">www-tag@w3.org</loc>
(<loc href="http://lists.w3.org/Archives/Public/www-tag/">archive</loc>).</p>

<p>The TAG approved this finding at its <loc href="/2001/tag/2006/04/11-tagmem-minutes.html#item08">11 April 2006 teleconference</loc>.
Publication of this finding does not imply endorsement by the W3C
Membership.
<loc href="/2001/tag/findings">Additional TAG findings</loc>, both
approved and in draft state, are also available. The TAG expects to
incorporate this and other findings into a Web Architecture Document
that will be published according to the process of the <loc
href="/Consortium/Process-20010719/tr#Recs">W3C Recommendation
Track</loc>.</p>

<p>The terms MUST, SHOULD, and SHOULD NOT are used in this document
in accordance with <bibref ref="rfc2119"/>.</p>
</status>

<pubstmt>
<p>World-Wide Web Consortium, TAG Finding, &draft.year;.</p>
</pubstmt>
<sourcedesc>
<p>Created in electronic form.</p>
</sourcedesc>
<langusage>
<language id="EN">English</language>
</langusage>
<revisiondesc>
<slist>
<sitem>&draft.year;-&draft.month;-&draft.day;: Published draft</sitem>
</slist>
</revisiondesc>
</header>
<body>

<div1 id="intro">
<head>Summary of key points</head>

<p>The following are the key architectural points of this finding:</p>

<olist>
<item><p>Metadata received in an encapsulating container, such as the
metadata within the header fields of a message that describe the data
enclosed within that message, is authoritative in defining
the nature of the data received.</p></item>

<item><p>Inconsistency between representation data and metadata is an
error that should be discovered and corrected rather than silently
ignored.</p></item>

<item><p>An agent MUST NOT ignore or override authoritative
metadata without the consent of the party employing the agent.</p></item>

<item><p>Specifications MUST NOT work against the Web architecture
by requiring or suggesting that a recipient override authoritative
metadata without user consent.</p></item>
</olist>
</div1>

<div1 id="what">
<head>Defining authoritative metadata</head>

<p>The sequence of digits "324033" might be a license plate number in
the state of Arkansas or an old-style telephone number in Italy.
Although there do exist some self-descriptive data formats, we generally
rely on context to define the purpose, format, and meaning of data.
One way to provide a context for interpretation is metadata.</p>

<p>Metadata is simply defined as data about other data.
Metadata can be expressed by referencing the data externally, by
encapsulating the data in a container, or by embedding the metadata within
the data being described.  The following table provides examples of how
various forms of metadata can be expressed during Web interactions:</p>

<table border="1" cellpadding="3">
<thead><tr><th colspan="4">metadata</th></tr></thead>
<tbody><tr>
 <th>describes</th>
  <th>how</th>
   <th>where</th>
    <th>example</th>
</tr><tr>
 <td rowspan="3">resource</td>
  <td rowspan="3">external reference</td>
   <td>message fields</td>
    <td>HTTP's "Allow" header field in a response message describes the
        request methods allowed by the resource for which the response
        was generated.</td>
</tr><tr>
   <td>data format</td>
    <td>Link relationship values (rel/rev attributes) are often used to
        describe metadata relationships between resources.</td>
</tr><tr>
   <td>other sources</td>
    <td>RDF can associate metadata with a resource by reference to its URI.</td>
</tr>

<tr>
 <td rowspan="3">message</td>
  <td>encapsulating</td>
   <td>layers</td>
    <td>Protocols are often implemented as a stack of layered protocols,
        with each lower-layer protocol providing context for higher layers.</td>
</tr><tr>
  <td rowspan="2">embedded</td>
   <td>message syntax</td>
    <td>HTTP's response messages begin with "HTTP/" and a version number.</td>
</tr><tr>
   <td>message fields</td>
    <td>HTTP's "Date" header field describes the clock time
        at the origin when the message was generated.</td>
</tr>

<tr>
 <td rowspan="5">representation</td>
  <td rowspan="2">external reference</td>
   <td>identifiers</td>
    <td>Schemes based on old (non-metadata) protocols, such as gopher and ftp,
        include or imply metadata information about the representation as
        part of the identifier.</td>
</tr><tr>
   <td>data format</td>
    <td>Type attributes are sometimes used to express expectations about
        representation types for pre-access content selection.</td>
</tr><tr>
  <td rowspan="2">encapsulating</td>
   <td>message fields</td>
    <td>HTTP and MIME use the value of the &quot;Content-Type&quot; header
        field to indicate the representation's media type.</td>
</tr><tr>
   <td>archival formats</td>
    <td>Archives often include catalog data that associates metadata with
        parts of the archive.</td>
</tr><tr>
  <td>embedded</td>
   <td>data format</td>
    <td>Magic numbers, DOCTYPEs, and XML namespaces are all means for
        making data formats self-descriptive.
        HTML's "META" elements and RDF/XML assertions can describe
        metadata about the enclosing representation.</td>
</tr></tbody>
</table>

<p>The table above demonstrates that the same metadata may be expressed in
various forms.  The representation media type <bibref ref="rfc2046"/>,
in particular, plays such an important role in the Web architecture that its
value can be described in many different locations.  Given multiple sources
of metadata and the possibility that those sources may be inconsistent, an
architect must decide which source of metadata has the highest priority and thus
shall be considered authoritative in determining the desired behavior of the
recipient.  Furthermore, given the presence of self-descriptive data formats, a
decision must be made on whether to respect the declared metadata over whatever
might be learned by inspecting the data itself.</p>

<p role="constraint">Metadata received in an encapsulating container MUST
be considered authoritative and used in preference to metadata found by
inspection of the data, declared by embedded metadata, or provided by
external reference.</p>

<p>For Web architecture, a design choice has been made that metadata
received in an encapsulating container MUST be considered authoritative
and used in preference to metadata found by inspection of the data,
declared by embedded metadata, or provided by external reference.
Although this design choice is generally applicable to any container
format, including archival formats that encapsulate other data, the most
significant interpretation for Web architecture is that representation
metadata found within the header fields of a received message shall be
considered authoritative for the representation encapsulated within
that message.</p>

<p>Representation metadata does not constrain the receiving agent to
process the representation data in one particular way.  What it does is
allow the sender of a representation to express its intentions regarding
how the data should be interpreted by a recipient.  A recipient can then
choose, based on its own purpose, design, and configuration, how it will
react to those intentions on behalf of the party employing the agent.
For example, a browser traversing a link may behave differently
depending on how the link was selected, a maintenance spider may ignore
a data format's rendering instructions, and an editor may treat every
representation as a source for editing rather than display.</p>

<p>This treatment of authoritative metadata applies equally to clients,
servers, and intermediaries.  A server receiving a representation MUST
respect the client's expressed intentions regarding the metadata for
that representation and either act in accord with those intentions or
respond with an appropriate redirection or error message.</p>

</div1>

<div1 id="why">
<head>Why metadata from an encapsulating container is authoritative</head>

<p>The rationale for our choice of authoritative metadata is difficult
to describe using abstractions. Let's consider the specific example
of the media type of a received representation and explain why each of
the other sources of metadata is not considered authoritative.</p>

<div2 id="media-type">
<head>Role of Internet Media Types</head>

<p>An Internet media type <bibref ref="rfc2046"/> is metadata in the form
of a short name (e.g., "text/html") that associates the data with a
specific format specification and preferred interpretation. The association
is formally accomplished through registration of the media type in the
<loc href="http://www.iana.org/assignments/media-types/index.html">IANA
media type registry</loc>.
For example, "text/html" in the IANA registry is associated with
<bibref ref="rfc2854"/>, which in turn states that:</p>

<slist>
<sitem>The text/html media type is now defined by W3C Recommendations;
the latest published version is [HTML401].</sitem>
</slist>

<p>A media type is not simply an indication of data format; it also
refers to a preferred interpretation of that data format.  This preferred
interpretation may impact the recipient's functional decisions, such as
whether the data is rendered, stored, or executed.  In practice,
media types are often used as the key for selecting an appropriate
handler to interpret the data received.  It is possible for a single
data format to be associated with multiple media types and for a single
media type to describe a superset of many different data formats.</p>

<p>As explained above for representation metadata in general, we refer to
the media type as describing the sender's preferred, intended, and
definitive <emph>interpretation</emph> of the data, rather than as defining a
specific processing model for the recipient.  Each agent will interpret
received data according to its own function and configuration, perhaps
informed by the media type, and all that is required for Web interaction
is that the intention be faithfully communicated.  It is assumed that the
recipient software will follow those intentions, when appropriate, to the
extent that it has been instructed to do so by the agent's user.</p>

</div2>

<div2 id="embedded">
<head>Why embedded metadata is less authoritative</head>

<p>If the authoritative media type of a representation were to be determined
by inspection of embedded metadata in a self-descriptive format, then a sender
could not indicate different interpretations for a single representation
based on the declared media type.  For example, an owner might want to provide
links to separate resources that differ only in how a given HTML representation
is intended to be rendered. A message containing the header field
<code>Content-Type: text/html</code> would indicate that the sender intends
the recipient to interpret the representation as hypertext, using the rendering
process defined by the HTML standard, whereas the header field
<code>Content-Type: text/plain</code> would indicate that the sender intends
the recipient to treat the data as plain text without HTML rendering.
Since the representation data is the same in both messages, this
functionality is only possible if metadata of the containing message is
considered more authoritative in describing the data than whatever could be
learned from inspection of the data itself.</p>
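
<p>The two messages described above can be sketched as follows. This is
an illustration only: the date and the markup are hypothetical, and header
fields not relevant to the point are omitted. A response that the sender
intends to be rendered as hypertext might look like this:</p>

<eg>HTTP/1.1 200 OK
Date: Wed, 12 Apr 2006 14:00:00 GMT
Content-Type: text/html

&lt;html&gt;&lt;head&gt;&lt;title&gt;Greeting&lt;/title&gt;&lt;/head&gt;
&lt;body&gt;&lt;p&gt;Hello, &lt;em&gt;world&lt;/em&gt;.&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;</eg>

<p>A response carrying exactly the same representation data, but which the
sender intends to be displayed as source text, would differ only in its
metadata:</p>

<eg>HTTP/1.1 200 OK
Date: Wed, 12 Apr 2006 14:00:00 GMT
Content-Type: text/plain

&lt;html&gt;&lt;head&gt;&lt;title&gt;Greeting&lt;/title&gt;&lt;/head&gt;
&lt;body&gt;&lt;p&gt;Hello, &lt;em&gt;world&lt;/em&gt;.&lt;/p&gt;&lt;/body&gt;&lt;/html&gt;</eg>

<p>A recipient that inspected only the data could not tell these two
messages apart.</p>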

<p>Placing authoritative metadata in message fields also enables more
efficient processing of messages.  It is far easier to dispatch behavior
on the basis of inspecting metadata (typically a short string) than it is
to invoke a generic document parser and try to divine the purpose of data
by inspecting the data itself (with no guarantee of success).</p>

</div2>

<div2 id="external">
<head>Why external reference metadata is least authoritative</head>

<p>If the authoritative media type of a representation were to be determined
by external reference, then resources could be prevented from evolving
independently from their references.  For example, standards for
hypermedia data formats evolve over time, whereas it is preferred
that URIs remain persistent over time.  If metadata guessed by inspecting
the identifier were to be considered authoritative, then references would
break when the representation media type changes.  Similarly, a type attribute
provided with a reference would suffer the same problem if it were considered
authoritative.</p>

<p>Intermediaries (i.e., proxies and gateways) perform significant functions
in Web architecture, such as encapsulating legacy services, enhancing client
functionality, and moderating the risk of interactions across firewalls.
Those functions can only be performed correctly if the semantics of a given
message are expressed within that message.  In contrast, metadata associated
by an external reference is only visible to the user agent that selects the
reference: intermediaries are not aware of that context.  If a message
recipient treats external metadata as authoritative over that found in
the message, then the intermediaries are effectively bypassed and their
functionality is lost.</p>

<p>Finally, external references are usually made by third parties: people
who are neither the resource owner nor the user. Allowing a third party
to override the intent of the sender of a message means that the client
must trust both the resource owner and the supplier of the reference,
introducing yet another attack vector and its associated complications
to secure configuration and monitoring.</p>

</div2>

<div2 id="missing">
<head>What to do when there is no authoritative metadata</head>

<p>There are, of course, times when a representation is provided without
any containing metadata, such as when the sender is not certain of the
intended metadata or when the protocol being used does not support metadata.
That is why the HTTP/1.1 specification <bibref ref="rfc2616"/> states:</p>

<eg><quote>If and only if the media type is not given by a "Content-Type"
field, the recipient MAY attempt to guess the media type via inspection of
its content and/or the name extension(s) of the URI used to identify the
resource.</quote></eg>

<p>In other words, when there is no authoritative metadata, the receiving
agent MAY attempt to guess the appropriate metadata based on inspection
of the data and/or the reference, though such guessing should be limited
to media types that are safe to use in that context.</p>
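
<p>As a rough illustration (the URI, date, and content are hypothetical, and
unrelated header fields are omitted), the following response carries no
"Content-Type" header field:</p>

<eg>GET /notes/readme.txt HTTP/1.1
Host: www.example.org

HTTP/1.1 200 OK
Date: Wed, 12 Apr 2006 14:00:00 GMT

Release notes for the example project.</eg>

<p>Here the recipient might reasonably guess "text/plain" from the content
and the ".txt" name extension. Had the same data arrived with a
"Content-Type" header field, that field would be authoritative and no
guessing would be permitted.</p>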

</div2>
</div1>

<div1 id="overriding">
<head>Overriding authoritative metadata</head>

<p>Recognition of authoritative metadata is important because it
influences the default behavior for Web interactions.  However,
representation metadata is also susceptible to misconfiguration, and
user agents frequently try to &quot;simplify&quot; the Web by automatically
&quot;correcting&quot; perceived &quot;errors&quot; in those configurations.
</p>

<p>Recipients SHOULD detect inconsistencies between representation data
and metadata but MUST NOT resolve them without the
<loc href="#consent">consent of the user</loc>.
Choosing to ignore or override authoritative metadata is only allowed
within the Web architecture when the user has given consent.</p>

<div2 id="inconsistency">
<head>Inconsistency between representation data and metadata</head>

<p>Although there are benefits to separating representation metadata
from data, there are risks as well. In particular, the resource owner
may create inconsistencies by misconfiguring resources or by failing to
reassign metadata after a change of representation.
Inconsistency between representation data and metadata is an error.</p>

<p role="practice">Recipients SHOULD detect inconsistencies between
representation data and metadata.</p>

<p>Examples of inconsistencies between metadata and representation data
that are frequently observed on the Web include:</p>

<ulist>
<item><p>The character encoding of text-based content being inconsistent
with metadata about the character encoding. For some formats, such as XML,
such inconsistencies can be quickly detected.</p></item>

<item><p>Server-wide default metadata being incorrectly assigned to new or
rarely-used media types or content encodings.</p></item>

<item><p>Superset media types being used when a more specific media type
is intended, such as the use of "application/xml" when there exists a
more specific media type corresponding to the root element.</p></item>
</ulist>
</div2>

<div2 id="reducing-inconsistency">
<head>Reducing inconsistency</head>

<p>Web software developers, webmasters, and resource owners can help
reduce inconsistency through careful assignment of representation metadata.
</p>

<p role="practice">Server software designers (implementers) SHOULD provide
a means to set representation metadata at the same level of granularity and
permission that is needed to author those representations.</p>

<p>Metadata configuration needs to be authored by the same people who have
the ability to change the data being described. If all of the authoring is
done by the webmaster, then it makes sense to have one central location for
defining the metadata configuration.  In contrast, if the right to author
representations has been delegated, such as through varying ownership within
the server's hierarchical URI space, then the ability to author metadata
configuration should be delegated as well.</p>

<p role="practice">Server managers (webmasters) SHOULD provide each resource
owner (author) with the means and permission to set the configuration of
metadata for any representations under the author's control.</p>

<p>For example, the Apache httpd has a configuration directive,
<loc href="http://httpd.apache.org/docs/2.2/mod/core.html#allowoverride">AllowOverride FileInfo</loc>,
which delegates the authority to define metadata to the owners of each
directory.  It follows, therefore, that "AllowOverride FileInfo" should be set
for any directory containing representations that are authored by people who
do not have permission to change the central server configuration.</p>
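
<p>A minimal sketch of such delegation follows. It assumes an Apache httpd
2.2 installation with mod_mime enabled; the directory path and the
extension mappings are hypothetical. The central configuration might
contain:</p>

<eg># httpd.conf (maintained by the webmaster)
&lt;Directory "/var/www/users"&gt;
    # Let per-directory .htaccess files set document metadata.
    AllowOverride FileInfo
&lt;/Directory&gt;</eg>

<p>The owner of a directory beneath that path could then declare metadata
for their own representations in a <code>.htaccess</code> file:</p>

<eg># .htaccess (maintained by the resource owner)
AddType image/svg+xml .svg
AddCharset utf-8 .html</eg>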

<p role="practice">Resource owners (authors) SHOULD test for correct metadata
and inform server managers of metadata misconfigurations.</p>

<p>This requires that authors be able to detect errors, which will be
discussed below.</p>

<p role="practice">Server software designers (implementers) SHOULD NOT specify
default representation metadata, such as media type, character encoding, or
content language, within the standard configuration shipped with the server.
</p>

<p>Instead of specifying a default for metadata, it is better for
representations to be sent without that metadata.  That allows the recipient
to guess the metadata instead of being forced to either accept incorrect
metadata or be tempted to violate Web architecture by ignoring it.</p>

<p role="practice">Server managers (webmasters) SHOULD NOT specify an arbitrary
Internet media type (e.g., "text/plain" or "application/octet-stream") when the
media type is unknown.</p>

<p>It is better to send no media type if the resource owner has failed to
define one for a given representation.</p>

<p role="practice">Authoritative metadata SHOULD NOT be provided external to
the representation if it does not add clarity to that communication.</p>

<p>For example, the character encoding of XML data formats is self-descriptive
within the data and SHOULD NOT be included in a charset parameter of the
media type unless that distinction is significant to the resource (e.g., for
comparison during content negotiation of multiple XML representations
that differ only by character encoding).</p>
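
<p>As an illustration (the document content is hypothetical and unrelated
header fields are omitted), the following message leaves the character
encoding to the XML encoding declaration instead of repeating it in a
charset parameter:</p>

<eg>HTTP/1.1 200 OK
Content-Type: application/xml

&lt;?xml version="1.0" encoding="iso-8859-1"?&gt;
&lt;greeting&gt;Hello&lt;/greeting&gt;</eg>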

</div2>

<div2 id="silent-recovery">
<head>Avoiding silent recovery</head>

<p>As described above, inconsistency between representation data and
metadata is an error.  However, the tendency for some agents to attempt
silent recovery from such errors is also an error.  Silent recovery
perpetuates errors that could be easily fixed if the resource owner
were simply informed of them during their own testing of the resource.</p>

<p role="practice">Web agents SHOULD have a configuration option that enables
the display or logging of detected errors.</p>

<p>Revealing errors when they occur need not disrupt the user
experience. For example, a graphical browser might display a small "bug"
button in the user interface to indicate a detected error so that an
interested user (e.g., the resource owner) can select the button, inspect
the error, and perhaps modify the agent's choice on how to recover from
that error.  Naturally, the appropriate mechanism will be unique to each type
of receiving agent and application context.</p>

<p>Some applications of the Web cannot tolerate error.  For example,
medical information systems must be designed so as to detect errors that
might cause relevant information to be rendered invisible.  In general, it is
better to design Web systems that are capable of fulfilling more stringent
requirements, even if their default configuration is to be lenient.</p>
</div2>

<div2 id="consent">
<head>Obtaining user consent</head>

<p>A user agent represents the user for protocol-level interactions
with resource providers. A user agent that does not respect the Web
protocol specifications can violate user privacy, introduce security
holes, and otherwise create confusion. For example, a broken user agent could
trigger a security failure by ignoring a received "Content-Type" header with
value "text/plain", guessing that representation data is a shell script,
and then executing the script on the user's machine without the user's
awareness. The other agents in the system (origin server and intermediaries)
have sent or forwarded the message with the expectation that the user
agent will not attempt to execute the script, at least not without
some additional action deliberately chosen by the user.  If the user
agent violates those expectations, it undermines the safeguards that may
have been put in place to protect the user.</p>

<p role="constraint">An agent MUST NOT ignore or override authoritative
metadata without the consent of the party employing the agent.</p>

<p>Consent does not imply that the receiving agent must interrupt
the user and require selection of one option or another.
User consent may be achieved in the form of pre-selected
configuration options, modes, or selectable user interface toggles,
with appropriate reporting to the user when the agent detects an error.
Naturally, the appropriate consent mechanism will be unique to each type
of receiving agent and application context.  It is therefore
beyond the scope of this finding to anticipate the range of possible
errors and ways in which interface designers might obtain user
feedback to address them.</p>

<p>Likewise, consent may be implied by the nature or type of interaction
being performed by the agent.  For example, a script that "mirrors" content
from the Web into files on an FTP server is probably going to ignore
metadata.  Similarly, XInclude <bibref ref="XInclude"/> processing has the
implied consent of the user to transform data from one source to another
and thus should only result in errors when the transformation is unsuccessful.
Note, however, that this functionality imposes a social burden on XInclude
processors to be sure that the resulting composed document does not violate
the user's security constraints.</p>

</div2>
</div1>

<div1 id="metadata-hints">
<head>Metadata hints in specifications</head>

<p>Some format specifications allow content authors to provide
metadata hints for servers and clients. For instance, the
<code>http-equiv</code> attribute of the HTML <code>meta</code>
element was intended for servers (not clients). In HTML 2.0 <bibref
ref="rfc1866"/>, section 5.2.5, the attribute is specified as
follows:</p>

<slist>
<sitem>HTTP servers may read the content of the document
&lt;head&gt; to generate header fields corresponding to any elements
defining a value for the attribute HTTP-EQUIV.</sitem>
</slist>

<p>The HTML 4.01 <code>link</code> element has an attribute <code>type</code>
that gives clients a hint about the likely media type if one were to
retrieve a representation of the identified resource.</p>
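
<p>For instance, in the following fragment (the URI and title are
hypothetical), the <code>type</code> attribute suggests that retrieving the
linked resource is likely to yield a PDF representation. It is only a
hint: the media type declared in the response to that retrieval remains
authoritative.</p>

<eg>&lt;link rel="alternate" type="application/pdf"
      href="http://www.example.org/report.pdf" title="Printable version"&gt;</eg>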

<p role="constraint">Specifications MUST NOT work against the Web architecture
by requiring or suggesting that a recipient override authoritative metadata
without user consent.</p>

<p>A format specification that includes metadata hints for clients
must make clear that, when these hints interact with server metadata,
they are advisory only. These hints provide metadata by external reference
and thus will not be known to all of the other (intermediary) recipients
of the representation. Errors involving inconsistent metadata cannot be
"fixed" by adding metadata to external references, since the metadata
is inconsistent for all recipients of the message (not just the user agent).
An agent that silently overrides server-provided metadata can create
security risks and prevent errors from being detected and corrected.</p>

<p>An architecturally sound description of an advisory attribute might
read:</p>

<slist>
<sitem>The author may provide a hint to the client about
the likely Internet media type of representations of the designated
resource. Although the client MUST treat server metadata (including
that provided by the file system) as authoritative, the client MAY use
the hint in a number of ways, including as a preference when
negotiating with the server, as input to a decision to retrieve a
representation, or to recover from a misconfigured server. However,
the client MUST NOT override the server's authoritative metadata
without the consent of the user.</sitem>
</slist>

<p>A good example of such a description can be found in the W3C
Recommendation <emph>Speech Recognition Grammar Specification
Version 1.0</emph> <bibref ref="SRGS10"/>, which describes agent behavior
that is consistent with this finding in
<loc href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/#S2.2.2">section 2.2.2</loc>.</p>

<p>In contrast, the W3C Recommendation <emph>Synchronized Multimedia
Integration Language (SMIL 2.0)</emph> <bibref ref="SMIL20"/> is
inconsistent with this finding. The definition of the <code>type</code>
attribute in
<loc href="http://www.w3.org/TR/2005/REC-SMIL2-20050107/extended-media-object.html#adef-media-type">section 7.3.1</loc>
specifies that the value of <code>type</code> takes precedence over
authoritative metadata for some protocols.  That specification is in error.
Under no circumstances can a format specification change the meaning of
protocol interaction on the Web. Implementers MUST disregard that statement
in SMIL 2.0 and treat the type attribute as merely a means for
content selection or for when authoritative metadata is unavailable.
The error has been corrected in SMIL 2.1 <bibref ref="SMIL21"/>.</p>

</div1>

<div1 id="scenarios">
<head>Scenarios</head>

<p>The scenarios in this section illustrate some issues that arise when
the architectural points described in this finding are ignored.</p>

<div2 id="bad-config-scenario">
<head>Bad server configuration</head>

<p>Stuart runs his own Web server at "http://www.example.org/". He
creates an HTML page and means to serve it as "text/html", but
misconfigures the Web server so that the content is served via
HTTP/1.1 <bibref ref="rfc2616"/> as "text/plain".  Janet's browser
retrieves the page and displays the content as plain text.
Tim's browser retrieves the page, detects some markup that suggests
it is an HTML document (e.g., a <code>&lt;!DOCTYPE</code> declaration or
<code>&lt;html&gt;</code> element) and, without informing Tim,
proceeds as though the content had been declared to be "text/html", rendering it
according to the HTML and CSS specifications.
</p>

<p>Which party has neglected a constraint of Web architecture: Stuart
for the server misconfiguration, Tim's browser for silently overriding
the HTTP headers from the server, or Janet's browser for not detecting
that the content looked like HTML?</p>

<p>Answer: By silently overriding the authoritative metadata from the
HTTP headers, Tim's browser did not respect Web architecture constraints
that promote shared understanding and security.</p>

<p>Misconfiguration of the server is a fixable error.  If Stuart had been
using Janet's browser to test, he would have seen the error immediately and
fixed it long before either Tim or Janet made their requests.  However, if
Stuart used the same browser as Tim for his testing, Stuart would not have
been informed of the error.  The developers of Tim's browser are
the culprits here because the product misrepresents the resource owner by
ignoring the authoritative metadata without Tim's consent. Janet's browser
respected the "Content-Type" header field and, in doing so, helped Janet
detect a server misconfiguration.</p>
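
<p>The difference between the two browsers can be sketched in code. The
following Python fragment is only an illustration and does not describe any
actual browser: the function names <code>parse_media_type</code>,
<code>looks_like_html</code>, and <code>choose_media_type</code>, the
<code>ask_user</code> callback, the 512-byte inspection window, and the
"application/octet-stream" fallback are all assumptions made for this
example. The sketch treats the declared media type as authoritative and acts
on a guess only when the user consents or when no metadata was supplied.</p>

<eg><![CDATA[
```python
def parse_media_type(content_type):
    """Return the lowercase media type from a Content-Type value, or None."""
    if not content_type:
        return None
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type or None


def looks_like_html(body):
    """Heuristic only: does the start of the body resemble HTML markup?"""
    head = body[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def choose_media_type(content_type, body, ask_user):
    """Pick the media type used to interpret a response body.

    ask_user(declared, guessed) is a hypothetical callback that returns
    True only if the user agrees to override the declared type.
    """
    declared = parse_media_type(content_type)
    guessed = "text/html" if looks_like_html(body) else None
    if declared is None:
        # No authoritative metadata was supplied, so guessing is acceptable.
        return guessed or "application/octet-stream"
    if guessed is not None and guessed != declared and ask_user(declared, guessed):
        # The user consented to overriding the server's declaration.
        return guessed
    # Default: respect the authoritative metadata.
    return declared
```
]]></eg>

<p>A browser that never overrides, like Janet's, yields the same result as a
callback that always declines. Tim's browser behaves as though the callback
had answered yes without ever asking him.</p>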

</div2>

<div2 id="good-config-scenario">
<head>Good server configuration</head>

<p>Stuart runs his own Web server at "http://www.example.org/". He
creates a text page that describes an example of a security vulnerability
in a client-side scripting language using sample code.  Since Stuart
wants users to read the code, not execute it, he assigns the media type
"text/plain" to the representation.  Janet's browser retrieves the page
and displays the content as plain text.
Tim's browser retrieves the page, detects the script language, and
executes it, promptly sending a rude message to everyone on
Tim's address list (including Tim's mom).</p>

<p>Which party has neglected a constraint of Web architecture: Stuart
for serving content about a vulnerability or Tim's browser for silently
overriding the HTTP headers from the server?</p>

<p>Answer: By silently overriding the authoritative metadata from the
HTTP headers, Tim's browser did not respect Web architecture constraints
that promote shared understanding and security.</p>

<p>Authoritative metadata is an important aspect of Web architecture.
Agents that ignore authoritative metadata are broken, sometimes
dangerously so, and should not be used. Software cannot assume that
a configuration is wrong just because it is unusual.</p>

</div2>

<div2 id="hint-scenario">
<head>Inconsistent metadata hints</head>

<p>Norm publishes an XHTML document that includes this link:</p>

<eg><![CDATA[
<link href="cool-style" type="text/css" rel="stylesheet"/>
]]></eg>

<p>Although the link refers to an XSLT style sheet, Norm has set the
<code>type</code> attribute to "text/css". Stuart has configured the
Web server so that representations of the resource "cool-style" are
served via HTTP/1.1 as "application/xslt+xml". With a user agent that
understands XSLT but not CSS, Janet requests the content that includes
this link. As it interprets the representation data, Janet's user agent
reads the <code>type</code> hint and does not fetch the style sheet.</p>

<p>Which party is responsible for the fact that Janet did not receive
content she should have: Stuart for the server configuration, Norm for
stating that the style sheet is served as "text/css" when in fact it's
served with a different media type, or Janet's user agent for not
double-checking the media type with the server?</p>

<p>Answer: Norm's mislabeling of content deprived
Janet of content she should have received.</p>

<p>Norm is responsible for Janet not having access to representation data
she was meant to receive. The HTML 4.01 Recommendation states that
<quote>Authors who use [the <code>type</code>] attribute take
responsibility to manage the risk that it may become inconsistent with
the content available at the link target address.</quote> Janet's
client could have done more than merely read the <code>type</code>
hint and decide to skip the style sheet, but the specific purpose of
that hint is to reduce unnecessary requests and the associated latency.</p>

<p>Users often benefit from agents that perform metadata consistency
checks as part of special "authoring" or "testing" modes.  Such checks
might query the server and check for inconsistency, thus allowing the
metadata to be tested by authors without incurring overhead during
operation by normal users.</p>
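
<p>A consistency check of this kind might look like the following Python
sketch. The function name <code>check_type_hint</code> and the wording of
the warning are assumptions made for illustration. The sketch also assumes
that the authoring tool has already retrieved the "Content-Type" value for
the link target, for instance with a HEAD request.</p>

<eg><![CDATA[
```python
def check_type_hint(hinted_type, served_content_type):
    """Compare a link's type hint with the served Content-Type value.

    Returns None when the two agree, otherwise a warning for the author.
    Media type parameters (such as charset) are ignored in the comparison.
    """
    hinted = hinted_type.split(";", 1)[0].strip().lower()
    served = served_content_type.split(";", 1)[0].strip().lower()
    if hinted == served:
        return None
    return (
        f"type hint {hinted!r} disagrees with served media type {served!r}; "
        "the served type is authoritative"
    )
```
]]></eg>

<p>Applied to Norm's link, the check would report that the "text/css" hint
disagrees with the "application/xslt+xml" type configured by Stuart, giving
Norm the chance to correct the hint before Janet ever requests the page.</p>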

</div2>

<div2 id="dav-scenario">
<head>Conflicting metadata during distributed authoring</head>

<p>The meaning of any HTTP message is defined by the contents of that
message as interpreted according to the HTTP standard.  If a client requests
that a server store a representation at a given URI and the server's
configuration states that the given URI implies metadata inconsistent
with what has been provided by the client, then the server should reject
the request using an appropriate HTTP status code.</p>

<p>For example, if a WebDAV client performs a</p>

<eg><![CDATA[
   PUT /something.html HTTP/1.1
   Host: example.org
   Content-Type: application/pdf
   ...
]]></eg>

<p>and example.org knows that it has been configured such that all
resources with identifiers ending in ".html" are represented
in the "text/html" format (i.e., the server has been configured not
to simply accept whatever the client wants for any given identifier),
then the server could handle the request in one of four ways:</p>

<olist>
<item><p>ignore the "application/pdf" metadata provided by the client, store
the representation as-is, and serve it later as "text/html".</p></item>

<item><p>change the configuration such that future 200 responses to 
<code>GET /something.html</code> will be served as "application/pdf",
thus preserving the client's stated intent.</p></item>

<item><p>accept the request only in the sense of it being a requested change of
resource state, meaning that the PDF representation is automatically converted
to HTML for use by later responses.</p></item>

<item><p>respond with "415 Unsupported Media Type" and a message stating why the
request is inconsistent with the resource.</p></item>
</olist>

<p>Ignoring the "application/pdf" metadata provided by the client (1) is
clearly a bad idea because the inconsistency is an error and failing to
report an error is bad design.</p>

<p>Automatically changing the conflicting configuration (2) is appropriate
if and only if the author has the ability to selectively override the server's
configuration on a per-representation basis, has configured their Web space
to do so, and the result of accepting the PUT does change that configuration.
The primary use-case for this style of override is to continue supporting
well-known "cool" URIs even though the identifier appears to contain metadata
that is inconsistent with the current media type.  A better solution, though,
is to simply redirect the old identifier to a new URI that does not contain
an apparent file extension.  Unfortunately, the main problem with accepting
the override is that the inconsistency may have been due to pilot error
rather than user intention.  A good rule of thumb is to provide this behavior
as a configuration option that is not the default.</p>

<p>Performing on-the-fly type conversion, as in (3), is a complicated option
that preserves Web semantics but can lead to unexpected results for authors
who consider the Web interface to be just another dumb filesystem.  This
should only be done when the resource owner has specifically configured the
resource (or space of resources) to process state changes in this manner.
A better solution is to redirect the user to a codependent resource that
provides "application/pdf" views of the shared state; the user can then
choose whether or not to apply the state change to that resource, which
will have a metadata configuration consistent with the representation
being PUT, and thus preserve both Web and filesystem semantics.</p>

<p>Responding with a "415 Unsupported Media Type" error (4) is, in most
cases, the right answer unless the server has been specifically configured
for options (2) or (3).  Although it costs time and bandwidth, responding
with an informative error message allows the user to inspect both the
request being made and the server's current configuration, change
whichever one is incorrect, and thereby establish the correct metadata
for the resource's representations <emph>before</emph> allowing the PUT
to succeed.</p>
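
<p>The default policy of option (4) can be sketched as follows. This Python
fragment is a minimal illustration, not a description of any particular
server: the <code>CONFIGURED_TYPES</code> table, the function name
<code>check_put</code>, and the message format are hypothetical. A request
that supplies no media type at all is treated here as raising no conflict,
which is likewise an assumption.</p>

<eg><![CDATA[
```python
import posixpath

# Hypothetical server configuration: identifier suffix -> required media type.
CONFIGURED_TYPES = {".html": "text/html"}


def check_put(path, content_type):
    """Return None if the PUT metadata is consistent with the configuration,
    otherwise a (status, message) pair describing the conflict."""
    suffix = posixpath.splitext(path)[1].lower()
    expected = CONFIGURED_TYPES.get(suffix)
    supplied = (content_type or "").split(";", 1)[0].strip().lower()
    if expected is None or not supplied or supplied == expected:
        # Nothing configured for this identifier, no client metadata,
        # or the two agree: no conflict to report.
        return None
    message = (
        f"{path} is configured as {expected!r}, "
        f"but the request declared {supplied!r}"
    )
    return 415, message
```
]]></eg>

<p>For the request shown earlier, <code>check_put("/something.html",
"application/pdf")</code> reports a 415 conflict, and the message names both
the configured and the declared media type so that the user can decide which
one to correct.</p>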

</div2>
</div1>

<div1 id="future-work">
<head>Future Work</head>

<p>The TAG is working with the authors of <bibref ref="rfc3023"/>
to revise section 7.1 of that RFC, which suggests behavior regarding
character encoding metadata that is inconsistent with this finding.
More information on that issue
(<loc href="http://www.w3.org/2001/tag/ilist.html#RFC3023Charset-21">RFC3023Charset-21</loc>)
can be found in the TAG finding on
<loc href="http://www.w3.org/2001/tag/2002/0129-mime">Internet Media Type
registration, consistency of use</loc> <bibref ref="TAG-21"/>.</p>

</div1>

<div1 id="references">
<head>References</head>

<blist>

<bibl id="iana" href="http://www.iana.org/" key="IANA">Internet
Assigned Numbers Authority (IANA)</bibl>

<bibl id="rfc1866" href="http://www.ietf.org/rfc/rfc1866"
key="RFC1866">T. Berners-Lee, D. Connolly. <titleref>Hypertext Markup
Language - 2.0</titleref>, RFC1866, November 1995.</bibl>

<bibl id="rfc2046" href="http://www.ietf.org/rfc/rfc2046.txt"
key="RFC2046">N. Freed, N. Borenstein. <titleref>Multipurpose Internet
Mail Extensions (MIME) Part Two: Media Types</titleref>, RFC2046,
November 1996.</bibl>

<bibl id="rfc2119" href="http://www.ietf.org/rfc/rfc2119.txt"
key="RFC2119">S. Bradner. <titleref>Key words for use in RFCs to
Indicate Requirement Levels</titleref>, RFC2119, March 1997.</bibl>

<bibl id="rfc2616" href="http://www.ietf.org/rfc/rfc2616.txt"
key="RFC2616">R. Fielding, J. Gettys, J. Mogul, H. Frystyk,
L. Masinter, P. Leach, T. Berners-Lee. <titleref>Hypertext Transfer
Protocol -- HTTP/1.1</titleref>, RFC2616, June 1999.</bibl>

<bibl id="rfc2854" href="http://www.ietf.org/rfc/rfc2854.txt"
key="RFC2854">D. Connolly, L. Masinter. <titleref>The 'text/html'
Media Type</titleref>, RFC2854, June 2000.</bibl>

<bibl id="rfc3023" href="http://www.ietf.org/rfc/rfc3023.txt"
key="RFC3023">M. Murata, S. St. Laurent, D. Kohn. <titleref>XML Media
Types</titleref>, RFC3023, January 2001.</bibl>

<bibl id="SMIL20" href="http://www.w3.org/TR/2005/REC-SMIL2-20050107/"
key="SMIL20">J. Ayars et al. <titleref>Synchronized Multimedia Integration
Language (SMIL 2.0), Second Edition</titleref>, W3C Recommendation, 7 January
2005.</bibl>

<bibl id="SMIL21" href="http://www.w3.org/TR/2005/REC-SMIL2-20051213/"
key="SMIL21">D. Bulterman et al., eds. <titleref>Synchronized Multimedia Integration
Language (SMIL 2.1)</titleref>, W3C Recommendation, 13 December 2005.</bibl>

<bibl id="SRGS10" href="http://www.w3.org/TR/2004/REC-speech-grammar-20040316/"
key="SRGS10">A. Hunt, S. McGlashan, eds. <titleref>Speech Recognition Grammar
Specification Version 1.0</titleref>, W3C Recommendation, 16 March 2004.</bibl>

<bibl id="TAG-21" href="http://www.w3.org/2001/tag/2002/0129-mime"
key="TAG-21">T. Bray, ed. <titleref>Internet Media Type registration,
consistency of use</titleref>, W3C TAG Finding, 4 September 2002.</bibl>

<bibl id="XInclude" href="http://www.w3.org/TR/xinclude/"
key="XInclude">J. Marsh, D. Orchard, eds. <titleref>XML Inclusions (XInclude)
Version 1.0</titleref>, W3C Recommendation, 20 December 2004.</bibl>

</blist>

</div1>

<div1 id="acks">
<head>Acknowledgments</head>

<p>The first edition of this finding was edited by Ian Jacobs and
included substantial input from Roy T. Fielding, Stuart Williams, and
Dan Connolly.  Martin Dürst, Philipp Hoschka, Rob Lanphier, and Norman Walsh
provided reviews of prior drafts that improved this finding. This second
edition has additionally benefited from the comments of Noah Mendelsohn,
Mark Baker, and Julian Reschke.</p>

</div1>

</body>
</spec>
