<?xml version="1.0" encoding="UTF-8"?>
<!-- <!DOCTYPE spec SYSTEM "C:\XMLspec\spec-prod\dtd\xmlspec.dtd" [ -->
<!DOCTYPE spec SYSTEM "http://www.w3.org/2002/xmlspec/dtd/2.3/xmlspec.dtd" [
  <!-- ================================================================ -->
  <!ENTITY draft.day "07">
  <!ENTITY draft.month "11">
  <!ENTITY draft.monthname "November">
  <!ENTITY draft.year "2006">
  <!ENTITY iso6.doc.date "&draft.year;-&draft.month;-&draft.day;">
  <!ENTITY http-ident "http://www.w3.org/2001/tag/doc/metadataInURI-31">
]>

<?xml-stylesheet type="text/xsl" href="./metaDataInURI.xsl"?>


<spec xmlns:xlink="http://www.w3.org/1999/xlink">
  <header>
    <title>The use of Metadata in URIs</title>
    <w3c-designation>http://www.w3.org/2001/tag/doc/metaDataInURI-20061107</w3c-designation>
    <w3c-doctype>DRAFT TAG Finding</w3c-doctype>
    <pubdate>
      <day>07</day>
      <month>November</month>
      <year>2006</year>
    </pubdate>
    <publoc>
      <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061107.html" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061107.html</loc>
    </publoc>
    <latestloc>
      <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31</loc> 
  ( 
  <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061107.xml" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">XML</loc> 
  ) 
  </latestloc>
    <prevlocs>
      Unapproved Editors Drafts:       <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061001.html" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061001.html</loc>, <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061016.html" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061016.html</loc>
, <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20060609.html" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31-20060609.html</loc>, <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20060511.html" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31-20060511.html</loc>, <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030708.html" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030708.html</loc>, <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030704.html" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030704.html</loc> (W3C Member-only)
    </prevlocs>
    <authlist>
      <author>
        <name>Noah Mendelsohn</name>
        <email href="mailto:noah_mendelsohn@us.ibm.com">noah_mendelsohn@us.ibm.com</email>
      </author>
      <author>
        <name>Stuart Williams</name>
        <email href="mailto:skw@hp.com">skw@hp.com</email>
      </author>
    </authlist>
    <status>
      <p>Editors' DRAFT</p>
      <p>This document has been developed for discussion by the <loc href="http://www.w3.org/2001/tag/" xlink:actuate="onRequest" xlink:type="simple" xlink:show="replace">W3C Technical Architecture Group</loc>.  This
finding addresses the TAG issue <loc href="http://www.w3.org/2001/tag/ilist#metadataInURI-31" xlink:actuate="onRequest" xlink:type="simple" xlink:show="replace">metadataInURI-31</loc>.</p>
      <p>The content of this document is intended for discussion and does NOT necessarily represent a consensus position of the TAG.  
The <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20061016.html" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">previous draft</loc> of this finding was <loc href="http://www.w3.org/2001/tag/2006/09/19-minutes.html#item04" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">discussed at the TAG teleconference of 19 September 2006</loc>.
At that meeting, it was agreed to draft one additional section on malicious
use of misleading metadata in URIs.
This version contains 
<loc href="#confusingmalicious" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">
an initial draft of that section</loc>,
as well as a few other minor corrections.
</p>
<p>
An <loc href="http://www.w3.org/2001/tag/2006/02/metadatainURI31Roadmap.html" xlink:actuate="onRequest" xlink:type="simple" xlink:show="replace">informal guide to previous discussion of this topic</loc> is available and may be useful to reviewers of this draft. </p>
<p>The terms MUST, MUST NOT, SHOULD, and SHOULD NOT are used in this document in accordance with <bibref ref="RFC2119"/>.</p>
      <p>Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.</p>
      <p>
        <loc href="http://www.w3.org/2001/tag/findings" xlink:actuate="onRequest" xlink:type="simple" xlink:show="replace">Additional TAG findings</loc>, both approved and in draft state, may also be available. </p>
      <p>Please send comments on this finding to the publicly archived TAG mailing list <loc href="mailto:www-tag@w3.org" xlink:actuate="onRequest" xlink:show="replace" xlink:type="simple">www-tag@w3.org</loc>
(<loc href="http://lists.w3.org/Archives/Public/www-tag/" xlink:actuate="onRequest" xlink:show="replace" xlink:type="simple">archive</loc>).</p>
    </status>
    <abstract/>
    <langusage>
      <language/>
    </langusage>
    <revisiondesc>
      <p/>
    </revisiondesc>
  </header>
  <body>
    <div1>

      <head>Introduction</head>
<p>
Web-based software uses URIs to designate resources.
The authority that creates a URI is responsible for ensuring that it is associated with the intended resource,
and that operations targeted to the URI manipulate or return the appropriate data.
Many URI schemes offer a flexible structure that can also be used to carry additional information,
called metadata, about the resource.
Such metadata might include
the title of a document,
the creation date of the resource, the MIME media type that is likely
to be returned by an HTTP GET, a digital signature usable to
verify the integrity or authorship of the resource content, or hints
about URI assignment policies that would allow one to guess the URIs
for related resources.
</p>
<p>
This finding addresses several questions regarding such metadata in URIs:<olist>
          <item>
            <p>What information about a resource can or should be embedded 
in its URI?</p>
          </item>
          <item>
            <p>What metadata can be <emph>reliably</emph> determined 
from a URI, and in what circumstances is it appropriate to rely on the correctness of such information?</p>
          </item>
          <item>
            <p>In what circumstances is it appropriate to use information from a URI as a hint as to the nature of a resource or its representations?</p>
          </item>
        </olist>
The first question is primarily of concern to URI assignment authorities, who must choose a suitable URI for each resource they control.
The other questions are focused on people and software making use of 
URIs, whether at the resource authority or elsewhere.
  Of course, the questions are related insofar as one reason for an authority to encode metadata is for the benefit of resource users.
</p>

<p>
The TAG has earlier published a finding <emph>Authoritative Metadata</emph> <bibref ref="AUTHMETA"/>, which
explains how to determine correct metadata in cases where conflicting information has been provided.
This finding is concerned with just one possible means of 
determining resource metadata, i.e. from the URI itself.
The TAG publication <bibref ref='AWWW'/> discusses related issues under the heading of <loc href="http://www.w3.org/TR/webarch/#uri-opacity" xlink:actuate="onRequest" xlink:show="replace" xlink:type="simple">URI Opacity</loc>; this finding provides additional detail and guidance on the encoding of metadata into URIs, and on when it is or isn't appropriate to attempt to infer metadata from a URI.
</p>
</div1>

<div1>
<head>Encoding and using metadata in URIs</head>
<p>
This section uses simple examples to illustrate some
issues that arise when encoding metadata in URIs, or
when relying on information gleaned from such URIs.
<emph>Good Practice Notes</emph> are provided to explain how to use the
Web effectively, and <emph>Constraints</emph> are given
where necessary for using
the Web correctly.
As these examples show, encoding or not encoding metadata in a URI
or deciding whether to rely on such metadata
is often a tradeoff, involving some benefits and some costs.
In such cases, choices should be made that best meet the needs of particular
resource providers and users.
</p>
<div2 id="erroneous">
<head>Reliability of URI metadata</head>
<p>
Consider Martin, who is using a Web-based bug tracking system to
investigate some software problems.  He sees a bug report which says:
</p>
<discussionPoint id="badxmlexample">
"See http://example.org/bugdata/brokenfile.xml for an example of XML that is not well-formed."
</discussionPoint>
<p>
The bug tracking system is built to show examples just as they are entered
into the system, so for http://example.org/bugdata/brokenfile.xml it returns
a stream of XML (not well-formed) with Content-Type <code>text/plain</code>.
That Content-Type should cause a properly configured browser 
to show Martin the erroneous text just as it was recorded:
</p>
<pre id="badxmlexampledata">
&lt;?xml version="1.0"?>
&lt;PetList>
  &lt;Dog>Rover&lt;/Dog>
  <emph>&lt;Cat>Felix&lt;/Fish></emph>
&lt;/PetList>
</pre>
<p>
Unfortunately, Martin uses a browser that incorrectly attempts to infer the
format of the returned data from the URI suffix.
Keying on the ".xml" in the URI, it launches
an XML renderer for what should have been plain text.
When Martin attempts to view the faulty file,
he sees instead a browser error saying that the 
erroneous XML could not be displayed.
</p>
<p role="constraint"><a name="unverifiedConclusion" id="unverifiedConclusion"></a>
<em>Constraint:</em> 
Web software MUST NOT depend on the correctness of metadata inferred
from a URI, except when the encoding of such metadata is documented
by applicable standards and specifications.
</p>
<p>Such standards and specifications include pertinent Web and Internet
RFCs and Recommendations such as <bibref ref="URI"/>,
as well as documentation provided by the URI assignment authority.</p>
<p>
Martin's browser is in error because its inference that the URI suffix provides file type metadata is supported neither by normative Web specifications nor, we may assume, by documentation from the assignment authority.
A correctly written browser would have shown the faulty XML
as text, or might conceivably have shown a 
warning about the apparent mismatch between the type inferred from
the URI  and
the returned Content-Type.
(Martin's browser is also ignoring TAG finding 
"Authoritative Metadata" <bibref ref="AUTHMETA"/>, which mandates
that the Content-Type HTTP header takes precedence even if
type information had somehow been reliably encoded in the URI.)
</p>
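<p>
As a minimal sketch of the behavior described above (not taken from any actual browser),
the following Python fragment treats the Content-Type header as authoritative and
consults the URI suffix only to detect a mismatch that might be worth reporting.
The function name <code>media_type_for_rendering</code> and the use of Python's standard
<code>mimetypes</code> module for the suffix-based guess are assumptions made for illustration.
</p>
<pre>
```python
import mimetypes


def media_type_for_rendering(uri, content_type_header):
    """Return (media_type, suffix_disagrees) for a retrieved representation.

    The Content-Type header is treated as authoritative; the URI suffix is
    consulted only to decide whether a warning might be worth showing.
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    authoritative = content_type_header.split(";")[0].strip().lower()
    # guess_type is a suffix-based guess, not a statement by the server.
    guessed, _encoding = mimetypes.guess_type(uri)
    suffix_disagrees = guessed is not None and guessed != authoritative
    return authoritative, suffix_disagrees


media_type, mismatch = media_type_for_rendering(
    "http://example.org/bugdata/brokenfile.xml", "text/plain; charset=utf-8"
)
print(media_type, mismatch)
```
</pre>
<p>
In this sketch the suffix never changes the type used for rendering; at most it
sets a flag that a user agent could use to show a warning.
</p>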

<p>
Note that the constraint refers to <emph>conclusions</emph> drawn by software, 
which must
be trustworthy, as opposed to guesses made by people. 
As discussed in <specref ref="guessing"/>, guessing is
something that people using the Web do quite
often and for good reason.
Software tends to be long-lived and widely distributed.
Thus software dependencies on undocumented URI metadata result not only
in buggy systems, but also in inappropriate expectations that 
authorities will constrain their URI assignment policies 
and representation types to match
dependencies in the clients.  For both of these
reasons, the constraint above prohibits software from having
such dependencies.
</p>

<p>There is certain metadata that Martin or his browser
can reliably determine from the URI.
For example, the URI conveys that the http scheme has been used, 
and that attempts to access the resource should be directed to the IP address
returned from the DNS resolution of the string "example.org".  
These conclusions are supported by normative specifications such as <bibref ref="URI"/> and <bibref ref="HTTP"/>. 
</p>
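<p>
The components that can be relied upon in this way are exactly those defined by the
generic URI syntax and the scheme specification.
As an illustrative sketch, assuming Python's standard <code>urllib.parse</code> module,
they can be extracted as follows; the fallback to port 80 reflects the default for the
http scheme, and the variable names are chosen only for this example.
</p>
<pre>
```python
from urllib.parse import urlsplit

parts = urlsplit("http://example.org/bugdata/brokenfile.xml")

# These components are defined by the generic URI syntax and the http scheme.
scheme = parts.scheme      # scheme component
host = parts.hostname      # name to be resolved through DNS
port = parts.port or 80    # http default port when none is given

# The path is carried along unchanged; nothing here interprets ".xml".
path = parts.path
print(scheme, host, port, path)
```
</pre>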
</div2>


<div2 id="guessing">
<head>Guessing information from a URI</head>

<p>Bob is walking down a street, and he sees an advertisement
on the side of a bus:</p>
<discussionPoint id="chicagoad">
"For the best Chicago Weather information on the Web, visit http://example.org/weather/Chicago."
</discussionPoint>
<p>
Bob goes home and types the URI into his browser, which does indeed
display for him a Chicago weather forecast.
Bob then realizes that he'll be visiting Boston, and he guesses that a Boston weather page might be available at a similar URI:
</p>
<discussionPoint id="bostonguess">
Bob guesses the Boston weather might be found at "http://example.org/weather/Boston".
</discussionPoint>
<p>
He types that into his browser and reads the response that comes back.
</p>
<p>
Bob is using the original URI for more than its intended purpose,
which is to identify the Chicago weather page. 
Instead, he's inferring from it information about the structure of a Web site
that, he guesses, might use a uniform naming convention for the
weather in lots of cities. 
So, when Bob tries the Boston URI, he has to be prepared
for the possibility that his
guess will prove wrong:  
Web architecture does not guarantee that the retrieved page, if there is one,
has the weather for Boston, or indeed that it contains
any weather report at all.
Even if it does,
there is no assurance that it is current weather, that it
is intended for reliable use by consumers, etc.
Bob has seen an advertisement listing just the Chicago URI, and that is the only one that the URI authority has
warranted will be a useful weather report.
</p>
<p>
Still, the ability to explore the Web informally and experimentally is very
valuable, and Web users act on such guesses about URIs all the time.
Many authorities
facilitate
such flexible use of the Web by assigning URIs in an orderly and predictable
manner.
Nonetheless, in the example above, 
Bob is responsible for determining whether
the information returned is indeed what he needs.
</p>
</div2>
<div2 id="forms">
<head>HTML Forms, and Documenting Metadata Assignment Policies</head>
<p>Bob would not have had to guess the Boston weather URI if the
authority had documented its URI assignment policy.
Assignment authorities have no obligation to provide such documentation,
but it can be a useful way of advertising in bulk the URIs for 
a collection of related resources.  For example, an advertisement might
read:
</p>
<discussionPoint id="parameterized">
"For the best weather information for your city, visit http://example.org/weather/<emph>your-city-name-here</emph>."
</discussionPoint>
<p>
Reading that advertisement, Bob can reasonably assume that 
weather reports are available by substituting specific
city names into the URI pattern <code>http://example.org/weather/<emph>your-city-name-here</emph></code>.
Moreover, the advertisement claims that the weather information 
obtainable at those URIs is "the best", so Bob can
assume that the weather reports are trustworthy and current.
</p>

<p>
HTML forms <bibref ref='HTMLFORMS'/> and now XForms <bibref ref='XFORMS'/> 
each provide a means by which an authority can 
assert its support
for a class of parameterized URIs,
while simultaneously 
programming Web clients to prompt for the necessary parameters.
For example, a Web site <code>http://example.org/weatherfinder</code>
might offer
a city lookup page containing the following HTML form fragment:
</p>
<pre id="htmlForm">
&lt;FORM ACTION="http://example.org/cityweather" METHOD="GET">
  For what city would you like a weather report: 
  &lt;INPUT TYPE="TEXT" NAME="city">?
  &lt;INPUT TYPE="SUBMIT" VALUE="Get the weather">
&lt;/FORM>
</pre>
<p>
A browser receiving this form, or Bob if he
views the source of the form, is assured that
the assigning authority is supporting an entire class of URIs of the form:
</p>
<pre id="URIClass">
http://example.org/cityweather?city=<emph>CityName</emph>
</pre>
<p>
The same HTML Form is also a computer program, executable by the
browser, that prompts for and retrieves representations for all such URIs,
and the English text in the form assures Bob that these are indeed
for weather reports.
Bob is not guessing the encoding of the URI
or the nature of the resources referenced &#8212;
he is acting on authoritative information provided by 
the assignor of the URIs.
He can assume not just that he will get weather reports for certain cities,
but that no URIs in the class correspond to anything other than
weather reports (though some may correspond to no resource at all).
Bob could, with this assurance, write his own software to construct
and use such URIs to retrieve weather reports.
Of course, the typical Web user would neither directly inspect
the URIs nor write software to build them, 
but would instead type in city names and push the handy "Get the weather"
button on his or her browser screen.
</p>
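<p>
Software of the kind Bob might write could look like the following sketch.
It assumes Python's standard <code>urllib.parse</code> module; the base URI and the
<code>city</code> parameter name come from the form above, while the function name
<code>weather_uri</code> is hypothetical.
</p>
<pre>
```python
from urllib.parse import urlencode

# Base URI taken from the ACTION attribute of the form in the text.
ACTION = "http://example.org/cityweather"


def weather_uri(city):
    """Build the URI a browser would request on submitting the form."""
    # urlencode applies the form encoding used for GET submissions.
    return ACTION + "?" + urlencode({"city": city})


print(weather_uri("Boston"))
print(weather_uri("New York"))
```
</pre>
<p>
Because the form documents the URI pattern, such software depends only on
information that the assignment authority has itself published.
</p>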
<p>
Note that the example carefully specifies that the
HTML form is sourced from the same authority as the 
individual weather URIs that the form queries.
In fact, it is also common for 
the <code>ACTION</code> attributes in HTML forms to refer to URIs 
from other authorities.
In such cases, it is the provider of the form rather than the 
assigning authority for the queried URIs who is responsible for the claims
made in the form.
In particular, users (and software)
should check the origin of HTML forms before depending on the URI assignment
patterns that they appear to imply.
Of course, you can always use such a form to perform a query and see
what comes back;
what you can't do is blame the assignment authority if the generated
URIs either don't
resolve (status code 404)
or return representations that don't
match the expectations established when reading the
form (you got a football score instead of a weather report).  
</p>
</div2>
<div2 id="authorityuse">
<head>Authority use of URI metadata</head>
<p>
In the examples in <specref ref="forms"/> above, resource metadata 
(i.e., the city associated with each resource) was encoded into URIs primarily for the benefit of users such as Bob,
or to facilitate use of the HTML Forms or XForms acting on those users' behalf.
</p>
<p>
Often, metadata is encoded into a URI not primarily for the benefit of users,
but to facilitate management of the resources themselves.  For example,
assume that the administrators at example.org have established
a policy of assigning URIs based on the media types
of representations:  
all GIF images are named with URIs ending in ".gif", and all JPEG images
are named with URIs ending in ".jpeg", and so on.
Although <specref ref="erroneous"/> warned that <emph>users</emph> of a resource cannot
rely on undocumented naming conventions to determine media types and
other information about a resource, the <emph>owner</emph> of a resource
controls such naming and can depend on it.
Example.org may therefore rely on their
policy in an Apache Web Server .htaccess file, which causes
the correct media type to be served automatically for each resource:
<pre>
# Serve media types by suffix, following example.org's own naming policy
&lt;Files ~ "\.gif$">
  ForceType image/gif
&lt;/Files>
&lt;Files ~ "\.jpeg$">
  ForceType image/jpeg
&lt;/Files>
</pre>
</p>

<p>
Even if it does not document this policy publicly, example.org's 
own Web servers
can safely depend on it.
</p>


<p role="practice"><a name="clearURI" id="clearURI"></a>
<em>Good Practice:</em> 
URI assignment authorities and the Web servers deployed for them
may benefit from
an orderly mapping from resource metadata into URIs.</p>
<p>
In addition to filename-based conventions, authorities may
choose to base URIs on database keys, customer identifiers, or other
information that makes it easy to associate a URI with information
pertinent to the corresponding resource.
Such encodings are both useful and common on the Web, but there can also
be drawbacks to including such information in URIs.
Some of those problems are discussed in the sections that follow.
</p>
</div2>
<div2 id="convenientURIs">
<head>URIs that are convenient for people to use</head>
<p>URIs optimized for use by the assignment authority may sometimes be 
inconvenient for resource users.
Consider Mary, who is walking down the street
and sees the same weather advertisement as Bob:</p>

<discussionPoint id="convenient">
"For the best Chicago Weather information on the Web, visit http://example.org/weather/Chicago."
</discussionPoint>

<p>Like Bob,
Mary is pleased to learn about a valuable Web site, and she finds
that the URI itself is quite easy both to remember and to
type into her browser.
This is because,
in addition to the required scheme and authority components, the URI
is based on the word <emph>weather</emph> and the city name <emph>Chicago</emph>, both of which fit her expectations for this resource.
</p>
<p>
The next day, Mary sees another advertisement reading:
</p>
<discussionPoint id="inconvenient">
"For the best Atlanta Weather information on the Web, visit http://example.org/123Hx67v4gZ5234Bq5rZ."
</discussionPoint>
<p>Mary is annoyed, because the URI is both
difficult to remember and hard to transcribe accurately.
She guesses that the authority has assigned this URI for its own
convenience (see <specref ref="authorityuse"/>) rather than for hers.
Although Web architecture does not require that URIs be easy 
to understand or suggestive of the resource named, it's handy if those
intended for direct use by people are.
</p>
<p role="practice"><a name="usefulURI" id="usefulURI"></a>
<em>Good Practice:</em> 

URIs intended for direct use by people should be 
easy to understand, and should be suggestive of the resource
actually named.</p>
<p>Note that the second
URI might be based on a database key
that facilitates efficient access to the weather data at the server
(see <specref ref="authorityuse"/>);
such a URI might have been a good choice if it were intended only for use in
HTML hyperlinks,
rather than in an advertisement on the side of a bus.</p>

</div2>
<div2 id="nochanging">
<head>Changing metadata</head>
<p>
URIs should generally not encode metadata that will change,
regardless of whether the encoding policy is established 
to benefit URI assignment authorities, resource users, or both.
Consider a web site that organizes document URIs according 
to the documents' lead author or editor.
Thus, the documents:
</p>
<pre id="toomuch">
http://example.org/documents/editor/BobSmith/document1
http://example.org/documents/editor/BobSmith/document2
</pre>
<p>
are named for their editor, Bob Smith.
Bob retires, and Mary Jones takes over as editor for document1.
If the URI is changed to encode her name,
then existing links break, but
if the URI is not changed, the naming policy is violated.
By encoding into the URI metadata that will change, the
authority has put itself in a difficult position.
</p>
<p role="practice"><a name="authorityKnows" id="authorityKnows"></a>
<em>Good Practice:</em> 
Resource metadata that will change SHOULD NOT be encoded in a URI. 
</p>
<p>
Indeed, RDF statements about the resource, headers returned with representations (e.g. Content-Type) or metadata embedded in the representations themselves (e.g. HTML &lt;META> tags) are all better alternatives 
for conveying such volatile metadata about the resource.
</p>
</div2>

<div2 id="hideforsecurity">
<head>Hiding metadata for security reasons</head>
<p>
A bank establishes a URI assignment policy in which account numbers
are encoded directly in the URI.  For example, the URI http://example.org/customeraccounts/456123 accesses information for account number 456123.
A malicious worker at an Internet Service Provider notices these URIs
in his traffic logs, and determines the bank account numbers for
his Internet customers. 
Furthermore, if access controls are not properly in place,
he might be able to guess the URIs for other accounts, and to attempt
to access them.
</p>
<p role="practice"><a name="secureURI" id="secureURI"></a>
<em>Good Practice:</em> 
URI assignment authorities SHOULD NOT put
into URIs metadata that is to be kept confidential.</p>
</div2>
<div2 id="confusingmalicious">
<head>Confusing or malicious metadata</head>
<p>
Although a URI suffix such as <code>.jpeg</code> or <code>.exe</code> plays no role in establishing the media type of a Web resource,
such suffixes are often significant in operating system filenames and to users.
This situation can be confusing to users, and may in some cases be exploited by malicious web sites to cause harm.
</p>
<p>
First, there is the question of how content is rendered.
As discussed in <specref ref="erroneous"/>, the Content-Type is the authoritative source of information on representation typing and so should be
used as the basis for rendering.
Still, users may be confused to see a file rendered in a manner that
seems inconsistent with the suffix in the URI.
The situation for users
is further complicated if media types are misapplied by the server.
For example, video files are too often served as <code>text/plain</code>, PNG images may be mis-typed as <code>image/jpeg</code>, and so on.
In part to deal with such mislabeled content, 
or perhaps for other reasons, browsers sometimes use
heuristics to guess type information from
the suffix of the URI.  Unless done with permission from the user,
such inferences violate the constraints set out 
in TAG finding "Authoritative Metadata" <bibref ref="AUTHMETA"/>, but
with the user's permission such inferences
may be necessary for correct rendering of mislabeled content.
</p>
<p> 
Users may also call upon their browsers to save content to a local filesystem,
and often the filename suggested is derived from the URI.
However,
in the many operating systems that encode typing information in the filename
suffix, the mapping from served representation
to default saved filename must be made with care.
If the URI suffix (such as .jpeg) is significant to the local filesystem,
and especially if the type inferred from it is inconsistent with 
the authoritative media type, users may inadvertently save files that
the local operating system considers to be mis-typed.
</p>
<p role="practice"><a name="savedFilename" id="savedFilenames"></a>
<em>Good Practice:</em> 
When saving files from the Web, user agents SHOULD suggest filenames
that represent appropriate mappings of the authoritative
media type to local file naming conventions.
User agents MAY suggest filenames based on the URI suffix or other metadata,
but SHOULD do so only if those names are consistent with the authoritative media type.
In 
cases where such consistent mappings are not offered,
or where the data saved may be harmful (e.g. an untrusted executable file),
user agents SHOULD warn users of the associated risks.
</p>
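<p>
One way a user agent might follow the Good Practice above is sketched below in Python.
This is an illustration under stated assumptions, not a description of any existing browser:
the function name <code>suggest_filename</code> and the small table
<code>SUFFIXES_BY_TYPE</code> are hypothetical, and a real user agent would consult
the local platform's own mapping of media types to file naming conventions.
</p>
<pre>
```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical table: authoritative media type -> acceptable local suffixes,
# preferred suffix first. A real user agent would use the platform's registry.
SUFFIXES_BY_TYPE = {
    "image/jpeg": [".jpg", ".jpeg"],
    "image/gif": [".gif"],
    "text/plain": [".txt"],
}


def suggest_filename(uri, content_type):
    """Return (suggested_name, warn) for saving a representation locally."""
    media_type = content_type.split(";")[0].strip().lower()
    name = posixpath.basename(urlsplit(uri).path) or "download"
    stem, suffix = posixpath.splitext(name)
    allowed = SUFFIXES_BY_TYPE.get(media_type)
    if allowed is None:
        # No consistent mapping is known: keep the URI-derived name, but warn.
        return name, True
    if suffix.lower() in allowed:
        # The URI suffix agrees with the authoritative media type.
        return name, False
    # The URI suffix conflicts with the authoritative type: replace it.
    return stem + allowed[0], False


print(suggest_filename("http://example.org/bugdata/brokenfile.xml", "text/plain"))
print(suggest_filename("http://example.org/malicious.exe", "application/octet-stream"))
```
</pre>
<p>
In this sketch the URI suffix is kept only when it agrees with the authoritative media type,
and the caller is told to warn the user whenever no consistent mapping is available.
</p>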
<p>
Malicious use of media types and URI metadata is never acceptable.
Consider Bob who is looking on the Web for pictures of his favorite movie star.
He views in his browser an image that has been served
with media type <code>image/jpeg</code>.
Underneath the picture is a label that says: "Right click on this picture
and select 'Save Link As...' or 'Save Target As...'
 to save a copy of this picture on your hard drive."
The HTML is:
</p>
<pre id="twoURIs">
&lt;a href="./malicious.exe">
  &lt;img src="./moviestar.jpg"/>
&lt;/a>

&lt;p>
Right click on the picture above and select 
'Save Link As...' or 'Save Target As...'
to save a copy on your hard drive.
&lt;/p>
</pre>
<p>
The linked executable is served with a media type of application/octet-stream,
but the web site is counting on the common practice in which user agents
carry the ".exe" suffix from the served URI to the saved filename.
</p>
<p>
Bob has heard that some sites with movie star pictures have been untrustworthy,
but knowing a bit about computers, he believes that saving a picture
file should be relatively safe.
Bob doesn't notice that the link is to a URI ending
in .exe, and that his browser will 
likely save a malicious executable to his hard drive.
One can imagine circumstances in which linking to an executable from an image
would be legitimate (e.g. linking from the image of a box of packaged
software to an associated executable download), but this web site is
malicious.
It deliberately mislabels the link, falsely claiming that
its instructions will lead to the saving of a picture rather than an
executable.
</p>
<p>
Some of the confusion in this example may arise from metadata
inferred from URIs,
but some stems from the fact that the example involves two
separate URIs.
Bob probably doesn't realize that on the Web, the resource used 
to render the picture is not in general the one that would be
linked by clicking on the picture.
The confusion in such situations is compounded
by the tendency of users, browsers, and operating
systems, each in their own way, to infer type information
from URI and/or filename suffixes.
Of course, the inclusion of instructions that are intended to mislead
users such as Bob is always unacceptable.
</p>
<p role="practice"><a name="dontConfuse" id="dontConfuse"></a>
<em>Good Practice:</em> 
Web sites SHOULD use hyperlinks and URI metadata 
in a manner that minimizes confusion for users, and SHOULD 
NOT misleadingly apply common 
conventions for encoding type information into filenames
and URIs.
</p>
</div2>



    </div1>
    <div1>
      <head>Conclusions</head>
<p>The principal conclusions of this finding are:</p>
<ulist>
      <item><p>It is legitimate for assignment authorities to encode static identifying properties of a resource, e.g. author, version, or creation date, within the URIs they assign. This may contribute to the unique assignment of URIs. It may also contribute to the use of efficient mechanisms for dereferencing resources within origin servers e.g. use of database keys within URIs.</p></item>

      <item><p>Assignment authorities may publish specifications detailing the structure and semantics of the URIs they assign. Other users of those URIs may use such specifications to infer information about resources identified by URIs assigned by that authority.</p></item>

      <item><p>The ability to explore and experiment is important to Web users.  Users therefore benefit from the ability to infer either the nature of the named resource, or the likely URI of other resources, from inspection of a URI.
Such inferences are reliable only when supported by normative specifications
or by documentation from the assignment authorities.
In other cases, users should be aware that their inferences 
may be incorrect and the effect could be malicious.
</p></item>
      <item><p>People and software using URIs assigned outside of their own authority should make as few inferences as possible about a resource based on its URI.  The more dependencies a piece of software has on particular constraints and inferences, the more fragile it becomes in the face of change and the lower its generic utility.</p></item>
</ulist>
    </div1>


    <div1>
      <head>References</head>
      <blist>
        <bibl id="AUTHMETA" href="http://www.w3.org/2001/tag/doc/mime-respect">"Authoritative Metadata"; W3C; TAG Finding; R.T. Fielding, I. Jacobs; April 2006</bibl>
        <bibl id='AWWW' href='http://www.w3.org/TR/webarch/'>I. Jacobs, 
N. Walsh, <titleref>Architecture of the World Wide Web</titleref>.
W3C. December, 2004.</bibl>
         <bibl id="HTTP" href="http://www.iana.org/rfc/rfc2616">"Hypertext Transfer Protocol - HTTP/1.1"; IETF; RFC 2616; R. Fielding, J. Gettys, J. Mogul, H. Frystyk, P. Leach, L. Masinter, T. Berners-Lee; June 1999</bibl>
        <bibl id="HTMLFORMS" href="http://www.w3.org/TR/html4/interact/forms.html">"HTML 4.01 Specification (Forms Chapter)"; W3C; D. Raggett, A. Le Hors, I. Jacobs; December 1999</bibl>
        <bibl id="RFC2119" href="http://www.ietf.org/rfc/rfc2119.txt">S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. IETF. March, 1997.</bibl>

        <bibl id="URI" href="http://www.ietf.org/rfc/rfc3986">"Uniform Resource Identifier (URI): Generic Syntax"; RFC 3986; IETF; T. Berners-Lee, R. Fielding, L. Masinter; January 2005</bibl>
        <bibl id="XFORMS" href="http://www.w3.org/TR/xforms/">"XForms 1.0"; W3C; J.M. Boyer, D. Landwehr, R. Merrick, T. V. Raman, M. Dubinko, L. Klotz ; 2006 (2nd Edition)  
</bibl>
      </blist>
    </div1>
  </body>
</spec>

