<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN"
               "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [
  <!-- ================================================================ -->
  <!ENTITY draft.day "15">
  <!ENTITY draft.month "03">
  <!ENTITY draft.monthname "Mar">
  <!ENTITY draft.year "2002">
  <!ENTITY iso6.doc.date "&draft.year;-&draft.month;-&draft.day;">
  <!ENTITY http-ident "http://www.w3.org/2001/tag/doc/identify">
]>
<spec w3c-doctype="other">
  <?CVS $Id: identify.xml,v 1.3 2006/06/29 11:50:15 NormanWalsh Exp $?>
  <header>
    <title>What Does a URI Identify?</title>
    <w3c-designation>&http-ident;-&iso6.doc.date;</w3c-designation>
    <w3c-doctype>TAG Draft</w3c-doctype>
    <pubdate>
      <day>&draft.day;</day>
      <month>&draft.monthname;</month>
      <year>&draft.year;</year>
    </pubdate>
    <publoc>
      <loc href="&http-ident;">&http-ident;</loc>
(<loc href="&http-ident;.html">HTML</loc>,
<loc href="&http-ident;.xml">XML</loc>)</publoc>
    <!--
    <latestloc>
      <loc href="&http-ident;">&http-ident;</loc>
    </latestloc>
<prevlocs></prevlocs>
-->
    <authlist>
      <author>
        <name>Norman Walsh</name>
        <affiliation>Sun Microsystems, Inc.</affiliation>
        <email href="mailto:Norman.Walsh@Sun.COM">Norman.Walsh@Sun.COM</email>
      </author>
      <author>
        <name>Stuart Williams</name>
        <affiliation>Hewlett Packard, Inc.</affiliation>
        <email href="mailto:skw@hplb.hpl.hp.com">skw@hplb.hpl.hp.com</email>
      </author>
    </authlist>
    <copyright>
      <p>
        <loc href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Copyright">Copyright</loc> &#xA9; 2002
<loc href="http://www.w3.org/">W3C</loc>
        <sup>&#xAE;</sup>
(<loc href="http://www.lcs.mit.edu/">MIT</loc>,
<loc href="http://www.inria.fr/">INRIA</loc>,
<loc href="http://www.keio.ac.jp/">Keio</loc>),
All Rights Reserved. W3C
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Legal_Disclaimer">liability</loc>,
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#W3C_Trademarks">trademark</loc>,
<loc href="http://www.w3.org/Consortium/Legal/copyright-documents-19990405">document use</loc>, and
<loc href="http://www.w3.org/Consortium/Legal/copyright-software-19980720">software licensing</loc>
rules apply.
</p>
    </copyright>
    <abstract>
      <p>A one-page answer to the question "what does a URI identify?"
@@say more</p>
    </abstract>

<status>
<p>This document has been developed for discussion by the
W3C Technical Architecture Group.</p>

<p>This document is the work of the editors. It is a draft
with no official standing. It does not necessarily represent the
consensus opinion of the TAG and it may not even represent the
consensus opinion of the editors.</p>

<p>Comments may be directed to the W3C TAG mailing list <loc
href="mailto:www-tag@w3.org">www-tag@w3.org</loc>
(<loc
href="http://lists.w3.org/Archives/Public/www-tag/">archive</loc>).</p>

<p>Publication of this document by W3C indicates
no endorsement by W3C or the W3C Team, or any W3C Members.
</p>

</status>

    <pubstmt>
      <p>Chicago, Vancouver, Mountain View, et al.: World-Wide Web Consortium,
TAG Note, 2002.</p>
    </pubstmt>
    <sourcedesc>
      <p>Created in electronic form.</p>
    </sourcedesc>
    <langusage>
      <language id="EN">English</language>
    </langusage>
    <revisiondesc>
      <slist>
        <sitem>2002-03-05: Initial draft</sitem>
      </slist>
    </revisiondesc>
  </header>
  <body>
    <div1 id="sec-intro">
      <head>Introduction</head>
      <p>In order for two or more parties to communicate meaningfully, they
must have some shared frames of reference. They must speak a common
language, for example, and they must use a shared vocabulary. They must
also have some means to identify the things they are discussing.</p>
      <p>In the physical world, humans in close proximity can identify
things with informal, relative identifiers: <quote>that yellow
car</quote> or <quote>the oval chair</quote>. Communicating over
greater distances, we can rely on shared experiences: <quote>the Monty
Python <quote>parrot sketch</quote>
        </quote> or <quote>the second
edition XML specification</quote>. This works in part because we have
a lot of shared experiences and in part because there are such great
differences between objects in the physical world; it would be a rare
person indeed who would have difficulty distinguishing between an
oval chair and the second edition XML specification.</p>
      <p>But in web space, two factors make such informal identifiers
impractical. First, most of the objects that humans want to talk about
are documents on the web, which are much less physically distinct than
chairs and cars. Second, the participants in the communication are not
always human beings; sometimes they are software agents of one form or
another, and they simply don't cope with informality that well.</p>
      <p>In web space, we need more precise identifiers, and
URIs (Uniform Resource Identifiers) satisfy that requirement.</p>
    </div1>
    <div1 id="sec-identify">
      <head>What Does a URI Identify?</head>
      <p>
        On the web, URIs identify resources: <quote>Any information that can be named can be a resource</quote>
        <bibref ref="RFC2396"/>. In fact, this relationship can be taken as axiomatic: if a resource has a URI, it is identifiable on the web. If it does not, it is not.</p>
      <ednote>
        <edtext>I don't think this is entirely the case if we want to cover things that may be represented as blank nodes in RDF. These are resources that may be described by relationships to other resources, but that may be left unnamed. (skw)</edtext>
      </ednote>
      <p>For a large class of resources (web pages, email addresses, etc.),
this relationship is fairly obvious. What's less obvious is that this
relationship applies to more abstract resources as well. The way to
talk about a person, or real estate, or love, in a form that network
agents can process, is to give it a URI.</p>
      <p>The broader question of how your URI for love and my URI for
happiness are related is a different and orthogonal question.</p>
      <div2 id="sec-url">
        <head>What about URLs?</head>
        <p>URLs (uniform resource locators), like URNs (uniform resource
names), are a subset of the general class of uniform resource
identifiers. The distinction between URI, URL, URN, and the rest of the
sometimes described UR<emph>x</emph> identifiers is often more confusing
than useful. We'll confine ourselves to the term <quote>URI</quote>,
meaning all of them.</p>
      </div2>
      <div2 id="sec-resources">
        <head>Resources</head>
        <p>In general, a resource is a time-varying conceptual mapping to a set of entities or values which are equivalent <bibref ref="Fielding"/>. For example, the URI <loc href="http://www.w3.org">http://www.w3.org</loc> identifies the concept of the W3C home page, which is itself a mapping to a set of values returned as hypertext. The value returned (the hypertext) changes frequently as new information displaces old information on the home page. The set of values mapped by a resource are equivalent resource representations and/or resource identifiers (giving further indirection or redirection). Dereferencing a resource identifier yields a representation of the current value of the referenced resource. At some time, t, the set of values that a resource maps to may be empty, which allows a concept to be identified before a realisation of the concept exists (or indeed after it has been retired).</p>
        <p>For example, for the W3C technical report collection it is common practice to assign a URI which always references the current version of a given technical report. In addition, each published version of a technical report is assigned a distinct URI that references that specific version. These two URIs reference two different resources or two different concepts: a specific version of a technical report and the current version of that same report. At some point in time, dereferencing either URI may yield the same resource representation. However, at some later instant, dereferencing the URI that references the current version of the technical report may yield a different set of representations than dereferencing the version-specific URI. With the version-specific URI there is a commitment, as a matter of W3C policy, that the set of resource representations referenced by that URI will not change over time. With the URI that references the current version of the report, there is a commitment, as a matter of W3C policy, that the resource representations referenced by that URI will always represent the most up-to-date published version of the report.</p>
        <p>The important point to note is that, in general, a resource is a time-varying mapping to a value, and not simply the value returned by dereferencing the resource at a particular moment in time.</p>
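        <p>As a rough illustration only, the time-varying mapping described above can be sketched in Python. The class name <code>Resource</code>, its methods, the integer timestamps, and the strings standing in for representations are all assumptions made for this sketch; none of them comes from a specification.</p>
        <eg xml:space="preserve"><![CDATA[
```python
from bisect import bisect_right


class Resource:
    """Hypothetical model: a resource as a time-varying mapping to a set of values."""

    def __init__(self):
        self._times = []   # times at which the mapped value set changed (ascending)
        self._values = []  # value set in force from the corresponding time onwards

    def publish(self, time, representations):
        # Assumption for this sketch: changes are recorded in increasing time order.
        self._times.append(time)
        self._values.append(frozenset(representations))

    def dereference(self, time):
        # Before the first publication the resource maps to the empty set.
        index = bisect_right(self._times, time)
        return self._values[index - 1] if index else frozenset()


# Two distinct resources: "this specific version" and "the current version".
version_1 = Resource()
current = Resource()

version_1.publish(1, {"text of draft 1"})
current.publish(1, {"text of draft 1"})
current.publish(2, {"text of draft 2"})

same_at_t1 = version_1.dereference(1) == current.dereference(1)
same_at_t2 = version_1.dereference(2) == current.dereference(2)
```
]]></eg>
        <p>In this sketch the two resources yield the same representations at time 1 and different ones at time 2, and both map to the empty set at time 0, before anything has been published.</p>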
        <p>A further point to note about resources is that their identifiers expose nothing more than their identity (see <loc href="#Opaque">Opaque</loc> below). It is a matter for the authority that assigns an identifier to a resource to say what that resource means and what commitment it makes to sustaining the meaning of that resource.</p>
        <p>
          <bibref ref="RFC2396"/> is clear that "A resource can be anything that has identity". This is not a closed definition. Are there more things that can be regarded as resources than just those with assigned URIs (or URI references)? RDF provides the ability to describe resources by their relationship to one another, which leads to the notion of existentially quantified resources. For example, there exists a person whose internet mailbox is identified by the URI <loc href="mailto:timbl@w3.org">mailto:timbl@w3.org</loc>. This identifies the person Tim Berners-Lee by reference to the URI of his internet mailbox, without it being necessary to assign a URI to identify the concept of the person Tim Berners-Lee. Of course, dereferencing such a resource might prove to be an interesting challenge.</p>
      </div2>
      <div2 id="sec-resourceprop">
        <head>Properties of Resources</head>
        <p>It is appealing to ask when two resources are the same resource, or perhaps more particularly when two resource identifiers identify the same resource. These are difficult questions. Two different URIs may identify the same resource, but only the authorities that assign those URIs can make the commitment that they identify the same resource.</p>
        <p>However, as they say "Cool URIs don't change" <bibref ref="Cool"/>
          <bibref ref="Axioms"/>. There are strong social expectations that once a URI is assigned to identify a particular resource, then it should continue indefinitely to refer to that same resource or concept. This is of course best practice, but it is a matter of policy and commitment on the part of authorities assigning URIs rather than a constraint imposed by technological means.</p>
        <p>We are dealing here with two time-dependent mappings: first, a time-dependent mapping between an identifier and a resource, and second, a time-dependent mapping, as described above, between a resource and a set of equivalent values. The binding between an identifier and a resource is not fixed for all time, and neither is the mapping between a resource and its current value. At some instant in time two URIs may reference the same resource, and at some later instant they may reference different resources. It is the authority controlling the assignment of a given URI to a particular resource that determines what resource that URI references at a given time. Some URI schemes may give stronger guarantees about the temporal stability of the URI-to-resource mapping. Some organisations, as a matter of policy, may give stronger guarantees than those intrinsic to the schemes that they use for assigning URIs.</p>
        <!--
<p>For example, the URI "http://www.w3.org/" currently identifies the homepage of the World Wide Web Consortium. An nslookup yields 18.29.1.34, 18.29.1.35 and 18.7.14.127 as being the IP addresses associated with the DNS domain name "www.w3.org". So at this time the URI "http://18.29.1.34/" references the same resource as "http://www.w3.org/". Actually from the outside even that is may be a flawed conclusion to make. From the outside it is only possible to determine that the resources representations returned are equivalent for the purposes of the entity receiving them. However, there is no commitment maintaining the DNS mapping between www.w3.org and the IP address 18.29.1.34 into the long term - in the future deferencing http://18.29.1.34/ may yield a representation of something that is not the W3C homepage. </p>
-->
        <p>The assignment of meaning and resources to an identifier comes under the control of some authority (much as a parent assigns a name to a child, although giving it meaning is perhaps a little harder). We can only know that two identifiers reference the same resource because the authorities that assign the identifiers assert (directly or indirectly) that they identify the same resource, and even then, such assertions may not hold for all time.</p>
      </div2>
    </div1>
    <div1 id="sec-syntax">
      <head>Syntax of URIs</head>
      <p>In high-level terms, a URI consists of a scheme
(<code>http:</code>, <code>urn:</code>, <code>isbn:</code>, ...)
followed by a scheme-specific string, and an optional fragment
identifier. Some schemes are hierarchical, allowing for both relative
and absolute URIs, and some are not, allowing only absolute URIs.</p>
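      <p>The high-level structure described above can be sketched with Python's standard <code>urllib.parse</code> module. This is a minimal illustration, not a full parser: the helper name <code>split_uri</code> is hypothetical, it assumes its argument is an absolute URI, and the example URIs are chosen only for demonstration.</p>
      <eg xml:space="preserve"><![CDATA[
```python
from urllib.parse import urljoin, urlsplit


def split_uri(uri):
    """Return (scheme, scheme-specific part, fragment) for an absolute URI.

    Hypothetical helper for illustration; assumes `uri` starts with "scheme:".
    """
    parts = urlsplit(uri)
    # Drop the optional fragment first, then the leading "scheme:".
    without_fragment = uri.partition("#")[0]
    scheme_specific = without_fragment[len(parts.scheme) + 1:]
    return parts.scheme, scheme_specific, parts.fragment


hierarchical = split_uri("http://www.w3.org/TR/#abstract")
non_hierarchical = split_uri("urn:isbn:0451450523")

# Hierarchical schemes allow relative references, resolved against a base URI.
resolved = urljoin("http://www.w3.org/TR/xml/", "../xmlschema-1/")
```
]]></eg>
      <p>Here <code>hierarchical</code> separates the <code>http</code> scheme from its scheme-specific string and its fragment identifier, while the <code>urn</code> example has no fragment and no hierarchy against which a relative reference could be resolved.</p>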
      <p>Although knowledge of the scheme provides some information about
the components of the scheme-specific string (for example, that
absolute URIs in the <code>http:</code> scheme begin with a DNS name),
it is generally inappropriate to make assertions about the content of
the resource identified by a URI from the content of the URI itself.
In particular, it is an error to assume that a URI that happens
to end with the string <quote><code>.html</code></quote> identifies an HTML
document.
</p>
      <div2>
        <head>What about Fragment Identifiers?</head>
        <p>If a URI contains a sharp character (a <quote><code>#</code></quote>), the string
that follows the <quote><code>#</code></quote> is a fragment identifier. Fragment
identifiers are a mechanism for identifying part of a resource. For example,
in a URI that retrieves an HTML document (a resource representation), the fragment identifier
<code>#foo</code> can be used to reference the element with the ID
<quote><code>foo</code></quote> within that document.
</p>
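        <p>The HTML case above can be sketched as follows, using only the Python standard library. The names <code>IdFinder</code>, <code>resolve_fragment</code>, and <code>SAMPLE_HTML</code> are hypothetical, the URI is a placeholder, and the HTML string merely stands in for a representation that would normally be retrieved by dereferencing the URI.</p>
        <eg xml:space="preserve"><![CDATA[
```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class IdFinder(HTMLParser):
    """Record the tag name of the first element whose id matches target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found_tag = None

    def handle_starttag(self, tag, attrs):
        if self.found_tag is None and dict(attrs).get("id") == self.target_id:
            self.found_tag = tag


def resolve_fragment(uri, html_representation):
    """Interpret the fragment of `uri` against an HTML representation."""
    _, fragment = urldefrag(uri)
    finder = IdFinder(fragment)
    finder.feed(html_representation)
    return finder.found_tag


# Stand-in for a retrieved representation (an assumption of this sketch).
SAMPLE_HTML = '<html><body><h1 id="top">Title</h1><p id="foo">Target</p></body></html>'

target = resolve_fragment("http://www.example.org/page#foo", SAMPLE_HTML)
missing = resolve_fragment("http://www.example.org/page#bar", SAMPLE_HTML)
```
]]></eg>
        <p>Note that the fragment can only be interpreted once a representation is in hand; the URI alone does not say which element, if any, <code>#foo</code> refers to.</p>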
<ednote>
  <edtext>There seems to have been some controversy over the terms URI and URI Reference. The terminology of <bibref ref="RFC2396"/> is to define a URI Reference as an optional URI (absolute or relative) followed optionally by a # and a fragment identifier. I feel that in our architectural writings we need to make a consistent choice over the use of the terms URI and URI reference. URIs seem to cover only a subset of URI references, and URIs with frag-ids seem to reference parts of a resource representation, modulo mapping between qnames and URI references, and RDF's use of # characters in namespace names and of URIs with frag-ids to name graph nodes that RDF also calls resources. (skw)</edtext>
</ednote>
<p>The fragment identifier identifies some sub-part of a resource representation. The syntax and interpretation of a fragment identifier are determined by
the MIME media type of that resource representation. This is considered a design flaw <bibref ref="Fragments"/>.</p>
        <p>This means
that, in general, it's not possible to determine what a fragment
identifier means without retrieving the resource into which it points.</p>
        <p>A URI that consists of only a fragment identifier (i.e., one that begins with a
<quote><code>#</code></quote>) always points into the document that
contains the URI, irrespective of the effective base URI.</p>
      </div2>
    </div1>
    <div1 id="sec-semantics">
      <head>Semantics of URIs</head>
      <p>URIs have a small number of semantic properties independent of the resources
that they identify. The URIs of a particular scheme may have additional semantics;
that's a question for the specification that defines each scheme.</p>
      <glist>
        <gitem id="Uniform">
          <label>URIs are Uniform</label>
          <def>
            <p>In any context that allows a URI, any URI may be used. It is an error to say
that only URIs of a specific scheme are allowed in a certain context.</p>
            <p>Uniformity "...allows different types of resource identifiers to be used in the same context, even when the mechanisms used to access those resources may differ; it allows uniform semantic interpretation of common syntactic conventions across different types of resource identifiers; it allows introduction of new types of resource identifiers without interfering with the way that existing identifiers are used; and, it allows the identifiers to be reused in many different contexts, thus permitting new applications or protocols to leverage a pre-existing, large and widely-used set of resource identifiers." <bibref ref="RFC2396"/>
            </p>
          </def>
        </gitem>
        <gitem id="Universal">
          <label>URIs are Universal</label>
          <def>
            <p>URIs may be used to identify any identifiable thing, anywhere. Also, any resource of significance should be assigned a URI. <bibref ref="Axioms"/>
            </p>
            <p>An absolute URI always means the same thing, regardless of the context in which
it occurs. It is an error to assert that you can construct a context in which
absolute URIs have different meaning than they have outside that context.
(The same holds for relative URIs, except that context may change the effective
base URI.)
</p>
            <ednote>
              <edtext>I'm not sure that Universal is intended to imply some stability of meaning so much as they may be applied to anything that is identifiable. (skw)</edtext>
            </ednote>
          </def>
        </gitem>
        <gitem id="Opaque">
          <label>URIs are Opaque</label>
          <def>
            <p>It is an error to assert properties about the content of a resource
based solely on the content of the URI used to identify it. <bibref ref="Axioms"/> For example, it is an error to infer anything about the nature of a resource or its available representations from the presence of a trailing <quote>.html</quote>, <quote>.asp</quote> or <quote>.png</quote>, or from the presence of strings like <quote>servlet</quote> or <quote>cgi-bin</quote> within a URI.
            </p>
          </def>
        </gitem>
        <gitem id="Consistent">
          <label>URIs are Consistent</label>
          <def>
            <p>The resource identified by a particular URI should always be
<quote>the same</quote>, when it is identified by that URI. This does not mean
that the stream of bits associated with a URI (if, in fact, there is one) can
never change. The notion of <quote>sameness</quote> cannot be absolutely
identified.</p>
            <p>For some URIs, an unchanging stream of bits is entirely
appropriate, but others, the resource identified by the URI for
today's weather or the current time of day, for example, are expected
to vary even though they remain the same in perfectly understandable
ways.</p>
          </def>
        </gitem>
        <gitem id="NotUnique">
          <label>URIs are Not Unique</label>
          <def>
            <p>Although the resource identified by a URI should be consistent, it
does not follow that different URIs must always refer to different
resources. It is perfectly reasonable for a resource to be identified
by several different URIs.</p>
          </def>
        </gitem>
        <gitem id="Transcribable">
          <label>URIs are Transcribable</label>
          <def>
            <p>The syntax of a URI reference is defined in terms of a sequence of characters. Although these characters are drawn from those expressible using the US-ASCII character set, their primary form is as a sequence of characters and not as a sequence of octets under some character set encoding exchanged between computers or stored in computer files. This makes URI references transcribable in such a way that they can be passed around as part of social communication every bit as easily as they can be exchanged by technological means. They have intruded into our daily life. They appear in television advertising; they are spoken on the radio or over the telephone; they are printed in newspapers and books, jotted on pieces of paper, and sent in letters. Oh yes, and they appear occasionally on Web sites.</p>
          </def>
        </gitem>
      </glist>
    </div1>
    <div1 id="references">
      <head>References</head>
      <div2 id="normative">
        <head>Normative References</head>
        <blist>
          <bibl id="RFC2396" href="http://www.ietf.org/rfc/rfc2396.txt">IETF "RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax", T. Berners-Lee, R. Fielding, L. Masinter, August 1998. </bibl>
        </blist>
      </div2>
      <div2 id="informative">
        <head>Non-Normative References</head>
        <blist>
          <bibl id="Axioms" href="http://www.w3.org/DesignIssues/Axioms.html">"Universal Resource Identifiers - Axioms of Web Architecture", T. Berners-Lee, living document dated December 1996.</bibl>
<bibl id="Fragments" href="http://www.w3.org/DesignIssues/Fragment.html">"Fragment Identifiers on URIs", T. Berners-Lee, living document dated April 1997.</bibl>
          <bibl id="Fielding" href="http://www.cs.virginia.edu/~cs650/assignments/papers/p407-fielding.pdf">"Principled Design of the Modern Web Architecture", R.T. Fielding and R.N. Taylor, UC Irvine, </bibl>
          <bibl id="Cool" href="http://www.w3.org/Provider/Style/URI.html">"Cool URIs don't change", T. Berners-Lee, W3C, 1998.</bibl>
        </blist>
      </div2>
    </div1>
  </body>
</spec>
