<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE spec SYSTEM "http://www.w3.org/2002/xmlspec/dtd/2.3/xmlspec.dtd" [
  <!-- ================================================================ -->
  <!ENTITY draft.day "04">
  <!ENTITY draft.month "07">
  <!ENTITY draft.monthname "July">
  <!ENTITY draft.year "2003">
  <!ENTITY iso6.doc.date "&draft.year;-&draft.month;-&draft.day;">
  <!ENTITY http-ident "http://www.w3.org/2001/tag/doc/metadataInURI-31">
]>
<?xml-stylesheet type="text/xsl" href="http://www.w3.org/2002/xmlspec/html/1.8/xmlspec.xsl"?>
<spec xmlns:xlink="http://www.w3.org/1999/xlink">
  <header>
    <title>The use of Metadata in URIs</title>
    <w3c-designation>http://www.w3.org/2001/tag/doc/metaDataInURI-20030708</w3c-designation>
    <w3c-doctype>DRAFT TAG Finding</w3c-doctype>
    <pubdate>
      <day>8</day>
      <month>July</month>
      <year>2003</year>
    </pubdate>
    <publoc>
      <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030708.html" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030708.html</loc>
    </publoc>
    <latestloc>
      <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31</loc> 
  ( 
  <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030708.xml" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">XML</loc> 
  ) 
  </latestloc>
    <prevlocs>
      Unapproved Editors' Draft: <loc href="http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030704.html" xlink:type="simple" xlink:show="replace" xlink:actuate="onRequest">http://www.w3.org/2001/tag/doc/metaDataInURI-31-20030704.html</loc>
    </prevlocs>
    <authlist>
      <author>
        <name>Stuart Williams</name>
        <email href="mailto:skw@hplb.hpl.hp.com">skw@hplb.hpl.hp.com</email>
      </author>
    </authlist>
    <status>
      <p>Editors' DRAFT</p>
      <p>This document has been developed for discussion by the <loc href="http://www.w3.org/2001/tag/" xlink:actuate="onRequest" xlink:type="simple" xlink:show="replace">W3C Technical Architecture Group</loc>. This
finding addresses the TAG issue <loc href="http://www.w3.org/2001/tag/ilist#metadatainURI-31" xlink:actuate="onRequest" xlink:type="simple" xlink:show="replace">metadataInURI-31</loc>.</p>
      <p>The content of this document is intended for discussion and does NOT necessarily represent a consensus position of the TAG.</p>
      <p>Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.</p>
      <p>
        <loc href="http://www/w3.org/2001/tag/findings" xlink:actuate="onRequest" xlink:type="simple" xlink:show="replace">Additional TAG findings</loc>, both approved and in draft state, may also be available. The TAG expects to
incorporate this and other findings into a Web Architecture Document that will be published according to the process of the <loc href="http://www/w3.org/Consortium/Process-20010719/tr#Recs" xlink:actuate="onRequest" xlink:show="replace" xlink:type="simple">W3C Recommendation Track</loc>.</p>
      <p>Please send comments on this finding to the publicly archived TAG mailing list <loc href="mailto:www-tag@w3.org" xlink:actuate="onRequest" xlink:show="replace" xlink:type="simple">www-tag@w3.org</loc>
(<loc href="http://lists.w3.org/Archives/Public/www-tag/" xlink:actuate="onRequest" xlink:show="replace" xlink:type="simple">archive</loc>).</p>
    </status>
    <abstract/>
    <langusage>
      <language/>
    </langusage>
    <revisiondesc>
      <p/>
    </revisiondesc>
  </header>
  <body>
    <div1>
      <head>Introduction</head>
      <p>This finding addresses two related questions:<olist>
          <item>
            <p>What, if anything, can be inferred from a URI used to identify a resource?</p>
          </item>
          <item>
            <p>What information about a resource can or should be embedded in a URI used to identify that resource?</p>
          </item>
        </olist>
      </p>
      <p>
      The first question concerns people and software that make use of URIs assigned outside their own authority (observers). The second concerns people and software acting as, or on behalf of, a URI assignment authority (authorities) when making URI assignments within the scope of that authority. The opaque nature of URIs is considered from the point of view of both observers and authorities.</p>
      <div2>
        <head>Summary of Principles</head>
        <p>
          <olist>
            <item>
              <p>People and software making use of URIs assigned outside of their own authority (i.e. observers) MUST NOT attempt to infer properties of the referenced resource except as licensed by relevant normative specifications or by URI assignment policies published by the relevant URI assignment authority.</p>
              <p>Relevant normative specifications include the URI specification <bibref ref="URI"/>, registered URI scheme specifications (see the <loc href="http://www.iana.org/assignments/uri-schemes">IANA URI Scheme Registry</loc>), and other normative specifications that specify structured use of URIs or URI components.</p>
              <p>For example, based solely on inspection of a URI, it is unsound to infer:
<olist>
                  <item>
                    <p>that a retrieval operation on a URI whose path ends in <code>.html</code> will return an HTML representation of the resource, i.e. one with a content-type of <code>text/html</code>.</p>
                  </item>
                  <item>
                    <p>that a resource identified by a URI whose authority component ends <code>.ca</code> is hosted in Canada or operated by a Canadian organisation.</p>
                  </item>
                </olist>
              </p>
            </item>
            <item>
              <p>A URI assignment authority MAY publish specifications detailing its URI assignment policies. Policies may detail the use of resource properties (version, creation date, author) in making URI assignments. Policies may detail the structure and semantics of URI query components served by resources under the control of the authority.</p>
              <p>People and software using URIs assigned by that authority (i.e. observers) MAY make use of such published information. For example, the structure of URIs used to identify W3C technical reports is documented in <bibref ref="PUBRULES"/>. However, observers are cautioned that assignment policies are not generally subject to standardization and may be changed by the relevant authority at any time. Software programmed on the basis of such a policy is at risk of becoming obsolete.</p>
            </item>
          </olist>
        </p>
        <ednote>
          <edtext>I'm not sure how, or whether, we should account for specifications other than URI scheme specifications that might impose some structure on URIs. In the case of XML Schema Component Designators <bibref ref="XsdComponents"/>, the structuring constraints only affect the fragment identifier. The same is true of identifying abstract WSDL components in the current WSDL draft <bibref ref="WSDL12"/>, although one of the options being considered would apply structure to the URI path component. Also, there seems to me to be a difference between imposing structure on fragments (whose interpretation is media-type dependent) and imposing it on the path component.</edtext>
        </ednote>
        <ednote>
          <edtext>Members of the TAG have suggested that the finding end at this point and that discussion in sections 2 and 3 be deleted. The editor would appreciate feedback on this suggestion.</edtext>
        </ednote>
        <ednote>
          <edtext>Members of the TAG have suggested that this finding be reduced simply to a statement to the effect of "Don't peek inside URIs". The editor would appreciate feedback on this suggestion.</edtext>
        </ednote>
      </div2>
    </div1>
    <div1>
      <head>Deriving Information from URI Syntax and Structure</head>
      <ednote>
        <edtext>The current draft of RFC2396bis <bibref ref="URI"/> is used throughout as the referenced URI specification.</edtext>
      </ednote>
      <p>
The generic syntax of URIs is specified in <bibref ref="URI"/>. At the top level, a URI is divided into four syntactic components: <emph>scheme</emph>, <emph>hier-part</emph>, an optional <emph>query</emph>, and an optional <emph>fragment</emph>.</p>
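      <p>As an informal illustration, not a conforming parser, the Python standard library function <code>urllib.parse.urlsplit</code> can approximate this top-level split. It reports the authority (<code>netloc</code>) and <code>path</code> separately, so the sketch below reassembles them into a <emph>hier-part</emph>; it also returns absent optional components as empty strings. The helper name <code>top_level_components</code> and the example URIs are hypothetical.</p>
      <eg><![CDATA[
```python
from urllib.parse import urlsplit


def top_level_components(uri):
    """Approximate the (scheme, hier-part, query, fragment) split of a URI.

    urlsplit is not a validating parser for the generic syntax, and it
    returns absent optional components as empty strings.
    """
    p = urlsplit(uri)
    # With an authority, hier-part is "//" authority path; without one
    # (e.g. mailto:), hier-part is just the path.
    hier_part = ("//" + p.netloc + p.path) if p.netloc else p.path
    return p.scheme, hier_part, p.query, p.fragment


print(top_level_components("http://example.org/reports/summary.html?lang=en#intro"))
# ('http', '//example.org/reports/summary.html', 'lang=en', 'intro')
print(top_level_components("mailto:editor@example.org"))
# ('mailto', 'editor@example.org', '', '')
```
]]></eg>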
      <div2>
        <head>Scheme Component</head>
        <p>The <emph>scheme</emph> component of a URI is the primary determinant of the syntax and semantics of the remainder of the URI. It determines the relevant URI scheme specification, which should be registered in the <loc href="http://www.iana.org/assignments/uri-schemes">IANA URI Scheme Registry</loc>. A URI scheme specification may constrain the kinds of resource that identifiers assigned under that scheme can reference. For example, <olist>
            <item>
              <p>
                <bibref ref="RFC2368"/> states that <quote>The mailto URL scheme is used to designate the Internet mailing address of an individual or service.</quote>
              </p>
            </item>
            <item>
              <p>
                <bibref ref="RFC1738"/> states <quote>The FTP URL scheme is used to designate files and directories on Internet hosts accessible using the FTP protocol (RFC959).</quote>
              </p>
            </item>
            <item>
              <p>
                <bibref ref="RFC2392"/> defines the intended use of the mid and cid URI schemes for referencing messages and message parts.</p>
            </item>
          </olist>
A URI scheme may also delegate the specification of such constraints to other specifications. For example, the URN specification <bibref ref="RFC2141"/> introduces the concept of URN namespaces, and <bibref ref="RFC3406"/> then delegates the specification of URN assignment procedures, constraints on the type of resource being identified, and any other syntactic and semantic constraints to the associated URN namespace specification.</p>
        <p>Thus, either directly or via delegation to other specifications, some URI schemes enable the type of resource being referenced to be determined, others do not.</p>
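        <p>The Python sketch below records only what the scheme specifications quoted above say about the resources their URIs identify and, for <code>urn</code> URIs, reports only the namespace identifier (NID) to which further constraints are delegated. The table <code>SCHEME_CONSTRAINTS</code> and the function <code>licensed_resource_type</code> are hypothetical, and their descriptions are paraphrases, not normative text. For any scheme not listed, including <code>http</code>, this sketch infers nothing; that reflects the scope of the sketch, not a survey of every scheme.</p>
        <eg><![CDATA[
```python
from urllib.parse import urlsplit

# Paraphrases of what the scheme specifications quoted above say about the
# resources their URIs identify. Only those schemes are listed here.
SCHEME_CONSTRAINTS = {
    "mailto": "Internet mailing address of an individual or service (RFC 2368)",
    "ftp": "file or directory on an Internet host, accessible via FTP (RFC 1738)",
    "mid": "message (RFC 2392)",
    "cid": "part of a message (RFC 2392)",
}


def licensed_resource_type(uri):
    """Return what the scheme specification licenses us to infer, or None."""
    parts = urlsplit(uri)  # urlsplit lowercases the scheme
    if parts.scheme == "urn":
        # RFC 2141 syntax is urn:NID:NSS, with a case-insensitive NID.
        # Constraints are delegated to the specification for that namespace.
        nid = parts.path.split(":", 1)[0].lower()
        return "delegated to the URN namespace specification for " + repr(nid)
    # Schemes not listed: nothing is inferred from the scheme alone.
    return SCHEME_CONSTRAINTS.get(parts.scheme)


print(licensed_resource_type("mailto:editor@example.org"))
print(licensed_resource_type("URN:ISBN:0-395-36341-1"))
# delegated to the URN namespace specification for 'isbn'
print(licensed_resource_type("http://example.org/report.html"))  # None
```
]]></eg>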
      </div2>
      <div2>
        <head>Authority and Path Components</head>
        <p>Where the <emph>hier-part</emph> component of a URI enables some authority to be identified, and that authority (which may be a specification) describes the syntax and semantics of the identifiers assigned under it, that description can be used to derive information about a resource. For example, on the basis of the W3C Publication Rules for technical reports <bibref ref="PUBRULES"/>, it is legitimate to conclude the following about the resource identified by the URI <loc href="http://www.w3.org/TR/2003/WD-xquery-20030502/">http://www.w3.org/TR/2003/WD-xquery-20030502/</loc>: <ulist>
            <item>
              <p>The referenced resource is a Working Draft of a W3C technical report.</p>
            </item>
            <item>
              <p>The referenced resource was published on 2nd May 2003.</p>
            </item>
            <item>
              <p>The shortname of the referenced resource is <quote>xquery</quote>.</p>
            </item>
            <item>
              <p>The most recent published version of the technical report is accessible at <loc href="http://www.w3.org/TR/xquery">http://www.w3.org/TR/xquery</loc>
              </p>
            </item>
          </ulist>
        </p>
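        <p>Such inferences can be automated, but only by encoding one particular authority's policy. The Python sketch below assumes a simplified reading of the dated report URI pattern in the example above (<code>http://www.w3.org/TR/YYYY/STATUS-shortname-YYYYMMDD/</code>). It is not a complete encoding of <bibref ref="PUBRULES"/>, and it returns <code>None</code>, inferring nothing, for any URI the assumed pattern does not cover. The names <code>TR_PATTERN</code>, <code>STATUS_NAMES</code> and <code>infer_from_tr_uri</code> are hypothetical.</p>
        <eg><![CDATA[
```python
import re
from datetime import date

# ASSUMPTION: a simplified reading of the dated report URI pattern in the
# example above, http://www.w3.org/TR/YYYY/STATUS-shortname-YYYYMMDD/ .
# It is not a complete encoding of the W3C Publication Rules, which the
# authority may change at any time.
TR_PATTERN = re.compile(
    r"^http://www\.w3\.org/TR/(\d{4})/([A-Z]+)-([^/]+)-(\d{4})(\d{2})(\d{2})/$"
)

# Only the status prefix used in the example is mapped.
STATUS_NAMES = {"WD": "Working Draft"}


def infer_from_tr_uri(uri):
    """Return inferences licensed by the assumed policy, or None."""
    m = TR_PATTERN.match(uri)
    if m is None:
        return None  # the policy does not cover this URI: infer nothing
    _dir_year, status, shortname, y, mo, d = m.groups()
    try:
        published = date(int(y), int(mo), int(d))
    except ValueError:
        return None  # not a valid date, so not a URI the policy describes
    return {
        "status": STATUS_NAMES.get(status, status),
        "published": published,
        "shortname": shortname,
        "latest_version": "http://www.w3.org/TR/" + shortname,
    }


print(infer_from_tr_uri("http://www.w3.org/TR/2003/WD-xquery-20030502/"))
```
]]></eg>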
        <p>A common mistake is to make inferences based on the last segment in the path component of a URI. For example, it is common to assume that a URI with a path that ends in <quote>.html</quote> identifies an HTML document, or that a path ending in <quote>.jpeg</quote> identifies an image encoded according to the JPEG specification. These inferences are not licensed by the URI specification <bibref ref="URI"/>. When a transfer protocol provides a means to convey media-type information, that information is the authoritative determinant of the media type of a representation; inconsistencies are an error that should not be silently ignored (see <bibref ref="MIMEOverride"/>).</p>
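        <p>A minimal Python sketch of this distinction, assuming the media type arrives as an HTTP <code>Content-Type</code> header value: <code>mimetypes.guess_type</code> shows what a naive client might guess from the path suffix, while the value acted upon is taken from the header. The function names and the example response are hypothetical.</p>
        <eg><![CDATA[
```python
import mimetypes


def suffix_guess(uri):
    """What a naive client might assume from the path suffix. Not authoritative."""
    return mimetypes.guess_type(uri)[0]


def declared_media_type(content_type):
    """Media type as conveyed by the protocol, e.g. an HTTP Content-Type value.

    Parameters such as charset are dropped, and type/subtype are
    lowercased because they are case-insensitive.
    """
    return content_type.split(";", 1)[0].strip().lower()


# Hypothetical response: the path ends in .html, but the server declares
# a different media type. The declared type is the one to act on.
uri = "http://example.org/report.html"
header = "application/xhtml+xml; charset=utf-8"
print(suffix_guess(uri))            # only a guess (typically text/html)
print(declared_media_type(header))  # application/xhtml+xml
```
]]></eg>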
      </div2>
      <div2>
        <head>Query Component</head>
        <p>HTML, XHTML and XForms all define mechanisms whereby a client can construct the query portion of a URI from information submitted in a web form. Transforming completed forms into URIs is a powerful means of enabling queries to be bookmarked, shared with others, or repeated. When the query component of a URI is constructed either by completing a form or by executing a script that originated from the same authority as the constructed reference, the structure of the query remains opaque to the end-user. The user agent that constructed the URI did so in accordance with a specification it received, as a form or a script, from the authority that resolves the resulting reference. That authority retains all the normal freedoms to organise the way it uses its URI space: to create new query parameters, to change scripts and forms, and so on.</p>
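        <p>For a form submitted with the GET method, the user agent encodes the completed fields as <code>application/x-www-form-urlencoded</code> name/value pairs and appends them to the form's action URI. The Python sketch below approximates that step with <code>urllib.parse.urlencode</code>; the action URI, the helper <code>form_get_uri</code> and the field names <code>q</code> and <code>lang</code> are hypothetical. In practice those names are chosen by the authority that supplies the form, and the end-user never needs to know them.</p>
        <eg><![CDATA[
```python
from urllib.parse import urlencode


def form_get_uri(action, fields):
    """Build the URI a user agent requests when a GET form is submitted.

    action: the form's action URI, assumed here to carry no query of its own.
    fields: the completed form's (name, value) pairs, in document order.
    """
    # urlencode applies the form encoding: spaces become "+" and pairs
    # are joined with "&".
    return action + "?" + urlencode(fields)


print(form_get_uri("http://example.org/search",
                   [("q", "metadata in URI"), ("lang", "en")]))
# http://example.org/search?q=metadata+in+URI&lang=en
```
]]></eg>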
        <p>However, if the authority also publishes the syntax and semantics of the query parameters that it uses, third parties may independently construct URIs with particular semantic intent. Third parties that find such constructed URIs useful will create content and applications that depend upon their structure. This either limits the authority's freedom to evolve its site without causing breakage elsewhere (modulo <quote>Cool URIs don't change</quote> <bibref ref="CoolURI"/>), or places a maintenance burden on the dependent applications and content.</p>
      </div2>
      <div2>
        <head>Fragment Component</head>
        <p>
          <bibref ref="URI"/> defines fragment identifiers in the context of retrival operations where the fragment identifier is resolved in a manner specified by the media-type of the resulting representation. Fragment identifiers are also used in contexts where they merely play a role as part of an identifier in systems where no retrieval is intended. In general, in the absense of a media-type it is not possible to infer properties of a resource or its representations from the fragment identifier component of a URI.</p>
      </div2>
    </div1>
    <div1>
      <head>Structuring URI Assignments</head>
      <p>People and software making URI assignments may only assign URIs for which they are the relevant authority. Authority to make URI assignments is delegated from the URI specification <bibref ref="URI"/> to URI scheme specifications. Registered URI schemes are listed in the <loc href="http://www.iana.org/assignments/uri-schemes">IANA URI Scheme Registry</loc>. URI schemes may further delegate the authority to assign URIs under that scheme to other specifications, or to people or software acting in the role of URI assignment authorities.</p>
      <p>URI assignment authorities should not reassign URIs as this leads to ambiguity over what a given URI identifies (see <bibref ref="CoolURI"/>). </p>
      <div2>
        <head>Scheme Component</head>
        <p>The assignment of URIs from a particular URI scheme should respect any constraints that the relevant URI scheme specification imposes on the type of resource identified. The choice of scheme will be influenced by: <olist>
            <item>
              <p>The available resource access mechanisms, e.g. HTTP, FTP, or a file system.</p>
            </item>
            <item>
              <p>The type of resource identified by the assigned URI.</p>
            </item>
            <item>
              <p>The authority the resource owner has to assign URIs under each candidate scheme.</p>
            </item>
          </olist>
        </p>
      </div2>
      <div2>
        <head>Authority and Path Components</head>
        <p>A URI assignment authority must operate within the constraints imposed by the delegation path(s) that established its authority. Beyond that, the authority is free to organise the URI space under its control in any manner it chooses.</p>
        <p>An assignment authority may make use of resource properties in the construction of URIs that identify the resource. Properties used in the construction of resource identifiers should be static with respect to changes in the state of the resource; an identifier that varies with the current state of the resource is of little use.</p>
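        <p>A minimal Python sketch of such a policy, assuming a hypothetical authority at <code>http://example.org/docs</code> that builds URIs from a document's short name, creation date and version. The mutable <code>status</code> property is deliberately excluded, so a change in the document's state does not change its identifier. The <code>Document</code> class and <code>assign_uri</code> function are illustrative, not a recommended URI layout.</p>
        <eg><![CDATA[
```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    shortname: str  # fixed when the document is first created
    created: date   # static: never changes
    version: int    # static for a given version
    status: str     # mutable state, e.g. "draft" or "final"


def assign_uri(base, doc):
    """Hypothetical policy: build the URI from static properties only.

    'status' is deliberately excluded, so a change of state does not
    change the identifier.
    """
    return f"{base}/{doc.created:%Y/%m}/{doc.shortname}-v{doc.version}"


doc = Document("metadata-in-uri", date(2003, 7, 8), 1, "draft")
before = assign_uri("http://example.org/docs", doc)
doc.status = "final"
print(before == assign_uri("http://example.org/docs", doc))  # True
print(before)  # http://example.org/docs/2003/07/metadata-in-uri-v1
```
]]></eg>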
      </div2>
      <div2>
        <head>Query Components</head>
        <p>Query components are often used to carry identifying information in the form of keyword/value pairs. Many references that contain query components arise through the use of forms or from the client-side execution of scripts. Both forms and scripts generally originate under the control of the assignment authority, which therefore need not publish a specification of the structure and semantics of the associated queries.</p>
        <p>However, URIs containing queries are also propagated as bookmarks and in email messages. Assignment authorities should take care to maintain the URI assignments under their authority (see <bibref ref="CoolURI"/>). Changing the spelling or value space of keywords may cause the failure of references bookmarked by others, or of software that has been programmed on the basis of any published details of the structure and semantics of query strings.</p>
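        <p>One way an authority might avoid breaking such references after renaming a keyword is to keep accepting the old spelling. The Python sketch below assumes a hypothetical rename of <code>srch</code> to <code>q</code> and rewrites incoming query strings accordingly; the names <code>KEYWORD_ALIASES</code> and <code>canonical_query</code> are illustrative, and whether the server then redirects or simply serves the resource is left open.</p>
        <eg><![CDATA[
```python
from urllib.parse import parse_qsl, urlencode

# ASSUMPTION: the authority once used "srch" and has since renamed it "q".
KEYWORD_ALIASES = {"srch": "q"}


def canonical_query(query):
    """Rewrite retired keyword spellings to current ones, keeping values and order."""
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode([(KEYWORD_ALIASES.get(k, k), v) for k, v in pairs])


print(canonical_query("srch=uri+opacity&lang=en"))  # q=uri+opacity&lang=en
```
]]></eg>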
        <p>The URI specification <bibref ref="URI"/> makes no statement about the equivalence of URIs that differ only in the ordering of query keyword/value pairs. Such URIs are different URIs; however, in many cases they may identify the same resource.</p>
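        <p>The Python sketch below makes the distinction concrete. The two example URIs are different strings, and hence different URIs, yet a comparison that ignores the order of keyword/value pairs treats them alike. Such a comparison is an application-level policy that a particular authority might license; it is not an equivalence defined by <bibref ref="URI"/>. The function name <code>same_except_query_order</code> is hypothetical.</p>
        <eg><![CDATA[
```python
from urllib.parse import parse_qsl, urlsplit


def same_except_query_order(a, b):
    """True if two URIs differ at most in the order of query keyword/value pairs.

    This is NOT URI equivalence as defined by the URI specification; it is
    an application-level comparison an authority's own policy might license.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    if (pa.scheme, pa.netloc, pa.path, pa.fragment) != (
        pb.scheme, pb.netloc, pb.path, pb.fragment
    ):
        return False
    # Sorting the pair lists keeps repeated keywords, so "a=1&a=2" and
    # "a=1" are still distinguished.
    qa = sorted(parse_qsl(pa.query, keep_blank_values=True))
    qb = sorted(parse_qsl(pb.query, keep_blank_values=True))
    return qa == qb


a = "http://example.org/search?q=uri&lang=en"
b = "http://example.org/search?lang=en&q=uri"
print(a == b)                         # False: different URIs
print(same_except_query_order(a, b))  # True
```
]]></eg>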
        <ednote>
          <edtext>Don't know if this is correct, but couldn't find a mention. Probably ought to say something about whether or not inconsistent ordering of query parameters has any significant effect on Web performance - due to missed cache hit opportunities.</edtext>
        </ednote>
      </div2>
      <div2>
        <head>Fragment Components</head>
        <ednote>
          <edtext>Need to add some material here, but I think it risks getting tangled up with httpRange-14 and fragmentInXML-28. Going to pause for now and await inspiration/feedback.</edtext>
        </ednote>
      </div2>
    </div1>
    <div1>
      <head>Conclusions</head>

      <p>It is legitimate for assignment authorities to encode static identifying properties of a resource, e.g. author, version, or creation date, within the URIs they assign. This may contribute to the unique assignment of URIs. It may also support efficient mechanisms for dereferencing resources within origin servers, e.g. the use of session IDs within URIs.</p>

      <p>Assignment authorities may publish specifications detailing the structure and semantics of the URIs they assign. Other users of those URIs may use such specifications to infer information about resources identified by URIs assigned by that authority.</p>

      <p>URI scheme specifications may constrain the type of resource that identifiers assigned under that scheme may reference. They may also delegate the right to impose such constraints to other specifications. Assignment authorities should honor such constraints in the assignments that they make. Other users of those URIs may make use of any constraints specified by such delegation chains, rooted in the URI specification <bibref ref="URI"/>, to infer information about a referenced resource.</p>

      <p>People and software using URIs assigned outside their own authority should make as few inferences as possible about a resource based on its URI, including inferences licensed through delegated authority. The more dependencies a piece of software has on particular constraints and inferences, the more fragile it becomes in the face of change and the lower its generic utility.</p>

    </div1>
    <div1>
      <head>References</head>
      <p>@@TODO@@ This list needs pruning once the document is finished.</p>
      <blist>
        <bibl id="RFC1738" href="http://www.ietf.org/rfc/rfc2616">"Uniform Resource Locators (URL)"; IETF; RFC1738; T. Berners-Lee, L. Masinter, M.McCahill; December 1994</bibl>
        <bibl id="RFC2141" href="http://www.ietf.org/rfc/rfc2141">"URN Syntax"; IETF; RFC2141; R. Moats; May 1997</bibl>
        <bibl id="RFC2368" href="http://www.iana.org/rfc/rfc2368">"The mailto URL scheme"; IETF; RFC 2368;  P. Hoffman, L. Masinter, J. Zawinski;  July 1998</bibl>
        <bibl id="RFC2392" href="http://www.iana.org/rfc/rfc2392">"Content-ID and Message-ID Uniform Resource Locators"; IETF; RFC 2392; E. Levinson; August 1998</bibl>
        <bibl id="RFC2396" href="http://www.ietf.org/rfc/rfc2396">"Uniform Resource Identifiers (URI): Generic Syntax";  RFC2396; IETF; T. Berners-Lee, R. Fielding, L. Masinter; August 1998</bibl>
        <bibl id="RFC3406" href="http://www.ietf.org/rfc/rfc3406">"Uniform Resource Names (URN) Namespace Definition Mechanisms";RFC3406; IETF; L. Daigle, D.W. van Gulik, R. Iannella, P. Faltstrom; October 2002</bibl>
        <bibl id="CoolURI" href="http://www.w3.org/Provider/Style/URI.html">"Cool URIs don't change"; W3C; Tim Berners-Lee; 1998</bibl>
        <bibl id="MIMEOverride" href="http://www.w3.org/2001/tag/doc/mime-respect.html">"Client handling of MIME headers"; W3C; Draft TAG Finding ; I.Jacobs; May 2003</bibl>
        <bibl id="XsdComponents" href="http://www.w3.org/TR/2003/WD-xmlschema-ref-20030109/">"XML Schema: Component Designators"; W3C; Working Draft; 09 January 2003</bibl>
        <bibl id="WSDL12" href="http://www.w3.org/TR/2003/WD-wsdl12-20030303/#wsdl-uri-references">"Web Services Description Language (WSDL) Version 1.2"; Appendix C "URI References for WSDL constructs" ; W3C; Working Draft; 3 March 2003</bibl>
        <bibl id="PUBRULES" href="http://www.w3.org/2003/05/27-pubrules.html">"Publication Rules"; W3C; May 2003</bibl>
        <bibl id="URI" href="http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html">"Uniform Resource Identifier (URI): Generic Syntax"; IETF; T. Berners-Lee, R. Fielding, L. Masinter; Currently being revised.  
</bibl>
      </blist>
      <p>The IETF Internet Draft <loc href="http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html">draft-fielding-uri-rfc2396bis-03</loc> is expected to obsolete <loc href="http://www.ietf.org/rfc/rfc2396">RFC 2396</loc>, which is the current URI standard. <loc href="http://www.w3.org/TR/webarch">"Architecture of the World Wide Web"</loc> uses the concepts and terms defined by <loc href="http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html">draft-fielding-uri-rfc2396bis-03</loc>, preferring them to those defined by <loc href="http://www.ietf.org/rfc/rfc2396">RFC 2396</loc>. The TAG is tracking the evolution of <loc href="http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html">draft-fielding-uri-rfc2396bis-03</loc>.</p>
    </div1>
  </body>
</spec>
