<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet type="text/xsl" href="../../../doc/xmlspec.xsl"?>
<!DOCTYPE spec SYSTEM
"http://www.w3.org/2002/xmlspec/dtd/2.6/xmlspec.dtd" [ 
<!--
================================================================
--> 
<!ATTLIST spec xmlns:xlink CDATA #IMPLIED>
<!ENTITY mdash " &#8212; "> 
<!ENTITY epsilon "&#949;"> 
<!ENTITY Oacute "&#211;"> 
<!ENTITY eacute "&#233;"> 

<!-- CHANGE "SEND COMMENTS TO" ADDRESS BEFORE ANNOUNCING -->

<!ENTITY draft.day "25"> 
<!ENTITY draft.monthname "June"> 
<!ENTITY draft.year "2011">
]>

<!-- tbd: refer to Tim's 'generic resources' -->

<spec xmlns:xlink="http://www.w3.org/1999/xlink" w3c-doctype="wd" role="editors-copy"> 
  <header>

    <title> Information Resources and Web Metadata
    </title>

  <!-- 
    <w3c-designation>http://www.w3.org/TR/2009/WD-hash-in-url-20090415/</w3c-designation> 
  -->
    <w3c-doctype>Editor's Draft</w3c-doctype> 
    <pubdate> 
      <day>&draft.day;</day>
      <month>&draft.monthname;</month> 
      <year>&draft.year;</year>
    </pubdate> 

    <publoc> 
      <!-- 
      No stable URI for the version you're looking at.  When citing,
      please use the above date.
       -->

      <loc href="http://www.w3.org/2001/tag/awwsw/ir/20110625/" >
        http://www.w3.org/2001/tag/awwsw/ir/20110625/
      </loc>
    </publoc>

    <prevlocs>
      <loc href="http://www.w3.org/2001/tag/awwsw/ir/20110517/" >
        http://www.w3.org/2001/tag/awwsw/ir/20110517/
      </loc>
    </prevlocs>

    <altlocs>
      <loc role="xml" href="ir.xml"
           xlink:type="simple">XML</loc>
    </altlocs>
    <latestloc> 
      <loc href="http://www.w3.org/2001/tag/awwsw/ir/latest/" 
        >http://www.w3.org/2001/tag/awwsw/ir/latest/</loc> 
    </latestloc>  

    <authlist> 
      <author>

        <name>Jonathan A. Rees
        </name> 
        <email href="mailto:rees@mumble.net"
	   >rees@mumble.net</email> 
      </author>

    </authlist> 

    <status> 
      <p>
        This report has been developed by the 
        <loc href="http://www.w3.org/2001/tag/awwsw/"
          >AWWSW Task Group</loc>
        of the
        <loc href="http://www.w3.org/2001/tag/"
          >W3C Technical Architecture Group</loc>
        in connection with
        TAG issue 57 <bibref ref="issue-57"/>,
	"Mechanisms for obtaining information about the meaning of a
        given URI".
      </p> 

      <p>
        Publication of this draft
        does not imply endorsement by the W3C Membership. This is
        a draft document and may be updated, replaced or obsoleted by
        other documents at any time.
      </p> 

      <p>
	Please send comments on this
	document to the publicly archived TAG mailing list 
	<loc
	    href="mailto:www-tag@w3.org">www-tag@w3.org</loc>
	(<loc href="http://lists.w3.org/Archives/Public/www-tag/"
	   >archive</loc>).
	<!-- 
        Please send comments on this
        document to the editor at
	<loc href="mailto:rees@mumble.net" 
	 >rees@mumble.net</loc>.
	  -->
        Much of the initial development of this document was discussed
	on the
        public-awwsw@w3.org mailing list, with archive at 
        <loc href="http://lists.w3.org/Archives/Public/public-awwsw/" 
         >http://lists.w3.org/Archives/Public/public-awwsw/</loc>.
      </p>
    </status> 

    <abstract> 
      <p>
	<!-- 
        This note provides a pragmatic treatment of the "information
        resource" abstraction that has been used in Web Architecture
        discussions as a way of explaining what dereferenceable URIs
        might name.
	It is proposed that an information resource is a formal
        construction whose logical purpose is to 
        be the subject of metadata assertions.
	Treating information resources in this way helps to
        demystify them and leads to a simple
        explanation of the resource/representation relationship.
 -->

This note considers the semantics of metadata in which the subject of
the metadata (the "data") is specified using a URI that may be
dereferenced on the Web.  This situation is complicated in that agents
might obtain different information on different dereference
operations, raising the question of what the subject of the metadata
is, and what might be true or not of it.  
      </p>
      <p>
It is proposed that the practical purpose of the
"information resource" abstraction in Web architecture
is to supply suitable subjects for
this kind of metadata.

	Relating information resources to metadata in this way makes
	concrete the value proposition for the rule that a URI
	should name the information resource related to dereference of
	that URI.  It is hoped that this analysis will be of use in
	future work aimed at strengthening or modifying consensus
	around this rule.
      </p>
    </abstract> 

    <langusage> 
      <language id="en-US">English</language> 
    </langusage>

    <revisiondesc> 
      <p>
        <ulist> 
          <item>
            <p>$Id: ir.xml,v 1.2 2011/06/25 13:53:04 jrees Exp $
            </p>
          </item>          
        </ulist> 
      </p> 
    </revisiondesc> 
  </header>

  
  <body> 

    <div1 id="intro">
    <head>Introduction</head>
    <p>
      It is common to say things like "the title of http://example/hen 
      is 'Trouv&eacute;e'", or, in a machine-readable language such as
      Turtle, 
    </p>
    <eg>
    &lt;http://example/hen> dc:title "Trouv&eacute;e".</eg>
    <p>
      with the intent of saying something about what you get from dereferencing
      the URI 'http://example/hen'.
      This manner of speaking is mysterious in two ways.  First,
      dereferencing this 
      URI might yield different results at different times or at the
      same time to different clients.
      There may be differences in layout, format, or content as the
      host improves its site or adapts to client preferences.
      Because there is variability in what you get,
      some results may have that title while others
      don't.  Is this a problem?  If not, why not?
    </p>
    <p>
      Second, the statement suggests that there is something that has
      that title - a thing that the URI refers to.  What is
      the nature of that thing and what can we say about it?  Is it some
      particular dereference result, or some
      other kind of entity that is somehow related to all dereference
      results?
    </p>
    <p>
      This note is a <emph>post hoc</emph> rational reconstruction of
      Web metadata intended to answer these questions.  It proceeds in
      three stages. 
      First, the idea of generic entities that have metadata is
      introduced, without any particular reference to the Web.
      Second, it is suggested that there are generic entities
      on the Web associated with URIs.  Third, it is suggested that
      while these entities are fundamentally independent of their names, 
      it is useful to name them using
      the URIs with which they're associated, as opposed to
      some other kind of name.
    </p>
    <p>
      We are using "Web metadata" as a shorthand to describe a
      particular situation. 
      There is much metadata on the Web for which
      attention to complications introduced by URI dereference is not
      relevant, including embedded metadata (e.g. XMP) 
      and traditional bibliographic records.  These other aspects of
      metadata on the Web will not be covered in this note.
    </p>

    </div1>

    <div1 id="metadata">
    <head>Generic metadata</head>
    <p>
      Metadata is data about data or information about
      information.<footnote>Metadata as "data about data"
      (and not about some other kind of thing) is the conventional
      dictionary definition and matches the use of the term in
      information science.  Sometimes the word is (ab)used as a
      synonym for "data" (about something).
      This alternative usage will be avoided.
      </footnote>
      Typical metadata includes information about some information
      entity's content (title, word count, topic, format, language, etc.) and
      provenance (author, publisher, publication date, revision
      history, etc.).<footnote>
      "Entity" is being used in the dictionary sense, not in the HTTP
      or XML sense - the purpose of the word is merely to convert
      "information" from a mass noun 
      to a count noun.</footnote>
      Because metadata is information about information,
      it might be stated of any kind of information entity, such
      as a document, image, or audio recording.
    </p>

    <p>
      The same metadata may apply to multiple information entities, as
      when an HTML document and a PDF document both have the same
      title, author, date, word count, topic, and so on as a
      consequence of having been generated from a common source.  It
      will be useful to have a term to apply in the situation where
      metadata does not explicitly specify a particular subject, so we define a
      "metadata predicate" to be metadata of this sort.<footnote>One
      might ask, are
      there predicates that aren't metadata predicates?  
      Most of the predicates one might think of in this context, such
      as those formed using the Dublin Core, FOAF, and RDFS
      vocabularies, are metadata
      predicates, and they are closed under boolean combinations. However,
      to make the theory consistent, it is necessary to exclude certain
      predicates such as "is a specific information
      entity".  Future work along these lines ought to include a
      rigorous definition of "metadata predicate".</footnote>
      In this case we would have a metadata predicate that is true of
      documents that have a particular title, author, and so on (whatever
      is common to the HTML and PDF versions), while the metadata
      predicate "is an HTML document" would be true of one format but not the
      other.
    </p>

    <p>
      The situation where collections of information entities are
      related to one another in some way (e.g. via revision,
      translation, or reformatting) is quite common.  People often
      play a grammatical trick in this situation, where a class of
      related entities is treated as if it were a single generic
      entity.  For a non-information example, we might say "the tapir
      has a prehensile snout" referring not to an individual tapir but
      to tapirs in general.  If there were a tapir in front of us the
      statement would indeed be true of that specific tapir, but "the tapir"
      refers not to that tapir but to a "generic tapir".  
      The generic tapir might be said to "generalize" the specific one.
    </p>

    <p>
      Similarly,
      if we say "Elizabeth Bishop wrote that poem about a hen",
      then "that poem about a hen"
      refers not to some specific information
      entity with a definite length, layout, and format, but to a
      class of information entities that have in common, among other 
      things, that they're
      by Elizabeth Bishop and are poems.  The specific
      entity that I 
      read and the one that you read may differ, but if so it will be
      in ways that are not important to what we're talking about.
      (See <bibref ref="GR"/>.)
    </p>

    <p>
      The reason we consider these generic entities to exist is so
      that we can say things about them as if they were specific -
      i.e. so that we can apply predicates to them - and avoid the
      need to express a universal quantification ("every tapir")
      explicitly.  
      A metadata predicate therefore holds of a generic
      information entity when, and only when, it holds of the information
      entities that the generic information entity generalizes.
    </p>

    <p>
      Put formally, if M[] is a metadata predicate and
      G is a generic information entity,
    </p>

    <olist>
      <!-- <item>  -->
        M[G] if and only if {M[S] for all S such that G generalizes S}.
      <!-- </item>  -->
    </olist>
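
    <p>
      As a rough illustration only, the rule can be sketched in Turtle
      using the HTML and PDF documents mentioned earlier.
      The ir: and default namespaces and the ir:generalizes property
      are hypothetical names introduced for this sketch; they do not
      belong to any published vocabulary.
    </p>

    <eg>
    # Assumption: the ir: and : namespaces below are made up for illustration.
    @prefix dc: &lt;http://purl.org/dc/elements/1.1/> .
    @prefix ir: &lt;http://example/ir#> .
    @prefix :   &lt;http://example/poems#> .

    # Two specific information entities generated from a common source.
    :henHtml dc:title "Trouv&eacute;e" ; dc:format "text/html" .
    :henPdf  dc:title "Trouv&eacute;e" ; dc:format "application/pdf" .

    # A generic information entity that generalizes those two (hypothetical property).
    :hen ir:generalizes :henHtml, :henPdf .

    # The title holds of every entity that :hen generalizes, so it holds of :hen.
    :hen dc:title "Trouv&eacute;e" .

    # No dc:format is stated for :hen, because the two specific entities differ in format.</eg>

    <p>
      In this sketch the title plays the role of M[]: it is stated of
      the generic entity only because it is true of each specific
      entity that the generic entity generalizes.
    </p>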

    </div1>

    <div1 id="web">
    <head>Web metadata</head>

    <p>
      We now relate this idea to the Web.  The Web works as follows: A set
      of governing specifications (<bibref ref="3986"/>, etc.) and
      namespaces (e.g. DNS)
      "authorize" servers and APIs to yield certain 
      "representations"<footnote>Following <bibref ref="3986"/>,
      "representation" is used
      to mean content (an octet sequence) tagged with media type and perhaps
      other information meant to guide interpretation of the content.
      "Representation" is used here as a term of art; these representations
      don't necessarily "represent" anything at all.
      </footnote>
      (specific information entities) in response to
      requests to dereference
      a given URI.  Let's say that in this situation a representation is
      "authorized for" the URI.  This formulation is neutral
      with regard to protocol, but HTTP is an important point of reference:
      With a properly functioning infrastructure, an HTTP request GET U will
      yield a 200 OK response carrying representation Z only when Z is
      authorized for U.
    </p>

    <p>
      When only one representation is authorized for a URI, a server,
      cache, or API will yield that representation (or fail to yield any).
      The set of authorized representations may vary over time.
      Application scenarios in which multiple representations are authorized at
      one time for a single URI include content
      negotiation variants (such as versions in multiple languages),
      representations that vary depending on user identity or session state, and
      overlapping cache lifetimes (Expires:) for different versions of a
      changing document.
    </p>

    <p>
      The following defines what it means for a generic information entity
      to be "on the Web" at a given URI:
    </p>

    <ulist>
       G is "on the Web" at U means that U's authorized representations
       are exactly those representations that G generalizes.
       <!-- 
       <footnote>
       To avoid potential conflicts with published definitions of "has
       representation" that
       may be incompatible with what we're calling "generalizes",
       we'll continue to say that an information resource "generalizes"
       a representation, instead of the more familiar 
       expression where an information resource "has" a representation.
       </footnote>
       -->
    </ulist>
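
    <p>
      The definition might be pictured with the following Turtle
      sketch, which assumes a URI with two language variants.
      The ir: and default namespaces and the properties
      ir:authorizedFor and ir:generalizes are hypothetical, and
      ir:onWebAt stands for the "on the Web at" relation just defined.
      Turtle cannot by itself express "exactly those", so that part of
      the definition is carried by the comments.
    </p>

    <eg>
    # Assumption: the ir: and : namespaces below are made up for illustration.
    @prefix dc: &lt;http://purl.org/dc/elements/1.1/> .
    @prefix ir: &lt;http://example/ir#> .
    @prefix :   &lt;http://example/poems#> .

    # The representations authorized for the URI (assumed to be the only ones).
    :henEn ir:authorizedFor "http://example/hen" ; dc:language "en" .
    :henFr ir:authorizedFor "http://example/hen" ; dc:language "fr" .

    # :hen generalizes those representations and no others ...
    :hen ir:generalizes :henEn, :henFr .

    # ... so :hen is on the Web at that URI.
    :hen ir:onWebAt "http://example/hen" .</eg>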

    <p>
      We take as axiomatic that for any nonempty class of representations
      there is a generic information entity that generalizes those and only those
      representations.  This lets us say:
    </p>

    <ulist>
       For any URI U having authorized representations, there is a
       generic information entity G such that G is on the Web at U.
    </ulist>

    <p>
      Now where does this get us?  To say that any representation retrieved
      from "http://example/hen" has (or will have) "Trouv&eacute;e" as its title,
      we can write (in Turtle <bibref ref="turtle"/>)
    </p>

    <eg>
    [ir:onWebAt "http://example/hen"] dc:title "Trouv&eacute;e".</eg>

    <p>
      (where ir:onWebAt is the name for the "on the Web at" property in some
      yet-to-be-standardized vocabulary).
      This is a useful thing to say, since it is predictive: It tells
      someone that if they dereference that URI, they will get
      something with that dc:title.  They may not see the exact same
      representation that the
      agent who wrote the metadata saw, but it will be close enough that the
      metadata still applies.
    </p>

    <p>
      The agent that authorizes representations for a URI is in a good
      position to write metadata relating to that URI, since they can
      ensure that the metadata is true for any representation they
      authorize.  On the 
      other hand, other agents can be correct in writing
      metadata, if they know something about how the controlling agent
      manages its namespace (web site).  Guaranteed correctness is not
      always necessary, however, and metadata may just express a
      reasonable or useful belief.  One can be confident when there is a
      credible and irrevocable public commitment regarding authorized
      representations, as 
      there is for, say, the data: URI scheme, but the
      representations authorized for http: scheme URIs,
      as the http: scheme is currently formulated,
      ultimately
      depend on those institutions such as ICANN that in practice
      control domain names, 
      making all such statements of metadata contingent.<footnote>
      http: metadata is in this sense no different from any other
      objective statement of what the world is like.
      A Web metadata assertion is checkable, which gives it great utility.
      But it is only checkable in Popper's sense that any set of
      experiments can only corroborate or falsify it, not prove
      it.</footnote>
    </p>

    <p>
      The following diagram illustrates the various entities involved
      and their relationships.  Dashed lines indicate relationships
      that are equivalent to universally quantified statements.
    </p>
	<graphic source="generic.png"
		 alt="Relationships among URI, IR, representations, metadata"/>

    </div1>

    <div1>
    <head>Naming information entities</head>
    <p>
      A common practice is to use an absolute URI as a name for a
      (generic) information entity that is on the Web at that URI.
      This practice is
      parsimonious: It would be more complicated than necessary for a
      single URI to be used in one way on the Web and in
      another way as a name.
      If this is done for the above example, we would write
    </p>

    <eg>
    &lt;http://example/hen> dc:title "Trouv&eacute;e".</eg>

    <p>
      to give the title of the information entity on the Web at
      'http://example/hen'.
      Because using URIs like this is common &mdash; some might say
      obvious &mdash; 
      practice, such a statement is often 
      understood, without 
      further explanation, as saying something about representations
      retrieved using the given URI.
    </p>

    <p>
      However, use of URIs in this way is
      not a foregone conclusion.  Should there be any doubt
      as to whether the URI will be understood in this way, one might write 
    </p>

    <eg>
    &lt;http://example/hen> ir:onWebAt "http://example/hen".
    &lt;http://example/hen> dc:title "Trouv&eacute;e".</eg>

    <p>
      to be explicit about what one means.
    </p>

    <p>
      In the event that the URI is unavailable to name the
      information 
      entity because it is already used to name 
      something else, then some other name can be used
      to refer to an information entity on the Web at that URI.
      In Turtle, this could be blank node notation such as
      [ir:onWebAt "http://example/hen"],
      or a different URI:
    </p>

    <eg>
    :poem ir:onWebAt "http://example/hen".
    :poem dc:title "Trouv&eacute;e".</eg>

    <p>
      Whether we can expect in general that a dereferenceable URI
      will be understood as a name for a (generic) information entity on the
      Web at that URI
      is the essence of the heated httpRange-14 debate 
      <bibref ref="issue-14"/>, which
      is essentially a turf war over use of the URI namespace.  Those who
      consider it important to write Web
      metadata have an interest in URIs being used in the manner
      described above, since this practice
      gives obvious names to entities on the Web and therefore 
      an easy way to
      say things about them.<footnote>The httpRange-14 rule as 
      stated in the TAG's resolution <bibref ref="issue-14-resolved"/> is 
      weaker than it needs to be in order to be practically useful.  It only
      says that a 200 response implies that the resource is an
      information resource; it doesn't say <emph>which</emph>
      information resource it is, so you could follow the letter of
      the rule and 
      end up with a URI naming an information resource that bears no relation to
      what is obtained by dereferencing the URI.
      Fortunately the resolution seems to be implicitly understood as
      meaning that the URI "identifies" the information
      resource whose associated representations were the ones coming
      from dereferences of that URI.
      It is likely that the authors of the resolution
      considered 
      it so obvious that the URI would "identify" that information
      resource, and not some other one, that it 
      didn't occur to them to specify this.  Nevertheless the wording
      has led to an unfortunate focus on the distracting and
      unimportant question of 
      whether something <emph>is</emph> an information resource, as
      opposed to the consequential question of which resource (of
      whatever kind) is named.
      </footnote>
      Those who don't care about talking about the Web in this way may see an
      opportunity to put the URIs in question to
      uses better suited to their applications.  
      If the httpRange-14 rule is not generally
      respected, then the meaning of <emph>all</emph> dereferenceable
      URIs will be put in doubt, and
      new notational conventions for metadata similar to the
      above constructions using ir:onWebAt will have to be instituted
      for use in potentially all Web metadata.
    </p>
    </div1>

    <div1>
      <head>Information resources</head> 

      <p>
	We can say that "information resource" (the conventional term in
	Web architecture) subsumes "generic information
	entity" as above.  In order to account for what happens on
	the Web, we would need to be able to further
	distinguish between information resources that differ only in
	the circumstances in which the representations that they generalize
	are authorized.  For example, when a login procedure establishes
	a local context, a representation might be authorized in one login
	session and not in another.  An information resource
	generalizing exactly the same representations but authorizing
	them in different login sessions would be considered a
	different information resource.
      </p>

      <p>
	This definition of "information
	resource" is not the same as the mysterious one found in
	<bibref ref="webarch"/>, but it may serve better in many of
	the contexts in which the term "information
	resource" is currently used.
      </p>

    </div1>

    <div1>
      <head>References</head> 
      <blist> 

        <bibl id="issue-57"
              href="http://www.w3.org/2001/tag/group/track/issues/57">
          <titleref href="http://www.w3.org/2001/tag/group/track/issues/57"
           >Issue-57: Mechanisms for obtaining information about the meaning 
             of a given URI</titleref>.
          W3C Technical Architecture Group, 2007-2011.
        </bibl> 

	<bibl id="GR"
	      href="http://www.w3.org/DesignIssues/Generic.html">
	  Tim Berners-Lee.
	  <titleref href="http://www.w3.org/DesignIssues/Generic.html">
	  Generic resources</titleref>.
	  Design note, 2006-2009.
	</bibl>

        <bibl id="3986"
              href="http://www.ietf.org/rfc/rfc3986.txt">
          T. Berners-Lee, R. Fielding, L. Masinter.
	  <titleref href="http://www.ietf.org/rfc/rfc3986.txt"
           >Uniform Resource Identifier (URI): Generic Syntax</titleref>.
          RFC 3986, IETF, 2005.
        </bibl> 

        <bibl id="webarch"
              href="http://www.w3.org/TR/webarch/">
          Ian Jacobs and Norman Walsh, editors.
	  <titleref href="http://www.w3.org/TR/webarch/"
           >Architecture of the World Wide Web, Volume One</titleref>.
          W3C Recommendation, December 2004.
        </bibl> 

	<bibl id="turtle"
	      href="http://www.w3.org/TeamSubmission/2011/SUBM-turtle-20110328/">
	  David Beckett and Tim Berners-Lee.
	  <titleref href="http://www.w3.org/TeamSubmission/2011/SUBM-turtle-20110328/"
	   >Turtle - Terse RDF Triple Language</titleref>.
	  W3C Team Submission, 2011.
	</bibl>

        <bibl id="issue-14"
              href="http://www.w3.org/2001/tag/group/track/issues/14">
          <titleref href="http://www.w3.org/2001/tag/group/track/issues/14"
           >Issue-14: What is the range of the HTTP dereference 
	   function?</titleref>.
          W3C Technical Architecture Group, 2002-2005.
        </bibl> 

        <bibl id="issue-14-resolved"
              href="http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html">
          Roy Fielding.
	  <titleref href="http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html"
           >[httpRange-14] Resolved.</titleref>
          Email to www-tag list, 2005.
        </bibl> 


	<!-- 
        <bibl id="reach"
	      href="http://www.w3.org/2000/10/swap/doc/Reach.html">
	  Tim Berners-Lee.
	  <titleref href="http://www.w3.org/2000/10/swap/doc/Reach.html"
	   >Reaching out onto the Web</titleref>.
	  W3C, Semantic Web Application Platform project, 2006.
        </bibl> 
	-->

      </blist>
    </div1>

    <div1>
      <head>Acknowledgments</head>
      <p>
        David Booth, Harry Halpin, Michael Hausenblas, Nathan Rixham, and
        Alan Ruttenberg contributed to
        the creation of this note.  Thanks to Taylor Campbell and
        Stéphane Corlosquet for
        comments on drafts.
      </p>
    </div1>

  </body> 
</spec>

  <!-- 
_____________________________________________________________________________
    <p>
      The following amendments to the TAG's httpRange-14 rule would
      clarify it and give it useful and reliable meaning:
    </p>

    <olist>
      <item>
	Generalize from HTTP 200 responses to dereference over any protocol.
      </item>

      <item>
	Specify that when a URI dereferences, the URI "identifies" not just
        any information resource, but an information resource that
        is "on the Web" at that URI.
      </item>

      <item>
        Clarify that 303 and 2xx should not both be used for the same URI.
      </item>
    </olist>


{foo
Define a "representation" as in
RFC 3986, an octet sequence plus some type information (media type).
Define an "information resource" to be any information entity (generic
or specific).  [!! that is not a representation]
}

    <div1 id="new">
      <head>Metadata and the Web</head>
      <p>
        The World Wide Web is described as a network
	of linked resources.  Resources are named by URIs so that they
	can be accessed and linked to.  When one accesses a resource
	one obtains associated digital content: a document,
	image, audio recording, or similar entity capable of being
	copied between computers.  By tradition we call a unit of 
	digital content
	a <emph>representation</emph>, with the understanding that
	this is a
	term of art without any implication of anything representing
	anything else.
      </p>
      <p>
        Representations are concrete, in that they can be copied
        and inspected via computational processes.  They are amenable to
        description using conventional
        metadata vocabularies such as Dublin Core [tbd: bibref] or 
	FOAF [tbd:bibref].  However, 
	the nature of resources and their "association" with
	representations is unclear.  
      </p>
      <p>

WORK IN PROGRESS...

U is a (the) web-name for A.
  Axiom: {is the web-name of} is functional (but partial).


If U dereferences to R, and U web-names A, then R is associated with A.

How do you know whether U is a name for A?

How do you know whether there is a link from A to B?

Resource A is linked to resource B if a representation assoc. with A contains a
reference to B, using B's URI.

        The nature of resources and their "association" with
	representations has been a constant source of confusion
	in Web architecture discussions.
	A common case is where digital content is made to be part of
	the Web.  The resource is then the content, and it is
	associated with itself (or something equivalent to itself).

	When there is only one associated representation
	the resource and associated representation

	Usually content is tagged to
	indicate its type ("media type") and similar
	information such as language.

      </p>
      <p>
        Such simple resources, which we'll call "tagged content
        things" or TCTs, are amenable to  
        description using conventional
        metadata vocabularies such as Dublin Core [tbd: bibref] or 
	FOAF [tbd:bibref].  For example,
        if :c is a TCT, one might write (in Turtle)
      </p>
      <eg>
        :c dc:title "Southern Pierids in New England". </eg>
      <p>
        to say that :c has the given title.  Whether or not :c
        actually has that title can be determined by inspection, which
        one does by accessing :c using its URI 
	(i.e. by dereferencing its URI).
      </p>

      <p>
        Not all resources are this simple, however.
	The Web is implemented as a set of servers that respond to
        requests to access resources by name.  It was realized early in the
        Web's history that it is useful, for a variety of reasons, for
        different requests
        using a single URI to yield distinct TCTs
        (e.g. documents with differing content).
	For example, one might dereference
	'http://languagelog.ldc.upenn.edu/nll/' obtaining :c1 (a TCT),
        then dereference the same URI obtaining :c2 (another TCT),
	such that
      </p>
      <eg>
        :c1 dc:creator :MarkLiberman . </eg>
      <p>
        but not
      </p>
      <eg>
        :c2 dc:creator :MarkLiberman . </eg>
      <p>
        The explanation is that something changed
	such that the resource named by
	'http://languagelog.ldc.upenn.edu/nll/' has
        different associated TCTs at different times.
        (Change over time is not the only source of variation.
	It's easy to imagine cases in which content
        negotiation by language or media type, variation between sessions
	for different users, or a source of randomness
	leads to associated TCTs differing in title, topic,
        or other properties.)
      </p>
      <p>
        This says nothing about the nature of non-simple
	resources or the nature of their association with TCTs.
	However, it is common practice to ascribe metadata to
        resources
	even when they have multiple associated TCTs with different 
	metadata.  For example, if :r is the resource
        named by 'http://languagelog.ldc.upenn.edu/nll/', one might
        write
      </p>
      <eg>
        :r dc:title "Language Log". </eg>
      <p>
	even if the values of the dc:creator property vary among
	associated TCTs.
	Logically this seems to risk confusion and contradiction.
	However,
	such metadata is generally written and understood without
        difficulty. Why?
      </p>
      <p>
        The answer is that one tends to assert metadata 
	for a resource if the metadata is invariant across the
	resource's associated TCTs - 
	that is, if one is confident that someone 
        accessing the resource will obtain a TCT that has
        that metadata.  If it is possible, for some future TCT :t
        obtained by accessing :r, that :MarkLiberman is <emph>not</emph> a dc:creator of
        :t, then one simply doesn't write :r dc:creator
        :MarkLiberman.
      </p>
      <p>
	Because it is impossible to check all future associated TCTs,
	metadata for a non-TCT can be corroborated or
	falsified, but never definitely established.
      </p>


      <example>
      <head>Sample information resource</head>
	<graphic source="ir.png"
		 alt="Relationships among URI, IR, versions, metadata"/>
	<p>
	  In this example, the URI 'http://languagelog.ldc.upenn.edu/nll/'
	  dereferences on two different occasions to two different
	  TCTs (tagged-content-things). 
	  (Perhaps the document was edited, or is available in two
	  different languages.)
	  These tagged-content-things are associated with 
	  the resource named 'http://languagelog.ldc.upenn.edu/nll/'.
	  "Language Log" is the title of both TCTs.  If "Language Log" is a
	  title of <emph>every</emph> TCT of
	  the resource, it will also be considered
	  a title of the resource itself.
	</p>
	<p>
	  Dashed lines indicate relationships that are induced by
	  circumstances.
	</p>
      </example>
      <p>
	TBD: When the URI does not name the IR (in RDF). IRs not on the web.
	AWWW's definition.  Interpretation of httpRange-14.
      </p>
    </div1>


    <div1>
      <head>Association</head>
      <p>
        We have been intentionally vague about the "associated with"
        relationship, which is key to this framework.  What kinds of
        things can have associated TCTs?  How do we know whether a
        given TCT is associated with a particular information resource?
      </p>
      <p>
        Rather than attempt to answer this directly, we'll simply
        say that two constraints govern the "associated with"
        relationship, whatever it is.
	These are the metadata contagion rule introduced above, and
        the equation of association with access on the Web.
      </p>
      <div2>
        <head>Metadata contagion</head>
	<p>
	  Formally, the property given above relating 
	  resource metadata 
	  to metadata for all of its associated TCTs
	  can be stated as follows: 
	  Suppose M[] is a unary metadata predicate, and r is a
	  resource.  Then 
	  M[r] 
	  if and only if 
	  {M[c] for every TCT c associated with r}.
	</p>
	<p>
  	  That is, a resource 
	  that is associated with some set of TCTs has to be something
	  to which this set's invariant metadata applies.  For example,
	  if a cat were a resource, and it had associated TCTs that
	  all shared a title "Poker for cats", then the cat would also
	  have to have that title.  If cats cannot have titles, then one
	  of these premises (such as the association of the TCTs with
	  the cat) would have to be false.
	</p>
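	<p>
	  The rule can be sketched in Turtle as follows.  This is an
	  illustration only: the property :hasAssociatedTCT, the names
	  :r, :t1, and :t2, and the example namespace are hypothetical
	  and are not defined by this report.
	</p>
	<eg>
	  @prefix dc: &lt;http://purl.org/dc/elements/1.1/> .
	  @prefix :   &lt;http://example/ns#> .

	  # Assume, for the sake of the sketch, that :t1 and :t2 are all of
	  # the TCTs associated with :r.
	  :r :hasAssociatedTCT :t1, :t2 .
	  :t1 dc:title "Poker for cats" .
	  :t2 dc:title "Poker for cats" .

	  # Under that assumption the rule requires the following to hold.
	  # If :r is something that cannot have a title, then one of the
	  # statements above must be false.
	  :r dc:title "Poker for cats" . </eg>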
      </div2>

      <div2>
        <head>Dereference implies association</head>
	<p>
	  We have also been vague about "naming".  Different things have
	  different names under different naming systems and contexts,
	  so it does not 
	  make sense to speak of "the name" or "the URI" of a resource.
	  However, the resource naming system that the Web uses
	  is of special interest.  If a URI u names a resource r on
	  the Web, write r = WR(u).  For example,
	  WR('http://languagelog.ldc.upenn.edu/nll/') would be the
	  resource 
	  whose associated TCTs include those that are obtained by
	  dereferencing 'http://languagelog.ldc.upenn.edu/nll/'.
	  Since resources are on the Web in
	  order to be accessed, this is the same as saying that r is
	  accessed by dereferencing u.
	</p>
	<p>
	  To say that dereference implies association is to say that
	  for each URI u and resource r, if r = WR(u) and u
	  dereferences to c, then c is associated with r.  That
	  is, the resource WR(u) includes among its associated TCTs
	  any TCT you can get by dereferencing u.
	</p>
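	<p>
	  As a rough sketch, suppose that u is
	  'http://languagelog.ldc.upenn.edu/nll/' and that dereferencing u on
	  two occasions yielded two TCTs.  The property :hasAssociatedTCT,
	  the names :r, :c1, and :c2, and the example namespace are
	  hypothetical, introduced only for this illustration.
	</p>
	<eg>
	  @prefix : &lt;http://example/ns#> .

	  # :r stands for WR(u); :c1 and :c2 are the TCTs obtained by
	  # dereferencing u.  Because dereference implies association:
	  :r :hasAssociatedTCT :c1, :c2 .

	  # Nothing here says that :c1 and :c2 are the only TCTs
	  # associated with :r. </eg>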
	<p>
	  There is no reason to think that all information
	  resources have Web URIs (i.e. are WR(u) for some u), and good
	  reason to suppose that many of them don't.
	</p>
      </div2>
    </div1>


    <div1>
      <head>Natural history of information resources</head>
      <p>
	The following explains the particular theory of "information
	resources" assumed in this report.  The theory is
	independent of how one refers to information resources.
	More elaborate theories
	are certainly possible, but this is all we <emph>need</emph> to
	assume in order to explain how they work and what they are good for.
      </p>

      <p>
	Each information resource has one or more
	associated <emph>versions,</emph> where each version 
	is a <emph>tagged-content-thing,</emph> consisting of
	fixed content (octet
	sequence) and additional information (media type, language)
	affecting the interpretation of the content.
	Different versions may be appropriate at different times 
	or in different interaction contexts.
	No particular meaning is implied by the word "version;" the
	word is chosen as suggestive of its most common use.
      </p>

      <p>
	Metadata statements such as those giving authorship, title,
	and topic are true or false of tagged-content-things
	in the obvious way &mdash; they are true according to the
	content, its interpretation, or its provenance.
	Such statements
	also apply to arbitrary information resources in a systematic way,
	as follows: If
	a statement is true of all versions of the
	information resource, 
	then the statement should be taken as true of the information
	resource, and vice versa.
      </p>

      <p>
	Operationally, this means that if you have knowledge of 
	an information resource's versions, you can write metadata using
	the information resource as subject, and someone reading this
	metadata can then apply that metadata to whatever version
	they access.
      </p>

      <p>
	An information resource need not be accessible via a URI, or
	even have any associated URI at all.  An information resource
	might exist only inside a local file system or database, or it
	might be ephemeral.
      </p>

    </div1> 

    <div1 id="ir-ref">
      <head>Using a URI to refer to the information resource
	    accessible via that URI</head>
      <p>
	To refer to the information resource accessible via a
	URI when that URI is dereferenceable, one generally uses the
	URI itself.
	E.g. 'http://example/ir' refers to IR('http://example/ir'),
	if 'http://example/ir' is dereferenceable.
	One might use such a URI in a
	metadata statement, for example: "The creator of
	http://example/ir is Carol", 
	or, expressed equivalently in Turtle,
      </p>
      <eg>
	&lt;http://example/ir> dc:creator "Carol". </eg>

      <p>
	If one wants to refer to an information resource, 
	but it isn't accessible via any URI, one might choose a URI,
	publish the information resource's versions
	at that URI, and then use the URI to refer to the
	information resource.
      </p>

      <p>
	An agent who encounters a URI and wants to know what the URI means
	can dereference it, and if the
	dereference is successful (HTTP 2xx status as opposed to 303 or 404 or
	anything else),<footnote>
	  Simple redirects (301, 302, 307) are generally taken as
	  transparent with respect to dereference, but this is a
	  side issue that we don't want to take up in this report.
	</footnote>
	the agent can take the URI
	to be a reference to the information resource 
	that is accessible via that 
	URI.<footnote>
	  The "u refers to IR(u)" convention is a common and intuitive interpretation of
	  the HTTP specification and is in widespread
	  use.  In 2005 the W3C TAG confirmed this interpretation 
	  (in contrast to "IR(u) defines u") in 
	  its "httpRange-14 resolution"
	  <bibref ref="issue-14-resolved"/>.
	</footnote>
      </p>

    </div1>

      <eg>
        &lt;http://www.w3.org/History/1989/proposal.html>
	  dc:creator &lt;http://www.w3.org/People/Berners-Lee/card#i>. </eg>
      <p>
        to say that a node, which we are calling
	'http://www.w3.org/History/1989/proposal.html',
	has, as its dc:creator, a person, who we are calling
	'http://www.w3.org/People/Berners-Lee/card#i' .
      </p>

      <p>
	Let's write HN(u) = the node in the hypertext network that has the
	URI u as its name, if there is one.
        If we interpret &lt;http://www.w3.org/History/1989/proposal.html>
	to be HN('http://www.w3.org/History/1989/proposal.html'), then
	this statement would be justified by Tim Berners-Lee
        being the dc:creator of that node.  That is, because the node
	is a tagged-content-thing, one can inspect the content, and
        check using any available method whether Berners-Lee dc:created it.
        The meaning of the metadata assertion is determined by the
        content in the obvious way.
      </p>
      <p>
        The Web does in fact include simple nodes of this sort, but it
	is in general much richer than this, and a direct URI/content
	association is untenable.
	Dereferencing a URI can yield 
        different tagged-content-things depending
        on a wide variety of variables including time, preferred
        language, user-agent, login session, server IP address
	&mdash; 
	in fact just about
	anything, at the whim of the domain administrator.  This calls
	into question just what is meant by HN(u) - if not all nodes
	are tagged-content-things, what are the ones that aren't?
      </p>
      <p>
        It's not obvious that this question requires an answer, but 
	metadata practice seems to require that some explanation of
	these entities be
	provided.  What has happened is that people 
        continue to write metadata <emph>as if</emph> metadata
	subjects were tagged-content-things.  For
        example:
      </p>
      <eg>
        &lt;http://www.w3.org/TR/sparql11-query/>
	  dc:creator &lt;http://www.w3.org/People/Eric/ericP-foaf.rdf#ericP>.
      </eg>
      <p>
        For the node HN('http://www.w3.org/TR/sparql11-query/'), we
        retrieve different tagged-content-things at different times,
        because there are different versions of the document at
        different times.
        Clearly the metadata statement is supposed to have something
        to do with these 
	tagged-content-things, but
	what if some of them have Eric as a dc:creator, and some don't?
        Would the truth of the statement be judged
	against <emph>all</emph> of the tagged-content-things,
	against <emph>one</emph> of them, or against
 	<emph>some</emph> of them?  Or should the statement be
 	considered meaningless? 
      </p>
      ...
      <p>
        We apply the term <emph>information resource</emph> to this
        expanded class of hypertext nodes including
        these variable entities, with simple tagged-content-things as
        a special case.
	It is proposed that
        the safest, most natural, and most useful semantics 
	of metadata for an information resource
	is universal quantification over its present and future
        tagged-content-things.  This is because,
	  when someone writes a metadata formula, they cannot
 	  usually anticipate which particular tagged-content-thing 
 	  someone who reads their formula will access.  Therefore it
 	  is prudent to write only formulas that
 	  will be true regardless of what anyone will encounter later
 	  on.
      </p>
 (named by URIs such as 
	'http://languagelog.ldc.upenn.edu/nll/')
  (A retrieved TCT,
	when it isn't what's named by a URI,
        ordinarily goes nameless and is therefore difficult to 
	specify, at least in RDF.)

	<p>
	  mechanism of the Web does suggest that there are
	  URI that dereferences to at least one TCT, there is a particular
	  resource that has as its associated TCTs the TCTs that one gets by
	  dereferencing that URI.  For each dereferenceable URI u,
	  define AV(u) to be the 
	  resource defined in this way.  
	</p>
	<p>
	  Because "association" is meant to idealize access,
	  

      -->
