Re: "tdb" and "duri" URI schemes... from Jonathan Rees on 2010-11-02 (www-tag@w3.org from November 2010)

From: Jonathan Rees <jar@creativecommons.org>
Date: Tue, 2 Nov 2010 14:18:46 -0400
To: Larry Masinter <masinter@adobe.com>
Cc: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <AANLkTimwpfjD3Lez+ZyW+ckWQuJa2+Xke2WbnYkDDZeM@mail.gmail.com>
On Tue, Nov 2, 2010 at 11:38 AM, Larry Masinter <masinter@adobe.com> wrote:
> This idea has been bouncing around for such a long time,
> but I updated the document
>
> http://tools.ietf.org/html/draft-masinter-dated-uri-07
>
> based on comments.
>
> While this isn't posed as a "TAG" submission, since the
> TAG has been discussing persistence for a long time,
> are there any changes you think I should make (references,
> discussions, etc.) I should make before asking for this
> to be published?
>
> Larry
> --
> http://larry.masinter.net

A few comments: (your draft is indented, my comments outdented)

I hope that you will be coordinating with others who are
working on similar issues.

Somewhere you need to include a warning that two clients can observe
completely different content for the same resource, at exactly the
same time.  Time is not adequate to "identify" anything about what
anyone actually observed since it potentially depends on all the
details of the observation (IP address, cookies, which physical server
responded, etc.).  All a DURI really says is that someone observed
something at the given URI at a certain time - and this has to be
taken on trust.


  Network Working Group                                        L. Masinter
  Internet-Draft                                                     Adobe
  Intended status: Informational                          October 22, 2010
  Expires: April 25, 2011


    The 'tdb' and 'duri' URI schemes, based on dated URIs
   draft-masinter-dated-uri-07

  Abstract

     This document defines two URI schemes.  The first, 'duri' (standing
     for "dated URI"), allows indicating a URI as of a particular date
     (and time).

It is not the URI that is "of a particular date".  Rather it is
(according to your URI/resource theory articulated in RFC 3986) the
binding of the URI to a resource and the condition (state, whatever)
of the resource to which the URI is bound.  I think you should say
"indicating a resource as of a particular date" since that can be read
as covering both bases.

    This allows explicit reference to the "time of
     retrieval", similar to the way in which bibliographic references
     containing URIs are used.

I would say "are written", not "are used".

While many people will know what you're talking about, some of your
audience won't.  You need to either remove the "similar to" or expand
on it.

     The second scheme, 'tdb' ( standing for "Thing Described By"),
     provides a way of using a way of minting URIs for anything that can
     be described,

Please don't propagate this "anything that can be described" meme.
It's a silly and meaningless distinction.  Just say "anything" or "any
resource".

     with the ability to fix the description to a given date
     or time.

I would not put the time-binding feature directly into tdb:, but
rather use an orthogonal design that uses the time-binding ability of
the subject URI.  That is, write tdb:duri:...  and put the time in the
DURI.  Then, if the resource is already dated (via URI scheme, or some
other "persistent" method) there's no need to repeat the time in the
tdb:.  This orthogonality also simplifies the specification.
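
A minimal sketch of that orthogonal reading, in Python.  The function
names are hypothetical, the timestamp is assumed to be a plain run of
digits (as in the examples in this message), and hex decoding of the
embedded URI is left out; the draft's real grammar lives in its Syntax
section.

```python
def split_scheme(uri):
    """Split 'scheme:rest' at the first colon."""
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError("not a URI: %r" % uri)
    return scheme.lower(), rest


def unwrap(uri):
    """Peel optional tdb: and duri: wrappers off a URI.

    Returns (described, timestamp, subject):
      described -- True if the URI names the thing described by the subject
      timestamp -- the duri: timestamp as a string, or None if undated
      subject   -- the innermost URI
    """
    described = False
    timestamp = None
    scheme, rest = split_scheme(uri)
    if scheme == "tdb":
        # tdb: carries no date of its own in this design.
        described = True
        uri = rest
        scheme, rest = split_scheme(uri)
    if scheme == "duri":
        # The date, if any, comes only from the nested duri:.
        timestamp, sep, uri = rest.partition(":")
        if not sep or not timestamp.isdigit():
            raise ValueError("bad duri timestamp: %r" % timestamp)
    return described, timestamp, uri


print(unwrap("tdb:duri:2008:http://www.ietf.org"))
print(unwrap("tdb:urn:ietf:std:50"))
```

With this factoring the date is parsed in exactly one place, and a tdb:
wrapped around an already-dated subject simply reports no timestamp.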

        The 'tdb' URI scheme may reduce the need to define define
     new URN namespaces merely for the purpose of creating stable
     identifiers for concepts or abstractions: it provides a ready means
     for identifying "non-information resources" by semantic indirection
     -- a way of creating a URI for anything.

Have you checked this hypothesis out with anyone who has created or is
thinking of creating a URN namespace?  Remember that URN namespaces
give something that duri: and tdb: don't, which is a registry (stored
in the RFC corpus) of namespace definitions maintained by IETF.  As
this is the primary value proposition of URNs, I think it unlikely
that a URN customer would forgo it.

  1.  Overview and Requirements

     The URI schemes defined here address several related problems:

  1.1.  Persistent identifiers

     [RFC1737] defines several requirements for Uniform Resource Names.
     In particular, it requires "persistence":

 Persistence: It is intended that the lifetime of a URN be
 permanent.  That is, the URN will be globally unique forever, and
 may well be used as a reference to a resource well beyond the
 lifetime of the resource it identifies or of any naming authority
 involved in the assignment of its name.

Interesting, I hadn't heard that definition attached to that term before.

     Many people have wondered how to create globally unique and
     persistent identifiers.  There are a number of URI schemes and URN
     namespaces already registered.  However, an absolute guarantee of
     both uniqueness and persistence is very difficult.

Nay, impossible.

     In some cases, the guarantee of persistence comes through a promise
     of good management practice, such as is encouraged in "Cool URLs
     don't change" [COOL].  However, relying on promise of good management
     practice is not the same as having a design that guarantees
     reliability independent of actual administrative practice.

Adhering to the design would itself be an "administrative practice".
This is a difference of degree, not of kind.  Your point is a good one
but it needs to be said in a way that does not oversell it.

     A primary design goal for URIs is that they are intended to mean the
     same thing, no matter in what context they appear: a "Uniform" way to
     Identify a Resource.  However, even when URIs have Uniform meaning
     from the point of view of the source of the reference, they don't
     guarantee stability over time.  Despite best efforts and intentions,
     identifying information can change in unpredictable ways: domain
     names can disappear or be reassigned, name assigning organizations
     can change structure, responsibility, disappear, merge, or change in
     unpredictable ways.

Again, the point is good, but it could be expressed better.  You don't
define "uniform meaning" or "stability" or "identifying information".
I don't know what you mean by "from the point of view of the source"
-- are you referring to Humpty Dumpty's view that a word means
precisely what he wants it to mean?

I think that again you are trying to simplify by sidestepping the 3986
factoring of URI -> representation into URI -> resource ->
representation.  Simplification is good, but in this case it will just
amplify the current confusion around this point.  Maybe there is a way
to replace 3986 with a simpler theory and eliminate the intermediary
"resource", but this document is not the place to do it.

(I'm not suggesting that you either use or avoid the term "representation".)

The way to go with duri: is to say that the duri: "identifies" a
resource that is the condition (or state) of the subject resource at
the given time.  duri:T:X is this resource: "What the resource
identified by 'X' at time T was like at time T".  The dual "at time T"
is needed because there are two time-varying mappings at play, one
from URI to resource and another from resource to "representation" (or
other time-dependent properties, for non-"information resources").
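
To make the dual "at time T" concrete, here is a toy model in Python.
Everything in it is invented for illustration (the Timeline class, the
example.org URI, the "site-A"/"site-B" resources, integer years as
times); it is a sketch of the two mappings, not of any real resolver.

```python
from bisect import bisect_right


class Timeline:
    """A value that changes at known times (a step function of time)."""

    def __init__(self, changes):
        # changes: iterable of (time, value); value holds from time onward.
        self._changes = sorted(changes, key=lambda c: c[0])
        self._times = [t for t, _ in self._changes]

    def at(self, t):
        i = bisect_right(self._times, t)
        if i == 0:
            raise LookupError("no value defined at time %r" % (t,))
        return self._changes[i - 1][1]


# Mapping 1: URI -> resource, varying over time (hypothetical data).
bindings = {
    "http://example.org/": Timeline([(2001, "site-A"), (2005, "site-B")]),
}

# Mapping 2: resource -> condition/state, also varying over time.
states = {
    "site-A": Timeline([(2001, "A v1"), (2003, "A v2"), (2007, "A v3")]),
    "site-B": Timeline([(2005, "B v1")]),
}


def duri(t, uri):
    """What the resource identified by uri at time t was like at time t."""
    resource = bindings[uri].at(t)   # first "at time T"
    return states[resource].at(t)    # second "at time T"


print(duri(2003, "http://example.org/"))
# Fixing only the binding gives a different answer:
print(states[bindings["http://example.org/"].at(2003)].at(2010))
```

Dropping either "at time T" changes the result, which is why both are
needed in the definition.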

     There is a significant dependence in the interpretation of many URNs
     with the concept of "naming authority".  The authority is presumably
     some individual or organization both to insure uniqueness of
     assignment and also to help with understanding the meaning of the
     link between the name and the named.

You are doing a disservice to URNs by talking of name assignment
authority and name resolution service (not authority) in the same
breath.  These are orthogonal functions.  At the birth of a namespace
the functions are often carried out by the same organization, and this
is why people tend to conflate the two ideas.  But the distinction is
essential, I think, to the nature of URNs, so it should be celebrated,
not glossed.

     However, authorities, whether individuals or organizations, have a

That should be "resolution services", not "authorities".

     lifetime, and must be consulted at some point to understand the
     bindings.  The functioning of names as unique identifiers and holders
     of meaning depends on having a reliable infrastructure of consulting
     the authority or the authorities records to determine the thing
     referenced.



  Masinter                 Expires April 25, 2011                 [Page 4]

  Internet-Draft      The 'tdb' and 'duri' URI schemes        October 2010


  1.2.  URIs for abstractions

"Abstractions" is a terrible ontological category.  What's an example
of something that's not an abstraction?  I think the distinction
you're looking for is - in the words of the passage you quote below -
"resources not accessible via the Internet".  Just say that instead.

     The description of URIs [RFC3986] describes a range for 'Resource'
     that is quite broad:

 This specification does not limit the scope of what might be a
 resource; rather, the term "resource" is used in a general sense
 for whatever might be identified by a URI.  Familiar examples
 include an electronic document, an image, a source of information
 with a consistent purpose (e.g., "today's weather report for Los
 Angeles"), a service (e.g., an HTTP-to-SMS gateway), and a
 collection of other resources.  A resource is not necessarily
 accessible via the Internet; e.g., human beings, corporations, and
 bound books in a library can also be resources.  Likewise,
 abstract concepts can be resources, such as the operators and
 operands of a mathematical equation, the types of a relationship
 (e.g., "parent" or "employee"), or numeric values (e.g., zero,
 one, and infinity).

     One might use a URI such as "mailto:" email address to identify a
     person,

You cannot use "mailto:" to identify a person without contradicting
the "mailto:" URI scheme registration.  Please don't suggest this.

     or a "http:" URI to identify an abstract comment.  However,
     this leaves the question of how one might identify, within the same
     context, both the system mailbox and the person to which it is
     assigned, or the web page at a http URI and the concept it describes.

Not "concept" - again web pages can describe anything, not just
concepts (whatever those are!).  How about "and what it describes" or
"the entity it describes" or "the thing it describes" or "the resource
it describes".

     The 'tdb' URI scheme allows ready assignment of URIs for abstractions
     that are distinguished from the media content that describes them.

Not "abstractions".  What you are saying is that the "media content"
describes something - not necessarily an abstraction - and the 'tdb'
scheme allows ready assignment of a URI to whatever it describes
(which is not the media content except in pathological situations).

     The goal, then, of the 'tdb' URI scheme is to provide a mechanism
     which is, at the same time:

 permanent: The identity of the resource identified is not subject
 to reinterpretation over time.

Well, anything written by anyone is subject to reinterpretation,
that's just the way it goes.  But I guess your intent is clear.

Why not just say that the URI is not subject to reinterpretation over
time, and skip the identity/identified bit?  The interpretation *is*
what's identified, right?

 explicitly bound: The mechanism by which the identified resource
 can be determined is explicitly included in the URI.

I don't get this.  If an http: URI yields representations, then the
*server* knows what the resource is, but it's unlikely any client does
- all they know is a couple of representations that they happened to
get; and if the server has been offline for ten years, how can
*anyone* determine what the resource is?

I think you mean to be talking about objectivity: what's meant by the
DURI, or how it's to be interpreted, is explicit in the DURI.

 useful for non-networked items: Allows identification of resources
 outside the network: people, organizations, abstract concepts.

 no administration: The mechanism does not depend on reliable
 administrative processes of authorities for either assignment or
 interpretation.

Other than adherence to this RFC, that is.


  2.  Syntax

...

  3.  Semantics

  3.1.  'duri' Semantics

     It is traditional in convention references and citations in printed

conventional

     works to include the date of publication; this practice serves the
     important purpose that the context of the naming can be determined.

Since the context can't necessarily be determined, how about
"important purpose of determining the context of the naming".

Although determining anything about the context other than the time,
if that's possible at all, would require investigative work.

     The meaning of a 'duri' URI is "the resource that was identified by
     the <encoded-URI> (after hex decoding) at the date(time) given".

If one resource can have one representation at one time, and a
different representation at a future time, as is permitted by 3986,
then this is not good enough for your purposes.  You also want to say
that it's the resource as it was at that time.  See above.

     For example, "duri:2001:http://www.ietf.org" is a persistent
     identifier to "http://www.ietf.org" as of 2001.  A 'duri' URI may not
     be a resource locator in a practical sense: the time of location has
     not yet arrived or has passed.

"may not" => "is not necessarily"

Not sure what you mean by "time of location".  I would say something
like "the binding and/or condition may not yet..."


  3.2.  'tdb' Semantics

     The 'tdb' URI scheme is intended to be useful for describing
     entities, concepts, abstractions, and other items which may not
     themselves be network accessible resources, but have been at some
     point described by network accessible resources.

Re "describing" please don't introduce yet another near miss for
"identify" (term of art) - we already have "name" and "designate".
Also you're introducing "item" as a new near-synonym for "resource" or
"thing".

     A 'tdb' URI is intended to be used where the <encoded-URI> identifies
     a 'document' (something a person could read, peruse, understand) or a
     fragment thereof, where the document describes some thing or concept.

Concepts are not things?  That suggests there are *other* things that
aren't things, too...  How about just "describes some thing" or
"describes something" or "describes some resource".

     The 'tdb' URI itself then identifies the subject of that document.
     It is common practice to give a reference for a concept by including
     a pointer to a document, segment, phrase that defines the concept;
     'tdb' attempts to capture this practice in URI space.

What is the relation between concepts and resources?
What if the document has more than one subject / defines more than one
concept?

     For example, one might use "tdb:2008:http://www.ietf.org" as a
     persistent identifier for the Internet Engineering Task Force, as
     described by the "http://www.ietf.org" in 2008.

I prefer tdb:duri:2008:http://www.ietf.org

     The 'tdb' URI scheme differs from other URI or URN methods for
     identifying abstractions because the designation of what is actually
     identified by the 'tdb' doesn't depend on knowing the intention of
     the "assigner" of the identifier.  Unlike "tag", "info", "cid", "mid"
     or related schemes, the identification is not dependent on the
     context of use.  The 'tdb' URI scheme can be thought of as giving a
     way to invoke a level of semantic indirection to URI resolution.

     While one could imagine using 'tdb' without a date, it would leave
     the possibility that a reference that is unambiguous at one time
     might become ambiguous at some other time.  There are two ways that
     the date is useful for 'tdb' URIs: it fixes the time of access of the
     resource, for variable descriptions, and it fixes the time of
     interpretation, for descriptions whose meaning (in natural language)
     might vary.

Why is it that tdb: fixes the time of interpretation, but duri:
doesn't?

If I know the date of a document's publication, I will do my best to
interpret the document in the way that it would have been interpreted
around that time.

Whether this needs to be specified by the URI scheme, or is just
common sense, is not clear.

  3.3.  Timestamp Semantics

     It is traditional in convention references and citations in printed

conventional

     works to include the date of publication; this practice serves the
     important purpose that the context of the naming can be determined.

     While one could imagine using 'tdb' without a timestamp, it would
     leave the possibility that a reference that is unambiguous at one
     time might become ambiguous at some other time.  There are two ways
     that the date is useful for 'tdb': it fixes the time of access of the
     resource, for variable descriptions, and it fixes the time of
     interpretation, for descriptions whose meaning (in natural language)
     might vary.  While normally, in a literary work in natural language
     which makes a reference to another work, both the reference itself
     and the work referenced are dated, e.g., a footnote in an article
     written in 1967 might talk about a "private communication" which
     itself had a date.  The difference between a URI and a conventional
     literary reference is the desire to be able to extract the URI from
     its context and still retain its meaning.

     The meaning of a timestamp is the interval specified by the
     granularity of the time range indicated, in the UTC time zone, as
     described in [RFC3339].  If necessary, timestamps can include times
     and even fractional times, so that a generator of 'duri' or 'tdb'
     URIs can be arbitrarily precise.

     If there is any ambiguity of the resource within the range of time
     indicated (for example, if the timestamp consists only of a year, and
     the resource changes over the course of the year), then the resource
     state as of the very last instant of the range indicated should be
     used.
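
The "very last instant of the range" rule can be sketched as follows in
Python.  The accepted timestamp forms (YYYY, YYYYMM, YYYYMMDD, digits
only) are an assumption made for this sketch, since the Syntax section
is elided above, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def timestamp_interval(ts):
    """Return (start, end) in UTC for a digits-only timestamp.

    end is exclusive, so "the very last instant of the range" is the
    moment just before end.
    """
    if not ts.isdigit() or len(ts) not in (4, 6, 8):
        raise ValueError("unsupported timestamp: %r" % ts)
    year = int(ts[:4])
    month = int(ts[4:6]) if len(ts) >= 6 else 1
    day = int(ts[6:8]) if len(ts) == 8 else 1
    start = datetime(year, month, day, tzinfo=timezone.utc)
    if len(ts) == 4:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    elif len(ts) == 6:
        # December rolls over into January of the next year.
        end = datetime(year + month // 12, month % 12 + 1, 1,
                       tzinfo=timezone.utc)
    else:
        end = start + timedelta(days=1)
    return start, end


def state_for(ts, changes):
    """Pick the state in effect at the last instant of the range.

    changes: iterable of (datetime, state) pairs; returns None if the
    resource had no recorded state before the range ended.
    """
    _, end = timestamp_interval(ts)
    earlier = [s for t, s in sorted(changes, key=lambda c: c[0]) if t < end]
    return earlier[-1] if earlier else None


utc = timezone.utc
history = [
    (datetime(2001, 3, 1, tzinfo=utc), "spring edition"),
    (datetime(2001, 9, 1, tzinfo=utc), "autumn edition"),
    (datetime(2002, 2, 1, tzinfo=utc), "next year's edition"),
]
print(state_for("2001", history))
print(state_for("200105", history))
```

A year-only timestamp thus selects the last change made during that
year, not the first.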

     Timestamps are allowed to be specified with as much precision as
     needed.  This keeps most 'duri' and 'tdb' URIs relatively short.

  4.  Use as a Locator

     A 'duri' URI is not directly useful as a resource locator, since many
     resources vary their content over time.

     A 'tdb' URI is not a resource locator in a practical sense, since it
     explicitly requires human interpretation.  However, it allows one to
     know that a resource was described at some point in time; whether the
     description is still available, or whether that description is still
     meaningful, is not guaranteed.

The resource needn't have been described or even observed at the
indicated time.  All you know is what you've already said above, that
the reference or description relates to the condition of a resource at
a particular time, where the resource in question was the one that at
that time was "identified" (or "described") by the subject URI.

     ...

     One might consider using 'tdb' with a "data" URI to designate
     concepts that can be described uniquely briefly inline.  For example,

   tdb:2001:data:,The%20US%20president

     names the concept described by the (text/plain) string "The US
     president" at the very last instant of 2001.

The president is a concept?

         Of course, this
     practice is only useful if the referent of the data is (or was at the
     time) completely unique.  Since "data" does not contain a way to
     designate content-language,

(hey, this is a bug, how about if we fix it?)

     the string in question would have to not
     be ambiguous as to its language.  In the case of 'data', there is no
     assigning authority at all; the interpretation of the 'tdb' depend on
     the interpreting community.

Although your RFC doesn't explicitly say what resources data: URIs
"identify" or "locate", it seems pretty clear from all the examples
that they identify things that are pretty close to strings.  Therefore
RFC 2397 *is* the assigning authority.  Interpreting the data: URI as
a string is straightforward; interpreting that string as something
else has nothing to do with the data: URI scheme, so the comment about
"no assigning authority" is confused.

     Using 'tdb' or 'duri' with an embedded 'urn:' might not seem to be
     too useful,

For tdb this is very useful; why wouldn't it seem so?

   but it might be useful where the assignment of names in a
     URN namespace are not, in practice, permanent, or that one might want
     to refer to the assignment as of a given date.  In this case, it is
     possible to use a "urn" within a 'duri', e.g.,

    duri:2000:urn:ietf:std:50

     might be used to refer to "the document that the IETF considered to
     be STD 50, as of the last instant of 2000".

     For 'tdb', many URIs identify resources which do not clearly describe
     anything at all.  The "home page" for an organization isn't nearly as
     good a resource to use to describe an organization as the
     organization's "about" page.

Depends on what the home page says, and what the about page says.

       But it is up to the minter of the 'tdb'
     URI to choose wisely.

  6.2.  Useful timestamps

     Timestamps far in the future are suspect, because the future content
     of a description resource cannot usually be reliably predicted.
     Timestamps which preceed the availability of the description resource
     should not be used either.  For example, using a http URI with a
     timestamp before the description resource is also not recommended.

     However, although these practices are not recommended, there is no
     assurance that they haven't been used; by itself, a 'tdb' URI by
     itself does not constitute an assertion that the description resource
     was available or assigned at the date specified.

     Note that the use of the "very last instant" allows for the
     conventional bibliographic convention that a work published in 2009
     can use "2009" as the date string, to refer to the work in the year
     of publication.

  6.3.  Free assignment

     Because of the many possible schemes that can be used in the
     <encoded-URI> portion, there should be no difficulty in almost any
     computational process being able to assign 'duri' or 'tdb' URIs at
     will.  Of course, it is necessary for there to be some resource which
     is available at some point in time, and to have a clock which is
     accurate to the granularity of the frequency of assignment.

  6.4.  Resolution

     There are no direct resolution servers or processes for 'duri' or
     'tdb' URIs.  However, a 'duri' URI might be "resolvable" in the sense
     that a resource that was accessed at a point in time might have the
     result of that access cached or archived in an Internet archive
     service.  See, for example, the "Internet Archive" project [archive].
     And a 'tdb' URI is "resolvable" in the sense that the description
     resource can be accessed and interpreted.

     Clients without access to an Internet archive service might take the
     decoded <encoded-URI> of a 'duri' and attempt resolution of *that*
     identifier.  This will give an approximation whose reliability
     depends on the what has happened in the time since the date
     indicated.
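
The fallback order described here could be sketched like this in
Python.  The function name is hypothetical, the timestamp is assumed to
be digits only, and the default archive prefix (a Wayback-Machine-style
"prefix + timestamp + / + URI" layout) is an assumption of this sketch,
not something the draft specifies.  No network access is performed.

```python
def resolution_candidates(duri_uri,
                          archive_prefix="https://web.archive.org/web/"):
    """List URLs a client might try for a 'duri' URI, best first.

    1. An archived copy near the given date (assumed URL layout).
    2. The embedded URI itself, which only approximates the dated
       resource, depending on what has changed since that date.
    """
    scheme, _, rest = duri_uri.partition(":")
    if scheme.lower() != "duri":
        raise ValueError("not a duri URI: %r" % duri_uri)
    timestamp, sep, target = rest.partition(":")
    if not sep or not timestamp.isdigit() or not target:
        raise ValueError("malformed duri URI: %r" % duri_uri)
    return [archive_prefix + timestamp + "/" + target, target]


for url in resolution_candidates("duri:2001:http://www.ietf.org"):
    print(url)
```

An archive typically returns the capture nearest the requested date, so
even the first candidate only approximates the "last instant" rule.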

  6.5.  Why Names with Semantics?

     There are a number of URI and URN schemes that create otherwise
     unbound "names", where the scheme only provides for uniqueness, with
     some other agent or process or context providing the authority to
     interpret the meaning of the identifier at some point in the future.
     'duri' and 'tdb' is different, in that it is the agreement between
are
     the describer (the agent creating the URI) and the receiver of the
     URI (the agent interpreting the URI) to agree upon the semantics
     without any reference to any third party.

  6.6.  Avoiding MetaData

     One might consider the timestamp in a 'duri' or 'tdb' URI to be just
     one piece of additional metadata about the URI, and consider adding
     other pieces of metadata as annotation.

     However, the use of the timestamp is intended primarily as a
     mechanism of accomplishing uniqueness over time.  No other bit of
     metadata or description readily fills that purpose.  Further, the
     date is not descriptive (an assertion about the URI) but merely
     refining.

  6.7.  Avoiding 'duri' and 'tdb'

     Many applications of URIs already provide a context of timestamp.
     For example, one could imagine a hypertext system where the URIs
     contained within a document were intended to refer to the resources
     as of the date of the enclosing document.  This would be a reasonable
     interpretation of URIs within an Internet archive system, for
     example.

     Some applications of URIs already implicitly use the level of
     interpretive indirection that is explicit with 'tdb', For example,
     within an ontology language definition, the URIs used for abstract
     concepts, individuals and so forth are generally considered the
     "thing described by" the URI.

     In addition, the 'application/rdf+xml' Media Type [RFC3870] uses the
     fragment identifier resolution as an explicit way of identifying
     abstract concepts that are described by an RDF document.

"Abstract concept" is too limiting as RDF is used for concrete
non-concepts as well.

  6.8.  'tdb' and levels of indirection

     The 'tdb' scheme introduces a level of semantic indirection.  The
     puzzles and confusions about use and mention, name and reference, and
     levels of indirection have been puzzling and amusing for quite a
     while.

 "It's long," said the Knight, "but it's very, very beautiful.
 Everybody that hears me sing it--either it brings tears into their
 eyes, or else--"
 "Or else what?" said Alice, for the Knight had made a sudden
 pause.
 "Or else it doesn't, you know.  The name of the song is called
 'Haddock's Eyes.'"
 "Oh, that's the name of the song, is it?"  Alice said, trying to
 feel interested.
 "No, you don't understand," the knight said, looking a little
 vexed.  "That's what the name is called.  The name really is 'The
 Aged Aged Man.'"
 "Then I ought to have said 'That's what the song is called'?"
 Alice corrected herself.
 "No, you oughtn't: that's quite another thing!  The song is called
 'Ways and Means': but that's only what it's called, you know!"
 "Well, what is the song, then?" said Alice, who was by this time
 completely bewildered.
 "I was coming to that," the Knight said.  "The song really is
 'A-sitting On A Gate': and the tune's my own invention."  [LOOK]

I don't see what this section contributes.  As an alternative I would
suggest giving a reference to any useful exposition of the difference
between an utterance and the meaning of an utterance, such as

duri:20101102:http://en.wikipedia.org/wiki/Use%E2%80%93mention_distinction
Received on Tuesday, 2 November 2010 18:19:17 UTC