Re: naive question: why prefer absolute URIs to # URIs for linked data? from Jonathan Rees on 2011-09-02 (www-tag@w3.org from September 2011)

From: Jonathan Rees <jar@creativecommons.org>
Date: Fri, 2 Sep 2011 12:44:50 -0400
To: Ian Davis <me@iandavis.com>
Cc: Harry Halpin <hhalpin@ibiblio.org>, Manu Sporny <msporny@digitalbazaar.com>, www-tag@w3.org
Message-ID: <CACHXnaq1JoyZ9=EcC_JK5vXc6faV5Yyr3ibLegwuzpuZxGspow@mail.gmail.com>

On Wed, Aug 31, 2011 at 4:52 PM, Ian Davis <me@iandavis.com> wrote:
> I think there are a number of contributing factors:
>
> 1) architecturally the meaning of the fragment is determined by the media
> type of the representation. Thus meaning of a hash URI depends on how you
> access it.

OK, this is mostly true (see below), but I'd like to dive deeper in
order to understand this particular failure in detail. In order for
someone to experience this particular failure, they would have to
publish a representation at the URI for which the meaning determined
by the media type is different from the meaning that they want the
fragid-URI to have. So I'd like to see examples where this is the
case, since I would think it would be easy enough for someone to just
restrict the available representations to those giving meaning
consistent with the meaning that's desired. For us to get this
argument to stick, we'd have to show that publishers really need their
URIs to have meanings that are different from what the media type of
at least one representation would dictate (or to put it another way:
that they really need to publish a representation that indicates
fragid semantics different from what they want it to be).

The actual wording in 3986 is:

   The semantics of a fragment identifier are defined by the set of
   representations that might result from a retrieval action on the
   primary resource.  The fragment's format and resolution is therefore
   dependent on the media type [RFC2046] of a potentially retrieved
   representation

This is slightly different from what you say above, but not
consequentially so. The important thing is what would happen in
practice, not necessarily what the spec says.
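
To make the failure mode concrete, here is a minimal Python sketch of
one URI whose fragment is read differently under two media types. The
rule table and the name fragment_meaning are hypothetical, and the rule
text is a loose paraphrase for illustration, not wording taken from
either media type registration.

```python
from urllib.parse import urldefrag

# Illustrative only: one-line paraphrases of how two media types treat a
# fragment. Neither string is quoted from a specification.
FRAGMENT_RULES = {
    "text/html": "the element in the document whose id or anchor name is {frag!r}",
    "application/rdf+xml": "whatever the RDF in the document says {uri!r} denotes",
}

def fragment_meaning(uri, media_type):
    """Describe what the fragment of `uri` picks out under `media_type`."""
    _, frag = urldefrag(uri)
    rule = FRAGMENT_RULES.get(media_type)
    if rule is None:
        return "undefined: no fragment rule known for " + media_type
    return rule.format(frag=frag, uri=uri)

uri = "http://example.org/doc#me"
for media_type in ("text/html", "application/rdf+xml"):
    print(media_type, "->", fragment_meaning(uri, media_type))
```

A publisher serving both representations at http://example.org/doc
would hit the conflict only if these two readings disagree about what
"#me" is.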

The TAG has been talking about how the 3986 fragid semantics story
might be inadequate for current needs, so if you have any input into
that discussion it would be most welcome. Is there some change to 3986
that would make nose-following function better in the case of RDF use
of hashful URIs? Would it be helpful if the RDF semantics of fragids
were divorced from what 3986 says?

> 2) Fragments are not sent to the server when they are dereferenced which
> means the server has to guess what information to send. If you're storing
> data for that URI in a database you have to key it against the hashless
> version of the URI along with all other URIs that share that hashless part.
> Also the server can't log accesses to the full URI which means you don't get
> accurate analytics.

OK, this makes perfect sense when there is more than one hash URI per
hashless URI. But again, one could easily work around this. So the
question is, why do publishers need to create more than one hash URI
per hashless URI? Is this a matter of aesthetics, or is there some
other reason?
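
As a rough illustration of the point about what the server sees, the
sketch below uses Python's urllib.parse.urldefrag to split off the
fragment the way a client does before sending a request. The
example.org URIs and the helper name group_by_hashless are made up for
the example.

```python
from collections import defaultdict
from urllib.parse import urldefrag

def group_by_hashless(uris):
    """Group hash URIs by the hashless URI that actually reaches the server."""
    groups = defaultdict(list)
    for uri in uris:
        hashless, fragment = urldefrag(uri)  # the fragment stays on the client
        groups[hashless].append(fragment)
    return dict(groups)

uris = [
    "http://example.org/people#alice",  # several fragids on one hashless URI
    "http://example.org/people#bob",
    "http://example.org/alice#it",      # one fragid per hashless URI
]
for hashless, fragments in group_by_hashless(uris).items():
    print(hashless, fragments)
```

The server can tell a request for .../people from one for .../alice,
but not #alice from #bob. With one hash URI per hashless URI the
request target alone determines which thing was asked about.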

The one-hashful-per-hashless approach has been promoted by a number of
people over the years (mostly in blog posts), and I haven't really
heard a good refutation of it; so maybe you can help me out.

> 3) You can't use HTTP headers or status codes to refer to a hash URI. For
> example you can't 404 a hash URI or redirect it.

This is a good argument: lack of good error messages. But regarding
redirects, I'd think that's only an issue if redirection has to be
differential based on the fragid, and that only happens when there's
more than one fragid per hashless URI.
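
A small sketch of the redirect case, assuming a client that re-attaches
the original fragment when the Location value carries none (behaviour
that many clients implement, but an assumption here). The function name
follow_redirect and the example.org URIs are hypothetical.

```python
from urllib.parse import urldefrag, urljoin

def follow_redirect(requested_uri, location):
    """Return the URI a client ends up with after one redirect.

    Assumption: if `location` has no fragment, the client carries over
    the fragment of the URI it originally requested.
    """
    hashless, fragment = urldefrag(requested_uri)
    target = urljoin(hashless, location)  # resolve a relative Location
    target_base, target_fragment = urldefrag(target)
    if target_fragment or not fragment:
        return target
    return target_base + "#" + fragment

# The server only ever saw a request for http://example.org/old.
print(follow_redirect("http://example.org/old#it", "http://example.org/new"))
print(follow_redirect("http://example.org/old#it", "/new#thing"))
```

Either way the server chooses one Location per hashless URI, so it
cannot send #alice and #bob on the same hashless URI to different
places.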

> 4) The role of the fragment is changing in modern web development practice.
> It's becoming a bearer of state and/or part of the interaction architecture
> of an application. See #! URLs or javascript techniques for tabbed pages.

I can see this, but fragid innovation is happening all the time;
indeed the feature seems designed for innovation. Are you saying that
there will be confusion because the publisher will use the same fragid
as both application parameter and semweb token, and the two uses will
be incompatible; or will the fragid be hijacked by CMS or other tools
so that semweb-style fragids can't be used at all? A concrete
scenario would be really helpful here.

Thanks for your help here - I honestly don't mean to be adversarial,
but these questions have been bothering me and only someone who's
deeper into infrastructure than I am can help me out.

Best
Jonathan

> Ian
>
> On 28 Aug 2011 18:27, "Jonathan Rees" <jar@creativecommons.org> wrote:
>> Question to the broader www-tag readership (and beyond):
>>
>> I don't want to start another argument, I just want to understand the
>> position that it is necessary to use absolute (i.e. hashless) URIs
>> instead of hash URIs for semantic web / linked data purposes, and
>> record the reasons for this position somewhere. I attempted this in
>> http://www.w3.org/2001/tag/awwsw/issue57/20110625/#hash but I feel
>> the case I made against # URIs there is not convincing.
>>
>> That is, suppose you want a URI to use in RDF as a reference (name,
>> "identifier", whatever) for something other than the web page
>> (document, "information resource", whatever) at that URI. Why is it so
>> important that the URI be absolute, instead of one containing # ? So
>> important that the defense of this right would precipitate storms of
>> email messages, many containing quite strong language?
>>
>> This question is at the root of the httpRange-14 / ISSUE-57 dispute,
>> since if # URIs worked for everyone there would be no pressure to use
>> absolute URIs, and therefore no fight about whether you can use 200 or
>> are required to use 303. So I'd like to understand this better than I
>> do.
>>
>> Please be as specific and concrete as possible. I promise to do my
>> best to listen patiently, treat all reasons as legitimate, and report
>> impartially.
>>
>> Thanks for your help,
>>
>> Jonathan
>
Received on Friday, 2 September 2011 16:45:27 UTC