Re: Request for feedback on HTTP Location header syntax + semantics, Re: Issues 43 and 185, was: Issue 43 (combining fragments)

On Thu, Mar 11, 2010 at 12:55 PM, Julian Reschke <julian.reschke@gmx.de> wrote:
> On 11.03.2010 13:31, Jonathan Rees wrote:
>>
>> (bcc www-tag)
>>
>> If you believe in the "identification" / "resource" / "representation"
>> theory then figuring this out is pretty straightforward.
>
> I think that's a theory I believe in :-).
>
>> Suppose http://example.com/a redirects to http://example.com/c#d, and
>> we want to know what resource is identified by http://example.com/a#b.
>>  In general resource x#y means y as locally defined in x. So
>> http://example.com/a#b is b as locally defined in
>> http://example.com/a.  To find what's in http://example.com/a, you
>> look at the resource http://example.com/c#d.  How a fragid is defined
>> locally in something depends on the media type registration, and the
>> only media type of which I'm aware that allows one to define locally a
>> fragid b to identify something that itself has representations with
>> the potential for fragid definition is application/rdf+xml (and the
>
> Interesting. How so? Pointer? Couldn't see that in
> <http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-fragID>.

The RDF media type registration supports the use of URIs with fragids
as "identifying" anything. You could for example say
<http://example.com/c#d> owl:sameAs <http://example.com/e>. where
http://example.com/e has an HTML representation that defines its own
local identifiers, and so on.  (The URI syntax really ought to support
scheme:path#frag#frag#frag ... so it goes...)

This is all theoretical... I'm not going to get on anyone's case for
not implementing this, only for ruling it out.

>> other RDF media types). For text/html, for example,
>> http://example.com/c#d would identify an HTML element, and there's no
>> fragid namespace defined locally inside an HTML element in which #b
>> could be defined.
>>
>> Usually, when a browser finds that a resource doesn't define the
>> desired fragid, it just shows you a representation of the resource. (I
>> vaguely remember some discussion about how the user should be alerted
>> when this happens, like a mild form of 404, but that's another story.)
>> To be consistent that is what should happen in this case. That is, it
>> would throw away the unresolvable #b and just show you
>> http://example.com/c#d. (unless the representation of
>> http://example.com/c is RDF that defines d to be a resource that has a
>> locally defined b.)
>
> The good thing is that the end result for text/html is the same: the
> fragment id from the redirect URL is taken into account, the original fragid
> being overridden.
>
> So, considering that whatever has to happen here depends on the media type
> of the representation we get from the redirect target -- isn't this
> something the spec for text/html needs to spell out?

Only if the recovery strategy is specific to text/html (see below). If
it's generic across all media types then it's HTTP's job. I'll leave
the determination up to you.
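
To make the "generic across all media types" option concrete, here is a
minimal Python sketch of one possible media-type-independent rule: the
Location fragment wins when there is one, otherwise the original
fragment is carried over. The function name redirect_target and the
carry-over behavior are assumptions made for illustration; the current
draft defines neither.

```python
from urllib.parse import urljoin, urlsplit


def redirect_target(original: str, location: str) -> str:
    """Hypothetical helper: pick the URI to navigate to after a redirect.

    Assumed rule (not defined by the HTTP draft under discussion):
    a fragment in the Location value overrides the original fragment;
    if Location has no fragment, the original one is carried over.
    """
    # Resolve a possibly relative Location value against the original URI.
    # urljoin drops the fragment of the base URI.
    target = urljoin(original, location)
    if urlsplit(target).fragment:
        return target
    original_fragment = urlsplit(original).fragment
    if original_fragment:
        return target + "#" + original_fragment
    return target
```

With the URIs from this thread, redirect_target("http://example.com/a#b",
"http://example.com/c#d") gives http://example.com/c#d, which matches the
text/html outcome Julian describes above.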

> We currently have (in "-latest"):
>
> "Note: This specification does not define precedence rules for the case
> where the original URI, as navigated to by the user agent, and the Location
> header field value both contain fragment identifiers."
>
> How about expanding this to something like:
>
> "Note: This specification does not define precedence rules for the case
> where the original URI, as navigated to by the user agent, and the Location
> header field value both contain fragment identifiers. In particular, the
> semantics of fragment identifiers depend on the representation's media
> type."
>
> ...and thus make it the HTML WG's problem (for text/html)? :-)

I could live with that. Maybe remove the word "precedence" as it is
distracting. Maybe cite 3986.

The reason the spec "does not define" it (which I think is debatable)
stems from HTTP's incomplete embrace of the
identification/resource/representation theory in the case of
redirects. For example:

   "For 3xx responses, the location SHOULD
   indicate the server's preferred URI for automatic redirection to the
   resource."

More I/R/R friendly might be

   "For 3xx responses, the location SHOULD
   identify a resource for which a representation should
   be obtained."

The language for 302 and 307 talks about the resource "residing at"
another URI without defining what "residing" means (especially when
the Location: URI doesn't use the http: scheme). You could exploit the
I/R/R theory by observing that the Location: URI actually identifies
another resource (maybe the same one for a 301, I don't know) and in
the 302/307 case it's "lending" its "corresponding" representations
(and redirects) to the first resource. Then you don't have to define
"resides" and you've defined "redirect" quite clearly and abstractly.

I wouldn't mind someone arguing that in <div id="d"> foo <div id="b">
bar </div> </div> it would make sense to "go" to the div with id="b"
when http://example.com/a#b redirects to http://example.com/c#d,
although in an ideal world this would be licensed by the html (xml,
etc) media type registrations. Heck, I wouldn't even object to going
to #b when the "b" element is outside the "d" element, since what we're
talking about is classified as an error recovery case. There is some
theoretical risk that a user could be confused or tricked somehow by
this behavior, but that would be true for any error recovery behavior,
including going to the top of the document. Whether this is in the
purview of HTTP depends on whether one would want to agree on a
general recovery principle that's independent of HTML. No advice on
that matter.
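
The nested case can be sketched in Python, purely as an illustration.
The helper find_nested is hypothetical, it treats the markup as
well-formed XML (real text/html would need an HTML parser), and no
media type registration that I know of licenses this lookup.

```python
import xml.etree.ElementTree as ET


def find_nested(markup: str, outer_id: str, inner_id: str):
    """Hypothetical helper: return the element with id=inner_id that lies
    inside the element with id=outer_id, or None if there is none."""
    root = ET.fromstring(markup)
    # iter() walks the element itself and all of its descendants.
    for outer in root.iter():
        if outer.get("id") != outer_id:
            continue
        for inner in outer.iter():
            if inner is not outer and inner.get("id") == inner_id:
                return inner
    return None


# The example from the paragraph above: "b" is defined inside "d".
nested = '<div id="d"> foo <div id="b"> bar </div> </div>'
# A variant where "b" sits outside "d".
sibling = '<body><div id="d"> foo </div><div id="b"> bar </div></body>'
```

Here find_nested(nested, "d", "b") returns the inner div, while
find_nested(sibling, "d", "b") returns None, at which point a user agent
would fall back to whatever recovery behavior is agreed on.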

> Best regards, Julian
>

Best
Jonathan

Received on Friday, 12 March 2010 16:54:08 UTC