RDFa, Fragment Identifiers and HTTP

There have been discussions about whether hash URIs in RDFa, especially in combination with content negotiation, contradict Web architecture. This note gathers relevant material to clarify the issue.

Preliminaries

The Cool URIs for the Semantic Web note is largely silent on this issue.
The TAG finding httpRange-14 states that 'if an http resource responds to a GET request with a 2xx response, then the resource identified by that URI is an information resource; and if an http resource responds to a GET request with a 303 (See Other) response, then the resource identified by that URI could be any resource'.
The Architecture of the World Wide Web, Volume One (AWWW) tackles the issue in a generic (non-RDFa-specific) way in section 3.2.1. Representation types and fragment identifier semantics and section 3.2.2. Fragment identifiers and content negotiation.

Example

Let us assume an (X)HTML+RDFa document at http://example.org/bob.html with the following content (assuming header, etc. has been defined properly):

<div about="#me" typeof="foaf:Person">
 <a rel="foaf:knows" href="http://example.com/alice#I">Alice</a>
</div>

What happens when an agent processing http://example.org/bob.html discovers the foaf:knows relation and dereferences http://example.com/alice#I?

Resolution

To answer the above question, let us first have a look at the HTTP interaction between the agent (> in the following) and the server (< in the following), based on RFC2616 section 5.

The syntax for the request URI reads as follows:

 Request-URI    = "*" | absoluteURI | abs_path | authority

None of these productions allows a fragment identifier, so the request URI MUST NOT include one. Hence the agent removes the #I and sends the following request:

> GET /alice HTTP/1.1
> Host: example.com
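
The fragment removal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: build_request is a hypothetical helper name, not part of any specification or library, and the sketch ignores ports, userinfo and proxies.

```python
from urllib.parse import urldefrag, urlsplit

def build_request(uri):
    """Return (request_lines, fragment) for a GET on `uri`.

    Hypothetical helper: the fragment is split off and kept by the
    agent, and only path (plus query) and host go on the wire.
    """
    absolute, fragment = urldefrag(uri)
    parts = urlsplit(absolute)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [f"GET {path} HTTP/1.1", f"Host: {parts.netloc}"]
    return lines, fragment

lines, fragment = build_request("http://example.com/alice#I")
print("\n".join(lines))
print("kept locally:", fragment)
```

The fragment never reaches the server; the agent holds on to it and applies it to whatever representation comes back.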

The response of the server now depends on the nature of the requested resource. Before we go into these details, let us step back and look at what the AWWW tells us about this.

From AWWW section 2.6. Fragment Identifiers we learn:

The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations.

Section 3.2.1. Representation types and fragment identifier semantics of the AWWW tells us:

The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution are therefore dependent on the type of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced. If no such representation exists, then the semantics of the fragment are considered unknown and, effectively, unconstrained. Fragment identifier semantics are orthogonal to URI schemes and thus cannot be redefined by URI scheme specifications.

as well as:

Interpretation of the fragment identifier is performed solely by the agent that dereferences a URI; the fragment identifier is not passed to other systems during the process of retrieval. This means that some intermediaries in Web architecture (such as proxies) have no interaction with fragment identifiers and that redirection (in HTTP RFC2616, for example) does not account for fragments.

and further, from section 3.2.2. Fragment identifiers and content negotiation:

Individual data formats may define their own rules for use of the fragment identifier syntax for specifying different types of subsets, views, or external references that are identifiable as secondary resources by that media type. Therefore, representation providers must manage content negotiation carefully when used with a URI that contains a fragment identifier.

In the following we take the two standardized RDF serializations RDFa and RDF/XML into account; both formats define the semantics of fragment identifiers:

RDFa via its hosting language(s) - see RFC3236 application/xhtml+xml
RDF/XML - see RFC3870 application/rdf+xml

Let us assume that the server has both RDFa and RDF/XML representations for the resource http://example.com/alice available. Depending on the agent and its preferences, there are different possible outcomes.

An agent might ask for an RDFa representation:

> GET /alice HTTP/1.1
> Host: example.com
> Accept: application/xhtml+xml
< HTTP/1.1 200 OK
< ... XHTML+RDFa as payload ...

Another agent might ask for an RDF/XML representation:

> GET /alice HTTP/1.1
> Host: example.com
> Accept: application/rdf+xml
< HTTP/1.1 200 OK
< ... RDF/XML as payload ...
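
How a server might choose between the two representations can be sketched as follows. This is a deliberately simplified reading of the Accept header, not the full algorithm of RFC2616 section 14.1: it handles exact media types, q values and */* only, and the names parse_accept, negotiate and AVAILABLE are assumptions made for this illustration.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(accept_header, available):
    """Pick the first available media type the agent accepts, else None."""
    for media_type, q in parse_accept(accept_header):
        if q == 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
    return None

# Assumed set of representations the server holds for /alice
AVAILABLE = ["application/xhtml+xml", "application/rdf+xml"]

print(negotiate("application/rdf+xml", AVAILABLE))
print(negotiate("application/rdf+xml;q=0.5, application/xhtml+xml", AVAILABLE))
```

Whichever media type is selected also determines which fragment identifier semantics apply to #I on the agent's side.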

Regardless of whether the agent receives an RDFa or an RDF/XML representation, it can now (locally) process the RDF graph concerning http://example.com/alice#I.
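
The local processing step can be illustrated without committing to a particular parser. In the sketch below the triples in GRAPH are hypothetical; they stand in for whatever an RDFa or RDF/XML parser would extract from the retrieved representation, and describe is an assumed helper name.

```python
from urllib.parse import urldefrag

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical triples (subject, predicate, object); a real agent would
# obtain them by parsing the payload it received.
GRAPH = {
    ("http://example.com/alice#I", RDF_TYPE, FOAF + "Person"),
    ("http://example.com/alice#I", FOAF + "name", "Alice"),
    ("http://example.com/alice", FOAF + "primaryTopic", "http://example.com/alice#I"),
}

def describe(graph, uri):
    """Return the sorted (predicate, object) pairs whose subject is `uri`."""
    return sorted((p, o) for s, p, o in graph if s == uri)

target = "http://example.com/alice#I"
document, fragment = urldefrag(target)  # `document` is what was requested

# The lookup uses the full URI including the fragment, which the agent
# kept to itself during retrieval.
for predicate, obj in describe(GRAPH, target):
    print(predicate, obj)
```

The document URI and the hash URI are distinct subjects in the graph, which is why the agent has to put the fragment back before looking anything up.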

Finally, two things to note:

It is the responsibility of the representation provider to decide when definitions of fragment identifier semantics are sufficiently consistent (in our case: between RDFa and RDF/XML), see AWWW section 3.2.2. Fragment identifiers and content negotiation.
In the case of RDFa, there can be a URI collision if the fragment identifier is also used as an @id value on an HTML element (see also 1 and 2).
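
A publisher could check a document for the collision described above with a small script. The following is a rough sketch using only Python's standard library. It assumes that only @about carries subject URIs and it ignores @resource, @href, CURIEs and xml:base, so it is not a conforming RDFa processor; FragmentCollector and colliding_fragments are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class FragmentCollector(HTMLParser):
    """Collect @id values and same-document fragments used in @about."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.ids = set()
        self.about_fragments = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.add(value)
            elif name == "about":
                # Resolve the reference against the document URI, then
                # keep the fragment only if it points into this document.
                document, fragment = urldefrag(urljoin(self.base, value))
                if document == self.base and fragment:
                    self.about_fragments.add(fragment)

def colliding_fragments(markup, base):
    """Return fragments used both as an @id value and in @about."""
    collector = FragmentCollector(base)
    collector.feed(markup)
    collector.close()
    return collector.ids & collector.about_fragments

# Variant of the earlier example with an added id="me" to force a collision
MARKUP = '''
<div id="me" about="#me" typeof="foaf:Person">
 <a rel="foaf:knows" href="http://example.com/alice#I">Alice</a>
</div>
'''
print(colliding_fragments(MARKUP, "http://example.org/bob.html"))
```

An empty result means that no fragment used in @about also names an element in the same document.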

Nevertheless, the applied principle of orthogonality and the relevant sections in the AWWW (3.2.1. and 3.2.2.) suggest that where there is no corresponding @id on an (X)HTML element, the URI fragment identifies whatever the publisher intends it to identify.