RDFa, Fragment Identifiers and HTTP
There have been discussions about whether hash URIs in RDFa, especially in combination with content negotiation, contradict Web architecture. This note gathers relevant material to clarify these issues.
- The Cool URIs for the Semantic Web note is largely silent on this issue
- The TAG finding httpRange-14 states that 'if an http resource responds to a GET request with a 2xx response, then the resource identified by that URI is an information resource; and if an http resource responds to a GET request with a 303 (See Other) response, then the resource identified by that URI could be any resource'
- The Architecture of the World Wide Web, Volume One (AWWW) tackles the issue in a generic (non-RDFa-specific) way in section 3.2.1. Representation types and fragment identifier semantics and section 3.2.2. Fragment identifiers and content negotiation
Let us assume an (X)HTML+RDFa document at
http://example.org/bob.html with the following content (assuming header, etc. has been defined properly):
<div about="#me" typeof="foaf:Person"> <a rel="foaf:knows" href="http://example.com/alice#I">Alice</a> </div>
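As a rough illustration of what an RDFa processor makes of this markup, the sketch below resolves the two references against the document URI and prints the resulting triples. It is not an RDFa parser: the triples are written out by hand, and the binding of the foaf: prefix to http://xmlns.com/foaf/0.1/ is an assumption, since the note leaves the header out.

```python
from urllib.parse import urljoin

# Document URI taken from the text above.
BASE = "http://example.org/bob.html"

# Assumption: the omitted header binds foaf: to the usual FOAF namespace.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# @about="#me" is a relative reference, resolved against the document URI.
subject = urljoin(BASE, "#me")
# @href is already absolute, so resolution leaves it unchanged.
obj = urljoin(BASE, "http://example.com/alice#I")

# Triples written out by hand for illustration.
triples = [
    (subject, RDF_TYPE, FOAF + "Person"),
    (subject, FOAF + "knows", obj),
]

for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

Both the subject and the object are hash URIs; the rest of this note is concerned with what happens when an agent dereferences the second one.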
To examine these issues, let us first look at the HTTP interaction between the agent (lines marked > in the following) and the server (lines marked < in the following), based on RFC2616 section 5.
The syntax for the request URI reads as follows:
Request-URI = "*" | absoluteURI | abs_path | authority
We see that the request URI MUST NOT include a fragment identifier, hence the agent would remove the
#I and send the following request:
> GET /alice HTTP/1.1
> Host: example.com
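The fragment removal can be sketched with Python's standard library. The helper name build_request is hypothetical and only illustrates the step; a real HTTP client does this internally.

```python
from urllib.parse import urldefrag, urlsplit

def build_request(uri):
    """Return (request_text, fragment) for a GET on the given URI.

    Hypothetical helper for illustration: the fragment is split off
    and kept on the agent side; it never appears in the request.
    """
    absolute, fragment = urldefrag(uri)
    parts = urlsplit(absolute)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # The Host header carries the authority (host name), not a full URI.
    request = f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"
    return request, fragment

request, fragment = build_request("http://example.com/alice#I")
print(request)
print("kept by the agent:", fragment)
```

The agent holds on to the fragment and applies it only after a representation has come back.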
The response of the server now depends on the nature of the requested resource. Before we go into these details, let us step back and look at what the AWWW tells us about this.
From AWWW section 2.6. Fragment Identifiers we learn:
The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations.
Section 3.2.1. Representation types and fragment identifier semantics of the AWWW tells us:
The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution are therefore dependent on the type of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced. If no such representation exists, then the semantics of the fragment are considered unknown and, effectively, unconstrained. Fragment identifier semantics are orthogonal to URI schemes and thus cannot be redefined by URI scheme specifications.
as well as:
Interpretation of the fragment identifier is performed solely by the agent that dereferences a URI; the fragment identifier is not passed to other systems during the process of retrieval. This means that some intermediaries in Web architecture (such as proxies) have no interaction with fragment identifiers and that redirection (in HTTP RFC2616, for example) does not account for fragments.
and further, from section 3.2.2. Fragment identifiers and content negotiation:
Individual data formats may define their own rules for use of the fragment identifier syntax for specifying different types of subsets, views, or external references that are identifiable as secondary resources by that media type. Therefore, representation providers must manage content negotiation carefully when used with a URI that contains a fragment identifier.
In the following we take the two standardized RDF serializations RDFa and RDF/XML into account; both formats define the semantics of fragment identifiers:
- RDFa via its hosting language(s) - see RFC3236 application/xhtml+xml
- RDF/XML - see RFC3870 application/rdf+xml
Let us assume that the server has both RDFa and RDF/XML representations for the resource
http://example.com/alice available. Now, depending on the agent and its preferences, there are different possible outcomes.
An agent might ask for an RDFa representation:
> GET /alice HTTP/1.1
> Host: example.com
> Accept: application/xhtml+xml
< HTTP/1.1 200 OK
< ... HTML as payload ...
Another agent might ask for an RDF/XML representation:
> GET /alice HTTP/1.1
> Host: example.com
> Accept: application/rdf+xml
< HTTP/1.1 200 OK
< ... RDF/XML as payload ...
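How a server might choose between the two representations can be sketched as follows. This is a deliberately simplified selection on the Accept header, not a complete implementation of HTTP content negotiation: it handles exact media types and `*/*` only, and ignores type wildcards and specificity rules. The name negotiate and the AVAILABLE tuple are assumptions made for the example.

```python
# Representations the hypothetical server holds for /alice.
AVAILABLE = ("application/xhtml+xml", "application/rdf+xml")

def negotiate(accept, available=AVAILABLE):
    """Return the available media type with the highest q-value, or None."""
    prefs = []
    for item in accept.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))

    best, best_q = None, 0.0
    for candidate in available:
        for media_type, q in prefs:
            # Simplification: exact match or the full wildcard only.
            if media_type in (candidate, "*/*") and q > best_q:
                best, best_q = candidate, q
    return best

print(negotiate("application/xhtml+xml"))
print(negotiate("application/rdf+xml"))
```

Whichever media type is selected, the fragment identifier is interpreted by the agent according to the rules of that media type, which is why the two formats need to agree on what the fragment identifies.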
Regardless of whether the agent receives an RDFa or an RDF/XML representation, it can now (locally) process the RDF graph concerning http://example.com/alice#I.
Finally, two things to note:
- It is the responsibility of the representation provider to decide when definitions of fragment identifier semantics are sufficiently consistent (in our case: between RDFa and RDF/XML), see AWWW section 3.2.2. Fragment identifiers and content negotiation.
- In case of RDFa there can be a URI collision if the fragment identifier is also used as
@id value on an HTML element (see also 1 and 2).
Nevertheless, the applied principle of orthogonality and the relevant sections in the AWWW (3.2.1. and 3.2.2) suggest that where there is no corresponding
@id on an (X)HTML element, the URI fragment identifies whatever the publisher intends it to identify.
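A publisher who wants to rule out such a collision could check a document along the following lines. This is a minimal sketch using Python's standard HTML parser; the names FragmentAudit and colliding_fragments are hypothetical, and only same-document @about references (values starting with #) are compared against @id values.

```python
from html.parser import HTMLParser

class FragmentAudit(HTMLParser):
    """Collect @id values and same-document @about fragments."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.about_fragments = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.add(value)
            elif name == "about" and value.startswith("#"):
                # Strip the leading '#' to get the bare fragment.
                self.about_fragments.add(value[1:])

def colliding_fragments(markup):
    """Return fragments used both in @about and as an @id value."""
    audit = FragmentAudit()
    audit.feed(markup)
    audit.close()
    return audit.about_fragments & audit.ids

clean = '<div about="#me" typeof="foaf:Person">Bob</div>'
clash = '<div id="me" about="#me" typeof="foaf:Person">Bob</div>'
print(colliding_fragments(clean))
print(colliding_fragments(clash))
```

An empty result means that no fragment used in @about also names an element of the document.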