for telecon: ISSUE-25 @profile Document Retrieval Failure

The issue: assume that profiles 'a' and 'b' both define the term
'Person', each mapping it to a different URI. Additionally, assume that
'b' defines the prefix 'dc' to point to 'http://purl.org/dc/terms/'. Then:

	<html profile="http://example.com/a"
	  prefix="dc: http://purl.org/dc/elements/1.1/
	    foaf: http://xmlns.com/foaf/0.1/">
	  ...
	  <body profile="http://example.net/b">
	    <address>
	      Last modified by <a typeof="Person"
	      rel="foaf:homepage" href="/joe"
	      property="foaf:name">Joe Bloggs</a>
	      on <span property="dc:modified">2010-05-26</span>.
	    </address>
	  </body>
	</html>

If profile 'b' is inaccessible (e.g. the host example.net is down
temporarily or permanently) then an RDFa processor will create
erroneous triples. typeof="Person" will be mapped to its definition
from profile 'a'; and the 'dc' in 'dc:modified' will be mapped to the
older Dublin Core 1.1 URI, which does not define a property called
'modified'.
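To make the failure mode concrete, here is a minimal Python sketch that models the mappings in scope inside <body> as plain dicts. It is not a real RDFa processor: the URIs that 'a' and 'b' assign to "Person" are placeholders (the scenario does not give them), and the helper names body_scope and resolve are hypothetical.

```python
from typing import Dict, Optional, Tuple

Mappings = Tuple[Dict[str, str], Dict[str, str]]  # (prefixes, terms)

# Placeholder URIs: the scenario does not say what 'a' and 'b' map "Person" to.
PROFILE_A: Mappings = ({}, {"Person": "http://example.com/vocab-a#Person"})
PROFILE_B: Mappings = (
    {"dc": "http://purl.org/dc/terms/"},
    {"Person": "http://example.net/vocab-b#Person"},
)
# Prefixes declared by @prefix on <html>.
HTML_PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def body_scope(profile_b_loaded: bool) -> Mappings:
    """Mappings in scope inside <body>, depending on whether 'b' was fetched."""
    prefixes = {**PROFILE_A[0], **HTML_PREFIXES}
    terms = dict(PROFILE_A[1])
    if profile_b_loaded:
        # Mappings from 'b' on <body> override those inherited from <html>.
        prefixes.update(PROFILE_B[0])
        terms.update(PROFILE_B[1])
    return prefixes, terms


def resolve(token: str, prefixes: Dict[str, str], terms: Dict[str, str]) -> Optional[str]:
    if ":" in token:
        prefix, reference = token.split(":", 1)
        # An unknown prefix means the token is taken as an absolute URI.
        return prefixes[prefix] + reference if prefix in prefixes else token
    return terms.get(token)
```

With 'b' loaded, "dc:modified" resolves to http://purl.org/dc/terms/modified; with 'b' unavailable, the very same markup silently yields http://purl.org/dc/elements/1.1/modified and profile 'a''s "Person", which is exactly the erroneous output described above.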

The suggested solution, originally from Jeni Tennison, is that when
parsers encounter a profile URI that cannot be retrieved, they should
not process that subtree.

My interpretation/clarification of this is:

If retrieving the profile does not yield an HTTP 200 status (or the
equivalent for non-HTTP profile URIs), possibly after following some
redirections, then the profile is a "Failed Profile".

If the profile is delivered in a media type or dialect which the RDFa
processor is not able to handle (e.g. delivered in Turtle to a processor
that cannot handle Turtle), it is also deemed a Failed Profile.
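The two conditions above can be sketched as a single predicate over the outcome of a retrieval. This is an illustrative Python sketch under assumptions: the caller has already followed redirects and passes the final status (or None if nothing came back at all), a non-HTTP scheme would have to map its own success onto 200, the set of supported media types is hypothetical, and treating a missing Content-Type as a failure is a choice made here rather than something the proposal states.

```python
from typing import FrozenSet, Optional

# Hypothetical: the profile formats this particular processor can parse.
SUPPORTED_TYPES: FrozenSet[str] = frozenset({"application/xhtml+xml", "text/html"})


def is_failed_profile(final_status: Optional[int],
                      content_type: Optional[str],
                      supported: FrozenSet[str] = SUPPORTED_TYPES) -> bool:
    """Classify one profile retrieval as a Failed Profile or not.

    final_status: HTTP status after redirects were followed, or None if
        nothing was retrieved (DNS failure, timeout, refused connection).
    content_type: the response's Content-Type header, if any.
    """
    if final_status != 200:
        return True  # e.g. None, 404, 500, or an unresolved 3xx
    if not content_type:
        return True  # assumption: no declared type means we cannot parse it
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type not in supported
```

For instance, a 200 response served as text/turtle counts as failed for this processor, while a 200 response of "text/html; charset=UTF-8" does not.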

Having processed @profile, if any of the profiles is a Failed Profile,
then an RDFa processor cannot continue as normal. It completes any
incomplete triples, using the following method to determine the
element's new subject (which becomes the object of incomplete forward
triples and the subject of incomplete reverse triples):

	1. if @about is set to a relative URI, use that;
	2. otherwise, if @about is set, use a new blank node;
	3. otherwise, if @src is set, use that;
	4. otherwise, if @resource is set to a relative URI, use that;
	5. otherwise, if @resource is set, use a new blank node;
	6. otherwise, if @href is set, use that;
	7. otherwise, use a new blank node.
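The seven steps can be sketched as one function. This is a hypothetical Python sketch, not an existing API: attributes arrive as a plain dict, blank nodes are minted as "_:b0", "_:b1", ..., and relative values are resolved against an assumed document base with urllib.parse.urljoin. Following the reasoning in this message, an @about or @resource value counts as a usable relative URI only if it contains no colon, whereas @src and @href are used as given.

```python
import itertools
from typing import Mapping
from urllib.parse import urljoin

_bnode_ids = itertools.count()


def new_blank_node() -> str:
    return f"_:b{next(_bnode_ids)}"


def is_plain_relative(value: str) -> bool:
    # A value containing a colon might be a CURIE whose prefix came from the
    # Failed Profile, or an absolute URI; it cannot be told apart, so reject it.
    return ":" not in value


def fallback_subject(attrs: Mapping[str, str], base: str) -> str:
    """Steps 1-7 for an element carrying a Failed Profile (hypothetical helper)."""
    about = attrs.get("about")
    if about is not None:                      # steps 1 and 2
        return urljoin(base, about) if is_plain_relative(about) else new_blank_node()
    if "src" in attrs:                         # step 3
        return urljoin(base, attrs["src"])
    resource = attrs.get("resource")
    if resource is not None:                   # steps 4 and 5
        return urljoin(base, resource) if is_plain_relative(resource) else new_blank_node()
    if "href" in attrs:                        # step 6
        return urljoin(base, attrs["href"])
    return new_blank_node()                    # step 7
```

For example, with base http://example.org/page, about="/joe" gives http://example.org/joe, while about="foaf:Person" or about="http://example.org/x" both give a fresh blank node.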

Note that under this method any @about or @resource value containing a
colon results in a blank node: it becomes impossible to determine
whether such a value is a CURIE whose prefix would have been defined by
the Failed Profile, or an absolute URI.

Descendant elements are then not processed.
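The skipping rule amounts to a pruned tree walk. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree; the callback has_failed_profile is a hypothetical stand-in for real retrieval, and the function only records which elements would be visited rather than doing any RDFa processing.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List


def elements_to_process(root: ET.Element,
                        has_failed_profile: Callable[[str], bool]) -> List[str]:
    """Return the tags of elements an RDFa processor would still visit."""
    visited: List[str] = []

    def visit(el: ET.Element) -> None:
        visited.append(el.tag)
        profiles = el.get("profile", "").split()
        if any(has_failed_profile(uri) for uri in profiles):
            # This element still completes incomplete triples (using the
            # fallback subject), but none of its descendants are processed.
            return
        for child in el:
            visit(child)

    visit(root)
    return visited
```

Applied to a cut-down version of the example document in which only http://example.net/b fails, the walk visits html, head and body, and never reaches the address or a elements.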

I'll add that there is actually a safe subset of RDFa to process within
the subtree of a Failed Profile; however, the rules for determining this
safe subset are complex, so for the sanity of RDFa publishers it's
simpler to say that a Failed Profile causes its subtree to be ignored.

For the record, though, a method for continuing to parse safely is:

* when a parser encounters a Failed Profile, it empties the local lists
  of prefix mappings and terms.
* it continues processing (including processing any other profiles, as
  the @profile attribute is a list of profiles) with one small change
  in behaviour: when it encounters a token that matches the CURIE
  syntax but does not match a prefix mapping, instead of assuming that
  it's an absolute URI, the parser treats it as an undefined value.
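The two bullet points can be sketched as follows. This is an illustrative Python sketch under assumptions: Scope, process_profiles and resolve_token are hypothetical names, the regular expression only approximates the real CURIE grammar, profile loading is abstracted as a callback returning None for a Failed Profile, and an undefined value is represented as None.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Mappings = Tuple[Dict[str, str], Dict[str, str]]  # (prefixes, terms)

# Rough approximation of "matches the CURIE syntax": optional prefix, colon, rest.
CURIE_LIKE = re.compile(r"^([A-Za-z_][\w.-]*)?:(.*)$")


@dataclass
class Scope:
    prefixes: Dict[str, str] = field(default_factory=dict)
    terms: Dict[str, str] = field(default_factory=dict)
    below_failed_profile: bool = False


def process_profiles(scope: Scope, profile_uris: List[str],
                     load: Callable[[str], Optional[Mappings]]) -> Scope:
    """Apply an @profile list in order; `load` returns None for a Failed Profile."""
    new = Scope(dict(scope.prefixes), dict(scope.terms), scope.below_failed_profile)
    for uri in profile_uris:
        loaded = load(uri)
        if loaded is None:
            # Failed Profile: empty the local prefix mappings and terms...
            new.prefixes.clear()
            new.terms.clear()
            new.below_failed_profile = True
            continue  # ...but keep processing the remaining profiles.
        prefixes, terms = loaded
        new.prefixes.update(prefixes)
        new.terms.update(terms)
    return new


def resolve_token(scope: Scope, token: str) -> Optional[str]:
    """Return the URI for a token, or None when it is undefined."""
    match = CURIE_LIKE.match(token)
    if match is None:
        return scope.terms.get(token)  # plain term lookup
    prefix, reference = match.group(1) or "", match.group(2)
    if prefix in scope.prefixes:
        return scope.prefixes[prefix] + reference
    # Normally an unmatched CURIE-like token is taken as an absolute URI;
    # below a Failed Profile it is treated as undefined instead.
    return None if scope.below_failed_profile else token
```

One consequence worth noting: below a Failed Profile, even a genuine absolute URI such as http://example.org/x becomes undefined unless some later declaration happens to define an "http" prefix, which illustrates why the rules for this safe subset can surprise authors.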

I don't recommend that method, though, as it might lead to confusion
amongst authors.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>

Received on Wednesday, 26 May 2010 08:43:33 UTC