ACTION-487 Assess potential impact of IRI draft on RDF/XML, OWL, and Turtle

ACTION-487
Assessing the impact of proposed changes to IRIs on RDF and OWL.

Executive summary: The RDF specs are protected against changes to IRIs
because they refer to "[IRI draft] or its successors".  The OWL specs
and RDFa are not protected since they refer normatively to RFC 3987.
I can't say whether any applications will be affected.

(To non-TAG readers of the list, this action refers to TAG discussion on IRIs
last week, in which Larry outlined changes that are being considered.
You can consult the TAG F2F minutes when they come out, and Larry will
keep us posted on developments and drafts. This email is completely
independent of the details of the changes.)

------------------------------------------------------------
Details

RDF has an abstract syntax ('graphs') and a variety of
serializations, including RDF/XML, Turtle, and RDFa.

The RDF abstract syntax treatment of IRIs is here:

  http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref  (2004)

Nodes in a graph can be "RDF URI references".

    "This section anticipates an RFC on Internationalized Resource
     Identifiers.  Implementations may issue warnings concerning the
     use of RDF URI References that do not conform with [IRI draft] or
     its successors."

This is sort of OK, since "its successors" would include the future IRI
specification.  However, somewhat contradicting this, the same section says that a
Unicode string is an RDF URI Reference iff it has no control
characters AND the ASCII string obtained by converting to UTF-8 and
then percent-encoding is a valid URI.  (I'm glossing, see the spec for
details.)
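
As a rough illustration of the gloss above (not the normative
definition), the check can be sketched in Python.  The function names
are made up for this example, the set of ASCII characters that get
escaped is an assumption based on one reading of RFC 2396 section 2.4.3
(minus '#', '%' and the square brackets), and the final "is it a valid
absolute URI" test is reduced to "has a scheme, a colon, and something
after it".  Consult RDF Concepts for the real rules.

```python
import re

# Control characters that the gloss above rules out (C0, DEL, and C1).
_CONTROL = re.compile("[\u0000-\u001f\u007f-\u009f]")

# ASCII characters assumed to need %-escaping (an assumption, see lead-in).
_ESCAPE_ASCII = set(' <>"{}|\\^`')

# Crude stand-in for "absolute URI with optional fragment".
_ABSOLUTE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:.+")


def percent_encode(s):
    """Encode as UTF-8, then %-escape non-ASCII and disallowed octets."""
    out = []
    for octet in s.encode("utf-8"):
        ch = chr(octet)
        if octet >= 0x80 or ch in _ESCAPE_ASCII:
            out.append("%{:02X}".format(octet))
        else:
            out.append(ch)
    return "".join(out)


def is_rdf_uri_reference(s):
    """Approximate the 2004 test: no controls, and the encoded form looks like a URI."""
    if _CONTROL.search(s):
        return False
    return _ABSOLUTE.fullmatch(percent_encode(s)) is not None
```

Note that validity here is decided by what the encoding process would
produce, which is why a change to the IRI-to-URI rules could, at least
in principle, change which strings count as RDF URI References.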

There is a reference to 'XML Schema Part 2: Datatypes' (May 2001),
which in turn defers to 'XML Linking Language', which repeats the
definition of validity based on what would happen according to the
UTF-8 / %-encoding process.

There is also a reference to 'Namespaces in XML 1.1', which says
pretty much the same thing.

As the references to Schema and Namespaces are from 'notes' one might
think they are non-normative, but they're in a normative section of
the document and the references are listed as normative.

Comparison of 'RDF URI References' is by string comparison, not URI
equivalence.  There is no specified conversion of RDF URI References
to URIs, because none is needed.  So all that matters (from the POV of
the specs) is which strings are valid IRIs, not what happens when they
get converted to URIs.

Thus the only RDF-related failure induced by a change to IRIs would be
in causing a formerly valid RDF graph to be invalid or vice versa.

Of course there are applications that convert IRIs to URIs, and they
would be affected, but this would have nothing to do with their RDF
conformance.
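
To make the string-comparison point concrete, here is a small Python
sketch.  The helper name and the example.org strings are invented for
illustration, and urllib's unquote is used only as a crude stand-in for
the kind of URI-level processing an application might do on its own; it
is not something the RDF specs call for.

```python
from urllib.parse import unquote


def same_rdf_node(x, y):
    """RDF URI References name the same node only if the strings are identical."""
    return x == y


a = "http://example.org/%7Euser"
b = "http://example.org/~user"
c = "http://EXAMPLE.org/~user"

print(same_rdf_node(a, b))  # False: different strings, so different nodes
print(same_rdf_node(b, c))  # False: case differences matter too
print(unquote(a) == b)      # True: an application decoding escapes would equate them
```

The third line is where an application, as opposed to the RDF specs,
might start to care about how IRIs and URIs are related.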

'RDF Semantics'
    http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#urisandlit
talks about 'URI references' without saying what they are.  There is a
normative reference to RFC 2396 for other reasons, but I find it hard
to imagine that any use of this recommendation would be affected by
changes to the syntax of URIs (or IRIs), as it's not the job of the
document to specify the syntax of anything.

RDF/XML just refers to 'RDF URI References' from the Concepts
document.

Turtle ( http://www.w3.org/TeamSubmission/turtle/ ) is vague on
the subject.  It has a normative reference to RFC 3987 but this is not
part of defining what its 'URI references' mean - indeed it doesn't
say what 'URI references' are, syntactically.  I would guess that any
reasonable person would go to RDF Concepts to get the definition,
although taking them to be RFC 3986 URI references would
also be forgivable.

Similarly, 'RDFa in XHTML' does not define 'URI reference' and the
reasonable assumption would be that these are inherited from RDF
Concepts or 3986.

Unfortunately RDFa has a definition of CURIEs that normatively
references RFC 3987.

SPARQL IRIs are defined by normative reference to RDF Concepts.

OWL 2 cites RFC 3987 normatively in defining what an IRI is.  See

    http://www.w3.org/TR/2009/REC-owl2-syntax-20091027/#IRIs

That will obviously be a problem if 3987 gets replaced.

OWL speaks very abstractly of accessing ontology documents:

    http://www.w3.org/TR/owl2-syntax/#Ontology_Documents

"Each ontology document can be accessed via an IRI by means of an
appropriate protocol."  This leaves the means of access up to each
application involved in implementing OWL. The 'appropriate protocol'
might not even involve URIs at all.

The OWL documents refer to 'XML Base' (2008)
http://www.w3.org/TR/xmlbase/ a few times, which references LEIRI,
not RFC 3987, but not in a way that would cause LEIRI to apply to OWL.

------------------

So what does this all mean?

1. There are the obvious annoyances around normative references to
documents that are really part of a time series. Henry has figured out
one way this might be addressed, and that technique should be applied
to the OWL and RDF recommendations at the next opportunity.
(Someone will provide the reference to Henry's policy I'm sure...)

2. An "old" IRI might be rejected or misinterpreted by an
upgraded application.

3. A "new" IRI might be rejected or misinterpreted by an
"old" application.

2 and 3 seem highly unlikely, but the question is difficult to answer.
One reason for this is that the new draft hasn't been written, so we
don't know exactly what the changes will be. (I'm confident that the
authors of the new IRI draft will explain to us exactly what they are,
when the time comes.) The other is that as a member of the
ASCII-speaking world, I (like most members of this list) would not be
aware of how non-ASCII IRIs are being deployed in RDF and OWL.

Jonathan

Received on Wednesday, 27 October 2010 16:50:08 UTC