ISSUE-49: RDFNode type, equality and canonicalization [RDFa 1.1 API]

http://www.w3.org/2010/02/rdfa/track/issues/49

Raised by: Nathan Rixham
On product: RDFa 1.1 API

Currently there are three issues with RDF Interfaces which are all interrelated:

1) There is no way to find out the type of an RDF Interface at runtime: nothing distinguishes a BlankNode from an IRI, and because the interfaces are marked [NoInterfaceObject] the interface type itself is not accessible

2) There is no way to compare if two RDFNodes are equal

3) As the full details of this issue outline, implementations will need to implement toNT methods (privately) to obtain the canonicalized form of an RDFNode, for use both in equality and in the toString() method of RDFTriple.

The proposal is to add the following methods to the RDFNode interface:

1) boolean equals( in RDFNode otherNode );

2) DOMString nodeType(); // returning the string name of the relevant interface IRI, BlankNode, PlainLiteral, TypedLiteral

3) DOMString toNT();


Full details of this issue, originally posted here:
  http://lists.w3.org/Archives/Public/public-rdfa-wg/2010Oct/0099.html

There are a lot of subtleties to equality, especially in the domains of javascript and RDF, so I thought it best to catalogue them all in a single mail.

First of all, in JavaScript (and many other languages) two objects (or variables containing objects) are only equal if they are references to the same object, that is to say that the following two objects are *not* equal:

  var a = document.data.createIRI('http://example.org/');
  var b = document.data.createIRI('http://example.org/');
  a == b; // false

JavaScript (and some other languages) does implement type coercion in its native equality operator; for example, if an object has a toString() method and is compared to a native string, then the stringified form of the object is used for the comparison:

  var a = document.data.createIRI('http://example.org/');
  var b = 'http://example.org/';
  a == b; // true

Bringing RDF into the equation only compounds matters. The subtlety above is that we've just compared an RDFResource to a String Literal URI and received a false positive, because we didn't consider canonicalization.
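
A minimal sketch of the two behaviours above, runnable outside a browser. The createIRI function here is a hypothetical stand-in for document.data.createIRI, and it assumes that toString() returns the bare IRI string:

```javascript
// Hypothetical stand-in for document.data.createIRI (an assumption, not the real API).
function createIRI(value) {
  return {
    value: value,
    // Assumed behaviour: toString() yields the bare IRI, with no type information.
    toString: function () { return this.value; }
  };
}

var a = createIRI('http://example.org/');
var b = createIRI('http://example.org/');
var s = 'http://example.org/';

// Two distinct objects are compared by reference, so this is false.
var objectsEqual = (a == b);

// Object vs. native string: the object is coerced via toString() first,
// so a node and a plain string holding the same characters compare as equal.
var coercedEqual = (a == s);

// Strict equality performs no coercion, so an object never equals a string.
var strictEqual = (a === s);

console.log(objectsEqual, coercedEqual, strictEqual);
```

Neither operator gives the answer we want for RDF: == conflates a node with a string, and === (like == between objects) only ever tests identity.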

Thus, we will need to cater for this by adding an equality method to RDFNode, and by stipulating that it *must* compare canonicalized values of RDFNodes.

This, by inference, requires us to define the canonical form of IRI, BlankNode, PlainLiteral, TypedLiteral and quite possibly RDFTriple too, since none of the existing toString or toValue methods are normalized (they do not expose the type or language via toString/toValue, and as above you cannot compare the objects directly).

Typically I'd suggest that the canonicalized form of any RDFNode (+ RDFTriple) is its N-Triples string value.

I also feel that the details of implementing this in the specification would be somewhat easier to understand if we added a toNT() method to RDFNode and RDFTriple and asserted that the implementation of the equals() method should simply call .toNT() on both objects and compare the returned (canonicalized) string values.
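
To make the suggestion concrete, here is a rough sketch of toNT() and an equals() built on it. The constructors and property names (value, language, datatype) are assumptions for illustration only and are not taken from the draft; the escaping is deliberately simplified and does not cover the full N-Triples rules (non-ASCII characters, for instance, are left as-is):

```javascript
// Simplified escaping for literal values (an assumption; not the complete N-Triples rules).
function escapeNT(s) {
  return s.replace(/\\/g, '\\\\')
          .replace(/"/g, '\\"')
          .replace(/\n/g, '\\n')
          .replace(/\r/g, '\\r')
          .replace(/\t/g, '\\t');
}

// Hypothetical constructors named after the interfaces under discussion.
function IRI(value) { this.value = value; }
IRI.prototype.toNT = function () { return '<' + this.value + '>'; };

function BlankNode(id) { this.value = id; }
BlankNode.prototype.toNT = function () { return '_:' + this.value; };

function PlainLiteral(value, language) {
  this.value = value;
  this.language = language || null;
}
PlainLiteral.prototype.toNT = function () {
  var nt = '"' + escapeNT(this.value) + '"';
  return this.language ? nt + '@' + this.language : nt;
};

// datatype is assumed to be an IRI node.
function TypedLiteral(value, datatype) {
  this.value = value;
  this.datatype = datatype;
}
TypedLiteral.prototype.toNT = function () {
  return '"' + escapeNT(this.value) + '"^^' + this.datatype.toNT();
};

// equals() compares canonical N-Triples strings; anything lacking toNT() is never equal.
function equals(other) {
  return other != null &&
         typeof other.toNT === 'function' &&
         this.toNT() === other.toNT();
}
[IRI, BlankNode, PlainLiteral, TypedLiteral].forEach(function (C) {
  C.prototype.equals = equals;
});

var a = new IRI('http://example.org/');
var b = new IRI('http://example.org/');
console.log(a.equals(b));                       // compares '<http://example.org/>' on both sides
console.log(a.equals('http://example.org/'));   // a native string has no toNT()
```

Because the type and language end up in the compared string, an IRI never equals a PlainLiteral with the same characters, and "chat" never equals "chat"@fr.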

I'm aware there has been some hesitance to have a clear dependency on N-Triples (which we already have) and to expose a toNT method. However, with the above considered, I strongly feel we should expose it: trying to describe how to do comparison/equality operations and canonicalization will be far more difficult for users, us, and implementers if we don't.

Finally, there is one other detail missing which affects the API, namely that none of our RDFNodes are typed. At runtime there is no way to tell if an IRI is an IRI or if a BlankNode is a BlankNode, and likewise no way to practically distinguish between them. Moreover, you can't even tell if they are an RDFNode at all; they could be any old object.

This typing issue rears its head in two places, first when implementing an optimal object equality method, and second - and indeed far more importantly - when implementing a Serializer.

Thus I would suggest that we need to add either a method or an attribute which exposes a name or identifier for the RDF Interface which is implemented; aside - would aligning with DOM and exposing a nodeType attribute be sensible, or would it confuse & conflate? 
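
As an illustration of why a Serializer needs this, here is a sketch that dispatches on a nodeType() method returning the interface name, as in the proposal at the top of this issue. The makeNode helper and the value/language/datatype properties are hypothetical, and literal escaping is omitted for brevity:

```javascript
var KNOWN_TYPES = ['IRI', 'BlankNode', 'PlainLiteral', 'TypedLiteral'];

// Hypothetical helper: builds a plain object that reports its interface name.
function makeNode(type, fields) {
  fields.nodeType = function () { return type; };
  return fields;
}

// With nodeType() available we can finally tell an RDFNode from "any old object".
function isRDFNode(x) {
  return x != null &&
         typeof x.nodeType === 'function' &&
         KNOWN_TYPES.indexOf(x.nodeType()) !== -1;
}

// Serializes a single term; escaping of literal values is omitted in this sketch.
function serializeTerm(node) {
  if (!isRDFNode(node)) {
    throw new TypeError('not an RDFNode');
  }
  switch (node.nodeType()) {
    case 'IRI':
      return '<' + node.value + '>';
    case 'BlankNode':
      return '_:' + node.value;
    case 'PlainLiteral':
      return '"' + node.value + '"' + (node.language ? '@' + node.language : '');
    default: // 'TypedLiteral'; datatype is assumed to be an IRI string here
      return '"' + node.value + '"^^<' + node.datatype + '>';
  }
}

var iri = makeNode('IRI', { value: 'http://example.org/' });
var bnode = makeNode('BlankNode', { value: 'b0' });
console.log(serializeTerm(iri), serializeTerm(bnode));
```

The same check also gives equals() a cheap early exit: two nodes with different nodeType() values can never be equal, so no canonical string needs to be built for them.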

Received on Wednesday, 27 October 2010 00:54:47 UTC