major technical: blank nodes from Fred Zemke on 2006-01-12 (public-rdf-dawg-comments@w3.org from January 2006)

From: Fred Zemke <fred.zemke@oracle.com>
Date: Thu, 12 Jan 2006 13:38:32 -0800
To: public-rdf-dawg-comments@w3.org
Message-ID: <43C6CC58.4040209@oracle.com>
Blank nodes of the form _:a and [] do not add anything to the language. 
Everything that can be expressed with such blank nodes can be expressed
with variables.  What is the semantic difference between
_:a and ?a?  The only difference I can see is that _:a cannot be
placed in the SELECT list (and there does not appear to be any
motivation for this restriction).  Thus if the user, in the course of
writing a query, later decides that he wants to receive the value of the
blank node, he must rewrite the query with a variable in place of the
blank node.
The user might as well just write the query without blank nodes from
the beginning. 

In addition, the term "blank node" creates a false analogy with RDF. 
An RDF blank node is a node in a graph that has no IRI.  A SPARQL blank node
is not a node at all; it is actually a variable that cannot be named in
the SELECT list.  Note that the definition of pattern solution in
section 2.4 says that a SPARQL blank node can be mapped to an RDF
term that is not an RDF blank node, and conversely a variable may be
mapped to an RDF blank node.  Thus the two notions of blank
node have nothing to do with one another aside from the notation that
is employed.
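
To make the two points above concrete, here is a minimal sketch in Python
of a toy pattern matcher. It is not a SPARQL engine and is not taken from
the specification; the names is_placeholder, match, select and GRAPH are
made up for this illustration, and terms are modeled as plain strings.
Under those assumptions it shows that a query written with _:a yields the
same solutions as one written with ?a, and that the only observable
difference is that _:a is refused in the SELECT list.

```python
# Toy basic-graph-pattern matcher (illustration only, not a SPARQL engine).
# In a pattern, "?x" is a variable and "_:x" is a SPARQL blank node.
# In the data, a term starting with "_:" stands for an RDF blank node.

def is_placeholder(term):
    """True for any pattern term that may be bound during matching."""
    return term.startswith("?") or term.startswith("_:")

def match(patterns, graph, binding=None):
    """Yield every binding under which all triple patterns occur in graph."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for pat_term, data_term in zip(first, triple):
            if is_placeholder(pat_term):
                # Bind on first use, then require the same value afterwards.
                if candidate.setdefault(pat_term, data_term) != data_term:
                    break
            elif pat_term != data_term:
                break
        else:
            yield from match(rest, graph, candidate)

def select(names, patterns, graph):
    """Project solutions onto the given names; "_:" labels are rejected."""
    if any(n.startswith("_:") for n in names):
        raise ValueError("blank nodes cannot appear in the SELECT list")
    rows = {tuple(b[n] for n in names) for b in match(patterns, graph)}
    return sorted(rows)

GRAPH = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", '"Bob"'),
    ("ex:alice", "ex:knows", "_:g1"),   # an RDF blank node in the data
    ("_:g1", "ex:name", '"Carol"'),
]

with_bnode = [("ex:alice", "ex:knows", "_:a"), ("_:a", "ex:name", "?n")]
with_var = [("ex:alice", "ex:knows", "?a"), ("?a", "ex:name", "?n")]

# Both formulations produce the same names.
assert select(["?n"], with_bnode, GRAPH) == select(["?n"], with_var, GRAPH)
```

In this sketch the SPARQL blank node _:a is bound once to the IRI ex:bob
and once to the RDF blank node _:g1, exactly as the variable ?a is. Asking
for the bound value requires rewriting with_bnode into with_var, since
select(["_:a"], with_bnode, GRAPH) raises an error.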

A possible reply is that the "SELECT *" only selects
the variables and not the blank nodes, so the distinction has a meaning.
However, experience with SQL has shown that the wildcard asterisk in the
SELECT list was a poor language design choice, and I do not recommend it
for SPARQL.

This is not a criticism of the blank nodes of the form [ :p "v" ],
which correspond to the linguistically useful "that which" construction.
Perhaps the reply is that blank nodes of the form _:a or [] exist
to provide the translation for [ :p "v" ].  However, the rule for
translating these says that the implementation must create a unique blank
node, i.e., different from any that the user has already placed in the
query; it could just as well be worded to say that the implementation must
create a unique variable name, different from any the user has chosen.
The specification could also be written so that any variables created
by the implementation would not be visible to SELECT *, in the unfortunate
event that you keep that notation.
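
The alternative wording suggested above can be sketched as follows. This
is only an illustration in Python under my own assumptions: the helper
names fresh_variable, expand_anonymous and select_star, and the "?_anon"
naming scheme, are hypothetical and do not come from the specification.
The sketch translates [ :p "v" ] into triple patterns about a generated
variable that differs from every user-chosen name, and keeps generated
variables out of SELECT *.

```python
import itertools

def fresh_variable(used_names):
    """Return a variable name that differs from every name in used_names."""
    for i in itertools.count():
        name = f"?_anon{i}"          # hypothetical naming scheme
        if name not in used_names:
            return name

def expand_anonymous(properties, used_names, hidden):
    """Rewrite [ p1 o1 ; p2 o2 ] into triples about one fresh variable.

    properties: list of (predicate, object) pairs inside the brackets.
    used_names: set of variable names already in the query (updated in place).
    hidden:     set of generated names to exclude from SELECT * (updated).
    """
    var = fresh_variable(used_names)
    used_names.add(var)
    hidden.add(var)
    return var, [(var, p, o) for p, o in properties]

def select_star(all_variables, hidden):
    """Variables that SELECT * would return: user-written ones only."""
    return sorted(v for v in all_variables if v not in hidden)

# The user has already written ?x and, awkwardly, ?_anon0.
used = {"?x", "?_anon0"}
hidden = set()
var, triples = expand_anonymous([(":p", '"v"')], used, hidden)
```

Here var is a name the user has not chosen, triples holds the single
pattern about it, and select_star(used, hidden) returns only the two
variables the user wrote.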

My preference would be to eliminate SPARQL blank nodes
from the language as unnecessary and liable to cause confusion among
users.  If you don't accept that, my next proposal would be to come
up with some other term for these gadgets (though I think that the use
of similar notation for RDF blank nodes and SPARQL blank nodes will
cause confusion even if you change the term).  My fallback position
would be that the entire document should
be scanned to replace every occurrence of "blank node" with either
"RDF blank node" or "SPARQL blank node". 

As an example of the possible confusion caused by SPARQL blank nodes,
consider the arbitrary ordering in 10.1.3 "ORDER BY",
which sorts blank nodes second, after unassigned
variables and before IRIs.  The term "blank node" is ambiguous,
meaning either an RDF blank node or a SPARQL blank node.  In this context
you must mean an RDF blank node, since a SPARQL blank node is a
piece of syntax and not a value. 
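
A small sketch of such an ordering, in Python, shows why only the RDF
reading makes sense: the sort operates on solution values, and a SPARQL
blank node label never appears among them. The names term_kind, RANK and
order_key are invented for this illustration, values are modeled as plain
strings with None for an unassigned variable, and placing literals last
is an assumption of the sketch rather than something stated above.

```python
def term_kind(value):
    """Classify a solution value; None stands for an unassigned variable."""
    if value is None:
        return "unbound"
    if value.startswith("_:"):
        return "rdf_blank_node"      # a value taken from the data graph
    if value.startswith('"'):
        return "literal"
    return "iri"

# Unassigned first, RDF blank nodes second, IRIs third, as in 10.1.3.
# Ranking literals last is an assumption of this sketch.
RANK = {"unbound": 0, "rdf_blank_node": 1, "iri": 2, "literal": 3}

def order_key(value):
    """Sort key: rank by kind, then by the string form within a kind."""
    return (RANK[term_kind(value)], value or "")

values = ["ex:bob", None, '"Bob"', "_:g1"]
ordered = sorted(values, key=order_key)
```

Every element of values is something a variable was bound to; a query-side
label such as _:a has no place in the list, so it cannot be what the
ordering rule refers to.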

This is not a criticism of blank nodes in CONSTRUCT templates, where
there actually is a connection between the SPARQL blank nodes and
RDF blank nodes.  Blank nodes in the CONSTRUCT template should be
retained.

Fred Zemke
Received on Thursday, 12 January 2006 21:38:39 UTC