Determining whether '<' begins an IRI or is the 'less than' operator

I am not sure how a scanner for SPARQL should determine whether a '<'
character it encounters is the beginning of an IRI or a comparison
operator.

Consider these queries:

SELECT * WHERE { ?a ?b ?c, ?d . FILTER(?a<?b && ?c>?d) }
SELECT * WHERE { ?a ?b ?c, ?d . FILTER(?a<?b&&?c>?d) }

The Yacker validator results look troubling to me:
http://www.w3.org/2005/01/yacker/uploads/SPARQL?markup=html&lang=perl&text=SELECT+*+WHERE+%7B+%3Fa+%3Fb+%3Fc%2C+%3Fd+.+FILTER%28%3Fa%3C%3Fb+%26%26+%3Fc%3E%3Fd%29+%7D&action=validate+text
http://www.w3.org/2005/01/yacker/uploads/SPARQL?markup=html&lang=perl&text=SELECT+*+WHERE+%7B+%3Fa+%3Fb+%3Fc%2C+%3Fd+.+FILTER%28%3Fa%3C%3Fb%26%26%3Fc%3E%3Fd%29+%7D%0D%0A&action=validate+text

The first query validates; the second does not.
My guess is that the validator uses some flex-like scanner that
prefers the longest token. In the first case "<?b && ?c>" can't be
scanned as an IRI because of the spaces, so the scanner falls back and
the 'less than' rule is picked.
On the other hand, "<?b&&?c>" is a valid IRI according to the grammar.
But 'variable IRI variable' is not a valid FILTER condition, so
the parser rejects the query.
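
To make the guess above concrete, here is a minimal sketch of a
longest-match scanner. It is not the validator's actual code. The rule
names, the tokenize() helper and the simplified VAR pattern are my own
assumptions; only the IRI_REF character class is meant to follow the
grammar (no '<', '>', '"', '{', '}', '|', '^', '`', '\' or characters
up to and including space).

```python
import re

# Hypothetical, simplified token rules for illustration only.
TOKEN_RULES = [
    ("IRI_REF", r'<[^<>"{}|^`\\\x00-\x20]*>'),
    ("VAR", r'[?$][A-Za-z_][A-Za-z0-9_]*'),  # simplified variable name
    ("AND", r'&&'),
    ("LT", r'<'),
    ("GT", r'>'),
    ("WS", r'\s+'),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_RULES]


def tokenize(text):
    """Scan text, always taking the rule with the longest match (flex-style)."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_match = None, None
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            # Keep the longest match; on a tie the earlier rule wins.
            if match and (best_match is None or match.end() > best_match.end()):
                best_name, best_match = name, match
        if best_match is None:
            raise ValueError(f"cannot scan input at offset {pos}")
        if best_name != "WS":  # whitespace is discarded
            tokens.append((best_name, best_match.group()))
        pos = best_match.end()
    return tokens


spaced = tokenize("?a<?b && ?c>?d")   # '<' and '>' come out as LT and GT
compact = tokenize("?a<?b&&?c>?d")    # "<?b&&?c>" comes out as one IRI_REF
```

With these rules the spaced condition yields VAR LT VAR AND VAR GT VAR,
while the compact one yields VAR IRI_REF VAR, which is the sequence the
parser then has to reject.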

The problem is more obvious for scanners with one-character
look-ahead, because they are completely unable to distinguish these
two cases.
They also have the same problem with the () and [] tokens (the NIL and
ANON terminals), but that can easily be solved by going from LL(1) to
LL(2).
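
One possible reading of the LL(2) remark, as a rough sketch: the scanner
emits '(' ')' '[' ']' as separate single-character tokens, and the next
stage looks two tokens ahead to recognise NIL and ANON. The function
name fold_empty_pairs() is hypothetical, and the sketch assumes that
whitespace between the brackets has already been discarded by the
scanner.

```python
# Two adjacent bracket tokens that together form a single terminal.
EMPTY_PAIRS = {("(", ")"): "NIL", ("[", "]"): "ANON"}


def fold_empty_pairs(tokens):
    """Replace '(' ')' with NIL and '[' ']' with ANON using two tokens of look-ahead."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])  # current token plus one token of look-ahead
        if pair in EMPTY_PAIRS:
            out.append(EMPTY_PAIRS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


folded = fold_empty_pairs(["[", "]", "?p", "(", ")", ".", "(", "?x", ")"])
```

Here the first two pairs are folded into ANON and NIL, while the
non-empty "( ?x )" is left as three separate tokens. The same trick does
not help with '<', because the deciding character can be arbitrarily far
away.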

Jiri Dokulil

Received on Friday, 18 August 2006 16:10:56 UTC