Re: Increased lookahead requirements in the Turtle draft

* David Robillard <d@drobilla.net> [2013-02-17 17:43-0500]
> Hi,
> 
> I recently got a bug report from a user who's encountered dots in
> prefixed names in "Turtle" found in the wild which my parser does not
> yet support.  So, I looked at the draft towards implementing this.
> 
> Unfortunately it looks like a can of worms for a simple
> recursive-descent parser.  The previous specification could be
> implemented with 1 character of lookahead, but I don't think this one
> can.
> 
> Since a PrefixedName can contain a dot, while reading a PrefixedName if
> the next character is a dot, it is ambiguous whether or not the dot is
> part of the PrefixedName or the end of a statement.  To determine this,
> you need to check whether or not the next-next character is a valid
> PrefixedName character, and until this is known, neither the dot nor the
> next character can be 'eaten'.
> 
> The significance is that *1* character of "lookahead" isn't really
> lookahead, you just need a peek().  Anything greater requires some kind
> of real lookahead implementation, or at least some crafty case-specific
> kludges to get around it.
> 
> This is not necessarily a spec problem, and two character lookahead is
> not an onerous requirement in general, but compared to 1 it is.  I just
> thought it was worth mentioning that there is a considerable new
> implementation requirement here.  I will have to pay a price in
> throughput for this as well.
> 
> It's clear, though, that dots in prefixed names are desirable.  Ideally,
> tokens, including the delimiters (i.e. '.' and ';'), would be whitespace
> delimited, so reading a PrefixedName would simply stop when whitespace
> is encountered and this problem would not exist.  Perhaps not realistic
> given existing practice, but it would certainly be nice.

Many apologies for the response time on this. Thank you for your
comment and I hope you enjoyed implementing Serd. As to the suggestion
that tokens, including '.' and ';', be whitespace-delimited, the RDF
Working Group believes that introducing a requirement for whitespace
before '.' and ';' will break a considerable fraction of the deployed
Turtle and introduce unfortunate incompatibilities with SPARQL and
Notation3. If you are content with this resolution, please reply with
"[RESOLVED]" at the beginning of the Subject: line.
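
For readers following the lookahead point in the quoted message, here is a
minimal sketch of the two-character check it describes. It is an
illustration only, not Serd's code: the Reader class, is_name_char and
read_local_name are hypothetical names, and is_name_char is a deliberately
simplified stand-in for the name-character rules in the Turtle grammar.

```python
class Reader:
    """Hypothetical character reader that can peek beyond the next character."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self, offset=0):
        # Return the character at pos + offset without consuming it,
        # or "" when that position is past the end of input.
        i = self.pos + offset
        return self.text[i] if i < len(self.text) else ""

    def eat(self):
        # Consume and return the next character.
        ch = self.peek()
        self.pos += 1
        return ch


def is_name_char(ch):
    # Simplified assumption: letters, digits, '_' and '-' only.
    # The real grammar allows more (for example ':' and escapes).
    return len(ch) == 1 and (ch.isalnum() or ch in "_-")


def read_local_name(reader):
    """Read the local part of a prefixed name, allowing interior dots."""
    out = []
    while True:
        ch = reader.peek()
        if is_name_char(ch):
            out.append(reader.eat())
        elif ch == "." and is_name_char(reader.peek(1)):
            # The dot is part of the name only if a name character follows,
            # so the character after the dot must be inspected before the
            # dot itself can be consumed.
            out.append(reader.eat())
        else:
            # Either a non-name character, or a dot that ends the statement.
            break
    return "".join(out)


r = Reader("foo.bar .")
name = read_local_name(r)   # the interior dot is kept in the name

r2 = Reader("foo. ")
name2 = read_local_name(r2)  # the trailing dot is left unconsumed
```

The peek(1) call is the second character of lookahead the message refers
to. This sketch does not handle runs of consecutive dots inside a name,
which would need the check to look further ahead still.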


> Cheers,
> 
> -dr



-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.

Received on Saturday, 2 November 2013 12:34:37 UTC