Rewrite for Turtle rule 7

ACTION-175
== Summary

predicateObjectList ::= verb objectList (";" (predicateObjectList)?)?



== The problem

We want to accept

S P O .
S P O ; P O .
S P O ; .
S P O ; P O ; .

but not
S P O ; ; P O .
S P O ; ;  .
S P O ; P O ; ;  .


== The details

The rule in the Turtle draft, and in the submission, is:

[7]
predicateObjectList ::= verb objectList ( ';' verb objectList )* ( ';')?

For a parser that works simply from left-to-right, looking no more than 
one token ahead, it reads "verb objectList" then sees (lookahead of one) 
a ";".  What does it do?

Does it go into
   ( ';' verb objectList )*
and process "verb objectList"
or skip this because * can be zero occurrences, and go on to:
   ( ';')?
and exit the rule?

The next token is going to be "." in legal Turtle but that is 2 tokens 
of lookahead, ";" and ".".


== A solution

predicateObjectList ::= verb objectList (";" (predicateObjectList)?)?

which is recursive on predicateObjectList.

After the first "verb objectList", if the parser sees ";", then it 
enters the outer ()?

";" (predicateObjectList)?

consuming one ";" and tries again.

A predicateObjectList can't be empty, (predicateObjectList)? can be.

The next token is going to be "." in legal Turtle.

The recursion can be unrolled into a loop.  javacc is an LL(1) does not 
do this - it generates a recursive call and on a very large file can run 
out of stackspace.  There is a test in the turtle test suite for this 
test-16.ttl.


== SPARQL

SPARQL does not impose the one trailing semi-colon.  Recursion was not 
popular.

PropertyListNotEmpty  ::=  Verb ObjectList ( ';' ( Verb ObjectList )? )*

which allows

S P O ;;;; .

After the first  Verb ObjectList, if it sees a ";" it goes into the ()* 
and then it may allow another Verb ObjectList.  It loops back if it see 
a ";" next.

It is not recursive -- it's a loop and compiler generators do generate a 
loop so no problems at scale.

It (SPARQLs approach) is different from deployed Turtle.  I personally 
don't see this as an important issue as there other changes like local 
part of prefix names, specifically leading digit, which are far more 
likely in new data sent to old parser.


== Experiment

Attached is a JavaCC grammar file (LL(1), the default for javacc) for a 
little language covering tokens that are S<number> P<number> O<number> 
so you can see what is accepted where.

 T := ( <S> POL "." )* <EOF>
 VO := <P> <O>
 POL := VO ( ";" ( POL )? )?

TOKENs
  P:= "P" ["0"-"9"]
  S:= "S" ["0"-"9"]
  O:= "O" ["0"-"9"]

so VO is a verbObjectList and POL is predicateObjectList

== For Eric

The generated code for POL is: my comments using ##

   final public void POL() throws ParseException {
     ## POL := VO ( ";" ( POL )? )?
     ## Debug mode - pritns what it is doing.
     trace_call("POL");
     try {
       ## unconditionally do VO
       VO();
       ## Look for ";" or "."
       switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
       case 2:
         ## the case of ";"
         ## Enter ( ";" ( POL )? )?
         jj_consume_token(2);
         switch ((jj_ntk==-1)?jj_ntk():jj_ntk) {
         case P:
           ## It's a P so it must be POL from (POL)?
           POL();
           break;
         default:
           ## Not a P - end (POL)?
           jj_la1[1] = jj_gen;
           ;
         }
         break;
       default:
         ## Not a ";" do not enter ( ";" ... )?
         jj_la1[2] = jj_gen;
         ;
       }
     } finally {
       ## Debug.
       trace_return("POL");
     }
   }

I can send the entire parser ...

Received on Wednesday, 20 June 2012 17:50:39 UTC