Re: Please make sure the grammar is directly machine consumable.

Richard Newman wrote:
> 
> 
> On 19 Aug 2005, at 04:12, Tim Berners-Lee wrote:
> 
>> Richard,
>>
>> I didn't realize the grammar in the spec is machine-generated.
>> Maybe it should be hand-edited and everything else
>> generated from it.
> 
> 
> I think that would be a good idea from one point of view (mine and  
> yours, certainly!), but we'd have to see what the current maintainers  
> of the SPARQL grammar think.
> 
>> Yosi (on vacation right now) has generated (with a small hand tweak)
>> the CFG grammar in RDF from the spec.   (See sparql* in
>> http://www.w3.org/2000/10/swap/grammar/
>> )  This is in plain BNF (  cfg:mustBeOneSequence properties
>> with nested RDF collections )
>>
>> See the bnf.n3 ontology in that directory as well as
>> the bnf-rules.n3 which go from some forms of ebnf to bnf,
>> also in that directory.
> 
> 
> Very handy (and pretty cool!). As it seems the tools are in place, it  
> would be nice to have a machine-readable 'spec' grammar that could be  
> re-purposed into presentation EBNF, JavaCC, plain BNF, etc. -- this  
> would certainly save me a lot of work whenever the grammar changes!
> 
> It is also nice, in an "eating one's own dog food" way, to have the  
> grammar itself in RDF.
> 
> -R
> 

This is not a response to the comment - just a description of some details 
in case it helps.

The grammar is written in JavaCC, which is an LL parser generator but 
also provides tools for lookahead checking.  JavaCC can also emit the 
grammar in a text output format.

The JavaCC text output is converted to the HTML for the document by a 
script, although the tokens have to be described manually.  The script 
converts JavaCC syntax to the EBNF notation described in
http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-notation.

The grammar in JavaCC is not quite LL(1): there is a lookahead of 2 at 
the Triples production, related to the optional dots Richard commented on. 
The document grammar is also fed into yacker (a W3C tool), which checks 
that it converts to bison/flex (LALR(1)).
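
To illustrate why an optional trailing dot needs a second token of 
lookahead, here is a minimal sketch in Python.  The rule and the token 
names ('T' for a whole triple, '.', '}') are made up for illustration 
and are not the actual SPARQL production.

```python
# Hypothetical, simplified rule (an assumption, not the real SPARQL grammar):
#   Triples ::= Triple ( '.' Triple )* '.'?
# 'T' stands for an already-recognised triple, '.' is the separator.

def parse_triples(tokens, pos=0):
    """Return (number_of_triples, next_position); raise SyntaxError on bad input."""

    def peek(k):
        # Look k tokens ahead of the current position without consuming.
        i = pos + k
        return tokens[i] if i < len(tokens) else None

    if peek(0) != 'T':
        raise SyntaxError("expected a triple at position %d" % pos)
    pos += 1
    count = 1

    # With one token of lookahead the parser only sees '.', which could
    # either start another "'.' Triple" iteration or be the optional
    # trailing dot.  The second token decides which.
    while peek(0) == '.' and peek(1) == 'T':
        pos += 2
        count += 1

    # Optional trailing dot.
    if peek(0) == '.':
        pos += 1

    return count, pos
```

For the token list ['T', '.', 'T', '.', '}'] this consumes two triples 
and the trailing dot, and stops in front of '}'.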

There are trade-offs between human readability and machine processability 
in the current grammar.  Some people find that the weighting towards a 
machine-processable grammar makes the grammar unclear (e.g. the use of 
recursive rules rather than repetition).
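
As a rough sketch of the recursion-versus-repetition point, the same 
language can be written both ways.  The rule below is hypothetical (it 
is not taken from the SPARQL grammar), and the Python recognisers are 
only meant to show that the two forms accept the same input.

```python
# Hypothetical rule, written two ways (names are assumptions):
#   repetition:  List ::= Item ( ',' Item )*
#   recursion:   List ::= Item Tail
#                Tail ::= ',' Item Tail | (empty)

def list_repetition(tokens):
    """Recognise List using a loop for the ( ',' Item )* part."""
    pos = 0
    if pos >= len(tokens) or tokens[pos] != 'Item':
        return False
    pos += 1
    while pos < len(tokens) and tokens[pos] == ',':
        pos += 1
        if pos >= len(tokens) or tokens[pos] != 'Item':
            return False
        pos += 1
    return pos == len(tokens)


def list_recursive(tokens):
    """Recognise List using the recursive Tail rule."""

    def tail(pos):
        # Tail ::= ',' Item Tail
        if pos < len(tokens) and tokens[pos] == ',':
            if pos + 1 < len(tokens) and tokens[pos + 1] == 'Item':
                return tail(pos + 2)
            return None  # a ',' must be followed by an Item
        # Tail ::= (empty)
        return pos

    if not tokens or tokens[0] != 'Item':
        return False
    return tail(1) == len(tokens)
```

The repetition form reads closer to how people describe a list, while 
the recursive form maps directly onto plain BNF.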

	Andy

Received on Sunday, 21 August 2005 16:39:06 UTC