Notation3: The Great QName Survey

Summary: This is a survey of the disagreement between the
implementations of the Notation3 grammar, using the QName production
as an example.

The following are relevant pieces excerpted from 8/9 Notation3
grammars and parsers. The /DesignIssues/Notation3 BNF is usually taken
as definitive (but is broken). A conclusion and recommendation on the
actual QName production follows the various lists.

/DesignIssues/Notation3 [1]
   alpha = [A-Za-z]
   alphanumeric = [A-Za-z0-9_]
   prefix = ( alpha alphanumeric* ) | '_'
   localname = alpha alphanumeric*
   qname = prefix ":" localname

/2000/10/swap/notation3.py [2]
   _namechars = [a-z] + [A-Z] + [0-9] + '_-'
   qname() => _namechars* ':' _namechars* # v 1.87+ 2001/08/23

/2000/10/swap/rdfn3.g [3]
   PREFIX: r'[a-zA-Z0-9_-]*:'
   QNAME: r'([a-zA-Z][a-zA-Z0-9_-]*)?:[a-zA-Z0-9_-]+'
   EXVAR: r'_:[a-zA-Z0-9_-]+'

/2000/10/n3/notation3.py [4]
   _namechars = [a-z] + [A-Z] + [0-9] + '_-'
   qname() => ( _namechars* ':' _namechars* ) | _namechars*

/2000/10/swap/n3spark.py [5]
   qname: r' [a-zA-Z0-9_-]*:[a-zA-Z0-9_-]* '

/2001/03/flaten3/lexer.l [6]
   wordchar ([_A-Za-z$!]|[0-9])
   {wordchar}*":"{wordchar}+

/cvsweb/~checkout~/2001/blindfold/sample/n3.bnf [7]
   alpha ::= [a-zA-Z];
   alphanumeric ::= alpha | [0-9] | "_";
   nprefix ::= "" | ((alpha | "_") alphanumeric*);
   localname ::= alpha alphanumeric*;
   qname ::= nprefix ":" localname;

RDF::Notation3/Notation3.pm [8]
   $tk =~ /^([_a-zA-Z]\w*)?:$/o
   $tk =~ /^([_a-zA-Z]\w*)?:[a-zA-Z]\w*$/o

eep/n3.py [9]
   Name = r'[A-Za-z0-9_]+'
   bNode = r'_:' + Name
   QName = r'[A-Za-z0-9]*:' + Name
   Prefix = r'[A-Za-z0-9]*:'

To summarize the various productions in a canonical format:-

/DesignIssues/Notation3
   prefix = [A-Za-z][A-Za-z0-9_]* | '_'
   name = [A-Za-z][A-Za-z0-9_]*

/2000/10/swap/notation3.py
   prefix = [A-Za-z0-9_-]*
   name = [A-Za-z0-9_-]*

/2000/10/swap/rdfn3.g
   prefix = [A-Za-z0-9_-]* | ([A-Za-z][A-Za-z0-9_-]*)? # ???
   name = [A-Za-z0-9_-]+

/2000/10/swap/n3spark.py
   prefix = [A-Za-z0-9_-]*
   name = [A-Za-z0-9_-]*

/2001/03/flaten3/lexer.l
   prefix = [A-Za-z0-9_$!]*
   name = [A-Za-z0-9_$!]+

/cvsweb/~checkout~/2001/blindfold/sample/n3.bnf
   prefix = '' | [A-Za-z_][A-Za-z0-9_]*
   name = [A-Za-z][A-Za-z0-9_]*

RDF::Notation3/Notation3.pm
   prefix = ([A-Za-z_]\w*)?
   name = [A-Za-z]\w*

eep/n3.py
   prefix = [A-Za-z0-9]* | '_'
   name = [A-Za-z0-9_]+

For comparison:-

Prefixes
   DesignIssues = [A-Za-z][A-Za-z0-9_]* | '_'
   notation3.py = [A-Za-z0-9_-]*
   rdfn3.g      = [A-Za-z0-9_-]* | ([A-Za-z][A-Za-z0-9_-]*)? # ???
   n3spark.py   = [A-Za-z0-9_-]*
   lexer.l      = [A-Za-z0-9_$!]*
   n3.bnf       = [A-Za-z_][A-Za-z0-9_]* | ''
   Notation3.pm = ([A-Za-z_]\w*)?
   Eep n3.py    = [A-Za-z0-9]* | '_'

Names
   DesignIssues = [A-Za-z][A-Za-z0-9_]*
   notation3.py = [A-Za-z0-9_-]*
   rdfn3.g      = [A-Za-z0-9_-]+
   n3spark.py   = [A-Za-z0-9_-]*
   lexer.l      = [A-Za-z0-9_$!]+
   n3.bnf       = [A-Za-z][A-Za-z0-9_]*
   Notation3.pm = [A-Za-z]\w*
   Eep n3.py    = [A-Za-z0-9_]+
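
The lists above can be checked mechanically. What follows is only a
rough sketch, not code taken from any of the implementations: it
transcribes each summarized production into a Python regular
expression and reports which ones accept a given QName. The names
PRODUCTIONS, accepts and who_accepts are made up for illustration, \w
is assumed to mean ASCII [A-Za-z0-9_], and for rdfn3.g only the QNAME
token is used.

```python
import re

# (prefix, name) pairs transcribed from the comparison lists.
# Assumptions: \w is read as ASCII [A-Za-z0-9_]; rdfn3.g uses its QNAME token.
PRODUCTIONS = {
    "DesignIssues": (r"[A-Za-z][A-Za-z0-9_]*|_", r"[A-Za-z][A-Za-z0-9_]*"),
    "notation3.py": (r"[A-Za-z0-9_-]*", r"[A-Za-z0-9_-]*"),
    "rdfn3.g": (r"([A-Za-z][A-Za-z0-9_-]*)?", r"[A-Za-z0-9_-]+"),
    "n3spark.py": (r"[A-Za-z0-9_-]*", r"[A-Za-z0-9_-]*"),
    "lexer.l": (r"[A-Za-z0-9_$!]*", r"[A-Za-z0-9_$!]+"),
    "n3.bnf": (r"([A-Za-z_][A-Za-z0-9_]*)?", r"[A-Za-z][A-Za-z0-9_]*"),
    "Notation3.pm": (r"([A-Za-z_][A-Za-z0-9_]*)?", r"[A-Za-z][A-Za-z0-9_]*"),
    "Eep n3.py": (r"[A-Za-z0-9]*|_", r"[A-Za-z0-9_]+"),
}


def accepts(impl, qname):
    """True if the summarized production for impl matches the whole QName."""
    prefix, name = PRODUCTIONS[impl]
    pattern = r"(?:%s):(?:%s)" % (prefix, name)
    return re.fullmatch(pattern, qname) is not None


def who_accepts(qname):
    """List the implementations whose production accepts qname."""
    return [impl for impl in PRODUCTIONS if accepts(impl, qname)]


if __name__ == "__main__":
    for sample in ("rdf:type", ":a", "a:", "a-b:c", "a:_b"):
        print("%-10s %s" % (sample, ", ".join(who_accepts(sample))))
```

Running it over a handful of edge cases (empty prefix, empty name,
hyphen, leading underscore) is a quick way to see where the
productions part company.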

It is interesting that only notation3.py and n3spark.py agree on the
QName production. As already mentioned, the DesignIssues BNF is
slightly ambiguous in that it lists "alpha", "alphanumeric*" and "_" as
the prefixes, which doesn't make sense, and even disallows the "void"
(empty) prefix. The rdfn3.g grammar is broken since it allows (for
example) "_0" to be declared as a prefix, but not used in a QName. The
Eep n3.py was my initial interpretation of the production, and I will
change it to the one that I recommend below.
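
To illustrate the rdfn3.g problem, here is a small sketch that applies
the PREFIX and QNAME token patterns quoted from [3] to the same
prefix. The helper names declarable and usable are hypothetical, and
using re.fullmatch is only an approximation of how the generated
scanner would actually apply the tokens.

```python
import re

# Token patterns as excerpted from rdfn3.g [3].
PREFIX = re.compile(r'[a-zA-Z0-9_-]*:')
QNAME = re.compile(r'([a-zA-Z][a-zA-Z0-9_-]*)?:[a-zA-Z0-9_-]+')


def declarable(prefix):
    """Could this prefix appear in a prefix declaration (PREFIX token)?"""
    return PREFIX.fullmatch(prefix + ":") is not None


def usable(prefix, name="x"):
    """Could this prefix then be used in a QName (QNAME token)?"""
    return QNAME.fullmatch(prefix + ":" + name) is not None


if __name__ == "__main__":
    for p in ("dc", "_0", "0a"):
        print(p, "declarable:", declarable(p), "usable:", usable(p))
```

Any prefix that starts with a digit, "_" or "-" falls into the same
gap: it can be declared but never used.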

In general, it is better to "be conservative in what you write, and
liberal in what you accept" (to paraphrase Tim), so Notation3 parsers
should probably use the most liberal of the productions above, and
Notation3 writers (including humans) the most conservative. However,
it would be nice if everyone could agree on a production.
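
As a sketch of what that might look like in practice (my reading, not
anything the implementations do): the reader below accepts the union
of the character sets listed above, and the writer restricts itself to
roughly the intersection of the productions, leaving the "_" bNode
prefix aside. LIBERAL, CONSERVATIVE, parse_qname and write_qname are
names invented for this example.

```python
import re

# Assumption: "most liberal" = union of every character set listed above,
# with empty prefix and empty name both allowed.
LIBERAL = re.compile(r"[A-Za-z0-9_$!-]*:[A-Za-z0-9_$!-]*")

# Assumption: "most conservative" = what every listed production accepts:
# a non-empty prefix of letters and digits starting with a letter, and a
# name starting with a letter.
CONSERVATIVE = re.compile(r"[A-Za-z][A-Za-z0-9]*:[A-Za-z][A-Za-z0-9_]*")


def parse_qname(token):
    """Liberal reader: split any token some implementation would accept."""
    if LIBERAL.fullmatch(token) is None:
        raise ValueError("not a QName: %r" % token)
    prefix, _, name = token.partition(":")
    return prefix, name


def write_qname(prefix, name):
    """Conservative writer: refuse to emit anything a parser might reject."""
    token = prefix + ":" + name
    if CONSERVATIVE.fullmatch(token) is None:
        raise ValueError("unsafe to write: %r" % token)
    return token


if __name__ == "__main__":
    print(parse_qname("a-b:"))
    print(write_qname("dc", "title"))
```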

One implementation question is whether or not "_" as the bNode prefix
should be overridable. CWM allows one to do this, but I think that
this is confusing for people who are trying to learn Notation3, and
not all that difficult to ban in a parser. The recommendation below is
based upon all of the productions above.
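
Banning the override might look something like the following sketch.
PrefixTable is a hypothetical class, not part of CWM or of any parser
listed here; the point is only that the check is a single comparison
at the place where prefix declarations are processed.

```python
class PrefixTable:
    """Hypothetical prefix-to-namespace table for a Notation3 parser."""

    RESERVED = "_"  # the bNode prefix, which may not be redeclared

    def __init__(self):
        self._bindings = {}

    def bind(self, prefix, uri):
        """Record a prefix declaration, refusing to rebind the bNode prefix."""
        if prefix == self.RESERVED:
            raise ValueError('"_" is reserved for bNodes and cannot be bound')
        self._bindings[prefix] = uri

    def resolve(self, prefix):
        """Return the namespace URI bound to prefix (KeyError if unbound)."""
        return self._bindings[prefix]


if __name__ == "__main__":
    table = PrefixTable()
    table.bind("", "http://purl.org/net/swn#")
    print(table.resolve(""))
```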

Recommendation
   prefix = ([A-Za-z][A-Za-z0-9_]*)? | '_'
   name = [A-Za-z0-9_]+

Notes on the recommendation: The hyphen-minus "-" character is
disallowed since the DesignIssues note excludes it from its grammar,
reserving the character for future use. I have allowed "_" as the
first character of a name since I have seen this used in various
Notation3 files already - notwithstanding the fact that [1], [7], and
[8] disallow it.
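
For anyone who wants to try the recommendation out, here it is as a
Python regular expression. This is a minimal sketch: the names
RECOMMENDED and split_qname are made up for the example, and matching
the whole token with re.fullmatch is an assumption about how a
tokenizer would hand over candidate QNames.

```python
import re

# The recommended production, transcribed directly:
#    prefix = ([A-Za-z][A-Za-z0-9_]*)? | '_'
#    name = [A-Za-z0-9_]+
RECOMMENDED = re.compile(
    r"(?P<prefix>(?:[A-Za-z][A-Za-z0-9_]*)?|_)"
    r":"
    r"(?P<name>[A-Za-z0-9_]+)"
)


def split_qname(token):
    """Return (prefix, name) if token is a QName under the recommendation,
    otherwise None."""
    m = RECOMMENDED.fullmatch(token)
    if m is None:
        return None
    return m.group("prefix"), m.group("name")


if __name__ == "__main__":
    for token in (":Sean", "_:b1", "rdf:type", "a:_b", "a-b:c", "a:", "_x:y"):
        print("%-10s %r" % (token, split_qname(token)))
```

Note that "_x:y" is rejected: "_" is accepted as a prefix only on its
own, as the bNode prefix.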

Todo: now repeat for all of the Notation3 productions :-)

[1] http://www.w3.org/DesignIssues/Notation3
[2] http://www.w3.org/2000/10/swap/notation3.py
[3] http://www.w3.org/2000/10/swap/rdfn3.g
[4] http://www.w3.org/2000/10/n3/notation3.py
[5] http://www.w3.org/2000/10/swap/n3spark.py
[6] http://www.w3.org/2001/03/flaten3/lexer.l
[7] http://dev.w3.org/cvsweb/~checkout~/2001/blindfold/sample/n3.bnf
[8] http://www.cpan.org/authors/id/P/PC/PCIMPRICH/RDF-Notation3-0.50.tar.gz
[9] eep.py

--
Kindest Regards,
Sean B. Palmer
@prefix : <http://purl.org/net/swn#> .
:Sean :homepage <http://purl.org/net/sbp/> .

Received on Sunday, 17 February 2002 12:59:54 UTC