Review of Turtle doc (part 1) from Andy Seaborne on 2012-03-30 (public-rdf-wg@w3.org from March 2012)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Fri, 30 Mar 2012 15:43:11 +0100
To: RDF-WG <public-rdf-wg@w3.org>
Message-ID: <4F75C67F.9020303@epimorphics.com>
ACTION-157

First part of a review of the Turtle document, up to the end of section 
5 (the Turtle grammar).

I'm getting this out now because it covers the structure and audience of 
the document and because it's already a bit long.

 Andy

== General

1/ Audience:

Who is the audience for the document?

   Data authors?
   Parser writers?

The doc, especially in the early sections, feels much more centered on 
parser writers but I'd like to see it being for data authors with parser 
writers' material pushed towards the back in a "theory" section.

2/ Turtle? N-Triples?

The document, especially in the earlier sections, talks a lot about 
N-Triples, e.g. "Introduction" has an N-Triples example but no Turtle.

It reads almost like a Turtle/N-Triples comparison at times with Turtle 
assumed background knowledge.

It would be better to define Turtle, then to discuss N-Triples if you 
want to frame N-Triples as a simple subset of Turtle.

The mixture, and comparing Turtle to N-Triples (section 1 and 2), is 
quite confusing.  (Alternatively, start with writing down triples, then 
introduce the Turtle syntax forms for better expression. But I suggest 
the "describe Turtle, then N-Triples" approach.)

3/ Grammars
    The two grammars are presented differently.
    The Turtle grammar is not up-to-date (e.g. the IRI naming changes)
    The Turtle grammar has some presentation issues.

    There is a reference to the BNF as definitive - but
    this doc is definitive and normative isn't it?

4/ Resolve all design issues.

There are various notes (not just note boxes) that need to be removed 
for last call.

== Title

The title is "Turtle" but the document covers Turtle and N-Triples.

"Turtle and N-Triples: syntaxes for RDF."

"""
Turtle: syntax for humans
N-Triples: syntax for machines
"""

== Abstract

"This document defines a textual syntax ..."

It defines 2 syntaxes!

"Turtle provides levels of compatibility ..." leads this reader to expect 
a definition of "level" and a list of syntax items for each level. 
Reword/remove.

== SOTD

"There are still a few rough edges ..."

All rough edges about design must be removed for a LC.  As must this text.

"The working group does not expect to make any large changes to the 
existing syntax."

Debatable and not an item for LC.

== 1 Introduction

* Remove "which in turn borrows ...."

The history lesson is a distraction in defining a spec.  Just say what 
Turtle is, not frame it relative to other things.

* para 2: remove the NT discussion.

* Remove NT example.

* Put in an example of realistic Turtle (not all syntax forms).  There 
isn't one until much later.

* As there is no tutorial section, the introduction needs to give the 
reader a brief overview.

* Add a document outline to explain the document structure.

== 2 RDF Terms in Turtle and N-Triples.

The TTL/NT thing again.

* "three types of RDF terms" -- from a syntax point of view there are 
more: IRIs, prefixed names, literals, labelled bnodes, []-bnodes.  This is 
what 2.1.1 is about - two ways to write IRIs.

The doc is moving between abstract data model and concrete syntax.

* Linking to, and using the names of, grammar productions is confusing.
   Very early to be thinking about the parser writer as the reader here.

== 2.1.1 Relative IRIs and Prefixed Names in Turtle.

This is two different sections - have one section on prefixed names and 
one on IRI resolution.

The second sentence is the definition. "A Prefixed Name is ..."  Put 
this first.

Not sure that "a prefixed name is mapped to an IRI" is clear - it's a 
process of mapping the prefix part and concatenating the local part.
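
To make the two-step process concrete, here is a minimal sketch in Python.
The function name, the prefix table and the example namespaces are made up
for illustration, and escapes in the local part are ignored:

```python
def expand_prefixed_name(pname, prefixes):
    """Map the prefix part to its namespace IRI, then append the local part."""
    prefix, sep, local = pname.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {pname!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    # Step 1: look up the prefix.  Step 2: concatenate the local part.
    return prefixes[prefix] + local


# Hypothetical prefix declarations; "" stands for the empty prefix.
prefixes = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "": "http://example.org/ns#",
}

name_iri = expand_prefixed_name("foaf:name", prefixes)
empty_prefix_iri = expand_prefixed_name(":a", prefixes)
```

Only the prefix is "mapped"; the local part is carried over unchanged.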

"Note" - remove or at least put much later.

All this diving into escape sequences and differences from XML comes 
before it's clear what a prefixed name is.

"Relative IRIs are combined .. " use "resolved" not "combined" which 
hints at concatenation.
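
A small Python illustration of why the word matters, using the standard
library's urljoin as a stand-in for IRI resolution (the base and relative
values are invented for the example):

```python
from urllib.parse import urljoin

base = "http://example.org/dir/doc.ttl"   # hypothetical base IRI
relative = "other#frag"                   # hypothetical relative IRI

# Resolution replaces the last path segment of the base.
resolved = urljoin(base, relative)

# Naive concatenation just glues the two strings together.
concatenated = base + relative

# Dot segments are also handled by resolution, not by concatenation.
parent = urljoin(base, "../x")
```

Here `resolved` is "http://example.org/dir/other#frag" while `concatenated`
is "http://example.org/dir/doc.ttlother#frag", which is not what an author
would expect.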

Example at end of 2.1.1 has no DOTs after directives.

Example has "prefixed IRIs" -- not defined terminology.

== 2.1.2 etc

The summary tables in 2.1.2, 2.2.3, 2.3.2 didn't help me.  Too much 
emphasis on the differences, not on the concepts.  Move all these to a 
Turtle vs N-Triples section later in the doc.

Keep examples to normal cases, the foo:bar\=baz example is stressing 
detail that draws the reader into escaping.

== 2.2 RDF Literals

"Literals in N-Triples ..." -- what about Turtle?

s/langugae/language/

"processing the escape sequences"
==>
"processing any escape sequences"

Discussion of rdf:langString in first para -- too detailed, mixes syntax 
and data model.  At this point the reader is barely introduced to a 
simple "foo".

== 2.2.1 Alternative Lexical Representations in Turtle

Change to "Different String Representations in Turtle"

The lexical part is the thing in the abstract data model.  It is the 
same for the same literal.

Example needs @prefixes -- IMHO it is better when examples, especially 
early ones, are syntactically complete.

== 2.2.2 Abbreviating Common Datatypes in Turtle

We are not abbreviating the datatypes!

"""Representing Numbers in Turtle

Numbers can be written with lexical form and datatype (example) but 
Turtle has special syntax for numbers.
"""

Show full and abbreviated forms side-by-side.

Example uses ";" before described.

Example has no @prefix my: ...

"IV"^^my:romanNumeral detracts from the section on special syntax for 
numbers.

== 2.3. RDF Blank Nodes

"Issue" must go for LC.

Again: "RDF Blank Nodes in N-Triples ..."

== 2.3.1

"the production blankNodePropertyList"

If you want to structure the document by referring to productions, at 
least copy the productions into the section discussing them so the 
reader can see the key area of the grammar at the time of reading the text.

== 3 Predicate Object Lists

This is two sections.

I suggest a section on ";" and then a section on ",".
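
The two forms are separate ideas: ";" repeats the subject, "," repeats the
subject and the predicate.  A rough Python sketch of the expansion, with an
invented data layout (a subject plus a list of predicate/objects pairs)
standing in for parsed Turtle:

```python
def expand(subject, predicate_objects):
    """Expand one subject with its predicate-object lists into triples.

    Each entry of `predicate_objects` is (predicate, [objects]):
    a new entry corresponds to ";", a further object to ",".
    """
    return [
        (subject, predicate, obj)
        for predicate, objects in predicate_objects
        for obj in objects
    ]


# Hypothetical input, roughly  :s :p1 :a , :b ; :p2 :c .
triples = expand(":s", [(":p1", [":a", ":b"]), (":p2", [":c"])])
```

This yields three triples, all with the same subject, two of them sharing
the predicate `:p1`.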

Driving the structure of the document from the structure of the grammar 
is a big constraint. It does not help the data author audience.

First example: the output of parsing is absolute IRIs.  You said earlier 
that IRIs are resolved as encountered.

Second example is wrong: ":subject :predicate :object" needs to be IRIs.

"Corresponding N-Triples"

Reads as if Turtle parses to N-Triples.

== 4 Collections in Turtle

Start with a discussion of what RDF collections are, of the predicates 
and of rdf:nil.

Need example with () and rdf:rest/rdf:first forms.
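
As a rough illustration of how the two forms relate, this Python sketch
turns a list of items into the rdf:first/rdf:rest triples that the ()
form stands for.  The function name and the blank node labels are invented;
a real parser would generate its own blank nodes:

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def collection_to_triples(items):
    """Return (head, triples) for an RDF collection holding `items`."""
    if not items:
        # The empty collection () is just rdf:nil, with no triples.
        return RDF + "nil", []
    nodes = [f"_:b{i}" for i in range(len(items))]  # illustrative labels
    triples = []
    for i, item in enumerate(items):
        # Each cell points at the next cell, the last one at rdf:nil.
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


head, triples = collection_to_triples(["1", "2"])
```

A two-item collection gives four triples, and the last rdf:rest points at
rdf:nil.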

== 5 Turtle Grammar

The end of line problem.

In the literal:

"""foo
bar"""

what are the characters for end-of-line?  NL, CRNL? Whatever the file has?
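
To show that the question is not academic, a small Python sketch: the same
long string read from a file with NL line endings and from one with CR NL
line endings.  The helper and the sample inputs are invented, and the code
takes no position on what the answer should be:

```python
def long_string_body(source):
    """Return the raw text between the first pair of triple quotes."""
    start = source.index('"""') + 3
    end = source.index('"""', start)
    return source[start:end]


# The same literal as it would appear in two differently saved files.
lf_body = long_string_body('x """foo\nbar"""')
crlf_body = long_string_body('x """foo\r\nbar"""')

# If line endings are taken "as the file has them", the two literals differ.
same_literal = lf_body == crlf_body
```

With no normalization the bodies have 7 and 8 characters respectively, so
the two files would produce different literals.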

== 5.1 White Space

"significant in tokens IRI_REF and string"

'string' is not a token.

== 5.3 Escape Sequences

The numeric escape table has no borders.

We should allow \U to cover the basic plane.

\U00000020 should be legal.
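
A minimal Python sketch of the behaviour being asked for, where both the
4-digit and the 8-digit form may name any code point, including ones in
the basic plane.  The function name and the regular expression are
illustrative, not taken from the spec:

```python
import re

# \u followed by 4 hex digits, or \U followed by 8 hex digits.
_NUMERIC_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_numeric_escapes(text):
    """Replace each numeric escape with the character it names."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _NUMERIC_ESCAPE.sub(replace, text)


long_form = decode_numeric_escapes(r"a\U00000020b")
short_form = decode_numeric_escapes(r"a\u0020b")
```

Both calls give the three characters "a b"; there is no need for the long
form to be restricted to code points outside the basic plane.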

"traditionally" -- remove.

Remove quotes on '\t' -- it can be read as the 4 chars '-\-t-' are code 
point 9.  This also fixes  '\''.

reserved character escapes:

Uses '\' where elsewhere the text would use monospaced orange \

== 5.4 Grammar

- "Production label consisting of a number and a final 's'"
   There aren't any in the Turtle grammar!  Remove.

- Fix formatting issues.
   Put in borders.

- Multiple double bracketing (( ))

- Unnecessary (...)? for a single item.

- Inline "@prefix" and "@base" -- they are used once.

- [24] Wrong naming.

- Why are there <> around tokens when defined and not when used?

- INTEGER_POSITIVE ::= "+" INTEGER

Firstly, that allows     +        123

Secondly, elsewhere it says INTEGER is [+-][0-9]+ i.e. no _POSITIVE or 
_NEGATIVE.

SPARQL needs this because of 1 +2  and 1 + +2
Turtle does not.  Simplify.
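
The difference between the two readings can be sketched in Python.  Both
patterns are my own approximations: the first models the rule "+" INTEGER
with white space permitted between tokens, the second a single signed
token, assuming the sign is optional:

```python
import re

# Grammar-level rule: "+" then INTEGER, white space allowed in between.
POSITIVE_RULE = re.compile(r"\+\s*[0-9]+")

# Single token: optional sign directly followed by digits.
INTEGER_TOKEN = re.compile(r"[+-]?[0-9]+")


def matches_positive_rule(text):
    return POSITIVE_RULE.fullmatch(text) is not None


def is_integer_token(text):
    return INTEGER_TOKEN.fullmatch(text) is not None


spaced = "+        123"
rule_accepts_spaced = matches_positive_rule(spaced)
token_accepts_spaced = is_integer_token(spaced)
```

The rule form accepts the spaced-out input and the token form rejects it,
while both accept "+123".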


- Several uses of \ to escape characters in the grammar

e.g. ECHAR ::= "\\" [tbnrf\\\"']

which makes the escape sequence \\t not \t

<WS> is not "\t" -- that makes the two characters \-t the white space.
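
The two readings of a quoted "\t" in the grammar, shown with Python string
literals purely as an illustration (the variable names are mine):

```python
# Read literally, as EBNF without string escapes would: backslash, then t.
literal_reading = r"\t"

# Read with escape processing: the single TAB character, code point 9.
escaped_reading = "\t"

lengths = (len(literal_reading), len(escaped_reading))
```

The first is two characters and the second is one, so the grammar needs to
say which notation it is using, or write the character as a code point.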

- [55]

Blank rules.


- [62]

   This is not EBNF - remove.
Received on Friday, 30 March 2012 14:43:48 UTC