Re: Review of Turtle doc (part 1) from Gavin Carothers on 2012-04-11 (public-rdf-wg@w3.org from April 2012)

From: Gavin Carothers <gavin@carothers.name>
Date: Tue, 10 Apr 2012 20:24:01 -0700
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: RDF-WG <public-rdf-wg@w3.org>
Message-ID: <CAPqY83xoqpvAPHAvNeBpowXFH05Wu0xCegqJKvjALvLoAOiAhA@mail.gmail.com>
On Fri, Mar 30, 2012 at 7:43 AM, Andy Seaborne
<andy.seaborne@epimorphics.com> wrote:
> ACTION-157
>
> First part of a review of the Turtle document, up to the end of section 5
> (the Turtle grammar).
>
> I'm getting this out now because it covers the structure and audience of the
> document and because it's already a bit long.
>
>        Andy
>
> == General
>
> 1/ Audience:
>
> Who is the audience for the document?
>
>  Data authors?
>  Parser writers?
>
> The doc, especially in the early sections, feels much more centered on
> parser writers but I'd like to see it being for data authors with parser
> writers material pushed towards the back in a "theory" section.

That draft leans a bit too far to the parser side. Currently
working on getting it back to having data authors more in mind.

>
> 2/ Turtle? N-Triples?
>
> The document, especially in the earlier sections, talks a lot about
> N-Triples, e.g. "Introduction" has an N-Triples example but no Turtle.
>
> It reads almost like a Turtle/N-Triples comparison at times with Turtle
> assumed background knowledge.
>
> It would be better to define Turtle, then to discuss N-Triples if you want
> to frame N-Triples as a simple subset of Turtle.
>
> The mixture, and comparing Turtle to N-Triples (section 1 and 2), is quite
> confusing.  (Alternative, start with writing down triples, then introduce
> the Turtle syntax forms for better expression. But I suggest the "describe
> Turtle, then N-Triples" approach.)

This was the WG resolution as well. Eric had tried in one draft but it
didn't go that well; I'm taking a stab at it now.

>
> 2/ Grammars
>   The two grammars are presented differently.

The method of getting from BNF input to HTML output is still poor.
While the grammars were still changing, the process of going to HTML
got in the way too much.

>   The Turtle grammar is not up-to-date (e.g. the IRI naming changes)

Yep.

>   The Turtle grammar has some presentation issues.

Yep, even a note saying as much :\

>
>   There is a reference to the BNF as definitive - but
>   this doc is definitive and normative isn't it?

Yes.

>
> 3/ Resolve all design issues.

Remaining design issues were resolved at the last WG meeting.

>
> There are various notes (not just note boxes) that need to be removed for
> last call.
>
> == Title
>
> The title is "Turtle" but the document covers Turtle and N-Triples.
>
> "Turtle and N-Triples: syntaxes for RDF."
>
> """
> Turtle: syntax for humans
> N-Triples: syntax for machines
> """
>

Not thrilled with retitling, but yes, may have to. Will see after
the N-Triples rewrite.

> == Abstract
>
> "This document defines a textual syntax ..."
>
> It defines 2 syntaxes!
>
> "Turtle provides levels of compatibility ..." leads this reader to expect a
> definition of "level" and a list of syntax items for each level.
> Reword/remove.
>
> == SOTD
>
> "There are still a few rough edges ..."
>
> All rough edges about design must be removed for a LC.  As must this text.
>
> "The working group does not expect to make any large changes to the existing
> syntax."
>
> Debatable and not an item for LC.

Was unclear on who was to write this text as it comes from the W3C not
the editor. Will write and submit a draft of SOTD text as soon as
possible.

>
> == 1 Introduction
>
> * Remove "which in turn borrows ...."
>
> The history lesson is a distraction in defining a spec.  Just say what
> Turtle is, not frame it relative to other things.

Done.

>
> * para 2: remove the NT discussion.
>
> * Remove NT example.

Not done... considering on the basis of other N-Triples rewrites.

>
> * Put in an example of realistic Turtle (not all syntax forms).  There isn't
> one until much later.

Full "awesome" example added. (Sadly there are too many ninja turtles
to make for a good short example, so it uses Leigh Dodds'
Spider-Man example.)

>
> * As there is no tutorial section, the introduction needs to give the reader
> a brief overview.

Agreed, or at least not throw the user into the deep end right away.

>
> * Add a document outline to explain the document structure.

Outlines tend to mean the document is too complex. Let's see if it
still needs one later. Not -that- long a document.

>
> == 2 RDF Terms in Turtle and N-Triples.
>
> The TTL/NT thing again.
>
> * "three types of RDF terms" -- from a syntax point of view there are more:
> IRIs, prefixed names, literals, label bnodes, []-bnodes.  This is what 2.1.1
> is about - two ways to write IRIs.
>
> The doc is moving between abstract data model and concrete syntax.
>
> * Links and using the names of grammar productions is confusing.
>  Very early to be thinking about the parser writer as the reader here.

Agreed.
>
> == 2.1.1 Relative IRIs and Prefixed Names in Turtle.
>
> This is two different sections - have one section on prefixed names and one
> on IRI resolution.

Split. Section rewritten.

>
> The second sentence is the definition. "A Prefixed Name is ..."  Put this
> first.

Done.

>
> Not sure that "a prefixed name is mapped to an IRI" is clear - it's a
> process of mapping the prefix part and concatenating the local part.
>
> "Note" - remove or at least put much later.

Section rewritten
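
To make the two-step process in the comment concrete (map the prefix
part, then concatenate the local part), here is a minimal sketch in
Python. The function name and the prefix table are hypothetical and
only for illustration; escape handling in the local part is left out.

```python
def expand_prefixed_name(pname, prefixes):
    """Expand a 'prefix:local' name to an IRI string (illustrative only)."""
    # Split on the first colon only; the local part may contain colons.
    prefix, sep, local = pname.partition(":")
    if not sep:
        raise ValueError(f"not a prefixed name: {pname!r}")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    # Step 1: map the prefix to its namespace IRI.
    # Step 2: concatenate the local part onto it.
    return prefixes[prefix] + local


# Hypothetical prefix declarations, as @prefix lines would set them up.
prefixes = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "": "http://example.org/ns#",
}

print(expand_prefixed_name("foaf:name", prefixes))
print(expand_prefixed_name(":bar", prefixes))
```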

>
> All this diving into escape sequences and differences from XML before it's
> clear what a prefixed name is.

Section rewritten

>
> "Relative IRIs are combined .. " use "resolved" not "combined" which hints
> at concatenation.

Done. Was corrected in other spots, but not this one.
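
A small Python sketch of why "resolved" is the better word. It uses
the standard library's urllib.parse.urljoin, which implements URI
reference resolution, as a stand-in for IRI resolution; the base and
relative values are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical base IRI and relative reference.
base = "http://example.org/a/b"
relative = "../c"

# Naive concatenation, which "combined" might suggest.
concatenated = base + relative

# Reference resolution against the base.
resolved = urljoin(base, relative)

print(concatenated)  # http://example.org/a/b../c
print(resolved)      # http://example.org/c
```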

>
> Example at end of 2.1.1 has no DOTs after directives.

Do now.

>
> Example has "prefixed IRIs" -- not defined terminology.
>
> == 2.1.2 etc
>
> The summary tables in 2.1.2, 2.2.3, 2.3.2 didn't help me.  Too much emphasis
> on the differences, not on the concepts.  Move all these to a Turtle vs
> N-Triples section later in the doc.

Moved this one so far. Will likely move all of them.

>
> Keep examples to normal cases, the foo:bar\=baz example is stressing detail
> that draws the reader into escaping.

Some examples at least should use escaping, but should avoid ones
DESIGNED for escaping other than in escaping sections.

>
> == 2.2 RDF Literals
>
> "Literals in N-Triples ..." -- what about Turtle?

Agreed.

>
> s/langugae/language/

Fixed.

>
> "processing the escape sequences"
> ==>
> "processing any escape sequences"

Fixed.

>
> Discussion of rdf:langString in first para -- too detailed, mixes syntax and
> data model.  At this point the reader is barely introduced to a simple
> "foo".

Fixed?

>
> == 2.2.1 Alternative Lexical Representations in Turtle
>
> Change to "Different String Representations in Turtle"

Restructured.

>
> The lexical part is the thing in the abstract data model.  It is the same
> for the same literal.
>
> Example needs @prefixes -- IMHO it is better when examples, especially early
> ones, are syntactically complete.

Totally agreed. Been moving towards more and more full examples.

>
> == 2.2.2 Abbreviating Common Datatypes in Turtle
>
> We are not abbreviating the datatypes!

Original Team Submission language.

>
> """Representing Numbers in Turtle
>
> Numbers can be written with lexical form and datatype (example) but Turtle
> has special syntax for numbers.
> """

Yes that's better :D

>
> Show full and abbreviated forms side-by-side.

Agreed.

>
> Example uses ";" before described.

Triple patterns should come earlier in document.

>
> Example has no @prefix my: ...

Oops, should.

>
> "IV"^^my:romanNumeral detracts from the section on special syntax for
> numbers.

Agreed.

>
> == 2.3. RDF Blank Nodes
>
> "Issue" must go for LC.

Gone!

>
> Again: "RDF Blank Nodes in N-Triples ..."

Yeah, needs work.

>
> == 2.3.1
>
> "the production blankNodePropertyList"
>
> If you want to structure the document by referring to productions, at least
> copy the productions into the section discussing them so the reader can see
> the key area of the grammar at the time of reading the text.

Agreed: either don't talk about the production directly, or have the production there.

>
> == 3 Predicate Object Lists
>
> This is two sections.
>
> I suggest a section on ";" and then a section on ",".

Sounds good.

>
> Driving the structure of the document from the structure of the grammar is a
> big constraint. It does not help the data author audience.

Nor does it in fact help parser authors that much either.

>
> First example: the output of parsing is absolute IRIs.  You said earlier
> that IRIs are resolved as encountered.
>
> Second example is wrong: ":subject :predicate :object" needs to be IRIs.

Second example needs to be valid Turtle too

>
> "Corresponding N-Triples"
>
> Reads as if Turtle parses to N-Triples.

Well it could! ... but no, agreed.

>
> == 4 Collections in Turtle
>
> Start with a discussion of what RDF collections are, of the predicates and
> of rdf:nil.
>
> Need example with () and rdf:rest/rdf:first forms.

Agreed.
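
As a rough illustration of what such an example would have to show,
here is a Python sketch that expands the items of a () collection into
the rdf:first/rdf:rest triples ending in rdf:nil. The function name and
the blank node labels (_:b0, _:b1, ...) are hypothetical; a real parser
would generate its own labels.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def collection_triples(items):
    """Return (head, triples) for the RDF collection holding items."""
    # The empty collection () is just rdf:nil and adds no triples.
    if not items:
        return RDF + "nil", []
    # One blank node per list cell (labels are illustrative).
    cells = [f"_:b{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        # The last cell's rdf:rest points at rdf:nil.
        rest = cells[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((cells[i], RDF + "first", item))
        triples.append((cells[i], RDF + "rest", rest))
    return cells[0], triples


head, triples = collection_triples(['"a"', '"b"'])
for triple in triples:
    print(triple)
```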

>
> == 5 Turtle Grammar
>
> The end of line problem.
>
> In the literal:
>
> """foo
> bar"""
>
> what are the characters for end-of-line?  NL, CRNL? Whatever the file has?

/me places hands over ears and closes eyes
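
For what it's worth, a Python sketch of why the question matters: the
same long literal read from an LF file and from a CRLF file gives two
different strings unless something normalizes them. The helper names
are hypothetical, escape processing is omitted, and nothing here says
which behaviour the spec should pick.

```python
# The same literal as it would appear in files with different line endings.
lf_source = '"""foo\nbar"""'
crlf_source = '"""foo\r\nbar"""'


def long_string_value(token):
    """Strip the triple-quote delimiters, keeping the content as-is."""
    if not (token.startswith('"""') and token.endswith('"""')):
        raise ValueError("not a long string token")
    return token[3:-3]


def normalize_newlines(value):
    """One possible policy: fold CRLF and bare CR to LF."""
    return value.replace("\r\n", "\n").replace("\r", "\n")


print(repr(long_string_value(lf_source)))
print(repr(long_string_value(crlf_source)))
```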

>
> == 5.1 White Space
>
> "significant in tokens IRI_REF and string"
>
> 'string' is not a token.

You know, stringish things! ... Yes, agreed.

>
> == 5.3 Escape Sequences
>
> The numeric escape table has no borders.
>
> We should allow \U to cover the basic plane.
>
> \U00000020 should be legal.

Perhaps, but not in N-Triples.
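
A sketch, in Python, of a decoder that treats \u (4 hex digits) and \U
(8 hex digits) uniformly, so \U00000020 decodes to a space as the
review suggests. This is an assumption about the proposed behaviour,
not what either grammar currently says; the function name is
hypothetical, and it does not handle an escaped backslash in front of
the u or U.

```python
import re

# \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_NUMERIC_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_numeric_escapes(text):
    """Replace numeric escapes with the code points they name."""
    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        # chr() raises ValueError for values above U+10FFFF.
        return chr(int(hex_digits, 16))
    return _NUMERIC_ESCAPE.sub(replace, text)


print(repr(decode_numeric_escapes(r"a\U00000020b")))
print(repr(decode_numeric_escapes(r"a\u0020b")))
```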

>
> "traditionally" -- remove.

Agreed.

>
> Remove quotes on '\t' -- it can be read as the 4 chars '-\-t-' are code
> point 9.  This also fixes  '\''.

Agreed.

>
> reserved character escapes:
>
> Uses '\' where elsewhere the text would use monospaced orange \

Will fix.

>
> == 5.4 Grammar
>
> - "Production label consisting of a number and a final 's'"
>  They aren't any in the Turtle grammar!  Remove.

Well... not in the HTML, see issues with HTML generation from BNF :(

>
> - Fix formatting issues.
>  Put in borders.

Do something. Yes.

>
> - Multiple double bracketing (( ))

See HTML formatting issues.

>
> - Unnecessary (...)? for a single item.

Mm, I think this is there for TriG? Will check.

>
> - Inline "@prefix" and "@base" -- they are used once.

Twice. Have to be used in language too.

>
> - [24] Wrong naming.
>
> - Why are there <> around tokens when defined and not when used?

See HTML generation issues.

>
> - INTEGER_POSITIVE ::= "+" INTEGER
>
> Firstly, that allows     +        123
>
> Secondly, elsewhere it says INTEGER is [+-][0-9]+ i.e. no _POSITIVE or
> _NEGATIVE.
>
> SPARQL needs this because of 1 +2  and 1 + +2
> Turtle does not.  Simplify.

Slight change of goal from using exactly the same productions as
SPARQL... totally fine by me.
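
A quick Python sketch of the difference, modelling both forms as
regular expressions. It assumes whitespace may appear between grammar
tokens (hence \s* in the two-token form) and that the sign in the
single-token form is optional; both are assumptions for illustration,
not text from either grammar.

```python
import re

# Single terminal with the sign inside the token.
INTEGER = re.compile(r"[+-]?[0-9]+")

# Two-token form: "+" then an unsigned integer, with optional
# whitespace between the tokens (assumed tokenizer behaviour).
INTEGER_POSITIVE = re.compile(r"\+\s*[0-9]+")

spaced = "+        123"
print(bool(INTEGER_POSITIVE.fullmatch(spaced)))
print(bool(INTEGER.fullmatch(spaced)))
print(bool(INTEGER.fullmatch("+123")))
```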

>
>
> - Several uses of \ to escape characters in the grammar

See HTML generation/Yacker/ARGH THERE ARE NO EBNF (W3C) TOOLS, or if
there are somewhere, please please tell me about them.

>
> e.g. ECHAR ::= "\\" [tbnrf\\\"']
>
> which makes the escape sequence \\t not \t
>
> <WS> is no "\t" -- that makes the two characters \-t the white space.
>
> - [55]
>
> Blank rules.
>
>
> - [62]
>
>  This is not EBNF - remove.
>
Received on Wednesday, 11 April 2012 03:24:32 UTC