comments on 17 December 2013 WD of RDF 1.1 Primer from Bob DuCharme on 2013-12-30 (public-rdf-comments@w3.org from December 2013)

From: Bob DuCharme <bob@snee.com>
Date: Mon, 30 Dec 2013 18:25:33 -0500
To: public-rdf-comments@w3.org
Message-ID: <52C200ED.2080601@snee.com>
There's a lot of good stuff in the 17 December 2013 Working Draft of the 
RDF 1.1 Primer, but because it's a Primer, I assume that its intended 
audience is people who are new to RDF, and the document often assumes 
too much about the reader's knowledge of technical specification 
vocabulary.

I've divided up my comments into two lists: comments about substance 
followed by picky copyediting suggestions. Suggestions often show a 
quoted phrase from the Primer followed by a suggested revision. For 
example,

   "cypress-tree" cypress tree

is a suggestion to replace "cypress-tree" with "cypress tree".


=== substantive (to varying degrees) ===

Section 1. says "The Resource Description Framework (RDF) is a framework 
for describing info about resources in the World Wide Web." 1.1 says 
that "An IRI identifies a web resource" and then references 
http://www.ietf.org/rfc/rfc3987.txt, but I couldn't find anything in 
that RFC about IRIs being limited to the identification of web 
resources. I know that URLs define web resources, but if I assign an IRI 
to the chair I'm sitting in, couldn't I use RDF to state facts about the 
chair's location, manufacturer, etc., without this having anything to do 
with the web? Or am I misunderstanding something? I always thought that 
we could assign IRIs to absolutely anything and then use RDF to describe 
them; limiting its use to web-based resources really limits its power.

3.1 "Resources typically occur in multiple triples, for example Bob and 
the Mona Lisa painting in the examples above." The Mona Lisa resource 
only occurs in one triple above this sentence, not two, unless you want 
the reader to assume case insensitivity in the sample data, which I 
think is a bad idea. I would capitalize <the Mona Lisa> consistently and 
then explicitly point out how the same resource can appear in the 
subject of one triple and the object of another, which is a new idea at 
this point of the Primer. (A wonderful new idea!) After normalizing the 
capitalization, the sentence might be better off like this: "The same 
resource is often referenced in multiple triples. In the example above, 
Bob is the subject of four triples, and the Mona Lisa is the subject of 
one and the object of another. This ability to have the same resource be 
in the subject position of one triple and the object position of another 
makes it possible to find connections between triples, which is an 
important part of RDF's power. We can therefore visualise triples as..."
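
To make the point about connections concrete, here is a minimal Python 
sketch. It is only an illustration: the triples are loosely modeled on 
the Primer's informal example with the capitalization normalized (the 
exact wording is my assumption), and the helper names `connections` and 
`follow` are made up.

```python
# Informal triples loosely modeled on the Primer's example (wording assumed).
triples = [
    ("Bob", "is a", "person"),
    ("Bob", "is a friend of", "Alice"),
    ("Bob", "is born on", "the 4th of July 1990"),
    ("Bob", "is interested in", "the Mona Lisa"),
    ("the Mona Lisa", "was created by", "Leonardo da Vinci"),
]


def connections(triples):
    """Return resources that are the subject of one triple and the object of another."""
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    return subjects & objects


def follow(triples, start):
    """Yield two-step paths that begin at `start` and pass through a shared resource."""
    for s1, p1, o1 in triples:
        if s1 != start:
            continue
        for s2, p2, o2 in triples:
            if s2 == o1:
                yield (s1, p1, o1, p2, o2)
```

With these triples, only <the Mona Lisa> appears in both positions, so 
it is the one resource that links Bob to Leonardo da Vinci. If the 
capitalization differed between the two triples, the comparison would 
fail and the connection would be lost.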

"The example above... an RDF graph" Move that paragraph before the 
"Resources typically" paragraph, i.e. right after the example itself, 
maybe in one of the green "NOTE" blocks.

In the note that begins

   The RDF Data Model is described in this section in the form of an 
"abstract syntax"

do "encoding" and "concrete RDF syntax" refer to the same thing? If so, 
make that clearer. I think it would be better to never use the word 
"encoding," which people are more likely to associate with things like 
UTF-8 vs. Latin 1, and instead use the term "concrete syntax" 
consistently. The first time the Primer uses the phrase "concrete 
syntax," a parenthesized phrase after it could say something like "(the 
syntax used to represent triples stored in text files)", because as a 
Primer this should provide more hints about the meaning of highly 
technical phrases. These same issues come up in the paragraph of Section 
5 beginning "Many different concrete syntaxes..."

"three types of RDF data that occur in triples" three types of RDF 
resources that occur in triples

"The notion of IRI is a generalization of URI (Uniform Resource 
Identifier)" To assume that someone who doesn't understand RDF (the 
intended audience of the Primer) understands what URIs are and their 
relationship to URLs is a huge, huge assumption. How about adding, 
after the sentence with this, something like "The URLs (Uniform Resource 
Locators) that people use as web addresses are one form of URI, with an 
important difference: URIs are not necessarily locators that provide the 
address of a resource; they are often merely identifiers that provide a 
unique ID for a given resource. IRIs are a generalization of this 
because..."

3.2 "RDF is agnostic about what the IRI stands for" Unlike section 5.1 
("in this example foaf:Person stands for 
<http://xmlns.com/foaf/0.1/Person>") I think that "stands for" is not 
appropriate here. (After all, IRI stands for "Internationalized Resource 
Identifier.")  "Represents" or "identifies" would be better.

3.4 I don't think algebra variables are a very good analogy here. Those 
are named things that may not have values, and blank nodes are unnamed 
things that do have values.

Section 3.4 overall is a little too brief and abstract for an RDF 
neophyte. Blank nodes are a difficult concept for people who are new to 
RDF. Either don't cover them in the Primer or cover them a bit more. For 
example, this section would greatly benefit from a new diagram similar 
to the one in Figure 1 that includes the cypress tree.

Also: 'Resources such as the unidentified cypress tree are called "blank 
nodes" in RDF.' The resource (the tree, in this case) is not called a 
blank node. How about this: 'Resources without identifiers such as the 
painting's cypress tree can be represented by "blank nodes" in RDF.'
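
A rough Python sketch of that distinction, in case it helps: the tree is 
the resource, and the blank node is only the thing in the graph that 
represents it. The `BlankNode` class, the `describe` helper, and the 
predicate wording are all assumptions made for illustration, not 
anything from the Primer.

```python
import itertools


class BlankNode:
    """A graph node with no IRI; it is told apart only by object identity."""

    _counter = itertools.count()

    def __init__(self):
        # The label is local to this sketch, like _:x in Turtle.
        # It is not a global identifier for the resource.
        self.label = f"_:b{next(BlankNode._counter)}"

    def __repr__(self):
        return self.label


tree = BlankNode()

# The blank node appears as the object of one triple and the subject of another.
graph = {
    ("the Mona Lisa", "depicts", tree),
    (tree, "is a", "cypress tree"),
}


def describe(graph, node):
    """Return the (predicate, object) pairs stated about a node."""
    return sorted((p, o) for s, p, o in graph if s is node)
```

Here `describe(graph, tree)` reports what is known about the unnamed 
resource even though it has no identifier of its own.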

3.5 "does not specify a particular semantics" That's normative 
spec-speak, not primer-speak, and should be reworded to be clearer to 
beginners. A bit later, the "i.e." parenthetical remark after "RDF 
provides no way to convey this semantic assumption" provides a good 
model of connecting this high-level talk of semantics to the actual data 
being discussed.

Section 1 said that "For example retrieving http://www.example.org/bob 
could provide data about Bob," leading me to believe that this URI 
represented the resource Bob. In the section on named graphs, the same 
URI represents a named graph, not a person. I understand that this 
doesn't invalidate the "For example" sentence--if it's the name of a 
graph, retrieving it could still "provide data about Bob"--but I think 
this can still confuse the RDF beginner, and recommend that the examples 
in the section on named graphs use new IRIs that have not appeared in 
the Primer before.

"In the example default (unnamed) graph below we see two triples that 
have a graph name as subject:" Insert a sentence before this about why 
someone would want to do this, e.g. "When you can reference a graph with 
an IRI, you can create triples that provide metadata about that graph."
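
As a sketch of why this is useful, here is a small Python model of a 
dataset with one named graph and a default graph that describes it. The 
graph IRI is hypothetical (and deliberately one that has not appeared 
elsewhere in the Primer), the informal predicates and values are 
assumptions, and `graph_metadata` is a made-up helper.

```python
# Hypothetical graph name, chosen so it does not also look like Bob's IRI.
GRAPH = "http://example.org/graphs/bob-profile"

dataset = {
    # Named graph: triples about Bob.
    GRAPH: {
        ("http://example.org/bob#me", "is a", "person"),
    },
    # Default (unnamed) graph: triples whose subject is the graph name.
    None: {
        (GRAPH, "publisher", "http://example.org"),
        (GRAPH, "rights", "http://creativecommons.org/licenses/by/3.0/"),
    },
}


def graph_metadata(dataset, graph_name):
    """Return the default-graph (predicate, object) pairs that describe a named graph."""
    return {(p, o) for s, p, o in dataset[None] if s == graph_name}
```

Because the graph has a name, the two default-graph triples can say who 
published it and under what terms without touching the triples inside it.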

"subsets of triples" doesn't make sense. "subsets of a dataset [ or 
collection] of triples"?

4. "For example, one can state that the IRI ex:friendOf can be used as a 
property" The idea of this being an IRI will come as a complete surprise 
to the reader, because the use of prefixes hasn't been discussed at all 
yet. (Is a qname considered an IRI?) The original RDF Primer at 
http://www.w3.org/TR/rdf-primer/ has a good paragraph beginning "The 
full triples notation requires" that introduces this well. However it's 
done, as a Primer this should explain any new syntax, such as the use of 
namespace prefixes, before using that syntax.
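
For what it's worth, the relationship between a prefixed name and a full 
IRI is simple enough to show in a few lines. This Python sketch is only 
an illustration: the `expand` helper is made up, and the namespace I 
assign to `ex:` is an assumption, since the Primer has not declared it 
at that point.

```python
# Prefix table; the ex: namespace IRI is an assumption made for this sketch.
PREFIXES = {
    "ex": "http://example.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as foaf:Person into a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    # The full IRI is just the namespace IRI with the local part appended.
    return prefixes[prefix] + local
```

So `ex:friendOf` is shorthand that only becomes an IRI once a prefix 
declaration is in place, which is why the reader needs to see that 
declaration first.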

"domain respectively range restrictions" domain and range restrictions, 
respectively (The sentence with this is another example of assuming a 
pre-existing, strong understanding of the relevant technical vocabulary 
by the reader; the Primer really should have a few more sentences to 
explain the use of rdfs:domain and rdfs:range, which is always a 
difficult point with RDF beginners.)
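
One thing that usually helps beginners is seeing that rdfs:domain and 
rdfs:range let you infer types rather than reject data. A minimal Python 
sketch of that idea follows; the `ex:` names, the `schema` table, and 
the `infer_types` helper are all hypothetical.

```python
RDF_TYPE = "rdf:type"

# property -> (rdfs:domain class, rdfs:range class); the ex: names are hypothetical.
schema = {
    "ex:friendOf": ("ex:Person", "ex:Person"),
}


def infer_types(data, schema):
    """Derive rdf:type triples from how properties with a domain and range are used."""
    inferred = set()
    for s, p, o in data:
        if p in schema:
            domain, range_ = schema[p]
            inferred.add((s, RDF_TYPE, domain))  # the subject is in the domain class
            inferred.add((o, RDF_TYPE, range_))  # the object is in the range class
    return inferred
```

Given only the triple (ex:bob, ex:friendOf, ex:alice), this yields 
rdf:type triples saying that both ex:bob and ex:alice are instances of 
ex:Person, even though nothing stated that directly.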

After Example 2 add something like this, because the idea of (and value 
of!) properties as subjects or objects in triples has not been covered 
at all up to this point and often comes as a surprise to people with an 
object-oriented background: "Note that, while <is a friend of> is a 
property typically used as the predicate of a triple (as it was in 
Example 1), properties like this are themselves resources that can be 
described by triples or provide values in the descriptions of other 
resources. In this example, <is a friend of> is the subject of triples 
that assign type, domain, and range values to it, and it's the object of 
a triple that describes something about the <is a good friend of> 
property."

"RDFa (for HTML embedding)" I always think it's a shame that people 
think that RDFa is only for use with HTML. It can be very useful with 
other kinds of XML as well; see 
http://www.devx.com/semantic/Article/42543 . I would love to see the 
several references to this say "for HTML and XML embedding."

Section 5.1 is more like a quick reference of Turtle syntax than a 
Primer, because it covers so much so quickly. Readers who are new to RDF 
(the intended audience of this document) will find it confusing. A brief 
introduction to N-Triples before the Turtle part would make the Turtle 
part much easier to understand, because then the reader will understand 
that the use of angle brackets around full IRIs, quotes around literals, 
and a period after each triple are the most important parts of the 
syntax and that everything else in Turtle is just a syntactical convenience.
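
To show how little there is to that core syntax, here is a Python sketch 
that writes one triple in N-Triples style. The helper names are made up, 
the sample triple is an assumption rather than a line from the Primer, 
and the escaping is deliberately minimal (backslash and double quote 
only), so treat it as an illustration and not a conforming serializer.

```python
def iri(value):
    """Write a full IRI in angle brackets."""
    return f"<{value}>"


def literal(value):
    """Write a plain string literal in double quotes (escapes backslash and quote only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def ntriple(subject, predicate, obj):
    """Join three already-formatted terms and end the triple with a period."""
    return f"{subject} {predicate} {obj} ."


line = ntriple(
    iri("http://example.org/bob#me"),
    iri("http://xmlns.com/foaf/0.1/name"),
    literal("Bob"),
)
```

Once a reader has seen a line like the one this produces, prefixes, 
semicolons, and the rest of Turtle can be introduced as ways to write 
the same thing with less typing.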

"the predicate-object part of triples with <http://example.org/bob#me> 
as subject"  the predicate-object part of triples that have 
<http://example.org/bob#me> as their subject

"The semicolons at the end of lines 9-11 indicate that the set is not 
yet complete. A period is used to signal the end of a Turtle statement." 
The use of "set" here is confusing. Set of what? I know that it refers 
to predicate-object pairs associated with a common subject, but someone 
new to Turtle might think that it's some specific Turtle construct. I 
think it would be better to say "The semicolons at the end of lines 9-11 
each indicate that the predicate-object pair that follows them is part of 
a new triple that uses the most recent subject shown in the data--in 
this case, <bob#me>."
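
In other words, the semicolon form is just shorthand for repeating the 
subject. A small Python sketch of that expansion follows; the 
`expand_predicate_list` helper is made up, and the predicate-object 
pairs are my assumption about what such a block would contain rather 
than a copy of the Primer's lines.

```python
def expand_predicate_list(subject, pairs):
    """Turn one subject plus its predicate-object pairs into full triples."""
    return [(subject, predicate, obj) for predicate, obj in pairs]


# Pairs as they might appear after a shared subject, separated by semicolons.
pairs = [
    ("a", "foaf:Person"),
    ("foaf:knows", "<alice#me>"),
    ("foaf:topic_interest", "<the-mona-lisa>"),
]

full_triples = expand_predicate_list("<bob#me>", pairs)
```

Each pair becomes its own triple with <bob#me> as the subject, which is 
exactly what a reader would have to write out longhand without the 
semicolons.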

'The term _:x is a blank node. It represents some unnamed tree depicted 
in the Mona Lisa painting and belonging to the "Cypress" class.'  The 
term _:x is a blank node. It represents an unnamed resource depicted in 
the Mona Lisa painting that is an instance of the "Cypress" class. [It's 
safer to say that it represents a resource, not a tree, and the idea of 
"belonging" here is not quite accurate.]

=== copyediting ===

There are several places where "for example" should have a comma after 
it: "For example retrieving", "For example a dataset about paintings", 
"For example 'Léonard de Vinci'".

3.1 " <subject>  <predicate> <object>" has an extra space after <subject>

- "multiple triples, for example" [em dash not comma]

"allow writing literals" allow writing of literals

"markup webpages": "mark up" should be two words when used as a verb. 
I'd say "web pages" as two words as well.

"Library of Congress published its..." The Library of Congress published its

The phrase "Using the Web Ontology Language" would make sense, but 
"Using the OWL" in section 4 does not. Just say "Using OWL."

"a RDF vocabulary" an RDF vocabulary

"the reader is referred to the Turtle document" see the Turtle document

" the reader can find for each RDF syntax corresponding"  the reader can 
find, for each RDF syntax, corresponding

"cypress-tree" cypress tree

"cater for" cater to [although that could just be a British vs. American 
usage thing]

"semantics which is specified in the RDF" semantics which are specified 
in the RDF

"Wikidata, a free, collaborative..." end that bullet point with a period 
like the other bullets in that list.


Thanks,

Bob DuCharme
Received on Monday, 30 December 2013 23:25:40 UTC