Review of SWEO "Cool URIs for the Semantic Web"

I am commenting on the Working Draft of 17 December 2007.
http://www.w3.org/TR/2007/WD-cooluris-20071217/
The comments vary in weight, but I keep them in document order.  My 30  
comments are marked by **. I hope they make sense.

I think this is an important document. It contains a large amount of  
very valuable material, a few places which are confusing, and a small  
number of places which are, I believe, actively misleading.
There are also one or two places where I disagree with the
recommendation it makes.
On the whole, though, the document is important and I hope energy is  
found to incorporate these comments.

timbl
______________________________________________

1. Introduction

** Suggest add a reference to the N3 Primer as an introduction to RDF  
which some find easier to get into.

2.  Last para before 2.1.

** Delete the sentence "In short, to locate a Web document—hence the  
term URL (Uniform Resource Locator)".  (Don't go there. Don't try to  
distinguish between names and locations. See for example
http://www.w3.org/DesignIssues/NameMyth.html)

** I would note that "URL simply identifies whatever we see when we  
type it into a browser" is simplistic, as users are aware of the  
difference between links and permalinks in a blog for example.

2.1

** After

"HTTP/1.1 200 OK
Content-Type: text/html
Content-Language: en"

add
"Content-Location: alice-en.html
"

** Start a new paragraph at "...English <p> Content ..."
** Add text to show people how conneg works and how the client
understands the content-type specific URI.

** Technical bug: The example uses 302 Found to redirect according to  
the Accept: headers. This is *not* advisable IMHO.  It uses an extra  
round trip to no advantage.  Conneg should be done directly.
I suggest replacing this example with one without the 302 "twist".
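
As a rough illustration of conneg done directly, here is a minimal sketch in
Python. It is not taken from the draft: the variant file names, the default
language and the simplified Accept-Language parsing are all assumptions made
for the example. The point is only that the generic URI answers with 200 in
one round trip and names the variant it served in Content-Location.

```python
# Hypothetical variants of one generic resource; the file names and the
# simplified Accept-Language matching are assumptions for this sketch.
VARIANTS = {
    "en": "alice-en.html",
    "de": "alice-de.html",
}


def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        weighted.append((-q, position, parts[0].lower()))
    # Tags with q=0 are explicitly refused by the client, so drop them.
    return [tag for neg_q, _, tag in sorted(weighted) if neg_q < 0]


def negotiate(accept_language, default="en"):
    """Answer a GET on the generic URI in a single round trip (no 302)."""
    chosen = default
    for tag in parse_accept_language(accept_language):
        primary = tag.split("-")[0]
        if primary in VARIANTS:
            chosen = primary
            break
    headers = {
        "Content-Type": "text/html",
        "Content-Language": chosen,
        # Tells the client the content-specific URI of what it received.
        "Content-Location": VARIANTS[chosen],
        "Vary": "Accept-Language",
    }
    return 200, headers
```

A client that wants to refer to the English page specifically can use the
Content-Location value; one that wants the generic page keeps the URI it
asked for.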

3. URI for Real-world objects

** In general, remove the term "non-information resource" from the  
entire document.  Replace it with "thing".  It is wrong.   It is used  
misleadingly to mean "A thing, which is not necessarily an information  
resource".

** It would I think in a document like this be best to stick with "web
document" instead of "information resource" too, but that is just for  
readability. It is already done in places.

** Delete "We call all these real-world objects or (according to WWW-Arch)
non-information resources."  (It is a bad term, as explained
above, and the AWWW does not use it at all).


3 ..  Box "1 Be on the web".

** Important architectural philosophical point.
Replace "Machines should get RDF data and humans should get a readable  
representation, such as HTML." with "Machines, and humans through user  
agents, should get data in RDF (and related standards). In some cases,  
it may be useful to provide a view of the data in HTML for users with  
conventional web browsers without data functionality"

** Add text after the box.  "This document describes ways of serving  
both raw data and hypertext views of data.   Remember that the most  
important duty of the provider of data is to provide the data as soon  
as possible, and raw. [ref to the blog "Give Us the Data Raw, and Give  
it to Us Now" http://blog.okfn.org/2007/11/07/give-us-the-data-raw-and-give-it-to-us-now/ 
].  Other sites and other applications can often produce hypertext and  
graphical views of the data.  Data such as calendar events, RSS  
events, bank statements, etc are much more powerfully displayed using  
multiple client-side views.

That said, the ability to dereference a URI in an existing browser and  
get meaningful results is valuable, and so provision of HTML, if it  
can be done without undue cost, is valuable. This document describes  
various ways of doing this."


Diagram before 3.1

**  The relationships are a bit vague.  I think the relationships
expressed by the arrows in the diagram are both "description".  The  
two describing documents have different content-types.  Maybe change  
the arrows to read "description", and add "read by semantic web  
applications" under the RDF and "Read by web browsers" under the HTML.


3.1 Distinguishing between web documents and real-world objects

** This section has major flaws in its argumentation.  It says "Above  
we assumed that there is a distinction between web documents  
(information resources) and real-world, non-document objects
(non-information resources). The question is where to draw the line between
them. "

That is, with respect, NOT the question.  That question is one
which has proved unproductive.  It is not fruitful to try to define
"information resource" from scratch.  The question is to distinguish
between something and a document about something.  That distinction  
has been introduced already in the document and explained well.  Now  
we have to explain that 200 means "Here is the content of the document  
you requested" and 303 means "Here is the URI of a document about the  
thing you requested".  Once that has been explained, the class
of things which get a 200 will be clear to anyone who understands the
protocol.

Later, it says "The problem now is that web documents are also part of
our perceived world, hence they are real-world objects in their own
right." But this is NOT a problem once you have thrown out
"non-information resources" and replaced it with "things".  ((For
example,   mobydick#this may denote a book, and mobydick may denote a  
library catalog card about the book. Both the book and the card are  
documents, one is about the other. That is the relationship which is  
important.))

I propose removing section 3.1

4.1. Hash URIs

** Change "and therefore cannot identify a Web document" to "and  
therefore does not necessarily identify a Web document"


The diagram just before 4.2

** Remove "303 redirect".  I hope that was a typo (copy/pasteo).

** Please add the Content-Location: headers to this diagram.

4.2. 303 URIs


** Change "to a different (information) resource which can be  
represented as a document  and can give you the information that you  
want." to "to a document which has information *about* the thing you  
asked about."

** Major technical question about the implementation of 303.   I know  
that dbpedia does it the way described, but there are a lot of good  
reasons to do it by a 303 to a generic URI for the document, which  
then itself does a conneg to RDF and HTML.

- It is no more round trips than the dbpedia way
- It gives the client a URI to bookmark which is generic. This is  
important:
- It allows the user with an RDF-capable client to bookmark the  
document, and mail it to another user (or another device) which then  
dereferences it and gets the HTML view.  This use of generic resources  
is important.
- It provides the server with the ability to add representation in new  
languages in the future.
- It is standard conneg and so probably more supported on servers

Just because the client started with the URI of a thing, it doesn't mean
that the document involved is not a first class document on the WWW.   
Best practices for this document apply.  One of these is the use of  
Generic Resources.  (See for example
http://www.w3.org/DesignIssues/Generic.html and the new ontology.)
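
The two-step pattern argued for above (303 to a generic document URI, then
ordinary conneg on that document) could be sketched roughly as follows. The
"/id/" and "/doc/" path prefixes, the file extensions and the very simplified
Accept handling are assumptions for illustration, not anything the draft or
dbpedia prescribes.

```python
# All paths are hypothetical: "/id/" for things and "/doc/" for documents
# are naming assumptions made only for this sketch.
REPRESENTATIONS = {
    "application/rdf+xml": ".rdf",
    "text/html": ".html",
}


def pick_media_type(accept):
    """Very simplified Accept handling: first listed type we can serve."""
    for item in accept.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in REPRESENTATIONS:
            return media_type
    return "text/html"


def respond(path, accept):
    """Return (status, headers) for a GET on a thing or document URI."""
    if path.startswith("/id/"):
        # A thing: point at the generic document about it.
        return 303, {"Location": "/doc/" + path[len("/id/"):]}
    if path.startswith("/doc/"):
        # A content-specific document URI is served as it stands.
        for media_type, extension in REPRESENTATIONS.items():
            if path.endswith(extension):
                return 200, {"Content-Type": media_type}
        # The generic document: negotiate directly and name the variant.
        media_type = pick_media_type(accept)
        return 200, {
            "Content-Type": media_type,
            "Content-Location": path + REPRESENTATIONS[media_type],
            "Vary": "Accept",
        }
    return 404, {}
```

The client still makes two requests, as in the dbpedia arrangement, but the
URI it ends up holding ("/doc/alice" here) is generic, so it can be bookmarked
or mailed to a client that prefers HTML. Supporting a new format means adding
one entry to the table.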


4.3 Choosing ...

**   I think a whole sentence at least could elaborate that if you use  
303 for an ontology, like FOAF, then the network delay can be  
intolerable for any client looking up a set of terms, even though the  
client has already loaded everything there is to know.
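
To make the cost concrete, here is a small sketch comparing what a client has
to do for a set of terms under each scheme. The vocabulary URIs are invented
for the example. With hash URIs the client strips the fragment and can see
that every term lives in one document; with 303 URIs it cannot know the
redirect target of a term without asking.

```python
from urllib.parse import urldefrag


def documents_to_fetch(term_uris):
    """Distinct documents a client must GET for a set of hash URIs."""
    return {urldefrag(uri).url for uri in term_uris}


def minimum_303_lookups(term_uris):
    """Each distinct slash URI needs its own request to learn its 303 target."""
    return len(set(term_uris))


# Hypothetical vocabulary terms, once as hash URIs and once as slash URIs.
hash_terms = [
    "http://example.org/vocab#name",
    "http://example.org/vocab#knows",
    "http://example.org/vocab#Person",
]
slash_terms = [
    "http://example.org/vocab/name",
    "http://example.org/vocab/knows",
    "http://example.org/vocab/Person",
]
```

Here documents_to_fetch(hash_terms) contains a single document, while the
slash form costs at least one network exchange per term, even when every 303
leads back to a document the client has already loaded.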

** The text says: "To address scalability issue with the management of  
a large set of URIs in case of the 303 solution, the usage of a SPARQL  
endpoint or comparable services is advised". Why?   There is no  
justification for this.  The 303 to an encoded SPARQL endpoint is IMHO  
clumsy and a proxied normal URI would be better.  In future, we may  
have ways of associating whole URI subtrees with a SPARQL server, but  
we don't yet.  Suggest remove the sentence or expand and explain it.

** The text says: "Note also, that both 303 and Hash can be combined,  
allowing to spread a large dataset into multiple parts and have an  
identifier for a non-document resource. An example for a combination  
of 303 and Hash is:
http://www.example.com/bob#thisBob, the person with a combined URI."
This is strange.  Where is the 303 in this?  This (bob#this) is an
important way of generating URIs, and deserves a section (insert new
4.3) of its own, for example for when databases are exposed, or for other
virtual RDF linked data spaces generated from underlying systems.
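
As an illustration of the bob#this form when exposing a database, a sketch
under assumed names: the base URI, the record keys and the "this" local name
are hypothetical. Each record gets its own document, and the thing the record
describes is named by a fragment within that document.

```python
from urllib.parse import quote, urldefrag

BASE = "http://www.example.com/"  # hypothetical base for this sketch


def thing_uri(record_key, local_name="this"):
    """URI for the thing a record describes: one document per record."""
    return BASE + quote(str(record_key), safe="") + "#" + local_name


def document_uri(uri):
    """The document a client actually fetches for a thing URI."""
    return urldefrag(uri).url
```

So thing_uri("bob") gives http://www.example.com/bob#this, and the client
fetches http://www.example.com/bob with a plain 200 and no redirect. The data
set is spread over as many documents as there are records.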

4.3 ... Conclusion

** In first para, change "grow much" to "grow out of control" or "grow  
extremely".

** Change "303 URIs should be used for large sets of data that are, or  
may grow, beyond the point where it is practical to serve all related  
resources in a single document." to
"URIs of the bob#this form can be used for large sets of data that  
are, or may grow, beyond the point where it is practical to serve all  
related resources in a single document.</p><p>
303 URIs may also be used for such data sets, making neater-looking  
URIs, but with an impact on run-time performance and server load."

** Delete the paragraph "If in doubt, it's better to use the more  
flexible 303 URI approach.".

4.5 Linking

** After the example box, change "This allows RDF-aware clients to
find a human-readable version of the resource" to "This allows
RDF-aware clients to find a human-readable resource".  (The ?x! foaf:page
is not at all guaranteed to be an HTML version of ?x! rdfs:isDefinedBy .)

** "authoritative".  In what way is the document authoritative?  When  
an ontology defines a term, then the rdfs:isDefinedBy really means
the document gives definitive information from the owner of the term.
With alice's company giving data about alice, it is not clear that
this is authoritative.  I would delete the rdfs:isDefinedBy unless
changing the example.  I am not sure though whether the semantics of
this are that closely defined.

** Add a paragraph:

"The client also can deduce similar link information directly from the  
HTTP headers: that a thing is described by the document its URI  
redirects to with a 303; that the content-location resource is a
content-specific version of the generic document, and so on.
Ontologies for these relations are not discussed here."

(Note the AWWSW group is looking at formalizing that more).
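
A sketch of the kind of deduction meant by that paragraph. The relation names
are placeholders invented for the example, since no ontology for them is
proposed here, and the headers are assumed to arrive as a dict with the exact
casing shown.

```python
from urllib.parse import urljoin

# Placeholder relation names: no ontology for these relations is given,
# so these strings are made up purely for the sketch.
DESCRIBED_BY = "describedBy"
SPECIFIC_VERSION = "contentSpecificVersion"


def links_from_response(request_uri, status, headers):
    """Return (subject, relation, object) tuples implied by one response."""
    links = []
    if status == 303 and "Location" in headers:
        # The requested thing is described by the redirect target.
        target = urljoin(request_uri, headers["Location"])
        links.append((request_uri, DESCRIBED_BY, target))
    if status == 200 and "Content-Location" in headers:
        # The generic document has this content-specific version.
        variant = urljoin(request_uri, headers["Content-Location"])
        links.append((request_uri, SPECIFIC_VERSION, variant))
    return links
```

For example, a 303 on http://www.example.com/id/alice with Location
"/doc/alice" yields one "describedBy" link from the thing to
http://www.example.com/doc/alice.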


** In the para <<This allows RDF-aware Web clients to discover the RDF
information. The approach is recommended in the RDF/XML specification
([RDFXML], section 9). If the information on the Web page differs
significantly from the RDF version, then we recommend using rel="meta"
instead of rel="alternate".>> rewrite:

<<This allows RDF-aware Web clients to discover the RDF information.  
The approach is recommended in the RDF/XML specification ([RDFXML],  
section 9). If the RDF data is *about* the web page, rather than an  
expression of the information in it, then we recommend using  
rel="meta" instead of rel="alternate".
 >>

(I think this distinction is important, and very much in line with the  
distinctions made throughout the document)


5. Examples from the web

Last line of section 5:

** Change "A better URI would be for example
http://ontoworld.org/rdf/Karlsruhe." to "A better URI would be for example
http://ontoworld.org/data/Karlsruhe."  This is a cooler URI as it allows conneg to be introduced to allow
the same data to be expressed in rdf/xml or n3 or RIF or whatever we  
think of next.

________________________________
ENDS

Received on Sunday, 24 February 2008 05:25:50 UTC