Review of SWEO "Cool URIs for the Semantic Web" from Tim Berners-Lee on 2008-02-24 (public-sweo-ig@w3.org from February 2008)

From: Tim Berners-Lee <timbl@w3.org>
Date: Sat, 23 Feb 2008 22:25:37 -0700
To: public-sweo-ig@w3.org
Cc: tag@w3.org
Message-Id: <A638B161-C182-4E93-B956-3B5A8332BB19@w3.org>

I am commenting on the Working Draft of 17 December 2007.
http://www.w3.org/TR/2007/WD-cooluris-20071217/
The comments vary in weight, but I keep them in document order. My 30
comments are marked by **. I hope they make sense.

I think this is an important document. It contains a large amount of
very valuable material, a few places which are confusing, and a small
number of places which are, I believe, actively misleading.
There are also one or two places where I disagree with the
recommendation it makes.
On the whole, though, the document is important and I hope energy is
found to incorporate these comments.

timbl
______________________________________________

1. Introduction

** Suggest add a reference to the N3 Primer as an introduction to RDF
which some find easier to get into.

2. Last para before 2.1.

** Delete the sentence "In short, to locate a Web document—hence the
term URL (Uniform Resource Locator)". (Don't go there. Don't try to
distinguish between names and locations. See for example http://www.w3.org/DesignIssues/NameMyth.html)

** I would note that "URL simply identifies whatever we see when we
type it into a browser" is simplistic, as users are aware of the
difference between links and permalinks in a blog for example.

2.1

** After

"HTTP/1.1 200 OK
Content-Type: text/html
Content-Language: en"

add
"Content-Location: alice-en.html
"

** Start a new paragraph at "...English <p> Content ..."
** Add text to show people how conneg works and how the client
understands the content-type-specific URI.

** Technical bug: The example uses 302 Found to redirect according to
the Accept: headers. This is *not* advisable IMHO. It uses an extra
round trip to no advantage. Conneg should be done directly.
I suggest replacing this example with one without the 302 "twist".
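A minimal sketch, in Python, of the direct negotiation suggested above: the generic URI is answered with 200 and a Content-Location header naming the content-type-specific version, with no 302 in between. The Accept parsing is deliberately simplified (only the q parameter is read, no specificity rules), and the variant names alice-en.html and alice.rdf are hypothetical.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media-range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((fields[0].lower(), q))
    # sorted() is stable, so ranges with equal q keep the client's order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def matches(media_range: str, media_type: str) -> bool:
    """True if a media range such as */* or text/* covers media_type."""
    if media_range in ("*/*", media_type):
        return True
    return media_range.endswith("/*") and media_type.startswith(media_range[:-1])


def negotiate(accept: str, variants: dict[str, str]) -> tuple[int, dict[str, str]]:
    """Answer a GET on the generic URI directly with 200 (no 302 detour).

    `variants` maps a media type to the URI of the content-type-specific
    version, which is reported in Content-Location.
    """
    for media_range, q in parse_accept(accept):
        if q <= 0:
            continue
        for media_type, location in variants.items():
            if matches(media_range, media_type):
                return 200, {"Content-Type": media_type,
                             "Content-Location": location}
    return 406, {}
```

With variants {"text/html": "alice-en.html", "application/rdf+xml": "alice.rdf"}, a browser sending Accept: text/html gets 200 with Content-Location: alice-en.html in a single round trip, and that header is how the client learns the content-type-specific URI.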

3. URI for Real-world objects

** In general, remove the term "non-information resource" from the
entire document. Replace it with "thing". It is wrong. It is used
misleadingly to mean "A thing, which is not necessarily an information
resource".

** In a document like this, I think it would be best to stick with
"web document" instead of "information resource" too, but that is just
for readability. It is already done in places.

** Delete "We call all these real-world objects or (according to WWW-
Arch) non-information resources." (It is a bad term, as explained
above, and the AWWW does not use it at all).

3 .. Box "1 Be on the web".

** Important architectural philosophical point.
Replace "Machines should get RDF data and humans should get a readable
representation, such as HTML." with "Machines, and humans through user
agents, should get data in RDF (and related standards). In some cases,
it may be useful to provide a view of the data in HTML for users with
conventional web browsers without data functionality."

** Add text after the box. "This document describes ways of serving
both raw data and hypertext views of data. Remember that the most
important duty of the provider of data is to provide the data as soon
as possible, and raw. [ref to the blog "Give Us the Data Raw, and Give
it to Us Now" http://blog.okfn.org/2007/11/07/give-us-the-data-raw-and-give-it-to-us-now/
]. Other sites and other applications can often produce hypertext and
graphical views of the data. Data such as calendar events, RSS
events, bank statements, etc. are much more powerfully displayed using
multiple client-side views.

That said, the ability to dereference a URI in an existing browser and
get meaningful results is valuable, and so provision of HTML, if it
can be done without undue cost, is valuable. This document describes
various ways of doing this."

Diagram before 3.1

** The relationships are a bit vague. I think the relationships
expressed by the arrows in the diagram are both "description". The
two describing documents have different content-types. Maybe change
the arrows to read "description", and add "read by semantic web
applications" under the RDF and "Read by web browsers" under the HTML.

3.1 Distinguishing between web documents and real-world objects

** This section has major flaws in its argumentation. It says "Above
we assumed that there is a distinction between web documents
(information resources) and real-world, non-document objects
(non-information resources). The question is where to draw the line between
them. "

That is, with respect, NOT the question. That question is one which
has proved unproductive. It is not fruitful to try to define
"information resource" from scratch. The question is how to distinguish
between something and a document about something. That distinction
has been introduced already in the document and explained well. Now
we have to explain that 200 means "Here is the content of the document
you requested" and 303 means "Here is the URI of a document about the
thing you requested". When that has been explained, the class of
things which get a 200 will be clear to people who understand the
protocol.
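The reading of the two status codes described above can be put as a small client-side function. This is a sketch of that reading only, not of any particular client; the result labels "document", "about" and "unknown" are hypothetical names chosen for illustration.

```python
from typing import Optional


def interpret(uri: str, status: int,
              location: Optional[str] = None) -> tuple[str, Optional[str]]:
    """Return what a GET on `uri` tells the client, under the reading above.

    ("document", uri)    200: the body is the content of the document `uri`.
    ("about", location)  303: `location` is a document about the thing `uri`,
                         which may or may not itself be a document.
    ("unknown", None)    any other outcome: no conclusion is drawn here.
    """
    if status == 200:
        return "document", uri
    if status == 303 and location is not None:
        return "about", location
    return "unknown", None
```

Nothing in this function needs a definition of "information resource": the class of things that answer 200 simply falls out of the protocol.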

Later, it says "The problem now is that web documents are also part of
our perceived world, hence they are real-world objects in their own
right." But this is NOT a problem once you have thrown out
"non-information resources" and replaced it with "things". (For
example, mobydick#this may denote a book, and mobydick may denote a
library catalog card about the book. Both the book and the card are
documents, and one is about the other. That is the relationship which
is important.)

I propose removing section 3.1

4.1. Hash URIs

** Change "and therefore cannot identify a Web document" to "and
therefore does not necessarily identify a Web document"
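The reason a hash URI need not identify a Web document is that the client removes the fragment before making the request. A minimal sketch using Python's standard urllib.parse.urldefrag; the example URI is hypothetical.

```python
from urllib.parse import urldefrag


def request_target(uri: str) -> tuple[str, str]:
    """Split `uri` into the URI actually requested and the fragment id.

    The fragment never reaches the server, so the server only ever sees
    the document URI; what the fragment denotes is left to that document.
    """
    target = urldefrag(uri)
    return target.url, target.fragment
```

For http://www.example.com/mobydick#this the client fetches http://www.example.com/mobydick and keeps "this" to itself, so the same exchange serves both the book and the card about it.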

The diagram just before 4.2

** Remove "303 redirect". I hope that was a typo (copy/pasteo).

** Please add the Content-Location: headers to this diagram.

4.2. 303 URIs

** Change "to a different (information) resource which can be
represented as a document and can give you the information that you
want." to "to a document which has information *about* the thing you
asked about."

** Major technical question about the implementation of 303. I know
that dbpedia does it the way described, but there are a lot of good
reasons to do it by a 303 to a generic URI for the document, which
then itself does a conneg to RDF and HTML.

- It is no more round trips than the dbpedia way
- It gives the client a URI to bookmark which is generic. This is
important:
- It allows the user with an RDF-capable client to bookmark the
document, and mail it to another user (or another device) which then
dereferences it and gets the HTML view. This use of generic resources
is important.
- It provides the server with the ability to add representations in new
languages in the future.
- It is standard conneg and so probably better supported by servers

Just because the client started with the URI of a thing, it doesn't
mean that the document involved is not a first-class document on the
WWW. Best practices for such documents apply. One of these is the use
of Generic Resources. (See for example http://www.w3.org/DesignIssues/Generic.html
and the new ontology)
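A sketch of the alternative proposed above: the thing's URI always gets one 303 to a generic document URI, and that generic document is then served with ordinary conneg. The path layout (/id/, /doc/, .rdf, .html) is hypothetical, and the Accept check is a crude substring test kept short for illustration.

```python
# Hypothetical layout: /id/<name> names a thing, /doc/<name> is the generic
# document about it, /doc/<name>.rdf and /doc/<name>.html are its
# content-type-specific versions.
VARIANTS = {".rdf": "application/rdf+xml", ".html": "text/html"}


def handle(path: str, accept: str) -> tuple[int, dict[str, str]]:
    if path.startswith("/id/"):
        # Same 303 whatever the Accept header: point at the generic document.
        return 303, {"Location": "/doc/" + path[len("/id/"):]}
    if path.startswith("/doc/"):
        name = path[len("/doc/"):]
        for ext, ctype in VARIANTS.items():
            if name.endswith(ext):
                # Content-type-specific URI: served as-is, no negotiation.
                return 200, {"Content-Type": ctype}
        # Generic document: plain conneg (crude substring test for brevity).
        ext = ".rdf" if "application/rdf+xml" in accept else ".html"
        return 200, {"Content-Type": VARIANTS[ext],
                     "Content-Location": f"/doc/{name}{ext}"}
    return 404, {}


def dereference(path: str, accept: str) -> tuple[int, int, dict[str, str]]:
    """Follow at most one 303; return (round trips, final status, headers)."""
    trips = 1
    status, headers = handle(path, accept)
    if status == 303:
        trips += 1
        status, headers = handle(headers["Location"], accept)
    return trips, status, headers
```

Starting from /id/alice takes two round trips for either kind of client, the same as a redirect chosen by Accept header. An RDF client that bookmarks /doc/alice and mails it to a browser user hands over a URI that yields HTML in one round trip.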

4.3 Choosing ...

** I think a whole sentence at least could elaborate that if you use
303 for an ontology, like FOAF, then the network delay can be
intolerable for any client looking up a set of terms, even though the
client has already loaded everything there is to know.
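The cost can be counted. A rough sketch, assuming the 303 responses are not cached and every term redirects to one shared ontology document; the namespace http://example.org/ont is hypothetical and is not FOAF's actual layout.

```python
from urllib.parse import urldefrag


def trips_hash(term_uris: list[str]) -> int:
    """Hash URIs: one GET per distinct document once fragments are removed."""
    return len({urldefrag(u).url for u in term_uris})


def trips_303(term_uris: list[str], redirect: dict[str, str]) -> int:
    """303 URIs: one redirect round trip per distinct term, plus one GET per
    distinct target document. Assumes the 303 responses are not cached."""
    terms = set(term_uris)
    return len(terms) + len({redirect[t] for t in terms})
```

Looking up three terms costs one request with hash URIs but four with 303 URIs, and each further term adds a round trip even though the client already holds the whole ontology.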

** The text says: "To address scalability issue with the management of
a large set of URIs in case of the 303 solution, the usage of a SPARQL
endpoint or comparable services is advised". Why? There is no
justification for this. The 303 to an encoded SPARQL endpoint is IMHO
clumsy and a proxied normal URI would be better. In future, we may
have ways of associating whole URI subtrees with a SPARQL server, but
we don't yet. I suggest removing the sentence, or expanding and explaining it.

** The text says: "Note also, that both 303 and Hash can be combined,
allowing to spread a large dataset into multiple parts and have an
identifier for a non-document resource. An example for a combination
of 303 and Hash is:
http://www.example.com/bob#this Bob, the person with a combined URI."
This is strange. Where is the 303 in this? This (bob#this) is an
important way of generating URIs, and deserves a section (insert new
4.3) of its own. It is useful, for example, when databases are exposed,
or for other virtual RDF linked data spaces generated from underlying
systems.
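A minimal sketch of generating bob#this style URIs when exposing a database: each record gets its own small document, and the thing the record describes is that document's URI plus "#this", so no 303 is needed. The base URI and row layout are hypothetical; foaf:name is used only as a familiar property.

```python
from urllib.parse import quote

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def row_to_ntriples(base: str, row: dict[str, str]) -> tuple[str, str]:
    """Map one database row to (document URI, N-Triples line).

    The record lives in the document <base>/<id>; the thing it describes
    is <base>/<id>#this.
    """
    doc = f"{base.rstrip('/')}/{quote(row['id'], safe='')}"
    thing = doc + "#this"
    # Escapes backslashes and double quotes only; a full N-Triples
    # serializer would also escape newlines and other control characters.
    name = row["name"].replace("\\", "\\\\").replace('"', '\\"')
    return doc, f'<{thing}> <{FOAF_NAME}> "{name}" .'
```

The data set is split into as many documents as there are records, yet every thing URI is resolved by a single plain GET on its document.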

4.3 ... Conclusion

** In first para, change "grow much" to "grow out of control" or "grow
extremely".

** Change "303 URIs should be used for large sets of data that are, or
may grow, beyond the point where it is practical to serve all related
resources in a single document." to
"URIs of the bob#this form can be used for large sets of data that
are, or may grow, beyond the point where it is practical to serve all
related resources in a single document.</p><p>
303 URIs may also be used for such data sets, making neater-looking
URIs, but with an impact on run-time performance and server load."

** Delete the paragraph "If in doubt, it's better to use the more
flexible 303 URI approach.".

4.5 Linking

** After the example box, change "This allows RDF-aware clients to
find a human-readable version of the resource" to "This allows
RDF-aware clients to find a human-readable resource". (The
?x!foaf:page is not at all guaranteed to be an HTML version of
?x!rdfs:isDefinedBy.)

** "authoritative". In what way is the document authoritative? When
an ontology defines a term, then rdfs:isDefinedBy really means that
the document gives definitive information from the owner of the term.
With Alice's company giving data about Alice, it is not clear that
this is authoritative. I would delete the rdfs:isDefinedBy unless the
example is changed. I am not sure, though, whether the semantics of
this are that closely defined.

** Add a paragraph:

"The client can also deduce similar link information directly from the
HTTP headers: that a thing is described by the document its URI
redirects to with a 303; that the Content-Location resource is a
content-specific version of the generic document; and so on.
Ontologies for these relations are not discussed here."

(Note the AWWSW group is looking at formalizing that more).
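A minimal sketch of the deduction in the proposed paragraph, reading a sequence of HTTP exchanges and emitting link triples. Because no ontology is chosen for these relations, the predicate names describedBy and specificVersionOf are hypothetical placeholders, as are the example URIs.

```python
from urllib.parse import urljoin

# Hypothetical predicate names: no ontology for these relations is assumed.
DESCRIBED_BY = "describedBy"
SPECIFIC_VERSION_OF = "specificVersionOf"


def header_triples(
    exchanges: list[tuple[str, int, dict[str, str]]],
) -> list[tuple[str, str, str]]:
    """Deduce links from (request URI, status, headers) exchanges, in order."""
    triples = []
    for uri, status, headers in exchanges:
        if status == 303 and "Location" in headers:
            # The thing is described by the document it redirects to.
            triples.append((uri, DESCRIBED_BY, urljoin(uri, headers["Location"])))
        elif status == 200 and "Content-Location" in headers:
            # The Content-Location resource is a specific version of the
            # generic document that was requested.
            specific = urljoin(uri, headers["Content-Location"])
            triples.append((specific, SPECIFIC_VERSION_OF, uri))
    return triples
```

Relative Location and Content-Location values are resolved against the request URI with urljoin before being used as triple terms.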

** In the para <<This allows RDF-aware Web clients to discover the RDF
information. The approach is recommended in the RDF/XML specification
([RDFXML], section 9). If the information on the Web page differs
significantly from the RDF version, then we recommend using rel="meta"
instead of rel="alternate".>> rewrite:

<<This allows RDF-aware Web clients to discover the RDF information.
The approach is recommended in the RDF/XML specification ([RDFXML],
section 9). If the RDF data is *about* the web page, rather than an
expression of the information in it, then we recommend using
rel="meta" instead of rel="alternate".
>>

(I think this distinction is important, and very much in line with the
distinctions made throughout the document)
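On the client side the distinction is easy to keep. A minimal sketch using Python's standard html.parser that collects RDF links from a page and keeps the rel value, so that "alternate" (the same information expressed in RDF) stays separate from "meta" (RDF about the page). The file names in the example are hypothetical.

```python
from html.parser import HTMLParser


class RDFLinkFinder(HTMLParser):
    """Collect (rel, href) for <link> elements that point at RDF/XML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if (a.get("type") or "").lower() != "application/rdf+xml" or not a.get("href"):
            return
        for rel in ("alternate", "meta"):
            if rel in rels:
                self.links.append((rel, a["href"]))


def rdf_links(html: str) -> list[tuple[str, str]]:
    finder = RDFLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links
```

A page with <link rel="alternate" type="application/rdf+xml" href="alice.rdf"> yields ("alternate", "alice.rdf"), which a client may treat as another expression of the page's content; a rel="meta" link is kept apart as data about the page itself.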

5. Examples from the web

Last line of section 5:

Change "A better URI would be for example http://ontoworld.org/rdf/Karlsruhe."
to "A better URI would be for example http://ontoworld.org/data/Karlsruhe."
This is a cooler URI, as it allows conneg to be introduced so that the
same data can be expressed in RDF/XML or N3 or RIF or whatever we
think of next.

________________________________
ENDS

Received on Sunday, 24 February 2008 05:25:50 UTC