SWD WG review of SWEO IG Note "Cool URIs for the Semantic Web"

Contents

Preamble
SWD WG Recommendations
Editorial issues

1. Preamble

The SWEO IG asked the SWD WG to review the Interest Group Note Cool URIs for the Semantic Web. Three members of the SWD WG agreed to review the document. Their raw comments, in chronological order, read:

At the SWD WG F2F meeting (8/9 Oct 2007) it was agreed to compile recommendations for the authors of the IG Note. These recommendations are listed below, followed by a list of detailed editorial issues.

2. SWD WG Recommendations

The general structure of the document is good. It provides reasonable solutions for many practical scenarios for the design and usage of URIs on the Semantic Web. The running examples make the document easy and enjoyable to read. The document is a logical application of the TAG finding on the httpRange-14 issue: it provides simple guidelines both for those who seek to publish RDF graphs, and vocabularies built on RDF, on the Web, and for application developers who are building agents to process those data sets.

2.1. RDF serialisation (GRDDLable documents)

The Note essentially suggests two ways to publish RDF-based descriptions on the Web. There seems to be an implicit assumption that the vocabulary is published externally, serialised in RDF/XML or a similar serialisation. The question is how to treat representations that are defined 'inline', that is, all GRDDLable formats (such as RDFa, microformats, etc.). Will the proposed recipes (hash and slash) still be applicable in this kind of setup? We recommend addressing this issue by explicitly mentioning it in the Scope section of the document.

RECOMMENDATION: At the beginning of Sec. 4 (Two good solutions), we suggest adding a clarification as follows:

The solutions described in the following apply only to deployment scenarios in which the RDF data and the HTML data are served separately, such as a standalone RDF/XML document along with an HTML document. The solutions are not applicable in a setup where the metadata is embedded in the HTML, as with RDFa or microformats, that is, GRDDLable documents in general.

2.2. Scalability Issues

Although the IG Note touches on scalability issues, we recommend stating more explicitly which kind of problem each solution applies to. A possible solution was pointed out in http://lists.w3.org/Archives/Public/public-swd-wg/2007Sep/0070.html.

RECOMMENDATION: To Sec. 4.3 (Choosing between 303 and Hash), we suggest adding a clarification after the conclusion as follows:

To address scalability issues in managing a large set of URIs with the 303 solution, the use of a SPARQL endpoint or a comparable service is advised.
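
To illustrate what this recommendation could mean in practice, here is a minimal sketch in Python (not part of the proposed wording, and not taken from the Note). It assumes a hypothetical endpoint at http://example.org/sparql that accepts queries via the standard "query" parameter; the function name describe_redirect and all example URIs are our own placeholders. Instead of maintaining one redirect target per resource, a single rule computes the 303 target for any resource URI.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint URL; an assumption made for illustration only.
SPARQL_ENDPOINT = "http://example.org/sparql"


def describe_redirect(resource_uri, endpoint=SPARQL_ENDPOINT):
    """Return (status, Location) for a 303 redirect to a DESCRIBE query.

    One generic rule covers every resource URI, so no per-URI
    configuration has to be maintained.
    """
    # Reject characters that would break out of the <...> IRI reference.
    if any(ch in resource_uri for ch in "<>") or any(ch.isspace() for ch in resource_uri):
        raise ValueError("not a usable URI: %r" % resource_uri)
    query = "DESCRIBE <%s>" % resource_uri
    return 303, endpoint + "?" + urlencode({"query": query})


# Usage: the redirect target carries the query in URL-encoded form.
status, location = describe_redirect("http://example.org/id/alice")
decoded_query = parse_qs(urlparse(location).query)["query"][0]
```

Whether a DESCRIBE query returns a useful description depends on the endpoint, so a real deployment would need to check this against the service actually used.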

3. Editorial issues

A 'Scope' section right after the Abstract would help to identify the intended audience.
In Sec. 1 you write ' ... URIs and URLs share the same syntax ... '. Please be more specific here; add references to the relevant RFCs (http://www.ietf.org/rfc/rfc2396.txt and http://www.ietf.org/rfc/rfc1738.txt).
In Sec. 1, there seems to be a logical break between paragraphs 3 and 4, IMHO.
In Sec. 1, the last paragraph could go for example in the 'Scope' section.
Sec. 4 heading - please rephrase to something less marketing-like
In Sec. 4.2 the Fig. 4 seems a bit lost. Please provide more explanation and put in context.
Sec. 4.4 needs a major rewrite. For example add a proper reference to CHIPS and explain it. See for example the Manual of Style for how to reference ...
Sec. 4.6 would definitely benefit from references and some more details ...
Sec. 6.1: the sentence 'For a more complete list, see here.' needs to be rewritten; see also noClickHere. Put a proper reference as well into the sentence 'The problems with new URI schemes are discussed at length by Thompson and Orchard.'
Sec 6.2: The sentence 'Regarding FOAF's practice of avoiding URIs for people, we agree with Tim Berners-Lee: "Go ahead and give yourself a URI. You deserve it!"' seems not appropriate to me. Though I'm with you I don't see how this fits into this section. Please reformulate.
Sec. 9: Can you please check the IPR issues? I'm not sure if this is in accordance with W3C policies.
As noted in the header, this IG Note needs to be run through the pub rules checker.
p. 4, paragraph 5: Since the AWWW is being cited did the authors consider using 'information resource' and 'non-information resource' instead of 'web document' and 'non-document resource'? The AWWW seems to prefer 'information resource' and I couldn't find any mention of 'web document' in it. A change like this would ripple out across the document. Also did the authors consider a few examples of an 'information-resource sniff test'? I think a few examples of how to determine if a resource is an information resource or a non-information resource would be useful.
p.5, paragraph 3: 'Requests for HTML would be redirected to the web page URLs we gave in section 2' might be clearer as 'HTTP requests for HTML content would be redirected to the HTML URLs we gave in section 2'.
p.9, paragraph 3: In addition to seeing the use of <link> to allow agents to discover RDF associated with an HTML document, it would be useful to see a similar example using RDFa and GRDDL. Does the fact that RDFa isn't a Recommendation yet preclude it from being used in this document?
p. 10, section 5: I would like to see dbpedia included since it uses the 303 redirect technique to link directly to a SPARQL query, and it is such a rich and evolving dataset. It would also be useful to be referred to a good hash URI real world example.
p. 14, section 9: Would the restriction against derivative works prevent this document from being used as a W3C reference document? For example the SWD had talked about possibly including portions of the document in the Recipes document or elsewhere and this license was perceived to restrict that usage.
The first paragraph of Section 1 could be a little more descriptive when mentioning the challenge of "...distributed modelling of the world with a shared data model...". If the audience of the document is (partially) not supposed to be absolutely familiar with the Semantic Web principles, the meaning of this could be explained in a little more detail.
The note about the public archive in the fifth paragraph of the "Status of this document" section is perhaps not needed: readers who are familiar with the W3C mailing lists will know about it, and others will be notified anyway when sending an e-mail there.
Maybe it would be better to change the font in the example statements presented in the second paragraph of Section 1 - for instance put "subjects" and "objects" in bold face and "predicates" in italic. This could improve the readability of this piece in line with the intended impact.
The remark on the possible MediaWiki adoption for Wikipedia at the end of Section 5 is perhaps not appropriate until the software has actually been adopted.
The following comment about the last note in Section 6.2 about personal URIs is rather "philosophical". Even though the note stems from a strongly backed opinion;), it is really questionable how to actually establish such personal URIs in an ideal, practical and systematic way. Imagine creating a URI for John Smith in a company XYZ: we can use the company's dedicated namespace to distinguish this John Smith from other John Smiths around the world. But what if there are several John Smiths in the company? We could perhaps use a namespace or URI prefix according to the departments they work in. But what if they work in the same department? Then maybe we could use a time-stamp or a respective slashed namespace inferred from the date when they joined the company. Etc... Such situations may of course rarely occur in practice for persons, but we should have a recommended way to solve them, since similar problems may also be encountered with names of products, services and so on. Thus, the document should either propose a recommended way of dealing with such problems, or be a little more "careful" and avoid such potentially questionable strict statements.
Though the overall language quality of the document is quite high, it would be good to have a native speaker do some spell-checking and language clean-up. To mention just a few things noticed at first sight even by a non-native reviewer: the fourth paragraph of Section 1 is inconsistent in the tense used (from the sentence "In the remainder of this paper..." on, future and present tense are mixed); the third paragraph of Section 6.1 contains a typo ("resourcse").
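
As an aside to the comment on p.5, paragraph 3 above, the redirect behaviour being described can be sketched roughly as follows. This is only an illustration under our own assumptions: the function names and the example.org URLs are hypothetical, the Accept parsing is deliberately simplified (wildcards such as */* are not matched), and a tie falls back to HTML by our choice rather than by anything stated in the Note.

```python
def parse_accept(header):
    """Map each media type in a simple Accept header to its q-value."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media_type] = q
    return prefs


def redirect_for(accept_header, html_url, rdf_url):
    """Return (303, Location), choosing the RDF or the HTML document."""
    prefs = parse_accept(accept_header)
    rdf_q = prefs.get("application/rdf+xml", 0.0)
    html_q = prefs.get("text/html", 0.0)
    # Assumption: HTML is the fallback unless RDF/XML is strictly preferred.
    location = rdf_url if rdf_q > html_q else html_url
    return 303, location


# Hypothetical document URLs for one resource.
HTML_URL = "http://example.org/people/alice"
RDF_URL = "http://example.org/data/alice"

browser = redirect_for("text/html,application/xhtml+xml;q=0.9", HTML_URL, RDF_URL)
rdf_agent = redirect_for("application/rdf+xml", HTML_URL, RDF_URL)
```

Here browser points at the HTML document and rdf_agent at the RDF document; both carry status 303.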

OLD STUFF

Based on

The Cool URIs for the Semantic Web IG Note will be useful to cite in documents like the SWD's Best Practice Recipes and SKOS Core Guide because it will provide sufficient background information on what the recipes are trying to accomplish. In some ways I think the document highlights the limited focus of the Recipes on RDFS and OWL vocabularies instead of RDF graphs in general.

A rather conceptual issue is not covered by the document. Maybe the solution to the following problem is implicitly "entailed" by the document, but if so, it is not very clear from the presented strategies how to actually deal with such situations. Consider the Gene Ontology (GO in the following text), which is updated monthly in many formats, RDF/XML being one of them. According to requirement number 1 of the document under review, every resource identifiable by a URI should be on the web. But it is not very clear how to actually publish all resources GO contains following any of the strategies presented in the document. As stated in the Conclusion frame in Section 4.3, the hash URI strategy is not appropriate for GO, since its RDF/XML serialisation is currently about 30MB. It is definitely neither "rather small" nor "stable" (changes may occur every month). So, should we use the 303 URI approach? Possibly yes, but this would mean that we should establish and regularly maintain tens of thousands of different URIs, one for every resource represented in GO, each of these URIs representing a generally very small piece of information (since the GO descriptions are usually rather shallow). It is questionable whether such an approach is either reasonable or practical... Maybe we could combine both approaches, which is one of the possibilities mentioned in Section 4.3 as well. However, it is not very clear from the document how we should actually combine the two approaches in order to deal optimally with situations similar to this "GO problem". In conclusion, there may be two possible solutions: either allow and discuss "off-line" exceptions from requirement number 1 (be on the web), or propose a reasonable way of combining the two approaches and give respective examples to cover the aforementioned problem. The former seems not very systematic and would possibly also require changing the "attitude" of the whole document substantially, so the latter seems more appropriate.
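
The reason the hash URI strategy scales badly for a large file can be shown with a small Python sketch. It relies only on the fact that an HTTP client removes the fragment before sending a request; the namespace http://example.org/go and the helper name document_for are hypothetical and chosen purely for illustration.

```python
from urllib.parse import urldefrag


def document_for(uri):
    """Return the URL actually requested over HTTP (fragment removed)."""
    return urldefrag(uri).url


# Hypothetical hash URIs for two terms published in one RDF/XML file.
term_uris = [
    "http://example.org/go#GO_0008150",
    "http://example.org/go#GO_0003674",
]

# Every term resolves to the same document, so looking up any single
# term means fetching the whole file.
documents = {document_for(uri) for uri in term_uris}
```

With hash URIs, documents contains exactly one URL, which is why a lookup of one shallow term description would transfer the entire 30MB serialisation.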