
SWD WG review of SWEO IG Note "Cool URIs for the Semantic Web"


1. Preamble

The SWEO IG asked the SWD WG to review the Interest Group Note "Cool URIs for the Semantic Web". Three members of the SWD WG agreed to review the document. Their raw comments, in chronological order, read:

At the SWD WG F2F meeting (8/9 Oct 2007) it was agreed to compile recommendations for the authors of the IG Note. These recommendations are listed below, followed by a list of detailed editorial issues.

2. SWD WG Recommendations

The general structure of the document is good. It provides reasonable solutions for many practical scenarios in the design and usage of URIs on the Semantic Web. The running examples make the document easy and enjoyable to read. The document is the logical application of the TAG finding on the httpRange-14 issue: it provides simple guidelines both for those who seek to publish RDF graphs, and vocabularies built on RDF, on the Web, and for application developers who are building agents to process those data sets.

2.1. RDF serialisation (GRDDLable documents)

The note essentially suggests two ways to publish RDF-based descriptions on the Web. There seems to be an implicit assumption that the vocabulary is published externally, serialised in RDF/XML or a similar serialisation. The question is how to treat representations that are defined 'inline', that is, all GRDDLable formats (such as RDFa, microformats, etc.). Will the proposed recipes (hash and slash) still be applicable in this kind of setup? We recommend addressing this issue by mentioning it explicitly in the Scope section of the document.

RECOMMENDATION: At the beginning of Sec. 4 (Two good solutions), we suggest adding a clarification as follows:

The solutions described in the following apply only to deployment scenarios in which the RDF data and the HTML data are served separately, such as a standalone RDF/XML document alongside an HTML document. The solutions are not applicable in a setup where the metadata is embedded in the HTML, such as RDFa or microformats, that is, GRDDLable documents in general.

2.2. Scalability Issues

Although the IG Note touches on scalability issues, we recommend stating more explicitly which kind of problem each solution applies to. A possible solution was pointed out in

RECOMMENDATION: To Sec. 4.3 (Choosing between 303 and Hash), we suggest adding a clarification after the conclusion as follows:

To address scalability issues with the management of a large set of URIs in the case of the 303 solution, the use of a SPARQL endpoint or a comparable service is advised.
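
As a rough illustration of this advice (not taken from the IG Note), a server could answer every resource URI with a 303 redirect to a DESCRIBE query against a SPARQL endpoint, so that no per-URI document has to be maintained. The endpoint address, the example resource URI, and the function names below are assumptions made for this sketch only.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; a real deployment would use its own address.
SPARQL_ENDPOINT = "http://example.org/sparql"

# Characters that may not appear inside a SPARQL IRI reference (<...>).
FORBIDDEN_IN_IRI = set('<>"{}|^`\\ ')


def describe_location(resource_uri: str) -> str:
    """Build the URL of a DESCRIBE query for the given resource URI."""
    if FORBIDDEN_IN_IRI & set(resource_uri):
        raise ValueError(f"not a usable IRI: {resource_uri!r}")
    query = f"DESCRIBE <{resource_uri}>"
    # The SPARQL protocol passes the query text in the 'query' parameter.
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"


def respond_303(resource_uri: str) -> tuple[int, dict[str, str]]:
    """Return status and headers for a '303 See Other' response."""
    return 303, {"Location": describe_location(resource_uri)}


status, headers = respond_303("http://example.org/id/alice")
print(status, headers["Location"])
```

A single rule of this kind covers any number of resource URIs, which is the scalability benefit the suggested text points to. Whether the endpoint can sustain the resulting query load is a separate question.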

3. Editorial issues


Based on

The Cool URIs for the Semantic Web IG Note will be useful to cite in documents like the SWD's Best Practice Recipes and SKOS Core Guide because it will provide sufficient background information on what the recipes are trying to accomplish. In some ways I think the document highlights the limited focus of the Recipes on RDFS and OWL vocabularies instead of RDF graphs in general.

One rather conceptual issue is not covered by the document. The solution to the following problem may be implicitly "entailed" by the document, but if so, it is not clear from the presented strategies how to actually deal with such situations.

Consider the Gene Ontology (GO in the following text), which is updated monthly in many formats, RDF/XML being one of them. According to requirement number 1 of the document under review, every resource identifiable by a URI should be on the Web. But it is not clear how to publish all the resources GO contains by following any of the strategies presented in the document. As stated in the Conclusion frame in Section 4.3, the hash URI strategy is not appropriate for GO, since its RDF/XML serialisation is currently about 30 MB. It is neither "rather small" nor "stable" (changes may occur every month). So, should we use the 303 URI approach? Possibly yes, but this would mean establishing and regularly maintaining tens of thousands of different URIs, one for every resource represented in GO, each of them identifying a generally very small piece of information (since GO descriptions are usually rather shallow). It is questionable whether such an approach is either reasonable or practical.

Maybe we could combine both approaches, which is one of the possibilities mentioned in Section 4.3 as well. However, it is not clear from the document how the two approaches should be combined in order to deal optimally with situations similar to this "GO problem". In conclusion, there are two possible solutions: either allow and discuss "off-line" exceptions from requirement number 1 (be on the Web), or propose a reasonable way of combining the two approaches and give corresponding examples that cover the problem described above. The former does not seem very systematic and would probably also require a substantial change to the "attitude" of the whole document, so the latter seems more appropriate.
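
To make the trade-off behind the "GO problem" concrete, the following minimal sketch shows which document an HTTP client actually requests for hash URIs and for slash (303) URIs. The `example.org` URIs are hypothetical and are not the identifiers GO really uses. The only fact relied on is that the fragment part of a URI is stripped before the HTTP request is made.

```python
from urllib.parse import urldefrag


def document_to_fetch(uri: str) -> str:
    """Return the URL an HTTP client requests when dereferencing `uri`."""
    url, _fragment = urldefrag(uri)  # the fragment never reaches the server
    return url


# Three term numbers, formatted as seven-digit identifiers.
term_numbers = (8150, 3674, 5575)

# Hypothetical hash URIs: every term lives in one shared document.
hash_uris = [f"http://example.org/go#GO_{n:07d}" for n in term_numbers]

# Hypothetical slash URIs: every term is a separate request target.
slash_uris = [f"http://example.org/go/GO_{n:07d}" for n in term_numbers]

hash_documents = {document_to_fetch(u) for u in hash_uris}
slash_documents = {document_to_fetch(u) for u in slash_uris}

print(hash_documents)        # a single shared document URL
print(len(slash_documents))  # one request target per term
```

With hash URIs, looking up any single term retrieves the whole shared document (about 30 MB in the GO case). With slash URIs, each term is its own request target, which is where the concern about maintaining tens of thousands of URIs comes from.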