HCLS/HCLS URI matrix

From W3C Wiki

Comparison matrix of URI proposals

We're talking about minting URIs, or choosing among equivalent URIs, for use in RDF. See HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations for context.

Remember that there are at least two distinct issues here:

  • approach to minting new URIs (when one is the authority)
  • particular URIs to use for public database records (authority not clear)

| Feature / quality | Science Commons PURLs | W3C TAG Recommendation | LSID | 'Francois and Marc-Alexandre ideas' | UniProt PURLs |
|---|---|---|---|---|---|
| Stakeholders: publishers and tools making use of it (not only the URI scheme, but also the resolution system) | HCLS demo | many? | BioMoby, CardioSHARE, Biodiversity Information Standards (TDWG), Global Biodiversity Information Facility (GBIF) | Bio2RDF and some labs in Canada | UniProt Consortium |
| Behaves as expected by Semantic Web-naive users ('clickable' etc.) | yes | yes | no | yes | yes |
| How to set up a resolver | | | How to set up a resolver in Java; How to set up an Authority and Metadata Resolver in Perl | Main server at bio2rdf.org. Can be replicated using the package on SourceForge. | N/A |
| How to set up a client | | | How to set up a client in Perl | Is designed to be usable with generic HTTP clients/libraries. | Can use generic HTTP clients/libraries. |
| Transparency of ontology / metadata standards used | | | | If accepted by the community, an ontology could easily be built using the currently provided URIs. | ? |
| Distinguishes (for example) a class of proteins from the document describing the class of proteins | yes | in principle | in principle | yes | yes |
| Encourages re-use of URIs by independent parties (e.g. URIs do not 'look' proprietary) | Probably yes. PURLs appear to some to be independent and stable. | | LSIDs are intended to be re-used by third parties, and part of the specification describes how third parties can register themselves with the primary authority so that they can be discovered during resolution. However, the reliance on a primary authority always being available may be as much of a practical problem as it is for other URN-based schemes. | URIs contain "bio2rdf", although the functionality of the URI does not rely on the http://bio2rdf.org section. May seem less stable than a URN-based proposal, which on the surface seems more clearly planned out. | no (except for UniProt resources) |
| Identify and/or access different document formats (RDF, HTML, XML etc.) | distinct URIs | CN (content negotiation) | Resolution to data provides whatever document format was intended by the provider of the identified "thing". Resolution to metadata may provide other LSIDs pointing to the same information in other formats. | Yes. Linked to original document when available. Resulting document is RDF with clear links to any other representations without further negotiation. | Uses content-type negotiation to decide whether to forward to a web page or a machine-readable representation. |
| Capture document contracts (e.g. stability, permanence) | | | | Assumes users will cache documents if they want a guarantee, as most of the data is dynamically generated from providers on demand. | no |
| Support for versioning (of what exactly?) | | | Promise of bitstream identity (or nothing) for whatever is returned by the getData resolution method. No versioning of metadata associated with the same LSID. This is useful when LSIDs are used to identify documents or records. However, the version identifier has no explicit semantics other than that it differs from another one. | Relies on upstream providers to version their information, which is then reflected in a different identifier, possibly with a relationship to previous versions if indicated by providers. | Version numbers may appear as part of identifiers, but there is no explicit support. |
| Provide for cases where data is available in a structured, but not semantically enhanced format | | | | Based on RDFizers, which may use RDF where available but do not rely on providers changing their systems in order to use and integrate the information towards the ultimate Semantic Web goal. | |
| (add others) | | | | | |
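
The "Identify and/or access different document formats" row contrasts distinct URIs per format with content negotiation. As a rough illustration of the latter, here is a minimal Python sketch of how a server might choose between a web page and an RDF document based on the HTTP Accept header. The example.org URLs, the REPRESENTATIONS mapping and the function names are hypothetical; this is not taken from any of the compared proposals' implementations, and it ignores media-type parameters other than q.

```python
# Hypothetical representations of one database record (example.org is a placeholder).
REPRESENTATIONS = {
    "application/rdf+xml": "http://example.org/record/P12345.rdf",
    "text/html": "http://example.org/record/P12345.html",
}


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # q defaults to 1 when not given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their order in the header
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept_header, representations=REPRESENTATIONS,
                          default="text/html"):
    """Return (media_type, url) for the best match, or None if nothing fits."""
    if not accept_header.strip():
        # No Accept header: the client accepts anything, so serve the default.
        return default, representations[default]
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in representations:
            return media_type, representations[media_type]
        if media_type == "*/*":
            return default, representations[default]
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # e.g. "text/"
            for candidate, url in representations.items():
                if candidate.startswith(prefix):
                    return candidate, url
    return None


if __name__ == "__main__":
    # A browser-like client and an RDF-aware client asking for the same record.
    print(choose_representation("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_representation("application/rdf+xml"))
```

With the "distinct URIs" approach, by contrast, the client would simply request the `.rdf` or `.html` URL directly and no negotiation step would be needed.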

Discussion of this matrix

  • I think "Encourages re-use of URIs by independent parties" is the most important feature of all, although it is largely neglected. Many entities are not re-used by other ontologies simply because of their namespace. (MatthiasSamwald)
    • What exactly does "Encourages re-use of URIs by independent parties" mean? Neutral branding? Easy to discover the associated URI declaration? Resolution stability? Some combination thereof? -- DBooth
      • It should at least mean that discovery is not painful, as this is where the effort barrier to "encouraging re-use" will be broken. Neutral branding isn't as important in my opinion as it doesn't give the impression that the ontology is accountable to a certain organisation. PeterAnsell
  • RDFa is not integrated in the current proposals [JAR answers: because RDFa has to do with the representation of RDF; syntax is independent of URI choice]
  • SPARQL endpoints are not integrated in the current proposals [JAR answers: URI choice potentially affects every context of use, and there is nothing special about SPARQL as far as I know]
  • The main difficulties I have with this matrix are:
    • It is not comparing apples to apples. For example, "W3C TAG Recommendation" is very broad, and presumably would encompass (nearly?) all of the other proposals, whereas LSID is very specific. Another example: PURLs are a technique that can be used in conjunction with many other techniques. Also, a 303-redirect service can be used with other techniques.
    • It is not clear what some of the column titles mean. For example, does "W3C TAG Recommendation" imply the use of HTTP URIs?
    • The wiki markup for it is quite difficult to edit. Does anybody have suggestions for easier editing?
With that said, this matrix may be helpful in thinking things through, even if it is insufficient for direct decision-making.   -- DBooth

Other related proposals

  • 303-redirect service (David Booth)
  • Resolution ontology idea (Alan Ruttenberg)
  • SPARQL endpoint-centric idea (Matthias Samwald)
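
To make the 303-redirect idea listed above more concrete, and to show how it relates to the matrix row about distinguishing a class of proteins from the document describing it, here is a minimal Python sketch of the general HTTP pattern: a URI that names a thing answers with 303 See Other and points to a separate URI for the describing document. The /id/ and /doc/ path layout, the THINGS table and the respond function are assumptions made for illustration; this is not a description of David Booth's actual service.

```python
# Hypothetical registry of things that have URIs (not documents themselves).
THINGS = {
    "protein-class-1": "a class of proteins (hypothetical example)",
}


def respond(path):
    """Return (status, headers, body) for a request path.

    /id/<key>  names the thing itself      -> 303 redirect to its document
    /doc/<key> names the describing document -> 200 with the document body
    """
    if path.startswith("/id/"):
        key = path[len("/id/"):]
        if key in THINGS:
            # The thing cannot be sent over HTTP, so point to a document about it.
            return 303, {"Location": "/doc/" + key}, ""
        return 404, {}, ""
    if path.startswith("/doc/"):
        key = path[len("/doc/"):]
        if key in THINGS:
            body = "Document describing " + THINGS[key]
            return 200, {"Content-Type": "text/plain"}, body
    return 404, {}, ""


if __name__ == "__main__":
    status, headers, _ = respond("/id/protein-class-1")
    print(status, headers)                      # redirect to the document URI
    print(respond(headers["Location"]))         # the document itself
```

Because the thing and its document have different URIs, statements in RDF can be made about either one without ambiguity.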