HCLS/HCLS URI matrix/GenericToolsNeedResolvableUris

Resolvable URIs work better with generic tools

Some classes of tools work better with resolvable URIs, in particular with HTTP URIs where the useful content can be retrieved by issuing a GET on the URI. Examples include web spiders such as those used by search engines (e.g., Google or Swoogle) and new kinds of browsers (e.g., PiggyBack).
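
As a rough illustration of what "resolvable" means to a generic tool, the Python sketch below decides by scheme whether a term URI can be dereferenced and builds the GET request a spider or browser might issue. The function names, the default Accept value, and the example.org and urn:example URIs are assumptions made for illustration. The request is only constructed, never sent.

```python
from urllib.parse import urldefrag, urlparse
from urllib.request import Request

# Schemes a generic web client knows how to dereference (an assumption of this sketch).
RESOLVABLE_SCHEMES = {"http", "https"}


def is_resolvable(uri: str) -> bool:
    """Return True if a generic HTTP client could attempt to GET this URI."""
    parsed = urlparse(uri)
    return parsed.scheme in RESOLVABLE_SCHEMES and bool(parsed.netloc)


def build_get(uri: str, accept: str = "application/rdf+xml, text/html;q=0.5") -> Request:
    """Build (but do not send) a GET request for the document behind a term URI."""
    if not is_resolvable(uri):
        raise ValueError(f"no generic way to dereference {uri!r}")
    # The fragment identifies the term inside the document and is never sent to the server.
    document_url, _fragment = urldefrag(uri)
    return Request(document_url, headers={"Accept": accept}, method="GET")


request = build_get("http://example.org/onto#Term")  # hypothetical term URI
```

A tool handed urn:example:term instead has nothing generic to try, which is the situation the proposals below weigh.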

Research

Issues

  • There's a distinction between using HTTP URIs for everything and using HTTP URIs in certain ways. For example, HTTP (or HTTP-like) URIs are essential for combining ontologies using owl:imports, because the importing tool has to retrieve the imported document. The URIs of the imported terms, however, do not affect the importing procedure at all (though the terms are merged with other uses of those URIs).
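
The distinction can be sketched in a few lines of Python, with triples modeled as plain tuples rather than parsed by an RDF library. The ontology content, the example.org document URIs, and the urn:example term names are all hypothetical. Only the objects of owl:imports are ever candidates for retrieval.

```python
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

# Hypothetical ontology: an HTTP-identified document whose terms are URNs.
triples = [
    ("http://example.org/onto", OWL_IMPORTS, "http://example.org/base-onto"),
    ("urn:example:Protein", RDF_TYPE, OWL_CLASS),
    ("urn:example:Gene", RDF_TYPE, OWL_CLASS),
]


def import_targets(triples):
    """URIs an importer must dereference: only the objects of owl:imports."""
    return [obj for (_subj, pred, obj) in triples if pred == OWL_IMPORTS]


def term_uris(triples):
    """Class URIs declared in the document; the importer never fetches these."""
    return [subj for (subj, pred, obj) in triples
            if pred == RDF_TYPE and obj == OWL_CLASS]
```

Here the importer needs http://example.org/base-onto to be retrievable, while the two URN-named classes are simply merged with any other statements that use the same names.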

Proposals

  • Use HTTP URIs for terms
    • Pros:
      • Natural, especially when using RDF and OWL (which encourage URI use) and HTTP-savvy clients
        • Pop the URI in your favorite web browser, when all else fails
      • Makes it easy to have one thing that supports several different classes of user. Consider ClinicalTrials.gov: by conneg (content negotiation), it can serve either HTML or a nice XML record. PiggyBack would have an easier time scraping the XML than the HTML (and perhaps they could be convinced to serve up RDF as well)
      • Reuses a lot of infrastructure
      • Sanctions an authoritative source and provides a way to get to that source
      • Fits in with several best practice recommendations (e.g., from the TAG)
    • Cons:
  • Use HTTP URIs for information about terms, while the terms themselves use some non-resolvable scheme. For example, one can publish an OWL document at a URL which contains nothing but URNs for classes, properties, and individuals. The ontology has an HTTP identifier, but nothing else described by the ontology does. A variant of this is to publish a SPARQL endpoint.
    • Pros:
      • Tools can still get at the data
      • One can cluster terms and axioms as one needs
      • Migration to new hosts/domains and mirroring is easier
      • Not as clearly biased toward an owner
      • Less commitment/perceived commitment
      • Search engines/aggregators can still aggregate statements using the term URIs
      • Some legacy systems don't need updating
    • Cons:
      • Follow your nose fails
      • There can be a disconnect between the term and the documents about the term (loss of an authoritative source for the term, although there are still authoritative sources of documents about the term)
      • Generic tool developers are turned off
      • Search engines/aggregators may have to use custom or special resolution mechanisms
      • Several of the benefits can be recovered by building on top of the HTTP infrastructure; e.g., PURL can be seen as a custom resolution service for a custom sort of URI; one could imagine an HTTP resolver that imposed other constraints on the URI forms, provided caching, and so on
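
The conneg point under the first proposal can be made concrete with a minimal Python sketch of server-side content negotiation: given a client's Accept header and the representations a server has on hand, pick one. This is a deliberate simplification that ranks by q-value only and ignores the media-type specificity rules of real HTTP negotiation. The function names and the list of available types are assumptions, not a description of how any particular site behaves.

```python
from __future__ import annotations


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's original order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_representation(accept: str, available: list[str]) -> str | None:
    """Return the first available media type the client accepts, or None."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        for candidate in available:
            exact_or_any = media_type in ("*/*", candidate)
            subtype_wildcard = (media_type.endswith("/*")
                                and candidate.startswith(media_type[:-1]))
            if exact_or_any or subtype_wildcard:
                return candidate
    return None


# Hypothetical representations a single term URI could be served as.
AVAILABLE = ["text/html", "application/xml", "application/rdf+xml"]
```

With this, a browser-style Accept header would be handed HTML, while a client asking for application/rdf+xml would get RDF from the same URI, which is the "one thing that supports several different classes of user" benefit.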

E-mails