HCLSIG BioRDF Subgroup/Tasks/URI Best Practices/Recommendations/DeterministicDescriptionAccess

From W3C Wiki

Proposal for deterministic access to resource descriptions

Problem: Given a URI (belonging to a certain class to be defined below), find descriptions of the "resource" it identifies.

Currently we have a bunch of hacks - methods - for doing this, each with its own problems. What I propose is to build community around a document, to be revised from time to time, that lists these various methods and defines their domains of applicability. The set of URIs for which this document defines methods is the set to which we (HCLS) agree to limit ourselves. We set a certain bar for admission to this elite set: not just the existence of description access methods, but also uniqueness, clarity of definition, stability of definition, and existence of documentation.

A second, even more elite set consists of URIs whose metadata and data (or other referent, such as a museum specimen) are assured to be persistent. I will defer consideration of that problem until later.

Definitions:

  • OK URI = unique, clearly defined, unrepurposable, documented
  • Description (of a resource) = RDF giving a defining description of the resource

See also HCLSIG_BioRDF_Subgroup/Tasks/URI_Best_Practices/Recommendations/DraftTalk

There will be a retrieval ontology [access ontology, resolution ontology] providing classes and predicates that will help us specify description access methods.

A retrieval rule is a rule for retrieving resource descriptions or document versions, expressed using the retrieval ontology. A rule specifies a set of URIs (or resources? see POWDER) to which it applies, together with the method to be used to retrieve descriptions for those URIs.
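To make the idea concrete, a retrieval rule can be thought of as a (URI domain, access method) pair, with rule sets searched in order. A minimal sketch, assuming a hypothetical rule structure and invented method names and URI patterns (none of these are part of the actual retrieval ontology, which is not yet defined):

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class RetrievalRule:
    """Hypothetical retrieval rule: a URI pattern plus an access method."""
    uri_pattern: str            # regex over the URIs this rule applies to
    method: str                 # e.g. "strip-hash", "follow-303", "sparql"
    endpoint: Optional[str] = None  # for methods that need a service URL

    def applies_to(self, uri: str) -> bool:
        return re.match(self.uri_pattern, uri) is not None

def find_rule(rules, uri):
    """Return the first rule in the set whose domain covers the URI.
    A URI covered by some rule in the root set is a POKI."""
    for rule in rules:
        if rule.applies_to(uri):
            return rule
    return None

# Illustrative root set (patterns and method names are invented examples):
root_set = [
    RetrievalRule(r"http://www\.w3\.org/1999/02/22-rdf-syntax-ns#", "strip-hash"),
    RetrievalRule(r"http://purl\.uniprot\.org/", "follow-303"),
]
```

The point of the sketch is only that rule lookup is deterministic: given the root set and a URI, there is a definite answer to "how do I get a description, if at all".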

(I will assume that if you can retrieve a document description, you can retrieve its data, since at worst the location of the data can live inside the metadata. Again, I'd like to defer consideration of this problem.)

There will be a series (through time) of well-publicized documents carrying "root" retrieval rule sets. The most recent at any given time is called "the root set". The first root set will be published with the HCLS recommendations.

I'll call a URI in the domain of a description retrieval rule in the root set a POKI, or "putatively OK URI".

The root set is designed to be fairly abstract and stable. The space of POKIs is expected to grow monotonically. Particular retrieval rules may be adjusted or dropped over time.

The root set may lead to other retrieval rule sets, according to suitable rules.

There will be some coordinated process (similar to the W3C WG recommendation process, or the handle system's authority review process, or review for journal publication) by which a new root set becomes published and publicized. Revisions are expected to be infrequent, although a revision may be sudden due, for example, to the demise (renaming, repurposing) of a server.

(Perhaps there are two series of documents, one changing infrequently and the other changing frequently.)

Of course review of a root set includes review of candidates for extension of the space. POKI is an exclusive club.

The first root set will need to cover:

  • well-established "ontologies" such as rdf, rdfs, owl, foaf, and dc
  • the retrieval ontology itself
  • the new Uniprot URIs
  • successor to Banff demo
  • handles (including DOIs) (hdl: scheme?) -- why not?
  • cooperating LSIDs (those with adequate "metadata", in RDF)

rdf:type, owl:Thing, rdfs:label etc. must end up being POKIs - otherwise we will not be able to recommend their use!

The rules for rdf:type et al. will specify either hash-stripping or 303s, since that's the convention they generally follow. (Will need to be checked for each case.) The 303 tactic only applies for non-documents, and the # tactic only applies when the stripped resource is an RDF document, so the domains of the rules will have to be carefully constructed.
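The two tactics can be sketched as follows. For a hash URI the description lives in the document obtained by stripping the fragment; for a 303-style URI the server itself supplies the redirect, so the client just dereferences and follows it. The function below only computes the document to fetch; the function name is invented for illustration:

```python
from urllib.parse import urldefrag

def description_document(uri: str) -> str:
    """Return the URI of the document to fetch for a description.
    Hash tactic: strip the fragment and fetch the enclosing RDF document.
    303 tactic: no client-side rewriting; dereference the URI itself and
    follow the 303 redirect the server returns."""
    stripped, frag = urldefrag(uri)
    if frag:
        return stripped   # hash tactic
    return uri            # 303 tactic (placeholder: server does the work)
```

For example, `description_document("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")` yields the RDF namespace document, which is why the rule domains must be constructed so the hash tactic is only applied where that stripped document really is RDF.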

Areas of the URI space that contain only documents should be pretty easy to add to the POKI space - depending on how high we set the bar for document descriptions. Most of the time I would be content to just know that a resource is a document.

We may need to provide services in order to force some URIs to be OK. E.g. if we decide we like DOIs, we have to solve the problem that DOI metadata probably isn't in RDF and probably doesn't provide rdf:types.

We'll need a validator, and we should really make an attempt to police the POKI space (in addition to just making sure POKIs are OK when first admitted to the POKI space).

I haven't worked out the details of the retrieval ontology, but I think it will write itself once we know what we want done.

We need to encourage multiple ways to access descriptions, to increase their availability.

Descriptions needn't be sanctioned by the URI owner, nor need multiple descriptions of the same resource be compatible with one another (although compatibility is certainly desirable).

Broken links (missing descriptions) should be limited to situations in which there is reason to believe that no one cares about the resource any more - for example, when there are no public mentions of it. If someone (not necessarily a publisher or "authority") cares about a resource, they should provide their own copy of the description and make that available, preferably through mechanisms provided for, perhaps indirectly, by the root set.

So how does this work?

To mint inside the POKI space, you'll already have been browbeaten into following the rules - and perhaps your actions will be checked or otherwise restrained by validation software.

To bring new URI spaces (including new domains within the http: space) into POKI-land, you'll have to sign in blood that you'll follow the rules. But after that you'll have some freedom to decide how you want the descriptions to be retrieved: SPARQL, CGI, pattern-based redirect, etc.
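The pattern-based redirect option, for instance, amounts to the publisher declaring a rewrite from its URI space to a description service. A minimal sketch, with both URI patterns invented for illustration (a real rule would be expressed in the retrieval ontology, not Python):

```python
import re
from typing import Optional

# Hypothetical declaration: description requests for URIs in this domain
# are rewritten to a CGI-style description service.
REWRITE = (r"^http://example\.org/record/(\w+)$",
           r"http://example.org/cgi-bin/describe?id=\1")

def description_url(uri: str) -> Optional[str]:
    """Apply the rewrite if the URI is in the rule's domain, else None
    (meaning this rule says nothing about the URI)."""
    pattern, template = REWRITE
    if re.match(pattern, uri):
        return re.sub(pattern, template, uri)
    return None
```

SPARQL or CGI methods would instead name an endpoint to query; the common feature is that the root set, not ad hoc client knowledge, determines which mechanism applies.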

HCLS will recommend use of POKIs exclusively. If this recommendation is followed, the result will be high-quality URIs and RDF suitable for use in publications.