HCLSIG BioRDF Subgroup/Tasks/URI Best Practices/Recommendations/ShorterSputnikDraft
Please see URI Note main page for link to current version.
Note on Minting, Defining, and Using URIs (short version)
- Editor: Jonathan Rees
- Authors: Jonathan Rees and David Booth (and later, others, I hope)
How to comment on this draft
For now, please put your comments on the /DraftTalk page, or if commenting on a "major issue," join the fray at one of the pages reserved for this purpose - see the list at [[/../../Recommendations]] under "Major remaining trouble spots". I am editing this file offline, so keeping comments outside of this page helps ensure that your comments are heard and tracked. I will attempt to address all concerns and record dissenting views fairly.
Status of this document
This is an editor's draft with no official standing.
I am hoping to ask HCLS at its November 1 teleconference for approval to publish some form of this note on the W3C web site. The note will still be just an editor's draft at that time.
It is my intent to publish preprints of this note on a Science Commons web site under a Creative Commons Attribution 3.0 license in advance of publication on W3C's web site. W3C's version will be published under the slightly more restrictive W3C license. The non-W3C version will be clearly marked as being a preprint.
This note attempts to address aspects of the problem of using RDF to communicate information effectively. It does not cover other uses of RDF. It is assumed that the reader will learn elsewhere about RDF and using the semantic web for their own purposes. The issue here is rather how to be a "good citizen" so that your work benefits the community by combining and interacting well with the work of others.
The primary focus is the use of URIs as terms that denote, and that may therefore be used in declarative statements. This contrasts with the conventional use of many URIs to specify communication endpoints. Sometimes a URI is used both ways, and if the two senses differ, the URI is a pun.
Besides providing general direction, one goal of this document is to give an account of URI reference and resolution that is independent of URI scheme.
This advice is presented largely without rationale. See [[/../SputnikDraft]] for a lengthier exposition (now out of date, sorry). Some motivation, examples, and elaboration will be provided in notes to follow the main text.
Please read this document in conjunction with the [[/../URI_Resolution]] vocabulary description.
Obligatory citation of RFC 2119
- term
- A URI, playing the role of something that might refer to something else. [link to justification]
- denote
- Refer to. [can I include "name" and/or "designate" as synonyms?]
- referent (of a term)
- The thing (individual, class, property, etc.) to which the term refers.
- URI owner
- See AWWW
- spelling (of a term)
- The term taken as a string (i.e. stripped of its role of referring); a URI.
- nose-following
- a heuristic method for finding definitions (DereferenceURI)
- locator
- a URI that is a candidate for nose-following
- Do some research to track down terms that are already in use that might be useful to you.
- Reuse terms when it is correct to do so (to be considerate of query engines that don't infer sameAs entailments). Be very careful about 'correct'.
- When multiple fully equivalent terms are available, choose the one that has the best definition, has the most easily located definition, and is in widest use. Balancing these criteria may require judgment. Seek advice if you're not sure.
- State an appropriate equivalence relation, such as owl:sameAs or owl:equivalentClass, when not already stated for equivalent terms. Choose the relation with care.
- Mint new terms when definitions of extant terms are nonexistent, wrong, unclear, or not as precise or specific as you need.
- Make sure that statements make sense, and in particular that the subject and object are the kinds of things that can sensibly be related by the verb (property). For example, a potato can't have an author, but a document describing a potato can.
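The potato example above can be turned into a rough sanity check. The sketch below is only illustrative: the `ex:` names, the `DOMAINS` and `TYPES` tables, and the function name are all hypothetical, and it treats a property's domain as a constraint on the subject, which is an assumption of this sketch rather than how RDFS inference works. It also ignores subclass relationships.

```python
# Hypothetical domain declarations: property -> type the subject should have.
DOMAINS = {"ex:hasAuthor": "ex:Document"}

# Hypothetical type assertions: thing -> its (single) known type.
TYPES = {"ex:potato": "ex:Vegetable", "ex:potatoArticle": "ex:Document"}


def suspicious_statements(triples):
    """Return the triples whose subject type differs from the property's domain.

    Triples with an undeclared property or an untyped subject are not flagged,
    because nothing is known about them either way.
    """
    flagged = []
    for subject, prop, obj in triples:
        required = DOMAINS.get(prop)
        actual = TYPES.get(subject)
        if required is not None and actual is not None and actual != required:
            flagged.append((subject, prop, obj))
    return flagged
```

Given one statement that a potato has an author and one that a document about the potato has an author, only the first is flagged.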
For each term that is used in an RDF document:
- The term should have a published definition.
- Cite sources defensively. If a term can be nose-followed to a stable, adequately available RDF document, then the term is its own citation. Otherwise, use the resolution ontology to cite an RDF document that a reader will be able to use to track down a definition.
- If the term's definition is unpublished or only ephemerally published - for example, if you know it only from use, but not from a definition - publish what you believe concerning the term yourself, in your document or elsewhere.
Establishing new terms
Establishing a new term requires three steps:
- Invent the term (i.e. its spelling - the way it's written)
- Define what you want it to mean
- Publish the definition so that others will know what it means
In what follows, "the URI" means "the URI that is the spelling of the term".
Minting a term
- Do not mint a term without also composing and publishing a definition.
- Mint new terms for new meanings; do not overload.
- Do not mint or define a term unless you are the URI owner (in the sense of AWWW [cite]).
- Many people encourage minting terms that are locators. Use proxy server prefixes to derive a locator from a non-locator URI (and state the intended equivalence using owl:sameAs or a resolution rule).
- The first word in a relationship-denoting term should be a verb. This is to help avoid confusion over polarity: For example, ex:hasCapital (or ex:has_capital) or ex:isCapitalOf (or ex:is_capital_of), but not ex:capital.
- If the term is to denote a property (verb), the URI must end in an XML "NCName" so that it can be used in RDF/XML. (This is a bug in the RDF/XML spec.) Roughly speaking, an NCName is a sequence of characters from the set (letters + digits + "_" + "." + "-") that begins with a letter or with "_". [cite XML spec] [footnote: encourage this for all terms. helps make turtle more concise.]
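The NCName rule in the last item can be checked mechanically. The following is a minimal sketch that follows the rough description above and is therefore ASCII-only; the real XML grammar admits many more characters. The function name is hypothetical. It asks whether some non-empty suffix of the URI has the required shape, which is what lets RDF/XML split the URI into a namespace and a local name.

```python
import re

# ASCII-only approximation: a letter or "_", then letters, digits, "_", ".",
# or "-", running to the very end of the string.
NCNAME_SUFFIX = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")


def ends_in_ncname(uri):
    """Return True if some non-empty suffix of the URI looks like an NCName."""
    return NCNAME_SUFFIX.search(uri) is not None
```

Under this approximation `http://example.com/ns#hasCapital` passes, while `http://example.com/item/42` and `http://example.com/ns#` do not.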
Composing a definition
- Compose clear definitions. Term definitions should specify single and particular usage. In particular,
- if you define a term to refer to a document (especially an RDF document) or database record, do not use the term to refer to the thing described by the document or record, or vice versa. If the thing and the document/record both need to have names, the names must be different.
- Write definitions using RDF. Provide defining prose in an rdfs:comment property, or use a well-justified alternative definition method (such as OWL statements that are fully adequate to define the term).
- Do not assume that the spelling of the URI helps to define the term. [footnote: RDF = either Turtle or an established RDF standard]
- Every definition should establish an informative type for the term's referent using appropriate RDF statements.
- When defining a term, also publish statements relating its referent to other things [but not necessarily as part of the definition -- see issues].
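As an illustration of the advice above, the sketch below emits a small Turtle definition that gives a term a type and a prose `rdfs:comment`. The function name and the example URIs used with it are hypothetical, and the comment is assumed to fit on a single line; this is one possible shape for a definition, not a prescribed one.

```python
def turtle_definition(term, type_uri, comment):
    """Return Turtle text giving the term a type and a prose definition.

    Assumes `comment` is a single line; backslashes and double quotes are
    escaped so that the literal stays well formed.
    """
    escaped = comment.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
        "\n"
        f"<{term}>\n"
        f"    rdf:type <{type_uri}> ;\n"
        f'    rdfs:comment "{escaped}" .\n'
    )


if __name__ == "__main__":
    print(turtle_definition(
        "http://example.com/ns#hasCapital",
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property",
        "Relates a country to the city that is its capital.",
    ))
```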
Publishing a definition
- Make best efforts to ensure that the definition is accessible for the lifetime of any RDF that uses the term. (but see issue [[/../AttitudeTowardMigration]].)
- If the URI is a locator, publish a definition
- at the URI's racine, if it's a #-containing URI (the 'racine' is the part before the #)
- following a "303 See Other" redirect for non-# HTTP URIs. [footnote httpRange-14]
- If the URI is not a locator, publish a definition somewhere else, and publicize the location by providing a 'defines' statement or URI resolution rule (see [[/../URI_Resolution]]) that can be included in documents that use the term.
- Once you publish a definition, other people will start using it, so the term effectively becomes community property. Never redefine a term in a way that might break or confuse others' use of the term.
- Avoid mixing mere descriptive statements with the essential statements that define a term. Instead the definition should link to a second document containing the non-definitional statements (cite Booth; say exactly how). (issue [[/../DefinitionDelineation]])
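The two locator cases above (a #-containing URI versus a non-# HTTP URI) can be summarized in a few lines. This is a sketch only; the function name and the short labels it returns are hypothetical.

```python
def definition_location(uri):
    """Return (url_to_request, expectation) for a locator URI.

    For a #-containing URI the definition is expected at the racine (the part
    before the "#"). Otherwise the URI itself is requested and a
    "303 See Other" redirect to the definition is expected.
    """
    racine, hash_mark, _fragment = uri.partition("#")
    if hash_mark:
        return racine, "racine"
    return uri, "303"
```

For instance, `definition_location("http://example.com/ns#hasCapital")` points at `http://example.com/ns`, while a URI without "#" is returned unchanged with the expectation of a redirect.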
Terms that denote and locate the same thing
The HTTP spec (cite RFC 2616) is about access to what it calls "resources". RDF was originally formulated to describe these resources, and later generalized to admit descriptions of (statements about) other kinds of things. Because these "resources" form the fabric of the semantic web, they have special status regarding publication.
- the thing, specified by a locator, to which requests are issued and from which responses are received (Pat H says "identify" instead of "specify" but I've used "specify" because most people I've spoken to find "identify" to be too loaded)
- access (a thing)
- dereference a locator for the thing
- dereference (a URI)
- talk to the specified endpoint in order to get (thing = document) or use (thing = service) a thing over a network (other relationships may be acceptable; see [[/../DenoteVsDereference]])
- Nose-following exemption: The definition need not be published at the URI, as indeed it cannot be, since the endpoint has to live at the URI.
- A statement of stability (using [[/../URI_Resolution]]) is tantamount to a definition of the term, since knowing the document's content and the fact that the content doesn't change is all you need to know about what the term is to refer to.
Here are two possible ways to define terms that denote and access the same thing:
- Thing to metadata to data
- http://example.com/spiffy# denotes the document (but does not dereference to it)
- http://example.com/spiffy denotes the document's metadata (definition) and asserts (informationally, not as a constraint) <http://example.com/spiffy#> urinote:hasAssociatedEndpointAt <http://example.com/realspiffy>
- http://example.com/realspiffy is an endpoint that is or provides the document
- Citation: All mentions of http://example.com/docdoc in RDF are accompanied by the statement <http://example.com/docdoc-meta> urinote:defines "http://example.com/docdoc"^^xsd:anyURI .
As a way of imposing discipline and providing hints about intent, we recommend the following for all HTTP URIs that are not # URIs (cite httpRange-14 resolution):
- A web server should respond with a 2xx response code only when a URI accesses what it denotes.
- An alternative to a 2xx response is 303 See Other, which should be used when the intended referent is not what is accessed.
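From the consumer's side, the two recommendations above amount to a simple reading of the response. The sketch below assumes the server follows them; the function name and the returned labels are hypothetical, and it inspects only the status code and the `Location` header value passed in, without making any network request.

```python
def interpret_response(status, location=None):
    """Classify an HTTP response to a non-# URI under the recommendations above.

    Returns a (label, url) pair:
      ("accesses-referent", None)  for 2xx: the URI accesses what it denotes
      ("see-other", location)      for 303: the referent is not what is
                                   accessed; look at `location` instead
      ("no-hint", None)            for anything else
    """
    if 200 <= status < 300:
        return ("accesses-referent", None)
    if status == 303 and location:
        return ("see-other", location)
    return ("no-hint", None)
```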
Be a cautious consumer
Assume that anything that can go wrong, will.
We've suggested three sources of definitions:
- Included definitions [when they can be identified as such; no established conventions]
- Citation via URI resolution statements (see [[/../URI_Resolution]])
- Nose-following (when it works)
If any two of these methods lead to mutually inconsistent definitions, seek an explanation and agitate for repair.
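A cautious consumer might at least detect when the sources disagree. The sketch below compares definitions by their whitespace-normalized text, which is a crude stand-in for real inconsistency checking and is purely an assumption of this example; the source labels and the function name are hypothetical.

```python
from itertools import combinations


def disagreeing_sources(definitions):
    """Return pairs of source names whose definition texts differ.

    `definitions` maps a source name (e.g. "included", "citation",
    "nose-following") to the definition text found there, or None if that
    source yielded nothing. Sources that yielded nothing are ignored.
    """
    found = {
        source: " ".join(text.split())
        for source, text in definitions.items()
        if text
    }
    return [
        (a, b)
        for a, b in combinations(sorted(found), 2)
        if found[a] != found[b]
    ]
```

A non-empty result is a cue to seek an explanation, not proof of a logical contradiction.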
Be a cautious mediator of RDF
Tools that care about accessing things (endpoints, definitions, etc.) should understand the [[/../URI_Resolution]] ontology, so that they can properly implement relocation and second sourcing.
In particular, there is often occasion to display RDF in a web browser or other user-facing interface. When arranging this, be prepared to link to definitions or other appropriate documents using browser-friendly URIs, e.g. by routing through a proxy. The term's spelling may be an inadequate locator for most browsers (e.g. urn:lsid:) or it may not lead to the correct definition. Use the resolution ontology to obtain a locator that can be used for hyperlinking.
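Routing through a proxy might look like the sketch below. The proxy prefix, the function name, and the example LSID are hypothetical, and a fixed prefix stands in for what the resolution ontology would supply. HTTP(S) terms are passed through unchanged, which does not address the case where the term's own URI leads to the wrong definition.

```python
from urllib.parse import quote

# Hypothetical proxy endpoint that takes the term's spelling as a parameter.
PROXY_PREFIX = "http://proxy.example.org/resolve?uri="


def browser_link(term):
    """Return a URI that a web browser can follow for the given term."""
    if term.startswith(("http://", "https://")):
        return term
    # Percent-encode everything that is not an unreserved character.
    return PROXY_PREFIX + quote(term, safe="")
```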
- One way to help protect against collisions over time (e.g. redefinition by a future URI owner) is to have the path component of the URI contain site version information in the form of a year or more precise date. [cite RFC 3986] [cite RFC 4151 - tag: URI]
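Two ways of carrying a date in a minted URI are sketched below: an HTTP URI with the year in its path, and a tag: URI in the style of RFC 4151, whose tagging entity is an authority name followed by a date. The function names, the path layout of the HTTP form, and the example authority are hypothetical.

```python
from datetime import date


def mint_dated_http_uri(authority, minted_on, name):
    """Return an HTTP URI whose path begins with the year of minting."""
    return f"http://{authority}/{minted_on.year}/{name}"


def mint_tag_uri(authority, minted_on, specific):
    """Return a tag: URI of the form tag:authority,YYYY-MM-DD:specific."""
    return f"tag:{authority},{minted_on.isoformat()}:{specific}"


if __name__ == "__main__":
    day = date(2007, 10, 1)
    print(mint_dated_http_uri("example.com", day, "potato"))
    print(mint_tag_uri("example.com", day, "potato"))
```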
Help most recently received from: David Booth, Chimezie Ogbuji, Pat Hayes, Alan Ruttenberg. Thank you.
- End notes
- Provide motivation
- Provide examples
- Provide elaboration
- Complete versioning story: database records, databases, ontologies
- Do RDFa documents qualify as RDF documents? I.e. can we use them as definition carriers?
- Appendix: Compare with AWWW ?