HCLSIG BioRDF Subgroup/Tasks/URI Best Practices/Use Cases
RDF Use Cases
My (JAR's) current thinking is that our use of RDF varies along several axes. Here are three of them.
Breadth of coverage (I'm not keen on the term, suggest another) means how many triples are considered in calculating a result set or visualization. Points along this spectrum:
- Navigation - following links by examining discrete information sources (web pages) one at a time.
- Knowledge base query - answering questions using information drawn from a specified bulk information source or specified federation of semi-independent sources.
- Web query - answering questions covering information sources discovered through web search or similar open-ended process.
Technical durability means how long the RDF is supposed to last, and I would articulate at least three points on this spectrum:
- Ephemeral - the RDF is consumed soon after it's written, and never consulted again. For example, it is automatically generated by a server, then used in a browser or end user application, but never stored.
- Current - the RDF may be stored, for example on a static web page, but is at risk of becoming untrue, meaningless, or otherwise unusable because it uses URIs whose meaning or (when applicable) 200 responses are unstable or may go away. This is analogous to the web, in which 404 responses are expected. Such a page must be maintained - tended to so as to keep it alive - or it will perish.
- Archival - the RDF may be written to write-once memory, such as a CD-ROM, Genbank, or a scholarly article, and there is a reasonable expectation that it can still be understood by someone many years in the future using ordinary tools (not heroic measures).
Semantic durability means how long the RDF is likely to last in the context of future curation. Will the RDF resource be useful for combination with new, unanticipated resources far into the future, or will it have to be rewritten periodically to account for new relationships and new classes? Will it be robust as new knowledge is built around it? I would articulate at least these cases:
- Quick and dirty - the author just wants to get something rendered quickly so as to use a body of information with semantic web tools, or to combine it with existing information for immediate purposes. Automatic database exports would fit into this category.
- Informal - using RDF to refine ideas, as a complement to exploration or analysis processes, or to draft material to be developed more seriously later.
- Rigorous curation - an effort is made to use clear definitions, so that future curators can know with confidence whether and when a URI should be used and what RDF using the URI means; to use community-based definitions, so that the material relates to what has come before; to use well thought out ontologies that have a sound footing in known good practice, and a growth path; to rely on logical systems that permit the detection of nonsense.
URI Use Cases
Among the many things that a URI might denote in Health Care and Life Sciences:
- Entries in a database
- Journal articles
- Parts of a body
- Events (a specific surgery)
- The DNA sequence of some organism
- Annotation of that DNA sequence marking the stretches that are translated into protein
- Places where DNA from other individuals of the same species tends to differ
- An X-ray of someone's chest
- The various parts of the description of that person's condition
- A movie of growth of a cell culture in some condition
- Terms in a controlled vocabulary
Among the use cases we have:
- You want to publish a paper in which you identify one of the above things and want to enable others to retrieve it and information about it
- You want to make a local copy of some information and know it is the same as the original
- You are making assertions about some of the above things in an ontology
- Information associated with the thing will be used in a larger computation
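The local-copy use case can be approached in several ways. The following is a minimal sketch of one of them, under the assumption (not made anywhere on this page) that the publisher of the original also publishes a SHA-256 digest of its bytes. The function names `digest` and `same_as_original` are hypothetical. Note that this only detects byte-for-byte identity; it says nothing about two different serializations of the same information.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def same_as_original(local_copy: bytes, published_digest: str) -> bool:
    """Check a local copy against a digest published with the original.

    Assumption: the digest was computed over the exact bytes of the original.
    """
    return digest(local_copy) == published_digest


if __name__ == "__main__":
    original = b"ACGTACGT"            # stand-in for the original record
    published = digest(original)      # what the publisher would advertise
    print(same_as_original(b"ACGTACGT", published))
    print(same_as_original(b"ACGTACGA", published))
```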
We notice that sometimes the URI:
- Refers to an entity that changes, e.g. the measured DNA sequence of a gene (a limitation of the measurement technology).
- Refers to an object that doesn’t change, e.g. an image.
- Refers to an object that doesn’t change, but for which some information about it can, for example the functional annotations associated with proteins.
- Refers to a data object that doesn't make sense to view in a web browser
- Matthias Samwald: Is the distinction between group 1 (entity that changes, e.g. gene) and group 3 (entity that doesn't change, but metadata about it does) an important one? I don't think so.
- Matthias Samwald: Another (important) distinction is the distinction between URIs that are mere symbols for real-world objects (like a gene) and URIs that can be resolved to some non-RDF data through HTTP GET or any other mechanism. An example of the second group would be a JPEG picture at a given URL. When RDF was invented it was mostly intended to be used with URLs of the second group. Nowadays, almost all of the use cases produce URLs of the first group - and there are no standards, or even informal practices, for distinguishing between the two.
One idea of mine would be to define a class for each category of URLs of the second kind. Let us call this the "ontology of resolvable resources". For example, we could have an OWL class called "something-resolvable-through-HTTP", and all of the URIs that yield some data when resolved via HTTP (e.g. our JPEG picture) are instances of this class. I think this would be very useful!
- Alan Ruttenberg: Nice idea - I had been thinking of something similar. Don't ever expect to resolve the URI that you use for a concept name. Instead, supply extra properties that tell you explicitly how to resolve it. This way you allow for the possibility of resolving the URI in different ways - for metadata, for user-friendly HTML, etc. For example, suppose we had the following convention: Class ResolveBySubstitution. Each property of this class would name a type of information to be retrieved, and the value would be the pattern into which the URI is inserted in order to resolve it. Suppose we wanted to set up a server so that, given a URI (http://my.com/2006#one), it provides the following services:
- html pretty page: http://resolve.my.com/http://my.com/2006#one?userview
- rdf fragment of the definition: http://resolve.my.com/http://my.com/2006#one?definition
- ontology the uri is in: http://resolve.my.com/http://my.com/2006#one?ontology
- the image bits: http://resolve.my.com/http://my.com/2006#one?picture
- Then you would define your class with the following slots (@@@ is the substitution pattern):
- With this setup, we have more flexibility, but can also emulate your pattern. So something-resolvable-through-HTTP has a pattern which is just "@@@". Perhaps we specify that by default, when no specific type of information is requested, a user agent tries to resolve the URI as is. But when a specific type of information is requested, you use the patterns above. To accommodate the LSID idea, we further specify that if people want to use LSID-like things (URLs with suffix .lsid), they need to have their servers return this class before proceeding to do further resolution. Really it is an ontology of data/metadata and rules for how to resolve them.
- Chimezie Ogbuji: I think such a class would essentially be hijacking what a transport-level mechanism, content negotiation, was meant to provide: retrieving various representations of the same URL. Of course, it wouldn't be too useful for a client to blindly dispatch GET requests with Accept headers listing all the representations of interest for a given URL, so the class could provide a mime-type registry (one per representation for a resolvable URI). The use of the class indicates that the URI is resolvable, and the mime-type registry lists the services/representations/modalities for the location. Where a mime-type wouldn't suffice for a 'service' - the ontology of a URI, for instance - such a mapping could be described using the substitution pattern you propose.
:ResolveableConcept a owl:Class;
    rdfs:subClassOf
        [ a owl:Restriction;
          owl:onProperty atomOwl:mimeType;
          owl:allValuesFrom :MimeType ],
        [ a owl:Restriction;
          owl:onProperty :otherRepresentation;
          owl:allValuesFrom :ResolutionBySubstitution ].

:ResolutionBySubstitution a owl:Class;
    rdfs:subClassOf
        [ a owl:Restriction;
          owl:onProperty :pattern;
          owl:allValuesFrom xsd:string ],
        [ a owl:Class;
          owl:unionOf (:OntologicalInformation :Image .. other non-mime representations..) ].

#This could instead be the mime-type string instead of the mime-type as a URI
#for many reasons, including the additional complexity for DL reasoning introduced by the use of a nominal (owl:oneOf)
#But this probably isn't relevant for this use case
:MimeType a owl:Class;
    owl:oneOf (.. all mime-types as URIs - is this feasible? .. )
- David Booth: Of course, what Sam is calling a "resolvable URI" sounds very much like what the TAG calls an "information resource" URI, though I agree with Jonathan Rees that the TAG's definition of "information resource" is currently wrong. (I hope it will eventually get corrected.)
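The substitution convention discussed in the thread above can be sketched in a few lines. This is only an illustration: the slot names (`userview`, `definition`, `ontology`, `picture`) and their patterns are assumptions inferred from the four example URLs, and the `encode` option is an addition that is not part of the proposal. It is included because, when the URI is inserted literally, everything after its "#" (including "?userview") is parsed as a fragment, which a user agent does not send to the server.

```python
from urllib.parse import quote, urlsplit

# Hypothetical slots, inferred from the example URLs in the thread.
# "@@@" marks the place where the concept URI is substituted.
PATTERNS = {
    "userview": "http://resolve.my.com/@@@?userview",
    "definition": "http://resolve.my.com/@@@?definition",
    "ontology": "http://resolve.my.com/@@@?ontology",
    "picture": "http://resolve.my.com/@@@?picture",
}
DEFAULT_PATTERN = "@@@"  # no specific information requested: use the URI as is


def resolve(uri, kind=None, encode=False):
    """Build the URL to fetch for a given kind of information about `uri`.

    With encode=True (an assumption, not in the proposal) the URI is
    percent-encoded before substitution so that its "#" survives.
    """
    if kind is None:
        return DEFAULT_PATTERN.replace("@@@", uri)
    value = quote(uri, safe="") if encode else uri
    return PATTERNS[kind].replace("@@@", value)


if __name__ == "__main__":
    concept = "http://my.com/2006#one"
    literal = resolve(concept, "userview")
    encoded = resolve(concept, "userview", encode=True)
    print(literal, "-> fragment:", urlsplit(literal).fragment)
    print(encoded, "-> query:", urlsplit(encoded).query)
```

Whether the resolver should receive the URI literally or percent-encoded is a design choice the thread leaves open; the sketch only shows what each option produces.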
See also: the infoatom ontology from Reto Bachmann-Gmuer, which attempts to capture the difference between resources and their representation(s).
Somewhat related, there is a proposed standard for URL mapping that overlaps with some of the issues here:
WebCap: Web-based Capability List File Format and Model