W3C

SKOS Use Cases and Requirements

W3C Editors' Draft 17 January 2007

This version:
http://www.w3.org/2006/07/SWD/???/usecases/200701??
Latest version:
http://www.w3.org/2006/07/SWD/???/usecases
Previous version:
<previous version uri>
Editors:
Antoine Isaac, Vrije Universiteit Amsterdam, aisaac@few.vu.nl
Daniel Rubin, Stanford Medical Informatics, dlrubin@stanford.edu
Jon Phipps, Cornell University, jphipps@madcreek.com

Abstract

This document provides use cases for SKOS.

Status of this document

This is an internal draft produced by the Semantic Web Deployment Working Group [SWD].

This document is for internal review only and is subject to change without notice. This document has no formal standing within the W3C.

Table of contents



1 Introduction

Knowledge organization systems play a fundamental role in information organization and access. Worldwide, people use specific vocabularies to describe their resources, organize their websites, and so on. SKOS is intended to provide a model for representing and using these vocabularies in the framework of the Semantic Web.

1.1 An overview of the use cases

If needed...

Use Case #1 —

Use Case #2 —

Use Case #3 —

etc.

2 Use Cases

2.1 Use Case #1 — Geographical web service for hierarchical browsing

(Complete description available at http://www.w3.org/2006/07/SWD/wiki/EucTgnDetailed)

This web service (http://www.digipark.at/webservice/documentation/) enables users to select subtrees from the TGN vocabulary, for indexing and search purposes in the context of distributed content management systems. With this service, a user can ask for a concept by typing in a string, which is matched against the concept's associated labels. The returned subtree can be limited to a given depth and to a maximum number of displayed concepts.

The service can potentially be applied to any monohierarchically structured vocabulary. It has, however, been developed primarily for the Getty Thesaurus of Geographic Names (TGN, http://www.getty.edu/research/conducting_research/vocabularies/download.html), which provides a link from each child concept to its parent.

TGN features a hierarchical vocabulary of around 1.1 million names, together with coordinates and other information for around 892,000 geographic places, as exemplified here:

Italia

        ID: 1000080

        Path: /Top of the TGN hierarchy/World/Europe/Italia

        Place Types: primary political unit | independent sovereign nation
        | republic | nation | dictatorship | kingdom | inhabited region

        Parent: 1000003

        Description: Inhabited since 50,000 BCE; settled by Indo-Europeans
        1850 BCE, Etruscans 1600 BCE, and Greeks 800 BCE; united by Romans
        270 BCE; independent states rose after fall of Holy Roman Empire,
        notably Naples, Milan, Florence, Venice and papacy; reunited in the
        19th century; official language is Italian, though significant
        minorities speak German, French, and Slovene.

        Coordinates: 12.8333E 42.8333N

        Elevation: 0
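
The subtree selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual web service: the record layout, the function names and all identifiers other than those of the Italia entry (1000080, parent 1000003) are hypothetical, and "Europe" is attached to 1000003 on the basis of the Path shown above.

```python
from collections import defaultdict, deque

# Hypothetical records: (concept id, label, parent id).
# Only 1000080/1000003 come from the example above; the "X-..." ids are made up.
RECORDS = [
    ("1000003", "Europe", None),
    ("1000080", "Italia", "1000003"),
    ("X-1", "Lazio", "1000080"),
    ("X-2", "Toscana", "1000080"),
    ("X-3", "Roma", "X-1"),
]


def build_children_index(records):
    """Invert the child-to-parent links into a parent-to-children index."""
    children = defaultdict(list)
    for concept_id, _label, parent_id in records:
        if parent_id is not None:
            children[parent_id].append(concept_id)
    return children


def find_by_label(records, text):
    """Return the id of the first concept whose label matches the typed string."""
    wanted = text.casefold()
    for concept_id, label, _parent_id in records:
        if label.casefold() == wanted:
            return concept_id
    return None


def select_subtree(records, root_label, max_depth, max_concepts):
    """Breadth-first walk below the matched concept, bounded in depth and size."""
    root = find_by_label(records, root_label)
    if root is None:
        return []
    children = build_children_index(records)
    selected = []
    queue = deque([(root, 0)])
    while queue and len(selected) < max_concepts:
        concept_id, depth = queue.popleft()
        selected.append((concept_id, depth))
        if depth < max_depth:
            for child in children[concept_id]:
                queue.append((child, depth + 1))
    return selected
```

Because the vocabulary is monohierarchical, a single parent link per concept is enough to rebuild the whole tree; a polyhierarchical vocabulary would need a list of parents per record.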

2.2 Use Case #2 — Iconclass Iconographical Vocabulary

(Complete description available at http://www.w3.org/2006/07/SWD/wiki/EucIconclassDetailed)

Iconclass (www.iconclass.nl) contains 28,000 items used to describe the subjects of an image (persons, events, abstract ideas). Complete versions are available in English, German, French and Italian, and partial translations in Finnish and Norwegian.

The main building blocks of Iconclass are subjects, used to describe the subjects of images. An Iconclass subject consists of a notation (an alphanumeric identifier used for annotation) and a textual correlate (e.g. “25F9 mis-shapen animals; monsters”). Subjects are organized in hierarchical trees.

2 Nature

25 earth, world as celestial body

25F animals

25F(+) KEY

25F1 groups of animals

….

25F9 mis-shapen animals; monsters

25FF fabulous animals (sometimes wrongly called 'grotesques'); 'Mostri' (Ripa)

Subjects can have associative cross-reference links between them (systematic references) and are linked to keywords that are used to access them. Keywords form a network of their own, with See, See also and translation links between them.

Iconclass additionally provides mechanisms (auxiliaries) for subject specialisation at indexing time: keys, queues of keys, structural digits, double letters and bracketed text. These actually allow for collection-specific extension, by specializing a conceptual "placeholder" into a named individual (11H(…) saints gives 11H(ALISTAIR MILES)) or combining an existing subject with a non-independent concept (25F animals gives 25F(+33) head of an animal).

The vocabulary is maintained by manually editing semi-structured source files. As a general rule, the standard version is only changed conservatively, without modifying existing subjects.
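
To make the two quoted extension forms concrete, the following sketch splits a notation into its base subject and its auxiliary part. It is a simplification under stated assumptions: the regular expression and the function name are hypothetical, and only keys written as "(+...)" and bracketed text written as "(...)" are recognised. Queues of keys, structural digits and double letters are not analysed.

```python
import re

# Assumed, simplified shape: an alphanumeric base, optionally followed by
# one parenthesised auxiliary. Real Iconclass notations are richer than this.
NOTATION = re.compile(r"^(?P<base>[0-9A-Z]+)(?:\((?P<aux>[^()]*)\))?$")


def parse_notation(notation):
    """Split a notation into base subject, auxiliary kind and auxiliary text."""
    match = NOTATION.match(notation.strip())
    if match is None:
        raise ValueError(f"unsupported notation: {notation!r}")
    base, aux = match.group("base"), match.group("aux")
    if aux is None:
        kind = "plain"
    elif aux.startswith("+"):
        kind = "key"  # e.g. 25F(+33): subject combined with a key
    else:
        kind = "bracketed text"  # e.g. 11H(ALISTAIR MILES): named individual
    return {"base": base, "kind": kind, "aux": aux}
```

An application could use the base part to attach a locally extended notation to the standard subject it specialises.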

2.3 Use Case #3 — An integrated view to medieval illuminated manuscripts

(Complete description available at http://www.w3.org/2006/07/SWD/wiki/EucManuscriptsDetailed)

The purpose of this application is to provide the user with access to two collections of illuminated manuscripts from different libraries (accessible online at http://www.kb.nl/manuscripts and http://mandragore.bnf.fr). The descriptions of these two collections follow different metadata schemes, and contain values from different controlled vocabularies for subject indexing. The user should however be able to search for items from the two collections using their preferred point of view, either using vocabulary from collection 1 or vocabulary from collection 2.

The main feature of the application is collection browsing, which uses hierarchical links in vocabularies: if a concept matching a query has subconcepts, the documents indexed against these subconcepts should be returned. The application also uses mapping links between the two vocabularies. The elements from the two vocabularies have been mapped together and, depending on the type of link between the query concept and a mapped concept, documents indexed by the mapped concept shall be returned as well.

Additionally, the application enables search based on free text queries: documents can be retrieved based on free-text querying of the different fields used to describe the documents (creator, place, subject, etc.). For the subject indexing, if a text query matches the label of a controlled vocabulary concept, the documents indexed against this concept will be returned.

The two collections use the Iconclass and Mandragore analysis vocabularies, respectively (Iconclass is described in section 2.2).

Mandragore contains 16,000 subjects with French labels. Most of them, the descriptors used to describe the illuminations, form a flat list. Some structure is given by topic classes which organize the descriptors according to general domains:

BIBLE ET APOCRYPHES

TECHNIQUES, SCIENCES APPLIQUEES

ZOOLOGIE

.zoologie (généralités)

.mollusques

.crustacés

bernard.l'ermite [crustacé]

crabe [crustacé]

crevette [crustacé]

écrevisse [crustacé]

A descriptor is specified by a label (“cochon”), optional rejected forms (“porc”), an optional definition (“mammifère ongulé”), and a reference to one or more topic classes (“.mammifères”). A note can sometimes be found.

To enable integrated browsing, elements from the Mandragore and Iconclass vocabularies shall be linked together using equivalence or specialization correspondence links, e.g.:

“25F72 molluscs” (Iconclass) is equivalent to “mollusques” (Mandragore)

"25F711 insects" (Iconclass) is more specific than "autres invertébrés (vers,arachnides,insectes...)" (Mandragore)

"25F(+441) herd, group of animals" (Iconclass) is equivalent to “troupeau” (Mandragore)

“11U4 Mary and John the Baptist together with (e.g. kneeling before) the judging Christ, 'Deesis' ~ Last Judgement” (Iconclass) is equivalent to the combination “s.marie”, “s.jean.baptiste”, “christ” and “jugement.dernier” (Mandragore)
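
The combined use of hierarchical and mapping links can be illustrated with a small sketch. Everything here is an assumption made for illustration: the "ic:" and "mandragore:" prefixes, the document identifiers and the function names are hypothetical, topic-class membership is treated as a hierarchical link, and the policy of following equivalence links in both directions but specialization links only towards the more specific concept is one possible choice. Mappings to a combination of concepts (such as the 11U4 example) are not handled.

```python
# Hierarchical links inside one vocabulary (topic class -> descriptors).
NARROWER = {
    "mandragore:.crustacés": ["mandragore:crabe", "mandragore:crevette"],
}

# Mapping links between vocabularies, taken from the examples above.
EQUIVALENT = {("ic:25F72", "mandragore:mollusques")}
# (more specific concept, more general concept)
MORE_SPECIFIC = {("ic:25F711", "mandragore:autres invertébrés")}

# Hypothetical subject index: concept -> documents indexed against it.
DOCS = {
    "ic:25F72": ["kb-1"],
    "mandragore:mollusques": ["bnf-1"],
    "ic:25F711": ["kb-2"],
    "mandragore:autres invertébrés": ["bnf-2"],
    "mandragore:crabe": ["bnf-3"],
}


def expand(concept):
    """Collect the query concept plus everything reachable through
    narrower, equivalence and 'more specific than' links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(NARROWER.get(current, []))
        for left, right in EQUIVALENT:
            if left == current:
                stack.append(right)
            elif right == current:
                stack.append(left)
        for specific, general in MORE_SPECIFIC:
            if general == current:
                stack.append(specific)
    return seen


def search(concept):
    """Return the documents indexed against any concept in the expansion."""
    return sorted(doc for c in expand(concept) for doc in DOCS.get(c, []))
```

With this policy, a query on the Mandragore concept "autres invertébrés" also returns items indexed with Iconclass 25F711, whereas a query on 25F711 does not return items indexed only with the broader Mandragore concept.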

2.4 Use Case #4 — Bio-zen ontology framework for representing scientific discourse in life science

(Complete description available at http://www.w3.org/2006/07/SWD/wiki/EucBiozenDetailed)

Bio-zen (http://neuroscientific.net/index.php?id=43) is intended to be used by researchers and developers in the life sciences. It allows the description of biological systems and the representation of scientific discourse on the web in a highly distributed manner.

The bio-zen framework will consist of several applications, especially Semantic Wikis. A Bio-zen ontology incorporates constructs to make statements about digital information resources, that is creating "concept tags". This concept-tagging is an important feature of bio-zen, because it eases the integration of information from different sources.

SKOS is used in bio-zen for the representation of many existing life sciences vocabularies, taxonomies and ontologies coming from the "Open Biomedical Ontologies" (OBO) collection (http://www.fruitfly.org/~cjm/obo-download/). The size of all converted taxonomies taken together is in the order of millions of concepts. Typical examples are the Gene Ontology or Medical Subject Headings (MeSH), an entry of which is displayed here:

[Term]

id: MESH:A.01.047.025

name: abdominal_cavity

def: "The region in the abdomen extending from the thoracic DIAPHRAGM to the plane of the superior pelvic aperture (pelvic inlet). The abdominal cavity contains the PERITONEUM and abdominal VISCERA\, as well as the extraperitoneal space which includes the RETROPERITONEAL SPACE." [MESH:A.01.047.025]

synonym: "abdominal_cavity"

synonym: "cavitas_abdominis"

is_a: MESH:A.01.047 ! abdomen

To represent such vocabulary elements as well as other types of information, the existing SKOS model has been integrated into a single OWL ontology, together with the DOLCE foundational ontology and the Dublin Core metadata model. In the process, the SKOS model has been extended with special types of concepts, e.g. biozen:sequence-concept. It is important to note that, to enable efficient reasoning over the available dataset, the existing constructs have been made compatible with the OWL-DL language.

2.5 Use Case #5 — Semantic search service across mapped multilingual thesauri in the agriculture domain
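
As an illustration of how an entry like the one above might be carried over to SKOS, the sketch below parses a shortened copy of the stanza and maps its tags to SKOS property names. The mapping (name to skos:prefLabel, synonym to skos:altLabel, is_a to skos:broader) is an assumption for the purpose of the example, not the conversion actually used by bio-zen; the function names are hypothetical, and the "def" tag is left out because its quoting needs a fuller parser.

```python
# Shortened copy of the entry shown above (the "def" line is omitted).
STANZA = """[Term]
id: MESH:A.01.047.025
name: abdominal_cavity
synonym: "abdominal_cavity"
synonym: "cavitas_abdominis"
is_a: MESH:A.01.047 ! abdomen
"""


def parse_stanza(text):
    """Read 'tag: value' lines into a dict of lists, dropping '! comment' parts."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue
        tag, _, value = line.partition(": ")
        value = value.split(" ! ")[0].strip().strip('"')
        fields.setdefault(tag, []).append(value)
    return fields


def to_skos_like(fields):
    """Assumed mapping from stanza tags to SKOS property names."""
    pref = fields["name"][0]
    return {
        "concept": fields["id"][0],
        "skos:prefLabel": pref,
        # A synonym identical to the preferred label adds nothing as altLabel.
        "skos:altLabel": [s for s in fields.get("synonym", []) if s != pref],
        "skos:broader": fields.get("is_a", []),
    }
```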

(Complete description available at http://www.w3.org/2006/07/SWD/wiki/EucAimsDetailed)

This application, coming from the AIMS project (http://www.fao.org/aims), is a semantic search service that makes use of mapped agriculture thesauri. It allows users to search any available terminology in any of the languages in which the thesauri are provided, and to retrieve information from resources which may have been indexed by one of the mapped vocabularies. Typical functions are navigating resources, helping to build boolean searches via concept identification, or expanding given searches with extra languages or synonyms.

The service builds on several agriculture vocabularies: the Agrovoc Thesaurus (http://www.fao.org/aims/ag_intro.htm), the Agris/Caris Classification Scheme (ASC), the FAO Technical Knowledge Classification Scheme (TKCS), the subjects from the FAOTERM vocabulary, etc.

Agrovoc contains 35,000 terms in 12 languages (not all the languages feature the same translated terms however), while ASC, TKCS and FAOTERM range between 100 and 200 categories coming in the 5 official FAO languages. Agrovoc terms consist of one or more words and always represent one and the same concept. Terms are divided into descriptors and non-descriptors, the former being the only ones currently used for indexing. For each descriptor, a word block is displayed showing the relation to other terms: BT (broader term), NT (narrower term), RT (related term), UF (non-descriptor). There are also scope notes, used to clarify the meaning of both descriptors and non-descriptors.

Term code: 1939

Term label: EN : Cows, FR : Vache, ES : Vaca, AR : بقرات , ZH : 母牛 , PT : Vaca, CS : krávy, JA : 雌牛 , TH : แม่โค , SK : kravy, DE : KUH

BT : Cattle (code 1391)

NT : Suckler cows, Dairy cows (26767, 36875)

RT : Heifers, Cow milk, Milk yielding animals, Females (3535, 4833, 15969, 16080)

SNR : Females (15969)

Scope Note : Use only for cattle and zebu cattle; for other species use "Females" (15969) plus the descriptor for the species
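
The expansion of a search by extra languages can be sketched on the entry above. This is a minimal illustration, not the AIMS service: the record layout and function names are hypothetical, and only a subset of the labels of term 1939 is reproduced.

```python
# Subset of the entry above; the dict layout is an assumption.
COWS = {
    "code": 1939,
    "labels": {"EN": "Cows", "FR": "Vache", "ES": "Vaca", "DE": "KUH"},
    "BT": [1391],
    "NT": [26767, 36875],
}


def build_label_index(concepts):
    """Map every label, in any language, to the codes of the concepts using it."""
    index = {}
    for concept in concepts:
        for label in concept["labels"].values():
            index.setdefault(label.casefold(), set()).add(concept["code"])
    return index


def expand_query(text, concepts, languages=None):
    """Return the labels of every concept matching the typed term,
    optionally restricted to a set of language codes."""
    index = build_label_index(concepts)
    by_code = {concept["code"]: concept for concept in concepts}
    terms = set()
    for code in index.get(text.casefold(), ()):
        for lang, label in by_code[code]["labels"].items():
            if languages is None or lang in languages:
                terms.add(label)
    return sorted(terms)
```

A user typing "vaca" would thus also retrieve resources indexed with "Cows" or "Vache", because all labels point to the same term code.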

In fact, the AIMS project includes further links, presented at http://www.fao.org/aims/cs_relationships.htm: specific Concept-to-Concept relationships (subclass of; caused by; member of; part of), Term-to-Term relationships (related term; synonym; translation) and String-to-String relationships (spelling variant; acronym).

Currently the Agrovoc management system lacks distributed maintenance, but it is expected that a new system will soon solve this problem, which is crucial since changes are made by experts from all over the world.

For AIMS, Agrovoc has been converted into SKOS (ftp://ftp.fao.org/gi/gil/gilws/aims/kos/agrovoc_formats/skos/2006) and is being mapped to two other vocabularies: the Chinese Agricultural Thesaurus (CAT) and the National Agricultural Library thesaurus (NAL). This mapping uses the links provided by the SKOS mapping vocabulary, as below:

CAT-ID  CAT-EN               Map       AG-ID  AG-EN                  AG-ID  AG-EN
30854   Senta flammea        Exact     9748   Cheena
50008   Mayetola destructor  Exact-OR  24260  Triticale (gramineae)  7949   Triticales (product)
1160    Two-shear sheep      NT1       3662   Hordeum vulgare

2.6 Use Case #6 — Representation of Tactical Situation Objects

(Complete description available at http://www.w3.org/2006/07/SWD/wiki/EucTacticalSituationObjectDetailed)

The aim here is to provide a lightweight protocol for situation awareness in emergency response. The protocol shall present table-driven attribute values, including types of action, types of resource, relationships between items, etc.

The vocabulary inherits from the NATO Multilateral Interoperability Programme initiative (http://www.mip-site.org/). It is currently represented in a tabular document, with some tens of tables. Some tables have tens of entries. Base definitions are in English, as exemplified below:

2.5 URGENCY element

This is a proposal, consistent with OASIS description of the civil protection domain. It is consistent with the CAP protocol v1.0 (cap.alertinfo.urgency.code). The code denotes the urgency of the subject event of the alert message.

Acronym  Level      Definition
IMMEDI   Immediate  Responsive action should be taken immediately
EXPECT   Expected   Responsive action should be taken soon (within next hour)
FUTURE   Future     Responsive action should be taken in the near future
PAST     Past       Responsive action is no longer required
UNKNWN   Unknown    Urgency not known
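
A table-driven attribute such as URGENCY can be held in code as a small enumeration. The sketch below is one possible encoding, not part of the protocol: the class and function names are hypothetical, and the fallback to UNKNWN for unrecognised codes is an assumption.

```python
from enum import Enum


class Urgency(Enum):
    """Acronyms, levels and definitions copied from the URGENCY table."""

    IMMEDI = ("Immediate", "Responsive action should be taken immediately")
    EXPECT = ("Expected", "Responsive action should be taken soon (within next hour)")
    FUTURE = ("Future", "Responsive action should be taken in the near future")
    PAST = ("Past", "Responsive action is no longer required")
    UNKNWN = ("Unknown", "Urgency not known")

    def __init__(self, level, definition):
        self.level = level
        self.definition = definition


def decode_urgency(acronym):
    """Look up a code by acronym; unknown codes map to UNKNWN (assumed policy)."""
    try:
        return Urgency[acronym.strip().upper()]
    except KeyError:
        return Urgency.UNKNWN
```

Keeping the acronym as the member name and the level and definition as attributes mirrors the three columns of the table, so a SKOS representation could map them to a notation, a label and a definition.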

2.7 Use Case #7 — Supporting product life cycle

(Complete description available at http://www.w3.org/2006/07/SWD/wiki/EucProductLifeCycleSupportDetailed)

The aim of this application is to propose a data exchange mechanism for managing the life cycle support of complex products (http://www.oasis-open.org), including configuration definition, maintenance definition, maintenance planning and scheduling, maintenance and usage recording (including configuration change).

This shall require several hundred separate functions, including classification of items, classification of information usages (e.g. types of part identifier), classification of entity roles (e.g. date as start date) or classification of relationships (e.g. supersedes).

The vocabulary to be used for the data contains an upper ontology of several hundred items for the description of product life cycle. Many terms are then defined as specializations of the upper ontology, where some terms play the role of placeholders for local extension. PLCS is conceptually a co-operatively developed web in XML, with the live version being a set of run time views assembled from files submitted by a dozen or so contributors. Typical examples of terms are:

Identification_code

An Identification_code is an identifier_type which is encoded according to some convention. Typically but not necessarily concatenated from parts each with a meaning. E.g. tag number, serial number, package number and document number.

Part_identification_code

A Part_identification_code is an Identification_code that identifies the types of parts. For example, a part number.

CONSTRAINT: An Identification_assignment classified as a Part_identification_code can only be assigned to Part Organization_name

Owner_of

An Owner_of is an Organization_or_person_in_organization_assignment that is assigning a person or organization to something in the role of owner.

For example, the owner of the car.

The vocabulary has been encoded using OWL, and is managed via the Protege OWL editor.

2.8 Use Case #8 — GTAA Web Browser

(Complete description available at http://www.w3.org/2006/07/SWD/wiki/EucGtaaBrowserDetailed)

The application provides a way to search for and browse through the Dutch GTAA (Common Thesaurus for Audiovisual Archives) thesaurus.

This vocabulary, used by the Dutch national public audiovisual and radio archives for their documentation process, covers a wide range of topics, as it is meant to describe anything that can be broadcast on TV or radio. It contains approximately 160,000 terms, divided into 6 disjoint facets: Keywords, Locations, Person Names, Organization-Group-Other Names, Maker Names and Genres.

The thesaurus mainly uses constructs from the ISO 2788 standard, like Broader Term, Narrower Term, Related Term and Scope Notes. Terms from all facets of the GTAA may have Related Terms, Use/Use for and Scope Notes, but only Keywords and Genres can also have Broader Term/Narrower Term relations, organizing them into a set of hierarchies. In addition to these standard features, Keywords terms are thematically classified in 88 subcategories of 16 top Categories.

Preferred Term: ambachten

Related terms: ondernemingen, beroepen, artistieke beroepen

Broader Term: beroepen

Narrower terms: boekbinders, bouwvakkers, glasblazers

...

Scope Note: niet voor afzonderlijke ambachten maar alleen als verzamelbegrip, bijv. voor (markten van) oude ambachten

Categories: 05 economie, 09 techniek

The browser (http://ems01.mpi.nl:8080/GTAABrowser) gives access to terms from a selection of facets. It makes it possible to access all terms of a given Category, to browse through the different elements of the thesaurus displayed as a tree, and to refine a search by combining different categories.

For example, if a user selects the Category Military Issues, the terms related to Military Issues are displayed, and other overlapping categories are proposed for narrowing down the number of terms (these numbers are shown to the user). If the user selects also Traffic and Transportation, he will get the list of military vehicles in the thesaurus. He can narrow down his query even further by selecting Vessels, in which case the list is narrowed down to military vessels. The number of terms to be displayed can thus be narrowed down to a dozen by two or three clicks.
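
The narrowing behaviour in this example can be sketched as a set intersection over category assignments. The sketch is illustrative only: the English terms and their category assignments are made up (GTAA itself is in Dutch), and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical terms with the categories they are classified under.
TERMS = {
    "tank": {"Military Issues", "Traffic and Transportation"},
    "frigate": {"Military Issues", "Traffic and Transportation", "Vessels"},
    "submarine": {"Military Issues", "Traffic and Transportation", "Vessels"},
    "ceasefire": {"Military Issues"},
    "ferry": {"Traffic and Transportation", "Vessels"},
}


def narrow(terms, selected):
    """Return the terms classified under every selected category, plus,
    for each overlapping category, how many of those terms it would keep."""
    selected = set(selected)
    matching = sorted(t for t, cats in terms.items() if selected <= cats)
    counts = Counter(cat for t in matching for cat in terms[t] - selected)
    return matching, dict(counts)
```

The counts returned alongside the matching terms correspond to the numbers shown to the user next to each category proposed for further narrowing.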

All the relationships in the thesaurus are proposed as hyperlinks to navigate its content. This includes pre-computed inter-facet links that are not part of the ISO standard, though allowed by the GTAA data model. For example, one can link a "King" in the Person facet to the general subject "Kings" and to the country which this King rules.

Other information (related terms, non preferred terms and Scope notes) for terms is displayed as well. Especially when a facet has no hierarchical structure, an alphabetical access (using spell checking and computed synonym tables) can be granted to the thesaurus preferred terms.

The Browser is accessible across the Web, and is implemented as a web application that can retrieve thesaurus data from an extensible set of data sources, including the primary relational database or a research-oriented RDF/OWL store. This will allow for getting updated versions of the thesaurus, via an XML export of the documentation system converted into SKOS. The application therefore does not have to deal with the changes that are currently made manually by an expert committee, even though this current process lacks useful features such as generation of unique identifiers for Preferred terms, checking that hierarchical or associative relationships only occur between Preferred terms, or providing a concept-based view of the thesaurus instead of a term-based one.

2.9 Use Case #9 — CHOICE@CATCH ranking of candidate terms for description of radio and TV programs

(Complete description available at http://www.w3.org/2006/07/SWD/wiki/EucRankingForDescriptionDetailed)

Radio and television (RTV) programs at the Dutch national broadcasting archive (Sound and Vision) are typically associated with contextual text descriptions (web site texts, subtitles, program guide texts, texts from the production process, etc). Documentalists at Sound and Vision manually describe programs using these context documents. For this description task, they use the GTAA (Gemeenschappelijke Thesaurus Audiovisuele Archieven - Common Thesaurus for Audiovisual Archives), described in section 2.8.

The CHOICE project (part of the Dutch CATCH research programme) uses natural language processing techniques to automatically extract candidate GTAA terms from the context documents. The application that is described here takes these candidate terms as input, and ranks them on the basis of the structure of the GTAA thesaurus. For example, the fact that "Voting" and "Democratization" are related in GTAA by a two-step path (via the "Election" term and two "related-to" links) will positively influence the ranking of these terms. Ranked terms will be presented to documentalists to speed up their description work, as detailed in the use case description "Recommend metadata" on http://ems01.mpi.nl/usecases/

Currently the application (now a standalone Java application, later a SOAP web service) is called with a file containing URIs as argument. It uses a Sesame web repository containing the SKOS version of the GTAA thesaurus to retrieve the 'term context' of the terms in the input list, that is, for one given term, all terms that are directly connected to it by broader term, narrower term or related term relations. This term context is stored in a temporary local Sesame repository.

For this ranking, it is now assumed that candidate terms that are mutually connected by thesaurus relations (directly or indirectly) are more likely to be good descriptions than isolated candidate terms. Later on, it might be more interesting to differentiate between types of thesaurus relations, or one may want to use more complex patterns of thesaurus relations for our ranking algorithm.
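
The connectivity assumption can be sketched as follows. This is not the CHOICE implementation, which works on a Sesame repository: the in-memory graph, the scoring rule (number of other candidates reachable within a bounded number of steps) and the function names are assumptions, and the "Football"/"Sports" link is made up to provide an isolated candidate. All relation types are treated alike, as in the current state of the application described above.

```python
from collections import defaultdict, deque

# Undirected thesaurus links. The first two follow the Voting/Election/
# Democratization example; the third is hypothetical.
EDGES = [
    ("Voting", "Election"),
    ("Election", "Democratization"),
    ("Football", "Sports"),
]


def build_graph(edges):
    """Turn a list of links into an adjacency map usable in both directions."""
    graph = defaultdict(set)
    for left, right in edges:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def reachable(graph, start, max_steps):
    """Terms reachable from start in at most max_steps links (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, steps = frontier.popleft()
        if steps == max_steps:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, steps + 1))
    seen.discard(start)
    return seen


def rank(candidates, edges, max_steps=2):
    """Score each candidate by how many other candidates are close to it."""
    graph = build_graph(edges)
    pool = set(candidates)
    scores = {c: len(reachable(graph, c, max_steps) & pool) for c in candidates}
    ranked = sorted(candidates, key=lambda c: (-scores[c], c))
    return ranked, scores
```

Differentiating between relation types, as envisaged above, would amount to giving each edge a weight and summing weights along the path instead of counting reachable candidates.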

We also plan to integrate our thesaurus-based recommendation system with a recommendation system that is based on co-occurrences between terms that are used in previously existing descriptions of RTV programs.

2.10 Use Case #10 — BIRNLex: a lexicon for neurosciences

(This is the detailed version, http://www.w3.org/2006/07/SWD/wiki/EucBirnLexDetailed)

Application

General purpose and services to the end user

BIRNLex is an integrated ontology+lexicon used for various purposes - some end-user/interactive, others back-end/infrastructure - within the BIRN Project to support semantically-formal data annotation, semantic data integration, and semantically-driven, federated query resolution.

Functionality examples

Here are a few examples of BIRNLex class definitions that illustrate the need for lexical support and links to external knowledge sources. Our general design goals have been to use both the Dublin Core MD elements and SKOS wherever possible. Preferably we'd like to use SKOS for all lexical qualities. There are certain annotation properties that should be shared across all biomedical knowledge resources. There are other required elements specific to our needs in BIRN.

Class: Anterior_ascending_limb_of_lateral_sulcus

birn_annot:birnlexCurator Bill Bug

birn_annot:birnlexExternalSource NeuroNames

birn_annot:bonfireID C0262186

birn_annot:curationStatus raw import

birn_annot:neuronamesID 49

birn_annot:UmlsCui C0262186

obo_annot:createdDate "2006-10-08"^^http://www.w3.org/2001/XMLSchema#date

obo_annot:modifiedDate "2006-10-08"^^http://www.w3.org/2001/XMLSchema#date

skos:prefLabel Anterior_ascending_limb_of_lateral_sulcus

skos:scopeNote human-only

Class: Medium_spiny_neuron

birn_annot:birnlexCurator Maryann Martone

birn_annot:birnlexDefinition The main projection neuron found in caudate nucleus, putamen

and nucleus accumbens...

birn_annot:bonfireID BF_C000100

birn_annot:curationStatus pending final vetting

dc:source Maryann Martone

obo_annot:createdDate "2006-07-15"^^http://www.w3.org/2001/XMLSchema#date

obo_annot:modifiedDate "2006-09-28"^^http://www.w3.org/2001/XMLSchema#date

skos:prefLabel Medium_spiny_neuron

Application architecture

The following is a subset of tools either extant or in the offing:

In all of these applications, it is critical to have a clear, distinct, and shared representation for the associated lexicon. For instance, when integrating BIRN segmented brain images with those from other projects across the net, use of lexical variants from a variety of public terminologies and thesauri such as SNOMED and MeSH can provide a powerful means to largely automate semantic integration of like entities - e.g., corresponding brain region, equivalent behavioral assays described using different preferred labels/names. In providing a community-shared formalism for representing the associated lexicon, SKOS can greatly simplify this task. If, for instance, the lexical repository (collection of LUIs) contained in UMLS were represented according to SKOS, this would provide an extremely valuable resource to the community of semantically-oriented bioinformatics researchers, as well as a powerful tool to support LSI/NLP when linking to unstructured text.

Vocabularies

Titles of Vocabularies

The following is the collection of terminologies and ontologies we are linking into BIRNLex: Neuronames, Brainmap.org classification schemes, RadLex, Gene Ontology, Reactome, OBI, PATO, Subcellular Anatomy Ontology (CCDB - http://ccdb.ucsd.edu/), MeSH

General characteristics of the vocabularies

Neuronames: brain anatomy (~750 classes and 1000s of associated lexical variants)

Brainmap.org classification: hierarchies to describe neuroanatomy, subject variables, stimulus conditions, and experimental paradigms associated with functional MRI of the nervous system

Subcellular Anatomy Ontology: designed to describe the subcellular entities associated with ultrastructural and histological imaging of neural tissue.

Language(s) in which the vocabulary is provided

We currently are only dealing with English.

Machine-readable representation of the vocabulary

Class: Fear

birn_annot:birnlexCurator Jessica Turner

birn_annot:birnlexExternalSource UMLS

birn_annot:bonfireID C0015726

birn_annot:curationStatus uncurated

birn_annot:UmlsCui C0015726

obo_annot:createdDate "2006-06-01"^^http://www.w3.org/2001/XMLSchema#date

obo_annot:externallySourcedDefinition Unpleasant but normal emotional response to genuine external danger or threats;

compare with ANXIETY and CLINICAL ANXIETY. (CSP)

obo_annot:externallySourcedDefinition The affective response to an actual current external danger which subsides with

the elimination of the threatening condition. (MeSH)

obo_annot:modifiedDate "2006-10-11"^^http://www.w3.org/2001/XMLSchema#date

skos:prefLabel Fear

Class: Forebrain

birn_annot:birnlexCurator Allan MacKenzie-Graham

birn_annot:birnlexDefinition The part of the brain developed from the most rostral

of the three primary vesicles of the embryonic neural tube and

consisting of the Diencephalon and Telencephalon.

birn_annot:birnlexExternalSource NeuroNames

birn_annot:bonfireID C0085140

birn_annot:curationStatus pending final vetting

birn_annot:neuronamesID 8

birn_annot:UmlsCui C0085140

obo_annot:createdDate "2006-07-15"^^http://www.w3.org/2001/XMLSchema#date

obo_annot:modifiedDate "2006-09-28"^^http://www.w3.org/2001/XMLSchema#date

obo_annot:synonym prosencephalon

skos:prefLabel Forebrain

Software applications used to create and/or maintain the vocabulary, features lacking for the case

Protege-OWL.

Standards and guidelines considered during the design and construction of the vocabulary

We have been working closely with the NCBO to adopt the OBO Foundry recommendations in the construction of our ontology. Use of SKOS elements has been a big help to us here, so that, for instance, we can create software applications specifically designed to draw on "skos:prefLabel", "obo_annot:synonym", "obo_annot:definition", etc.
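
As a small illustration of an application drawing on these annotation properties, the sketch below builds a lexical index from "skos:prefLabel" and "obo_annot:synonym" values. The dictionary layout and function names are hypothetical; the annotation values are taken from the Forebrain and Fear entries above.

```python
# Annotation values copied from the class descriptions above; the layout
# (class name -> property -> list of values) is an assumption.
CLASSES = {
    "Forebrain": {
        "skos:prefLabel": ["Forebrain"],
        "obo_annot:synonym": ["prosencephalon"],
        "birn_annot:UmlsCui": ["C0085140"],
    },
    "Fear": {
        "skos:prefLabel": ["Fear"],
        "birn_annot:UmlsCui": ["C0015726"],
    },
}

# Properties treated as carrying lexical variants of a class.
LEXICAL_PROPERTIES = ("skos:prefLabel", "obo_annot:synonym")


def build_lexicon(classes):
    """Index every preferred label and synonym, case-insensitively."""
    lexicon = {}
    for name, annotations in classes.items():
        for prop in LEXICAL_PROPERTIES:
            for label in annotations.get(prop, []):
                lexicon.setdefault(label.casefold(), set()).add(name)
    return lexicon


def resolve(text, lexicon):
    """Return the classes whose preferred label or synonym matches the text."""
    return sorted(lexicon.get(text.casefold(), ()))
```

Because the index is driven by a list of property names, a tool written this way works on any resource that uses the same shared annotation properties.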

Management of changes

Currently we are doing this manually in Protege-OWL, but, as mentioned above, we are moving toward a client-server infrastructure that will create an RDF-based backend store and support both curation of the ontology and annotation using the ontology via Java Portlet-based applications. BIRN has a core infrastructure staff dedicated to use of the GridSphere Java Portlet implementation framework (www.gridsphere.org).

2.11 Use Case #11 — Ontology for Biomedical Investigations (OBI) for describing methods in biomedical research

(This is the detailed version, http://www.w3.org/2006/07/SWD/wiki/EucObiDetailed)

Application

General purpose and services to the end user

The Ontology for Biomedical Investigations (OBI) project is developing an integrated ontology for the description of biological and medical experiments and investigations. This includes a set of 'universal' terms, that are applicable across various biological and technological domains, and domain-specific terms relevant only to a given domain. This ontology will support the consistent annotation of biomedical investigations, regardless of the particular field of study. The ontology will model the design of an investigation, the protocols and instrumentation used, the material used, the data generated and the type of analysis performed on it. This project was formerly called the Functional Genomics Investigation Ontology (FuGO) project.

Functionality examples

When constructed, OBI is generally intended to support coding (e.g., consistency checks), annotation, relational and generalized querying, and associative knowledge browsing of biomedical investigation data. Other possible uses include ensuring minimal requirements for journal submission and standardisation of terminologies. Some of the functionalities required by applications utilizing OBI will be similar to the following: a) search for class codes according to query terms or definitions, b) browse for class codes according to a taxonomic hierarchy of classes, c) display to the user the result sets of queries, and concept details including all the terms by which a concept is referred to as well as definitions and other attributes.

Application architecture

Applications that utilize OBI (below) have not yet been constructed.

Vocabularies

Titles of Vocabularies

Ontology for Biomedical Investigations (OBI). The Upper Level ontology that this work extends from is the Basic Formal Ontology (BFO).

General characteristics of the vocabularies

OBI is built following OBO Foundry best practices and uses BFO as its Upper Level ontology. The ontology currently has 200 terms, but is early in development and will increase in size. The scope of this ontology is generally described above and there are ~15 communities participating in this development effort. See http://obi.sourceforge.net/community/index.php for an updated listing.

OBI is currently under construction. Its scope is limited to biological investigation terminology, with references/mappings to terminologies covering specific domains, as necessary. Already present in the development version are references to the BFO top-level ontology. Of interest to us are other vocabularies/terminologies such as SKOS that provide a set of standardized names for specific annotations that we are interested in, to include names for properties that hold terms, definitions, curatorial status, editor notes, scope notes and others. We would be interested in referring to an external resource for these entities rather than creating our own properties. One issue that arises in re-utilizing SKOS is that some relations are already defined (e.g. skos:narrower) and it’s not clear how to avoid clashes where these types of relations might conflict with OWL’s built-in relations.

Structure of the Vocabulary

The ontology is built using OWL. The Upper Level ontology that this work extends from is the Basic Formal Ontology (BFO). Classes and properties are as defined in OWL. Terms and definitions are included as annotations of classes and properties. Relationships other than “is_a” will be expressed as object properties, or by another mechanism if the relationship is not definitional (definitional in the OWL Description Logic sense). The relationships used will extend from the Relationship Ontology (RO). The is_a subsumption relation as in OWL (rdfs:subClassOf) is the primary relationship used. Other relations are under discussion and will later be added as OWL object properties once the taxonomy is more robust and the ultimate application is clearer.

There are duplications with Protégé metadata tags, the Dublin Core and the existing terms in SKOS. It would be nice to include SKOS as a default import for Protégé. Our ontological effort would need administrative/ontology editing, as well as terminological metadata descriptors.

Language(s) in which the vocabulary is provided

English, with plans to include other languages utilizing the xml:lang attribute.

Machine-readable representation of the vocabulary

http://svn.sourceforge.net/viewvc/obi/ontology/trunk/OBI.owl

Software applications used to create and/or maintain the vocabulary, features lacking for the case

Protege-OWL.

Standards and guidelines considered during the design and construction of the vocabulary

OBO Foundry principles (http://obofoundry.org/) and the Protégé tutorials. One possible area of divergence from the general practice within the Protégé tutorials is the use of an alphanumeric identifier for the term rather than the human-readable term name. The reason for this was to remove semantics from the unique identifier for the term, and to ease term obsoletion and versioning.

Management of changes

The terminology is stored in OWL-DL. The OWL files are stored in SourceForge SVN repositories. Individual communities may have their own preferred database storage and should perhaps answer individually.

Additional references

The current list of information that we would like to capture as metadata for representational units is listed at: https://www.cbil.upenn.edu/obiwiki/index.php/RuMetadataRefinedRefined

2.12 Use Case #12 — STAR:dust model: Semantic Travel Across Resources (STAR) with the aim of designing unified support tools (dust)

(Complete description available at http://www.w3.org/2006/07/SWD/wiki/EucStarDustDetailed)

STAR:dust is a conceptual model aimed at designing and specifying the "travel" that web users undertake while surfing through resources. It provides a conceptualization that can be used as an "application ontology" for model-driven software tools (called "vehicles") that support creation of pages, navigation and presentation of resources according to different views.

The STAR:dust model is made up of seven main primitives (Vehicle, Traveler, TravelType, HyperEnvironment, TravelModel, TravelObject and Mapping) and their relations. It is further divided into three sub-ontologies, partially defined ad hoc (e.g., most of the access model) and partially referring to shared and wide-spread models like SKOS and Dublin Core vocabulary: the navigation model, the access model and the presentation model.

Navigation model:
  • skos:related is used to define the connection between the current resource and other resources that are somehow similar or on the same subject;
  • skos:broader/skos:narrower are used to represent the connections between the current resource and those resources that are at a higher/lower level of complexity;
  • skos:relatedPartOf (part-of relation) represents the containment connection between the current resource and its parts (e.g., the relation between a section and its sub-sections);
  • skos:Concept is used to represent the "element", i.e. every "place" where it is possible to go and the portion of information that is relevant for the navigation.
Access model:
  • axs:Home is the landmark indication, i.e. the denotation of specific resources that can be taken as reference for navigation;
  • axs:prev/axs:next relations are the connections between the current resource and those resources that are immediately before/after in a specific path;
  • axs:up/axs:down relations are the connections between the current resource and those resources that are immediately above/below in a specific ordered list or hierarchy.

The presentation model contains classes and properties to model all the characteristics of knowledge visualization, for example describing the different options (positioning, abbreviation) to visualize a long text in a page. It is composed of both existing primitives coming from popular and shared models (e.g., dc:title, dcterms:image or skos:symbol, skos:prefLabel and skos:altLabel) and building blocks modeled explicitly to represent e.g. the features useful for visualization functions (pres:hasText and its sub-properties pres:hasFullText, pres:hasShortText and pres:hasSlidebarText).

This STAR:dust model is modeled in RDF/OWL:

<owl:ObjectProperty rdf:ID="next">
  <rdfs:label>next</rdfs:label>
  <rdfs:comment>link to the subsequent resource</rdfs:comment>
  <rdf:type rdf:resource="&owl;TransitiveProperty"/>
</owl:ObjectProperty>

<owl:ObjectProperty rdf:ID="prev">
  <owl:inverseOf rdf:resource="#next"/>
  <rdfs:label>previous</rdfs:label>
  <rdfs:comment>link to the previous resource</rdfs:comment>
</owl:ObjectProperty>
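
To make the effect of these two declarations concrete, the following is a minimal sketch (not part of STAR:dust or any OWL reasoner) of the statements that follow from a transitive "next" property with "prev" as its inverse. The chapter names and the helper functions are assumptions chosen for illustration only.

```python
from itertools import product


def transitive_closure(pairs):
    """Return the transitive closure of a set of (subject, object) pairs."""
    closure = set(pairs)
    while True:
        # Join every (a, b) with every (b, d) to derive (a, d).
        derived = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def inverse(pairs):
    """Swap subject and object, as owl:inverseOf prescribes."""
    return {(o, s) for (s, o) in pairs}


# Hypothetical asserted statements: Chapter-1 next Chapter-2, Chapter-2 next Chapter-3.
asserted_next = {("Chapter-1", "Chapter-2"), ("Chapter-2", "Chapter-3")}

inferred_next = transitive_closure(asserted_next)
inferred_prev = inverse(inferred_next)
```

Under these assumptions, Chapter-1 is also linked to Chapter-3 by "next", and every "next" statement yields the mirrored "prev" statement.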

The conceptual model specifies the "navigation and presentation semantics". The resulting vocabulary/ontology, however, is not useful per se, but it is used to strongly decouple the editing of contents from their visualization.

For example, it is assumed that the contents about a specific domain (e.g., artists and artwork of a museum) are edited by domain experts and provided/translated into a machine-readable format, namely OWL. Each portal has its own (multilingual) domain ontology, making use of hyperonymy/hyponymy, meronymy/holonymy (part-of relation), multiple wordings (homonymy/pseudonymy/synonymy) and generic semantic relationships whenever needed. Both small and very large ontologies (with millions of triples) have been experimented with.

Once we have this domain knowledge base, we can design a visualization by mapping between the domain ontology and the STAR:dust Travel model. For example, for a virtual museum portal, we map between the navigation/access/presentation models and the ontology of art and artists:

Mapping between domain ontology and navigation model
  • if an Artist painted an Artwork --> Artist skos:relatedHasPart Artwork
  • if a Chapter is about an Artist and describes an Artwork --> Chapter skos:related Artist, Chapter skos:related Artwork
Mapping between domain ontology and access model
  • if a ThematicTrail contains a Chapter --> ThematicTrail axs:down Chapter, Chapter axs:up ThematicTrail
  • if Chapter-1 is before Chapter-2 --> Chapter-1 axs:next Chapter-2, Chapter-2 axs:prev Chapter-1
Mapping between domain ontology and presentation model
  • if an Artwork is represented by an Image --> Artwork skos:symbol Image
  • if an Artist is described by his Biography --> Artist pres:hasFullText Biography

In general, a mapping matches any kind of (sub)graph expressed in the domain ontology to any kind of graph built from components of the three STAR:dust models. For the simplest cases, SPARQL CONSTRUCT queries are used to perform those mappings.
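
As a rough illustration of the simplest mappings listed above, the sketch below rewrites single domain statements into STAR:dust statements. It is a plain-Python stand-in for what a SPARQL CONSTRUCT query would do, not the implementation used by the authors; the domain predicates prefixed with "ex:" and the rule table are hypothetical, and only one-triple patterns are handled.

```python
# Each rule maps one hypothetical domain predicate to one or more STAR:dust
# statements; "s" and "o" stand for the matched subject and object.
RULES = {
    "ex:painted": [("s", "skos:relatedHasPart", "o")],
    "ex:isAbout": [("s", "skos:related", "o")],
    "ex:describes": [("s", "skos:related", "o")],
    "ex:contains": [("s", "axs:down", "o"), ("o", "axs:up", "s")],
    "ex:isBefore": [("s", "axs:next", "o"), ("o", "axs:prev", "s")],
    "ex:representedBy": [("s", "skos:symbol", "o")],
    "ex:describedBy": [("s", "pres:hasFullText", "o")],
}


def apply_mappings(domain_triples):
    """Build the set of STAR:dust triples produced by the rule table."""
    result = set()
    for s, p, o in domain_triples:
        binding = {"s": s, "o": o}
        # Domain predicates without a rule produce nothing.
        for subj, pred, obj in RULES.get(p, []):
            result.add((binding[subj], pred, binding[obj]))
    return result
```

For instance, a hypothetical domain statement ("Chapter-1", "ex:isBefore", "Chapter-2") would yield both the axs:next statement and the mirrored axs:prev statement.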

Finally, a tool like SOIP-F (Semantic Organizational Information Portal framework, http://seip.cefriel.it), taking as input both the domain knowledge and the mappings, leverages the STAR:dust model (including semantic descriptions of the users' profiles) and produces a way to present and navigate across contents.

Existing implementations (cf. http://swa.cefriel.it/) feature semantic-based healthcare information portals (using respectively a medical ontology from the L&C TeSSI suite and PubMed bibliographic references with MeSH taxonomy), a virtual museum of contemporary art and a Semantic Web virtual lesson.

SKOS Requirements List Sandbox

This is a dump of the informal list of candidate requirements that can be found at http://www.w3.org/2006/07/SWD/wiki/CandidateReqList

Nothing definitive of course (some are really minimally explained), just a basis that we want to propose for further discussion and contributions of the working group.

Amongst these requirements, one shall notice that some could play a special role, because they motivate representational features for SKOS, and possibly link to test cases for these features. Especially concerned are R1 to R7, as well as R11 and R12.

R0. Information accessible in distributed setting

An application shall get vocabulary information from an external source.

Motivation: EucTgn, EucAims

R1. Representation and access to relationship between concepts

Hierarchical (BT), but also non-hierarchical (RT), for displaying or searching concepts

Motivation: EucTgn, EucIconclass, EucAims, EucProductLifeCycleSupport, EucGtaaBrowser, EucRankingForDescription, etc.
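
The kind of access R1 asks for can be sketched as follows. This is only an illustration under assumed data: the concept identifiers, the BROADER and RELATED tables and the function names are hypothetical and do not correspond to any SKOS API.

```python
# Hypothetical broader-term (BT) and related-term (RT) links between concepts.
BROADER = {"Vienna": "Austria", "Graz": "Austria", "Austria": "Europe"}
RELATED = {("Vienna", "Danube")}


def narrower(concept):
    """Concepts whose broader term is `concept` (the NT view of BT)."""
    return sorted(c for c, parent in BROADER.items() if parent == concept)


def subtree(concept, max_depth):
    """Nested (concept, children) tuples, cut off below max_depth levels."""
    if max_depth == 0:
        return (concept, [])
    return (concept, [subtree(c, max_depth - 1) for c in narrower(concept)])


def related(concept):
    """RT is symmetric, so look the concept up on either side of each pair."""
    return sorted({b for a, b in RELATED if a == concept} |
                  {a for a, b in RELATED if b == concept})
```

A browser would call subtree to display a hierarchy down to a chosen depth, and related to offer associative links for the concept currently shown.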

R2. Representation and access to basic lexical values (labels) associated to concepts

E.g. labels (preferred or not), different kinds of keywords, for displaying or searching concepts

Motivation: EucTgn, EucIconclass, EucAims, EucGtaaBrowser, EucRankingForDescription, etc.
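
A minimal sketch of label-based search, assuming each concept carries one preferred label per language plus optional alternative labels. The Concept class, the "ex:" identifiers and the sample labels are hypothetical and serve only to illustrate the requirement.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: dict = field(default_factory=dict)  # language tag -> preferred label
    alt_labels: dict = field(default_factory=dict)  # language tag -> list of alternative labels

    def labels(self, lang=None):
        """All labels, optionally restricted to one language."""
        langs = [lang] if lang else sorted(set(self.pref_label) | set(self.alt_labels))
        found = []
        for tag in langs:
            if tag in self.pref_label:
                found.append(self.pref_label[tag])
            found.extend(self.alt_labels.get(tag, []))
        return found


def search(concepts, query, lang=None):
    """URIs of concepts with at least one label containing the query (case-insensitive)."""
    q = query.lower()
    return [c.uri for c in concepts if any(q in label.lower() for label in c.labels(lang))]


concepts = [
    Concept("ex:vienna", {"en": "Vienna", "de": "Wien"}, {"en": ["Vindobona"]}),
    Concept("ex:graz", {"en": "Graz", "de": "Graz"}),
]
```

Restricting the search to one language, as the optional lang argument does here, is one possible way to combine this requirement with multilingual vocabularies.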

R3. Representation of links between labels associated to concepts

E.g. translation link between labels from different languages

Motivation: EucIconclass, EucAims

R4. Representation of glosses and notes attached to vocabulary concepts

E.g. explaining use of (the different kinds of) vocabulary elements

Motivation: EucAims, EucProductLifeCycleSupport, EucTacticalSituationObject, EucBirnLexDetailed

R5. Multilinguality

Dealing with vocabularies presenting lexical information (labels, but also definitions and notes) from different languages, perhaps incompletely

Motivation: EucIconclass, EucAims

R6. Descriptor concepts and non-descriptor ones

Some elements from a vocabulary shall not be used to describe documents. For example, they could be introduced only to give extra meaning to a concept.

Motivation: EucIconclass, EucAims

R7. Composition of concepts

Building a new concept from existing ones (aka post-coordination)

Motivation: EucIconclass

R8. Vocabulary interoperability

In addition to being used in the same application, elements from different vocabularies can be linked together.

Motivation: EucBiozen, EucAims

R9. Extension of vocabularies

A vocabulary might be locally extended, with new elements referring to existing ones. (special case of R8?)

Motivation: EucIconclass, EucProductLifeCycleSupport

R10. Extendability of SKOS model

For particular cases, a vocabulary modeler should be able to introduce new properties (e.g. precise kinds of definitions or notes) and types of entities (e.g. special kinds of concepts) by linking to existing SKOS constructs.

Motivation: EucIconclass, EucTgn, EucAims, EucBiozen, EucGtaaBrowser

R11. Attaching resources to concepts

Provide means to attach a given resource (e.g. corresponding to a document) to a concept the resource is about, so as to access resources described by a given concept

Motivation: EucIconclass, EucAims, EucBirnLex

R12. Correspondence/Mapping links between concepts from different vocabularies

Similar to existing SKOS mapping and normal vocabulary conceptual links (BT, NT) plus concept equivalence

Motivation: EucManuscripts, EucAims

R13. Compatibility between SKOS and other metadata models and ontologies

Using SKOS features (especially the resource-concept association link) shall be compatible with using other metadata models, like Dublin Core. When there are links between SKOS features and other metadata standards or ontologies, these shall be specified.

Motivation: EucManuscripts, EucBiozen, EucBirnLexDetailed

R14. OWL-DL compatibility

The SKOS model should be compatible with OWL-DL

Motivation: EucBiozen

R15. Compatibility with thesaurus standards

Whenever possible, SKOS should provide constructs mirroring existing thesaurus design standards, such as ISO 2788

Motivation: EucGtaaBrowser

R16. Checking the consistency of a vocabulary

SKOS should provide means to check the consistency of a vocabulary, e.g. with respect to conceptual relationships which should only apply between selected elements.

Motivation: EucGtaaBrowser
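
One way to read this requirement is as a table of constraints stating which kinds of elements a relationship may connect. The sketch below checks statements against such a table; the facet names, the "ex:" identifiers and the CONSTRAINTS table are assumptions made for illustration and are not taken from any use case.

```python
# Hypothetical vocabulary: each concept belongs to one facet.
FACET = {
    "ex:amsterdam": "Location",
    "ex:netherlands": "Location",
    "ex:rembrandt": "Person",
}

# Assumed constraint table: relation -> (allowed subject facets, allowed object facets).
CONSTRAINTS = {"broader": ({"Location"}, {"Location"})}


def violations(statements):
    """Return (subject, relation, object, reason) for every inconsistent statement."""
    problems = []
    for s, rel, o in statements:
        if s not in FACET or o not in FACET:
            problems.append((s, rel, o, "unknown concept"))
            continue
        allowed = CONSTRAINTS.get(rel)
        # Relations without an entry in the table are left unconstrained.
        if allowed and (FACET[s] not in allowed[0] or FACET[o] not in allowed[1]):
            problems.append((s, rel, o, "facet not allowed for this relation"))
    return problems
```

With this table, a statement linking a Person to a Location through "broader" would be reported, while the same relation between two Locations would pass.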

References

SWBP-SKOS-CORE-GUIDE
SKOS Core Guide, A. Miles, D. Brickley, Editors, W3C Working Draft (work in progress), 2 November 2005, http://www.w3.org/TR/2005/WD-swbp-skos-core-guide-20051102 . Latest version available at http://www.w3.org/TR/swbp-skos-core-guide .
SWBP-SKOS-CORE-SPEC
SKOS Core Vocabulary Specification, A. Miles, D. Brickley, Editors, W3C Working Draft (work in progress), 2 November 2005, http://www.w3.org/TR/2005/WD-swbp-skos-core-spec-20051102 . Latest version available at http://www.w3.org/TR/swbp-skos-core-spec .
SWD
The Semantic Web Deployment Working Group

Acknowledgments

The editors gratefully acknowledge contributions from: