Sealife - A Semantic Grid Browser for the Life Sciences Applied to the study of Infectious Diseases
Contact e-mail: simon.jupp # manchester.ac.uk , robert.stevens # manchester.ac.uk
General purpose and services to the end user
The idea is to develop a grid browser for the life sciences that links the web to the emerging e-science infrastructure. The Sealife browser will allow users to automatically link a host of Web servers and Web/Grid services to the Web content they are visiting. The browser will identify terms in web pages and documents as they are being browsed, using background knowledge held in ontologies and vocabularies. Through the use of Semantic Hyperlinks, which link identified concepts and terms in pages to servers and services, the Sealife browser will offer a new dimension to context-based information integration.
The vocabularies will be used to identify bio-medical terms and concepts on web pages as the user browses the web. Dynamic hyperlinks are generated around the concepts and link boxes will offer relevant and related services to the user based on terms held in the vocabularies and ontologies.
The system is built on the Conceptual Open Hypermedia Service (COHSE) system. COHSE's architecture is composed of a COHSE Distributed Links Service (DLS) agent and two supporting services: a Knowledge Service (OS) and a Resource Manager (RM). The DLS agent is responsible for adding hyperlinks and drawing text boxes on the pages using AJAX technology. The Knowledge Service stores ontologies (in OWL) or SKOS vocabularies that are used for term recognition on web pages. The Resource Manager maps concepts to resources; once resources are discovered, hyperlinks to them are generated on the page. Services are pluggable components; a Google web search service is one example.
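The division of labour described above can be illustrated with a minimal sketch. It is not COHSE code: the dictionaries standing in for the Knowledge Service and the Resource Manager, the `example.org` URLs, the concept identifiers and the function names are all hypothetical, and plain HTML anchors are generated instead of the AJAX link boxes the real agent draws.

```python
import re
from html import escape

# Hypothetical stand-in for the Knowledge Service: lower-case label -> concept id.
VOCABULARY = {
    "tuberculosis": "neli:tuberculosis",
    "mycobacterium": "neli:mycobacterium",
}

# Hypothetical stand-in for the Resource Manager: concept id -> resource URLs.
RESOURCES = {
    "neli:tuberculosis": ["http://example.org/search?q=tuberculosis"],
    "neli:mycobacterium": ["http://example.org/search?q=mycobacterium"],
}


def recognise_terms(text, vocabulary):
    """Return (start, end, concept_id) for each vocabulary label found in text."""
    if not vocabulary:
        return []
    # Longest labels first, so multi-word labels win over labels they contain.
    labels = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), vocabulary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


def add_links(text, vocabulary, resources):
    """Wrap each recognised term in a hyperlink to its first known resource."""
    out = []
    cursor = 0
    for start, end, concept in recognise_terms(text, vocabulary):
        out.append(escape(text[cursor:start]))
        term = escape(text[start:end])
        targets = resources.get(concept, [])
        if targets:
            out.append(
                '<a href="%s" data-concept="%s">%s</a>'
                % (escape(targets[0]), escape(concept), term)
            )
        else:
            # No resource known for this concept: leave the term unlinked.
            out.append(term)
        cursor = end
    out.append(escape(text[cursor:]))
    return "".join(out)


linked = add_links(
    "Tuberculosis is caused by Mycobacterium species.", VOCABULARY, RESOURCES
)
```

Keeping recognition and resource lookup as separate functions mirrors the pluggable design: either dictionary could be replaced by a call to a remote service without changing `add_links`.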
The COHSE system can be deployed on the web server side, as an intermediary service (e.g. a proxy) or in the browser (e.g. a Mozilla plugin).
Special strategies involved in the processing of user actions
For any term highlighted on a page, resources are provided for broader, narrower and related terms obtained from the underlying vocabulary.
COHSE project: http://cohse.cs.manchester.ac.uk/
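As a rough illustration of this strategy, the sketch below collects the broader, narrower and related concepts for a given term from a small in-memory vocabulary. The `Concept` class, the identifiers and the `neighbours` function are hypothetical; the sketch also assumes that narrower links are the inverse of broader links and that related links are symmetric.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    label: str
    broader: set = field(default_factory=set)   # ids of broader concepts
    related: set = field(default_factory=set)   # ids of related concepts


def neighbours(vocab, concept_id):
    """Return the broader, narrower and related concept ids for one concept."""
    concept = vocab[concept_id]
    # Narrower concepts are those that name this concept as broader.
    narrower = {cid for cid, c in vocab.items() if concept_id in c.broader}
    # Related is treated as symmetric: follow the link in both directions.
    related = set(concept.related) | {
        cid for cid, c in vocab.items() if concept_id in c.related
    }
    return {
        "broader": sorted(concept.broader),
        "narrower": sorted(narrower),
        "related": sorted(related),
    }


# Hypothetical identifiers; the labels are taken from the mapping extract.
VOCAB = {
    "mycobacterium": Concept("Mycobacterium"),
    "m_tuberculosis": Concept(
        "Mycobacterium Tuberculosis",
        broader={"mycobacterium"},
        related={"tuberculosis"},
    ),
    "tuberculosis": Concept("Tuberculosis"),
}

print(neighbours(VOCAB, "m_tuberculosis"))
```

Each returned id could then be passed to a resource lookup so that the link box for a highlighted term also offers resources for its neighbouring terms.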
General characteristics (size, coverage) of the vocabulary
The vocabularies SeaLife plans to use cover various aspects of the bio-medical domain, from molecular biology and anatomy to infectious disease, taxonomy and medicine. These vocabularies vary in size, scope and representation.
The Gene Ontology, for instance, has some 20,000 terms. It covers molecular function, biological process and cellular location of gene products. Other OBO ontologies cover many features of molecular biology from genotype to phenotype. Other ontologies cover disease, anatomy, healthcare, etc. They range in size from a few hundred terms to hundreds of thousands.
Language(s) in which the vocabulary is provided
The vocabularies are provided in American English.
The following extract uses the OBO format, which is used to create the ontologies/vocabularies.
[Term]
id: neli:000015
name: Aspergillus
description: ....
Synonym: .... ! narrower
is_a: neli:000011 ! Bacteria

[Term]
id: neli:000016
name: Bacillus
description: ...
Synonym: ... ! broader
is_a: neli:000011 ! Bacteria
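To show how such stanzas can be read by a program, here is a minimal sketch of a parser for `[Term]` stanzas. It is a simplification and not the OBO-Edit API: it assumes one `tag: value` pair per line, treats everything after the first `!` on a line as a comment, and ignores other stanza types. The function name is hypothetical, and the sample omits the elided description and synonym fields.

```python
def parse_obo_terms(text):
    """Parse [Term] stanzas into a list of dicts mapping tag -> list of values."""
    terms = []
    current = None
    for raw in text.splitlines():
        # Simplification: anything after the first "!" is treated as a comment.
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        if line.startswith("["):
            # Start a new stanza; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue
        # Split on the first colon only, so ids such as neli:000015 stay intact.
        tag, value = line.split(":", 1)
        current.setdefault(tag.strip(), []).append(value.strip())
    return terms


SAMPLE = """\
[Term]
id: neli:000015
name: Aspergillus
is_a: neli:000011 ! Bacteria

[Term]
id: neli:000016
name: Bacillus
is_a: neli:000011 ! Bacteria
"""

for term in parse_obo_terms(SAMPLE):
    print(term["id"][0], term["name"][0], term.get("is_a", []))
```

Values are kept as lists because a tag such as `is_a` or a synonym may occur more than once in a stanza.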
All of the vocabularies and ontologies range in their level of expression but share some common features, such as subsumption hierarchies, preferred labels, synonyms and descriptors. The Gene Ontology, for instance, is a simple taxonomy combined with part-of relationships. It also has thesaurus features of scope note (definition), synonyms, broader than, narrower than and related terms.
Machine-readable representation of the vocabulary
Many of the biological ontologies (e.g. the Gene Ontology, http://www.geneontology.org/) are represented in the OBO format, available as OBO XML or in the OBO file format. The OBO model has OWL semantics, and has been mapped to SKOS.
Other vocabularies have their own proprietary formats, some come in OWL, and some, like MeSH (Medical Subject Headings), are in a native representation (e.g. XML, with some XSLT available to do a conversion).
SeaLife is also working with the National Library of Infectious Disease (NeLI) to generate a new vocabulary that will be used to aid user navigation around the NeLI website. This vocabulary will be represented in SKOS and used as a lexical resource for the COHSE system.
Software applications used to create and/or maintain the vocabulary, features lacking for the case
The project uses a combination of Protege and the OBO-Edit API to generate and edit vocabularies, then a set of applications that convert these various formats into a SKOS representation.
Standards and guidelines considered during the design and construction of the vocabulary
SeaLife uses a mapping from semantic description formats to SKOS (OBO to SKOS, MeSH to SKOS, OWL to SKOS).
Extracts of Mappings
"Mycobacterium Tuberculosis" -> is_a (skos:broaderThan) -> "Mycobacterium"
"Mycobacterium Tuberculosis" -> causes (skos:relatedTo) -> "Tuberculosis"
"Azoles" -> Treatment (skos:relatedTo) -> "Tuberculosis"
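The mappings above can be expressed as a small lookup from source relation to SKOS property. The sketch below is illustrative only and is not the project's converter. It assumes that the skos:broaderThan and skos:relatedTo written in the extract correspond to the properties `skos:broader` and `skos:related` of the SKOS Core vocabulary; the table and function names are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumed correspondence between source relations and SKOS properties.
RELATION_TO_SKOS = {
    "is_a": SKOS + "broader",
    "causes": SKOS + "related",
    "Treatment": SKOS + "related",
}


def to_skos(statements, mapping=RELATION_TO_SKOS):
    """Rewrite (subject, relation, object) statements using SKOS properties."""
    triples = []
    for subject, relation, obj in statements:
        prop = mapping.get(relation)
        if prop is None:
            # Relations without a SKOS counterpart are skipped in this sketch.
            continue
        triples.append((subject, prop, obj))
    return triples


STATEMENTS = [
    ("Mycobacterium Tuberculosis", "is_a", "Mycobacterium"),
    ("Mycobacterium Tuberculosis", "causes", "Tuberculosis"),
    ("Azoles", "Treatment", "Tuberculosis"),
]

for triple in to_skos(STATEMENTS):
    print(triple)
```

Mapping both causes and Treatment to the same related property loses the distinction between the two source relations, which is the trade-off of using SKOS as a common lexical layer.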
Types of mapping used