The wiki page for the HCLSIG is at: http://esw.w3.org/topic/HCLSIG
This page was created prior to the formation of the original HCLSIG.
In Health Care and Medicine, managing information is important in:
- Creating and managing an WikiPedia:Electronic_Health_Record (eHR) for patients in Health Care
- Managing large medical terminologies and vocabularies such as the WikiPedia:Unified_Medical_Language_System (UMLS), GALEN and SNOMED-CT (the WikiPedia:Systematized_Nomenclature_of_Medicine)
- Supporting all stages of the WikiPedia:Drug_discovery pipeline, from target selection through to clinical trials, each of which relies on the effective management of large amounts of information
In the Life Sciences, large amounts of information need to be managed, for example in:
- Understanding the vast amounts of data generated by WikiPedia:Sequencing of biological molecules (WikiPedia:DNA, WikiPedia:RNA and WikiPedia:Protein), for example in the WikiPedia:Human_Genome_Project
- Managing high-throughput WikiPedia:Microarray experiments, which measure WikiPedia:Gene expression
- Mining public repositories of biomedical literature on the Web, for example in WikiPedia:PubMed, and in all stages of scientific, technical and medical publishing
- Managing large thesauri used by biologists, such as the WikiPedia:National_Cancer_Institute Thesaurus and the WikiPedia:Gene_Ontology
- Classifying organisms in WikiPedia:Taxonomy, such as the WikiPedia:National_Center_for_Biotechnology_Information (NCBI) Taxonomy, as in the study of WikiPedia:Phylogenetics and WikiPedia:Evolution
- Understanding the Biological Pathways involved in WikiPedia:Metabolism
The use of WikiPedia:RDF and WikiPedia:Web_Ontology_Language (OWL), both produced by the SemanticWeb initiative, has already helped in some of these areas (in both Health Care and Life Sciences) and shows promise for the future. See the summary report from the W3C Workshop on Semantic Web for Life Sciences in 2004 for further details on this activity.
The development and deployment of ontologies is a fundamental part of the Semantic Web. The Ontology Working Group of the HCLS is focused on this area. Some example biomedical ontologies (in RDF, OWL and OBO formats) being developed and used include:
- Open Biomedical Ontologies is an umbrella web address for well-structured controlled vocabularies for shared use across different biological domains.
- Gene Ontology and Gene Ontology Next Generation: critical ontologies for assigning location, function and process involvement to genes and proteins. GO is the first place a scientist goes when she needs to quickly annotate a list of gene names.
- The Microarray Gene Expression Data (MGED) Ontology is an ontology for describing microarray data and experiments.
- BioPAX - An ontology for biological pathway data
- The Unified Medical Language System ontology is at the core of the National Library of Medicine's (US) Semantic Knowledge Representation effort.
- Robert Stevens' Bio-Ontology page lists some biomedical ontologies but also contains some dead links.
- Gene Ontology Annotation Tool (GOAT): uses an OWL representation of the GO to drive an annotation tool, relying on the reasoning support provided by OWL.
- Reconcile And SHare (RASH) aims to address the problem of semantic heterogeneity in bioinformatics resources.
- HyBrow: computer-aided hypothesis evaluation.
- Gene Ontology Categorizer: a paper describing a method for the following question: given a list of genes of interest, what are the best nodes of the GO to summarize or categorize that list?
- The Bio-Ontologies Meeting at the Intelligent Systems for Molecular Biology (ISMB) publishes papers on the latest research in biomedical ontologies
- The Pacific Symposium on Biocomputing has published life science ontology papers for a couple of years now (2004, 2003) and has a dedicated Semantic Web track that publishes research in this area.
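The question posed in the Gene Ontology Categorizer entry above (which ontology nodes best summarize a gene list?) can be illustrated with a toy example. The sketch below is not the published Gene Ontology Categorizer method; it swaps in a naive heuristic that ranks terms by how many of the input genes they cover and breaks ties in favor of deeper (more specific) terms. The term IDs, gene names and annotations are all hypothetical and are not real GO accessions.

```python
# Toy ontology: term -> list of parent terms.
# All IDs are hypothetical and are NOT real GO accessions.
PARENTS = {
    "T:root": [],
    "T:metabolism": ["T:root"],
    "T:signaling": ["T:root"],
    "T:lipid_metabolism": ["T:metabolism"],
    "T:sugar_metabolism": ["T:metabolism"],
}

# Hypothetical direct annotations: gene -> terms it is annotated with.
ANNOTATIONS = {
    "geneA": ["T:lipid_metabolism"],
    "geneB": ["T:sugar_metabolism"],
    "geneC": ["T:lipid_metabolism"],
    "geneD": ["T:signaling"],
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def depth(term):
    """Length of the longest path from the term up to a root."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def rank_terms(genes):
    """Rank terms by number of genes covered, then by depth, then by name."""
    covered = {}
    for gene in genes:
        terms = set()
        for term in ANNOTATIONS.get(gene, []):
            terms |= ancestors(term)
        for term in terms:
            covered.setdefault(term, set()).add(gene)
    ranked = sorted(covered, key=lambda t: (-len(covered[t]), -depth(t), t))
    return [(term, len(covered[term])) for term in ranked]


if __name__ == "__main__":
    for term, count in rank_terms(["geneA", "geneB", "geneC"]):
        print(term, count)
```

With the three metabolism genes as input, the most specific term that still covers all of them ranks first, ahead of the root. A real categorizer would need a more careful trade-off between coverage and specificity than this tie-break.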
Data Sources in RDF
- UniProt in RDF Format (announced 21 Jul to semweb-lifesci): a large set of triples about proteins, including an enormous number of annotations. There is also a separate (more stable) distribution for benchmarking.
- RDF Data on splice variants and protein motifs released by Affymetrix's Melissa Cline. This is harder to parse for a non-scientist but extremely valuable. The data were released in conjunction with a paper at the Pacific Symposium on Biocomputing.
- The Gene Ontology schema in RDF is unfortunately not yet official, but fun to play with. It is also included in the UniProt RDF release.
- Extracting RDF from GenBank XML files might be a neat trick. NCBI is a major provider of biological data (including GenBank) as well as of database cross-references (especially Entrez Gene), so getting (at least some of) their data in RDF form is critical.
- Viewing the Gene Ontology in SVG (.pdf warning) is not data in RDF, but intriguing.
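As a rough illustration of the GenBank idea in the list above, the following sketch turns a small XML record into N-Triples lines using only the Python standard library. Several things here are assumptions rather than facts from this page: the sample record is hand-written, its element names are assumed to follow the GBSeq XML layout, and the `http://example.org/genbank#` namespace is a hypothetical placeholder, not an official NCBI or W3C vocabulary.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical namespace for subjects and predicates (an assumption).
EX = "http://example.org/genbank#"

# Hand-written sample; element names are assumed to follow the GBSeq layout.
SAMPLE = """<GBSet>
  <GBSeq>
    <GBSeq_locus>EXAMPLE1</GBSeq_locus>
    <GBSeq_length>1200</GBSeq_length>
    <GBSeq_definition>Example "test" sequence</GBSeq_definition>
    <GBSeq_organism>Homo sapiens</GBSeq_organism>
  </GBSeq>
</GBSet>"""


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def genbank_xml_to_ntriples(xml_text):
    """Emit one triple per simple text-valued child of each GBSeq record."""
    triples = []
    for record in ET.fromstring(xml_text).iter("GBSeq"):
        locus = (record.findtext("GBSeq_locus") or "").strip()
        if not locus:
            continue  # no identifier to build a subject URI from
        subject = "<" + EX + "record/" + quote(locus, safe="") + ">"
        for child in record:
            value = (child.text or "").strip()
            if not value:
                continue  # skip container elements that hold only nested markup
            tag = child.tag
            name = tag[len("GBSeq_"):] if tag.startswith("GBSeq_") else tag
            predicate = "<" + EX + name + ">"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    for line in genbank_xml_to_ntriples(SAMPLE):
        print(line)
```

The sketch treats every value as a plain string literal and ignores nested structures such as feature tables. A usable conversion would need an agreed vocabulary and stable identifiers, which is where cross-references such as Entrez Gene would come in.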
Use Cases and Demonstrations
- The myGrid project at University of Manchester, UK makes extensive use of RDF, OWL and LSID to enable grid-based computing in its workflow application for biomedical scientists called Taverna. Semantic Web technology plays a role at all stages of the experimental cycle from discovery, selection, composition, execution and results management of many biomedical Web Services.
- The BioMOBY project uses RDF and LSID to describe its services, and has many use cases and examples.
- BioDASH is a Semantic Web prototype of a Drug Development Dashboard that associates disease, compounds, drug progression stages, molecular biology, and pathway knowledge.
- YeastHub is a semantic web use case for integrating data in the life sciences domain.
- ProofOfConcept is the set of use cases we are trying to develop for the Call For Participation: W3C Workshop on Semantic Web for Life Sciences
- Andrew Cates has built some BusinessCases, ProofOfConcept and GrantApplications and asks that people stop by and edit them.
- LifeSciencesQueries points to some queries that can be used to test query engines.
Languages, Standards and Applications
- Life Sciences Identifiers is a new OMG standard.
- Biological Sequence ML
- Microarray Gene Expression Database ML
- BioDAS: an open source description for annotations of bio data
- Format for the Database Japan resources with WSDL and SOAP transaction features
- Chemical Markup Language has lots of uptake by vendors and users alike.
- Kyoto Encyclopedia of Genes and Genomes XML is a markup language used in the biggest collection of public pathways information.
- PSI MI XML is an accepted format for describing protein-protein interactions and interaction networks.
- The MEDGENE database is full of interesting disease-gene associations.
- This short introductory chapter on Molecular Biology for Computer Scientists may also be helpful.
- URCHIN from Nature (RSS+RDF) is an extremely cool RSS system.
- The ExperiBase project will soon include RDF. It was designed at MIT and is being implemented at Pacific Northwest National Labs.
- The World Wide Molecular Matrix: CMLRSS is the use of RSS 1.0 to support the free distribution of molecular information in CML (Chemical Markup Language).
- The OMG's Life Sci group is responsible for gene expression markup language and LSID, among other things.
- Cover Pages: W3C Public Workshop on Semantic Web and Life Sciences Features OWL, RDF, and LSID.