diffs for A Prototype Knowledge Base for the Life Sciences

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This ~~is a First Public Working Draft of~~ W3C Interest Group Note describes how one can use the Semantic Web in to express and integrate scientific data. These techniques can be used for modeling any data, and the benefits of integration and model consistency apply to other diverse, distributed data domains. It is hoped that this document will inspire further contributions to the ongoing work at Neurocommons and the Health Care and Life Sciences Interest ~~Group (HCLS) , part of~~ Group, as well as inspire those in other domains to exploit the ~~W3C~~ Semantic ~~Web Activity .~~ Web.

This document describes the construction and use of the HCLS Knowledgebase used in the WWW2007 Banff HCLS Demo . It describes the process for creating a bilogical database on the Semantic Web. The companion document, Experiences with the conversion of SenseLab databases to RDF/OWL , describes the process for integrating new data into this Knowledgebase.

~~Please send all comments on either of these documents~~ The document was produced by ~~21 April,~~ the Semantic Web in Health Care and Life Sciences Interest Group (HCLS) ,part of the W3C Semantic Web Activity ( see charter ). Comments may be sent to the publicly archived public-semweb-lifesci@w3.org ~~, a~~ mailing list. Feedback is encouraged, as is participation in the recently re-charted HCLSIG. A list ~~with a public archive , though~~ of changes since the ~~IG does not promise explicit responses to each comment.~~ last publication is available.

Publication ~~of this document~~ as an Interest Group Note is planned for May 2008; timely comments are appreciated. Areas marked with "@@" are known to be incomplete, however, any suggestions in these areas are still appreciated. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the disclosure obligations of the 5 February 2004 W3C Patent Policy . The group does not expect this document to become a W3C Recommendation. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information to public-semweb-lifesci@w3.org [ public archive ] in accordance with in accordance with section 6 of the W3C Patent Policy .

Appendices

1 Introduction

The life sciences have a rich history of making data available on the Web, because researchers recognized the benefits of sharing data and made it available to other researchers for the benefit of greater science. However, because many of the data repositories were developed in relative isolation, they tend to use different identifier schemes, incompatible terminology, and dissimilar data formats. This makes it hard for researchers to find all data about an entity of interest and to assemble it into a useful block of knowledge. ~~The HCLS knowledgebase~~ This prototype was built to demonstrate how Semantic Web technologies can integrate such heterogeneous data sets and thereby help scientists to more easily answer interesting scientific questions.

The key to advancing scientific understanding is empowering scientists with the information that they need to make well-informed decisions. Scientists need to be able to easily gain access to all information about chemical compounds, biological systems, diseases, and the interactions between these entities, and this requires data to be effectively integrated in order to provide a biological systems level view to the ~~user.~~ user, i.e. a complete view of biological activity. However, achieving this goal has proven to be a formidable challenge in the life sciences, where data and models are found in a large variety of formats and scales that span from the molecular to the anatomical.

In order to overcome the challenge of gaining insight directly from the Web, a number of laboratories, organizations, and companies have built internal data warehouses from the publicly available data sources. This certainly helps scientists to more easily query for all information related to entities of interest. However, these efforts generally integrate only a subset of publicly available data that is deemed to be of greatest interest, and it has proven difficult to add data sources to the ~~data~~ warehouse at a later point. Further, advances in scientific knowledge require regular changes to be made to the underlying data models, and this is not straightforward with a relational model. Organizations that use this approach also typically face challenges with representing data that is at different levels of abstraction, and that includes data of very different quality.

Many health care and life sciences organizations are interested in the data integration abilities promised by the Semantic Web. More specifically, the benefits include the aggregation of heterogeneous data using explicit semantics, and the expression of rich and well-defined models for data aggregation and search. ~~Another critical aspect of the~~ Semantic Web ~~is the ability~~ technologies enable one to more flexibly add additional data sets into the data model, and more easily reuse data in unanticipated ways. ~~Finally, once the~~ Once data has been ~~integrated, the~~ aggregated, a Semantic Web ~~enables~~ reasoner computes implied relationships among the ~~application~~ aggregated data resulting in tighter integration and the possibility of ~~reasoning to infer~~ additional insights.

~~The HCLS Knowledgebase~~ This prototype knowledge base imports data from data sources that span multiple domains in ~~health care and~~ the life sciences to make cross-discipline ~~queries and, thereby,~~ queries. It therefore provides a working (and reproducible) example of the possibilities that become available via knowledge ~~integration possible.~~ integration. The use of an RDF repository to store RDF and OWL makes it possible to query, manipulate, and reason about the data with standard ~~tools~~ tools, such as OWL reasoners, and ~~languages~~ languages, such as the SPARQL Query Language for ~~RDF, as well as OWL reasoners.~~ RDF. Although this document addresses a specific use case, the approach described here can be applied to any use case that integrates data from multiple domains.

1.2 Document Scope and Target Audience

This document attempts to succinctly describe how ~~the HCLS Knowledgebase~~ this knowledge base was constructed so that interested parties can use the core ~~requirements~~ techniques to ~~eventually~~ create their own ~~knowledgebase.~~ knowledge base. We have attempted to write a general description ~~but~~ but, unavoidably, the ~~knowledgebase~~ knowledge base makes use of ~~unavoidably~~ specialized resources, such as those found in the Data Sources section. Some, but not all, of the reasoning behind design decisions is explained. Several technologies such as ~~semantic web~~ the Semantic Web standards RDF, OWL, and SPARQL were ~~used~~ used, but ~~we are unable~~ in order to keep this document to a manageable size, we will not explain all aspects in the depth that would be required for those new to the area. Those interested in a general introduction to ~~semantic web~~ the Semantic Web should see The Semantic Web Primer . See also the CO-ODE web site for a hands-on OWL tutorial with ~~Protégé~~ Protégé .For materials introducing ontology see National Center for Biomedical Ontology (NCBO) Introduction to Biomedical Ontologies .For materials related to reasoning see The Semantic Web: Ontologies and OWL .

1.3 Stability of Terms

This document uses URLs to identify records about biological entities and processes. The identifiers used in this document are the same as those used in the ~~knowledgebase~~ prototype knowledge base and are not yet stable. ~~However, an accompanying appendix will index the regular expressions or scripts used to update these identifiers as they evolve. Knowledgebase~~ Knowledge base implementors should use these terms whenever possible.

1.4 Document Conventions

In RDF data in this ~~document, examples assume the~~ document is expressed in Turtle [ TURTLE ]. Queries on this data are expressed in SPARQL [ SPARQL ]. The following namespace prefix bindings are assumed unless otherwise stated:

Prefix	URI	Description
`rdf:`	`http://www.w3.org/1999/02/22-rdf-syntax-ns#`	The RDF Vocabulary
`rdfs:`	`http://www.w3.org/2000/01/rdf-schema#`	The RDF Schema vocabulary
`xsd:`	`http://www.w3.org/2001/XMLSchema#`	XML Schema
`sc:`	`http://purl.org/science/owl/sciencecommons/`	~~Classes and properties belonging to the~~ The ad hoc Science Commons ~~ontology.~~ ontology
`pubmedRec:`	`http://purl.org/commons/record/pmid/`	PubMed records (not the articles ~~themselves).~~ themselves)
`article:`	`http://purl.org/science/article/pmid` /	PubMed ~~articles.~~ articles
`ncbi_gene:`	`http://purl.org/commons/record/ncbi_gene/`	Entrez Gene records (not the genes ~~themselves).~~ themselves)
`proteinsubclass:`	`http://purl.org/science/protein/subjects/`	Proteins of a given gene participating in a given ~~pathway.~~ pathway
`go:`	`http://purl.org/obo/owl/GO#`	~~temporary namespace for~~ Gene Ontology terms
`protein:`	`http://purl.org/science/protein/bysequence/`	~~NCBI~~ National Center for Biotechnology Information (NCBI) records for Genes ~~sequences.~~ sequences
`ro:`	`http://www.obofoundry.org/ro/ro.owl#` ( proposed update may be more complete)	Relation Ontology (RO): Relationships between members of OBO ~~classes.~~ classes
`obo:`	`http://purl.org/obo/owl/obo#`	~~@@ don't know — contains part_of @@~~ Open Biomedical Ontologies (OBO)
`senselab:`	`http://purl.org/ycmi/senselab/neuron_ontology.owl#`	Neuroscience ontology derived from the SenseLab NeuronDB ~~database.~~ database
`dnaGeneProduct:`	`http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_`	Syntactic trick to shorten sc:is_protein...described_by

1.5 Document Outline

2 Use Case introduces an interesting scientific question that the ~~knowledgebase~~ knowledge base can be used to address.

3 Data Sources describes the data sources that have been incorporated into the ~~knowledgebase.~~ knowledge base.

6 Query explains the ~~basis~~ use case query that answers the scientific question.

2 Use Case

Alzheimer's is a debilitating neurodegenerative disease that affects approximately 27 million people worldwide. The cause of Alzheimer's is currently unknown and no therapy is able to halt its progression. However, insight into the mechanism and potential treatment of this debilitating disease may come from the integration of neurological, biomedical and biological resources. The ~~HCLS knowledgebase~~ knowledge base assembles several neurology-related resources alongside an array of clinical and biological resources. This makes it possible to integrate knowledge across several research domains and potentially provide insight into the mechanisms of the disease.

The scientific question under scrutiny in our use case involves several elements of putative functional importance to Alzheimer's. CA1 Pyramidal Neurons (CA1PN) are known to be particularly damaged in Alzheimer's disease and play a key role in signal transduction. Signal transduction pathways are considered to be rich in proteins that might respond to chemical therapy. By integrating information about signal transduction, pyramidal neurons, their genes, and gene products, the query corresponding to our scientific question can provide information relevant to researchers that are looking for drug target candidates that are potentially effective against Alzheimer's Disease.

3 Data Sources

In order to incorporate data from several information sources, it was necessary to convert several exported formats, each into its own RDF bundle . The largest RDF bundle of 200M triples resulted from MeSH associations with PubMed articles. In contrast, there were a number of smaller bundles ranging from 10K to 10M triples. This resulted in a total of approximately 350M triples occupying approximately ~~20Gb~~ 20GB when loaded into the RDF repository. In several cases, we extracted only a subset, for example, by selecting only human, rat, and mouse data. Click on [Details] in the table below to view ~~provenance information~~ details such as the date of the last ~~extraction, whether the extraction was a subset, etc.~~ extraction.

At the time of publication, the following information sources have been (sometimes partially) incorporated into the ~~knowledgebase.~~ knowledge base. This set will continue to be extended in depth (i.e., more complete inclusion of partially represented data sets) and in breadth (i.e., novel data sets):

4 Design Decisions

A number of design decisions were made during the construction of the ~~HCLS Knowledgebase.~~ prototype knowledge base. Many of the decisions were pragmatic in nature, as a consequence of the need to implement the solution on a commodity PC within a two-month period for a demonstration at WWW2007.

5 Importing to RDF - Homologene Example

A number of different approaches were used for the conversion of data into RDF/OWL. The most commonly used approach was the use of Lisp code to read text exports of the data and create OWL or RDF documents. We will focus on the example of importing data from Homologene.

In the case of Homologene, we start with a text file that contains the exported information. The original tab delimited file is ftp://ftp.ncbi.nih.gov/build54/homologene.data . ~~The Lisp code for the homologene conversion is also available.~~

We are interested in the first 3 fields. The first field identifies the homologous cluster. The second field is the species taxon. The third field is the EntrezGene id. We are only interested in human, mouse, rat, taxon ids: "9606" "10116" "10090".

We The Lisp code for the homologene conversion is also available. In the conversion process, we first iterate over the lines in the file, creating a table mapping cluster id to the pairs of taxon id, entrez id in the cluster. This is the variable homologene , created by the function read-homologene . For each of these clusters we will create an individual to represent the cluster e.g for cluster 99949:

~~There are two things to note about this conversion. The first~~ Above is the RDF/XML (see [ RDF ]) expression of:

Note that we used HTTP ~~URIs were adopted as the mechanism~~ URLs to identify ~~records. Importantly, these~~ Homologene records by prefixing the EntrezGene identifiers (e.g. 727759 ) with a stem URL, http://purl.org/commons/record/ncbi_gene/ .The resulting URL can ~~also~~ be usefully resolved ~~using standard~~ with a web ~~technology (web browsers). PURLS~~ browser. The domain purl.org serves Persistent URLs (PURLs), which currently redirect these requests for NCBI gene identifiers to a ~~specific format of database records redirect~~ script at sw.neurocommons.org. If the community wishes to ~~web pages that describe those records in~~ move the ~~format. It has been our experience that URIs~~ service to, for ~~such web pages~~ instance, an NCBI page about these genes, they can simply notify the custodians of purl.org. This extra level of indirection protects these identifiers from becoming orphaned as organizations stop existing or change ~~over time. PURLS~~ their priorities. These URLs were ~~chosen because we can change what web page a PURL redirects to. Thus PURLs provide a more stable URI for a resource than the provider resources, and place control enabling one~~ also used to ~~make repairs in such situations into~~ identify gene information imported from other data sources, automatically linking the ~~hands~~ Semantic Web representations of these records. For example, PubMesh statements about gene records use these same identifiers for genes, as do the ~~HCLS community. The second is that we mapped equivalent terms~~ statements from ~~different resources to a common base URI i.e. the http://purl.org/commons/record/ncbi_gene/ prefix was used to consistently identify Entrez~~ Gene ~~records.~~ Ontology and SenseLab. This allows for trivial data integration between different resources ~~and simplifies queries~~ involving Entrez Gene records.

6 Query

Our scientific question can be summarized as "What genes are involved in signal transduction that are related to pyramidal neurons?". The scientific question can be answered with the following query, which searches for gene names and processes from four data sources within the ~~knowledgebase.~~ knowledge base. The data sources include: MeSH (Pyramidal Neurons), PubMed (Journal Articles), Entrez Gene (Genes), Gene Ontology (Signal Transduction). The example query ~~example~~ selects the gene name of the genes involved in signal transduction that are related to pyramidal neurons . Some of the complexity in this query comes from the need to capture relevant anatomical and functional detail at the subcellular and molecular level. The portion probing the Gene Ontology queries a set of classes describing processes at the molecular level. Our query employs the SPARQL RDF query language to perform knowledge integration across the sources of the ~~knowledgebase.~~ knowledge base. Details on SPARQL can be found in the References .

[Note: The query below will not work verbatim at SPARQL endpoints. We have simplified the actual Banff demonstration query for explanatory purposes in our example below. The Banff demonstration query is discussed in more detail in Named Graphs Section. You can try running the query ~~at the DERI installation .]~~ HERE .].

Please note that the same color (and CSS class) is used to connect the descriptive text in the query with relevant portions of the following figures.

The following section describes the RDF data model and how we employed it to make our query possible.

7 Data Model

Source	(colored) CSS class
PubMesh	mesh
Gene Ontology Annotation (GOA)	goa
Entrez Gene	glbl
Gene Ontology	plbl

gene_record_name	processname
Entrez Gene record for human DRD1, 1812	adenylate cyclase activation
Entrez Gene record for human ADRB2, 154	adenylate cyclase activation
...

The data in the ~~knowledgebase~~ knowledge base is modeled in OWL-DL, which has been expressed as RDF triples. Briefly, an RDF triple consists of a subject , predicate , and object . The predicate is also known as the property of the triple. Subjects and objects in the data unify to create an RDF Graph, with subjects and objects as nodes and predicates as edges. For more information about RDF and OWL, see the References section in the Appendix.

Nodes labeled with a leading "_:", e.g. ~~proteinsubclass:p1812_7190_1~~ _:activateAdenalCyclase , are called RDF blank nodes [ CONCEPTS ]. These frequently have machine-generated identifiers and are therefore typically opaque to a human reader (e.g., the set of all nodes that represent protein entities linked to the GO molecular function ~~XXX), but~~ Adenal Cyclase Activation). Here, for the purposes of explanation, ~~here,~~ they have been named to convey meaning to the reader. Blank nodes ending in "_1" in this document indicate this blank node is one of many in this ~~class.~~ class, e.g. _:signalingParticipants_1 .

Figure 1, Triples in Solution ,shows a graphical representation of the triples that compose one solution to the query posed in section 6 .Following is a discussion of the origins and intents of those triples:

The application of a commercial text mining tool to neuroscience-related PubMed abstracts results in a set of annotations that link MeSH terms to genes (for more details on MeSH, see the table in Data Sources . ). An article with PubMed id 10698743 mentions ncbi_gene:1812 and that the corresponding PubMed record has a MeSH term mesh:D017966 . The following three triples express this:

subject	predicate	object
pubmedRec:10698743	sc:has-as-minor-mesh	mesh:D017966	.
article:10698743	sc:identified_by_pmid	pubmedRec:10698743	.
ncbi_gene:1812	sc:describes_gene_or_gene_product_mentioned_by	article:10698743	.

A set of genes or gene products in human bodies are described by ncbi_gene:1812 . Here, we call this set _:equiv1812 .

protein:ncbi_gene.1812 has the same extension (members) as the OWL restriction _:equiv1812 .

is the RDF representation of an ~~owl idiom to say~~ OWL class axiom that says: for ~~every~~ all X such that

Using our other supplied constant, we note that adenylate cyclase activation, go:GO_0007190 , is part of signal transduction, ~~go:GO_0007166~~ go:GO_0007165 . Note: this simplified query matches only processes that are a sub-process of ~~go:GO_0007166;~~ go:GO_0007165; the actual query , described in §9 Named Graphs , looks also for subclasses. The part_of relationships were inferred from the OWL class restrictions described in §7.1 Precomputing Inferences . The class of functions that are realized_as adenylate cyclase activation is here labeled _:activateAdenylCyclase .

There are many possible classes of substance participating in molecular signaling, one of which (called here _:molecularSignalers_1 ) is defined by the ability to activate adenyl cyclase.

The class of proteins in the intersection of _:signalingParticipants_1 and protein:ncbi_gene.1812 is here abbreviated proteinsubclass:p1812_7190_1 , though the actual identifier is proteinsubclass:product_of_ncbi_gene.1812_that_participates_in_GO_0007190_fbc49f20524727a24c7b7effa29bad4a . Note: the Venn diagram reveals that this set is potentially ~~empty (like the intersection of cars and ice cream stands),~~ empty, theoretically permitting the query to range over pairs of gene/process that aren't related through any known protein. However, OWL-DL reasoners will not infer new classes, so the proteins in the intersection of ncbi_gene:1812 and the substances participating in molecular signaling is restricted to the set which have already been entered into the ~~knowledgebase,~~ knowledge base, e.g. like proteinsubclass:p1812_7190_1

The addition ~~(@@curation (from a text media)?@@)~~ of another MeSH record gives us another solution:

7.1 Precomputing Inferences

The demonstration query depends on the existence of an obo:part_of (or rdfs:subClassOf ) relationship between any part (i.e. any subclass of any step in the sequence) of molecular signaling, and the general identifier for molecular signaling, ~~go:GO_0007166~~ go:GO_0007165 :

~~This part_of relationship between _:subPart and~~ Triples of this form were generated by a rule, graphically expressed in Figure 2, ~~_:parentClass is inferred from~~ obo:part_of Rule .The shaded area on the right of the figure shows the ~~following~~ OWL ~~restriction:~~ restriction which is the antecedent of the rule:

The symmetric property for rdfs:subClassOf need not be explicitly modeled because the RDF Schema Specification defines subClassOf, including its transitivity. Note that if _:subClass is a subClassOf _:parentClass , , then all members of _:subClassOf are of type _:parentClass (as well as _:subClass ):

Because the ~~knowledgebase~~ triple store used does not do perform inferencing, these triples have been pre-computed (forward-chained) and inserted into the ~~knowledgebase.~~ triple store. This also simplifies the ~~query; were~~ query. If these triples were not pre-computed, the obo:part-of part of the query would be expressed:

and would need to query over a transitive closure ~~over~~ of the union of the obo:part-of and rdfs:subClassOf rules.

8 Adding a New Data Source

~~The last data source listed above,~~ SenseLab , is a collection of relational (Oracle) databases for neuroscientific research that was independently added to ~~an already used database.~~ the knowledge base after the other data sources. An accompanying document, Experiences with the conversion of SenseLab databases to RDF/OWL , describes the details of adding it to ~~the KB.~~ this knowledge base. With this new data incorporated, the example query could be extended to extract data from the new data source , in this case, discovering the names of receptor proteins associated with the genes discovered in the previous query. In an integrative query of this sort, we can use the results as a starting point for more detailed queries of a particular repository, such as in this case SenseLab.

gene_record_name	processname	receptor_protein_name
Entrez Gene record for human DRD1, 1812	adenylate cyclase activation	D1 receptor
Entrez Gene record for human ADRB2, 154	adenylate cyclase activation	NULL
...

The additional triples this matched in the SenseLab ~~knowledgebase~~ knowledge base connect to the existing data by talking about the same genes, e.g. ncbi_gene:1812 .

Figure 3, Additional Triples from SenseLab ,shows a subset of the triples provided by SenseLab. Following is a discussion of the origins and intents of those triples:

A nucleotide sequence is also described by ncbi_gene:1812 . Here, we call this _:nucleo1812 .

subject	predicate	object
_:nucleo1812	owl:onProperty	nucleotideSequence:described_by	.
_:nucleo1812	owl:hasValue	ncbi_gene:1812	.

The class senselab:DRD1_Gene has the same members as the OWL restriction _:nucleo1812 .

Our solution is a subclass of _:protGeneProd_1 called senselab:D1 . ~~@@What other subclasses of _:protGeneProd_1 are there motivating this extra subclassof relationship?@@~~

9 Named Graphs

In the Banff Demo, the resulting ~~knowledgebase~~ knowledge base partitioned the assertions into groups called Named Graphs. This process basically consists of associating a distinct URI with a connected graph of triples, and then referring to that graph via the URI. At the time of publication, any query would be expected to include SPARQL GRAPH constraints, e.g.:

The named graphs help with both provenance and scaling. In the current approach, each RDF bundle is imported into its own named graph. This is useful for a number of reasons. First, we know the source of each named graph, so we can control and review which data sources are being accessed by our queries. Additionally, the association of a named graph with a data source serves as data provenance and can also be employed by schemes that exploit knowledge about the data source to assign confidence measures in a model of trust. For example, one of the ~~knowledgebase~~ knowledge base data sources resulted from text mining experiments to find protein associations . Users of the ~~knowledgebase~~ knowledge base can choose to view this evidence of association differently than the associations provided from a protein-protein interaction database. Also, named graphs support scaling by making it possible to update selected parts of the ~~knowledgebase,~~ knowledge base, for example when the data source has new information or related ontologies are changed.

10 ~~Next Steps~~ Opportunities for further development

The ~~knowledgebase~~ knowledge base was initially designed for the purposes of a live demo. ~~The data warehousing that was performed, several~~ It also provided a basis for early work on the Neurocommons ,where its development continues. Some design choices ~~in the data, and the resulting queries~~ were ~~all aimed at~~ made to favor simplicity and maximal ~~performance.~~ performance, including the use of a central triple store, and the design of the data and queries. Many of the choices were guided by the desire for transparency for a broader audience of biomedical informaticists. Several areas of possible improvement are noted here:

There are also a number of open issues that should be addressed in future ~~research:~~ work:

Appendix

A RDF Sources

A table of the RDF sources used to create the ~~Knowledgebase:~~ Knowledge base:

RDF bundle name	Last modified	Size	Description	RDF conversion by	Terms
aba-2007-08-07.tgz	22-Sep-2007	51M	SC's extract of Allen Brain Atlas metadata from their ~~web~~ Web site. Web site was read on 26 Feb 2007 or shortly before	SC	terms of use
addgene.ttl	16-May-2007	1.1M	Addgene catalog (tab-delimited file)	SC	provided to Science Commons by Addgene
bams-from-swanson-98-4-23-07.owl	23-Apr-2007	5.6M	BAMS	~~HCLS/NIST (John Barkley)~~ HCLSIG/NIST	released without contract
galen.tgz	22-Sep-2007	1.9M	Galen from co-ode.org	-	released without contract
gene-owl.tgz	08-May-2007	7.7M	~~Extract~~ Select fields from ~~ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz~~ Entrez Gene records	~~HCLS/HP (Ray Hookway)~~ HCLSIG/SC	NCBI Copyright and Disclaimers
gene-pubmed.ttl.tgz	08-May-2007	1.5M	~~Extract from~~ Entrez Gene Extract from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz	~~HCLS/HP~~ HCLSIG/HP/SC	NCBI Copyright and Disclaimers
goa-in-owl.tgz	16-May-2007	73M	GO annotations from ~~NCBI~~ National Center for Biotechnology Information (NCBI) and ~~EBI~~ European Bioinformatics Institute (EBI)	~~HCLS/SC~~ HCLSIG/SC	NCBI Copyright and Disclaimers ; EBI terms of use
homologene.tgz	16-May-2007	626K	Homologene	~~HCLS/SC~~ HCLSIG/SC	NCBI Copyright and Disclaimers
medline-mesh.tgz ( contact Medline for use terms)	16-May-2007	758M	List of all associations of MeSH headings to papers indexed by Medline extracted from 2007 Medline baseline distribution	~~HCLS/SC~~ HCLSIG/SC	License Agreement to Lease NLM Databases in Machine-Readable Form - see below
medline-titles.tgz ( contact Medline for use terms)	16-May-2007	670M	Extracted from 2007 Medline baseline distribution	~~HCLS/SC~~ HCLSIG/SC	see below
mesh-qualified-headings.ttl.gz	30-Apr-2007	13M	NLM 2007 MeSH descriptor/qualifier pairs	~~HCLS/SC~~ HCLSIG/SC	MeSH MOU
mesh-skos.tgz	16-May-2007	13M	NLM 2007 MeSH	van Assem et al al/SC	MeSH MOU
mesh07-eswc06.rdfs	28-Jun-2007	2.2K	van Assem et al's ontology (used by output of MeSH to SKOS conversion)	- van Assem et al	released without contract
neurocommons-text-mining.tgz	05-May-2007	24M	Neurocommons text mining pilot - extracted from Temis software applied to 7% of Medline records ~~(SC)~~	- SC	released without contract
obo-all.tgz	22-Sep-2007	36M	All OBO ~~ontologies, from berkeleypop~~ ontologies	- BBOP	released without contract
obo-in-owl.tgz	16-May-2007	2.6M	selected OBO ontologies, downloaded ~21 April 2007, augmented with inferred relations	~~HCLS/SC~~ HCLSIG/SC	released without contract
sciencecommons.owl	28-Jun-2007	19K	~~ad hoc ontology~~ A bridging ontology, from Science Commons , importing other ontologies used ~~by the output of of several of~~ in the ~~conversion scripts~~ prototype, defining classes and relations used to represent gene records and their contents, as well as few items referred to by imported data sources, but not available in a published ontology.	- HCLSIG/SC	released without contract
senselab.tgz	16-May-2007	216K	From Yale Senselab	~~HCLS/Yale (Cheung, Samwald, et al.)~~ HCLSIG/Yale	released without contract

~~This table describes the classes and properties used in the knowledgebase: sc:Gene sc:describes_gene_or_gene_product_mentioned_by sc:Article rdfs:label Gene Label~~ ~~sc:Article . sc:identified_by_pmid sc:Paper sc:Paper . sc:has-as-minor-mesh sc:PubMedId sc:PubMedId .~~ Attributions:

Science Commons (SC), Berkeley Bioinformatics Open-source Projects(BBOP), Health Care and Life Sciences Interest Group (HCLSIG), National Institute of Standards and Technology (NIST), Hewlett Packard (HP)

C B References

D C Additional Resources

The ~~knowledgebase~~ knowledge base has been installed at several locations. ~~Below are~~ At the time of this writing, these locations ~~that also~~ provide SPARQL query ~~access:~~ access, however, it is not guaranteed that the endpoints at these address will persist, or continue to serve the knowledge base described in this note:

Below are a few visual interfaces that make it possible to browse the results of a search on the ~~knowledgebase:~~ knowledge base:

Entrez Neuron was developed by the SenseLab team as a graphical user interface for querying the SenseLab ontologies.

The actions and scripts that were used to create the ~~knowledgebase~~ knowledge base on a commodity PC have been documented by several ~~HCLS~~ HCLSIG members. The necessary instructions and scripts that were used will be listed here as completely as possible:

E D Acknowledgements ~~(Informative)~~

Special thanks to to: Alan Ruttenberg (Science Commons) who coordinated the ~~assembly~~ assembly, conversion, and deployment of the data sets and ontologies and Susie Stephens (Eli Lilly) who coordinated the BioRDF task force. Together they presented the initial version of the ~~knowledgebase~~ knowledge base at ~~the~~ a WWW2007 Banff ~~demo.~~ workshop.

Many contributed to the ~~knowledgebase~~ development, documentation and ~~its documentation,~~ validation of the knowledge base, as well as the ~~thoughts~~ thinking behind ~~it, including~~ it.

Mikail Bota (USC) who kindly provided the BAMS database for our use and John Barkley ~~(NIST), Olivier Bodenreider (NLM, NIH), William Bug (School of Medicine, UCSD),~~ (NIST) converted it to RDF. Huajun Chen (Zhejiang University), ~~Paolo Ciccarese (SWAN), Kei~~ Matthias Samwald (Yale Center for Medical Informatics; DERI Galway; Semantic Web Company), Alan Ruttenberg, and Kei-Hoi Cheung ~~(SenseLab, Yale),~~ (Yale Center for Medical Informatics) participated in the the SenseLab RDF Conversion. Members of the SWAN team: Tim ~~Clark (SWAN),~~ Clark, Paolo Ciccarese, June Kinoshita, Gwen Wong, and Elizabeth Wu contributed the SWAN data source. June Kinoshita, Gwen Wong, Elizabeth Wu, Don Doherty (Brainstage Research Inc.), ~~Michel Dumontier (Carleton University), Kerstin Forsberg (AstraZeneca),~~ William Bug (School of Medicine, UCSD), and Alan Ruttenberg worked on the neurogenerative disease use cases. Ray Hookaway ~~(HP), Vipul Kashyap (Partners Healthcare), June Kinoshita (AlzForum), Joanne Luciano (Harvard Medical School), M. Scott Marshall (University~~ (HP) provided digests from Entrez Gene that were more easily converted to RDF. Jonathan Rees (Science Commons) did the RDF conversions of ~~Amsterdam),~~ Addgene, Pubmed to Gene, Medline, and MeSH, the Neurocommons text mining pilot, and compiled the data source and licensing information for this document. Alan Ruttenberg did the RDF conversion of Entrez Gene records, GO Annotations, Allen Brain Atlas, Homologene, wrote the Science Commons ontology. Alan Ruttenberg and Matthias Samwald wrote the SPARQL queries described in this document. Chris Mungall ~~(NCBO),~~ (NCBO) wrote the converter that produced the OWL versions of the OBO ontologies and consulted on matters of ontology.

Eric Neumann (Clinical Semantics ~~Group),~~ Group) produced the Exhibit visualization. Alan Ruttenberg developed the Google Mouse prototype, with contributions from Mike Travers (CollabRX), Brian Gilman (SciLink), and Tom Stambaugh (Zeetix). Don Doherty, Matthias Samwald, Holger Stenzorn (DERI), M. Scott Marshall, and Eric ~~Prud’hommeaux (W3C),~~ Prud'hommeaux have presented this work at conferences.

Barry Smith (State University of New York at Buffalo, USA) provided advice on ontology work and led the development of the Basic Formal Ontology ,which inspired all ontology work related to the knowledge base.

William Bug (School of Medicine, UCSD), Michel Dumontier (Carleton University), and Holger Stenzorn (DERI) reviewed and gave detailed comments on an initial draft of this note. Alan Ruttenberg Jonathan Rees and Susie Stephens, reviewed and contributed to several versions of the document. Susie Stephens coordinated the BioRDF task force, worked on presentations of the work, and wrote the introduction to this document. M. Scott Marshall (University of Amsterdam) and Eric Prud’hommeaux (W3C) edited and coordinated the production of this note. Eric Prud’hommeaux created the figures.

We would like to offer special thanks for organizations which gave contributions of equipment and service. Through Ray Hookway and Jeannine Crockford, Hewlett Packard donated two machines for a period of six weeks during the demo. Science Commons hosted the the prototype during development and continues to host and develop a knowledge base derived from the prototype as part of the Neurocommons .MIT CSAIL hosts Science Commons and provided computer and networking infrastructure.

Kingsley Idehen, Orri Erling, Ivan Mikhailov, Mitko Iliev, Patrick van Kleef and Anton Avramov from Openlink Software provided rapid technical support including several custom builds of the Virtuoso triple store to address early performance issues, making it possible to develop the prototype on an aggressive schedule. Evren Sirin from Clark and Parsia provided support for the Pellet OWL reasoner .

In addition to data sources that were incorporated into the prototype, other data that did not make it in was provided by Judith Blake (MGD) an Simon Twigger (RGD), and Colin Knep (Alzforum)

Change Log

Raw CVS log:

  Log: Overview.html,v $
        Revision 1.2  2008/06/05 16:20:34  eric
        + color legend
        
        Revision 1.124  2008/06/05 14:51:54  mmarshal
        few typo's fixed, color guide unfinished
        
        Revision 1.123  2008/06/05 12:03:36  eric
        ~ trying 5 more pixels for slTriples.svg
        
        Revision 1.122  2008/06/05 03:42:43  eric
        ~ resized slTriples.svg
        
        Revision 1.121  2008/06/04 21:06:05  eric
        - <font> tags
        
        Revision 1.120  2008/06/04 21:03:47  eric
        - some encoding bugs
        
        Revision 1.119  2008/06/04 18:59:04  eric
        - removed width on homologene data pre border
        
        Revision 1.118  2008/06/04 17:49:38  mmarshal
        last fixes
        
        Revision 1.117  2008/06/03 18:33:59  mmarshal
        misc edits
        
        Revision 1.116  2008/06/03 16:11:45  mmarshal
        misc edits
        
        Revision 1.115  2008/06/03 06:01:31  eric
        ~ reworked §7 ¶2 sentence 2
        ~ reworked SenseLab introduction
        ~ changed "simplicity and performance" sentence in §10
        
        Revision 1.114  2008/06/02 22:56:14  eric
        + text references to the figures per mid:fcc499200805270711h275ffbbahd41edf1ccb063705@mail.gmail.com
        
        Revision 1.113  2008/06/02 21:20:13  eric
        ...
        
        Revision 1.112  2008/06/02 21:03:56  eric
        + turtle representation of RDF/XML
        
        Revision 1.111  2008/06/02 20:56:04  eric
        ~ picked a pick-one alternative
        + PubMesh and SenseLab using same identifiers
        
        Revision 1.110  2008/06/02 18:39:34  eric
        ~ tweaked refs, removed some
        
        Revision 1.109  2008/06/02 17:25:56  eric
        ~ picked a data set to match the generated RDF
        - some spurious markup
        
        Revision 1.108  2008/05/30 17:23:18  eric
        ~ s/a reasoner, another Semantic Web technology, /a Semantic Web reasoner /
        ~ s/standard tools and languages such as the SPARQL Query Language for RDF, and OWL reasoners/standard tools, such as OWL reasoners, and languages, such as the SPARQL Query Language for RDF/
        ~ removed odd whitespaces
        + proposed alternate wording for PURLs
        ~ looking for processes that are a subclass *or component* of signal transduction
        + ericP has presented this work
        
        Revision 1.107  2008/05/30 06:29:09  mmarshal
        Added Bill Bug Memorial
        
        Revision 1.106  2008/05/29 23:23:49  mmarshal
        misc edits
        
        Revision 1.105  2008/05/27 16:57:28  mmarshal
        corrected encoding to unix (!)
        
        Revision 1.104  2008/05/25 22:16:11  mmarshal
        misc edits, corrections
        
        Revision 1.103  2008/05/24 13:20:34  mmarshal
        removed @@'s and fixed assorted errors
        
        Revision 1.102  2008/05/23 22:54:04  mmarshal
        Susie's edits
        
        Revision 1.101  2008/05/18 21:56:21  mmarshal
        Alan's edits
        
        Revision 1.100  2008/05/13 17:50:34  mmarshal
        a few fixes
        
        Revision 1.99  2008/05/13 15:01:52  mmarshal
        Mostly edits due to Alan's comments
        
        Revision 1.98  2008/05/12 18:09:09  mmarshal
        misc edits from comments and contributors - in progress..
        
        Revision 1.97  2008/04/07 07:17:12  eric
        ~ prep for publication
        
        Revision 1.96  2008/04/03 21:21:26  mmarshal
        edits from Jonathan Rees
        
        Revision 1.95  2008/04/03 20:54:40  eric
        ~ clarify restrictions
        
        Revision 1.94  2008/04/03 17:42:11  mmarshal
        clarified initial pubmesh reference
        
        Revision 1.93  2008/04/03 15:45:33  mmarshal
        misc edits
        
        Revision 1.92  2008/04/01 16:13:45  eric
        ~ fix forward ref for part_of inference
        
        Revision 1.91  2008/03/28 22:07:46  eric
        ~ avoid mod-dir redirect to senselab/
        
        Revision 1.90  2008/03/28 02:26:40  eric
        ~ add file extension to svg pics
        
        Revision 1.89  2008/03/27 15:43:01  mmarshal
        validate
        
        Revision 1.88  2008/03/24 17:34:24  eric
        ~ sized and fragmented tables
        
        Revision 1.87  2008/03/24 09:44:42  eric
        ~ validate
        
        Revision 1.86  2008/03/24 09:38:59  eric
        ~ validate
        
        Revision 1.85  2008/03/24 09:30:40  eric
        ~ re-embedded SVG images
        ~ sized and fragmented tables
        
        Revision 1.84  2008/03/21 19:12:08  eric
        ~ clarify DRD1 class membership
        
        Revision 1.83  2008/03/20 22:35:02  mmarshal
        more edits from Alan, Scott
        
        Revision 1.82  2008/03/20 18:34:22  mmarshal
        changed RDF bundle locations into PURL's
        
        Revision 1.81  2008/03/18 18:15:24  mmarshal
        changed section order, some formatting
        
        Revision 1.80  2008/03/18 17:38:22  mmarshal
        more corrections and comments from Alan
        
        Revision 1.79  2008/03/17 20:23:08  eric
        ~ integrate alanR's owl tutorial
        
        Revision 1.78  2008/03/17 20:13:33  eric
        ~ integrate alanR's owl tutorial
        
        Revision 1.77  2008/03/17 16:25:51  eric
        ~ fixed encoding prob
        
        Revision 1.76  2008/03/17 16:20:41  eric
        ~ fixed some broken refs
        
        Revision 1.75  2008/03/17 15:28:57  eric
        ~ moved susie to contributors
        
        Revision 1.74  2008/03/17 14:59:12  eric
        ~ fix b0rken links
        
        Revision 1.73  2008/03/17 14:41:30  eric
        ~ use neurocommons to back up KB sources (apart from medline)
        
        Revision 1.72  2008/03/10 15:06:34  eric
        ~ pubrules-ready
        
        Revision 1.71  2008/03/08 16:48:51  eric
        ~ validation
        
        Revision 1.70  2008/03/07 18:42:28  eric
        ~ a bit more in 6.1
        
        Revision 1.69  2008/03/04 00:20:44  mmarshal
        Some of Alan Ruttenberg's comments
        
        Revision 1.68  2008/03/02 21:38:45  eric
        + namespace descriptors per AlanR's suggestion
        
        Revision 1.67  2008/02/25 16:06:07  mmarshal
        Misc. edits from William Bug, Michel Dumontier.
        
        Revision 1.66  2008/02/11 14:30:36  eric
        + Rule Modeling
        
        Revision 1.65  2008/02/07 23:22:49  eric
        ~ change molecularSignalers name to signalingParticipants
        
        Revision 1.64  2008/02/07 23:22:04  eric
        ~ change molecularSignalers name to signalingParticipants
        
        Revision 1.63  2008/02/04 16:02:47  mmarshal
        more edits from Michel + misc
        
        Revision 1.62  2008/02/03 22:48:09  mmarshal
        worked some of Michel Dumontier's suggestions in
        
        Revision 1.61  2008/01/28 16:03:40  mmarshal
        expanded on RDF Import section, added details link from incorporated database table
        
        Revision 1.60  2008/01/27 19:02:01  mmarshal
        Some restructuring, new section for Query, Scope section
        
        Revision 1.59  2008/01/25 18:01:41  mmarshal
        small edits to sync to googledocs
        
        Revision 1.58  2008/01/19 00:14:46  mmarshal
        More edits from Susie's comments
        
        Revision 1.57  2008/01/14 16:10:23  mmarshal
        added some of Susie's edits
        
        Revision 1.56  2008/01/10 23:04:02  eric
        ~ s/_:protein_1/protein:p1812_7190_1/g
        ~ tweak wording of subclass-part-of issue
        
        Revision 1.55  2008/01/08 18:34:13  mmarshal
        added Acknowledgements
        
        Revision 1.54  2008/01/08 18:07:07  mmarshal
        added RDF sources table
        
        Revision 1.53  2008/01/07 15:14:14  eric
        ~ HTML-validate
        
        Revision 1.52  2008/01/07 15:08:50  eric
        ~ feedback from phone call with AlanR
        
        Revision 1.51  2008/01/06 17:30:40  eric
        ~ stack figures instead of floating them to the right
        
        Revision 1.50  2008/01/06 15:15:00  eric
        ~ pretty-up subclass-part-of issue
        
        Revision 1.49  2008/01/06 15:12:47  eric
        + subclass-part-of issue
        
        Revision 1.48  2008/01/06 14:59:47  eric
        ~ update empty-protein-set issue
        
        Revision 1.47  2008/01/05 21:05:57  eric
        + senselab triples
        
        Revision 1.46  2007/12/26 14:55:41  eric
        ~ fix stupid case-folding attribute assignments (I blame HTML-mode)
        
        Revision 1.45  2007/12/26 14:47:57  eric
        ~ fix some broken linkes, look askance at rest
        
        Revision 1.44  2007/12/26 14:17:23  eric
        ~ rearranged triples in the triples table and added prose
        
        Revision 1.43  2007/12/26 13:20:05  eric
        ~ moved style shared with svg to local.css
        ~ commented out pre-formatted triples alternative
        
        Revision 1.42  2007/12/25 23:07:54  eric
        + triples picture in 3 Triple Model
        
        Revision 1.41  2007/12/25 14:41:57  eric
        ~ switched from the graph around
            ncbi_gene:2903, GO:0007215, record:7816096
          to
            ncbi_gene:1812, GO:0007190, record:10698743
          to make sure it worked with DERI's SenseLab KB
        ~ switched from subClassOf 7166 to part_of 7166
        ~ made SenseLab join OPTIONAL
        
        Revision 1.40  2007/12/25 11:03:37  eric
        ~ s/\?parent/?protein_superclass/g
        ~ fixed named graph query
        
        Revision 1.39  2007/12/25 04:40:32  eric
        + override link underlining in pre-formatted text (presumably queries and triples)
        + triplesTable style
        ~ improve comments in the example query in 2 Use Case
        + broach GO modelling
        ~ improve RDF exposition in 3 Triple Model
        + added { ?restriction3 owl:onProperty dnaGeneProduct:described_by } triples to Matthias's SenseLab query and Named Graph example
        
        Revision 1.38  2007/12/23 13:51:43  eric
        ~ fixed errors in putative extracted triples
        
        Revision 1.37  2007/12/23 07:07:51  eric
        + Contributors
        
        Revision 1.36  2007/12/21 17:45:46  eric
        ...
        
        Revision 1.35  2007/12/21 17:15:16  eric
        ...
        
        Revision 1.34  2007/12/21 17:12:14  eric
        ...
        
        Revision 1.33  2007/12/21 17:09:33  eric
        ~ working on triples
        
        Revision 1.32  2007/12/21 15:56:37  mmarshal
        more edits
        
        Revision 1.31  2007/12/21 14:19:19  mmarshal
        more edits, Design section
        
        Revision 1.30  2007/12/21 13:05:29  eric
        ~ trying to make the Triple Model graph more human-readable
        + prefixes in named graph query
        
        Revision 1.29  2007/12/21 12:06:13  mmarshal
        more text to named graphs
        
        Revision 1.28  2007/12/21 11:05:13  mmarshal
        assorted changes to intro, design principles, next steps
        
        Revision 1.27  2007/12/20 23:50:30  mmarshal
        Corrections from Susie, Jonathan, others.
        Created Design Principles section with Susie's text
        (merged Unifying Terms with new section)
        
        Revision 1.26  2007/12/20 16:51:05  eric
        + notes from publication powow
        
        Revision 1.25  2007/12/20 16:11:04  eric
        moved 8 Named Graphs section
        
        Revision 1.24  2007/12/20 16:08:28  eric
        ...
        
        Revision 1.23  2007/12/20 16:07:01  eric
        ...
        
        Revision 1.22  2007/12/20 16:04:20  eric
        + unify on ncbi:1812
        
        Revision 1.21  2007/12/20 15:08:24  eric
        ~ s/?paper/?pubmed_record/ per alanR's suggestion mid:04189805-5B6D-4BA1-9C99-DC38D1D50472@gmail.com
        
        Revision 1.20  2007/12/20 15:05:45  eric
        + current demo query
        
        Revision 1.19  2007/12/20 14:50:30  mmarshal
        Some rewording..
        
        Revision 1.18  2007/12/19 23:54:07  mmarshal
        added triples mini-summary, refined use case text,
        changed database table (_sources_ only - RDF bundle details to appendix)
        
        Revision 1.17  2007/12/19 17:45:17  mmarshal
        added explanation to Use Case, updated TOC and Outline to reflect new section "Adding a New Database"
        
        Revision 1.16  2007/12/19 16:14:45  eric
        + incorporated

Allen Brain Atlas (ABA)	Allen Brain Atlas is an interactive, genome-wide image database of gene expression in the mouse brain. A combination of RNA in situ hybridization data, detailed Reference Atlases and informatics analysis tools are integrated to provide a searchable digital atlas of gene expression. Together, these resources present a comprehensive online platform for exploration of the brain at the cellular and molecular level.	[Details]
Addgene	A catalog of plasmids from Addgene	[Details]
BAMS	The Brain Architecture Management System (BAMS) is designed to be a repository of information about brain structures from different species, and has a set of inference engines for processing the neurobiological data. BAMS contains ~~to date~~ five interrelated modules: Brain Parts (brain regions, major fiber tracts, and ventricles), Cell Types, Molecules, Relations (between structures from different neuroanatomical atlases), and Connections.	[Details]
GALEN	GALEN is an advanced terminology of medical concepts for clinical information systems. ~~More on GALEN.~~ We imported the GALEN ontology in OWL from CO-ODE .	[Details]
NCBI gene_info	~~NCBI~~ Information from the gene_info file distributed by NCBI that was imported into OWL.	[Details]
Gene Ontology (GO)	The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism. GO terms are often used to annotate gene and protein records.	[Details]
GOA	GO annotations from ~~NCBI~~ National Center for Biotechnology Information (NCBI) and ~~EBI.~~ European Bioinformatics Institute (EBI).	[Details]
HomoloGene	Homologene is a system for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes.	[Details]
MEDLINE/PubMed	PubMed is a service of the U.S. National Library of Medicine that includes over 17 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s. PubMed includes links to full text articles and other related resources.	[Details]
MeSH	Medical Subject Headings. 2008 MeSH includes the subject descriptors appearing in MEDLINE/PubMed , the ~~NLM~~ National Library of Medicine (NLM) catalog database, and other NLM databases. ~~See the MeSH introduction .~~	[Details]
Neurocommons Text Mining Pilot	Protein/gene associations/interactions extracted from Temis software applied to 7% of Medline ~~records (SC).~~ records. Annotations were captured in RDF using the Neurocommons Annotations Schema .	[Details]
~~BerkeleyBop OBO ontologies~~ Open Biomedical Ontologies (OBO)	All Open Biomedical Ontologies ( OBO ) available from ~~BerkelyBop.~~ Berkeley Bioinformatics Open-source Projects.	[Details]
Science Commons Ontology	~~An ad hoc ontology~~ A bridging ontology, from Science Commons , importing other ontologies used ~~by the output of several of~~ in the ~~conversion scripts.~~ prototype, defining classes and relations used to represent gene records and their contents, as well as few items referred to by imported data sources, but not available in a published ontology.	[Details]
SenseLab	~~There will be a reference here~~ See Experiences with the conversion of SenseLab databases to ~~another W3C document.~~ RDF/OWL .	[Details]
SWAN	Semantic Web Applications in Neuromedicine [ SWAN ] is a knowledge base of hypotheses, claims, and evidence in Alzheimer Disease (AD) research, created through a community process to capture the collective scientific insights of the AD field.	~~@@ no bundle~~ Not yet @@ public
SKOS	Simple Knowledge Organization System : (SKOS): specifications and standards to support the use of knowledge ~~Organization~~ organization systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic ~~Web~~ Web.	- [Details]

?process	rdfs:subClassOf	?what	.
?what	owl:onProperty	obo:has_part	.
?what	owl:someValuesFrom	~~go:GO_0007166~~ go:GO_0007165	.

HCLS Knowledgebase A Prototype Knowledge Base for the Life Sciences

W3C Working Draft Interest Group Note 4 April June 2008

Abstract