Copyright © 2008 W3C ® ( MIT , ERCIM , Keio ), All Rights Reserved. W3C liability , trademark and document use rules apply.
The HCLS Knowledgebase (HCLS-KB)
prototype we describe is a biomedical
knowledge base base, constructed for a demonstration at Banff WWW2007 ,
that integrates 15 distinct data sources using currently available
Semantic Web Technologies technologies such as the W3C standard Web Ontology
Language (OWL) [ OWL ] and Resource
Description Framework (RDF).
[ RDF ]. This report
outlines which resources were integrated, how the KB knowledge base was
constructed using freely available
free and open source triple store
technology, how it can be queried using the W3C Recommended RDF
query language SPARQL, SPARQL [ SPARQL ], and what
resources and inferences are involved in answering complex queries.
While the utility of the KB knowledge base is illustrated by identifying a set
of genes involved in Alzheimer's Disease, the approach described
here can be applied to any use case that integrates data from
multiple domains.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is a First Public Working Draft
of W3C Interest Group Note describes
how one can use the Semantic Web in to express and integrate
scientific data. These techniques can be used for modeling any
data, and the benefits of integration and model consistency apply
to other diverse, distributed data domains. It is hoped that this
document will inspire further contributions to the ongoing work at
Neurocommons and the Health Care and Life Sciences Interest
Group (HCLS) , part of Group, as well as inspire those in other domains to
exploit the W3C Semantic
Web Activity . Web.
This document describes the construction and use of the HCLS Knowledgebase used in the WWW2007 Banff HCLS Demo . It describes the process for creating a bilogical database on the Semantic Web. The companion document, Experiences with the conversion of SenseLab databases to RDF/OWL , describes the process for integrating new data into this Knowledgebase.
Please send all comments on either of
these documents The document was
produced by 21 April, the Semantic
Web in Health Care and Life Sciences Interest Group
(HCLS) ,part of the W3C Semantic Web
Activity ( see
charter ). Comments may be
sent to the
publicly archived public-semweb-lifesci@w3.org
, a mailing list. Feedback is encouraged, as is participation in the
recently re-charted HCLSIG.
A list with a
public archive , though of changes
since the IG does not promise explicit
responses to each comment. last
publication is available.
Publication of this document as an
Interest Group Note is planned for May 2008;
timely comments are appreciated. Areas marked with "@@" are known
to be incomplete, however, any suggestions in these areas are still
appreciated. Publication as a Working Draft does not imply
endorsement by the W3C Membership. This is a draft document and may
be updated, replaced or obsoleted by other documents at any time.
It is inappropriate to cite this document as other than work in
progress.
This document was produced by a group operating under the disclosure obligations of the 5 February 2004 W3C Patent Policy . The group does not expect this document to become a W3C Recommendation. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information to public-semweb-lifesci@w3.org [ public archive ] in accordance with in accordance with section 6 of the W3C Patent Policy .
The life sciences have a rich history of making data available
on the Web, because researchers recognized the benefits of sharing
data and made it available to other researchers for the benefit of
greater science. However, because many of the data repositories
were developed in relative isolation, they tend to use different
identifier schemes, incompatible terminology, and dissimilar data
formats. This makes it hard for researchers to find all data about
an entity of interest and to assemble it into a useful block of
knowledge. The HCLS knowledgebase
This prototype was built to demonstrate
how Semantic Web technologies can integrate such heterogeneous data
sets and thereby help scientists to more easily answer interesting
scientific questions.
The key to advancing scientific understanding is empowering
scientists with the information that they need to make
well-informed decisions. Scientists need to be able to easily gain
access to all information about chemical compounds, biological
systems, diseases, and the interactions between these entities, and
this requires data to be effectively integrated in order to provide
a biological systems level
view to the user. user, i.e. a complete view of biological activity.
However, achieving this goal has proven to be a formidable
challenge in the life sciences, where data and models are found in
a large variety of formats and scales that span from the molecular
to the anatomical.
In order to overcome the challenge of gaining insight directly
from the Web, a number of laboratories, organizations, and
companies have built internal data warehouses from the publicly
available data sources. This certainly helps scientists to more
easily query for all information related to entities of interest.
However, these efforts generally integrate only a subset of publicly available data that is
deemed to be of greatest interest, and it has proven difficult to
add data sources to the data
warehouse at a later point. Further,
advances in scientific knowledge require regular changes to be made
to the underlying data models, and this is not straightforward with
a relational model. Organizations that use this approach also
typically face challenges with representing data that is at
different levels of abstraction, and that includes data of very
different quality.
Many health care and life sciences organizations are interested
in the data integration abilities promised by the Semantic Web.
More specifically, the benefits include the aggregation of
heterogeneous data using explicit semantics, and the expression of
rich and well-defined models for data aggregation and search.
Another critical aspect of the Semantic
Web is the ability technologies enable one to more flexibly add
additional data sets into the data model, and more easily reuse
data in unanticipated ways. Finally, once
the Once data has been
integrated, the aggregated, a Semantic Web enables reasoner computes
implied relationships among the application aggregated data
resulting in tighter integration and the possibility of
reasoning to infer additional
insights.
The HCLS Knowledgebase This prototype knowledge base imports data from
data sources that span multiple domains in health care and the life sciences to make
cross-discipline queries and, thereby,
queries. It therefore provides a working (and
reproducible) example of the possibilities that become available
via knowledge integration
possible. integration. The use of
an RDF repository to store RDF and OWL makes it possible to query,
manipulate, and reason about the data with standard tools tools, such as OWL
reasoners, and languages
languages, such as the SPARQL Query
Language for RDF, as well as OWL
reasoners. RDF. Although this
document addresses a specific use case, the approach described here
can be applied to any use case that integrates data from multiple
domains.
This document attempts to succinctly describe how the HCLS Knowledgebase this
knowledge base was constructed so that interested parties can
use the core requirements techniques to eventually create their own knowledgebase. knowledge
base. We have attempted to write a general description
but but,
unavoidably, the knowledgebase
knowledge base makes use of unavoidably specialized resources, such as those
found in the Data Sources section. Some, but not
all, of the reasoning behind design decisions is explained. Several
technologies such as semantic web
the Semantic Web standards RDF, OWL, and SPARQL were used used, but
we are unable in
order to keep this document to a
manageable size, we will not explain all aspects in the depth
that would be required for those new to the area. Those interested
in a general introduction to semantic
web the Semantic Web should see
The Semantic Web
Primer . See also the CO-ODE
web site for a
hands-on OWL tutorial with Protégé
Protégé .For
materials introducing ontology see National Center
for Biomedical Ontology (NCBO)
Introduction to Biomedical
Ontologies .For materials related
to reasoning see The Semantic Web: Ontologies and OWL .
This document uses URLs to identify records about biological
entities and processes. The identifiers
used in this document are the same as those used in the knowledgebase prototype
knowledge base and are not yet stable. However, an accompanying appendix will index the regular
expressions or scripts used to update these identifiers as they
evolve. Knowledgebase Knowledge
base implementors should use these terms whenever
possible.
In RDF data
in this document, examples assume
the document is expressed in Turtle
[ TURTLE ]. Queries on
this data are expressed in SPARQL [ SPARQL ]. The following namespace prefix bindings
are assumed unless otherwise
stated:
Prefix | URI | Description |
---|---|---|
rdf: |
http://www.w3.org/1999/02/22-rdf-syntax-ns# |
The RDF Vocabulary |
rdfs: |
http://www.w3.org/2000/01/rdf-schema# |
The RDF Schema vocabulary |
xsd: |
http://www.w3.org/2001/XMLSchema# |
XML Schema |
sc: |
http://purl.org/science/owl/sciencecommons/ |
|
pubmedRec: |
http://purl.org/commons/record/pmid/ |
PubMed records (not the articles |
article: |
http://purl.org/science/article/pmid / |
PubMed |
ncbi_gene: |
http://purl.org/commons/record/ncbi_gene/ |
Entrez Gene records (not the genes |
proteinsubclass: |
http://purl.org/science/protein/subjects/ |
Proteins of a given gene participating in a given |
go: |
http://purl.org/obo/owl/GO# |
|
protein: |
http://purl.org/science/protein/bysequence/ |
|
ro: |
http://www.obofoundry.org/ro/ro.owl# (
proposed update may be more complete) |
Relation Ontology (RO):
Relationships between members of OBO |
obo: |
http://purl.org/obo/owl/obo# |
|
senselab: |
http://purl.org/ycmi/senselab/neuron_ontology.owl# |
Neuroscience ontology derived from the SenseLab NeuronDB
|
dnaGeneProduct: |
http://purl.org/science/owl/sciencecommons/is_protein_gene_product_of_dna_ |
Syntactic trick to shorten sc:is_protein...described_by |
1 Introduction motivates and explains this document.
2 Use Case introduces an
interesting scientific question that the knowledgebase knowledge
base can be used to address.
3 Data Sources describes the data
sources that have been incorporated into the knowledgebase. knowledge
base.
4 Design Decisions explains the reasons for several design choices.
5 Importing to RDF - Homologene Example explains the process of translating data into RDF triples.
6 Query explains the basis use case query
that answers the scientific question.
7 Data Model explains the basics of RDF triples.
8 Adding a New Data Source explains how the SenseLab database was integrated.
9 Named Graphs discusses the use of named graphs and query details.
10 Next
Steps Opportunities for further
development discusses problem areas and possible
improvements.
Alzheimer's is a debilitating neurodegenerative disease that
affects approximately 27 million people worldwide. The cause of
Alzheimer's is currently unknown and no therapy is able to halt its
progression. However, insight into the mechanism and potential
treatment of this debilitating disease may come from the
integration of neurological, biomedical and biological resources.
The HCLS knowledgebase knowledge base assembles several neurology-related
resources alongside an array of clinical and biological resources.
This makes it possible to integrate knowledge across several
research domains and potentially provide insight into the
mechanisms of the disease.
The scientific question under scrutiny in our use case involves several elements of putative functional importance to Alzheimer's. CA1 Pyramidal Neurons (CA1PN) are known to be particularly damaged in Alzheimer's disease and play a key role in signal transduction. Signal transduction pathways are considered to be rich in proteins that might respond to chemical therapy. By integrating information about signal transduction, pyramidal neurons, their genes, and gene products, the query corresponding to our scientific question can provide information relevant to researchers that are looking for drug target candidates that are potentially effective against Alzheimer's Disease.
In order to incorporate data from several information sources,
it was necessary to convert several exported formats, each into its
own RDF bundle . The largest RDF bundle of 200M triples
resulted from MeSH associations with PubMed articles. In contrast,
there were a number of smaller bundles ranging from 10K to 10M
triples. This resulted in a total of approximately 350M triples
occupying approximately 20Gb
20GB when loaded into the RDF
repository. In several cases, we extracted only a subset, for
example, by selecting only human, rat, and mouse data. Click on
[Details] in the table below to view provenance information details such as the date of the last extraction, whether the extraction was a subset,
etc. extraction.
At the time of publication, the following information sources
have been (sometimes partially) incorporated into the knowledgebase. knowledge
base. This set will continue to be extended in depth (i.e.,
more complete inclusion of partially represented data sets) and in
breadth (i.e., novel data sets):
Allen Brain Atlas (ABA) | Allen Brain Atlas is an interactive, genome-wide image database of gene expression in the mouse brain. A combination of RNA in situ hybridization data, detailed Reference Atlases and informatics analysis tools are integrated to provide a searchable digital atlas of gene expression. Together, these resources present a comprehensive online platform for exploration of the brain at the cellular and molecular level. | [Details] |
Addgene | A catalog of plasmids from Addgene | [Details] |
BAMS | The Brain Architecture Management System (BAMS) is designed to
be a repository of information about brain structures from
different species, and has a set of inference engines for
processing the neurobiological data. BAMS contains |
[Details] |
GALEN | GALEN is an advanced terminology of medical concepts for
clinical information systems. |
[Details] |
NCBI gene_info | [Details] | |
Gene Ontology (GO) | The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism. GO terms are often used to annotate gene and protein records. | [Details] |
GOA | GO annotations from
|
[Details] |
HomoloGene | Homologene is a system for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes. | [Details] |
MEDLINE/PubMed | PubMed is a service of the U.S. National Library of Medicine that includes over 17 million citations from MEDLINE and other life science journals for biomedical articles back to the 1950s. PubMed includes links to full text articles and other related resources. | [Details] |
MeSH | Medical Subject Headings. 2008 MeSH includes the subject
descriptors appearing in MEDLINE/PubMed
, the |
[Details] |
Neurocommons Text Mining Pilot | Protein/gene associations/interactions extracted from Temis software applied to 7% of Medline
|
[Details] |
All Open Biomedical Ontologies ( OBO ) available from |
[Details] | |
Science Commons Ontology | [Details] | |
SenseLab | [Details] | |
SWAN | Semantic Web Applications in Neuromedicine [ SWAN ] is a knowledge base of hypotheses, claims, and evidence in Alzheimer Disease (AD) research, created through a community process to capture the collective scientific insights of the AD field. | |
SKOS | Simple Knowledge Organization System |
A number of design decisions were made during the construction
of the HCLS Knowledgebase. prototype knowledge base. Many of the decisions
were pragmatic in nature, as a consequence of the need to implement
the solution on a commodity PC within a two-month period for a
demonstration at WWW2007.
A number of different approaches were used for the conversion of data into RDF/OWL. The most commonly used approach was the use of Lisp code to read text exports of the data and create OWL or RDF documents. We will focus on the example of importing data from Homologene.
The general steps required to import from an existing data source into RDF are:
In the case of Homologene, we start with a text file that
contains the exported information. The original tab delimited file
is ftp://ftp.ncbi.nih.gov/build54/homologene.data
. The Lisp code for the homologene conversion
is also available.
It looks like this: Here is a sample of the original file:
99949 9606 727759 LOC727759 113427825 XP_001125931.1 99949 10116 678753 LOC678753 109498373 XP_001053282.1 99949 5833 812783 GeneID:812783 16805082 NP_473111.1 99950 3702 820917 AT3G16650 18401203 NP_566557.1
We are interested in the first 3 fields. The first field identifies the homologous cluster. The second field is the species taxon. The third field is the EntrezGene id. We are only interested in human, mouse, rat, taxon ids: "9606" "10116" "10090".
We The
Lisp code for the homologene
conversion is also available. In the conversion process, we
first iterate over the lines in the file, creating a table mapping
cluster id to the pairs of taxon id, entrez id in the cluster. This
is the variable homologene , created by the function
read-homologene . For each of these clusters we will
create an individual to represent the cluster e.g for cluster
99949:
<sciencecommons:orthology_record rdf:about="http://purl.org/science/record/homologene/cluster_r54_99949"> <sciencecommons:has_homologous_gene_record rdf:resource="http://purl.org/commons/record/ncbi_gene/678753"/> <sciencecommons:has_homologous_gene_record rdf:resource="http://purl.org/commons/record/ncbi_gene/727759"/> <sciencecommons:has_supporting_evidence rdf:resource="http://purl.org/science/evidence/homologene/cluster_r54_99949"/> </sciencecommons:orthology_record>
There are two things to note about this
conversion. The first Above is
the RDF/XML (see [ RDF ]) expression of:
@PREFIX homologene: <http://purl.org/science/record/homologene/> homologene:cluster_r54_99949 sciencecommons:has_homologous_gene_record rdf:resource ncbi_gene:678753 . homologene:cluster_r54_99949 sciencecommons:has_homologous_gene_record rdf:resource ncbi_gene:727759 . homologene:cluster_r54_99949 sciencecommons:has_supporting_evidence homologene:cluster_r54_99949 .
Note that we
used HTTP URIs were adopted as the
mechanism URLs to identify
records. Importantly, these Homologene records by prefixing the EntrezGene
identifiers (e.g. 727759
) with a stem
URL, http://purl.org/commons/record/ncbi_gene/
.The resulting URL can also be usefully
resolved using standard with a web technology (web
browsers). PURLS browser. The domain
purl.org serves Persistent URLs (PURLs), which currently redirect
these requests for NCBI gene
identifiers to a specific format of
database records redirect script at
sw.neurocommons.org. If the community wishes to web pages that describe those records in
move the format.
It has been our experience that URIs service to, for such web
pages instance, an NCBI page about
these genes, they can simply notify the
custodians of purl.org. This extra level of indirection protects
these identifiers from becoming orphaned as organizations stop
existing or change over time.
PURLS their priorities. These
URLs were chosen because we can change
what web page a PURL redirects to. Thus PURLs provide a more stable
URI for a resource than the provider resources, and place control
enabling one also used to
make repairs in such situations into
identify gene information imported from other
data sources, automatically linking the hands Semantic Web
representations of these records. For
example, PubMesh statements about gene records use these same
identifiers for genes, as do the HCLS
community. The second is that we mapped equivalent terms
statements from different resources to a common base URI i.e. the
http://purl.org/commons/record/ncbi_gene/ prefix was used to
consistently identify Entrez Gene records. Ontology and
SenseLab. This allows for trivial data integration between
different resources and simplifies
queries involving Entrez Gene records.
Also, the individual
http://purl.org/science/evidence/homologene/cluster_r54_99949
serves as a link to the "evidence", which is not elaborated in this
translation, but would include the blast BLAST scores and
other evidence used to establish the orthology in future work. (see
http://www.ncbi.nlm.nih.gov/sites/entrez?cmd=Retrieve&db=homologene&dopt=AlignmentScores&list_uids=99949
)
Our scientific question can be summarized
as "What genes are involved in signal transduction that are related
to pyramidal neurons?". The scientific question can be
answered with the following query, which searches for gene names
and processes from four data sources within the knowledgebase. knowledge
base. The data sources include: MeSH (Pyramidal Neurons),
PubMed (Journal Articles), Entrez Gene (Genes), Gene Ontology
(Signal Transduction). The example
query example selects the gene name of the genes
involved in signal transduction that are related to
pyramidal neurons . Some of the complexity in this query
comes from the need to capture relevant anatomical and functional
detail at the subcellular and molecular level. The portion probing
the Gene Ontology queries a set of classes
describing processes at the molecular level. Our query employs the
SPARQL RDF query language to perform knowledge integration across
the sources of the knowledgebase.
knowledge base. Details on SPARQL can
be found in the References .
[Note: The query below will not work verbatim at SPARQL
endpoints. We have simplified the actual Banff demonstration query for explanatory
purposes in our example below. The Banff demonstration query is
discussed in more detail in Named Graphs
Section. You can try running the query at the
DERI installation .] HERE .].
Please note that the same color (and CSS class) is used to connect the descriptive text in the query with relevant portions of the following figures.
Source | (colored) CSS class |
---|---|
PubMesh | mesh |
Gene Ontology Annotation (GOA) | goa |
Entrez Gene | glbl |
Gene Ontology | plbl |
SELECT ?genename ?processname WHERE { # PubMeSH includes ?gene_records mentioned in ?articles which are identified by pmid in ?pubmed_records . ?pubmed_record sc:has-as-minor-mesh mesh:D017966 . ?article sc:identified_by_pmid ?pubmed_record . ?gene_record sc:describes_gene_or_gene_product_mentioned_by ?article . # The Gene Ontology has a set of ?proteins such that foreach ?protein, ?protein ro:has_function [ ro:realized_as ?process ]. ?protein rdfs:subClassOf ?restriction1 . ?restriction1 owl:onProperty ro:has_function . ?restriction1 owl:someValuesFrom ?restriction2 . ?restriction2 owl:onProperty ro:realized_as . ?restriction2 owl:someValuesFrom ?process . # Also, foreach ?protein, ?protein has a parent class which is linked by some predicate to ?gene_record. ?protein rdfs:subClassOf ?protein_superclass . ?protein_superclass owl:equivalentClass ?restriction3 . ?restriction3 owl:onProperty dnaGeneProduct:described_by . ?restriction3 owl:hasValue ?gene_record . # Each ?process (that we are interested in) is a subclass of the signal transduction process.?process obo:part_of go:GO_0007165 . ?gene_record rdfs:label ?genename . ?process rdfs:label ?processname . }
The following shows a few of the results from the query:
gene_record_name | processname | |
---|---|---|
Entrez Gene record for human DRD1, 1812 | adenylate cyclase activation | |
Entrez Gene record for human ADRB2, 154 | adenylate cyclase activation | |
... |
The following section describes the RDF data model and how we employed it to make our query possible.
The data in the knowledgebase
knowledge base is modeled in OWL-DL,
which has been expressed as RDF triples. Briefly, an RDF triple
consists of a subject , predicate , and
object . The predicate is also known as the
property of the triple. Subjects and objects in the data
unify to create an RDF Graph, with subjects and objects as nodes
and predicates as edges. For more information about RDF and OWL,
see the References section in the
Appendix.
Nodes labeled with a leading "_:", e.g. proteinsubclass:p1812_7190_1 _:activateAdenalCyclase , are called
RDF blank nodes [ CONCEPTS ]. These
frequently have machine-generated identifiers and are
therefore typically opaque to a human reader (e.g., the set of all
nodes that represent protein entities linked to the GO molecular
function XXX), but Adenal Cyclase Activation). Here, for the purposes
of explanation, here, they have been
named to convey meaning to the reader. Blank nodes ending in "_1"
in this document indicate this blank node is one of many in this
class. class,
e.g. _:signalingParticipants_1 .
Figure 1, Triples in Solution ,shows a graphical representation of the triples that compose one solution to the query posed in section 6 .Following is a discussion of the origins and intents of those triples:
The application of a
commercial text mining tool to neuroscience-related PubMed
abstracts results in a set of annotations that link MeSH terms to
genes (for more details on MeSH, see the table in Data Sources . ). An article with PubMed id 10698743 mentions
ncbi_gene:1812 and that the
corresponding PubMed record has a MeSH term
mesh:D017966 . The following three
triples express this:
subject | predicate | object | ||
---|---|---|---|---|
pubmedRec:10698743 | sc:has-as-minor-mesh | mesh:D017966 | . | |
article:10698743 | sc:identified_by_pmid | pubmedRec:10698743 | . | |
ncbi_gene:1812 | sc:describes_gene_or_gene_product_mentioned_by | article:10698743 | . |
A set of genes or gene products in human bodies are described by ncbi_gene:1812 . Here, we call this set _:equiv1812 .
_:equiv1812 | owl:onProperty | dnaGeneProduct:described_by | . |
_:equiv1812 | owl:hasValue | ncbi_gene:1812 | . |
protein:ncbi_gene.1812 has the same extension (members) as the OWL restriction _:equiv1812 .
protein:ncbi_gene.1812 | owl:equivalentClass | _:equiv1812 | . |
The expression
NamedClass equivalentClass R . R onProperty SomeProperty . R hasValue SomeClass
is the RDF representation of an
owl idiom to say OWL class axiom that says: for every
all X such that
X SomeProperty SomeClass .
X is a member of the class NamedClass
(and vice
versa). See
OWL Web Ontology Language Semantics and Abstract Syntax Section 4.
Mapping to RDF Graphs for a formal treatment of this..
Using our other supplied constant, we note that adenylate
cyclase activation,
go:GO_0007190 , is part
of signal transduction,
go:GO_0007166
go:GO_0007165 . Note: this simplified query matches
only processes that are a sub-process of go:GO_0007166; go:GO_0007165; the actual
query , described in §9 Named Graphs ,
looks also for subclasses. The part_of relationships were inferred
from the OWL class restrictions described in §7.1 Precomputing Inferences . The class of
functions that are realized_as adenylate cyclase
activation is here labeled _:activateAdenylCyclase .
go:GO_0007190 | obo:part_of |
|
. |
_:activateAdenylCyclase | owl:onProperty | ro:realized_as | . |
_:activateAdenylCyclase | owl:someValuesFrom | go:GO_0007190 | . |
There are many possible classes of substance participating in molecular signaling, one of which (called here _:molecularSignalers_1 ) is defined by the ability to activate adenyl cyclase.
_:signalingParticipants_1 | owl:onProperty | ro:has_function | . |
_:signalingParticipants_1 | owl:someValuesFrom | _:activateAdenylCyclase | . |
The class of proteins in the intersection of
_:signalingParticipants_1 and
protein:ncbi_gene.1812 is here abbreviated
proteinsubclass:p1812_7190_1 , though the actual
identifier is
proteinsubclass:product_of_ncbi_gene.1812_that_participates_in_GO_0007190_fbc49f20524727a24c7b7effa29bad4a
. Note: the Venn diagram
reveals that this set is potentially empty
(like the intersection of cars and ice cream stands),
empty, theoretically permitting the
query to range over pairs of gene/process that aren't related
through any known protein. However, OWL-DL reasoners will not infer
new classes, so the proteins in the intersection of ncbi_gene:1812
and the substances participating in molecular signaling is
restricted to the set which have already been entered into the
knowledgebase, knowledge base, e.g. like
proteinsubclass:p1812_7190_1
proteinsubclass:p1812_7190_1 | rdfs:subClassOf | _:signalingParticipants_1 | . |
proteinsubclass:p1812_7190_1 | rdfs:subClassOf | protein:ncbi_gene.1812 | . |
ncbi_gene:1812 and go:GO_0007190 have human-readable labels.
ncbi_gene:1812 | rdfs:label | "Entrez Gene record for human DRD1, 1812" | . |
go:GO_0007190 | rdfs:label | "adenylate cyclase activation" | . |
The addition (@@curation (from a text
media)?@@) of another MeSH record gives us another
solution:
pubmedRec:11441182 | sc:has-as-minor-mesh | mesh:D017966 | . |
article:11441182 | sc:identified_by_pmid | pubmedRec:11441182 | . |
ncbi_gene:1812 | sc:describes_gene_or_gene_product_mentioned_by | article:11441182 | . |
The demonstration query depends on the existence of an
obo:part_of (or rdfs:subClassOf ) relationship
between any part (i.e. any subclass of any step in the sequence) of
molecular signaling, and the general identifier for molecular
signaling, go:GO_0007166
go:GO_0007165 :
?process | obo:part_of | . |
This part_of relationship between
_:subPart and Triples of this form were
generated by a rule, graphically expressed in Figure 2,
_:parentClass is inferred from
obo:part_of Rule .The shaded area on the right of the figure shows
the following OWL restriction: restriction
which is the antecedent of the rule:
_:subPart | owl:onProperty | obo:part_of | . |
_:subPart | owl:allValuesFrom | _:subClass | . |
_:subClass | owl:onProperty | rdfs:subClassOf | . |
_:subClass | owl:hasValue | _:parentClass | . |
The symmetric property for rdfs:subClassOf need not be
explicitly modeled because the RDF Schema Specification
defines subClassOf, including its transitivity. Note that if
_:subClass is a subClassOf _:parentClass , , then all members of _:subClassOf
are of type _:parentClass (as well as _:subClass
):
_:subClass | owl:onProperty | rdf:type | . |
_:subClass | owl:hasValue | _:parentClass | . |
Because the knowledgebase
triple store used does not do perform
inferencing, these triples have been pre-computed (forward-chained)
and inserted into the knowledgebase.
triple store. This also simplifies the
query; were query. If these triples were not pre-computed, the obo:part-of
part of the query would be expressed:
?process | rdfs:subClassOf | ?what | . |
?what | owl:onProperty | obo:has_part | . |
?what | owl:someValuesFrom | . |
and would need to query over a
transitive closure over of the union of the obo:part-of and
rdfs:subClassOf rules.
The last data source listed above,
SenseLab ,
is a collection of relational (Oracle)
databases for neuroscientific research that was independently added to an
already used database. the knowledge
base after the other data sources. An accompanying document,
Experiences with the conversion of SenseLab
databases to RDF/OWL , describes the details of adding it to
the KB. this
knowledge base. With this new data incorporated, the example query could be extended to extract data
from the new data source , in this
case, discovering the names of receptor proteins associated with
the genes discovered in the previous query. In an integrative query
of this sort, we can use the results as a starting point for more
detailed queries of a particular repository, such as in this case
SenseLab.
SELECT ?genename ?processname ?receptor_protein_name WHERE { # PubMeSH includes ?gene_records mentioned in ?articles which are identified by pmid in ?pubmed_records . ?pubmed_record sc:has-as-minor-mesh mesh:D017966 . ?article sc:identified_by_pmid ?pubmed_record . ?gene_record sc:describes_gene_or_gene_product_mentioned_by ?article . # The Gene Ontology asserts that foreach ?protein, ?protein ro:has_function [ ro:realized_as ?process ]. ?protein rdfs:subClassOf ?restriction1 . ?restriction1 owl:onProperty ro:has_function . ?restriction1 owl:someValuesFrom ?restriction2 . ?restriction2 owl:onProperty ro:realized_as . ?restriction2 owl:someValuesFrom ?process . # Also, foreach ?protein, ?protein has a parent class which is linked by some predicate to ?gene_record. ?protein rdfs:subClassOf ?protein_superclass . ?protein_superclass owl:equivalentClass ?restriction3 . ?restriction3 owl:onProperty dnaGeneProduct:described_by . ?restriction3 owl:hasValue ?gene_record . # Each ?process (that we are interested in) is a subclass of the signal transduction process.?process obo:part_of go:GO_0007165 . ?gene_record rdfs:label ?genename . ?process rdfs:label ?processname . OPTIONAL { # Foreach ?gene, ?gene senselab:has_nucleotide_sequence_described_by ?gene_record . ?gene owl:equivalentClass ?restriction4 . ?restriction4 owl:onProperty senselab:has_nucleotide_sequence_described_by . ?restriction4 owl:hasValue ?gene_record . # Foreach ?receptor_protein, ?receptor_protein senselab:proteinGeneProductOf ?gene . ?receptor_protein rdfs:subClassOf ?restriction5 . ?restriction5 owl:onProperty senselab:proteinGeneProductOf . ?restriction5 owl:someValuesFrom ?gene . # Find the labels of all such ?receptor_proteins. ?receptor_protein rdfs:label ?receptor_protein_name } }
yielding another variable in our results:
gene_record_name | processname | receptor_protein_name |
---|---|---|
Entrez Gene record for human DRD1, 1812 | adenylate cyclase activation | D1 receptor |
Entrez Gene record for human ADRB2, 154 | adenylate cyclase activation | NULL |
... |
The additional triples this matched in the SenseLab
knowledgebase knowledge base connect to the existing data by
talking about the same genes, e.g.
ncbi_gene:1812 .
Figure 3, Additional Triples from SenseLab ,shows a subset of the triples provided by SenseLab. Following is a discussion of the origins and intents of those triples:
A nucleotide sequence is also described by ncbi_gene:1812 . Here, we call this _:nucleo1812 .
subject | predicate | object | ||
---|---|---|---|---|
_:nucleo1812 | owl:onProperty | nucleotideSequence:described_by | . | |
_:nucleo1812 | owl:hasValue | ncbi_gene:1812 | . |
The class senselab:DRD1_Gene has the same members as the OWL restriction _:nucleo1812 .
senselab:DRD1_Gene | owl:equivalentClass | _:nucleo1812 | . |
This _:protGeneProd_1 is defined by being a product of DRD1_Gene .
_:protGeneProd_1 | owl:onProperty | senselab:proteinGeneProductOf | . |
_:protGeneProd_1 | owl:someValuesFrom | senselab:DRD1_Gene | . |
Our solution is a subclass of _:protGeneProd_1 called
senselab:D1 . @@What
other subclasses of _:protGeneProd_1 are there motivating this
extra subclassof relationship?@@
senselab:D1 | rdfs:subClassOf | _:protGeneProd_1 | . |
senselab:D1 | rdfs:label | "D1" | . |
In the Banff Demo, the resulting knowledgebase knowledge
base partitioned the assertions into groups called Named
Graphs. This process basically consists of associating a distinct
URI with a connected graph of triples, and then referring to that
graph via the URI. At the time of publication, any query would be
expected to include SPARQL GRAPH constraints, e.g.:
prefix go: <http://purl.org/obo/owl/GO#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix owl: <http://www.w3.org/2002/07/owl#> prefix mesh: <http://purl.org/commons/record/mesh/> prefix sc: <http://purl.org/science/owl/sciencecommons/> prefix ro: <http://www.obofoundry.org/ro/ro.owl#> prefix senselab: <http://purl.org/ycmi/senselab/neuron_ontology.owl#> prefix obo: <http://purl.org/obo/owl/obo#> SELECT ?genename ?processname ?receptor_protein_name WHERE { # PubMeSH includes ?gene_records mentioned in ?articles which are identified by pmid in ?pubmed_records . GRAPH <http://purl.org/commons/hcls/pubmesh> { ?pubmed_record sc:has-as-minor-mesh mesh:D017966 . ?article sc:identified_by_pmid ?pubmed_record . ?gene_record sc:describes_gene_or_gene_product_mentioned_by ?article } # The Gene Ontology asserts that foreach ?protein, ?protein ro:has_function [ ro:realized_as ?process ]. GRAPH <http://purl.org/commons/hcls/goa> { ?protein rdfs:subClassOf ?restriction1 . ?restriction1 owl:onProperty ro:has_function . ?restriction1 owl:someValuesFrom ?restriction2 . ?restriction2 owl:onProperty ro:realized_as . ?restriction2 owl:someValuesFrom ?process . # Also, foreach ?protein, ?protein has a parent class which is linked by some predicate to ?gene_record. ?protein rdfs:subClassOf ?protein_superclass . ?protein_superclass owl:equivalentClass ?restriction3 . ?restriction3 owl:onProperty sc:is_protein_gene_product_of_dna_described_by . ?restriction3 owl:hasValue ?gene_record .# Each ?process (that we are interested in) is a subclass or component of the signal transduction process. GRAPH <http://purl.org/commons/hcls/20070416/classrelations> {{ }{ ?process obo:part_of go:GO_0007165 } UNION{ }{ ?process rdfs:subClassOf go:GO_0007165 } } } GRAPH <http://purl.org/commons/hcls/gene> { ?gene_record rdfs:label ?genename } GRAPH <http://purl.org/commons/hcls/20070416> { ?process rdfs:label ?processname } GRAPH <http://purl.org/ycmi/senselab/neuron_ontology.owl> { # Foreach ?gene, ?gene senselab:has_nucleotide_sequence_described_by ?gene_record . ?gene owl:equivalentClass ?restriction4 . ?restriction4 owl:onProperty senselab:has_nucleotide_sequence_described_by . ?restriction4 owl:hasValue ?gene_record . # Foreach ?receptor_protein, ?receptor_protein senselab:proteinGeneProductOf ?gene . ?receptor_protein rdfs:subClassOf ?restriction5 . ?restriction5 owl:onProperty senselab:proteinGeneProductOf . ?restriction5 owl:someValuesFrom ?gene . # Find the labels of all such ?receptor_proteins. ?receptor_protein rdfs:label ?receptor_protein_name } }
The named graphs help with both provenance and scaling. In the
current approach, each RDF bundle is imported into its own named
graph. This is useful for a number of reasons. First, we know the
source of each named graph, so we can control and review which data
sources are being accessed by our queries. Additionally, the
association of a named graph with a data source serves as data
provenance and can also be employed by schemes that exploit
knowledge about the data source to assign confidence measures in a
model of trust. For example, one of the knowledgebase knowledge
base data sources resulted from text mining experiments to
find protein associations . Users of the knowledgebase knowledge
base can choose to view this evidence of association
differently than the associations provided from a
protein-protein interaction database. Also, named graphs support
scaling by making it possible to update selected parts of the
knowledgebase, knowledge base, for example when the data source
has new information or related ontologies are changed.
The knowledgebase knowledge base was initially designed for the
purposes of a live demo. The data warehousing
that was performed, several It also
provided a basis for early work on the Neurocommons ,where its
development continues. Some design choices in the data, and the resulting queries were
all aimed at made
to favor simplicity and maximal performance. performance,
including the use of a central triple store, and the design of the
data and queries. Many of the
choices were guided by the desire for transparency for a broader
audience of biomedical informaticists. Several areas of possible
improvement are noted here:
There are also a number of open issues that should be addressed
in future research: work:
A table of the RDF sources used to create the Knowledgebase: Knowledge
base:
RDF bundle name | Last modified | Size | Description | RDF conversion by | Terms |
---|---|---|---|---|---|
aba-2007-08-07.tgz | 22-Sep-2007 | 51M | SC's extract of Allen Brain
Atlas metadata from their |
SC | terms of use |
addgene.ttl | 16-May-2007 | 1.1M | Addgene catalog (tab-delimited file) | SC | provided to Science Commons by Addgene |
bams-from-swanson-98-4-23-07.owl | 23-Apr-2007 | 5.6M | BAMS | released without contract | |
galen.tgz | 22-Sep-2007 | 1.9M | Galen from co-ode.org | - | released without contract |
gene-owl.tgz | 08-May-2007 | 7.7M | NCBI Copyright and Disclaimers | ||
gene-pubmed.ttl.tgz | 08-May-2007 | 1.5M | NCBI Copyright and Disclaimers | ||
goa-in-owl.tgz | 16-May-2007 | 73M | GO annotations from |
NCBI Copyright and Disclaimers ; EBI terms of use | |
homologene.tgz | 16-May-2007 | 626K | Homologene | NCBI Copyright and Disclaimers | |
medline-mesh.tgz ( contact Medline for use terms) |
16-May-2007 | 758M | List of all associations of MeSH headings to papers indexed by Medline extracted from 2007 Medline baseline distribution | License Agreement to Lease NLM Databases in Machine-Readable Form - see below | |
medline-titles.tgz ( contact Medline for use terms) |
16-May-2007 | 670M | Extracted from 2007 Medline baseline distribution | see below | |
mesh-qualified-headings.ttl.gz | 30-Apr-2007 | 13M | NLM 2007 MeSH descriptor/qualifier pairs | MeSH MOU | |
mesh-skos.tgz | 16-May-2007 | 13M | NLM 2007 MeSH | van Assem et
|
MeSH MOU |
mesh07-eswc06.rdfs | 28-Jun-2007 | 2.2K | van Assem et al's ontology (used by output of MeSH to SKOS conversion) | released without contract | |
neurocommons-text-mining.tgz | 05-May-2007 | 24M | Neurocommons
text mining pilot - extracted from Temis software applied to 7%
of Medline records |
released without contract | |
obo-all.tgz | 22-Sep-2007 | 36M | All OBO
|
released without contract | |
obo-in-owl.tgz | 16-May-2007 | 2.6M | selected OBO ontologies, downloaded ~21 April 2007, augmented with inferred relations | released without contract | |
sciencecommons.owl | 28-Jun-2007 | 19K | released without contract | ||
senselab.tgz | 16-May-2007 | 216K | From Yale Senselab | released without contract |
This table describes the classes and
properties used in the knowledgebase: sc:Gene
sc:describes_gene_or_gene_product_mentioned_by sc:Article
rdfs:label Gene Label sc:Article .
sc:identified_by_pmid sc:Paper sc:Paper . sc:has-as-minor-mesh
sc:PubMedId sc:PubMedId . Attributions:
Science Commons (SC), Berkeley Bioinformatics
Open-source Projects(BBOP), Health Care and Life Sciences Interest
Group (HCLSIG), National Institute of Standards and Technology
(NIST), Hewlett Packard (HP)
The knowledgebase knowledge base has been installed at several
locations. Below are At the time of this writing, these locations
that also provide SPARQL query
access: access,
however, it is not guaranteed that the endpoints at these address
will persist, or continue to serve the knowledge base described in
this note:
Below are a few visual interfaces that make it possible to
browse the results of a search on the knowledgebase: knowledge
base:
Entrez Neuron was developed by the SenseLab team as a graphical user interface for querying the SenseLab ontologies.
We used the open source edition of the Openlink Virtuoso repository from http://sourceforge.net/projects/virtuoso/ .
The actions and scripts that were used to create the knowledgebase knowledge
base on a commodity PC have been documented by several
HCLS HCLSIG
members. The necessary instructions and scripts that were used will
be listed here as completely as possible:
The following resources may be of interest for future work:
In memory of our friend and colleague William Bug, Ontological Engineer.
Special thanks to to: Alan Ruttenberg (Science
Commons) who coordinated the assembly assembly,
conversion, and deployment of the data sets and ontologies and Susie Stephens (Eli Lilly) who
coordinated the BioRDF task force. Together they presented
the initial version of the knowledgebase knowledge
base at the a WWW2007 Banff demo.
workshop.
Contributors:
Many contributed to the knowledgebase development,
documentation and its
documentation, validation of the
knowledge base, as well as the thoughts thinking
behind it, including it.
Mikail Bota (USC) who kindly provided the
BAMS database for our use and John Barkley (NIST), Olivier Bodenreider (NLM, NIH), William Bug
(School of Medicine, UCSD), (NIST)
converted it to RDF. Huajun Chen (Zhejiang University),
Paolo Ciccarese (SWAN), Kei Matthias Samwald (Yale Center for Medical Informatics;
DERI Galway; Semantic Web Company), Alan Ruttenberg, and
Kei-Hoi Cheung (SenseLab, Yale),
(Yale Center for Medical Informatics)
participated in the the SenseLab RDF Conversion. Members of the
SWAN team: Tim Clark (SWAN),
Clark, Paolo Ciccarese, June Kinoshita, Gwen
Wong, and Elizabeth Wu contributed the SWAN data source. June
Kinoshita, Gwen Wong, Elizabeth Wu, Don Doherty (Brainstage
Research Inc.), Michel Dumontier (Carleton
University), Kerstin Forsberg (AstraZeneca), William Bug (School of Medicine, UCSD), and Alan
Ruttenberg worked on the neurogenerative disease use cases.
Ray Hookaway (HP), Vipul Kashyap (Partners
Healthcare), June Kinoshita (AlzForum), Joanne Luciano (Harvard
Medical School), M. Scott Marshall (University (HP) provided digests from Entrez Gene that were more
easily converted to RDF. Jonathan Rees (Science Commons) did the
RDF conversions of Amsterdam),
Addgene, Pubmed to Gene, Medline, and MeSH,
the Neurocommons text mining pilot, and compiled the data source
and licensing information for this document. Alan Ruttenberg did
the RDF conversion of Entrez Gene records, GO Annotations, Allen
Brain Atlas, Homologene, wrote the Science Commons ontology. Alan
Ruttenberg and Matthias Samwald wrote the SPARQL queries described
in this document. Chris Mungall (NCBO), (NCBO) wrote the
converter that produced the OWL versions of the OBO ontologies and
consulted on matters of ontology.
Eric Neumann (Clinical Semantics Group), Group) produced the
Exhibit visualization. Alan Ruttenberg developed the Google Mouse
prototype, with contributions from Mike Travers (CollabRX), Brian
Gilman (SciLink), and Tom Stambaugh (Zeetix). Don Doherty, Matthias
Samwald, Holger Stenzorn (DERI), M. Scott Marshall, and Eric
Prud’hommeaux (W3C), Prud'hommeaux have presented this work at
conferences.
Barry Smith (State University of New York at Buffalo, USA) provided advice on ontology work and led the development of the Basic Formal Ontology ,which inspired all ontology work related to the knowledge base.
William Bug (School of Medicine, UCSD), Michel Dumontier (Carleton University), and Holger Stenzorn (DERI) reviewed and gave detailed comments on an initial draft of this note. Alan Ruttenberg Jonathan Rees and Susie Stephens, reviewed and contributed to several versions of the document. Susie Stephens coordinated the BioRDF task force, worked on presentations of the work, and wrote the introduction to this document. M. Scott Marshall (University of Amsterdam) and Eric Prud’hommeaux (W3C) edited and coordinated the production of this note. Eric Prud’hommeaux created the figures.
We would like to offer special thanks for organizations which gave contributions of equipment and service. Through Ray Hookway and Jeannine Crockford, Hewlett Packard donated two machines for a period of six weeks during the demo. Science Commons hosted the the prototype during development and continues to host and develop a knowledge base derived from the prototype as part of the Neurocommons .MIT CSAIL hosts Science Commons and provided computer and networking infrastructure.
Kingsley Idehen, Orri Erling, Ivan Mikhailov, Mitko Iliev, Patrick van Kleef and Anton Avramov from Openlink Software provided rapid technical support including several custom builds of the Virtuoso triple store to address early performance issues, making it possible to develop the prototype on an aggressive schedule. Evren Sirin from Clark and Parsia provided support for the Pellet OWL reasoner .
In addition to data sources that were incorporated into the prototype, other data that did not make it in was provided by Judith Blake (MGD) an Simon Twigger (RGD), and Colin Knep (Alzforum)
Raw CVS log:
Log: Overview.html,v $ Revision 1.2 2008/06/05 16:20:34 eric + color legend Revision 1.124 2008/06/05 14:51:54 mmarshal few typo's fixed, color guide unfinished Revision 1.123 2008/06/05 12:03:36 eric ~ trying 5 more pixels for slTriples.svg Revision 1.122 2008/06/05 03:42:43 eric ~ resized slTriples.svg Revision 1.121 2008/06/04 21:06:05 eric - <font> tags Revision 1.120 2008/06/04 21:03:47 eric - some encoding bugs Revision 1.119 2008/06/04 18:59:04 eric - removed width on homologene data pre border Revision 1.118 2008/06/04 17:49:38 mmarshal last fixes Revision 1.117 2008/06/03 18:33:59 mmarshal misc edits Revision 1.116 2008/06/03 16:11:45 mmarshal misc edits Revision 1.115 2008/06/03 06:01:31 eric ~ reworked §7 ¶2 sentence 2 ~ reworked SenseLab introduction ~ changed "simplicity and performance" sentence in §10 Revision 1.114 2008/06/02 22:56:14 eric + text references to the figures per mid:fcc499200805270711h275ffbbahd41edf1ccb063705@mail.gmail.com Revision 1.113 2008/06/02 21:20:13 eric ... Revision 1.112 2008/06/02 21:03:56 eric + turtle representation of RDF/XML Revision 1.111 2008/06/02 20:56:04 eric ~ picked a pick-one alternative + PubMesh and SenseLab using same identifiers Revision 1.110 2008/06/02 18:39:34 eric ~ tweaked refs, removed some Revision 1.109 2008/06/02 17:25:56 eric ~ picked a data set to match the generated RDF - some spuriousmarkup
Revision 1.108 2008/05/30 17:23:18 eric ~ s/a reasoner, another Semantic Web technology, /a Semantic Web reasoner / ~ s/standard tools and languages such as the SPARQL Query Language for RDF, and OWL reasoners/standard tools, such as OWL reasoners, and languages, such as the SPARQL Query Language for RDF/ ~ removed odd whitespaces + proposed alternate wording for PURLs ~ looking for processes that are a subclass *or component* of signal transduction + ericP has presented this work Revision 1.107 2008/05/30 06:29:09 mmarshal Added Bill Bug Memorial Revision 1.106 2008/05/29 23:23:49 mmarshal misc edits Revision 1.105 2008/05/27 16:57:28 mmarshal corrected encoding to unix (!) Revision 1.104 2008/05/25 22:16:11 mmarshal misc edits, corrections Revision 1.103 2008/05/24 13:20:34 mmarshal removed @@'s and fixed assorted errors Revision 1.102 2008/05/23 22:54:04 mmarshal Susie's edits Revision 1.101 2008/05/18 21:56:21 mmarshal Alan's edits Revision 1.100 2008/05/13 17:50:34 mmarshal a few fixes Revision 1.99 2008/05/13 15:01:52 mmarshal Mostly edits due to Alan's comments Revision 1.98 2008/05/12 18:09:09 mmarshal misc edits from comments and contributors - in progress.. Revision 1.97 2008/04/07 07:17:12 eric ~ prep for publication Revision 1.96 2008/04/03 21:21:26 mmarshal edits from Jonathan Rees Revision 1.95 2008/04/03 20:54:40 eric ~ clarify restrictions Revision 1.94 2008/04/03 17:42:11 mmarshal clarified initial pubmesh reference Revision 1.93 2008/04/03 15:45:33 mmarshal misc edits Revision 1.92 2008/04/01 16:13:45 eric ~ fix forward ref forpart_of
inference Revision 1.91 2008/03/28 22:07:46 eric ~ avoid mod-dir redirect to senselab/ Revision 1.90 2008/03/28 02:26:40 eric ~ add file extension to svg pics Revision 1.89 2008/03/27 15:43:01 mmarshal validate Revision 1.88 2008/03/24 17:34:24 eric ~ sized and fragmented tables Revision 1.87 2008/03/24 09:44:42 eric ~ validate Revision 1.86 2008/03/24 09:38:59 eric ~ validate Revision 1.85 2008/03/24 09:30:40 eric ~ re-embedded SVG images ~ sized and fragmented tables Revision 1.84 2008/03/21 19:12:08 eric ~ clarify DRD1 class membership Revision 1.83 2008/03/20 22:35:02 mmarshal more edits from Alan, Scott Revision 1.82 2008/03/20 18:34:22 mmarshal changed RDF bundle locations into PURL's Revision 1.81 2008/03/18 18:15:24 mmarshal changed section order, some formatting Revision 1.80 2008/03/18 17:38:22 mmarshal more corrections and comments from Alan Revision 1.79 2008/03/17 20:23:08 eric ~ integrate alanR's owl tutorial Revision 1.78 2008/03/17 20:13:33 eric ~ integrate alanR's owl tutorial Revision 1.77 2008/03/17 16:25:51 eric ~ fixed encoding prob Revision 1.76 2008/03/17 16:20:41 eric ~ fixed some broken refs Revision 1.75 2008/03/17 15:28:57 eric ~ moved susie to contributors Revision 1.74 2008/03/17 14:59:12 eric ~ fix b0rken links Revision 1.73 2008/03/17 14:41:30 eric ~ use neurocommons to back up KB sources (apart from medline) Revision 1.72 2008/03/10 15:06:34 eric ~ pubrules-ready Revision 1.71 2008/03/08 16:48:51 eric ~ validation Revision 1.70 2008/03/07 18:42:28 eric ~ a bit more in 6.1 Revision 1.69 2008/03/04 00:20:44 mmarshal Some of Alan Ruttenberg's comments Revision 1.68 2008/03/02 21:38:45 eric + namespace descriptors per AlanR's suggestion Revision 1.67 2008/02/25 16:06:07 mmarshal Misc. edits from William Bug, Michel Dumontier. Revision 1.66 2008/02/11 14:30:36 eric + Rule Modeling Revision 1.65 2008/02/07 23:22:49 eric ~ change molecularSignalers name to signalingParticipants Revision 1.64 2008/02/07 23:22:04 eric ~ change molecularSignalers name to signalingParticipants Revision 1.63 2008/02/04 16:02:47 mmarshal more edits from Michel + misc Revision 1.62 2008/02/03 22:48:09 mmarshal worked some of Michel Dumontier's suggestions in Revision 1.61 2008/01/28 16:03:40 mmarshal expanded on RDF Import section, added details link from incorporated database table Revision 1.60 2008/01/27 19:02:01 mmarshal Some restructuring, new section for Query, Scope section Revision 1.59 2008/01/25 18:01:41 mmarshal small edits to sync to googledocs Revision 1.58 2008/01/19 00:14:46 mmarshal More edits from Susie's comments Revision 1.57 2008/01/14 16:10:23 mmarshal added some of Susie's edits Revision 1.56 2008/01/10 23:04:02 eric ~ s/_:protein_1/protein:p1812_7190_1/g ~ tweak wording of subclass-part-of issue Revision 1.55 2008/01/08 18:34:13 mmarshal added Acknowledgements Revision 1.54 2008/01/08 18:07:07 mmarshal added RDF sources table Revision 1.53 2008/01/07 15:14:14 eric ~ HTML-validate Revision 1.52 2008/01/07 15:08:50 eric ~ feedback from phone call with AlanR Revision 1.51 2008/01/06 17:30:40 eric ~ stack figures instead of floating them to the right Revision 1.50 2008/01/06 15:15:00 eric ~ pretty-up subclass-part-of issue Revision 1.49 2008/01/06 15:12:47 eric + subclass-part-of issue Revision 1.48 2008/01/06 14:59:47 eric ~ update empty-protein-set issue Revision 1.47 2008/01/05 21:05:57 eric + senselab triples Revision 1.46 2007/12/26 14:55:41 eric ~ fix stupid case-folding attribute assignments (I blame HTML-mode) Revision 1.45 2007/12/26 14:47:57 eric ~ fix some broken linkes, look askance at rest Revision 1.44 2007/12/26 14:17:23 eric ~ rearranged triples in the triples table and added prose Revision 1.43 2007/12/26 13:20:05 eric ~ moved style shared with svg to local.css ~ commented out pre-formatted triples alternative Revision 1.42 2007/12/25 23:07:54 eric + triples picture in 3 Triple Model Revision 1.41 2007/12/25 14:41:57 eric ~ switched from the graph around ncbi_gene:2903, GO:0007215, record:7816096 to ncbi_gene:1812, GO:0007190, record:10698743 to make sure it worked with DERI's SenseLab KB ~ switched from subClassOf 7166 to part_of 7166 ~ made SenseLab join OPTIONAL Revision 1.40 2007/12/25 11:03:37 eric ~ s/\?parent/?protein_superclass/g ~ fixed named graph query Revision 1.39 2007/12/25 04:40:32 eric + override link underlining in pre-formatted text (presumably queries and triples) + triplesTable style ~ improve comments in the example query in 2 Use Case + broach GO modelling ~ improve RDF exposition in 3 Triple Model + added { ?restriction3 owl:onProperty dnaGeneProduct:described_by } triples to Matthias's SenseLab query and Named Graph example Revision 1.38 2007/12/23 13:51:43 eric ~ fixed errors in putative extracted triples Revision 1.37 2007/12/23 07:07:51 eric + Contributors Revision 1.36 2007/12/21 17:45:46 eric ... Revision 1.35 2007/12/21 17:15:16 eric ... Revision 1.34 2007/12/21 17:12:14 eric ... Revision 1.33 2007/12/21 17:09:33 eric ~ working on triples Revision 1.32 2007/12/21 15:56:37 mmarshal more edits Revision 1.31 2007/12/21 14:19:19 mmarshal more edits, Design section Revision 1.30 2007/12/21 13:05:29 eric ~ trying to make the Triple Model graph more human-readable + prefixes in named graph query Revision 1.29 2007/12/21 12:06:13 mmarshal more text to named graphs Revision 1.28 2007/12/21 11:05:13 mmarshal assorted changes to intro, design principles, next steps Revision 1.27 2007/12/20 23:50:30 mmarshal Corrections from Susie, Jonathan, others. Created Design Principles section with Susie's text (merged Unifying Terms with new section) Revision 1.26 2007/12/20 16:51:05 eric + notes from publication powow Revision 1.25 2007/12/20 16:11:04 eric moved 8 Named Graphs section Revision 1.24 2007/12/20 16:08:28 eric ... Revision 1.23 2007/12/20 16:07:01 eric ... Revision 1.22 2007/12/20 16:04:20 eric + unify on ncbi:1812 Revision 1.21 2007/12/20 15:08:24 eric ~ s/?paper/?pubmed_record/ per alanR's suggestion mid:04189805-5B6D-4BA1-9C99-DC38D1D50472@gmail.com Revision 1.20 2007/12/20 15:05:45 eric + current demo query Revision 1.19 2007/12/20 14:50:30 mmarshal Some rewording.. Revision 1.18 2007/12/19 23:54:07 mmarshal added triples mini-summary, refined use case text, changed database table (_sources_ only - RDF bundle details to appendix) Revision 1.17 2007/12/19 17:45:17 mmarshal added explanation to Use Case, updated TOC and Outline to reflect new section "Adding a New Database" Revision 1.16 2007/12/19 16:14:45 eric + incorporated