HCLS/ScientificPublishingTaskForce

Scientific Publishing Task Force

Task 1: Ontology for Experiment Self-Publishing

Task Objectives

To develop a general-purpose ontology for self-publishing a single experiment in RDF format that will facilitate data sharing, discovery, and integration. Applications built on top of this ontology, such as web-publishing tools and semantic search engines, will demonstrate that emerging semantic standards and technologies can help develop more interactive scientific communities centered on user-generated scientific content on the web.

Task Status

8/28/2007, To make more people aware of this new approach of publishing experiment data in a semantic format, I implemented a feature for publishing a "Study" object, using a similar ontology, in the newly released free online tool for data feed publishing. Researchers may publish experiment information (as a Study object) using the ufeed online tool without needing to install any software. The URIs created for the study feed and study data can then be used on researchers' own web pages. -AJ
11/9/2006, A new revision of the SPE ontology specs is now available: Media:HCLS$$ScientificPublishingTaskForce$SPE_specs_v0_3.html. Major changes: It is limited to only four classes directly related to experiment data publishing. It reuses the classes Person, Group, Organization and Product from another new ontology, which models web sites for semantic publishing (business object ontology, BOON). The Web2x semantic publishing plugin for Wordpress has also been updated to implement SPE and BOON. One can now use Wordpress with the Web2x plugin to semantically publish experiment data as well as other key information usually shared on a web site, including products, services, jobs, technology, licensing, news, and events.
9/25/06, A search engine demo is now available from web2express.org. It crawls and searches semantic data published in SPE format.
8/27/06, Mock examples of semantic data using SPE ontology (v0.2.1): Media:HCLS$$ScientificPublishingTaskForce$project.rdf, Media:HCLS$$ScientificPublishingTaskForce$experiment.rdf, Media:HCLS$$ScientificPublishingTaskForce$protocol.rdf, Media:HCLS$$ScientificPublishingTaskForce$product.rdf, Media:HCLS$$ScientificPublishingTaskForce$member.rdf, Media:HCLS$$ScientificPublishingTaskForce$group.rdf, Media:HCLS$$ScientificPublishingTaskForce$organization.rdf
8/25/06, Demo for self-publishing experiment data and related information, first release. Real-world example.
8/6/06, Media:HCLS$$ScientificPublishingTaskForce$SPE_specs_v0_2.html. Specs for Self-Publishing Experiment ontology, version 0.2. Seeking comments and contributions.
6/25/06, Media:HCLS$$ScientificPublishingTaskForce$SPE_Specs.html. Specs for Self-Publishing Experiment ontology, first draft proposed by AJ Chen. Seeking comments and contributions.
6/18/06, Requirements for Self-Publishing of Experiment, Proposed by AJ Chen. Please add your requirements. See discussion threads: 1, 2, 3.
6/8/2006, Media:HCLS$$ScientificPublishingTaskForce$SPE.html: Proposed Terms for Self-Publishing of Experiment, by AJ Chen. Please review the file and provide input or comment. See discussion thread.
5/8/2006, Task proposal: Distributed self-publishing of experiments by AJ Chen.

Rationale

Today, scientific research data and results are shared primarily through published papers. A paper usually comprises many experiments. While it is an important form of knowledge representation and sharing, the scientific paper is far from an ideal form for finding and sharing experiment data. One can easily imagine how many new possibilities will open up if most research data are available on the web as single-experiment units that can be searched by everyone and consumed by computers. Of course, the challenge is how to get the scientific community to publish experiment data on the web in an appropriate format. As one key step toward meeting this challenge, the task described here is intended to provide a vocabulary for publishing experiment data as a single unit.

It is worth mentioning a few benefits to researchers who self-publish experiment data:

Make your experiment data widely accessible, unlocking the value of your most important assets.
Increase the visibility of your research as a result of this new and extensive web of experiment data.
Promote research collaborations and new knowledge discovery.

Scope

Defining the right scope is as important as the work product of this task. We should realize that scale is critical in establishing a new paradigm. To encourage large-scale participation from everyone in the scientific community, two requirements are placed on the ontology to be developed here: (1) the terms should be as general as possible so that researchers across all disciplines can use the ontology; and (2) the ontology should facilitate development of easy-to-use self-publishing tools that everyone will want to use.

Note that some domain-specific ontologies, such as the microarray experiment ontology, already exist today. However, the scope of the current task is broader than any of these. In fact, the ontology to be developed here will not be specific to any research domain.

Participants

AJ Chen (add your name here if you are interested in joining the task)

Paola Di Maio (keen but rather dumb would-be system user)

Harry Snow (wants to help develop authoring tools so we can all stop wasting time trying to mine RDF out of recently published papers!)

Use Cases

Any researcher can publish his or her experiment data as a single unit of experiment in RDF format using this ontology. Ideally, a new easy-to-use self-publishing tool designed specifically for this purpose will encourage use of the ontology. However, any ontology editor can be used to create the RDF content. Because data in this format can be searched by a semantic web search engine, it offers a new channel for data sharing and retrieval that will accelerate scientific discovery as well as increase the researcher's visibility.
Anyone can search across all experiments available on the web at various property levels. This use case requires a search engine that aggregates published experiment data in RDF format. Compared to a traditional literature search, such a semantic web search engine can match queries against individual experiment properties and should therefore return more relevant results.
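
The two use cases above can be illustrated with a minimal sketch, using only the Python standard library. It is not the SPE ontology itself: the namespace http://example.org/spe#, the class name Experiment, the property names title, researcher and protocol, and the example URIs are all hypothetical placeholders chosen for illustration. The first part serializes one experiment as N-Triples; the second part mimics what an aggregating search engine might do when it matches a query against a single property.

```python
# Hypothetical namespace and term names; the real SPE terms may differ.
SPE = "http://example.org/spe#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Format a plain string as an N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def uri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def experiment_triples(experiment_uri, title, researcher, protocol_uri):
    """Describe one experiment as a list of (subject, predicate, object) terms."""
    subject = uri(experiment_uri)
    return [
        (subject, uri(RDF_TYPE), uri(SPE + "Experiment")),
        (subject, uri(SPE + "title"), literal(title)),
        (subject, uri(SPE + "researcher"), literal(researcher)),
        (subject, uri(SPE + "protocol"), uri(protocol_uri)),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def search(triples, predicate, text):
    """Return subjects whose value for `predicate` contains `text` (case-insensitive)."""
    wanted = uri(predicate)
    needle = text.lower()
    return sorted({s for s, p, o in triples if p == wanted and needle in o.lower()})


# Two mock experiments, as a crawler might aggregate them from different sites.
store = experiment_triples(
    "http://example.org/lab-a/exp1", "Kinase inhibition assay", "A. Researcher",
    "http://example.org/lab-a/protocol1",
) + experiment_triples(
    "http://example.org/lab-b/exp7", "Microarray time course", "B. Researcher",
    "http://example.org/lab-b/protocol3",
)

print(to_ntriples(store))                      # what a researcher would publish
print(search(store, SPE + "title", "kinase"))  # property-level search
```

A real deployment would more likely rely on an RDF library and a SPARQL-capable store; the plain-tuple approach here only shows the shape of the data and of a property-level query.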

Deliverables

1st Deliverable: A set of defined objects and properties for describing and publishing a single unit of experiment.
2nd Deliverable: A high-level ontology for self-publishing experiment data.
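
To make the first deliverable more concrete, here is a rough sketch of what "a set of defined objects and properties" might look like when held as a simple lookup table, together with a check that a record uses only defined terms. The class names and property names below are illustrative assumptions only and are not taken from the SPE specs.

```python
# All class and property names here are illustrative assumptions, not SPE terms.
VOCABULARY = {
    "Experiment": {"title", "description", "protocol", "project", "researcher"},
    "Protocol": {"title", "description"},
    "Project": {"title", "description", "member"},
}


def undefined_terms(class_name, record):
    """Return the property names in `record` that are not defined for the class."""
    if class_name not in VOCABULARY:
        raise KeyError(f"unknown class: {class_name}")
    return sorted(set(record) - VOCABULARY[class_name])


record = {
    "title": "Kinase inhibition assay",
    "protocol": "protocol1",
    "pValue": "0.03",  # not part of the sketch vocabulary
}
print(undefined_terms("Experiment", record))
```

A publishing tool could run a check of this kind before generating RDF, so that researchers are told which fields fall outside the shared vocabulary.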

Related resources

Knowledge Ecosystem Task Force Proposal by Tim Clark 2/13/06
Wikipedia is adding semantic publishing features, see Semantic MediaWiki Project. Also see a recent paper on the design.
Existing knowledge representation projects related to the encapsulation of biomedical investigation-related information - in no particular order:
- Semantic Web Applications in Neuromedicine
- EXPO ontology of scientific experiments - see also this article from Bioinformatics
- ExperiBase
- Ontology of Biomedical Investigation (OBI) is a community project to develop an ontology formally specifying entities (continuants and occurrents) and their relations pertaining to biomedical investigation in general. It is an outgrowth of the original effort to create such an ontological framework for functional genomics (FuGO). As of this writing (2006-08-28), the new OBI moniker is still just a proposal yet to be ratified by the FuGO Coordinating Committee.
- The Semantic Synapse
- proteomics-related ontologies in the Charleston Core project
- an umbrella consortium representing the following efforts from the bioimaging community is also under consideration for inclusion in OBI:
  - DICOM
  - The Neuroimaging Informatics Technology Initiative (NIfTI)
  - related work from The fMRI Data Center
  - The Biomedical Information Research Network (BIRN) - XCEDE schema
  - The Radiological Society of North America RSNA-sponsored RadLex project
  - see also The NCBO-sponsored Workshop on an Ontology of Bioimages/imaging
- NCBC Software Classification project
- BrainML

These employ a range of formal syntactical implementations from XML Schemas on through OWL-based ontologies.

EDITORIAL NOTE: One hopes some effort will be invested in ensuring these efforts have commensurate, formal semantic implementations, as this will be an absolute prerequisite for these individual efforts to realize their intended goals. This does not necessarily imply they all must use the same normative syntax or link to a single, foundational ontology, but provisions must be made to ensure they are algorithmically commensurate if they are to contribute to the overall semantically-specified, biomedical investigation-related knowledge ecosystem (BillBug - 2006-08-28).

Timeline for Task Completion

Stage 1 (3-month goals)
Stage 2 (6-month goals)