HCLSIG/SWANSIOC/Nanopublications-Subtask

From W3C Wiki

Nanopublications

Core Definitions

What is a nanopublication?

Following [1], a nanopublication is constructed using the following elements.

  • Concepts (unitary elements of knowledge)
  • Triples (tuples of three concepts)
  • Named graphs (sets of interconnected triples)
  • Statements (originally defined as uniquely identifiable triples, but now extended to uniquely identifiable named graphs)
  • Annotations (triples whose subject is a statement)
  • Nanopublications (sets of annotations that refer to the same statement and that include a minimum set of annotations)

All concepts, statements, and nanopublications must be uniquely identifiable.
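To make these definitions concrete, here is a minimal Python sketch of the data model, assuming that concepts are plain URI strings. The class names, the `is_well_formed` check, and the choice of two PAV predicates as the "minimum set" of annotations are illustrative assumptions; [1] does not prescribe them, and what the minimum set should contain is still an open question.

```python
from dataclasses import dataclass, field

# Assumption: every concept is identified by a URI string.
Concept = str


@dataclass(frozen=True)
class Triple:
    subject: Concept
    predicate: Concept
    obj: Concept


@dataclass(frozen=True)
class Statement:
    """A uniquely identifiable named graph: a URI plus its triples."""
    uri: str
    triples: frozenset[Triple]


# Hypothetical minimum set of annotation predicates (PAV terms used in the
# example queries on this page); not a normative requirement.
REQUIRED_PREDICATES = frozenset({
    "http://purl.org/pav/authoredBy",
    "http://purl.org/pav/curatedBy",
})


@dataclass
class Nanopublication:
    uri: str
    statement: Statement
    annotations: list[Triple] = field(default_factory=list)

    def is_well_formed(self) -> bool:
        # The nanopublication and its statement need distinct identifiers.
        distinct_ids = self.uri != self.statement.uri
        # Annotations are triples whose subject is the statement, so every
        # annotation must point at this nanopublication's statement ...
        about_statement = all(a.subject == self.statement.uri for a in self.annotations)
        # ... and the (hypothetical) minimum set of annotations must be present.
        predicates = {a.predicate for a in self.annotations}
        return distinct_ids and about_statement and REQUIRED_PREDICATES <= predicates
```

For instance, a nanopublication whose annotations all target `http://www.example.org/G1` but which lacks a `pav:curatedBy` annotation would be reported as not well formed.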

Immediate open questions

  • Which annotations, within the definition above, are needed to support provenance?
    • Where did the statement originate? Who said it? Who curated it? Under what reasoning paradigm was the nanopublication derived?
    • These elements may be supported by terms from the SWAN ontology [2].
    • For a worked example supplied by Paolo Ciccarese, see [Nanopublication AlzSWAN Claim #1].
    • This example provides a useful framework for further development.
  • What elements may act as the originating source of nanopublications?
    • excerpts from published peer-reviewed publications (text and figures)
    • excerpts from online resources, wikis, newspapers, etc.
    • data elements from a file
    • an automated computation
    • a tuple from a database
    • any element that can be incorporated into a research object
  • How will the persistence and lifecycle of a nanopublication be defined?
    • It may not be necessary for nanopublications to persist forever, but we do need to know how long each one will persist.
  • How should credit be assigned for a specific individual's contribution to the creation of a nanopublication?
    • This is crucial to support next-generation knowledge architectures for science, since this sort of credit is the currency that really matters to scientists.
  • What different types of nanopublication are there?
  • How should nanopublications be implemented?
    • It would be more powerful to base the definition of a nanopublication on a conceptual design or specification (with examples that are actually implemented, naturally), so that we are not restricted to a single platform or programming language.
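One possible, purely hypothetical answer to the credit question is to tally provenance annotations per person. The sketch below weights the PAV predicates `pav:authoredBy`, `pav:curatedBy`, and `pav:contributedBy` (which appear in the example queries on this page); the weights are invented for illustration and are not defined by PAV or by [1].

```python
from collections import Counter

PAV = "http://purl.org/pav/"

# Hypothetical weights: illustrative only, not part of PAV or [1].
CREDIT_WEIGHTS = {
    PAV + "authoredBy": 3,
    PAV + "curatedBy": 2,
    PAV + "contributedBy": 1,
}


def assign_credit(annotations):
    """Tally weighted credit per person.

    `annotations` is an iterable of (statement, predicate, person) triples;
    predicates without a weight (e.g. rdf:type) earn no credit.
    """
    credit = Counter()
    for _statement, predicate, person in annotations:
        weight = CREDIT_WEIGHTS.get(predicate)
        if weight is not None:
            credit[person] += weight
    return credit
```

Weighting, rather than simply counting annotations, is one way to distinguish authorship from curation; whether such distinctions are appropriate is part of the open question itself.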

The Nanopublication Bestiary

Note that this classification was originally inspired by the view of Bruno Latour and Steve Woolgar in their book 'Laboratory Life' [3].

  • 'textbook'
    • universals, broad-based statements that are widely accepted as 'true'
    • cardinal assertion nanopublications (e.g. 'malaria is transmitted by mosquitoes').
  • 'review paper'
    • widely accepted statements that are qualified for specific contexts / models
    • based on lots of supporting evidence
    • aggregated interpretive assertion nanopublications (e.g. 'CA1 projects massively to the entorhinal cortex in rats')
  • 'abstract'
    • describe the high-level findings of a single study: the main punchlines of a paper.
    • based on supporting evidence from a single study
    • interpretive assertion nanopublications (e.g. 'a DNase hypersensitive site was identified in the vicinity of exon 1')
  • 'citences' (i.e. citation sentences, after Marti Hearst)
    • Sentences in a document that cite a finding from another document (usually in the Introduction, Related Work, or Discussion sections).
    • citation nanopublications (e.g. 'Recently, CCR3 has been shown to be upregulated on neutrophils and monocytoid U937 cells by interferons in vitro and to be expressed by endothelial cells, epithelial cells and mast cells [11-16]').
  • 'theory'
    • what is the underlying reasoning model of the work?
    • it is not clear that this category is needed, since formal theory rarely plays a role in biology
    • theory nanopublication
  • 'methods'
    • the scientific protocol used. What was done?
    • a simple narrative statement based on material entities, material processing, assays, information entities, and data transformations; could include experimental design elements.
    • experimental design nanopublication (also equivalent to a KEfED model).
  • 'results'
    • The main data findings of a study
    • must be linked to experimental design
    • experimental observations nanopublication
  • 'data paper'
    • all the data from a given experiment
    • could be equivalent to tuples from an experimental database or a LIMS
    • data-set nanopublication
  • 'proposal / hypothesis / question'
    • hypotheses and planned knowledge that have not yet been shown to be true.
    • hypothesis nanopublication

Note that the following publication types may play a similar basic role within scientific reasoning, but would be assigned lower 'reliability' scores.

  • 'conference article / poster'
    • a lightweight publication of preliminary findings
    • use the same nanopublication types as above, but less concrete and with lower reliability values
  • 'blog'
    • informal ongoing work within a given project
    • use the same nanopublication types as above, but much less concrete and probably not formally published
  • 'tweet'
    • throwaway assertions, unqualified ideas
    • nanopublications you might keep as part of a hypothetical set for brainstorming.
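The bestiary can be read as a controlled vocabulary of nanopublication types, with the lighter-weight venues reusing those same types at lower reliability. A minimal sketch, assuming each nanopublication carries exactly one type and one venue; the enum members follow the list above, while the venue label "peer-reviewed publication" and all numeric scores are hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class NanopubType(Enum):
    """One member per bestiary entry; the value is the entry's nickname."""
    CARDINAL_ASSERTION = "textbook"
    AGGREGATED_INTERPRETIVE_ASSERTION = "review paper"
    INTERPRETIVE_ASSERTION = "abstract"
    CITATION = "citence"
    THEORY = "theory"
    EXPERIMENTAL_DESIGN = "methods"
    EXPERIMENTAL_OBSERVATIONS = "results"
    DATA_SET = "data paper"
    HYPOTHESIS = "proposal / hypothesis / question"


# Hypothetical reliability scores per venue: the page only says that
# lighter-weight venues score lower, not what the scores are.
VENUE_RELIABILITY = {
    "peer-reviewed publication": 1.0,
    "conference article / poster": 0.6,
    "blog": 0.3,
    "tweet": 0.1,
}


@dataclass(frozen=True)
class TypedNanopub:
    uri: str
    kind: NanopubType
    venue: str = "peer-reviewed publication"

    @property
    def reliability(self) -> float:
        # The type is the same across venues; only the score differs.
        return VENUE_RELIABILITY[self.venue]
```

Keeping type and venue as separate fields mirrors the note above: a tweet and an abstract can carry the same kind of interpretive assertion, differing only in how much weight it is given.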

Research Objects

Following [4], a research object (looked at from the myExperiment point of view) is a collection of specific data objects linked together via annotations. One possible way of looking at the relationship between research objects and nanopublications is that nanopublications provide the semantics of individual research statements (e.g. 'mosquitoes transmit malaria') and research objects provide the packaging mechanism that groups together all the other data elements that may be required to substantiate that claim (such as PDF files, PowerPoint slides, movies, spreadsheets, database records, etc.).
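A minimal sketch of that division of labor, assuming a research object is simply a container that records which supporting resources back which nanopublication. The class, the `supportedBy` relation name, and the example URIs are hypothetical; this is not the myExperiment research object model.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObject:
    """Hypothetical container: nanopublications carry the claims,
    the research object packages the material that supports them."""
    uri: str
    nanopublications: set[str] = field(default_factory=set)
    resources: dict[str, str] = field(default_factory=dict)  # resource URI -> media type
    links: list[tuple[str, str, str]] = field(default_factory=list)

    def add_evidence(self, nanopub_uri: str, resource_uri: str,
                     media_type: str, relation: str = "supportedBy") -> None:
        # Register both ends, then record an annotation linking them.
        self.nanopublications.add(nanopub_uri)
        self.resources[resource_uri] = media_type
        self.links.append((nanopub_uri, relation, resource_uri))

    def evidence_for(self, nanopub_uri: str) -> list[str]:
        # All resources linked to the given nanopublication, in insertion order.
        return [res for np, _rel, res in self.links if np == nanopub_uri]
```

For example, calling `add_evidence` twice for the same nanopublication URI, once with a PDF figure and once with a CSV file, makes `evidence_for` return both resource URIs in the order they were added.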

Specific Research Domains to act as examples

Spinal Muscular Atrophy

Maryann Martone is a driving force within biomedical knowledge engineering and is currently on sabbatical with the SMA Foundation [5], working toward a cure for this debilitating disease. She is fully committed to approaching this challenge with the annotation and reasoning tools that we develop and is an ideal domain expert to help frame the construction of nanopublications.

In particular, the review by Burghes and Beattie (2009) [6] is a very well-written overview of the field and should serve as our starting point for developing nanopublications. It cites 171 studies and covers the most relevant research questions pertaining to the disease. Its citation sentences provide a number of 'citation' nanopublications that we could trace back to the original publications in order to flesh out a more complete model.

HyQue and SWAN

HyQue is a well-developed framework constructed by Nigam Shah for hypothesis-based querying of pathway models. SWAN [7,8] is a framework for representation of scientific discourse and its provenance. The combination of HyQue and SWAN forms the basis of our first example: [Nanopublication AlzSWAN Claim #1]

Examples of SPARQL queries over the nanopublications in this dataset:

Give me all the triples in the graphs curated by person 1
- Query: select ?s ?p ?o where {?g <http://purl.org/pav/curatedBy> <http://example.info/person/1> . GRAPH ?g {?s ?p ?o.}}
- Results:
<http://www.example.org/G5> <http://purl.org/pav/contributedBy> <http://example.info/person/0>
<http://tinyurl.com/4h2am3a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#seeAlso>  <http://bio2rdf.org/alzswan:statement_f3556dcfc331d9b9af9d5c0cfc570ba6_hypothesis>
<http://tinyurl.com/4h2am3a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/swan/2.0/discourse-elements/Claim>
<http://tinyurl.com/4h2am3a> <http://purl.org/dc/terms/title> "Intramembranous behaves as chaperones of other membrane proteins"
<http://www.example.org/G1> <http://purl.org/pav/authoredBy> <http://example.info/person/0>
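To show what this query computes, the sketch below evaluates the same pattern by nested loops over a hypothetical in-memory list of `(subject, predicate, object, graph)` quads. It assumes the `pav:curatedBy` triples are stored in the default graph (represented as `None`); some triple stores can instead be configured to treat the default graph as the union of all named graphs, which would change what the first pattern matches. The function name and the quad representation are illustrative assumptions.

```python
CURATED_BY = "http://purl.org/pav/curatedBy"

# A quad is (subject, predicate, object, graph); graph None = default graph.


def triples_in_graphs_curated_by(quads, curator):
    """Nested-loop evaluation of
    SELECT ?s ?p ?o WHERE { ?g pav:curatedBy <curator> . GRAPH ?g { ?s ?p ?o } }
    """
    results = []
    for g, p, o, graph in quads:
        # ?g pav:curatedBy <curator>, matched against the default graph.
        if graph is None and p == CURATED_BY and o == curator:
            # GRAPH ?g { ?s ?p ?o }: every triple stored in the named graph ?g.
            results.extend((s2, p2, o2) for s2, p2, o2, g2 in quads if g2 == g)
    return results
```

Under this reading, the results listed above are the contents of whichever named graphs carry a `pav:curatedBy` link to person 1, which is why they mention person 0 as author and contributor.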

Get all the information (property/value pairs) about the statements referenced in the graphs curated by person 1
- Query: select ?a ?b where {?g <http://purl.org/pav/curatedBy> <http://example.info/person/1> . GRAPH ?g {?s ?p ?o. GRAPH ?h {?o ?a ?b.}}}
- Results:
<http://semanticscience.org/ontology/hyque.owl#HYPOTHESIS_0000023> <http://bio2rdf.org/alzswan:statement_f3556dcfc331d9b9af9d5c0cfc570ba6_hypothesis_part_1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://semanticscience.org/hyque#HYPOTHESIS_0000009>
<http://purl.org/dc/elements/1.1/description> "This AlzSwan hypothesis has one exclusive disjunctive part consisting of one event. The event is not negated, is of type go:chaperone binding, and has an actor of type chebi:beta amyloid. The event has a physical context of type go:plasma membrane. The event has a target of type mesh:membrane protein."
<http://www.w3.org/2000/01/rdf-schema#label> "Intramembranous Aβ behaves as chaperones of other membrane proteins"
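This second query adds a join: each object `?o` found in a curated graph is looked up again as a subject in any named graph `?h`. Nesting `GRAPH ?h` inside `GRAPH ?g` is legal SPARQL; the inner clause simply switches the active graph to `?h`. The nested-loop sketch below works over a hypothetical in-memory list of `(subject, predicate, object, graph)` quads, with `None` standing for the default graph and the `pav:curatedBy` triples assumed to live there. It keeps SPARQL's bag semantics by emitting one row per matching combination; the function name and sample data are illustrative only.

```python
CURATED_BY = "http://purl.org/pav/curatedBy"


def property_values_of_curated_objects(quads, curator):
    """Nested-loop evaluation of
    SELECT ?a ?b WHERE { ?g pav:curatedBy <curator> .
                         GRAPH ?g { ?s ?p ?o . GRAPH ?h { ?o ?a ?b } } }
    """
    results = []
    for g, p, o, graph in quads:
        # ?g pav:curatedBy <curator>, matched in the default graph.
        if graph is not None or p != CURATED_BY or o != curator:
            continue
        for _s, _p, obj, graph2 in quads:
            # GRAPH ?g { ?s ?p ?o }: bind ?o from triples inside ?g.
            if graph2 != g:
                continue
            for s3, a, b, h in quads:
                # GRAPH ?h { ?o ?a ?b }: ?h ranges over named graphs only.
                if h is not None and s3 == obj:
                    results.append((a, b))
    return results
```

Because `?h` is unconstrained, the description of a claim can live in a different named graph from the provenance graph that points at it, which matches the way the results above pair HyQue types and labels with the AlzSWAN statement.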

Tract-tracing experiments and Neural Connectivity using KEfED

This example provided the core underlying representation for the KEfED modeling approach [9,10] and has a preliminary representation that serves as a structure for linking interpretive assertion nanopublications and observational assertion nanopublications into a reasoning chain (this work is being submitted to BMC Bioinformatics as a paper independent of this effort).

The next goal for this effort would be to represent these elements in the nanopublication formalism.

References

[1] Groth P, et al. The anatomy of a nanopublication. Information Services & Use. 2010;30:51-56.

[2] Semantic Web Applications in Neuromedicine (SWAN) Ontology, W3C Interest Group Note 20 October 2009

[3] B. Latour and S. Woolgar, Laboratory Life, Princeton, New Jersey: Princeton University Press, 1979.

[4] Research Objects Wiki Page @ myExperiment

[5] Spinal Muscular Atrophy Foundation Website

[6] Burghes AH, Beattie CE. Spinal muscular atrophy: why do low levels of survival motor neuron protein make motor neurons sick? Nat Rev Neurosci. 2009 Aug;10(8):597-609. Epub 2009 Jul 8. Review. PubMed PMID: 19584893; PubMed Central PMCID: PMC2853768.

[7] SWAN Ontology v. 1.2

[8] Ciccarese P, Wu E, Wong G, Ocana M, Kinoshita J, Ruttenberg A, Clark T. The SWAN biomedical discourse ontology. J Biomed Inform. 2008 Oct;41(5):739-51. Epub 2008 May 4. PubMed PMID: 18583197.

[9] Knowledge Engineering from Experimental Design ('KEfED') BIRN Wiki Page

[10] KEfED Neural Connectivity and Tract-Tracing Experiments BIRN Wiki Page

Meeting Notes

Participants