WebSchemas/BioDatabases

From W3C Wiki


This is an archived WebSchemas proposal, Biological Databases, for schema.org. See the Proposals listing for more. Note: active schema.org development is now based at GitHub.



Overview

This page discusses a schema extension for describing biological databases, proposed by MORITA Mizuki (NIBIO) on behalf of Sagace (a biological database search engine) and NBDC (National Bioscience Database Center, Japan).

Vocabulary

  • Adds a class 'BiologicalDatabaseEntry' as a subclass of CreativeWork, introducing 'entryID', 'isEntryOf', 'taxon', 'seeAlso', and 'reference'. Adds 'BiologicalDatabase', also a subclass of CreativeWork, whose only new property is 'reference'. Both also use 'breadcrumb' from WebPage.

BiologicalDatabaseEntry

Properties for a biological database entry:

Property | Expected Type | Description
Properties from Thing
additionalType | URL | An additional type for the item, typically used for adding more specific types from external vocabularies in microdata syntax.
description | Text | A short description of the entry.
image | URL | URL of an image of the entry.
name | Text | The name of the entry.
url | URL | URL of the entry.
Properties from CreativeWork
alternativeHeadline | Text | A secondary title of the entry.
inLanguage | Language | The language of the content. Please use one of the language codes from the IETF BCP 47 standard.
dateCreated | Date | The date on which the content was created (in ISO 8601 date format).
dateModified | Date | The date on which the content was most recently modified (in ISO 8601 date format).
keywords | Text | The keywords/tags used to describe this content.
provider | Person or Organization | Specifies the person or organization that distributed the content.
Properties from WebPage
breadcrumb | Text | A set of links that can help a user understand and navigate a website hierarchy.
Original properties in BiologicalDatabaseEntry
entryID | Text | The identifier of the entry.
isEntryOf | BiologicalDatabase | Indicates the database to which the entry belongs.
taxon | BiologicalDatabaseEntry or Text | The taxonomy information of the entry.
seeAlso | BiologicalDatabaseEntry or URL | Reference to another resource.
reference | Text or URL | The identifier of the reference, such as a PMID, DOI, or PMCID. If the reference has no ID, use its URL. Both forms appear in the KEGG example under Example Markup.


BiologicalDatabase

Properties for a biological database:

Property | Expected Type | Description
Properties from Thing
additionalType | URL | An additional type for the item, typically used for adding more specific types from external vocabularies in microdata syntax.
description | Text | A short description of the database.
image | URL | URL of an image of the database.
name | Text | The name of the database.
url | URL | URL of the database.
Properties from CreativeWork
alternativeHeadline | Text | A secondary title of the database.
inLanguage | Language | The language of the content. Please use one of the language codes from the IETF BCP 47 standard.
dateCreated | Date | The date on which the content was created (in ISO 8601 date format).
dateModified | Date | The date on which the content was most recently modified (in ISO 8601 date format).
keywords | Text | The keywords/tags used to describe this content.
provider | Person or Organization | Specifies the person or organization that distributed the content.
Properties from WebPage
breadcrumb | Text | A set of links that can help a user understand and navigate a website hierarchy.
Original properties in BiologicalDatabase
reference | Text or URL | The identifier of the reference, such as a PMID, DOI, or PMCID. If the reference has no ID, use its URL. Both forms appear in the KEGG example under Example Markup.

Example Markup

BiologicalDatabaseEntry

JCRB0225 [COLO320 DM]

Profile: Human colon carcinoma cell line with double minute chromosomes.
Tags: tumor, colon, adenocarcinoma
Date created: 08/27/2007
Animal: human
Organism: Homo sapiens (human)
Taxonomy ID: 9606 [UniProt Taxonomy]

JCRB Cell Bank
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
 <h1><a itemprop="url" href="http://cellbank.nibio.go.jp/legacy/celldata/jcrb0225.htm">
  <span itemprop="entryID">JCRB0225</span> [<span itemprop="name">COLO320 DM</span>]
 </a></h1>
 Profile: <span itemprop="description">Human colon carcinoma cell line with double minute chromosomes.</span>
 Tags: <span itemprop="keywords">tumor</span>, <span itemprop="keywords">colon</span>, <span itemprop="keywords">adenocarcinoma</span>
 Date created: <meta itemprop="dateCreated" content="2007-08-27">08/27/2007
 Animal: human
 <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
 Organism: <span itemprop="name">Homo sapiens</span> (human)
 Taxonomy ID: <a itemprop="url" href="http://www.uniprot.org/taxonomy/9606"><span itemprop="entryID">9606</span></a>
 [<span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase"><a itemprop="url" href="http://purl.uniprot.org/taxonomy/"><span itemprop="name">UniProt Taxonomy</span></a></span>]
 </span>
 <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
  <a itemprop="url" href="http://cellbank.nibio.go.jp/">
   <span itemprop="name">JCRB Cell Bank</span>
  </a>
 </span>
</div>
KEGG DISEASE: H00653

Entry:
 H00653
Name:
 Marfan syndrome, including:
 Marfan syndrome (MFS);
 Neonatal MFS;
 Atypically severe MFS;
 New variant of MFS
Description:
 Marfan syndrome (MFS) is a relatively common autosomal dominant disorder ...
Other DBs:
 ICD-10: Q87.4
 OMIM: 154700
Species:
 Human

KEGG DISEASE (Diseases viewed as perturbed states of the molecular system)
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
<a itemprop="url" href="http://www.kegg.jp/dbget-bin/www_bget?ds:H00653"><span itemprop="name">KEGG DISEASE: H00653</span></a>
Entry:
 <span itemprop="entryID">H00653</span>
Name: <span itemscope itemtype="http://schema.org/MedicalCondition">
            <span itemprop="code" itemscope itemtype="http://schema.org/MedicalCode">
             <meta itemprop="codeValue" content="Q87.4">
             <meta itemprop="codingSystem" content="ICD-10">
            </span>
           </span>
  Marfan syndrome, including:
  Marfan syndrome (MFS);
  Neonatal MFS;
  Atypically severe MFS;
  New variant of MFS
Description:
 <span itemprop="description">Marfan syndrome (MFS) is a relatively common autosomal dominant disorder ...</span> 
Other DBs:
 <span itemprop="seeAlso" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
    <span itemprop="name">ICD-10</span>
</span>: 
<a itemprop="url" href="http://www.kegg.jp/kegg-bin/get_htext?br08403+H00653"><span itemprop="entryID">Q87.4</span></a></span>
 <span itemprop="seeAlso" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry"><span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
<span itemprop="name">OMIM</span>
</span>: <a itemprop="url" href="http://omim.org/entry/154700"><span itemprop="entryID">154700</span></a></span>
Species:
<span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
 <span itemprop="name">Human</span>
</span>

<span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
 <a itemprop="url" href="http://www.kegg.jp/kegg/disease/"><span itemprop="name">KEGG DISEASE (Diseases viewed as perturbed states of the molecular system)</span></a>
</span>
</div>
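
To make the nesting in the entry examples above easier to follow, here is a minimal sketch of how a consumer might read such markup. It is written in Python with only the standard library, and it is not a full implementation of the HTML microdata algorithm (it ignores 'itemref', 'itemid', and most element-specific value rules). The names MicrodataParser, extract_items, and SAMPLE are hypothetical, and SAMPLE is a trimmed copy of the JCRB0225 example.

```python
from html.parser import HTMLParser

VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataParser(HTMLParser):
    """Collects microdata items as {"type": ..., "properties": {...}} dicts."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the document
        self.stack = []   # one frame per currently open (non-void) element

    def _current_item(self):
        # Innermost enclosing element that carries itemscope, if any.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self._current_item()
        prop = attrs.get("itemprop")
        frame = {"tag": tag, "item": None, "prop": None, "owner": None, "text": None}
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            frame["item"] = item
            if prop and parent is not None:
                parent["properties"].setdefault(prop, []).append(item)
            else:
                self.items.append(item)
        elif prop and parent is not None:
            if tag == "meta":
                parent["properties"].setdefault(prop, []).append(attrs.get("content", ""))
            elif tag in ("a", "link"):
                parent["properties"].setdefault(prop, []).append(attrs.get("href", ""))
            else:
                # The value is the element's text, known only at its end tag.
                frame.update(prop=prop, owner=parent, text=[])
        if tag not in VOID_TAGS:
            self.stack.append(frame)

    def handle_data(self, data):
        for frame in self.stack:
            if frame["text"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not any(f["tag"] == tag for f in self.stack):
            return  # void or stray end tag: nothing to close
        while True:
            frame = self.stack.pop()
            if frame["text"] is not None:
                value = "".join(frame["text"]).strip()
                frame["owner"]["properties"].setdefault(frame["prop"], []).append(value)
            if frame["tag"] == tag:
                break


def extract_items(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
 <a itemprop="url" href="http://cellbank.nibio.go.jp/legacy/celldata/jcrb0225.htm">
  <span itemprop="entryID">JCRB0225</span> [<span itemprop="name">COLO320 DM</span>]
 </a>
 Tags: <span itemprop="keywords">tumor</span>, <span itemprop="keywords">colon</span>
 Date created: <meta itemprop="dateCreated" content="2007-08-27">08/27/2007
 <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
  <a itemprop="url" href="http://cellbank.nibio.go.jp/"><span itemprop="name">JCRB Cell Bank</span></a>
 </span>
</div>
"""

entry = extract_items(SAMPLE)[0]
database = entry["properties"]["isEntryOf"][0]
```

In this sketch every property maps to a list, so repeated properties such as 'keywords' keep all their values, and a nested item such as 'isEntryOf' appears as a dict inside its parent's property list.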

BiologicalDatabase

JCRB Cell Bank

Profile: JCRB Cell Bank is the first cell bank in Japan. We collect ...
Date established: 10/1984
Last modified: 02/28/2011
Operated by: National Institute of Biomedical Innovation (NIBIO)
<div itemscope itemtype="http://schema.org/BiologicalDatabase">
 <span itemprop="name"><a itemprop="url" href="http://cellbank.nibio.go.jp/">JCRB Cell Bank</a></span>
 Profile: <span itemprop="description">JCRB Cell Bank is the first cell bank in Japan. We collect ...</span>
 Date established: <meta itemprop="dateCreated" content="1984-10">10/1984
 Last modified: <meta itemprop="dateModified" content="2011-02-28">02/28/2011
 Operated by: <span itemprop="provider" itemscope itemtype="http://schema.org/Organization">
  <a itemprop="url" href="http://www.nibio.go.jp/"><span itemprop="name">National Institute of Biomedical Innovation (NIBIO)</span></a>
 </span>
</div>
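
The example above displays dates in a US-style form (10/1984, 02/28/2011) while the 'content' attribute of each meta element carries the ISO 8601 value (1984-10, 2011-02-28). The following Python sketch shows one way a publisher might derive the ISO 8601 value from the displayed text. The function name to_iso8601 and the list of accepted display formats are assumptions made for illustration; they are not part of the proposal.

```python
from datetime import datetime

# (display format, ISO 8601 output format) pairs, tried most specific first.
# These three display formats are the ones used in the examples on this page.
FORMATS = (
    ("%m/%d/%Y", "%Y-%m-%d"),  # 02/28/2011 -> 2011-02-28
    ("%m/%Y", "%Y-%m"),        # 10/1984    -> 1984-10
    ("%Y", "%Y"),              # 1995       -> 1995
)


def to_iso8601(displayed):
    """Convert a displayed date to the value for a dateCreated/dateModified content attribute."""
    text = displayed.strip()
    for display_format, iso_format in FORMATS:
        try:
            return datetime.strptime(text, display_format).strftime(iso_format)
        except ValueError:
            continue  # try the next, less specific format
    raise ValueError("unrecognized date: %r" % displayed)
```

Keeping the precision of the original date (year, year and month, or full date) avoids inventing a day or month that the source page never stated.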
KEGG: Kyoto Encyclopedia of Genes and Genomes

Profile: KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource that ...
Date established: 1995
Current release: Release 64.0, October 1, 2012
Developed by: Kanehisa Laboratories
References:
 Kanehisa M, et al. Nucleic Acids Res. 40, D109-D114 (2012). [pubmed]
 Kanehisa M and Goto S. Nucleic Acids Res. 28, 27-30 (2000). [pubmed]
 Kanehisa M, et al. PNE, 52(12), 1486-1491 (2007). [PNE]
<div itemscope itemtype="http://schema.org/BiologicalDatabase">
 <span itemprop="name"><a itemprop="url" href="http://www.kegg.jp/">KEGG: Kyoto Encyclopedia of Genes and Genomes</a></span>
 Profile: <span itemprop="description">KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource that ...</span>
 Date established: <meta itemprop="dateCreated" content="1995">1995
 Current release: Release 64.0, <meta itemprop="dateModified" content="2012-10-01">October 1, 2012
 Developed by: <span itemprop="provider" itemscope itemtype="http://schema.org/Organization"><a itemprop="url" href="http://www.kanehisa.jp/"><span itemprop="name">Kanehisa Laboratories</span></a></span>
 References:
  Kanehisa M, et al. Nucleic Acids Res. 40, D109-D114 (2012). [<meta itemprop='reference' content='pmid:22080510'/><a href="http://www.ncbi.nlm.nih.gov/pubmed/22080510">pubmed</a>]
  Kanehisa M and Goto S. Nucleic Acids Res. 28, 27-30 (2000). [<meta itemprop='reference' content='pmid:10592173'/><a href="http://www.ncbi.nlm.nih.gov/pubmed/10592173">pubmed</a>]
  Kanehisa M, et al. PNE, 52(12), 1486-1491 (2007). [<meta itemprop='reference' content='http://lifesciencedb.jp/dbsearch/Literature/get_pne_cgpdf.php?year=2007&number=5212&file=o6YUOqyfHjzsI1vg5UpTlQ'/><a href="http://lifesciencedb.jp/dbsearch/Literature/get_pne_cgpdf.php?year=2007&number=5212&file=o6YUOqyfHjzsI1vg5UpTlQ">PNE</a>]
</div>

Discussion (Your review and comments are needed!)

1. How to mark up taxonomy (4 candidates)

1-1. Original [use taxonID]

<div itemscope itemtype ="http://schema.org/BiologicalDatabaseEntry">
 <h1><a itemprop="url" href="http://www.uniprot.org/uniprot/Q401N2">
  <span itemprop="entryID">Q401N2</span> [<span itemprop="name">Zinc-activated ligand-gated ion channel</span>]
 </a></h1>

 Organism: <a href="http://www.uniprot.org/taxonomy/9606">Homo sapiens (human)</a>
 Taxonomy ID: <a href="http://www.uniprot.org/taxonomy/9606"><span itemprop="taxonID">9606</span></a> 
 [<a href="http://purl.uniprot.org/taxonomy/">UniProt Taxonomy</a>]

 <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
 Database: <a itemprop="url" href="http://www.uniprot.org/"><span itemprop="name">UniProt</span></a>
 </span>
</div>

1-2. Proposed change 1 [taxonID -> taxon] [Currently selected]

<div itemscope itemtype ="http://schema.org/BiologicalDatabaseEntry">
 <h1><a itemprop="url" href="http://www.uniprot.org/uniprot/Q401N2">
  <span itemprop="entryID">Q401N2</span> [<span itemprop="name">Zinc-activated ligand-gated ion channel</span>]
 </a></h1>

 <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
 Organism: <span itemprop="name">Homo sapiens</span> (human)
 Taxonomy ID: <a itemprop="url" href="http://www.uniprot.org/taxonomy/9606"><span itemprop="entryID">9606</span></a>
  [<span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase"><a itemprop="url" href="http://purl.uniprot.org/taxonomy/"><span itemprop="name">UniProt Taxonomy</span></a></span>]
 </span>

 <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
 Database: <a itemprop="url" href="http://www.uniprot.org/"><span itemprop="name">UniProt</span></a>
 </span>
</div>

1-3. Proposed change 2 [taxonID -> taxon] (simpler but less useful for search engines?)

<div itemscope itemtype ="http://schema.org/BiologicalDatabaseEntry">
 <h1><a itemprop="url" href="http://www.uniprot.org/uniprot/Q401N2">
  <span itemprop="entryID">Q401N2</span> [<span itemprop="name">Zinc-activated ligand-gated ion channel</span>]
 </a></h1>

 Organism: <span itemprop="taxon">Homo sapiens</span> (human)
 Taxonomy ID: <a href="http://www.uniprot.org/taxonomy/9606">9606</a> 
 [<a href="http://purl.uniprot.org/taxonomy/">UniProt Taxonomy</a>]

 <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
 Database: <a itemprop="url" href="http://www.uniprot.org/"><span itemprop="name">UniProt</span></a>
 </span>
</div>
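
Candidates 1-2 and 1-3 differ in what a consumer receives for 'taxon': a nested BiologicalDatabaseEntry with its own name, entryID, and url, or a plain text string. The Python sketch below illustrates that trade-off by normalizing both shapes into one record. The dict layout for a nested item ("type" plus a "properties" mapping of lists) and the function name normalize_taxon are hypothetical choices for this illustration.

```python
def normalize_taxon(value):
    """Return name/entryID/url for a taxon given as Text or as a nested item."""
    if isinstance(value, str):
        # Candidate 1-3: only the display name is available.
        return {"name": value.strip(), "entryID": None, "url": None}

    # Candidate 1-2: a nested BiologicalDatabaseEntry item.
    properties = value.get("properties", {})

    def first(key):
        values = properties.get(key)
        return values[0] if values else None

    return {"name": first("name"), "entryID": first("entryID"), "url": first("url")}


text_taxon = normalize_taxon("Homo sapiens")
nested_taxon = normalize_taxon({
    "type": "http://schema.org/BiologicalDatabaseEntry",
    "properties": {
        "name": ["Homo sapiens"],
        "entryID": ["9606"],
        "url": ["http://www.uniprot.org/taxonomy/9606"],
    },
})
```

With the plain-text form the taxonomy ID and its URL are lost to the consumer, which is one reading of why 1-3 is described as simpler but possibly less useful for search engines.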

1-4. Proposed change 3 [taxonID -> SpeciesCode<additional type>]

<div itemscope itemtype ="http://schema.org/BiologicalDatabaseEntry">
 <h1><a itemprop="url" href="http://www.uniprot.org/uniprot/Q401N2">
  <span itemprop="entryID">Q401N2</span> [<span itemprop="name">Zinc-activated ligand-gated ion channel</span>]
 </a></h1>

 Organism: <span itemprop="code" itemscope itemtype="http://schema.org/BiologicalDatabaseCode">Homo sapiens (human)
 Taxonomy ID: <span itemprop="code">9606</span> <meta itemprop="codingSystem" content="taxon">
</span>

 <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
 Database: <a itemprop="url" href="http://www.uniprot.org/"><span itemprop="name">UniProt</span></a>
 </span>
</div>

1-5. Or other possibilities welcome.

2. Example markup with “seeAlso” (or “relatedLink”) property (2 candidates)

2-1. Candidate 1 (“relatedLink” is NOT applicable because it cannot take ‘BiologicalDatabaseEntry’ as its expected type) [Currently selected]

<div itemscope itemtype ="http://schema.org/BiologicalDatabaseEntry">
 <h1><a itemprop="url" href="http://www.uniprot.org/uniprot/Q401N2">
  <span itemprop="entryID">Q401N2</span> [<span itemprop="name">Zinc-activated ligand-gated ion channel</span>]
 </a></h1>

 Cross-references:
  KEGG: <span itemprop="seeAlso" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry"><a itemprop="url" href="http://purl.uniprot.org/kegg/hsa:353174"><span itemprop="name">hsa:353174</span></a></span>
  RefSeq: <span itemprop="seeAlso" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry"><a itemprop="url" href="http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=NP_851321.2"><span itemprop="name">NP_851321.2</span></a></span>, <span itemprop="seeAlso" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry"><a itemprop="url" href="http://www.ncbi.nlm.nih.gov/nuccore/NM_180990.3"><span itemprop="name">NM_180990.3</span></a></span>
  H-InvDB: <span itemprop="seeAlso" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry"><a itemprop="url" href="http://h-invitational.jp/hinv/spsoup/locus_view?hix_id=HIX0027141"><span itemprop="name">HIX0027141</span></a></span>

 <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
 Database: <a itemprop="url" href="http://www.uniprot.org/"><span itemprop="name">UniProt</span></a>
 </span>
</div>

2-2. Candidate 2 (simpler but less useful for search engines?) (“relatedLink” is applicable)

<div itemscope itemtype ="http://schema.org/BiologicalDatabaseEntry">
 <h1><a itemprop="url" href="http://www.uniprot.org/uniprot/Q401N2">
  <span itemprop="entryID">Q401N2</span> [<span itemprop="name">Zinc-activated ligand-gated ion channel</span>]
 </a></h1>

 Cross-references:
  KEGG: <a itemprop="seeAlso" href="http://purl.uniprot.org/kegg/hsa:353174">hsa:353174</a>
  RefSeq: <a itemprop="seeAlso" href="http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&id=NP_851321.2">NP_851321.2</a>, <a itemprop="seeAlso" href="http://www.ncbi.nlm.nih.gov/nuccore/NM_180990.3">NM_180990.3</a>
  H-InvDB: <a itemprop="seeAlso" href="http://h-invitational.jp/hinv/spsoup/locus_view?hix_id=HIX0027141">HIX0027141</a>

 <span itemprop="isEntryOf" itemscope itemtype="http://schema.org/BiologicalDatabase">
 Database: <a itemprop="url" href="http://www.uniprot.org/"><span itemprop="name">UniProt</span></a>
 </span>
</div>

2-3. Or other possibilities welcome.

3. How to mark up references in BiologicalDatabase (2 candidates)

3-1. Define a property ‘reference’, like ‘productID’ in Thing > Product (flexible, the term ‘reference’ is easily understandable) [Currently selected]

<div itemscope itemtype="http://schema.org/BiologicalDatabase">
 <span itemprop="name"><a itemprop="url" href="http://www.kegg.jp/">KEGG: Kyoto Encyclopedia of Genes and Genomes</a></span>
 Profile: <span itemprop="description">KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource that integrates genomic, chemical and systemic functional information. In particular, gene catalogs from completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism and the ecosystem.</span>
 Date established: <meta itemprop="dateCreated" content="1995">1995
 Current release: Release 64.0, <meta itemprop="dateModified" content="2012-10-01">October 1, 2012
 Developed by: <span itemprop="provider" itemscope itemtype="http://schema.org/Organization"><a itemprop="url" href="http://www.kanehisa.jp/"><span itemprop="name">Kanehisa Laboratories</span></a></span>

 References:
  Kanehisa M, et al. Nucleic Acids Res. 40, D109-D114 (2012). [<meta itemprop='reference' content='pmid:22080510'/><a href="http://www.ncbi.nlm.nih.gov/pubmed/22080510">pubmed</a>]
  Kanehisa M and Goto S. Nucleic Acids Res. 28, 27-30 (2000). [<meta itemprop='reference' content='pmid:10592173'/><a href="http://www.ncbi.nlm.nih.gov/pubmed/10592173">pubmed</a>]
  Kanehisa M, et al. PNE, 52(12), 1486-1491 (2007). [<meta itemprop='reference' content='http://lifesciencedb.jp/dbsearch/Literature/get_pne_cgpdf.php?year=2007&number=5212&file=o6YUOqyfHjzsI1vg5UpTlQ'/><a href="http://lifesciencedb.jp/dbsearch/Literature/get_pne_cgpdf.php?year=2007&number=5212&file=o6YUOqyfHjzsI1vg5UpTlQ">PNE</a>]

</div>
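
Under candidate 3-1 a single 'reference' value may hold either a prefixed identifier (the example uses 'pmid:22080510') or a plain URL, so a consumer has to tell the two apart. The Python sketch below shows one possible way to do that. Only the 'pmid:' prefix appears in the example markup; treating 'doi:' and 'pmcid:' the same way is an assumption, as is the function name classify_reference.

```python
from urllib.parse import urlparse

# 'pmid' is taken from the example markup; 'doi' and 'pmcid' are assumed
# to follow the same "prefix:value" convention.
KNOWN_PREFIXES = ("pmid", "doi", "pmcid")


def classify_reference(value):
    """Return (kind, identifier) for a 'reference' property value."""
    text = value.strip()
    prefix, separator, rest = text.partition(":")
    if separator and prefix.lower() in KNOWN_PREFIXES and rest.strip():
        return prefix.lower(), rest.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url", text
    return "unknown", text
```

Candidate 3-2 would make this step unnecessary, since each identifier type gets its own property, at the cost of having no place for references that only have a URL.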

3-2. Define properties ‘pmid’, ‘doi’, and ‘pmcid’, like ‘isbn’ in Thing > CreativeWork > Book

<div itemscope itemtype="http://schema.org/BiologicalDatabase">
 <span itemprop="name"><a itemprop="url" href="http://www.kegg.jp/">KEGG: Kyoto Encyclopedia of Genes and Genomes</a></span>
 Profile: <span itemprop="description">KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource that integrates genomic, chemical and systemic functional information. In particular, gene catalogs from completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism and the ecosystem.</span>
 Date established: <meta itemprop="dateCreated" content="1995">1995
 Current release: Release 64.0, <meta itemprop="dateModified" content="2012-10-01">October 1, 2012
 Developed by: <span itemprop="provider" itemscope itemtype="http://schema.org/Organization"><a itemprop="url" href="http://www.kanehisa.jp/"><span itemprop="name">Kanehisa Laboratories</span></a></span>

 References:
  Kanehisa M, et al. Nucleic Acids Res. 40, D109-D114 (2012). [<meta itemprop='pmid' content='22080510'/><a href="http://www.ncbi.nlm.nih.gov/pubmed/22080510">pubmed</a>]
  Kanehisa M and Goto S. Nucleic Acids Res. 28, 27-30 (2000). [<meta itemprop='pmid' content='10592173'/><a href="http://www.ncbi.nlm.nih.gov/pubmed/10592173">pubmed</a>]

</div>

3-3. Or other possibilities welcome.

Comments and Discussion

1. A Comment from Dan Brickley (2012-03-13).

  • Others have also mentioned interest in adding some notion of species.

2. Discussion in the BioHackathon ML (from 2012-08-10).

3. BioHackathon 2012 (2012-09-02/2012-09-07)

4. BioHackathon 12.12 (2012-12-19/2012-12-22)

5. SEMANTIC WEB HEALTH CARE AND LIFE SCIENCES (HCLS) INTEREST GROUP Seminar (2013-04-02)

6. A comment from Jamie Estill (2013-08-13)

  • It seems that author and contentLocation should also be included as properties from CreativeWork

How to Join the Discussion

Please give your comments on the proposed schema in one of the following ways:

1. Reply to the original post on Mailing List (public-vocabs@w3.org)

2. Reply to the original post (@keyboardrobot) on Twitter

3. Reply to the original post (@keyboardrobot) on Twitter [in Japanese]

References

Search engines in the life science field

Meta data for biological databases

Validator for microdata