WebSchemas/Datasets
Overview
This page describes a proposal to extend schema.org with vocabulary for describing datasets and data catalogs. For additional information, see this demo page. Comments on this proposal are welcome.
Status
We have a detailed proposal here, and something in this direction is a strong candidate for official addition to schema.org.
The natural next steps before finalizing the addition are:
- some indication from potential publishers that there is willingness to adopt, and that the design is a good fit to existing sites/content
- some positive indication from relevant experts (e.g. open government data publishers, W3C's Government Linked Data group, or others) that the schema has had careful review. Comments can be left here in the wiki, sent to the public-vocabs@w3.org list (see details on the WebSchemas page), or mailed to danbri@google.com.
Vocabulary
The Datasets extension introduces three new types, with associated properties:
- Thing > CreativeWork > Dataset: a body of structured information describing some topic(s) of interest
  - catalog (DataCatalog): the data catalog which contains a dataset
  - distribution (DataDownload): a downloadable form of this dataset, at a specific location, in a specific format
  - spatial (Place): the range of spatial applicability of a dataset, e.g. for a dataset of New York weather, the state of New York
  - temporal (DateTime): the range of temporal applicability of a dataset, e.g. for a 2011 census dataset, the year 2011 (in ISO 8601 time interval format)
- Thing > CreativeWork > DataCatalog: a collection of datasets
  - dataset (Dataset): a dataset contained in a catalog
- Thing > CreativeWork > MediaObject > DataDownload: a dataset in downloadable form
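To make the relationships between the three types concrete, here is a minimal Python sketch that assembles one dataset description as a nested dictionary and prints it as JSON. The JSON-LD-style keys (`@context`, `@type`) are an assumption made for illustration; this page itself only shows Microdata and RDFa. The helper name `describe_dataset`, the catalog name, and the download URL are hypothetical.

```python
import json


def describe_dataset(name, description, catalog_name, download_url,
                     encoding_format, place_name, temporal):
    """Build a nested description using the proposed types and properties."""
    return {
        "@context": "http://schema.org/",  # assumption: JSON-LD style context
        "@type": "Dataset",
        "name": name,
        "description": description,
        # catalog: the DataCatalog that contains this dataset
        "catalog": {"@type": "DataCatalog", "name": catalog_name},
        # distribution: a downloadable form at a specific location and format
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": download_url,
            "encodingFormat": encoding_format,
        },
        # spatial: the area the data applies to
        "spatial": {"@type": "Place", "name": place_name},
        # temporal: an ISO 8601 time interval (start/end)
        "temporal": temporal,
    }


record = describe_dataset(
    name="New York weather",
    description="Daily weather observations for the state of New York.",
    catalog_name="Example Open Data Catalog",                    # hypothetical
    download_url="http://example.org/downloads/ny-weather.csv",  # hypothetical
    encoding_format="text/csv",
    place_name="New York",
    temporal="2011-01-01/2011-12-31",
)
print(json.dumps(record, indent=2))
```

The nesting mirrors the list above: a Dataset points to its DataCatalog through `catalog` and to each downloadable form through `distribution`.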
Example
Microdata markup
<div itemscope="itemscope" itemtype="http://schema.org/Dataset" itemid="http://example.org/datasets/seismic-hazard-zones">
  <meta itemprop="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>
  <span itemprop="name">
    <a href="http://www.example.org/story.php?title=seismic-hazard-zones">
      <b>Seismic Hazard Zones</b>
    </a>
  </span>
  (<span itemprop="temporal">2011</span>, version "<span itemprop="version">2011-Sep-13</span>")
  <div itemprop="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
  <div itemprop="spatial" itemscope="itemscope" itemtype="http://schema.org/Country" itemid="http://dbpedia.org/resource/United_States">
    <i>Country:</i>
    <a href="http://en.wikipedia.org/wiki/United_States">
      <span itemprop="name">United States</span>
    </a>
  </div>
  <div itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization">
    <i>Publisher:</i>
    <span itemprop="name">Department of Technology</span>
    <span itemprop="email">dot at example dot org</span>
  </div>
  <div>
    <i>Topics:</i>
    <span itemprop="about" itemscope="itemscope" itemtype="http://schema.org/Thing" itemid="http://dbpedia.org/resource/Seismic_hazard">
      <a href="http://en.wikipedia.org/wiki/Seismic_hazard"><span itemprop="name">seismic hazard</span></a>
    </span>
  </div>
  <div>
    <i>Keywords:</i>
    <span itemprop="keywords"><span itemscope="itemscope" itemtype="http://schema.org/Text">layers</span></span>,
    <span itemprop="keywords"><span itemscope="itemscope" itemtype="http://schema.org/Text">geography</span></span>,
    <span itemprop="keywords"><span itemscope="itemscope" itemtype="http://schema.org/Text">maps</span></span>,
    <span itemprop="keywords"><span itemscope="itemscope" itemtype="http://schema.org/Text">gis</span></span>
  </div>
  <div itemprop="license" itemscope="itemscope" itemtype="http://schema.org/WebPage">
    <i>License:</i>
    <a href="http://opendatacommons.org/licenses/pddl/1.0/">
      <span itemprop="name">ODC Public Domain Dedication and Licence (PDDL)</span>
    </a>
    <meta itemprop="url" content="http://opendatacommons.org/licenses/pddl/1.0/" />
  </div>
  <div itemprop="distribution" itemscope="itemscope" itemtype="http://schema.org/DataDownload">
    <i>Download:</i>
    <a href="http://example.org/downloads/seismic-hazard-zones.nt.gz">
      <meta itemprop="encodingFormat" content="text/plain" />
      <meta itemprop="contentUrl" content="http://example.org/downloads/seismic-hazard-zones.nt.gz" />
      <meta itemprop="inLanguage" content="en" />
      <span itemprop="description">compressed N-Triples dump</span>,
      <span itemprop="datePublished" content="2011-08-12">August 12, 2011</span>
      <span itemprop="contentSize" content="13.9">(13.97MB)</span>
    </a>
  </div>
</div>
Embedded data
Item
  type: http://schema.org/dataset
  name: Seismic Hazard Zones
  temporal: 2011
  version: 2011-Sep-13
  url: http://www.example.org/story.php?title=seismic-hazard-zones
  description: This is a dataset of liquefaction and landslide zones in the state of California.
  spatial: Item 1
  publisher: Item 2
  about: Item 3
  keywords: layers
  keywords: geography
  keywords: maps
  keywords: gis
  license: Item 4
  distribution: Item 5
Item 1
  type: http://schema.org/country
  name: United States
Item 2
  type: http://schema.org/organization
  name: Department of Technology
  email: dot at example dot org
Item 3
  type: http://schema.org/thing
  name: seismic hazard
Item 4
  type: http://schema.org/webpage
  name: ODC Public Domain Dedication and Licence (PDDL)
  url: http://opendatacommons.org/licenses/pddl/1.0/
Item 5
  type: http://schema.org/datadownload
  encodingformat: text/plain
  contenturl: http://example.org/downloads/seismic-hazard-zones.nt.gz
  inlanguage: en
  description: compressed N-Triples dump
  datepublished: 2011-08-12
  contentsize: 13.9
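A listing like the one above is what a Microdata-aware parser extracts from the markup. The following Python sketch shows roughly how such an extraction could work, using only the standard library. It is a simplification and not the full Microdata algorithm: it ignores `itemid`, `itemref`, and URL resolution, and it accepts a `content` attribute on any element, which is an assumption chosen to match the listing above. The class name `MicrodataSketch`, the helper `extract`, and the shortened `SAMPLE` fragment are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataSketch(HTMLParser):
    """Collects items and their properties from a Microdata fragment."""

    def __init__(self):
        super().__init__()
        self.items = []  # top-level items, in document order
        self.open = []   # one frame per open (non-void) element

    def _enclosing_item(self):
        # Nearest open element that started an item, if any.
        for frame in reversed(self.open):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        owner = self._enclosing_item()
        prop = attrs.get("itemprop")
        item = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop is None or owner is None:
                self.items.append(item)
        text = None
        if prop is not None and owner is not None:
            values = owner["properties"].setdefault(prop, [])
            if item is not None:
                values.append(item)             # nested item
            elif "content" in attrs:
                values.append(attrs["content"])
            elif tag in ("a", "link") and "href" in attrs:
                values.append(attrs["href"])
            elif tag in VOID_TAGS:
                values.append("")
            else:
                text = []  # value is the element text, known at the end tag
        if tag not in VOID_TAGS:
            self.open.append({"tag": tag, "item": item, "prop": prop,
                              "owner": owner, "text": text})

    def handle_data(self, data):
        # Text counts toward every open property that is collecting text.
        for frame in self.open:
            if frame["text"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Lenient matching: close everything up to the nearest open `tag`.
        for index in range(len(self.open) - 1, -1, -1):
            if self.open[index]["tag"] == tag:
                break
        else:
            return
        while len(self.open) > index:
            frame = self.open.pop()
            if frame["text"] is not None:
                value = " ".join("".join(frame["text"]).split())
                frame["owner"]["properties"][frame["prop"]].append(value)


def extract(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<div itemscope itemtype="http://schema.org/Dataset">
  <span itemprop="name"><b>Seismic Hazard Zones</b></span>
  (<span itemprop="temporal">2011</span>)
  <span itemprop="keywords">layers</span>, <span itemprop="keywords">maps</span>
  <div itemprop="distribution" itemscope itemtype="http://schema.org/DataDownload">
    <meta itemprop="encodingFormat" content="text/plain" />
    <meta itemprop="contentUrl" content="http://example.org/downloads/seismic-hazard-zones.nt.gz" />
  </div>
</div>
"""

dataset = extract(SAMPLE)[0]
for prop, values in dataset["properties"].items():
    print(prop, values)
```

Repeated properties such as `keywords` accumulate as a list, and a property whose element carries `itemscope` holds a nested item, which corresponds to the "Item 1" to "Item 5" references in the listing.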
Equivalent RDFa 1.1 markup
<div vocab="http://schema.org/" prefix="dcat: http://www.w3.org/ns/dcat#" typeof="Dataset dcat:Dataset" about="http://example.org/datasets/seismic-hazard-zones">
  <meta property="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>
  <span property="name"><b><a href="http://www.example.org/story.php?title=seismic-hazard-zones">Seismic Hazard Zones</a></b></span>
  (<span property="temporal">2011</span>, version "<span property="version">2011-Sep-13</span>")
  <div property="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
  <div rel="spatial" resource="http://dbpedia.org/resource/United_States">
    <i>Country:</i>
    <a href="http://dbpedia.org/resource/United_States">
      <span about="http://dbpedia.org/resource/United_States" typeof="Country">
        <span property="name">United States</span>
      </span>
    </a>
  </div>
  <div rel="publisher">
    <i>Publisher:</i>
    <span typeof="Organization">
      <span property="name">Department of Technology</span>
      (<span property="email">dot at example dot org</span>)
    </span>
  </div>
  <div>
    <i>Topics:</i>
    <span rel="about" resource="http://dbpedia.org/resource/Seismic_hazard">
      <a href="http://en.wikipedia.org/wiki/Seismic_hazard">
        <span property="name">seismic hazard</span></a>
    </span>
  </div>
  <div>
    <i>Keywords:</i>
    <span property="keywords">layers</span>,
    <span property="keywords">geography</span>,
    <span property="keywords">maps</span>,
    <span property="keywords">gis</span>
  </div>
  <div rel="license">
    <i>License:</i>
    <a href="http://opendatacommons.org/licenses/pddl/1.0/">
      <span typeof="WebPage">
        <span property="name">ODC Public Domain Dedication and Licence (PDDL)</span>
        <meta property="url" content="http://opendatacommons.org/licenses/pddl/1.0/"/>
      </span>
    </a>
  </div>
  <div rel="distribution">
    <i>Download:</i>
    <a href="http://example.org/downloads/seismic-hazard-zones.nt.gz">
      <span typeof="DataDownload">
        <meta property="encodingFormat" content="text/plain" />
        <meta property="contentUrl" content="http://example.org/downloads/seismic-hazard-zones.nt.gz" />
        <meta property="inLanguage" content="en" />
        <span property="description">compressed N-Triples dump</span>,
        <span property="datePublished" content="2011-08-12">August 12, 2011</span>
        <span property="contentSize" content="13.9">(13.97MB)</span>
      </span>
    </a>
  </div>
</div>
RDFa 1.1 Lite markup
<div vocab="http://schema.org/" prefix="dcat: http://www.w3.org/ns/dcat#" typeof="Dataset dcat:Dataset" resource="http://example.org/datasets/seismic-hazard-zones">
  <meta property="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>
  <span property="name"><b><a href="http://www.example.org/story.php?title=seismic-hazard-zones">Seismic Hazard Zones</a></b></span>
  (<span property="temporal">2011</span>, version "2011-Sep-13")
  <div property="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
  <div>
    <i>Keywords:</i>
    <span property="keywords">layers</span>,
    <span property="keywords">geography</span>,
    <span property="keywords">maps</span>,
    <span property="keywords">gis</span>
  </div>
</div>
Related vocabularies
- Data Catalog Vocabulary (DCAT)
- Asset Description Metadata Schema (ADMS)
- Vocabulary of Interlinked Datasets (VoID)
- vocabulary of the International Open Government Search service (derived from the above)
- RDF Data Cube Vocabulary
Mappings
This table maps Datasets extension types and properties (including supporting schema.org vocabulary) to and from their approximate equivalents in DCAT and VoID. A further relevant vocabulary, ADMS, is a profile (specialization) of DCAT. Only the terms that ADMS specializes are listed separately; for everything else, ADMS uses the DCAT term. Rows with an empty Datasets extension cell list terms that have no equivalent in the extension.
| Datasets extension | DCAT | ADMS | VoID |
|---|---|---|---|
| sdo:DataCatalog | dcat:Catalog | adms:AssetRepository (subclass of dcat:Catalog) | |
| sdo:DataDownload | dcat:Distribution | adms:AssetDistribution (subclass of dcat:Distribution) | |
| sdo:Dataset | dcat:Dataset | adms:Asset (subclass of dcat:Dataset) | void:Dataset |
| sdo:catalog | | | |
| sdo:dataset | dcat:dataset | | |
| sdo:distribution | dcat:distribution | | void:dataDump |
| sdo:spatial | dcterms:spatial | | |
| sdo:temporal | dcterms:temporal | | |
| sdo:about | dcat:theme | | |
| sdo:contentSize | dcat:byteSize | | |
| sdo:contentUrl | dcat:downloadURL | | |
| sdo:dateModified | dcterms:modified | | |
| sdo:datePublished | dcterms:issued | | |
| sdo:description | dcterms:description | | |
| sdo:encodingFormat | dcterms:format | | |
| sdo:inLanguage | dcterms:language | | |
| sdo:keywords | dcat:keyword | | |
| sdo:name | dcterms:title | | |
| sdo:Organization | foaf:Organization | | |
| sdo:Person | foaf:Person | | |
| sdo:publisher | dcterms:publisher | | |
| sdo:Thing | skos:Concept | | |
| sdo:url | foaf:homepage | | |
| sdo:version | owl:versionInfo (also adms:versionNotes) | | |
| | dcat:CatalogRecord | | |
| | dcat:Distribution | | |
| | dcat:byteSize | | |
| | dcat:downloadURL | | |
| | dcat:landingPage | | |
| | dcat:mediaType | | |
| | dcat:record | | |
| | dcat:themeTaxonomy | | |
| | dcterms:accrualPeriodicity | | |
| | dcterms:identifier | | |
| | dcterms:license | | |
| | foaf:primaryTopic | | |
| | skos:ConceptScheme | | |
| | | adms:Identifier | |
| | | adms:identifier | |
| | | adms:includedAsset | |
| | | adms:interoperabilityLevel | |
| | | adms:last | |
| | | adms:next | |
| | | adms:prev | |
| | | adms:representationTechnique | |
| | | adms:schemeAgency | |
| | | adms:status | |
| | | adms:versionNotes | |
| | | | void:class |
| | | | void:classes |
| | | | void:classPartition |
| | | | void:DatasetDescription |
| | | | void:distinctObjects |
| | | | void:distinctSubjects |
| | | | void:documents |
| | | | void:entities |
| | | adms:sample | void:exampleResource |
| | | | void:feature |
| | | | void:inDataset |
| | | | void:linkPredicate |
| | | | void:Linkset |
| | | | void:objectsTarget |
| | | | void:openSearchDescription |
| | | | void:properties |
| | | | void:property |
| | | | void:propertyPartition |
| | | | void:rootResource |
| | | | void:sparqlEndpoint |
| | | | void:subjectsTarget |
| | | | void:subset |
| | | | void:target |
| | | | void:TechnicalFeature |
| | | | void:triples |
| | | | void:uriLookupEndpoint |
| | | | void:uriRegexPattern |
| | | | void:uriSpace |
| | | adms:supportedSchema | void:vocabulary |
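As an illustration of how the property rows of this table might be applied, the following Python sketch renames the keys of a flat property dictionary from Datasets extension terms to their approximate DCAT equivalents. The dictionary covers only the property mappings listed above, the function name `translate` is hypothetical, and the catalog URL in the usage example is made up. Because the table describes the mappings as approximate, a real converter would likely need to transform values as well as names.

```python
# Property mappings taken from the table above (sdo: prefix omitted).
SDO_TO_DCAT = {
    "about": "dcat:theme",
    "contentSize": "dcat:byteSize",
    "contentUrl": "dcat:downloadURL",
    "dataset": "dcat:dataset",
    "dateModified": "dcterms:modified",
    "datePublished": "dcterms:issued",
    "description": "dcterms:description",
    "distribution": "dcat:distribution",
    "encodingFormat": "dcterms:format",
    "inLanguage": "dcterms:language",
    "keywords": "dcat:keyword",
    "name": "dcterms:title",
    "publisher": "dcterms:publisher",
    "spatial": "dcterms:spatial",
    "temporal": "dcterms:temporal",
    "url": "foaf:homepage",
    "version": "owl:versionInfo",
}


def translate(properties):
    """Rename keys to DCAT terms; report terms the table leaves unmapped."""
    translated, unmapped = {}, []
    for term, value in properties.items():
        target = SDO_TO_DCAT.get(term)
        if target is None:
            unmapped.append(term)
        else:
            translated[target] = value
    return translated, unmapped


translated, unmapped = translate({
    "name": "Seismic Hazard Zones",
    "temporal": "2011",
    "keywords": ["layers", "geography", "maps", "gis"],
    "catalog": "http://example.org/catalog",  # hypothetical value
})
print(translated)
print(unmapped)
```

Here `catalog` ends up in the unmapped list because the table gives no DCAT equivalent for `sdo:catalog`.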