WebSchemas/Datasets
Overview
This page discusses a proposal to extend schema.org with vocabulary for describing datasets and data catalogs. For additional information, see this demo page. Comments on this proposal are welcome.
Status
We have a detailed proposal here, and something in this direction is a strong candidate for official addition to schema.org.
The natural next steps before finalizing the addition are:
- some indication from potential publishers that they are willing to adopt it, and that the design is a good fit for existing sites and content
- some positive indication from relevant experts (e.g. open government data publishers, W3C's Government Linked Data group, or others) that the schema has had careful review. Comments can be made here in the wiki, sent to the public-vocabs@w3.org list (see details at the WebSchemas page), or passed along by mail to danbri@google.com.
Vocabulary
The Datasets extension introduces three new types, with associated properties:
- Thing > CreativeWork > Dataset: a body of structured information describing some topic(s) of interest
  - catalog(DataCatalog): the data catalog which contains a dataset
  - distribution(DataDownload): a downloadable form of this dataset, at a specific location, in a specific format
  - spatial(Place): the range of spatial applicability of a dataset, e.g. for a dataset of New York weather, the state of New York
  - temporal(DateTime): the range of temporal applicability of a dataset, e.g. for a 2011 census dataset, the year 2011 (in ISO 8601 time interval format)
- Thing > CreativeWork > DataCatalog: a collection of datasets
  - dataset(Dataset): a dataset contained in a catalog
- Thing > CreativeWork > MediaObject > DataDownload: a dataset in downloadable form
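As a rough illustration of how the three types relate, the microdata sketch below marks up a catalog that contains one dataset with a single download. The catalog name, dataset name, URLs, and media type are hypothetical placeholders, not part of the proposal. The `temporal` value is one possible ISO 8601 start/end interval covering the year 2011.

```html
<!-- Hypothetical catalog page; all names and URLs are placeholders. -->
<div itemscope="itemscope" itemtype="http://schema.org/DataCatalog">
  <span itemprop="name">Example Open Data Catalog</span>
  <!-- dataset: a Dataset contained in this catalog -->
  <div itemprop="dataset" itemscope="itemscope" itemtype="http://schema.org/Dataset">
    <a itemprop="url" href="http://example.org/datasets/census-2011">
      <span itemprop="name">2011 Census</span>
    </a>
    <!-- temporal: ISO 8601 time interval (start/end) for the year 2011 -->
    <meta itemprop="temporal" content="2011-01-01/2011-12-31" />
    <!-- distribution: one downloadable form of the dataset -->
    <div itemprop="distribution" itemscope="itemscope" itemtype="http://schema.org/DataDownload">
      <meta itemprop="encodingFormat" content="text/csv" />
      <a itemprop="contentUrl" href="http://example.org/downloads/census-2011.csv">CSV download</a>
    </div>
  </div>
</div>
```

Here the link runs from catalog to dataset via `dataset`. A page describing only the dataset could instead point back to its catalog with the `catalog` property.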
Example
Microdata markup
<div itemscope="itemscope" itemtype="http://schema.org/Dataset" itemid="http://example.org/datasets/seismic-hazard-zones">
<meta itemprop="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>
<span itemprop="name">
<a href="http://www.example.org/story.php?title=seismic-hazard-zones">
<b>Seismic Hazard Zones</b>
</a>
</span>
(<span itemprop="temporal">2011</span>, version "<span itemprop="version">2011-Sep-13</span>")
<div itemprop="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
<div itemprop="spatial" itemscope="itemscope" itemtype="http://schema.org/Country" itemid="http://dbpedia.org/resource/United_States">
<i>Country:</i>
<a href="http://en.wikipedia.org/wiki/United_States">
<span itemprop="name">United States</span>
</a>
</div>
<div itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization"><i>Publisher:</i>
<span itemprop="name">Department of Technology</span>
<span itemprop="email">dot at example dot org</span>
</div>
<div><i>Topics:</i>
<span itemprop="about" itemscope="itemscope" itemtype="http://schema.org/Thing" itemid="http://dbpedia.org/resource/Seismic_hazard">
<a href="http://en.wikipedia.org/wiki/Seismic_hazard"><span itemprop="name">seismic hazard</span></a>
</span>
</div>
<div><i>Keywords:</i>
<span itemprop="keywords">layers</span>,
<span itemprop="keywords">geography</span>,
<span itemprop="keywords">maps</span>,
<span itemprop="keywords">gis</span>
</div>
<div itemprop="license" itemscope="itemscope" itemtype="http://schema.org/WebPage"><i>License:</i>
<a href="http://opendatacommons.org/licenses/pddl/1.0/">
<span itemprop="name">ODC Public Domain Dedication and Licence (PDDL)</span>
</a>
<meta itemprop="url" content="http://opendatacommons.org/licenses/pddl/1.0/" />
</div>
<div itemprop="distribution" itemscope="itemscope" itemtype="http://schema.org/DataDownload"><i>Download:</i>
<a href="http://example.org/downloads/seismic-hazard-zones.nt.gz">
<meta itemprop="encodingFormat" content="text/plain" />
<meta itemprop="contentUrl" content="http://example.org/downloads/seismic-hazard-zones.nt.gz" />
<meta itemprop="inLanguage" content="en" />
<span itemprop="description">compressed N-Triples dump</span>,
<time itemprop="datePublished" datetime="2011-08-12">August 12, 2011</time>
<meta itemprop="contentSize" content="13.9" />(13.97MB)
</a>
</div>
</div>
Embedded data
Item
- type: http://schema.org/dataset
- name: Seismic Hazard Zones
- temporal: 2011
- version: 2011-Sep-13
- url: http://www.example.org/story.php?title=seismic-hazard-zones
- description: This is a dataset of liquefaction and landslide zones in the state of California.
- spatial: Item 1
- publisher: Item 2
- about: Item 3
- keywords: layers
- keywords: geography
- keywords: maps
- keywords: gis
- license: Item 4
- distribution: Item 5

Item 1
- type: http://schema.org/country
- name: United States

Item 2
- type: http://schema.org/organization
- name: Department of Technology
- email: dot at example dot org

Item 3
- type: http://schema.org/thing
- name: seismic hazard

Item 4
- type: http://schema.org/webpage
- name: ODC Public Domain Dedication and Licence (PDDL)
- url: http://opendatacommons.org/licenses/pddl/1.0/

Item 5
- type: http://schema.org/datadownload
- encodingformat: text/plain
- contenturl: http://example.org/downloads/seismic-hazard-zones.nt.gz
- inlanguage: en
- description: compressed N-Triples dump
- datepublished: 2011-08-12
- contentsize: 13.9
Equivalent RDFa 1.1 markup
<div vocab="http://schema.org/" prefix="dcat: http://www.w3.org/ns/dcat#" typeof="Dataset dcat:Dataset"
about="http://example.org/datasets/seismic-hazard-zones">
<meta property="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>
<span property="name"><b><a href="http://www.example.org/story.php?title=seismic-hazard-zones">Seismic Hazard Zones</a></b></span>
(<span property="temporal">2011</span>, version "<span property="version">2011-Sep-13</span>")
<div property="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
<div rel="spatial" resource="http://dbpedia.org/resource/United_States"><i>Country:</i>
<a href="http://dbpedia.org/resource/United_States">
<span about="http://dbpedia.org/resource/United_States" typeof="Country">
<span property="name">United States</span>
</span>
</a>
</div>
<div rel="publisher"><i>Publisher:</i>
<span typeof="Organization">
<span property="name">Department of Technology</span>
(<span property="email">dot at example dot org</span>)
</span>
</div>
<div><i>Topics:</i>
<span rel="about" resource="http://dbpedia.org/resource/Seismic_hazard">
<a href="http://en.wikipedia.org/wiki/Seismic_hazard">
<span property="name">seismic hazard</span></a>
</span>
</div>
<div><i>Keywords:</i>
<span property="keywords">layers</span>,
<span property="keywords">geography</span>,
<span property="keywords">maps</span>,
<span property="keywords">gis</span>
</div>
<div rel="license"><i>License:</i>
<a href="http://opendatacommons.org/licenses/pddl/1.0/">
<span typeof="WebPage">
<span property="name">ODC Public Domain Dedication and Licence (PDDL)</span>
<meta property="url" content="http://opendatacommons.org/licenses/pddl/1.0/"/>
</span>
</a>
</div>
<div rel="distribution"><i>Download:</i>
<a href="http://example.org/downloads/seismic-hazard-zones.nt.gz">
<span typeof="DataDownload">
<meta property="encodingFormat" content="text/plain" />
<meta property="contentUrl" content="http://example.org/downloads/seismic-hazard-zones.nt.gz" />
<meta property="inLanguage" content="en" />
<span property="description">compressed N-Triples dump</span>,
<span property="datePublished" content="2011-08-12">August 12, 2011</span>
<span property="contentSize" content="13.9">(13.97MB)</span>
</span>
</a>
</div>
</div>
RDFa 1.1 Lite markup
<div vocab="http://schema.org/" prefix="dcat: http://www.w3.org/ns/dcat#" typeof="Dataset dcat:Dataset"
resource="http://example.org/datasets/seismic-hazard-zones">
<meta property="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>
<span property="name"><b><a href="http://www.example.org/story.php?title=seismic-hazard-zones">Seismic Hazard Zones</a></b></span>
(<span property="temporal">2011</span>, version "<span property="version">2011-Sep-13</span>")
<div property="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
<div><i>Keywords:</i>
<span property="keywords">layers</span>,
<span property="keywords">geography</span>,
<span property="keywords">maps</span>,
<span property="keywords">gis</span>
</div>
</div>
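The Lite example above covers only the literal-valued properties. As a hedged sketch of how the nested publisher and download items might also be expressed within the RDFa 1.1 Lite attribute set, the fragment below combines `property` and `typeof` on one element, which in RDFa 1.1 links the enclosing item to a new typed item. It reuses the names and URLs from the earlier examples and is an illustration, not part of the proposal.

```html
<div vocab="http://schema.org/" typeof="Dataset"
     resource="http://example.org/datasets/seismic-hazard-zones">
  <span property="name">Seismic Hazard Zones</span>
  <!-- property + typeof: the dataset's publisher is a new Organization item -->
  <div property="publisher" typeof="Organization">
    <span property="name">Department of Technology</span>
  </div>
  <!-- property + typeof: the dataset's distribution is a new DataDownload item -->
  <div property="distribution" typeof="DataDownload">
    <meta property="encodingFormat" content="text/plain" />
    <!-- property with href: the value is the link target, not the link text -->
    <a property="contentUrl" href="http://example.org/downloads/seismic-hazard-zones.nt.gz">compressed N-Triples dump</a>
  </div>
</div>
```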
Related vocabularies
- Data Catalog Vocabulary (DCAT)
- Asset Description Metadata Schema (ADMS)
- Vocabulary of Interlinked Datasets (VoID)
- Vocabulary of the International Open Government Search service (derived from the above)
- RDF Data Cube Vocabulary
Mappings
This table maps Datasets extension types and properties (including supporting schema.org vocabulary) to and from their approximate equivalents in DCAT and VoID. A further relevant vocabulary, ADMS, is a profile (specialization) of DCAT. Only the terms that ADMS specializes are listed separately; for everything else, ADMS uses the DCAT term.
| Datasets extension | DCAT | ADMS | VoID |
|---|---|---|---|
| sdo:DataCatalog | dcat:Catalog | adms:AssetRepository (sub class of dcat:Catalog) | |
| sdo:DataDownload | dcat:Distribution | adms:AssetDistribution (sub class of dcat:Distribution) | |
| sdo:Dataset | dcat:Dataset | adms:Asset (sub class of dcat:Dataset) | void:Dataset |
| sdo:catalog | |||
| sdo:dataset | dcat:dataset | ||
| sdo:distribution | dcat:distribution | | void:dataDump |
| sdo:spatial | dcterms:spatial | ||
| sdo:temporal | dcterms:temporal | ||
| sdo:about | dcat:theme | ||
| sdo:contentSize | dcat:byteSize | ||
| sdo:contentUrl | dcat:downloadURL | ||
| sdo:dateModified | dcterms:modified | ||
| sdo:datePublished | dcterms:issued | ||
| sdo:description | dcterms:description | ||
| sdo:encodingFormat | dcterms:format | ||
| sdo:inLanguage | dcterms:language | ||
| sdo:keywords | dcat:keyword | ||
| sdo:name | dcterms:title | ||
| sdo:Organization | foaf:Organization | ||
| sdo:Person | foaf:Person | ||
| sdo:publisher | dcterms:publisher | ||
| sdo:Thing | skos:Concept | ||
| sdo:url | foaf:homepage | ||
| sdo:version | owl:versionInfo (also adms:versionNotes) | ||
| | dcat:CatalogRecord | | |
| | dcat:Distribution | | |
| | dcat:byteSize | | |
| | dcat:downloadURL | | |
| | dcat:landingPage | | |
| | dcat:mediaType | | |
| | dcat:record | | |
| | dcat:themeTaxonomy | | |
| | dcterms:accrualPeriodicity | | |
| | dcterms:identifier | | |
| | dcterms:license | | |
| | foaf:primaryTopic | | |
| | skos:ConceptScheme | | |
| | | adms:Identifier | |
| | | adms:identifier | |
| | | adms:includedAsset | |
| | | adms:interoperabilityLevel | |
| | | adms:last | |
| | | adms:next | |
| | | adms:prev | |
| | | adms:representationTechnique | |
| | | adms:schemeAgency | |
| | | adms:status | |
| | | adms:versionNotes | |
| | | | void:class |
| | | | void:classes |
| | | | void:classPartition |
| | | | void:DatasetDescription |
| | | | void:distinctObjects |
| | | | void:distinctSubjects |
| | | | void:documents |
| | | | void:entities |
| | | adms:sample | void:exampleResource |
| | | | void:feature |
| | | | void:inDataset |
| | | | void:linkPredicate |
| | | | void:Linkset |
| | | | void:objectsTarget |
| | | | void:openSearchDescription |
| | | | void:properties |
| | | | void:property |
| | | | void:propertyPartition |
| | | | void:rootResource |
| | | | void:sparqlEndpoint |
| | | | void:subjectsTarget |
| | | | void:subset |
| | | | void:target |
| | | | void:TechnicalFeature |
| | | | void:triples |
| | | | void:uriLookupEndpoint |
| | | | void:uriRegexPattern |
| | | | void:uriSpace |
| | | adms:supportedSchema | void:vocabulary |