WebSchemas/Datasets
Overview
This page discusses a proposal to extend schema.org with vocabulary for describing datasets and data catalogs. For additional information, see this demo page. Comments on this proposal are welcome.
Status
We have a detailed proposal here, and something in this direction is a strong candidate for official addition to schema.org.
The natural next steps before finalizing the addition are:
- some indication from potential publishers that they are willing to adopt the schema, and that the design is a good fit for existing sites and content
- some positive indication from relevant experts (e.g. open government data publishers, W3C's Government Linked Data group, or others) that the schema has had careful review. Comments can be made here in the wiki, sent to the public-vocabs@w3.org list (see details on the WebSchemas page), or passed along by mail to danbri@google.com.
 
Vocabulary
The Datasets extension introduces three new types, with associated properties:
- Thing > CreativeWork > Dataset: a body of structured information describing some topic(s) of interest
  - catalog(DataCatalog): the data catalog which contains a dataset
  - distribution(DataDownload): a downloadable form of this dataset, at a specific location, in a specific format
  - spatial(Place): the range of spatial applicability of a dataset, e.g. for a dataset of New York weather, the state of New York
  - temporal(DateTime): the range of temporal applicability of a dataset, e.g. for a 2011 census dataset, the year 2011 (in ISO 8601 time interval format)
- Thing > CreativeWork > DataCatalog: a collection of datasets
  - dataset(Dataset): a dataset contained in a catalog
- Thing > CreativeWork > MediaObject > DataDownload: a dataset in downloadable form
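The relationship between the three types can be sketched in microdata as follows. This is only an illustration of how the proposed dataset and catalog properties might link a DataCatalog and a Dataset in both directions. The catalog URL http://example.org/catalog and the catalog name are hypothetical, and the temporal value shows one possible ISO 8601 time interval for the year 2011.

```html
<!-- Hypothetical catalog; its URL and name are placeholders for illustration only -->
<div itemscope="itemscope" itemtype="http://schema.org/DataCatalog" itemid="http://example.org/catalog">
    <span itemprop="name">Example Open Data Catalog</span>
    <!-- dataset(Dataset): a dataset contained in this catalog -->
    <div itemprop="dataset" itemscope="itemscope" itemtype="http://schema.org/Dataset"
         itemid="http://example.org/datasets/seismic-hazard-zones">
        <span itemprop="name">Seismic Hazard Zones</span>
        <!-- temporal: the year 2011 written as an ISO 8601 time interval -->
        (<meta itemprop="temporal" content="2011-01-01/2011-12-31"/>2011)
        <!-- catalog(DataCatalog): points back at the containing catalog by URL -->
        <link itemprop="catalog" href="http://example.org/catalog"/>
    </div>
</div>
```

Nesting the Dataset inside the DataCatalog keeps both directions of the link in one fragment; a page describing a single dataset could instead carry only the catalog link.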
 
Example
Microdata markup
<div itemscope="itemscope" itemtype="http://schema.org/Dataset" itemid="http://example.org/datasets/seismic-hazard-zones">
    <meta itemprop="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>
    <span itemprop="name">
        <a href="http://www.example.org/story.php?title=seismic-hazard-zones">
            <b>Seismic Hazard Zones</b>
        </a>
    </span>
    (<span itemprop="temporal">2011</span>, version "<span itemprop="version">2011-Sep-13</span>")
    <div itemprop="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
    <div itemprop="spatial" itemscope="itemscope" itemtype="http://schema.org/Country" itemid="http://dbpedia.org/resource/United_States">
        <i>Country:</i>
        <a href="http://en.wikipedia.org/wiki/United_States">
            <span itemprop="name">United States</span>
        </a>
    </div>
    <div itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization"><i>Publisher:</i>
        <span itemprop="name">Department of Technology</span>
        <span itemprop="email">dot at example dot org</span>
    </div>
    <div><i>Topics:</i>
        <span itemprop="about" itemscope="itemscope" itemtype="http://schema.org/Thing" itemid="http://dbpedia.org/resource/Seismic_hazard">
            <a href="http://en.wikipedia.org/wiki/Seismic_hazard"><span itemprop="name">seismic hazard</span></a>
        </span>
    </div>
    <div><i>Keywords:</i>
        <span itemprop="keywords"><span itemscope="itemscope" itemtype="http://schema.org/Text">layers</span></span>,
        <span itemprop="keywords"><span itemscope="itemscope" itemtype="http://schema.org/Text">geography</span></span>,
        <span itemprop="keywords"><span itemscope="itemscope" itemtype="http://schema.org/Text">maps</span></span>,
        <span itemprop="keywords"><span itemscope="itemscope" itemtype="http://schema.org/Text">gis</span></span>
    </div>
    <div itemprop="license" itemscope="itemscope" itemtype="http://schema.org/WebPage"><i>License:</i>
        <a href="http://opendatacommons.org/licenses/pddl/1.0/">
            <span itemprop="name">ODC Public Domain Dedication and Licence (PDDL)</span>
        </a>
        <meta itemprop="url" content="http://opendatacommons.org/licenses/pddl/1.0/" />
    </div>
    <div itemprop="distribution" itemscope="itemscope" itemtype="http://schema.org/DataDownload"><i>Download:</i>
        <a href="http://example.org/downloads/seismic-hazard-zones.nt.gz">
            <meta itemprop="encodingFormat" content="text/plain" />
            <meta itemprop="contentUrl" content="http://example.org/downloads/seismic-hazard-zones.nt.gz" />
            <meta itemprop="inLanguage" content="en" />
            <span itemprop="description">compressed N-Triples dump</span>,
            <span itemprop="datePublished" content="2011-08-12">August 12, 2011</span>
            <span itemprop="contentSize" content="13.9">(13.97MB)</span>
	</a>
    </div>
</div>
Embedded data
Item
  type: http://schema.org/dataset
  name: Seismic Hazard Zones
  temporal: 2011
  version: 2011-Sep-13
  url: http://www.example.org/story.php?title=seismic-hazard-zones
  description: This is a dataset of liquefaction and landslide zones in the state of California.
  spatial: Item 1
  publisher: Item 2
  about: Item 3
  keywords: layers
  keywords: geography
  keywords: maps
  keywords: gis
  license: Item 4
  distribution: Item 5
Item 1
  type: http://schema.org/country
  name: United States
Item 2
  type: http://schema.org/organization
  name: Department of Technology
  email: dot at example dot org
Item 3
  type: http://schema.org/thing
  name: seismic hazard
Item 4
  type: http://schema.org/webpage
  name: ODC Public Domain Dedication and Licence (PDDL)
  url: http://opendatacommons.org/licenses/pddl/1.0/
Item 5
  type: http://schema.org/datadownload
  encodingformat: text/plain
  contenturl: http://example.org/downloads/seismic-hazard-zones.nt.gz
  inlanguage: en
  description: compressed N-Triples dump
  datepublished: 2011-08-12
  contentsize: 13.9
Equivalent RDFa 1.1 markup
<div vocab="http://schema.org/" prefix="dcat: http://www.w3.org/ns/dcat#" typeof="Dataset dcat:Dataset"
     about="http://example.org/datasets/seismic-hazard-zones">
 
    <meta property="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>
    <span property="name"><b><a href="http://www.example.org/story.php?title=seismic-hazard-zones">Seismic Hazard Zones</a></b></span>
    (<span property="temporal">2011</span>, version "<span property="version">2011-Sep-13</span>")
    <div property="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
    <div rel="spatial" resource="http://dbpedia.org/resource/United_States"><i>Country:</i>
        <a href="http://dbpedia.org/resource/United_States">
            <span about="http://dbpedia.org/resource/United_States" typeof="Country">
                <span property="name">United States</span>
            </span>
        </a>
    </div>
    <div rel="publisher"><i>Publisher:</i>
        <span typeof="Organization">
            <span property="name">Department of Technology</span>
	    (<span property="email">dot at example dot org</span>)
        </span>
    </div>
    <div><i>Topics:</i>
        <span rel="about" resource="http://dbpedia.org/resource/Seismic_hazard">
            <a href="http://en.wikipedia.org/wiki/Seismic_hazard">
                <span property="name">seismic hazard</span></a>
        </span>
    </div>
    <div><i>Keywords:</i>
        <span property="keywords">layers</span>,
        <span property="keywords">geography</span>,
        <span property="keywords">maps</span>,
        <span property="keywords">gis</span>
    </div>
    <div rel="license"><i>License:</i>
    	<a href="http://opendatacommons.org/licenses/pddl/1.0/">
	<span typeof="WebPage">
	    <span property="name">ODC Public Domain Dedication and Licence (PDDL)</span>
	    <meta property="url" content="http://opendatacommons.org/licenses/pddl/1.0/"/>
	</span>
	</a>
    </div>
    
    <div rel="distribution"><i>Download:</i>
        <a href="http://example.org/downloads/seismic-hazard-zones.nt.gz">
        <span typeof="DataDownload">
            <meta property="encodingFormat" content="text/plain" />
            <meta property="contentUrl" content="http://example.org/downloads/seismic-hazard-zones.nt.gz" />
            <meta property="inLanguage" content="en" />
            <span property="description">compressed N-Triples dump</span>,
            <span property="datePublished" content="2011-08-12">August 12, 2011</span>
            <span property="contentSize" content="13.9">(13.97MB)</span>
	</span>
	</a>
    </div>
</div>
RDFa 1.1 Lite markup
<div vocab="http://schema.org/" prefix="dcat: http://www.w3.org/ns/dcat#" typeof="Dataset dcat:Dataset"
     about="http://example.org/datasets/seismic-hazard-zones">
    <meta property="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>
    <span property="name"><b><a href="http://www.example.org/story.php?title=seismic-hazard-zones">Seismic Hazard Zones</a></b></span>
    (<span property="temporal">2011</span>, version "2011-Sep-13")
    <div property="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
    <div><i>Keywords:</i>
        <span property="keywords">layers</span>,
        <span property="keywords">geography</span>,
        <span property="keywords">maps</span>,
        <span property="keywords">gis</span>
    </div>
</div>
Related vocabularies
- Data Catalog Vocabulary (DCAT)
- Asset Description Metadata Schema (ADMS)
- Vocabulary of Interlinked Datasets (VoID)
- vocabulary of the International Open Government Search service (derived from the above)
- RDF Data Cube Vocabulary
 
Mappings
This table maps Datasets extension types and properties (including supporting schema.org vocabulary) to and from their approximate equivalents in DCAT and VoID. A further relevant vocabulary, ADMS, is a profile (specialization) of DCAT. Only the terms that ADMS specializes are listed separately; for everything else, ADMS uses the DCAT terms.
| Datasets extension | DCAT | ADMS | VoID |
|---|---|---|---|
| sdo:DataCatalog | dcat:Catalog | adms:AssetRepository (sub class of dcat:Catalog) | |
| sdo:DataDownload | dcat:Distribution | adms:AssetDistribution (sub class of dcat:Distribution) | |
| sdo:Dataset | dcat:Dataset | adms:Asset (sub class of dcat:Dataset) | void:Dataset |
| sdo:catalog | | | |
| sdo:dataset | dcat:dataset | | |
| sdo:distribution | dcat:distribution | | void:dataDump |
| sdo:spatial | dcterms:spatial | | |
| sdo:temporal | dcterms:temporal | | |
| sdo:about | dcat:theme | | |
| sdo:contentSize | dcat:byteSize | | |
| sdo:contentUrl | dcat:downloadURL | | |
| sdo:dateModified | dcterms:modified | | |
| sdo:datePublished | dcterms:issued | | |
| sdo:description | dcterms:description | | |
| sdo:encodingFormat | dcterms:format | | |
| sdo:inLanguage | dcterms:language | | |
| sdo:keywords | dcat:keyword | | |
| sdo:name | dcterms:title | | |
| sdo:Organization | foaf:Organization | | |
| sdo:Person | foaf:Person | | |
| sdo:publisher | dcterms:publisher | | |
| sdo:Thing | skos:Concept | | |
| sdo:url | foaf:homepage | | |
| sdo:version | owl:versionInfo (also adms:versionNotes) | | |
| | dcat:CatalogRecord | | |
| | dcat:Distribution | | |
| | dcat:byteSize | | |
| | dcat:downloadURL | | |
| | dcat:landingPage | | |
| | dcat:mediaType | | |
| | dcat:record | | |
| | dcat:themeTaxonomy | | |
| | dcterms:accrualPeriodicity | | |
| | dcterms:identifier | | |
| | dcterms:license | | |
| | foaf:primaryTopic | | |
| | skos:ConceptScheme | | |
| | | adms:Identifier | |
| | | adms:identifier | |
| | | adms:includedAsset | |
| | | adms:interoperabilityLevel | |
| | | adms:last | |
| | | adms:next | |
| | | adms:prev | |
| | | adms:representationTechnique | |
| | | adms:schemeAgency | |
| | | adms:status | |
| | | adms:versionNotes | |
| | | | void:class |
| | | | void:classes |
| | | | void:classPartition |
| | | | void:DatasetDescription |
| | | | void:distinctObjects |
| | | | void:distinctSubjects |
| | | | void:documents |
| | | | void:entities |
| | | adms:sample | void:exampleResource |
| | | | void:feature |
| | | | void:inDataset |
| | | | void:linkPredicate |
| | | | void:Linkset |
| | | | void:objectsTarget |
| | | | void:openSearchDescription |
| | | | void:properties |
| | | | void:property |
| | | | void:propertyPartition |
| | | | void:rootResource |
| | | | void:sparqlEndpoint |
| | | | void:subjectsTarget |
| | | | void:subset |
| | | | void:target |
| | | | void:TechnicalFeature |
| | | | void:triples |
| | | | void:uriLookupEndpoint |
| | | | void:uriRegexPattern |
| | | | void:uriSpace |
| | | adms:supportedSchema | void:vocabulary |
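To make the mapping concrete, here is a rough sketch of part of the seismic hazard example restated in RDFa 1.1 Lite using only the DCAT and Dublin Core terms from the DCAT column of the table above. It is an approximate translation for illustration, not part of the proposal. The prefix declarations are written out for readability, and the choice of which properties to include is an assumption.

```html
<!-- Approximate DCAT restatement of the Dataset example; terms are taken from the DCAT column -->
<div prefix="dcat: http://www.w3.org/ns/dcat# dcterms: http://purl.org/dc/terms/"
     typeof="dcat:Dataset" resource="http://example.org/datasets/seismic-hazard-zones">
    <!-- sdo:name maps to dcterms:title -->
    <span property="dcterms:title">Seismic Hazard Zones</span>
    <!-- sdo:description maps to dcterms:description -->
    <div property="dcterms:description">This is a dataset of liquefaction and landslide zones in the state of California.</div>
    <!-- sdo:keywords maps to dcat:keyword -->
    <div><i>Keywords:</i>
        <span property="dcat:keyword">layers</span>,
        <span property="dcat:keyword">geography</span>,
        <span property="dcat:keyword">maps</span>,
        <span property="dcat:keyword">gis</span>
    </div>
    <!-- sdo:distribution / sdo:DataDownload map to dcat:distribution / dcat:Distribution -->
    <div property="dcat:distribution" typeof="dcat:Distribution"><i>Download:</i>
        <!-- sdo:contentUrl maps to dcat:downloadURL -->
        <a property="dcat:downloadURL" href="http://example.org/downloads/seismic-hazard-zones.nt.gz">compressed N-Triples dump</a>
        <!-- sdo:datePublished maps to dcterms:issued -->
        <meta property="dcterms:issued" content="2011-08-12"/>
    </div>
</div>
```

Since the mappings are described as approximate, a consumer converting between the vocabularies should treat a translation like this as lossy rather than exact.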