WebSchemas/Datasets

From W3C Wiki


This is an archived WebSchemas proposal, Datasets, for schema.org. See the Proposals listing for more. Note: active schema.org development is now based on GitHub.




Overview

This page discusses a proposal to extend schema.org with vocabulary for describing datasets and data catalogs. For additional information, see this demo page. Comments on this proposal are welcome.

Status

We have a detailed proposal here, and something in this direction is a strong candidate for official addition to schema.org.

The natural next steps before finalizing the addition are:

  • some indication from potential publishers that they are willing to adopt it, and that the design is a good fit for existing sites and content
  • some positive indication from relevant experts (e.g., open government data publishers, W3C's Government Linked Data group, or others) that the schema has had careful review. Comments can be left here in the wiki, sent to the public-vocabs@w3.org list (see details at the WebSchemas page), or passed along by mail to danbri@google.com.

Vocabulary

The Datasets extension introduces three new types, with associated properties:

  • Thing > CreativeWork > Dataset: a body of structured information describing some topic(s) of interest
    • catalog(DataCatalog): the data catalog which contains a dataset
    • distribution(DataDownload): a downloadable form of this dataset, at a specific location, in a specific format
    • spatial(Place): the range of spatial applicability of a dataset, e.g. for a dataset of New York weather, the state of New York
    • temporal(DateTime): the range of temporal applicability of a dataset, e.g. for a 2011 census dataset, the year 2011 (in ISO 8601 time interval format)
  • Thing > CreativeWork > DataCatalog: a collection of datasets
    • dataset(Dataset): a dataset contained in a catalog
  • Thing > CreativeWork > MediaObject > DataDownload: a dataset in downloadable form
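
The relationships between the three types can be sketched in code. The following is a minimal illustration in Python, not part of the proposal: the class and field names simply mirror the vocabulary terms above, `spatial` is kept as plain text rather than a full Place, and `year_interval` is a hypothetical helper that assumes the start/end form of an ISO 8601 time interval.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DataDownload:
    """A dataset in downloadable form (a kind of MediaObject)."""
    content_url: str
    encoding_format: str


@dataclass
class Dataset:
    """A body of structured information describing some topic(s) of interest."""
    name: str
    spatial: Optional[str] = None   # name of a Place, simplified to text here
    temporal: Optional[str] = None  # ISO 8601 time interval
    distribution: List[DataDownload] = field(default_factory=list)
    # Back-reference to the containing catalog; excluded from repr/eq
    # so the two-way link does not cause recursive comparisons.
    catalog: Optional["DataCatalog"] = field(default=None, repr=False, compare=False)


@dataclass
class DataCatalog:
    """A collection of datasets."""
    name: str
    dataset: List[Dataset] = field(default_factory=list)

    def add(self, item: Dataset) -> None:
        # Keep the 'dataset' and 'catalog' properties pointing at each other.
        self.dataset.append(item)
        item.catalog = self


def year_interval(year: int) -> str:
    """Return an ISO 8601 start/end interval covering one calendar year."""
    return f"{date(year, 1, 1).isoformat()}/{date(year, 12, 31).isoformat()}"


# Usage: a 2011 dataset with one download, listed in a hypothetical catalog.
catalog = DataCatalog(name="Example open data portal")
zones = Dataset(
    name="Seismic Hazard Zones",
    spatial="United States",
    temporal=year_interval(2011),
)
zones.distribution.append(
    DataDownload(
        content_url="http://example.org/downloads/seismic-hazard-zones.nt.gz",
        encoding_format="text/plain",
    )
)
catalog.add(zones)
```

The sketch treats `catalog` and `dataset` as two views of the same containment relationship, which is how the definitions above read; a publisher is free to state only one of them.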

Example

Microdata markup

<div itemscope="itemscope" itemtype="http://schema.org/Dataset" itemid="http://example.org/datasets/seismic-hazard-zones">

    <meta itemprop="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>

    <span itemprop="name">
        <a href="http://www.example.org/story.php?title=seismic-hazard-zones">
            <b>Seismic Hazard Zones</b>
        </a>
    </span>

    (<span itemprop="temporal">2011</span>, version "<span itemprop="version">2011-Sep-13</span>")

    <div itemprop="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>

    <div itemprop="spatial" itemscope="itemscope" itemtype="http://schema.org/Country" itemid="http://dbpedia.org/resource/United_States">
        <i>Country:</i>
        <a href="http://en.wikipedia.org/wiki/United_States">
            <span itemprop="name">United States</span>
        </a>
    </div>

    <div itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization"><i>Publisher:</i>
        <span itemprop="name">Department of Technology</span>
        <span itemprop="email">dot at example dot org</span>
    </div>

    <div><i>Topics:</i>
        <span itemprop="about" itemscope="itemscope" itemtype="http://schema.org/Thing" itemid="http://dbpedia.org/resource/Seismic_hazard">
            <a href="http://en.wikipedia.org/wiki/Seismic_hazard"><span itemprop="name">seismic hazard</span></a>
        </span>
    </div>

    <div><i>Keywords:</i>
        <span itemprop="keywords">layers</span>,
        <span itemprop="keywords">geography</span>,
        <span itemprop="keywords">maps</span>,
        <span itemprop="keywords">gis</span>
    </div>

    <div itemprop="license" itemscope="itemscope" itemtype="http://schema.org/WebPage"><i>License:</i>
        <a href="http://opendatacommons.org/licenses/pddl/1.0/">
            <span itemprop="name">ODC Public Domain Dedication and Licence (PDDL)</span>
        </a>
        <meta itemprop="url" content="http://opendatacommons.org/licenses/pddl/1.0/" />
    </div>

    <div itemprop="distribution" itemscope="itemscope" itemtype="http://schema.org/DataDownload"><i>Download:</i>
        <a href="http://example.org/downloads/seismic-hazard-zones.nt.gz">
            <meta itemprop="encodingFormat" content="text/plain" />
            <meta itemprop="contentUrl" content="http://example.org/downloads/seismic-hazard-zones.nt.gz" />
            <meta itemprop="inLanguage" content="en" />
            <span itemprop="description">compressed N-Triples dump</span>,
            <span itemprop="datePublished" content="2011-08-12">August 12, 2011</span>
            <span itemprop="contentSize" content="13.9">(13.97MB)</span>
        </a>
    </div>

</div>

Embedded data

Item 
type: http://schema.org/dataset	
name: Seismic Hazard Zones
temporal: 2011
version: 2011-Sep-13
url: http://www.example.org/story.php?title=seismic-hazard-zones
description: This is a dataset of liquefaction and landslide zones in the state of California.
spatial: Item 1
publisher: Item 2
about: Item 3
keywords: layers
keywords: geography
keywords: maps
keywords: gis
license: Item 4
distribution: Item 5

Item 1
type: http://schema.org/country
name: United States

Item 2
type: http://schema.org/organization
name: Department of Technology
email: dot at example dot org

Item 3
type: http://schema.org/thing
name: seismic hazard

Item 4
type: http://schema.org/webpage
name: ODC Public Domain Dedication and Licence (PDDL)
url: http://opendatacommons.org/licenses/pddl/1.0/

Item 5
type: http://schema.org/datadownload
encodingformat: text/plain
contenturl: http://example.org/downloads/seismic-hazard-zones.nt.gz
inlanguage: en
description: compressed N-Triples dump
datepublished: 2011-08-12
contentsize: 13.9

Equivalent RDFa 1.1 markup

<div vocab="http://schema.org/" prefix="dcat: http://www.w3.org/ns/dcat#" typeof="Dataset dcat:Dataset"
     about="http://example.org/datasets/seismic-hazard-zones">
 
    <meta property="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>

    <span property="name"><b><a href="http://www.example.org/story.php?title=seismic-hazard-zones">Seismic Hazard Zones</a></b></span>

    (<span property="temporal">2011</span>, version "<span property="version">2011-Sep-13</span>")

    <div property="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>

    <div rel="spatial" resource="http://dbpedia.org/resource/United_States"><i>Country:</i>
        <a href="http://dbpedia.org/resource/United_States">
            <span about="http://dbpedia.org/resource/United_States" typeof="Country">
                <span property="name">United States</span>
            </span>
        </a>
    </div>

    <div rel="publisher"><i>Publisher:</i>
        <span typeof="Organization">
            <span property="name">Department of Technology</span>
            (<span property="email">dot at example dot org</span>)
        </span>
    </div>

    <div><i>Topics:</i>
        <span rel="about" resource="http://dbpedia.org/resource/Seismic_hazard">
            <a href="http://en.wikipedia.org/wiki/Seismic_hazard">
                <span property="name">seismic hazard</span></a>
        </span>
    </div>

    <div><i>Keywords:</i>
        <span property="keywords">layers</span>,
        <span property="keywords">geography</span>,
        <span property="keywords">maps</span>,
        <span property="keywords">gis</span>
    </div>

    <div rel="license"><i>License:</i>
        <a href="http://opendatacommons.org/licenses/pddl/1.0/">
            <span typeof="WebPage">
                <span property="name">ODC Public Domain Dedication and Licence (PDDL)</span>
                <meta property="url" content="http://opendatacommons.org/licenses/pddl/1.0/"/>
            </span>
        </a>
    </div>
    
    <div rel="distribution"><i>Download:</i>
        <a href="http://example.org/downloads/seismic-hazard-zones.nt.gz">
        <span typeof="DataDownload">
            <meta property="encodingFormat" content="text/plain" />
            <meta property="contentUrl" content="http://example.org/downloads/seismic-hazard-zones.nt.gz" />
            <meta property="inLanguage" content="en" />
            <span property="description">compressed N-Triples dump</span>,
            <span property="datePublished" content="2011-08-12">August 12, 2011</span>
            <span property="contentSize" content="13.9">(13.97MB)</span>
        </span>
        </a>
    </div>

</div>

RDFa 1.1 Lite markup

<div vocab="http://schema.org/" prefix="dcat: http://www.w3.org/ns/dcat#" typeof="Dataset dcat:Dataset"
     resource="http://example.org/datasets/seismic-hazard-zones">

    <meta property="url" content="http://www.example.org/story.php?title=seismic-hazard-zones"/>

    <span property="name"><b><a href="http://www.example.org/story.php?title=seismic-hazard-zones">Seismic Hazard Zones</a></b></span>

    (<span property="temporal">2011</span>, version "<span property="version">2011-Sep-13</span>")

    <div property="description">This is a dataset of liquefaction and landslide zones in the state of California.</div>

    <div><i>Keywords:</i>
        <span property="keywords">layers</span>,
        <span property="keywords">geography</span>,
        <span property="keywords">maps</span>,
        <span property="keywords">gis</span>
    </div>

</div>

Related vocabularies

Mappings

This table maps Datasets extension types and properties (including supporting schema.org vocabulary) to and from their approximate equivalents in DCAT and VoID. A further relevant vocabulary, ADMS, is a profile (specialization) of DCAT. Only the terms that ADMS specializes are listed separately; otherwise ADMS uses the DCAT term. Rows with an empty first column are terms with no equivalent in the Datasets extension.

| Datasets extension | DCAT | ADMS | VoID |
|---|---|---|---|
| sdo:DataCatalog | dcat:Catalog | adms:AssetRepository (subclass of dcat:Catalog) | |
| sdo:DataDownload | dcat:Distribution | adms:AssetDistribution (subclass of dcat:Distribution) | |
| sdo:Dataset | dcat:Dataset | adms:Asset (subclass of dcat:Dataset) | void:Dataset |
| sdo:catalog | | | |
| sdo:dataset | dcat:dataset | | |
| sdo:distribution | dcat:distribution | | void:dataDump |
| sdo:spatial | dcterms:spatial | | |
| sdo:temporal | dcterms:temporal | | |
| sdo:about | dcat:theme | | |
| sdo:contentSize | dcat:byteSize | | |
| sdo:contentUrl | dcat:downloadURL | | |
| sdo:dateModified | dcterms:modified | | |
| sdo:datePublished | dcterms:issued | | |
| sdo:description | dcterms:description | | |
| sdo:encodingFormat | dcterms:format | | |
| sdo:inLanguage | dcterms:language | | |
| sdo:keywords | dcat:keyword | | |
| sdo:name | dcterms:title | | |
| sdo:Organization | foaf:Organization | | |
| sdo:Person | foaf:Person | | |
| sdo:publisher | dcterms:publisher | | |
| sdo:Thing | skos:Concept | | |
| sdo:url | foaf:homepage | | |
| sdo:version | owl:versionInfo (also adms:versionNotes) | | |
| | dcat:CatalogRecord | | |
| | dcat:Distribution | | |
| | dcat:byteSize | | |
| | dcat:downloadURL | | |
| | dcat:landingPage | | |
| | dcat:mediaType | | |
| | dcat:record | | |
| | dcat:themeTaxonomy | | |
| | dcterms:accrualPeriodicity | | |
| | dcterms:identifier | | |
| | dcterms:license | | |
| | foaf:primaryTopic | | |
| | skos:ConceptScheme | | |
| | | adms:Identifier | |
| | | adms:identifier | |
| | | adms:includedAsset | |
| | | adms:interoperabilityLevel | |
| | | adms:last | |
| | | adms:next | |
| | | adms:prev | |
| | | adms:representationTechnique | |
| | | adms:schemeAgency | |
| | | adms:status | |
| | | adms:versionNotes | |
| | | | void:class |
| | | | void:classes |
| | | | void:classPartition |
| | | | void:DatasetDescription |
| | | | void:distinctObjects |
| | | | void:distinctSubjects |
| | | | void:documents |
| | | | void:entities |
| | | adms:sample | void:exampleResource |
| | | | void:feature |
| | | | void:inDataset |
| | | | void:linkPredicate |
| | | | void:Linkset |
| | | | void:objectsTarget |
| | | | void:openSearchDescription |
| | | | void:properties |
| | | | void:property |
| | | | void:propertyPartition |
| | | | void:rootResource |
| | | | void:sparqlEndpoint |
| | | | void:subjectsTarget |
| | | | void:subset |
| | | | void:target |
| | | | void:TechnicalFeature |
| | | | void:triples |
| | | | void:uriLookupEndpoint |
| | | | void:uriRegexPattern |
| | | | void:uriSpace |
| | | adms:supportedSchema | void:vocabulary |
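
One way to put the mapping to use is as a lookup that renames properties. The sketch below is an illustration only, written in Python: the dictionary holds a subset of the sdo-to-DCAT pairs listed above, `translate_properties` is a hypothetical helper name, and because the equivalences are approximate the function only renames keys and does not convert values.

```python
from typing import Dict, List, Optional, Tuple

# Approximate equivalents copied from the mapping above.
# None marks a term listed without a DCAT counterpart.
SDO_TO_DCAT: Dict[str, Optional[str]] = {
    "sdo:DataCatalog": "dcat:Catalog",
    "sdo:DataDownload": "dcat:Distribution",
    "sdo:Dataset": "dcat:Dataset",
    "sdo:catalog": None,
    "sdo:dataset": "dcat:dataset",
    "sdo:distribution": "dcat:distribution",
    "sdo:spatial": "dcterms:spatial",
    "sdo:temporal": "dcterms:temporal",
    "sdo:about": "dcat:theme",
    "sdo:contentSize": "dcat:byteSize",
    "sdo:contentUrl": "dcat:downloadURL",
    "sdo:datePublished": "dcterms:issued",
    "sdo:description": "dcterms:description",
    "sdo:encodingFormat": "dcterms:format",
    "sdo:inLanguage": "dcterms:language",
    "sdo:keywords": "dcat:keyword",
    "sdo:name": "dcterms:title",
    "sdo:publisher": "dcterms:publisher",
    "sdo:url": "foaf:homepage",
}


def translate_properties(record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Rename sdo: keys to their listed equivalents.

    Returns the renamed record plus the keys that had no listed equivalent,
    so callers can decide what to do with them instead of losing them silently.
    """
    translated: Dict[str, str] = {}
    unmapped: List[str] = []
    for term, value in record.items():
        target = SDO_TO_DCAT.get(term)
        if target is None:
            unmapped.append(term)
        else:
            translated[target] = value
    return translated, unmapped


# Usage with a few properties from the example dataset.
translated, unmapped = translate_properties({
    "sdo:name": "Seismic Hazard Zones",
    "sdo:temporal": "2011",
    "sdo:catalog": "http://example.org/catalog",
})
```

Returning the unmapped keys separately reflects the gaps in the mapping: sdo:catalog, for example, has no listed DCAT equivalent.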

See also