WebSchemas/Datasets

From W3C Wiki
Revision as of 22:25, 11 June 2012 by Danbri

This is the WebSchemas proposal "Datasets" for schema.org. See the Proposals listing for more. Status: Discussion




Overview

This page discusses a proposal to extend schema.org for describing datasets and data catalogs. For additional information, see this demo page. Comments on this proposal are welcome.

Status

We have a detailed proposal here, and something in this direction is a strong candidate for official addition to schema.org.

The natural next steps before finalizing the addition are:

  • some indication from potential publishers that they are willing to adopt it, and that the design is a good fit for existing sites and content
  • some positive indication from relevant experts (e.g. open government data publishers, W3C's Government Linked Data group, or others) that the schema has had careful review. Comments can be left here in the Wiki, sent to the public-vocabs@w3.org list (see details on the WebSchemas/ page), or passed along by mail to danbri@google.com.

Vocabulary

The Datasets extension introduces three new types, with associated properties:

  • Thing > CreativeWork > Dataset: a body of structured information describing some topic(s) of interest
    • catalog(DataCatalog): the data catalog which contains a dataset
    • distribution(DataDownload): a downloadable form of this dataset, at a specific location, in a specific format
    • keyword(Text): a keyword describing a dataset
    • spatial(Place): the range of spatial applicability of a dataset, e.g. for a dataset of New York weather, the state of New York
  • Thing > CreativeWork > DataCatalog: a collection of datasets
    • dataset(Dataset): a dataset contained in a catalog
  • Thing > CreativeWork > MediaObject > DataDownload: a dataset in downloadable form
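The list above combines a type hierarchy (Thing > CreativeWork > Dataset) with an expected value type for each property. The following is a minimal sketch that checks whether a value of a given type fits a property. It assumes a plain Python dictionary encoding that is not part of the proposal, and all function names are hypothetical. Place, AdministrativeArea and Country are core schema.org types. They are included only so that the example's use of a Country for spatial can be checked.

```python
# Direct supertype of each type. Dataset, DataCatalog and DataDownload are the
# proposed types; Place, AdministrativeArea and Country are core schema.org
# types, included here only so the spatial example can be checked.
SUPERTYPE = {
    "CreativeWork": "Thing",
    "MediaObject": "CreativeWork",
    "Dataset": "CreativeWork",
    "DataCatalog": "CreativeWork",
    "DataDownload": "MediaObject",
    "Place": "Thing",
    "AdministrativeArea": "Place",
    "Country": "AdministrativeArea",
}

# Expected value type of each proposed property, keyed by the declaring type.
# DataDownload introduces no properties of its own in the list above.
PROPERTIES = {
    "Dataset": {
        "catalog": "DataCatalog",
        "distribution": "DataDownload",
        "keyword": "Text",
        "spatial": "Place",
    },
    "DataCatalog": {"dataset": "Dataset"},
}


def ancestors(type_name):
    """Return type_name followed by its supertypes, ending at Thing."""
    chain = [type_name]
    while chain[-1] in SUPERTYPE:
        chain.append(SUPERTYPE[chain[-1]])
    return chain


def expected_type(item_type, prop):
    """Expected value type of prop on item_type, including inherited properties."""
    for t in ancestors(item_type):
        if prop in PROPERTIES.get(t, {}):
            return PROPERTIES[t][prop]
    return None


def check_property(item_type, prop, value_type):
    """True if a value of value_type is acceptable for prop on item_type."""
    expected = expected_type(item_type, prop)
    return expected is not None and expected in ancestors(value_type)
```

In schema.org, Country is a subtype of Place (via AdministrativeArea). As a result, `check_property("Dataset", "spatial", "Country")` is true, which is what lets the example below use a Country as the spatial value.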

Example

Microdata markup

<div itemscope="itemscope" itemtype="http://schema.org/Dataset">
    <a href="http://www.datasf.org/story.php?title=seismic-hazard-zones-"><span itemprop="name">
        <b>Seismic Hazard Zones</b>
    </span></a>

    <div><meta itemprop="url" content="http://www.datasf.org/story.php?title=seismic-hazard-zones-"/>
    <span itemprop="description">The dataset represents the Liquefaction and Landslide Zones as determined by the California Dept. of Conservation Division of Mines and Geology. Liquefaction is the transformation of a confined layer of sandy or silty water-saturated material into a liquid-like state because of earthquake shaking. San Francisco Building Code Section 1804.5 requires a geotechnical investigation in seismic hazard zones.</span></div>

    <div><i>Country:</i>
    <a href="http://dbpedia.org/resource/United_States"><span itemprop="spatial" itemscope="itemscope" itemtype="http://schema.org/Country">
            <span itemprop="name">United States</span>
        </span>
    </a></div>

    <div><i>Publisher:</i>
    <span itemprop="publisher" itemscope="itemscope" itemtype="http://schema.org/Organization">
            <span itemprop="name">Department of Technology</span>
        </span>
    </div>

    <i>Categories:</i>
    <span itemprop="keyword">layers</span>,
    <span itemprop="keyword">geography</span>,
    <span itemprop="keyword">maps</span>,
    <span itemprop="keyword">gis</span>
</div>
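The embedded data shown next is the kind of view a microdata parser produces for this markup. Below is a minimal sketch of such an extractor built on Python's standard html.parser module. It implements only the subset of the microdata rules that this example needs:

- itemscope and itemtype start an item.
- meta takes its value from content.
- a, area and link take their value from href.
- Other elements take their value from their text, with whitespace normalized.

It ignores itemref, itemid and multi-token itemprop values, so it is not a conforming microdata parser. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they are not kept on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}
# Elements whose property value is a URL attribute rather than their text.
URL_ATTRIBUTES = {"a": "href", "area": "href", "link": "href"}


class Item:
    """A microdata item: its itemtype and (name, value) pairs in document order."""

    def __init__(self, itemtype):
        self.itemtype = itemtype
        self.props = []

    def values(self, name):
        return [value for prop, value in self.props if prop == name]


class MicrodataParser(HTMLParser):
    """Simplified extractor: no itemref, itemid or multi-token itemprop."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items
        self.frames = []  # open elements as (tag, opened_item, text_capture)
        self.scopes = []  # open items, innermost last

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        owner = self.scopes[-1] if self.scopes else None
        opened_item, capture = False, None

        if "itemscope" in attrs:
            item = Item(attrs.get("itemtype"))
            if prop and owner is not None:
                owner.props.append((prop, item))  # nested item as a value
            else:
                self.items.append(item)           # top-level item
            self.scopes.append(item)
            opened_item = True
        elif prop and owner is not None:
            if tag == "meta":
                owner.props.append((prop, attrs.get("content", "")))
            elif tag in URL_ATTRIBUTES:
                owner.props.append((prop, attrs.get(URL_ATTRIBUTES[tag], "")))
            else:
                # Reserve the slot now to keep document order; the text is
                # filled in when this element closes.
                owner.props.append((prop, None))
                capture = (owner, len(owner.props) - 1, prop, [])

        if tag in VOID_ELEMENTS:
            if opened_item:
                self.scopes.pop()  # a void element cannot hold properties
            return
        self.frames.append((tag, opened_item, capture))

    def handle_data(self, data):
        # Text belongs to every enclosing text-valued property.
        for _, _, capture in self.frames:
            if capture is not None:
                capture[3].append(data)

    def handle_endtag(self, tag):
        if not any(frame[0] == tag for frame in self.frames):
            return  # stray end tag, or the implicit one after <meta ... />
        while self.frames:
            open_tag, opened_item, capture = self.frames.pop()
            if capture is not None:
                owner, index, prop, chunks = capture
                owner.props[index] = (prop, " ".join("".join(chunks).split()))
            if opened_item:
                self.scopes.pop()
            if open_tag == tag:
                break


def extract_items(html):
    """Return the top-level microdata items found in an HTML string."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

Applied to the markup above, `extract_items(html)[0].values("keyword")` should return the keyword strings in document order. In this sketch, as in the microdata model, an element with itemscope but no itemprop becomes a separate top-level item rather than a property value.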

Embedded data

Item
Type: http://schema.org/Dataset
name = Seismic Hazard Zones
url = http://www.datasf.org/story.php?title=seismic-hazard-zones-
description = The dataset represents the Liquefaction and Landslide Zones as determined by the California Dept. of Conservation Division of Mines and Geology. Liquefaction is the transformation of a confined...
spatial = Item( 1 )
publisher = Item( 2 )
keyword = layers
keyword = geography
keyword = maps
keyword = gis

Item 1
Type: http://schema.org/Country
name = United States

Item 2
Type: http://schema.org/Organization
name = Department of Technology

Equivalent RDFa markup

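<div about="http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/datasf.org/version/2011-Jun-07/thing_89" typeof="dcat:Dataset" prefix="dcat: http://www.w3.org/ns/dcat# dcterms: http://purl.org/dc/terms/ adms: http://www.w3.org/ns/adms# foaf: http://xmlns.com/foaf/0.1/">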
    <div><b><a href="http://www.datasf.org/story.php?title=seismic-hazard-zones-"><span about="http://logd.tw.rpi.edu/source/datasf-org/dataset/catalog/datasf.org/version/2011-Jun-07/thing_89"><span property="dcterms:title">Seismic Hazard Zones</span></span></a></b></div>
    <div property="dcterms:description">
The dataset represents the Liquefaction and Landslide Zones as determined by the California Dept. of Conservation Division of Mines and Geology. Liquefaction is the transformation of a confined layer of sandy or silty water-saturated material into a liquid-like state because of earthquake shaking. San Francisco Building Code Section 1804.5 requires a geotechnical investigation in seismic hazard zones.
    </div>
    <div rel="dcterms:spatial" resource="http://dbpedia.org/resource/United_States"><i>Country:</i>
        <a href="http://dbpedia.org/resource/United_States">
            <span about="http://dbpedia.org/resource/United_States" typeof="adms:Country">
                <span property="dcterms:title">United States</span>
            </span>
        </a>
    </div>
    <div rel="dcterms:publisher"><i>Publisher:</i>
        <span typeof="foaf:Organization">
            <span property="dcterms:title">Department of Technology</span>
        </span>
    </div>
    <i>Categories:</i>
    <span property="dcat:keyword">layers</span>,
    <span property="dcat:keyword">geography</span>,
    <span property="dcat:keyword">maps</span>,
    <span property="dcat:keyword">gis</span>
</div>
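The RDFa version writes terms as CURIEs such as dcat:keyword. An RDFa processor expands each one into a full IRI using prefix mappings. RDFa 1.1 predefines some common prefixes, including dcterms and foaf, but not adms, so a standalone page has to declare that prefix with a prefix attribute. The sketch below shows the expansion and how to build such an attribute value. The helper names are hypothetical; the namespace IRIs are the published ones for DCAT, Dublin Core terms, FOAF and ADMS.

```python
# Namespace IRIs for the prefixes used in the RDFa example.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "adms": "http://www.w3.org/ns/adms#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:reference' to a full IRI; unknown prefixes raise KeyError."""
    prefix, sep, reference = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return prefixes[prefix] + reference


def prefix_attribute(prefixes=PREFIXES):
    """Build an RDFa 1.1 prefix attribute value declaring every prefix in the map."""
    return " ".join(f"{p}: {iri}" for p, iri in sorted(prefixes.items()))
```

Declaring prefixes explicitly, even those RDFa 1.1 already predefines, keeps the snippet's meaning independent of which initial context a given processor ships.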

Related vocabularies

Mappings

This table maps Datasets extension types and properties (prefixed ds:), together with supporting schema.org vocabulary (prefixed sdo:), to and from their approximate equivalents in DCAT, ADMS, and VoID.

| Datasets extension | DCAT | ADMS | VoID |
|---|---|---|---|
| ds:DataCatalog | dcat:Catalog | adms:Repository | |
| ds:DataDownload | dcat:Download | adms:Release | |
| ds:Dataset | dcat:Dataset | adms:Asset | void:Dataset |
| ds:catalog | | | |
| ds:dataset | dcat:dataset | adms:asset | |
| ds:distribution | dcat:distribution | adms:release | void:dataDump |
| ds:keyword | dcat:keyword | | |
| ds:license | dcterms:license | | |
| ds:spatial | dcterms:spatial | | |
| sdo:about | dcat:theme | adms:domain | |
| sdo:contentSize | dcat:size | | |
| sdo:contentURL | dcat:accessURL | | |
| sdo:copyrightHolder | | adms:owner | |
| sdo:Country | | adms:Country | |
| sdo:dateModified | dcterms:modified | | |
| sdo:datePublished | dcterms:issued | | |
| sdo:description | dcterms:description | | |
| sdo:encodingFormat | dcterms:format | adms:model | |
| sdo:inLanguage | dcterms:language | | |
| sdo:name | dcterms:title | | |
| sdo:Organization | foaf:Organization | | |
| sdo:Person | foaf:Person | | |
| sdo:publisher | dcterms:publisher | | |
| sdo:Thing | skos:Concept | adms:Domain | |
| sdo:url | foaf:homepage | | |
| sdo:version | | adms:releaseReference | |
| | dcat:CatalogRecord | | |
| | dcat:dataDictionary | | |
| | dcat:dataQuality | | |
| | dcat:Distribution | | |
| | dcat:Feed | | |
| | dcat:granularity | | |
| | dcat:record | | |
| | dcat:themeTaxonomy | | |
| | dcat:WebService | | |
| | dcterms:accrualPeriodicity | | |
| | dcterms:identifier | | |
| | dcterms:references | | |
| | dcterms:temporal | | |
| | foaf:primaryTopic | | |
| | skos:Concept | | |
| | skos:ConceptScheme | | |
| | | adms:basedOn | |
| | | adms:containedIn | |
| | | adms:contains | |
| | | adms:documentationLang | |
| | | adms:extSource | |
| | | adms:followUpTo | |
| | | adms:License | |
| | | adms:Model | |
| | | adms:publicationState | |
| | | adms:relatedProject | |
| | | adms:relatedTo | |
| | | adms:Semantic | |
| | | adms:semantic | |
| | | adms:Syntax | |
| | | adms:syntax | |
| | | adms:variantOf | |
| | | | void:class |
| | | | void:classes |
| | | | void:classPartition |
| | | | void:DatasetDescription |
| | | | void:distinctObjects |
| | | | void:distinctSubjects |
| | | | void:documents |
| | | | void:entities |
| | | | void:exampleResource |
| | | | void:feature |
| | | | void:inDataset |
| | | | void:linkPredicate |
| | | | void:Linkset |
| | | | void:objectsTarget |
| | | | void:openSearchDescription |
| | | | void:properties |
| | | | void:property |
| | | | void:propertyPartition |
| | | | void:rootResource |
| | | | void:sparqlEndpoint |
| | | | void:subjectsTarget |
| | | | void:subset |
| | | | void:target |
| | | | void:TechnicalFeature |
| | | | void:triples |
| | | | void:uriLookupEndpoint |
| | | | void:uriRegexPattern |
| | | | void:uriSpace |
| | | | void:vocabulary |
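The mapping can be applied mechanically to rename terms. This is roughly how the microdata example relates to its RDFa counterpart, which uses dcterms:title for the name, dcat:keyword for keywords, and so on. The sketch below assumes a hypothetical nested-dictionary encoding in which a "type" key holds the item type and every other key is a property. Only the rows needed for the Seismic Hazard Zones example are included. Terms without an equivalent, such as ds:catalog, are reported rather than guessed. The names TO_DCAT and translate are illustrative only.

```python
# Rows of the mapping table needed for the Seismic Hazard Zones example,
# mapped to the DCAT-side terms that its RDFa version uses.
TO_DCAT = {
    "ds:Dataset": "dcat:Dataset",
    "ds:keyword": "dcat:keyword",
    "ds:spatial": "dcterms:spatial",
    "sdo:Country": "adms:Country",
    "sdo:description": "dcterms:description",
    "sdo:name": "dcterms:title",
    "sdo:Organization": "foaf:Organization",
    "sdo:publisher": "dcterms:publisher",
}


def translate(node, mapping=TO_DCAT, unmapped=None):
    """Rename types and properties in a nested description.

    A description is a dict whose "type" key holds the item type and whose other
    keys are properties; values are literals, descriptions, or lists of either.
    Returns (translated copy, set of terms that had no mapping).
    """
    if unmapped is None:
        unmapped = set()
    if isinstance(node, list):
        return [translate(v, mapping, unmapped)[0] for v in node], unmapped
    if not isinstance(node, dict):
        return node, unmapped  # literal values are copied unchanged
    out = {}
    for key, value in node.items():
        if key == "type":
            term = value
            out["type"] = mapping.get(value, value)
        else:
            term = key
            out[mapping.get(key, key)] = translate(value, mapping, unmapped)[0]
        if term not in mapping:
            unmapped.add(term)  # kept under its original name, not guessed
    return out, unmapped
```

Reporting unmapped terms instead of dropping them matters here because several rows of the table, such as ds:catalog, have no counterpart in any of the three vocabularies.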