W3C

Data Catalog Vocabulary (DCAT)

W3C Working Draft 12 March 2013

This version:
http://www.w3.org/TR/2013/WD-vocab-dcat-20130312/
Latest published version:
http://www.w3.org/TR/vocab-dcat/
Latest editor's draft:
http://dvcs.w3.org/hg/gld/raw-file/default/dcat/index.html
Previous version:
http://www.w3.org/TR/2012/WD-vocab-dcat-20120405/
Editors:
Fadi Maali, DERI, NUI Galway
John Erickson, Tetherless World Constellation (RPI)
Phil Archer, W3C/ERCIM

Abstract

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.

By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications to consume metadata from multiple catalogs easily. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

The original DCAT vocabulary was developed at DERI; it was further developed by the eGov Interest Group before being brought onto the Recommendation Track by the Government Linked Data (GLD) Working Group.

This document was published by the Government Linked Data Working Group as a Last Call Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-gld-comments@w3.org (subscribe, archives). The Last Call period ends 08 April 2013. All comments are welcome.

Publication as a Last Call Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This is a Last Call Working Draft and thus the Working Group has determined that this document has satisfied the relevant technical requirements and is sufficiently stable to advance through the Technical Recommendation process.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

This section is non-normative.

Data can come in many formats, ranging from spreadsheets through XML and RDF to various specialty formats. DCAT does not make any assumptions about the format of the datasets described in a catalog. Other, complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary [VOID] can be used to express various statistics about a DCAT-described dataset if that dataset is in RDF format.
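
As a sketch of the VoID case mentioned above, a dataset in RDF format could be typed as both a dcat:Dataset and a void:Dataset, with void:triples stating how many triples it contains. The resource :rdf-dataset, the example.org namespace bound to the empty prefix, and the triple count are all made up for illustration.

   @prefix dcat: <http://www.w3.org/ns/dcat#> .
   @prefix void: <http://rdfs.org/ns/void#> .
   @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
   # Assumed namespace for the example resource
   @prefix :     <http://example.org/> .

   :rdf-dataset
       a dcat:Dataset , void:Dataset ;
       # Illustrative statistic expressed with VoID
       void:triples "12000"^^xsd:integer ;
       .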

This document does not prescribe any particular method of deploying data expressed in DCAT. DCAT is applicable in many contexts including RDF accessible via SPARQL endpoints, embedded in HTML pages as RDFa, or serialized as e.g. RDF/XML or Turtle. The examples in this document use Turtle simply because of Turtle's readability.

2. Namespaces

The namespace for DCAT is http://www.w3.org/ns/dcat#. However, it should be noted that DCAT makes extensive use of terms from other vocabularies, in particular Dublin Core. DCAT itself defines a minimal set of classes and properties of its own. A full set of namespaces and prefixes used in this document is shown in the table below.

Prefix   Namespace
dcat     http://www.w3.org/ns/dcat#
dct      http://purl.org/dc/terms/
dctype   http://purl.org/dc/dcmitype/
foaf     http://xmlns.com/foaf/0.1/
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs     http://www.w3.org/2000/01/rdf-schema#
skos     http://www.w3.org/2004/02/skos/core#
xsd      http://www.w3.org/2001/XMLSchema#
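
For convenience, the prefixes in the table above can be written as Turtle prefix declarations. The last declaration is an assumption: the examples in this document use names such as :catalog without stating which namespace the empty prefix is bound to, so http://example.org/ is used here purely for illustration.

   @prefix dcat:   <http://www.w3.org/ns/dcat#> .
   @prefix dct:    <http://purl.org/dc/terms/> .
   @prefix dctype: <http://purl.org/dc/dcmitype/> .
   @prefix foaf:   <http://xmlns.com/foaf/0.1/> .
   @prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
   @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
   @prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
   @prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
   # Assumed namespace for example resources (:catalog, :dataset-001, ...)
   @prefix :       <http://example.org/> .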

3. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].

A data catalog conforms to DCAT if:

A DCAT profile is a specification for data catalogs that adds additional constraints to DCAT. A data catalog that conforms to the profile also conforms to DCAT. Additional constraints in a profile may include:

4. Vocabulary Overview

This section is non-normative.

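DCAT is an RDF vocabulary well-suited to representing government data catalogs such as Data.gov and data.gov.uk. DCAT defines three main classes:

  • dcat:Catalog represents the catalog.
  • dcat:Dataset represents a dataset in a catalog.
  • dcat:Distribution represents a specific available form of a dataset, such as a downloadable file, an API or an RSS feed.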

Another important class in DCAT is dcat:CatalogRecord which describes a dataset entry in the catalog. Notice that while dcat:Dataset represents the dataset itself, dcat:CatalogRecord represents the record that describes a dataset in the catalog. The use of the CatalogRecord is considered optional. It is used to capture provenance information about dataset entries in a catalog. If this distinction is not necessary then CatalogRecord can be safely ignored.

UML model of DCAT classes and properties

All RDF examples in this document are written in Turtle syntax [TURTLE-TR].

4.1 Basic Example

This section is non-normative.

This example provides a quick overview of how DCAT might be used to represent a government catalog and its datasets.

First, the catalog description:

   :catalog
       a dcat:Catalog ;
       dct:title "Imaginary Catalog" ;
       rdfs:label "Imaginary Catalog" ;
       foaf:homepage <http://example.org/catalog> ;
       dct:publisher :transparency-office ;
       dct:language <http://id.loc.gov/vocabulary/iso639-1/en>  ;
       dcat:dataset :dataset-001  , :dataset-002 , :dataset-003 ; 
       .

The publisher of the catalog has the relative URI :transparency-office. Further description of the publisher can be provided as in the following example:

   :transparency-office
       a foaf:Organization ;
       rdfs:label "Transparency Office" ;
       .

The catalog lists each of its datasets via the dcat:dataset property. In the example above, an example dataset was mentioned with the relative URI :dataset-001. A possible description of it using DCAT is shown below:

   :dataset-001
       a dcat:Dataset ;
       dct:title "Imaginary dataset" ;
       dcat:keyword "accountability" , "transparency" , "payments" ;
       dct:issued "2011-12-05"^^xsd:date ;
       dct:modified "2011-12-05"^^xsd:date ;
       dct:publisher :finance-ministry ;
       dct:language <http://id.loc.gov/vocabulary/iso639-1/en>  ;
       dcat:distribution :dataset-001-csv ;
       .

The dataset distribution :dataset-001-csv can be downloaded as a 5 KB CSV file. This information is represented via an RDF resource of type dcat:Distribution.

   :dataset-001-csv
       a dcat:Distribution ;
       dcat:downloadURL <http://www.example.org/files/001.csv> ;
       dct:title "CSV distribution of imaginary dataset 001" ;
       dcat:mediaType "text/csv" ;
       dcat:byteSize "5120"^^xsd:decimal ;
       .

4.2 Classifying datasets

The catalog classifies its datasets according to a set of domains represented by the relative URI :themes. SKOS can be used to describe the domains used:

   :catalog dcat:themeTaxonomy :themes .
   :themes
       a skos:ConceptScheme ;
       skos:prefLabel "A set of domains to classify documents" ;
       .
   :dataset-001 dcat:theme :accountability  .

Notice that this dataset is classified under the domain represented by the relative URI :accountability. This concept should be part of the concept scheme identified by the URI :themes, which was used to describe the catalog domains. An example SKOS description:

   :accountability 
       a skos:Concept ;
       skos:inScheme :themes ;
       skos:prefLabel "Accountability" ;
       .

4.3 Describing catalog records metadata

If the catalog publisher decides to keep metadata describing its records (i.e. the records containing metadata describing the datasets), dcat:CatalogRecord can be used. For example, while :dataset-001 was issued on 2011-12-05, its description in Imaginary Catalog was added on 2011-12-11. This can be represented in DCAT as follows:

   :catalog  dcat:record :record-001  .
   :record-001
       a dcat:CatalogRecord ;
       foaf:primaryTopic :dataset-001 ;
       dct:issued "2011-12-11"^^xsd:date ;
       .  

4.4 A dataset available only behind some Web page

:dataset-002 is available as a CSV file. However, :dataset-002 can only be obtained through a Web page where the user needs to follow some links, provide some information and check some boxes before accessing the data:

   :dataset-002 
       a dcat:Dataset ;
       dcat:landingPage <http://example.org/dataset-002.html> ;
       dcat:distribution :dataset-002-csv ;
       .
   :dataset-002-csv 
       a dcat:Distribution ;
       dcat:accessURL <http://example.org/dataset-002.html> ;
       dcat:mediaType "text/csv" ;
       .
Notice the use of dcat:landingPage and the definition of the dcat:Distribution instance.

4.5 A dataset available as download and behind some Web page

On the other hand, :dataset-003 can be obtained through some landing page but also can be downloaded from a known URL.

   :dataset-003 
       a dcat:Dataset ;
       dcat:landingPage <http://example.org/dataset-003.html> ;
       dcat:distribution :dataset-003-csv ;
       .
   :dataset-003-csv 
       a dcat:Distribution ;
       dcat:downloadURL <http://example.org/dataset-003.csv> ;
       dcat:mediaType "text/csv" ;
       .
Notice that we used dcat:downloadURL with the downloadable distribution, and that the other distribution, through the landing page, does not have to be defined as a separate dcat:Distribution instance.

5. Vocabulary specification

5.1 Class: Catalog

A data catalog is a curated collection of metadata about datasets.

RDF class:dcat:Catalog
Usage note:Typically, a web-based data catalog is represented as a single instance of this class.
See also: Catalog record, Dataset

Property: title

A name given to the catalog.

RDF Property:dct:title
Range:rdfs:Literal

Property: description

Free-text account of the catalog.

RDF Property:dct:description
Range:rdfs:Literal

Property: release date

Date of formal issuance (e.g., publication) of the catalog.

RDF Property:dct:issued
Range:rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month are not known, then 01 should be specified.
See also: dataset release date, catalog record listing date and distribution release date

Property: update/modification date

Most recent date on which the catalog was changed, updated or modified.

RDF Property:dct:modified
Range:rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month are not known, then 01 should be specified.
See also: dataset modification date, catalog record modification date and distribution modification date
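
A minimal sketch of the date encoding described for the release and modification dates above. It assumes a catalog known to have been first published in March 2010 on an unknown day, and last changed on 5 December 2011; both dates and the example.org namespace are illustrative.

   @prefix dct: <http://purl.org/dc/terms/> .
   @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
   # Assumed namespace for the example resource
   @prefix :    <http://example.org/> .

   :catalog
       # Day of publication unknown, so 01 is specified
       dct:issued "2010-03-01"^^xsd:date ;
       dct:modified "2011-12-05"^^xsd:date ;
       .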

Property: language

The language of the catalog. This refers to the language used in the textual metadata describing titles, descriptions, etc. of the datasets in the catalog.

RDF Property:dct:language
Range: rdfs:Resource
Resources defined by the Library of Congress (1, 2) should be used.
If an ISO 639-1 (two-letter) code is defined for the language, then its corresponding IRI should be used; if no ISO 639-1 code is defined, then the IRI corresponding to the ISO 639-2 (three-letter) code should be used.
Usage note:Multiple values can be used. The publisher might also choose to describe the language on the dataset level (see dataset language).
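
The following sketch illustrates the rule above for a catalog whose metadata is in two languages. English has the ISO 639-1 code en, so its two-letter IRI is used; Hawaiian has no ISO 639-1 code, so the IRI for its ISO 639-2 code haw is used instead. The ISO 639-2 IRI is assumed to follow the same pattern as the ISO 639-1 IRI used elsewhere in this document, and the example.org namespace is illustrative.

   @prefix dct: <http://purl.org/dc/terms/> .
   # Assumed namespace for the example resource
   @prefix :    <http://example.org/> .

   :catalog
       dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ,    # two-letter code exists
                    <http://id.loc.gov/vocabulary/iso639-2/haw> ;   # no two-letter code
       .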

Property: homepage

The homepage of the catalog.

RDF Property:foaf:homepage
Range:foaf:Document
Usage note:foaf:homepage is an inverse functional property (IFP) which means that it should be unique and precisely identify the catalog. This allows smushing various descriptions of the catalog when different URIs are used.
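
To illustrate the usage note above, the sketch below shows two descriptions of what is intended to be the same catalog, published under different URIs. Both URIs and the aggregator site are hypothetical. Because both descriptions share the same foaf:homepage value, a consumer that applies inverse functional property reasoning may conclude that the two URIs identify the same catalog and merge the descriptions.

   @prefix dcat: <http://www.w3.org/ns/dcat#> .
   @prefix dct:  <http://purl.org/dc/terms/> .
   @prefix foaf: <http://xmlns.com/foaf/0.1/> .

   # Description published by the catalog itself (hypothetical URI)
   <http://example.org/catalog#this>
       a dcat:Catalog ;
       dct:title "Imaginary Catalog" ;
       foaf:homepage <http://example.org/catalog> ;
       .

   # Description published by a third-party aggregator (hypothetical URI)
   <http://aggregator.example.net/id/catalog-17>
       a dcat:Catalog ;
       foaf:homepage <http://example.org/catalog> ;
       .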

Property: publisher

The entity responsible for making the catalog available online.

RDF Property:dct:publisher
Usage note:Resources of type foaf:Agent are recommended as values for this property.
See also:Class: Organization/Person

Property: spatial/geographic

The geographical area covered by the catalog.

RDF Property:dct:spatial
Range:dct:Location

Property: themes

The knowledge organization system (KOS) used to classify the catalog's datasets.

RDF Property:dcat:themeTaxonomy
Range:skos:ConceptScheme

Property: license

This describes the license under which the catalog can be used/reused and not the datasets. Even if the license of the catalog applies to all of its datasets and distributions, it should be replicated on each distribution.

RDF Property:dct:license
Range:dct:LicenseDocument
See also:distribution license
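
A sketch of the replication described above: the catalog states its own license, and the same license is repeated on the distribution rather than being left implicit. The license IRI, the dataset and distribution names, and the example.org namespace are hypothetical.

   @prefix dcat: <http://www.w3.org/ns/dcat#> .
   @prefix dct:  <http://purl.org/dc/terms/> .
   # Assumed namespace for the example resources
   @prefix :     <http://example.org/> .

   :catalog
       a dcat:Catalog ;
       # License of the catalog itself
       dct:license <http://example.org/licenses/open> ;
       dcat:dataset :dataset-001 ;
       .

   :dataset-001
       a dcat:Dataset ;
       dcat:distribution :dataset-001-csv ;
       .

   :dataset-001-csv
       a dcat:Distribution ;
       # Same license, stated explicitly on the distribution
       dct:license <http://example.org/licenses/open> ;
       .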

Property: dataset

A dataset that is part of the catalog.

RDF Property:dcat:dataset
Domain:dcat:Catalog
Range:dcat:Dataset

Property: catalog record

A catalog record that is part of the catalog.

RDF Property:dcat:record
Domain:dcat:Catalog
Range:dcat:CatalogRecord

5.2 Class: Catalog record

A record in a data catalog, describing a single dataset.

RDF Class:dcat:CatalogRecord
Usage note:This class is optional and not all catalogs will use it. It exists for catalogs where a distinction is made between metadata about a dataset and metadata about the dataset's entry in the catalog. For example, the publication date property of the dataset reflects the date when the information was originally made available by the publishing agency, while the publication date of the catalog record is the date when the dataset was added to the catalog. In cases where the two dates differ, or where only the latter is known, the publication date should only be specified for the catalog record.
See also:Dataset

If a catalog is represented as an RDF Dataset with named graphs (as defined in [SPARQL-QUERY-11]), then it is appropriate to place the description of each dataset (consisting of all RDF triples that mention the dcat:Dataset, dcat:CatalogRecord, and any of its dcat:Distributions) into a separate named graph. The name of that graph should be the IRI of the catalog record.

Property: title

A name given to the record.

RDF Property:dct:title
Range:rdfs:Literal

Property: description

Free-text account of the record.

RDF Property:dct:description
Range:rdfs:Literal

Property: listing date

The date of listing the corresponding dataset in the catalog.

RDF Property:dct:issued
Range:rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month are not known, then 01 should be specified.
Usage note:This indicates the date of listing the dataset in the catalog and not the publication date of the dataset itself.
See also: dataset release date
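
The sketch below shows one way to describe a dataset for which only the listing date is known. Following the usage note for the catalog record class, dct:issued appears on the record and is left off the dataset. The names :record-004 and :dataset-004, the date, and the example.org namespace are hypothetical.

   @prefix dcat: <http://www.w3.org/ns/dcat#> .
   @prefix dct:  <http://purl.org/dc/terms/> .
   @prefix foaf: <http://xmlns.com/foaf/0.1/> .
   @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
   # Assumed namespace for the example resources
   @prefix :     <http://example.org/> .

   :catalog dcat:record :record-004 .

   :record-004
       a dcat:CatalogRecord ;
       foaf:primaryTopic :dataset-004 ;
       # Date the dataset was listed in the catalog
       dct:issued "2012-01-15"^^xsd:date ;
       .

   # No dct:issued here: the dataset's own release date is not known
   :dataset-004
       a dcat:Dataset ;
       dct:title "Imaginary dataset 004" ;
       .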

Property: update/modification date

Most recent date on which the catalog entry was changed, updated or modified.

RDF Property:dct:modified
Range:rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month are not known, then 01 should be specified.
Usage note:This indicates the date of last change of a catalog entry, i.e. the catalog metadata description of the dataset, and not the date of the dataset itself.
See also: dataset modification date

Property: primary topic

Links the catalog record to the dcat:Dataset resource described in the record.

RDF Property:foaf:primaryTopic
Usage note:The foaf:primaryTopic property is functional: each catalog record can have at most one primary topic, i.e. it describes one dataset.

5.3 Class: Dataset

A collection of data, published or curated by a single source, and available for access or download in one or more formats.

RDF Class:dcat:Dataset
Sub class of:dctype:Dataset
Usage note:This class represents the actual dataset as published by the dataset publisher. In cases where a distinction between the actual dataset and its entry in the catalog is necessary (because metadata such as modification date and maintainer might differ), the catalog record class can be used for the latter.
See also:Catalog record

Property: title

A name given to the dataset.

RDF Property:dct:title
Range:rdfs:Literal

Property: description

Free-text account of the dataset.

RDF Property:dct:description
Range:rdfs:Literal

Property: release date

Date of formal issuance (e.g., publication) of the dataset.

RDF Property:dct:issued
Range:rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month are not known, then 01 should be specified.
Usage note:This property should be set using the first known date of issuance.

Property: update/modification date

Most recent date on which the dataset was changed, updated or modified.

RDF Property:dct:modified
Range:rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month are not known, then 01 should be specified.
Usage note:The value of this property indicates a change to the actual dataset, not a change to the catalog record. An absent value may indicate that the dataset has never changed after its initial publication, or that the date of last modification is not known, or that the dataset is continuously updated.
See also:frequency

Property: language

The language of the dataset.

RDF Property:dct:language
Range:rdfs:Resource
Resources defined by the Library of Congress (1, 2) should be used.
If an ISO 639-1 (two-letter) code is defined for the language, then its corresponding IRI should be used; if no ISO 639-1 code is defined, then the IRI corresponding to the ISO 639-2 (three-letter) code should be used.
Usage note:This overrides the value of the catalog language in case of conflict.

Property: publisher

An entity responsible for making the dataset available.

RDF Property:dct:publisher
Usage note:Resources of type foaf:Agent are recommended as values for this property.
See also:Class: Organization/Person

Property: frequency

The frequency at which the dataset is published.

RDF Property:dct:accrualPeriodicity
Range:dct:Frequency (A rate at which something recurs)
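
As an illustration, a dataset published monthly might be described as follows. This document does not prescribe a set of frequency values, so the :monthly resource and its label are hypothetical, as is the example.org namespace.

   @prefix dct:  <http://purl.org/dc/terms/> .
   @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
   # Assumed namespace for the example resources
   @prefix :     <http://example.org/> .

   :dataset-001
       dct:accrualPeriodicity :monthly ;
       .

   # Hypothetical frequency resource
   :monthly
       a dct:Frequency ;
       rdfs:label "Monthly" ;
       .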

Property: identifier

A unique identifier of the dataset.

RDF Property:dct:identifier
Range:rdfs:Literal
Usage note:The identifier might be used as part of the URI of the dataset, but representing it explicitly is still useful.

Property: spatial/geographical coverage

Spatial coverage of the dataset.

RDF Property:dct:spatial
Range:dct:Location (A spatial region or named place)

Property: temporal coverage

The temporal period that the dataset covers.

RDF Property:dct:temporal
Range:dct:PeriodOfTime (An interval of time that is named or defined by its start and end dates)
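
The spatial and temporal coverage properties above might be used as in the following sketch. The location and period resources are hypothetical and are described only by a label here, since this document does not specify how places or start and end dates should be expressed.

   @prefix dct:  <http://purl.org/dc/terms/> .
   @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
   # Assumed namespace for the example resources
   @prefix :     <http://example.org/> .

   :dataset-001
       dct:spatial :city-of-example ;
       dct:temporal :year-2011 ;
       .

   # Hypothetical place covered by the dataset
   :city-of-example
       a dct:Location ;
       rdfs:label "City of Example" ;
       .

   # Hypothetical period covered by the dataset
   :year-2011
       a dct:PeriodOfTime ;
       rdfs:label "Calendar year 2011" ;
       .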

Property: theme/category

The main category of the dataset. A dataset can have multiple themes.

RDF Property:dcat:theme
Sub property of:dct:subject
Range:skos:Concept
Usage note:The set of skos:Concepts used to categorize the datasets are organized in a skos:ConceptScheme describing all the categories and their relations in the catalog.
See also:catalog themes taxonomy

Property: keyword/tag

A keyword or tag describing the dataset.

RDF Property:dcat:keyword
Range:rdfs:Literal

Property: dataset distribution

Connects a dataset to its available distributions.

RDF Property:dcat:distribution
Domain:dcat:Dataset
Range:dcat:Distribution

Property: landing page

A Web page that can be navigated to in a Web browser to gain access to the dataset, its distributions and/or additional information.

RDF Property:dcat:landingPage
Sub property of:foaf:page
Domain:dcat:Dataset
Range:foaf:Document
Usage note: If the distribution(s) are accessible only through a landing page (i.e. direct download URLs are not known), then the landing page link should be duplicated as accessURL on a distribution. (see example 4.4)

5.4 Class: Distribution

Represents a specific available form of a dataset. Each dataset might be available in different forms; these forms might represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API or an RSS feed.

RDF class:dcat:Distribution
Usage note:This represents a general availability of a dataset; it implies no information about the actual access method of the data, i.e. whether it is a direct download, an API, or access through some Web page. The use of the dcat:downloadURL property indicates directly downloadable distributions.

Property: title

A name given to the distribution.

RDF Property:dct:title
Range:rdfs:Literal

Property: description

Free-text account of the distribution.

RDF Property:dct:description
Range:rdfs:Literal

Property: release date

Date of formal issuance (e.g., publication) of the distribution.

RDF Property:dct:issued
Range:rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month are not known, then 01 should be specified.
Usage note:This property should be set using the first known date of issuance.
See also: dataset release date

Property: update/modification date

Most recent date on which the distribution was changed, updated or modified.

RDF Property:dct:modified
Range:rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month are not known, then 01 should be specified.
See also: dataset modification date

Property: license

The license under which the distribution is made available.

RDF Property:dct:license
Range:dct:LicenseDocument

Property: access URL

A URL of any kind that gives access to a distribution of the dataset, e.g. a landing page, download, feed URL or SPARQL endpoint. Use this property when the catalog does not know which kind of URL it is, or when it is definitely not a download.

RDF Property:dcat:accessURL
Range:rdfs:Resource
Usage note:
  • the value is a URL.
  • If the distribution(s) are accessible only through a landing page (i.e. direct download URLs are not known), then the landing page link should be duplicated as accessURL on a distribution. (see example 4.4)
See also:distribution download URL

Property: download URL

This is a direct link to a downloadable file in a given format, e.g. a CSV file or an RDF file. The format is described by the distribution's dct:format and/or dcat:mediaType.

RDF Property:dcat:downloadURL
Range:rdfs:Resource
Usage note:the value is a URL.
See also:distribution access URL
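
The difference between the access URL and download URL properties above can be sketched with a dataset that has two distributions: a CSV file that can be fetched directly, and an API that is definitely not a download. The names :dataset-004, :dataset-004-csv and :dataset-004-api, the URLs, and the example.org namespace are hypothetical.

   @prefix dcat: <http://www.w3.org/ns/dcat#> .
   @prefix dct:  <http://purl.org/dc/terms/> .
   # Assumed namespace for the example resources
   @prefix :     <http://example.org/> .

   :dataset-004
       a dcat:Dataset ;
       dcat:distribution :dataset-004-csv , :dataset-004-api ;
       .

   # Directly downloadable file, so dcat:downloadURL is used
   :dataset-004-csv
       a dcat:Distribution ;
       dcat:downloadURL <http://example.org/files/004.csv> ;
       dcat:mediaType "text/csv" ;
       .

   # An API is not a download, so dcat:accessURL is used
   :dataset-004-api
       a dcat:Distribution ;
       dct:title "API access to imaginary dataset 004" ;
       dcat:accessURL <http://example.org/api/004> ;
       .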

Property: byteSize

The size of a distribution in bytes.

RDF Property:dcat:byteSize
Range:rdfs:Literal typed as xsd:decimal.
Usage note:The size in bytes can be approximated when the precise size is not known.

Property: media type

The media type of the distribution as defined by IANA.

RDF Property:dcat:mediaType
Sub property of:dct:format
Range:dct:MediaTypeOrExtent
Usage note:This property should be used when the media type of the distribution is defined in IANA, otherwise dct:format may be used with different values.
See also: format

Property: format

The file format of the distribution.

RDF Property:dct:format
Range:dct:MediaTypeOrExtent
Usage note: dcat:mediaType should be used if the type of the distribution is defined by IANA.
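
The following sketch contrasts the two properties. The first distribution is a CSV file, for which IANA defines the media type text/csv, so dcat:mediaType is used in the same way as in the basic example. The second distribution uses a made-up proprietary format that is assumed to have no IANA media type, so dct:format is used with a labelled resource. All resource names and the example.org namespace are hypothetical.

   @prefix dcat: <http://www.w3.org/ns/dcat#> .
   @prefix dct:  <http://purl.org/dc/terms/> .
   @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
   # Assumed namespace for the example resources
   @prefix :     <http://example.org/> .

   # Format with an IANA media type
   :dataset-001-csv
       a dcat:Distribution ;
       dcat:mediaType "text/csv" ;
       .

   # Made-up format assumed to have no IANA media type
   :dataset-001-custom
       a dcat:Distribution ;
       dct:format [
           a dct:MediaTypeOrExtent ;
           rdfs:label "ExampleStat binary format"
       ] ;
       .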

5.5 Class: Category and category scheme

The knowledge organization system (KOS) used to represent themes/categories of datasets in the catalog.

RDF Classes:skos:ConceptScheme, skos:Concept
Usage note:It is necessary to use either skos:inScheme or skos:topConceptOf on every skos:Concept otherwise it's not clear which concept scheme they belong to.
See also:catalog themes, dataset theme
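
A sketch of the usage note above, showing both ways of tying a concept to its scheme. The concept :finance is a top concept of the scheme and says so with skos:topConceptOf; the narrower concept :payments uses skos:inScheme. The concepts and the example.org namespace are hypothetical; :themes is the scheme name used in the examples earlier in this document.

   @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
   # Assumed namespace for the example resources
   @prefix :     <http://example.org/> .

   :themes
       a skos:ConceptScheme ;
       .

   # Top-level category of the scheme
   :finance
       a skos:Concept ;
       skos:topConceptOf :themes ;
       skos:prefLabel "Finance" ;
       .

   # Narrower category in the same scheme
   :payments
       a skos:Concept ;
       skos:inScheme :themes ;
       skos:broader :finance ;
       skos:prefLabel "Payments" ;
       .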

5.6 Class: Organization/Person

RDF Classes:foaf:Person for people and foaf:Organization for government agencies or other entities.
Usage note:FOAF provides sufficient properties to describe these entities.
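
For illustration, a publishing organization and a person might be described with FOAF as follows. The names, homepage and mailbox are made up, and the example.org namespace is an assumption; :finance-ministry is the publisher referenced in the basic example.

   @prefix foaf: <http://xmlns.com/foaf/0.1/> .
   # Assumed namespace for the example resources
   @prefix :     <http://example.org/> .

   :finance-ministry
       a foaf:Organization ;
       foaf:name "Finance Ministry" ;
       foaf:homepage <http://example.org/finance-ministry> ;
       .

   # Hypothetical individual
   :jane-doe
       a foaf:Person ;
       foaf:name "Jane Doe" ;
       foaf:mbox <mailto:jane.doe@example.org> ;
       .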

A. Acknowledgements

This document contains a significant contribution from Richard Cyganiak. He is one of the initiators of the DCAT work and contributed significantly to this specification as it made its way through the W3C process.

The editors would like to thank Vassilios Peristeras for his comments and support for the original DCAT work. We would also like to thank Rufus Pollock for his significant input and comments.

This document has benefited from inputs from many members of the Government Linked Data Working Group. Specific thanks are due to Ghislain Atemezing, Martin Alvarez and Makx Dekkers.

B. References

B.1 Normative references

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt

B.2 Informative references

[SPARQL-QUERY-11]
Steve Harris; Andy Seaborne. SPARQL 1.1 Query Language. 8 November 2012. W3C Proposed Recommendation. URL: http://www.w3.org/TR/2012/PR-sparql11-query-20121108/
[TURTLE-TR]
Eric Prud'hommeaux; Gavin Carothers. Turtle: Terse RDF Triple Language. 19 February 2013. W3C Candidate Recommendation. URL: http://www.w3.org/TR/2013/CR-turtle-20130219/
[VOID]
Keith Alexander; Richard Cyganiak; Michael Hausenblas; Jun Zhao. Describing Linked Datasets with the VoID Vocabulary. 03 March 2011. Interest Group Note. URL: http://www.w3.org/TR/void/