W3C

Data Catalog Vocabulary (DCAT)

W3C Working Draft 05 April 2012

This version:
http://www.w3.org/TR/2012/WD-vocab-dcat-20120405/
Latest published version:
http://www.w3.org/TR/vocab-dcat/
Latest editor's draft:
http://dvcs.w3.org/hg/gld/raw-file/default/dcat/index.html
Editors:
Fadi Maali, DERI, NUIG
John Erickson, Tetherless World Constellation (RPI)
Phil Archer, W3C/ERCIM

Abstract

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.

By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications to consume metadata easily from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This publication signals the move of DCAT onto the W3C Recommendation Track. DCAT was first developed and published by DERI and had seen widespread adoption by the time of this publication. The original vocabulary was further developed by the eGov Interest Group, before being brought onto the Recommendation Track by the Government Linked Data (GLD) Working Group.

This document was published by the Government Linked Data (GLD) Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-gld-comments@w3.org (subscribe, archives). All feedback is welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.


1. Introduction

This section is non-normative.

This document does not prescribe any particular method of deploying data expressed in DCAT. DCAT is applicable in many contexts, including RDF accessible via SPARQL endpoints, RDFa embedded in HTML pages, and RDF serialized as, e.g., RDF/XML or Turtle. The examples in this document use Turtle because of its readability.

2. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].

3. Namespaces

The namespace for DCAT is http://www.w3.org/ns/dcat#. Note, however, that DCAT makes extensive use of terms from other vocabularies, in particular Dublin Core, and defines only a minimal set of classes and properties of its own. The namespaces and prefixes used in this document are shown in the table below.

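Prefix    Namespace
dcat      http://www.w3.org/ns/dcat#
dcterms   http://purl.org/dc/terms/ (abbreviated dct in the examples)
dctype    http://purl.org/dc/dcmitype/
foaf      http://xmlns.com/foaf/0.1/
rdf       http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs      http://www.w3.org/2000/01/rdf-schema#
skos      http://www.w3.org/2004/02/skos/core#
xsd       http://www.w3.org/2001/XMLSchema#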
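The examples in this document omit @prefix declarations. The following Turtle preamble is a sketch of the declarations they assume: the vocabulary namespaces are the standard ones for each vocabulary, while the namespace bound to the empty prefix (http://example.org/) is a hypothetical choice for illustration only. Strict Turtle parsers also reject an unescaped "/" inside a prefixed name, so names such as :dataset/001 would have to be written as full IRIs to parse.

```turtle
# Namespaces of the vocabularies used in the examples
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical namespace for the example catalog's own resources
@prefix :       <http://example.org/> .
```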

4. Vocabulary Overview

This section is non-normative.

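DCAT is an RDF vocabulary well-suited to representing government data catalogs such as Data.gov and data.gov.uk. DCAT defines three main classes:

- dcat:Catalog represents the catalog;
- dcat:Dataset represents a dataset in a catalog;
- dcat:Distribution represents an accessible form of a dataset, for example a downloadable file, an RSS feed or a web service that provides the data.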

Another important class in DCAT is dcat:CatalogRecord, which describes a dataset entry in the catalog. Note that while dcat:Dataset represents the dataset itself, dcat:CatalogRecord represents the record that describes a dataset in the catalog. Use of dcat:CatalogRecord is optional; it is used to capture provenance information about dataset entries in a catalog. If this distinction is not needed, dcat:CatalogRecord can safely be ignored.

@@Fadi: I will update the figure once the properties are listed

UML model of DCAT classes and properties

Example

This example provides a quick overview of how DCAT might be used to represent a government catalog and its datasets.

@@@ TODO.jse: Illustrate more clearly how these "examples" might appear "in the wild." Esp. the use of skos:Concept (etc) in RDFa... @@@

First, the catalog description:

   :catalog
       a dcat:Catalog ;
       dct:title "Imaginary catalog" ;
       rdfs:label "Imaginary catalog" ;
       foaf:homepage <http://example.org/catalog> ;
       dct:publisher :transparency-office ;
       dcat:themeTaxonomy :themes ;
       dct:language "en"^^xsd:language ;
       dcat:dataset :dataset/001 ;
       .

The publisher of the catalog has the relative URI :transparency-office. Further description of the publisher can be provided as in the following example:

   :transparency-office
       a foaf:Agent ;
       rdfs:label "Transparency Office" ;
       .

The catalog classifies its datasets according to a set of domains represented by the relative URI :themes. SKOS can be used to describe these domains:

   :themes
       a skos:ConceptScheme ;
       skos:prefLabel "A set of domains to classify documents" ;
   .

The catalog connects to each of its datasets via dcat:dataset. In the example above, a dataset was mentioned with the relative URI :dataset/001. A possible description of it using DCAT is shown below:

   :dataset/001
       a       dcat:Dataset ;
       dct:title "Imaginary dataset" ;
       dcat:keyword "accountability", "transparency", "payments" ;
       dcat:theme :themes/accountability ;
       dct:issued "2011-12-05"^^xsd:date ;
       dct:modified "2011-12-05"^^xsd:date ;
       dct:publisher :agency/finance-ministry ;
       dct:accrualPeriodicity "every six months" ;
       dct:language "en"^^xsd:language ;
       dcat:distribution :dataset/001/csv ;
       .

Notice that this dataset is classified under the domain represented by the relative URI :themes/accountability. This concept should be part of the set of domains identified by the URI :themes, which was used to describe the catalog's domains. An example SKOS description:

    :themes/accountability 
        a skos:Concept ;
        skos:inScheme :themes ;
        skos:prefLabel "Accountability" ;
        .

The dataset can be downloaded in CSV format via the distribution represented by :dataset/001/csv.

   :dataset/001/csv
       a dcat:Distribution ;
       dcat:accessURL <http://www.example.org/files/001.csv> ;
       dct:title "CSV distribution of imaginary dataset 001" ;
       dct:format [
            a dct:IMT; 
            rdf:value "text/csv"; 
            rdfs:label "CSV"
       ]
       .

Finally, if the catalog publisher decides to keep metadata describing its records (i.e. the records containing metadata describing the datasets), dcat:CatalogRecord can be used. For example, :dataset/001 was issued on 2011-12-05; however, its description in the Imaginary Catalog was added on 2011-12-11. This can be represented in DCAT as follows:

   :record/001
       a dcat:CatalogRecord ;
       foaf:primaryTopic :dataset/001 ;
       dct:issued "2011-12-11"^^xsd:date ;
   .
   :catalog
       dcat:record :record/001 ;
   .

Encoding of property values

5. Class: Catalog

A data catalog is a curated collection of metadata about datasets.

RDF Class: dcat:Catalog
Usage note: Typically, a web-based data catalog is represented as a single instance of this class.
See also: Catalog record, Dataset

5.1 Property: homepage

The homepage of the catalog.

RDF Property: foaf:homepage
Range: foaf:Document
Usage note: foaf:homepage is an inverse functional property (IFP), which means that its value should be unique and should precisely identify the catalog. This allows different descriptions of the catalog that use different URIs to be merged ("smushed").
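As a sketch of how the inverse functional property enables smushing, assume two sources describe the same catalog under different URIs but with the same homepage. Both catalog URIs below are hypothetical; FOAF declares foaf:homepage an owl:InverseFunctionalProperty, so an OWL-aware consumer may conclude that the two URIs denote the same catalog.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Description published by one (hypothetical) source
<http://example.org/id/catalog>
    a dcat:Catalog ;
    dct:title "Imaginary catalog" ;
    foaf:homepage <http://example.org/catalog> .

# Description published by another (hypothetical) source
<http://example.net/catalogs/imaginary>
    a dcat:Catalog ;
    foaf:homepage <http://example.org/catalog> .

# Same homepage, so a reasoner may infer:
# <http://example.org/id/catalog> owl:sameAs <http://example.net/catalogs/imaginary> .
```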

5.2 Property: publisher

The entity responsible for making the catalog available online.

RDF Property: dcterms:publisher
Range: foaf:Agent

5.3 Property: spatial/geographic coverage

The geographical area covered by the catalog.

RDF Property: dcterms:spatial
Range: dcterms:Location (spatial region or named place)

5.4 Property: themes

The knowledge organization system (KOS) used to classify the catalog's datasets.

RDF Property: dcat:themeTaxonomy
Range: skos:ConceptScheme

5.5 Property: title

A name given to the catalog.

RDF Property: dcterms:title
Range: rdfs:Literal

5.6 Property: description

A free-text account of the catalog.

RDF Property: dcterms:description
Range: rdfs:Literal

5.7 Property: language

The language of the catalog. This refers to the language used in the textual metadata describing titles, descriptions, etc. of the datasets in the catalog.

RDF Property: dct:language
Range: rdfs:Literal, a string representing the language code as defined in RFC 3066 (http://www.ietf.org/rfc/rfc3066.txt)
Usage note: Multiple values can be used. The publisher might also choose to describe the language at the dataset level (see dataset language).

5.8 Property: license

The license under which the catalog itself, not its datasets, can be used and reused. Even if the license of the catalog applies to all of its datasets, it should be replicated on each dataset.

RDF Property: dcterms:license
Range: dcterms:LicenseDocument
Usage note: To allow automatic analysis of datasets, it is important to use canonical identifiers for well-known licenses; see @@@void guide@@@ for a list.
See also: dataset license
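A minimal sketch of the replication rule above: the catalog's license is stated on the catalog and repeated on each dataset. The namespace and the license resource :open-license are hypothetical placeholders; in practice a canonical identifier for a well-known license would be used instead.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix :     <http://example.org/> .   # hypothetical namespace

# License of the catalog metadata itself
:catalog
    a dcat:Catalog ;
    dct:license :open-license ;
    dcat:dataset :dataset-001 .

# The same license, repeated explicitly on the dataset
:dataset-001
    a dcat:Dataset ;
    dct:license :open-license .

# Hypothetical license document
:open-license a dct:LicenseDocument .
```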

5.9 Property: dataset

A dataset that is part of the catalog.

RDF Property: dcat:dataset
Range: dcat:Dataset

5.10 Property: catalog record

A catalog record that is part of the catalog.

RDF Property: dcat:record
Range: dcat:CatalogRecord

6. Class: Catalog record

A record in a data catalog, describing a single dataset.

RDF Class: dcat:CatalogRecord
Usage note: This class is optional and not all catalogs will use it. It exists for catalogs that distinguish between metadata about a dataset and metadata about the dataset's entry in the catalog. For example, the publication date property of the dataset reflects the date when the information was originally made available by the publishing agency, while the publication date of the catalog record is the date when the dataset was added to the catalog. Where the two dates differ, or where only the latter is known, the publication date should be specified only for the catalog record.
See also: Dataset

In web-based catalogs, the URL of the catalog page should be used as the URI of the catalog record if that URL is a permalink.

If named graphs are used, all RDF triples describing the catalog record, the dataset and its distributions should go into a graph named with the catalog record's URI.

6.1 Property: listing date

The date of listing the corresponding dataset in the catalog.

See Issue-3

RDF Property: dcterms:issued
Range: rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, 01 should be used.
Usage note: This indicates the date of listing the dataset in the catalog, not the publication date of the dataset itself.
See also: dataset release date

6.2 Property: update/modification date

Most recent date on which the catalog entry was changed, updated or modified.

See Issue-3

RDF Property: dcterms:modified
Range: rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, 01 should be used.
Usage note: This indicates the date of the last change to a catalog entry, i.e. the catalog's metadata description of the dataset, not the date of the dataset itself.
See also: dataset modification date
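The following sketch contrasts the dates of a catalog record with those of the dataset it describes, reusing the issue and listing dates from the example in Section 4. The namespace, the hyphenated URIs and the record's modification date are hypothetical; the modification date illustrates the rule of using 01 when the day and month are unknown.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.org/> .   # hypothetical namespace

:record-001
    a dcat:CatalogRecord ;
    foaf:primaryTopic :dataset-001 ;
    dct:issued   "2011-12-11"^^xsd:date ;   # date the dataset was listed in the catalog
    dct:modified "2012-01-01"^^xsd:date .   # hypothetical: only the year is known, so 01-01 is used

:dataset-001
    a dcat:Dataset ;
    dct:issued "2011-12-05"^^xsd:date .     # date the agency first published the dataset
```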

6.3 Property: dataset

Links the catalog record to the dcat:Dataset resource described in the record.

See Issue-4

RDF Property: foaf:primaryTopic
Range: dcat:Dataset

7. Class: Dataset

A collection of data, published or curated by a single source, and available for access or download in one or more formats.

RDF Class: dcat:Dataset
Usage note: This class represents the actual dataset as published by the dataset publisher. In cases where a distinction between the actual dataset and its entry in the catalog is necessary (because metadata such as modification date and maintainer might differ), the catalog record class can be used for the latter.
See also: Catalog record

7.1 Property: update/modification date

Most recent date on which the dataset was changed, updated or modified.

See Issue-3

RDF Property: dcterms:modified
Range: rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, 01 should be used.
Usage note: The value of this property indicates a change to the actual dataset, not a change to the catalog record. An absent value may indicate that the dataset has never changed after its initial publication, that the date of last modification is not known, or that the dataset is continuously updated. Example: 2010-05-07
See also: frequency

7.2 Property: title

A name given to the dataset.

RDF Property: dcterms:title
Range: rdfs:Literal

7.3 Property: description

A free-text account of the dataset.

RDF Property: dcterms:description
Range: rdfs:Literal

7.4 Property: publisher

An entity responsible for making the dataset available.

See Issue-4

RDF Property: dcterms:publisher
Range: foaf:Organization
See also: Class: Organization/Person

7.5 Property: release date

Date of formal issuance (e.g., publication) of the dataset.

See Issue-3

RDF Property: dcterms:issued
Range: rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, 01 should be used.
Usage note: This property should be set to the first known date of issuance. Example: 2010-05-07

7.6 Property: frequency

The frequency with which the dataset is published.

See Issue-5

RDF Property: dcterms:accrualPeriodicity
Range: dcterms:Frequency (a rate at which something recurs)
Usage note: @@@ Values should come from a controlled vocabulary, i.e. a predefined set of resources; placetime.com intervals are one possibility.
Domain: dctype:Collection, so a Catalog must be a dctype:Collection as well.
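Unlike the plain literal "every six months" used in the Section 4 example, a value matching the stated range would be a dct:Frequency resource. This is a sketch assuming a hypothetical resource :every-six-months standing in for an entry of whichever controlled vocabulary is eventually chosen (see Issue-5).

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/> .   # hypothetical namespace

:dataset-001
    a dcat:Dataset ;
    dct:accrualPeriodicity :every-six-months .

# Hypothetical frequency resource, in place of a controlled-vocabulary term
:every-six-months
    a dct:Frequency ;
    rdfs:label "every six months" .
```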

7.7 Property: identifier

A unique identifier of the dataset.

RDF Property: dcterms:identifier
Range: rdfs:Literal
Usage note: The identifier might be used to coin a permanent and unique URI for the dataset, but it is still useful to represent it explicitly.
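A minimal sketch of the usage note: the identifier "001" is embedded in the dataset's (hypothetical) URI and is also stated explicitly, so that consumers do not need to parse it out of the URI.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix :     <http://example.org/> .   # hypothetical namespace

# "001" was used to coin the URI and is also given as an explicit value
:dataset-001
    a dcat:Dataset ;
    dct:identifier "001" .
```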

7.8 Property: spatial/geographical coverage

Spatial coverage of the dataset.

RDF Property: dcterms:spatial
Range: dcterms:Location (a spatial region or named place)
Usage note: @@@ controlled vocabulary. geonames???

7.9 Property: temporal coverage

@@@ The temporal period that the dataset covers.

RDF Property: dcterms:temporal
Range: dcterms:PeriodOfTime (an interval of time that is named or defined by its start and end dates)
Usage note: @@@ controlled vocabulary. http://www.placetime.com/ might be an option???
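Since the controlled vocabularies for spatial and temporal coverage are still open, the following is only a sketch: the place :imaginary-country and the period label are hypothetical, and the period is given as a labelled dct:PeriodOfTime blank node rather than with explicit start and end properties.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/> .   # hypothetical namespace

:dataset-001
    a dcat:Dataset ;
    dct:spatial :imaginary-country ;
    dct:temporal [
        a dct:PeriodOfTime ;
        rdfs:label "calendar year 2011"
    ] .

# Hypothetical place; a URI from a controlled vocabulary could be used instead
:imaginary-country
    a dct:Location ;
    rdfs:label "Imaginary Country" .
```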

7.10 Property: language

The language of the dataset.

RDF Property: dct:language
Range: rdfs:Literal, a string representing the language code as defined in RFC 3066 (http://www.ietf.org/rfc/rfc3066.txt)
Usage note: This overrides the value of the catalog language in case of conflict.
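A sketch of the override rule, assuming a hypothetical Welsh-language dataset (code "cy") listed in an English-language catalog: a consumer would take "cy" as the dataset's language, even though the catalog declares "en".

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.org/> .   # hypothetical namespace

:catalog
    a dcat:Catalog ;
    dct:language "en"^^xsd:language ;
    dcat:dataset :dataset-002 .

# Hypothetical dataset: this value takes precedence over the catalog-level "en"
:dataset-002
    a dcat:Dataset ;
    dct:language "cy"^^xsd:language .
```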

7.11 Property: license

The license under which the dataset is published and can be reused.

RDF Property: dcterms:license
Range: dcterms:LicenseDocument
Usage note: See Section 2.4 of Describing Linked Datasets with the VoID Vocabulary.

7.12 Property: granularity

Describes the level of granularity of the data. @@@ elaborate more@@@

RDF Property: dcat:granularity
Range: rdfs:Resource
Usage note: This is usually geographical or temporal, but it can also be another dimension; e.g., Person can be used to describe the granularity of a dataset about average income.

Sample values used in Data.gov include: country, county, longitude/latitude, region, plane, airport.
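A minimal sketch using the Data.gov sample value "county": since the range is rdfs:Resource, the granularity is modelled here as a labelled resource. The URI :county is a hypothetical placeholder, not a term defined by DCAT.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/> .   # hypothetical namespace

# The dataset's values are aggregated per county
:dataset-001
    a dcat:Dataset ;
    dcat:granularity :county .

# Hypothetical resource representing the granularity level
:county rdfs:label "county" .
```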

7.13 Property: data dictionary

Provides a description that helps in understanding the data. This usually consists of a table explaining the meaning of columns, the interpretation of values, and the acronyms or codes used in the data.

RDF Property: dcat:dataDictionary
Range: foaf:Document
Usage note: @@@ Review @@@ A data dictionary is rarely provided in current catalogs and its usage is not consistent; when it is provided, it is either a link to a document or embedded in a document packaged together with the dataset. It is recommended to represent it as a resource whose URI is the URL of the online document. Statistical datasets, a particular yet common case, can have a more structured description, for which the ongoing work on SDMX+RDF can be used.
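Following the recommendation above, the data dictionary is represented by a resource whose URI is the URL of the online document. The document URL below is hypothetical, chosen to sit next to the CSV file from the Section 4 example.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix :     <http://example.org/> .   # hypothetical namespace

:dataset-001
    a dcat:Dataset ;
    dcat:dataDictionary <http://www.example.org/files/001-dictionary.html> .

# The online document itself serves as the data dictionary resource
<http://www.example.org/files/001-dictionary.html> a foaf:Document .
```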

7.14 Property: data quality

Describes the quality of the data.

RDF Property: dcat:dataQuality
Range: rdfs:Literal
Usage note: @@@Review@@@ This is a very general property, and it is not clear exactly how it will be used, as catalogs currently either do not use it or use it with meaningless values. Catalogs are expected to define more specific sub-properties to describe quality characteristics; e.g., statistical data usually has much to describe about the quality of sampling, collection mode, non-response adjustment, etc.
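A sketch of how a catalog might define a more specific sub-property, as the usage note anticipates. The ex: namespace, the ex:collectionMode property and its value are all hypothetical; the value is a literal to stay consistent with the rdfs:Literal range of dcat:dataQuality.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/vocab#> .   # hypothetical catalog-specific vocabulary
@prefix :     <http://example.org/> .         # hypothetical namespace

# Catalog-defined refinement of dcat:dataQuality
ex:collectionMode
    a rdf:Property ;
    rdfs:subPropertyOf dcat:dataQuality ;
    rdfs:label "collection mode" .

# Via rdfs:subPropertyOf, this also implies a dcat:dataQuality statement
:dataset-001 ex:collectionMode "telephone survey" .
```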

7.15 Property: theme/category

The main category of the dataset. A dataset can have multiple themes.

RDF Property: dcat:theme
Range: skos:Concept
Usage note: The set of skos:Concepts used to categorize the datasets is organized in a skos:ConceptScheme describing all the categories and their relations in the catalog.

7.16 Property: keyword/tag

A keyword or tag describing the dataset.

RDF Property: dcat:keyword
Range: rdfs:Literal

7.18 Property: dataset distribution

Connects a dataset to its available distributions.

RDF Property: dcat:distribution
Range: dcat:Distribution

7.19 Class: Distribution

Represents a specific available form of a dataset. Each dataset might be available in different forms; these forms might represent different formats of the dataset, different endpoints, etc. Examples of distributions include a downloadable CSV file, an XLS file representing the dataset, and an RSS feed.

RDF Class: dcat:Distribution
Range: Has no defined range
Usage note: This represents the general availability of a dataset; it implies no information about the actual access method of the data, i.e. whether it is a direct download, an API, or a splash page. Use one of its subclasses when the particular access method is known.
See also: Download, WebService, Feed

7.20 Property: access/download

Points to the location of a distribution. This can be a direct download link, a link to an HTML page containing a link to the actual data, a feed, a web service, etc. Its semantics are determined by its domain (Distribution, Feed, WebService, Download).

If the value is always a URI, shouldn't the range be rdfs:Resource?

RDF Property: dcat:accessURL
Range: rdfs:Literal
Usage note: The value is a URL.
See also: Download, WebService, Feed

7.21 Property: size

The size of a distribution.

RDF Property: dcat:size
Range: rdfs:Resource
Usage note: dcat:size is usually used with a blank node described using rdfs:label and dcat:bytes.
Example:
   :distribution dcat:size [ dcat:bytes "5120"^^xsd:integer ; rdfs:label "5KB" ] .

7.22 Property: format

The file format of the distribution.

RDF Property: dcterms:format
Range: dcterms:MediaTypeOrExtent
Usage note: MIME types are used as values. A list of registered MIME types can be found at IANA. However, ESRI Shapefiles have no specific MIME type (a Shapefile distribution is actually a collection of files); how to describe their format is still an open question. @@@
Example:
   :distribution dcterms:format [
       a dcterms:IMT ;
       rdf:value "text/csv" ;
       rdfs:label "CSV"
   ] .

8. Class: Download

Represents a downloadable distribution of a dataset.

RDF Class: dcat:Download
Usage note: The dcat:accessURL of a Download distribution should be a direct download link (one-click access to the data file).
See also: Distribution, WebService, Feed
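A sketch of the CSV distribution from Section 4, typed as the more specific dcat:Download because its access URL points directly at the file. The namespace and the hyphenated URIs are hypothetical; the file URL and format description are taken from that example.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/> .   # hypothetical namespace

:dataset-001 dcat:distribution :dataset-001-csv .

:dataset-001-csv
    a dcat:Download ;
    dcat:accessURL <http://www.example.org/files/001.csv> ;   # direct, one-click link to the file
    dct:format [
        a dct:IMT ;
        rdf:value "text/csv" ;
        rdfs:label "CSV"
    ] .
```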

9. Class: WebService

Represents a web service that enables access to the data of a dataset.

RDF Class: dcat:WebService
Range: dcterms:MediaTypeOrExtent
Usage note: Describe the web service using dcat:accessURL, dcterms:format and dcat:size. Further description of the web service is out of the scope of DCAT.
See also: Distribution, Download, Feed

10. Class: Feed

Represents the availability of a dataset as a feed.

RDF Class: dcat:Feed
Usage note: Describe the feed using dcat:accessURL, dcterms:format and dcat:size. Further description of the feed is out of the scope of DCAT.
See also: Distribution, Download, WebService
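A sketch of the same dataset offered through a web service and as a feed, each described only with the properties the usage notes mention. The endpoint and feed URLs are hypothetical; the Atom media type is used for the feed's format.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/> .   # hypothetical namespace

:dataset-001 dcat:distribution :dataset-001-api , :dataset-001-feed .

# Hypothetical API endpoint; details of the service are out of DCAT's scope
:dataset-001-api
    a dcat:WebService ;
    dcat:accessURL <http://www.example.org/api/001> .

# Hypothetical Atom feed of updates to the dataset
:dataset-001-feed
    a dcat:Feed ;
    dcat:accessURL <http://www.example.org/feeds/001.atom> ;
    dct:format [
        a dct:IMT ;
        rdf:value "application/atom+xml" ;
        rdfs:label "Atom"
    ] .
```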

11. Class: Category and category scheme

The knowledge organization system (KOS) used to represent themes/categories of datasets in the catalog.

RDF Classes: skos:ConceptScheme, skos:Concept
Usage note: It is necessary to use either skos:inScheme or skos:topConceptOf on every skos:Concept; otherwise it is not clear which concept scheme it belongs to.
See also: catalog themes, dataset theme
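The example in Section 4 attaches its concept with skos:inScheme; this sketch shows the alternative for top-level categories, skos:topConceptOf (a sub-property of skos:inScheme in SKOS), alongside a narrower concept that uses skos:inScheme. The names :accountability and :payments are hypothetical.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix :     <http://example.org/> .   # hypothetical namespace

:themes
    a skos:ConceptScheme ;
    skos:prefLabel "A set of domains to classify documents" ;
    skos:hasTopConcept :accountability .

# Top-level category: skos:topConceptOf ties it to the scheme
:accountability
    a skos:Concept ;
    skos:topConceptOf :themes ;
    skos:prefLabel "Accountability" .

# Narrower category: skos:inScheme ties it to the scheme
:payments
    a skos:Concept ;
    skos:inScheme :themes ;
    skos:broader :accountability ;
    skos:prefLabel "Payments" .
```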

12. Class: Organization/Person

RDF Classes: foaf:Person for people and foaf:Organization for government agencies or other entities.
Usage note: FOAF provides sufficient properties to describe these entities.
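A minimal sketch of publishers described with FOAF: one dataset published by a government agency (foaf:Organization) and another by an individual (foaf:Person). All names, URIs and the second dataset are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix :     <http://example.org/> .   # hypothetical namespace

:dataset-001 dct:publisher :finance-ministry .
:dataset-003 dct:publisher :jane-doe .

# Government agency as publisher
:finance-ministry
    a foaf:Organization ;
    foaf:name "Ministry of Finance" ;
    foaf:homepage <http://www.example.org/finance> .

# Individual as publisher
:jane-doe
    a foaf:Person ;
    foaf:name "Jane Doe" .
```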

13. Extending the DCAT vocabulary

A. References

A.1 Normative references

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt

A.2 Informative references

No informative references.