Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.
By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications to easily consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This publication signals the move of DCAT onto the W3C Recommendation Track. DCAT was first developed and published by DERI and has seen widespread adoption at the time of this publication. The original vocabulary was further developed by the eGov Interest Group, before being brought onto the Recommendation Track by the Government Linked Data (GLD) Working Group.
This document was published by the Government Linked Data (GLD) Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-gld-comments@w3.org (subscribe, archives). All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This section is non-normative.
This document does not prescribe any particular method of deploying data expressed in DCAT. DCAT is applicable in many contexts including RDF accessible via SPARQL endpoints, embedded in HTML pages as RDFa, or serialized as e.g. RDF/XML or Turtle. The examples in this document use Turtle simply because of Turtle's readability.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].
The namespace for DCAT is http://www.w3.org/ns/dcat#. However, it should be noted that DCAT makes extensive use of terms from other vocabularies, in particular Dublin Core. DCAT itself defines a minimal set of classes and properties of its own. A full set of namespaces and prefixes used in this document is shown in the table below.
Prefix | Namespace |
---|---|
dcat | http://www.w3.org/ns/dcat# |
dcterms | http://purl.org/dc/terms/ |
foaf | http://xmlns.com/foaf/0.1/ |
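In a Turtle file, these prefixes are declared before the first statement. The following is a minimal sketch rather than a normative list: it declares the three prefixes from the table plus the other prefixes that the examples in this document rely on. The dct prefix is assumed to be an alias for the Dublin Core terms namespace, and the default prefix is bound to a hypothetical example.org base.

```turtle
# Prefixes listed in the table above
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .

# Additional prefixes used by the examples in this document
@prefix dct:     <http://purl.org/dc/terms/> .   # assumed alias of dcterms
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical default namespace for the catalog's own resources
@prefix :        <http://example.org/> .
```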
This section is non-normative.
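DCAT is an RDF vocabulary well-suited to representing government data catalogs such as Data.gov and data.gov.uk. DCAT defines three main classes: dcat:Catalog, which represents the catalog; dcat:Dataset, which represents a dataset in a catalog; and dcat:Distribution, which represents a specific available form of a dataset, such as a downloadable file.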
Another important class in DCAT is dcat:CatalogRecord which describes a dataset entry in the catalog. Notice that while dcat:Dataset represents the dataset itself, dcat:CatalogRecord represents the record that describes a dataset in the catalog. The use of the CatalogRecord is considered optional. It is used to capture provenance information about dataset entries in a catalog. If this distinction is not necessary then CatalogRecord can be safely ignored.
@@Fadi:I will update the figure once the list of properties are listed
This example provides a quick overview of how DCAT might be used to represent a government catalog and its datasets.
@@@ TODO.jse: Illustrate more clearly how these "examples" might appear "in the wild." Esp. the use of skos:Concept (etc) in RDFa... @@@
First, the catalog description:
:catalog
    a dcat:Catalog ;
    dct:title "Imaginary catalog" ;
    rdfs:label "Imaginary catalog" ;
    foaf:homepage <http://example.org/catalog> ;
    dct:publisher :transparency-office ;
    dcat:themeTaxonomy :themes ;
    dct:language "en"^^xsd:language ;
    dcat:dataset :dataset/001 ;
    .
The publisher of the catalog has the relative URI :transparency-office. Further description of the publisher can be provided as in the following example:
:transparency-office a foaf:Agent ; rdfs:label "Transparency Office" ; .
The catalog classifies its datasets according to a set of domains represented by the relative URI :themes. SKOS can be used to describe the domains used:
:themes a skos:ConceptScheme ; skos:prefLabel "A set of domains to classify documents" ; .
The catalog connects to each of its datasets via dcat:dataset. In the example above, an example dataset was mentioned with the relative URI :dataset/001. A possible description of it using DCAT is shown below:
:dataset/001
    a dcat:Dataset ;
    dct:title "Imaginary dataset" ;
    dcat:keyword "accountability", "transparency", "payments" ;
    dcat:theme :themes/accountability ;
    dct:issued "2011-12-05"^^xsd:date ;
    dct:modified "2011-12-05"^^xsd:date ;
    dct:publisher :agency/finance-ministry ;
    dct:accrualPeriodicity "every six months" ;
    dct:language "en"^^xsd:language ;
    dcat:distribution :dataset/001/csv ;
    .
Notice that this dataset is classified under the domain represented by the relative URI :themes/accountability. This should be part of the set of domains identified by the URI :themes that was used to describe the catalog domains. An example SKOS description is shown below:
:themes/accountability a skos:Concept ; skos:inScheme :themes ; skos:prefLabel "Accountability" ; .
The dataset can be downloaded in CSV format via the distribution represented by :dataset/001/csv.
:dataset/001/csv
    a dcat:Distribution ;
    dcat:accessURL <http://www.example.org/files/001.csv> ;
    dct:title "CSV distribution of imaginary dataset 001" ;
    dct:format [
        a dct:IMT ;
        rdf:value "text/csv" ;
        rdfs:label "CSV"
    ] .
Finally, if the catalog publisher decides to keep metadata describing its records (i.e. the records containing metadata describing the datasets), dcat:CatalogRecord can be used. For example, :dataset/001 was issued on 2011-12-05. However, its description on Imaginary Catalog was added on 2011-12-11. This can be represented in DCAT as follows:
:record/001
    a dcat:CatalogRecord ;
    foaf:primaryTopic :dataset/001 ;
    dct:issued "2011-12-11"^^xsd:date ;
    .
:catalog
    dcat:record :record/001 ;
    .
A data catalog is a curated collection of metadata about datasets.
RDF Class: | dcat:Catalog |
---|---|
Usage note: | Typically, a web-based data catalog is represented as a single instance of this class. |
See also: | Catalog record, Dataset |
The homepage of the catalog.
RDF Property: | foaf:homepage |
---|---|
Range: | foaf:Document |
Usage note: | foaf:homepage is an inverse functional property (IFP) which means that it should be unique and precisely identify the catalog. This allows smushing various descriptions of the catalog when different URIs are used. |
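As an illustration of the smushing mentioned in the usage note, the sketch below shows two descriptions of the same catalog published under different URIs. The ex1: and ex2: namespaces and the resource names are hypothetical. Because both resources share the same foaf:homepage value, a consumer that applies inverse functional property semantics can conclude that they denote the same catalog and merge the two descriptions.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix ex1:     <http://one.example.org/> .   # hypothetical first publisher
@prefix ex2:     <http://two.example.net/> .   # hypothetical second publisher

# Description published by the catalog itself
ex1:catalog
    a dcat:Catalog ;
    dcterms:title "Imaginary catalog" ;
    foaf:homepage <http://example.org/catalog> .

# Description published by an aggregator under a different URI
ex2:imaginary-catalog
    a dcat:Catalog ;
    dcterms:description "A catalog of imaginary datasets." ;
    foaf:homepage <http://example.org/catalog> .
```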
The entity responsible for making the catalog available online.
RDF Property: | dcterms:publisher |
---|---|
Range: | foaf:Agent |
The geographical area covered by the catalog.
RDF Property: | dcterms:spatial |
---|---|
Range: | dcterms:Location (Spatial region or named place) |
The knowledge organization system (KOS) used to classify the catalog's datasets.
RDF Property: | dcat:themeTaxonomy |
---|---|
Range: | skos:ConceptScheme |
A free-text account of the catalog.
RDF Property: | dcterms:description |
---|---|
Range: | rdfs:Literal |
The language of the catalog. This refers to the language used in the textual metadata describing titles, descriptions, etc. of the datasets in the catalog.
RDF Property: | dcterms:language |
---|---|
Range: | rdfs:Literal, a string representing the language code as described in http://www.ietf.org/rfc/rfc3066.txt |
Usage note: | Multiple values can be used. The publisher might also choose to describe the language on the dataset level (see dataset language). |
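A minimal sketch of both options from the usage note follows: a catalog whose metadata is available in two languages, and one dataset that states its own language. The ex: namespace and resource names are hypothetical, and the language codes are written as plain literals, which is one reading of the range given above.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace

# Catalog metadata is provided in English and French
ex:catalog
    a dcat:Catalog ;
    dcterms:language "en", "fr" ;
    dcat:dataset ex:dataset-001 .

# This dataset states its own language at the dataset level
ex:dataset-001
    a dcat:Dataset ;
    dcterms:language "de" .
```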
This describes the license under which the catalog can be used/reused and not the datasets. Even if the license of the catalog applies to all of its datasets it should be replicated on each dataset.
RDF Property: | dcterms:license |
---|---|
Range: | dcterms:LicenseDocument |
Usage note: | To allow automatic analysis of datasets, it is important to use canonical identifiers for well-known licenses, see @@@void guide@@@ for a list. |
See also: | dataset license |
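The sketch below illustrates the advice above: the license is stated on the catalog and replicated on the dataset, using a well-known license URI as the identifier. The ex: namespace and resource names are hypothetical, and the Creative Commons URI is only an illustrative choice of license.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace

# License of the catalog itself
ex:catalog
    a dcat:Catalog ;
    dcterms:license <http://creativecommons.org/licenses/by/3.0/> ;
    dcat:dataset ex:dataset-001 .

# The same license is repeated explicitly on the dataset
ex:dataset-001
    a dcat:Dataset ;
    dcterms:license <http://creativecommons.org/licenses/by/3.0/> .
```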
A dataset that is part of the catalog.
RDF Property: | dcat:dataset |
---|---|
Range: | dcat:Dataset |
A catalog record that is part of the catalog.
RDF Property: | dcat:record |
---|---|
Range: | dcat:CatalogRecord |
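As a sketch of how dcat:dataset and dcat:record fit together, the catalog below lists two datasets directly and, for one of them, also links the catalog record that describes it. The ex: namespace and all resource names are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:catalog
    a dcat:Catalog ;
    dcat:dataset ex:dataset-001, ex:dataset-002 ;   # datasets in the catalog
    dcat:record ex:record-001 .                      # optional catalog record

ex:dataset-001 a dcat:Dataset .
ex:dataset-002 a dcat:Dataset .

# The record points at the dataset it describes
ex:record-001
    a dcat:CatalogRecord ;
    foaf:primaryTopic ex:dataset-001 .
```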
A record in a data catalog, describing a single dataset.
RDF Class: | dcat:CatalogRecord |
---|---|
Usage note: | This class is optional and not all catalogs will use it. It exists for catalogs where a distinction is made between metadata about a dataset and metadata about the dataset's entry in the catalog. For example, the publication date property of the dataset reflects the date when the information was originally made available by the publishing agency, while the publication date of the catalog record is the date when the dataset was added to the catalog. In cases where both dates differ, or where only the latter is known, the publication date should only be specified for the catalog record. |
See also: | Dataset |
In web-based catalogs, the URL of the catalog page should be used as URI for the catalog record if it is a permalink.
If named graphs are used, all RDF triples describing the catalog record, the dataset, and its distributions, should go into a graph named with the catalog record's URI.
The date of listing the corresponding dataset in the catalog.
See Issue-3
RDF Property: | dcterms:issued |
---|---|
Range: | rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, then 01 should be specified. |
Usage note: | This indicates the date of listing the dataset in the catalog and not the publication date of the dataset itself. |
See also: | dataset release date |
Most recent date on which the catalog entry was changed, updated or modified.
See Issue-3
RDF Property: | dcterms:modified |
---|---|
Range: | rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, then 01 should be specified. |
Usage note: | This indicates the date of last change of a catalog entry, i.e. the catalog metadata description of the dataset, and not the date of the dataset itself. |
See also: | dataset modification date |
Links the catalog record to the dcat:Dataset resource described in the record.
See Issue-4
RDF Property: | foaf:primaryTopic |
---|---|
Range: | dcat:Dataset |
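Putting the three catalog record properties together, a record might look like the following sketch. The ex: namespace, resource names and dates are hypothetical. The dates on the record describe the catalog entry, while the date on the dataset describes the dataset itself.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace

# The catalog entry: listed on one date, last edited on a later date
ex:record-001
    a dcat:CatalogRecord ;
    dcterms:issued   "2011-12-11"^^xsd:date ;
    dcterms:modified "2012-01-20"^^xsd:date ;
    foaf:primaryTopic ex:dataset-001 .

# The dataset itself was published before it was listed in the catalog
ex:dataset-001
    a dcat:Dataset ;
    dcterms:issued "2011-12-05"^^xsd:date .
```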
A collection of data, published or curated by a single source, and available for access or download in one or more formats.
RDF Class: | dcat:Dataset |
---|---|
Usage note: | This class represents the actual dataset as published by the dataset publisher. In cases where a distinction between the actual dataset and its entry in the catalog is necessary (because metadata such as modification date and maintainer might differ), the catalog record class can be used for the latter. |
See also: | Catalog record |
Most recent date on which the dataset was changed, updated or modified.
See Issue-3
RDF Property: | dcterms:modified |
---|---|
Range: | rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, then 01 should be specified. |
Usage note: | The value of this property indicates a change to the actual dataset, not a change to the catalog record. An absent value may indicate that the dataset has never changed after its initial publication, or that the date of last modification is not known, or that the dataset is continuously updated. Example: 2010-05-07 |
See also: | frequency |
A free-text account of the dataset.
RDF Property: | dcterms:description |
---|---|
Range: | rdfs:Literal |
An entity responsible for making the dataset available.
See Issue-4
RDF Property: | dcterms:publisher |
---|---|
Range: | foaf:Organization |
See also: | Class: Organization/Person |
Date of formal issuance (e.g., publication) of the dataset.
See Issue-3
RDF Property: | dcterms:issued |
---|---|
Range: | rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, then 01 should be specified. |
Usage note: | This property should be set using the first known date of issuance. Example: 2010-05-07 |
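The date conventions for datasets can be sketched as follows. The ex: namespace and resource names are hypothetical. The first dataset has fully known dates. For the second, only the month of issuance (May 2010) is assumed to be known, so the day is written as 01, following the rule in the range description.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace

# Both the issue date and the last modification date are known exactly
ex:dataset-001
    a dcat:Dataset ;
    dcterms:issued   "2010-05-07"^^xsd:date ;
    dcterms:modified "2011-03-15"^^xsd:date .

# Only the month of issuance is known, so the day defaults to 01
ex:dataset-002
    a dcat:Dataset ;
    dcterms:issued "2010-05-01"^^xsd:date .
```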
The frequency with which the dataset is published.
See Issue-5
RDF Property: | dcterms:accrualPeriodicity |
---|---|
Range: | dcterms:Frequency (A rate at which something recurs) |
Usage note: | @@@ values should come from a controlled vocabulary i.e. predefined set of resources. It could use values like placetime.com intervals |
Domain: | dcterms:Collection so, a Catalog must be a dcterms:Collection as well. |
A unique identifier of the dataset.
RDF Property: | dcterms:identifier |
---|---|
Range: | rdfs:Literal |
Usage note: | The identifier might be used to coin a permanent and unique URI for the dataset, but having it represented explicitly is still useful. |
Spatial coverage of the dataset.
RDF Property: | dcterms:spatial |
---|---|
Range: | dcterms:Location (A spatial region or named place) |
Usage note: | @@@ controlled vocabulary. geonames??? |
@@@ The temporal period that the dataset covers.
RDF Property: | dcterms:temporal |
---|---|
Range: | dcterms:PeriodOfTime (An interval of time that is named or defined by its start and end dates) |
Usage note: | @@@ controlled vocabulary. http://www.placetime.com/ might be an option??? |
The language of the dataset.
RDF Property: | dcterms:language |
---|---|
Range: | rdfs:Literal, a string representing the language code as described in http://www.ietf.org/rfc/rfc3066.txt |
Usage note: | This overrides the value of the catalog language in case of conflict. |
The license under which the dataset is published and can be reused.
RDF Property: | dcterms:license |
---|---|
Range: | dcterms:LicenseDocument |
Usage note: | See Section 2.4 of Describing Linked Datasets with the VoID Vocabulary. |
Describes the level of granularity of the data. @@@ elaborate more@@@
RDF Property: | dcat:granularity |
---|---|
Range: | rdfs:Resource |
Usage note: | This is usually geographical or temporal but can also be another dimension, e.g. Person can be used to describe the granularity of a dataset about average income. |
A set of sample values used in data.gov: country, county, longitude/latitude, region, plane, airport.
Provides a description that helps in understanding the data. This usually consists of a table explaining the meaning of columns, the interpretation of values, and the acronyms and codes used in the data.
RDF Property: | dcat:dataDictionary |
---|---|
Range: | foaf:Document |
Usage note: | @@@ Review @@@ It is rarely provided in current catalogs and does not have a consistent usage. However, when it is provided, it is a link to some document or is embedded in a document packaged together with the dataset. It is recommended to represent it as a resource having the URL of the online document as its URI. Statistical datasets, as a particular yet common case, can have a more structured description, and the ongoing work on SDMX+RDF can be utilized here. |
Describes the quality of the data.
RDF Property: | dcat:dataQuality |
---|---|
Range: | rdfs:Literal |
Usage note: | @@@Review@@@ This is a very general property and it is not clear how exactly it will be used as catalogs currently do not use it or use it with meaningless values. Catalogs are expected to define more specific sub-properties to describe quality characteristics e.g. statistical data usually have a lot to describe about the quality of sampling, collection mode, non-response adjustment… |
The main category of the dataset. A dataset can have multiple themes.
RDF Property: | dcat:theme |
---|---|
Range: | skos:Concept |
Usage note: | The set of skos:Concepts used to categorize the datasets are organized in a skos:ConceptScheme describing all the categories and their relations in the catalog. |
A keyword or tag describing the dataset.
RDF Property: | dcat:keyword |
---|---|
Range: | rdfs:Literal |
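The difference between themes and keywords can be sketched as follows: themes are skos:Concept resources drawn from the catalog's concept scheme, while keywords are plain literals. The ex: namespace, resource names and labels are hypothetical.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:dataset-001
    a dcat:Dataset ;
    # Themes are resources from a controlled concept scheme
    dcat:theme ex:theme-accountability, ex:theme-finance ;
    # Keywords are free-text tags
    dcat:keyword "transparency", "payments" .

ex:theme-accountability
    a skos:Concept ;
    skos:inScheme ex:themes ;
    skos:prefLabel "Accountability" .

ex:theme-finance
    a skos:Concept ;
    skos:inScheme ex:themes ;
    skos:prefLabel "Finance" .
```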
Connects a dataset to its available distributions.
RDF Property: | dcat:distribution |
---|---|
Range: | dcat:Distribution |
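For instance, a dataset that is available both as a CSV file and as an XLS file could be linked to two distributions, as in the sketch below. The ex: namespace, resource names and file URLs are hypothetical, and the access URLs are written as IRIs in the same way as in the overview example.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace

# One dataset, two available forms
ex:dataset-001
    a dcat:Dataset ;
    dcat:distribution ex:dataset-001-csv, ex:dataset-001-xls .

ex:dataset-001-csv
    a dcat:Distribution ;
    dcterms:title "CSV distribution of dataset 001" ;
    dcat:accessURL <http://www.example.org/files/001.csv> .

ex:dataset-001-xls
    a dcat:Distribution ;
    dcterms:title "XLS distribution of dataset 001" ;
    dcat:accessURL <http://www.example.org/files/001.xls> .
```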
Represents a specific available form of a dataset. Each dataset might be available in different forms; these forms might represent different formats of the dataset, different endpoints, and so on. Examples of a Distribution include a downloadable CSV file, an XLS file representing the dataset, and an RSS feed.
RDF Class: | dcat:Distribution |
---|---|
Range: | Has no defined range |
Usage note: | This represents the general availability of a dataset. It implies no information about the actual access method of the data, i.e. whether it is a direct download, an API, or a splash page. Use one of its subclasses when the particular access method is known. |
See also: | Download, WebService, Feed |
Points to the location of a distribution. This can be a direct download link, a link to an HTML page containing a link to the actual data, a feed, a web service, etc. The semantics are determined by its domain (Distribution, Feed, WebService, Download).
If the value is always a URI, shouldn't the range be rdfs:Resource?
RDF Property: | dcat:accessURL |
---|---|
Range: | rdfs:Literal |
Usage note: | The value is a URL. |
See also: | Download, WebService, Feed |
The size of a distribution.
RDF Property: | dcat:size |
---|---|
Range: | rdfs:Resource |
Usage note: | dcat:size is usually used with a blank node described using rdfs:label and dcat:bytes. |
Example: | :distribution dcat:size [ dcat:bytes "5120"^^xsd:integer; rdfs:label "5KB" ] |
The file format of the distribution.
RDF Property: | dcterms:format |
---|---|
Range: | dcterms:MediaTypeOrExtent |
Usage note: | MIME types are used for values. A list of MIME type URLs can be found at IANA. However, ESRI Shape files have no specific MIME type (a Shape distribution is actually a collection of files); this is still an open question. @@@ |
Example: | :distribution dcterms:format [ a dcterms:IMT; rdf:value "text/csv"; rdfs:label "CSV" ] |
Represents a downloadable distribution of a dataset.
RDF Class: | dcat:Download |
---|---|
Usage note: | The accessURL of a Download distribution should be a direct download link (one-click access to the data file). |
See also: | Distribution, WebService, Feed |
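A direct download might be described as in the following sketch, which combines the accessURL, format and size properties defined above. The ex: namespace, resource names, file URL and size are hypothetical.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace

ex:dataset-001-csv
    a dcat:Download ;
    # Direct link to the data file itself, not to a landing page
    dcat:accessURL <http://www.example.org/files/001.csv> ;
    dcterms:format [
        a dcterms:IMT ;
        rdf:value "text/csv" ;
        rdfs:label "CSV"
    ] ;
    dcat:size [
        dcat:bytes "5120"^^xsd:integer ;
        rdfs:label "5KB"
    ] .
```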
Represents a web service that enables access to the data of a dataset.
RDF Class: | dcat:WebService |
---|---|
Range: | dcterms:MediaTypeOrExtent |
Usage note: | Describe the web service using accessURL, format and size. Further description of the web service is outside the scope of DCAT. |
See also: | Distribution, Download, Feed |
Represents the availability of a dataset as a feed.
RDF Class: | dcat:Feed |
---|---|
Usage note: | Describe the feed using accessURL, format and size. Further description of the feed is outside the scope of DCAT. |
See also: | Distribution, Download, WebService |
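A feed distribution can be sketched in the same way as a download. The ex: namespace, resource names and feed URL are hypothetical, and Atom is only an illustrative choice of feed format.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace

ex:dataset-001
    a dcat:Dataset ;
    dcat:distribution ex:dataset-001-feed .

# The dataset is also available as a feed
ex:dataset-001-feed
    a dcat:Feed ;
    dcat:accessURL <http://www.example.org/feeds/001.atom> ;
    dcterms:format [
        a dcterms:IMT ;
        rdf:value "application/atom+xml" ;
        rdfs:label "Atom"
    ] .
```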
The knowledge organization system (KOS) used to represent themes/categories of datasets in the catalog.
RDF Classes: | skos:ConceptScheme, skos:Concept |
---|---|
Usage note: | It's necessary to use either skos:inScheme or skos:topConceptOf on every skos:Concept otherwise it's not clear which concept scheme they belong to. |
See also: | catalog themes, dataset theme |
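The usage note can be illustrated with a small concept scheme in which every concept states its scheme explicitly. This is a sketch with a hypothetical ex: namespace and hypothetical concept names: the top-level concept uses skos:topConceptOf, and the narrower concept uses skos:inScheme.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# The catalog declares which concept scheme it uses for themes
ex:catalog
    a dcat:Catalog ;
    dcat:themeTaxonomy ex:themes .

ex:themes
    a skos:ConceptScheme ;
    skos:prefLabel "A set of domains to classify datasets" .

# A top-level concept is attached with skos:topConceptOf
ex:theme-government
    a skos:Concept ;
    skos:topConceptOf ex:themes ;
    skos:prefLabel "Government" .

# Any other concept is attached with skos:inScheme
ex:theme-accountability
    a skos:Concept ;
    skos:inScheme ex:themes ;
    skos:broader ex:theme-government ;
    skos:prefLabel "Accountability" .
```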
RDF Classes: | foaf:Person for people and foaf:Organization for government agencies or other entities. |
---|---|
Usage note: | FOAF provides sufficient properties to describe these entities. |
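A publisher described with FOAF might look like the sketch below. The ex: namespace, resource names, names and homepage are hypothetical, and the selection of FOAF properties is only one possibility.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix ex:      <http://example.org/> .   # hypothetical namespace

ex:dataset-001
    a dcat:Dataset ;
    dcterms:publisher ex:finance-ministry .

# A government agency is described as a foaf:Organization
ex:finance-ministry
    a foaf:Organization ;
    foaf:name "Ministry of Finance" ;
    foaf:homepage <http://finance.example.org/> .

# An individual is described as a foaf:Person
ex:jane-doe
    a foaf:Person ;
    foaf:name "Jane Doe" .
```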
No informative references.