DCAT Context

From Dataset Exchange Working Group

DCAT in the Context of Standards and Related Work

Standards and Related Work

Dublin Core

TBD

Asset Description Metadata Schema (ADMS)

W3C Note, August 2013. ADMS is a DCAT profile that focuses on "semantic assets" (e.g., documents) within a catalog. It is used especially in eGovernment systems.

Comprehensive Knowledge Archive Network (CKAN)

Open-source data portal software that supports DCAT via an extension (ckanext-dcat).

DatA Tag Suite (DATS)

Metadata model defined as a set of modular JSON Schemas by bioCADDIE Working Group 3 (Descriptive Metadata for Datasets).

DDI-RDF Discovery Vocabulary (Disco)

Public working draft (March 2015) of a potential specification by the DDI Alliance. It is an RDF Schema vocabulary for publishing metadata about research and survey data on the Web, based on the XML formats of the Data Documentation Initiative (DDI).

RDF Data Cube Vocabulary (DQ)

W3C Recommendation, January 2014. Vocabulary for publishing multi-dimensional data as RDF, compatible with the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations.

Geographic Information – Metadata (ISO 19115)

ISO 19115-1:2014 defines the schema required for describing geographic information and services by means of metadata. It provides information about the identification, the extent, the quality, the spatial and temporal aspects, the content, the spatial reference, the portrayal, the distribution, and other properties of digital geographic data and services.

Besides datasets and services, the types of "resources" that can be described in ISO 19115 include individual data items (e.g., features and attributes), software, initiatives, and others. For more details, see also ISO Scope Codes - NOAA Environmental Data Management Wiki.

GeoDCAT-AP, a formal DCAT profile, provides a mapping from ISO 19115 to DCAT.

Dataset Descriptions: Community Profile (HCLS)

This W3C Note proposes how to describe datasets from the Health Care and Life Sciences (HCLS) domains. It is not a formal DCAT profile; whether it could become one is an open question.

Schema.org

Initiative launched in 2011 by search engine providers to create a shared set of vocabularies for structured data markup on web pages and in other contexts.

Vocabulary of Interlinked Datasets (VoID)

W3C Interest Group Note, March 2011. It defines an RDF Schema vocabulary for expressing metadata about RDF datasets.

DataCite

Metadata schema designed for data citation purposes.

DataCite supports the description of different types of resources, including datasets, software, services, and events. Where available, the definitions of these resource types are based on the corresponding Dublin Core ones.

Research Data Alliance

Several Interest Groups and Working Groups, in particular

Comparative analysis of the "Catalog" concept

Geographic Information – Metadata (ISO 19115)

In ISO 19115, a catalog is one of the possible services that can be described. It is defined as:

Service that provides discovery and management services on a store of metadata about instances.

Service-specific metadata elements are defined in a separate standard, namely, ISO 19119:2016. For the types of services defined in ISO 19119, see:

Comparative analysis of the "Dataset" concept

Dublin Core

Dublin Core has a specific class, namely, dctype:Dataset, defined as follows:

Definition: Data encoded in a defined structure.

Comment: Examples include lists, tables, and databases. A dataset may be useful for direct machine processing.

ADMS

The ADMS Asset is a subclass of dcat:Dataset and represents the intellectual content and characteristics of an asset, independently of any physical embodiment. Further properties express containment (includedAsset) and linked lists of versions (next, prev, last).
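To make the containment and version-chaining idea concrete, here is a minimal Python sketch. It is not an ADMS implementation: the `Asset` class, its field names, and the helpers `link_versions` and `version_history` are hypothetical, chosen only to mirror the includedAsset, next, prev, and last properties, and the sketch assumes the version chain is a simple acyclic list.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Asset:
    """Hypothetical in-memory stand-in for an ADMS Asset (not an official API)."""
    title: str
    included_assets: list[Asset] = field(default_factory=list)  # ~ includedAsset
    prev: Optional[Asset] = field(default=None, repr=False)     # ~ prev (previous version)
    next: Optional[Asset] = field(default=None, repr=False)     # ~ next (following version)
    last: Optional[Asset] = field(default=None, repr=False)     # ~ last (latest version)


def link_versions(versions: list[Asset]) -> None:
    """Chain the given versions in order via prev/next; point each one at the latest."""
    latest = versions[-1]
    for earlier, later in zip(versions, versions[1:]):
        earlier.next = later
        later.prev = earlier
    for version in versions:
        version.last = latest


def version_history(asset: Asset) -> list[str]:
    """Walk back to the first version, then forward along next, collecting titles."""
    first = asset
    while first.prev is not None:
        first = first.prev
    titles: list[str] = []
    current: Optional[Asset] = first
    while current is not None:
        titles.append(current.title)
        current = current.next
    return titles


# Usage: a code list released in three versions; the latest is bundled in a parent asset.
v1, v2, v3 = Asset("Code list v1"), Asset("Code list v2"), Asset("Code list v3")
link_versions([v1, v2, v3])
bundle = Asset("Reference data bundle", included_assets=[v3])
print(version_history(v2))  # ['Code list v1', 'Code list v2', 'Code list v3']
print(v1.last.title)        # Code list v3
```

Because the links are plain object references, any version can reach the whole history, which is the practical benefit of modeling versions as a linked list rather than as an unordered set.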

CKAN

A CKAN dataset is a container for any number of resources. The relations between these resources, and their semantics, are not made explicit: they may represent different content dimensions, subsets, or file formats. Resources carry properties that correspond to those of a DCAT Distribution, such as the file format and an uploaded file or link.
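The following Python sketch illustrates why a CKAN-to-DCAT mapping tends to be flat. It assumes the package layout returned by CKAN's action API (a `resources` list whose entries carry `name`, `format`, and `url`); the function name `ckan_package_to_dcat`, the output keys, and the sample data are illustrative assumptions, and ckanext-dcat handles many more fields than this.

```python
from typing import Any


def ckan_package_to_dcat(package: dict[str, Any]) -> dict[str, Any]:
    """Map a CKAN package dict to a simplified, DCAT-like dict.

    Every CKAN resource becomes a distribution, because CKAN does not say
    whether a resource is a subset, another format, or another content
    dimension of the dataset.
    """
    distributions = []
    for res in package.get("resources", []):
        distributions.append({
            "@type": "dcat:Distribution",
            "dct:title": res.get("name"),
            "dct:format": res.get("format"),
            "dcat:accessURL": res.get("url"),
        })
    return {
        "@type": "dcat:Dataset",
        "dct:title": package.get("title"),
        "dcat:distribution": distributions,
    }


# Hypothetical sample package: two format variants of one year, plus another year.
package = {
    "name": "air-quality",
    "title": "Air quality measurements",
    "resources": [
        {"name": "2016 readings", "format": "CSV", "url": "https://example.org/aq-2016.csv"},
        {"name": "2016 readings", "format": "JSON", "url": "https://example.org/aq-2016.json"},
        {"name": "2017 readings", "format": "CSV", "url": "https://example.org/aq-2017.csv"},
    ],
}
dataset = ckan_package_to_dcat(package)
print(len(dataset["dcat:distribution"]))  # 3
```

In the sample, two resources are format variants of the 2016 readings and one is a temporal subset, yet all three end up as sibling distributions; recovering the difference would need information that CKAN does not record explicitly.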

DATS

A DATS Dataset is a static or dynamic set of dimensions about an entity being observed; its definition refers to the DCAT Dataset. Likewise, DATS Datasets link to a DatasetDistribution and a DatasetRepository, and recursive containment of Datasets is supported.

DQ

A DQ DataSet is a collection of multi-dimensional statistical data (observations), possibly organized into slices, that conforms to a defined structure.

Disco

The Disco LogicalDataSet, a subclass of DCAT Dataset, describes the contents of a data set resulting from a Study (i.e., the process by which the data set was generated or collected). Its content is organized into a set of Variables.

HCLS

HCLS reuses the Dublin Core Dataset class (dctype:Dataset) both for the general description ("Summary Level") and for the description of versioned content ("Version Level"), and reuses void:Dataset and dcat:Distribution to describe physical distributions. Containment of partitions is modeled via dct:hasPart.

Schema.org

A Schema.org Dataset represents a "body of structured information describing some topic(s) of interest". As in DCAT, a Dataset links to a distribution object (DataDownload) and to a DataCatalog. From CreativeWork, Dataset inherits a recursive containment structure and the file format property. DataFeed, apparently the only subtype of Dataset, represents dynamic data.

VoID

A VoID Dataset is a meaningful collection of RDF triples that are published, maintained, or aggregated by a single provider. Such triples typically cover a certain topic, originate from a certain source or process, are hosted on a certain server, or are aggregated by a certain custodian. VoID Datasets link directly to their access points (void:sparqlEndpoint, void:dataDump), without a separate distribution object. Containment is expressed via void:subset; class-based and property-based partitions are also supported.
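A minimal Python sketch of the VoID pattern: access points sit directly on each dataset, and nested void:subset links form a tree that a client can walk to collect them. The `VoidDataset` class, the helpers `walk` and `all_dumps`, and the example.org URIs are hypothetical, and the sketch assumes the subset graph has no cycles.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class VoidDataset:
    """Hypothetical in-memory stand-in for a void:Dataset."""
    uri: str
    sparql_endpoint: Optional[str] = None                     # ~ void:sparqlEndpoint
    data_dumps: list[str] = field(default_factory=list)       # ~ void:dataDump
    subsets: list[VoidDataset] = field(default_factory=list)  # ~ void:subset


def walk(ds: VoidDataset) -> Iterator[VoidDataset]:
    """Depth-first traversal of a dataset and its nested subsets (assumes no cycles)."""
    yield ds
    for sub in ds.subsets:
        yield from walk(sub)


def all_dumps(ds: VoidDataset) -> list[str]:
    """Collect every dump URL in the subset tree; no distribution layer is involved."""
    return [url for d in walk(ds) for url in d.data_dumps]


people = VoidDataset("http://example.org/void#people",
                     data_dumps=["http://example.org/people.nt"])
places = VoidDataset("http://example.org/void#places",
                     data_dumps=["http://example.org/places.nt"])
root = VoidDataset("http://example.org/void#all",
                   sparql_endpoint="http://example.org/sparql",
                   subsets=[people, places])
print(all_dumps(root))  # ['http://example.org/people.nt', 'http://example.org/places.nt']
```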

Geographic Information – Metadata (ISO 19115)

A dataset is one of the possible resources that can be described in ISO 19115. The official definition:

4.3

dataset

identifiable collection of data

Note 1 to entry: A dataset can be a smaller grouping of data which, though limited by some constraint such as spatial extent or feature type, is located physically within a larger dataset. Theoretically, a dataset can be as small as a single feature (4.5) or feature attribute contained within a larger dataset. A hardcopy map or chart can be considered a dataset.

ISO 19115 also includes a resource type "dataset series", defined as follows:

4.4

dataset series

collection of datasets (4.3) sharing common characteristics

DataCite

A dataset is one of the possible resources that can be described in DataCite. For the notion of dataset, DataCite reuses the Dublin Core definition, i.e., "Data encoded in a defined structure".

Comparative analysis of the "Distribution" concept

ADMS

ADMS Asset Distribution represents the physical embodiment of an Asset.

Disco

The Disco DataFile, a subclass of DCAT Distribution, is the physical representation of the data.

Geographic Information – Metadata (ISO 19115)

[Official definition to be added]

See also: ISO Distribution Information - NOAA Environmental Data Management Wiki.

Note that, unlike in DCAT, in ISO 19115 information concerning access and use conditions is attached to the parent resource (e.g., the dataset) and not to the distribution.
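A minimal Python sketch of what this difference implies when converting ISO 19115-style metadata to DCAT: conditions stated once on the resource have to be copied onto every distribution. The record structure, the choice of mapping use constraints to dct:license and access constraints to dct:rights, and all names are simplifying assumptions, not the ISO 19115 schema or an official crosswalk such as GeoDCAT-AP.

```python
from dataclasses import dataclass, field


@dataclass
class IsoStyleResource:
    """Hypothetical, simplified ISO 19115-style record: access and use
    conditions belong to the resource itself, not to its distributions."""
    title: str
    use_constraints: str
    access_constraints: str
    distribution_formats: list[str] = field(default_factory=list)


def to_dcat_style(res: IsoStyleResource) -> dict:
    """Copy the resource-level conditions onto every distribution,
    which is where DCAT attaches license and rights information."""
    return {
        "dct:title": res.title,
        "dcat:distribution": [
            {
                "dct:format": fmt,
                "dct:license": res.use_constraints,     # assumed mapping
                "dct:rights": res.access_constraints,   # assumed mapping
            }
            for fmt in res.distribution_formats
        ],
    }


roads = IsoStyleResource("Road network", "CC-BY-4.0", "No restrictions", ["GML", "Shapefile"])
record = to_dcat_style(roads)
print([d["dct:format"] for d in record["dcat:distribution"]])  # ['GML', 'Shapefile']
```

The reverse direction (DCAT to ISO 19115) is less mechanical: if a dataset's distributions carry different licenses, they no longer fit a single set of resource-level constraints.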

DataCite

DataCite has no separate notion of distribution. All the information that DCAT attaches to distributions (format, download URL, license, and rights) is attached in DataCite to the parent resource (e.g., the dataset).
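The following Python sketch shows the effect of moving DCAT distribution details onto the parent resource, as DataCite does. The input and output keys (`distributions`, `download_url`, `formats`, `urls`, `rights`) and the function name `flatten_to_parent` are hypothetical and do not reproduce the DataCite metadata schema; the point is only that per-distribution pairings are lost once everything hangs off one record.

```python
def flatten_to_parent(dataset: dict) -> dict:
    """Move per-distribution details onto the parent resource.

    Values are de-duplicated in first-seen order. Which format belongs to
    which URL, or which license to which file, is no longer recorded.
    """
    record = {"title": dataset["title"], "formats": [], "urls": [], "rights": []}
    for dist in dataset.get("distributions", []):
        for key, value in (("formats", dist.get("format")),
                           ("urls", dist.get("download_url")),
                           ("rights", dist.get("license"))):
            if value is not None and value not in record[key]:
                record[key].append(value)
    return record


gauges = {
    "title": "River gauges",
    "distributions": [
        {"format": "text/csv", "download_url": "https://example.org/gauges.csv",
         "license": "CC-BY-4.0"},
        {"format": "application/json", "download_url": "https://example.org/gauges.json",
         "license": "CC-BY-4.0"},
    ],
}
flat = flatten_to_parent(gauges)
print(flat["formats"])  # ['text/csv', 'application/json']
print(flat["rights"])   # ['CC-BY-4.0']
```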