- 1 DCAT in Context of Standards and related Work
- 1.1 Standards and related Work
- 1.1.1 Dublin Core
- 1.1.2 Asset Description Metadata Schema (ADMS)
- 1.1.3 Comprehensive Knowledge Archive Network (CKAN)
- 1.1.4 DatA Tag Suite (DATS)
- 1.1.5 DDI-RDF Discovery Vocabulary (Disco)
- 1.1.6 RDF Data Cube Vocabulary (QB)
- 1.1.7 Geographic Information – Metadata (ISO 19115)
- 1.1.8 Dataset Descriptions: Community Profile (HCLS)
- 1.1.9 Schema.org
- 1.1.10 Vocabulary of Interlinked Datasets (VoID)
- 1.1.11 DataCite
- 1.1.12 Research Data Alliance
- 1.2 Comparative analysis of the "Catalog" concept
- 1.3 Comparative analysis of the "Dataset" concept
- 1.4 Comparative analysis of the "Distribution" concept
Standards and related Work
Asset Description Metadata Schema (ADMS)
W3C Note, August 2013. ADMS is a DCAT profile focusing on "semantic assets" within a catalog; it is used especially for eGovernment systems (documents).
Comprehensive Knowledge Archive Network (CKAN)
DatA Tag Suite (DATS)
Metadata model defined as a set of modular JSON Schemas by the bioCADDIE WG3 (Descriptive Metadata for Datasets).
DDI-RDF Discovery Vocabulary (Disco)
Public working draft of a potential specification by the DDI Alliance (March 2015). RDF Schema vocabulary for publishing metadata about research and survey data on the Web based on DDI (Data Documentation Initiative) XML formats.
RDF Data Cube Vocabulary (QB)
W3C Recommendation, January 2014, for describing multi-dimensional data in RDF. It is compatible with the cube model of SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations.
Geographic Information – Metadata (ISO 19115)
ISO 19115-1:2014 defines the schema required for describing geographic information and services by means of metadata. It provides information about the identification, the extent, the quality, the spatial and temporal aspects, the content, the spatial reference, the portrayal, distribution, and other properties of digital geographic data and services.
Besides datasets and services, the types of "resources" that can be described in ISO 19115 include individual data items (e.g., features and attributes), software, initiatives, and others. For more details, see also ISO Scope Codes - NOAA Environmental Data Management Wiki.
GeoDCAT-AP provides a mapping from ISO 19115 to DCAT in the form of a formal DCAT profile.
Dataset Descriptions: Community Profile (HCLS)
This W3C Note proposes a way of describing structured data from the Health Care and Life Sciences domains. It is not a formal DCAT profile, although it could possibly be expressed as one.
Schema.org
Initiative launched in 2011 by search engine providers to create a shared set of vocabularies for structured data markup on web pages and elsewhere.
Vocabulary of Interlinked Datasets (VoID)
W3C Interest Group Note, March 2011, defining an RDF Schema vocabulary for expressing metadata about RDF datasets.
DataCite
Metadata schema designed for data citation purposes.
DataCite allows the description of different types of resources, including datasets, software, services, and events. The definition of each resource type is based on the corresponding Dublin Core definition, when one is available.
Research Data Alliance
- Data Description Registry Interoperability (DDRI) WG
- Data Versioning WG
- Research Data Collections WG
- Data Discovery Paradigms IG
Comparative analysis of the "Catalog" concept
In ISO 19115, a catalog is one of the possible services that can be described. It is defined as follows:
Service that provides discovery and management services on a store of metadata about instances.
Service-specific metadata elements are defined in a separate standard, namely, ISO 19119:2016. For the types of services defined in ISO 19119, see:
- INSPIRE Registry - Classification of spatial data services
- INSPIRE Registry - Spatial data service types
Comparative analysis of the "Dataset" concept
Dublin Core has a specific class, namely, dctype:Dataset, defined as follows:
Definition: Data encoded in a defined structure.
Comment: Examples include lists, tables, and databases. A dataset may be useful for direct machine processing.
The ADMS Asset is a subclass of dcat:Dataset and reflects the intellectual content and characteristics of an asset, independent of its physical embodiment. Various properties support containment (includedAsset) and linked lists (next, prev, last).
A CKAN dataset is a container consisting of any number of resources. The relations between these resources and their semantics are not explicit, i.e., the resources might represent different content dimensions, subsets, or file formats. Resources carry properties of DCAT's Distribution: a file format, and a file or link.
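The overlap between a CKAN dataset and the DCAT Dataset/Distribution pair can be sketched as follows. This is only an illustration, not an official CKAN-to-DCAT mapping: the sample record, the `ckan_to_dcat` helper, and the `example.org` URIs are hypothetical, and triples are plain Python tuples rather than objects from an RDF library.

```python
# Hypothetical CKAN-style dataset: a container holding any number of resources.
ckan_dataset = {
    "name": "air-quality",
    "title": "Air quality measurements",
    "resources": [
        {"url": "https://example.org/air-quality.csv", "format": "CSV"},
        {"url": "https://example.org/air-quality.json", "format": "JSON"},
    ],
}


def ckan_to_dcat(dataset, base="https://example.org/dataset/"):
    """Return (subject, predicate, object) triples for one CKAN-style dataset.

    Assumption: every resource is treated as a dcat:Distribution, although
    CKAN itself does not state how resources relate to one another.
    """
    ds_uri = base + dataset["name"]
    triples = [
        (ds_uri, "rdf:type", "dcat:Dataset"),
        (ds_uri, "dct:title", dataset["title"]),
    ]
    for index, resource in enumerate(dataset["resources"], start=1):
        dist_uri = f"{ds_uri}/distribution/{index}"
        triples += [
            (ds_uri, "dcat:distribution", dist_uri),
            (dist_uri, "rdf:type", "dcat:Distribution"),
            (dist_uri, "dcat:downloadURL", resource["url"]),
            (dist_uri, "dct:format", resource["format"]),
        ]
    return triples


triples = ckan_to_dcat(ckan_dataset)
```

Treating each resource as a distribution is the simplest reading; if the resources are really subsets or different content dimensions, a one-to-one mapping like this would misrepresent them.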
A DATS Dataset is a static or dynamic set of dimensions about an entity being observed. The DATS Dataset definition refers to the DCAT Dataset. Likewise, DATS Datasets link to DatasetDistribution and DataRepository, and recursive containment of Datasets is supported.
An RDF Data Cube (QB) DataSet is a collection of multi-dimensional statistical data (observations), possibly organized into various slices, conforming to a defined structure.
The Disco LogicalDataSet describes the contents of a data set resulting from a Study, i.e., the process by which a data set was generated or collected. It is a subclass of DCAT Dataset, and its content is organized into a set of Variables.
HCLS reuses the Dublin Core Dataset definition for the general content description ("Summary Level") and the versioned content description ("Version Level"), and uses void:Dataset and dcat:Distribution for the physical distribution. Containment of partitions is modeled via dct:hasPart.
A Schema.org Dataset represents a "body of structured information describing some topic(s) of interest". Like a DCAT Dataset, it links to a distribution object (DataDownload) and to a DataCatalog. Dataset takes its recursive containment structure and file format from CreativeWork. DataFeed is apparently a dynamic (and the only) subtype of Dataset.
A VoID Dataset is a meaningful collection of RDF triples that are published, maintained, or aggregated by a single provider. Such datasets either cover a certain topic, originate from a certain source or process, are hosted on a certain server, or are aggregated by a certain custodian. VoID Datasets link directly to their downloadable representations (sparqlEndpoint, dataDump). Containment partitions are expressed via void:subset; class-based and property-based partitions are also supported.
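Because void:subset can be nested, finding all partitions of a VoID Dataset means following the links transitively. The sketch below assumes the links have already been extracted into a plain dictionary; the `ex:` dataset names and the `all_subsets` helper are hypothetical and serve only to illustrate the containment structure.

```python
# Hypothetical void:subset links, stored as parent -> list of direct subsets.
void_subset = {
    "ex:AllData": ["ex:People", "ex:Places"],
    "ex:Places": ["ex:Cities"],
}


def all_subsets(dataset, links):
    """Collect every dataset reachable from `dataset` via void:subset."""
    found = []
    stack = list(links.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in found:  # guards against cycles in the data
            found.append(current)
            stack.extend(links.get(current, []))
    return found


partitions = sorted(all_subsets("ex:AllData", void_subset))
```

Here `partitions` contains both the direct subsets and the nested `ex:Cities` partition.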
A dataset is one of the possible resources that can be described in ISO 19115. The official definition:
identifiable collection of data
Note 1 to entry: A dataset can be a smaller grouping of data which, though limited by some constraint such as spatial extent or feature type, is located physically within a larger dataset. Theoretically, a dataset can be as small as a single feature (4.5) or feature attribute contained within a larger dataset. A hardcopy map or chart can be considered a dataset.
ISO 19115 also includes a resource type "dataset series", defined as follows:
collection of datasets (4.3) sharing common characteristics
A dataset is one of the possible resources that can be described in DataCite. For the notion of dataset, DataCite reuses the Dublin Core definition, i.e., "Data encoded in a defined structure."
Comparative analysis of the "Distribution" concept
ADMS Asset Distribution represents the physical embodiment of an Asset.
The Disco DataFile is the physical representation of the data and a subclass of DCAT Distribution.
[Official definition to be added]
It is worth noting that, unlike in DCAT, information concerning access and use conditions in ISO 19115 is attached to the parent resource (e.g., the dataset) and not to the distribution.
DataCite does not have a separate notion of distribution. All the information that DCAT attaches to distributions (format, download URL, license and rights) is attached in DataCite to the parent resource (e.g., the dataset).
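The practical consequence of this difference can be sketched with a small conversion that lifts distribution-level details onto the parent resource. Everything in the sketch is an assumption made for illustration: the sample record, the `to_resource_level` helper, and the output keys are hypothetical and do not reproduce the DataCite schema.

```python
# Hypothetical DCAT-style description: format and license sit on each distribution.
dcat_dataset = {
    "title": "Air quality measurements",
    "distributions": [
        {"format": "text/csv", "license": "CC-BY-4.0"},
        {"format": "application/json", "license": "CC-BY-4.0"},
    ],
}


def to_resource_level(dataset):
    """Move distribution-level details onto the parent resource.

    The output keys are illustrative only and are not the DataCite schema.
    """
    formats, rights = [], []
    for dist in dataset["distributions"]:
        if dist["format"] not in formats:
            formats.append(dist["format"])
        if dist["license"] not in rights:
            rights.append(dist["license"])
    return {"title": dataset["title"], "formats": formats, "rights": rights}


record = to_resource_level(dcat_dataset)
```

Note that the flattened record no longer says which format is available under which license, which is information a per-distribution model can keep.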