User:Rcygania2/SDMX and DDI

From RDF Working Group Wiki

RDF multigraph use cases from SDMX and DDI

This page provides some background on the SDMX and DDI standards, and lists use cases for RDF multi-graph support arising from the effort to express these standards in RDF.

Background

DDI and SDMX are two complementary XML-based standards widely used in the social and economic sciences and in national statistics. DDI is concerned with metadata that describes the production of statistical data (surveys, censuses, questionnaires, data cleanup, tabulation, etc.). SDMX is concerned with the dissemination and reporting of aggregated “cube” data. An RDF Schema expression of the core of SDMX has been created as the RDF Data Cube Vocabulary; a similar effort is currently under way for DDI.

Versioning

SDMX and DDI have strong notions of ownership and versioning. Artefacts such as code lists, question banks and data structure definitions are managed by an authority (the “maintenance agency”) and have, besides an authority-assigned “local” identifier, a numeric version identifier. The identity of an artefact consists of its agency, artefact type, local identifier, version number, and (in special cases) the identity of a parent artefact. Different versions of the same artefact can exist side by side in the same containing XML instance.
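Because the version number is part of an artefact's identity, two versions of one code list are simply two distinct keys. The sketch below models that composite identity as a hashable Python key. `ArtefactId`, `versions_of` and the sample agency, code list ID and code values are hypothetical and chosen only for illustration. They are not taken from the SDMX or DDI schemas.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArtefactId:
    """Hypothetical identity key for a maintained artefact: agency, type,
    local identifier, version and (in special cases) the parent artefact."""
    agency: str
    artefact_type: str
    local_id: str
    version: str
    parent: Optional["ArtefactId"] = None


# Two versions of the same (illustrative) code list coexist in one container,
# because the version is part of the key.
container = {
    ArtefactId("ESTAT", "Codelist", "CL_FREQ", "1.0"): {"A", "M"},
    ArtefactId("ESTAT", "Codelist", "CL_FREQ", "1.1"): {"A", "M", "Q"},
}


def versions_of(container, agency, artefact_type, local_id):
    """Return the sorted version strings held for one artefact."""
    return sorted(
        key.version
        for key in container
        if (key.agency, key.artefact_type, key.local_id)
        == (agency, artefact_type, local_id)
    )


print(versions_of(container, "ESTAT", "Codelist", "CL_FREQ"))  # ['1.0', '1.1']
```

A frozen dataclass is used so that identities are immutable and can serve directly as dictionary keys.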

Published and unpublished artefacts

DDI and SDMX allow an artefact to be marked as “published”; once published, it MUST NOT be changed unless a new version number is assigned.
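The publication rule can be enforced by whatever store holds the artefacts. The following is a minimal sketch under the assumption of a simple in-memory registry. The `Registry` class, `PublishedArtefactError` and the tuple key layout `(agency, type, local id, version)` are hypothetical. Unpublished versions may be overwritten, published ones are frozen, and changes go into a new version.

```python
class PublishedArtefactError(Exception):
    """Raised when a published artefact version would be modified in place."""


class Registry:
    """Hypothetical in-memory store keyed by (agency, type, local_id, version)."""

    def __init__(self):
        self._content = {}
        self._published = set()

    def put(self, key, content):
        # A published version is frozen: a change requires a new version number.
        if key in self._published:
            raise PublishedArtefactError(f"{key} is published; assign a new version")
        self._content[key] = content

    def publish(self, key):
        if key not in self._content:
            raise KeyError(key)
        self._published.add(key)

    def get(self, key):
        return self._content[key]


reg = Registry()
v1 = ("ESTAT", "Codelist", "CL_FREQ", "1.0")
reg.put(v1, {"A", "M"})
reg.put(v1, {"A", "M", "Q"})  # allowed: version 1.0 is not yet published
reg.publish(v1)
try:
    reg.put(v1, {"A"})  # rejected: version 1.0 is now published
except PublishedArtefactError as err:
    print("rejected:", err)
reg.put(("ESTAT", "Codelist", "CL_FREQ", "1.1"), {"A"})  # a new version is fine
```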

Composition

An actual SDMX or DDI instance is often composed of artefacts maintained by different agencies. For example, an SDMX dataset that reports national statistics may use a data structure definition maintained by a supranational statistics organization such as Eurostat, and code lists defined by various standards bodies. Each of these artefacts is versioned independently. When an artefact is referenced, the reference has to include the specific version. The exception is “late binding”, where the latest available version is assumed; this makes the setup easier to manage but more brittle.
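The difference between a version-pinned reference and late binding can be sketched as a small resolver. This assumes that version numbers are dot-separated integers compared numerically, which may not match every SDMX or DDI version format. The function names and the `DSD_EXAMPLE` artefact are hypothetical.

```python
from typing import Optional


def version_key(version: str) -> tuple:
    """Compare dotted versions numerically, so '1.10' sorts after '1.9'."""
    return tuple(int(part) for part in version.split("."))


def resolve(available, agency, artefact_type, local_id, version: Optional[str] = None):
    """Resolve a reference to a concrete (agency, type, local_id, version) key.

    version=None models late binding: the latest available version is chosen.
    """
    candidates = [key for key in available if key[:3] == (agency, artefact_type, local_id)]
    if version is not None:
        exact = (agency, artefact_type, local_id, version)
        if exact not in candidates:
            raise LookupError(f"{exact} is not available")
        return exact
    if not candidates:
        raise LookupError(f"no version of {(agency, artefact_type, local_id)} is available")
    return max(candidates, key=lambda key: version_key(key[3]))


available = {
    ("ESTAT", "DSD", "DSD_EXAMPLE", "1.9"),
    ("ESTAT", "DSD", "DSD_EXAMPLE", "1.10"),
}
pinned = resolve(available, "ESTAT", "DSD", "DSD_EXAMPLE", "1.9")
latest = resolve(available, "ESTAT", "DSD", "DSD_EXAMPLE")
print(pinned[3], latest[3])  # 1.9 1.10
```

If a version `1.11` later becomes available, the pinned reference still resolves to `1.9`, while the late-bound one silently moves to `1.11`. That silent shift is the brittleness described above.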

The XML specifications contain special elements that agencies can use to publish collections of re-usable artefacts.

When an actual SDMX dataset or DDI instance is processed, the processor must be able to retrieve the correct versions of all referenced artefacts and build a complete representation.
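One way to picture this processing step is a traversal that collects the root artefact and every transitively referenced artefact version, then merges their content. The sketch below assumes that each artefact version is stored as its own graph, modelled here as a plain set of triples. The separate graphs keep track of where each statement came from, while the merged set is the complete representation. All keys, triples and the `assemble` function are hypothetical placeholders, and no RDF library is used.

```python
# Hypothetical store: one graph (set of triples) per artefact version.
graphs = {
    ("NSO", "Dataset", "DS_GDP", "1.0"): {("ex:obs1", "ex:value", "42")},
    ("ESTAT", "DSD", "DSD_EXAMPLE", "2.0"): {("ex:dsd", "ex:dimension", "ex:freq")},
    ("ESTAT", "Codelist", "CL_FREQ", "1.1"): {("ex:freq", "ex:code", "A")},
}
# Version-pinned references from one artefact version to others.
references = {
    ("NSO", "Dataset", "DS_GDP", "1.0"): [("ESTAT", "DSD", "DSD_EXAMPLE", "2.0")],
    ("ESTAT", "DSD", "DSD_EXAMPLE", "2.0"): [("ESTAT", "Codelist", "CL_FREQ", "1.1")],
}


def assemble(root):
    """Collect the root and all transitively referenced artefact versions,
    then merge their graphs into one complete representation."""
    seen, stack = [], [root]
    while stack:
        key = stack.pop()
        if key in seen:
            continue  # each artefact version is included once
        if key not in graphs:
            raise LookupError(f"referenced artefact {key} is not available")
        seen.append(key)
        stack.extend(references.get(key, []))
    merged = set().union(*(graphs[k] for k in seen))
    return seen, merged


keys, triples = assemble(("NSO", "Dataset", "DS_GDP", "1.0"))
print(len(keys), len(triples))  # 3 3
```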