Data Cube Vocabulary

From Government Linked Data (GLD) Working Group Wiki
Revision as of 14:06, 22 March 2012 by Zwhitley (Talk | contribs)

The Data Cube Vocabulary is an RDF vocabulary for representing multi-dimensional “data cubes”. It builds on SKOS and is a simplified version of the core of the SDMX Information Model. The GLD Working Group is currently working towards advancing this vocabulary to a W3C Recommendation, in order to address the following work item from its charter:

“The group will produce a vocabulary, compatible with SDMX, for expressing some kinds of statistical data. This need not be as expressive as all of SDMX, but may provide a subset as in the RDF Data Cube vocabulary. It may also include ways to annotate data to indicate its assumptions and comparability.”

Documents and Deliverables

Issue tracking

Discussion and feedback

People

The following WG members are interested in this work:

The history of Data Cube, SDMX-RDF and SCOVO

SDMX-RDF started as an effort to translate the SDMX standard to RDF. But SDMX is massive, and many parts of the standard are not really relevant for the kind of web-based data publishing that RDF excels in. So we identified a core of SDMX that seemed most relevant for data publishing, called that core “Data Cube”, and published it separately. We modified the SDMX-RDF work-in-progress so that it uses Data Cube for these core parts, and SDMX-RDF native classes and properties for the other parts of SDMX. But work on SDMX-RDF has been pretty much dormant since, and all efforts have been focused on Data Cube.

SCOVO is an earlier effort at designing an RDF vocabulary for statistical data. It is very simple and has seen some adoption. However, it has a number of significant limitations: it doesn't describe the structure of the data cube, only its contents, and it describes the contents in a way that makes recovering the structure very hard. As a result, querying SCOVO data with SPARQL is rather difficult. SDMX-RDF and Data Cube are both designed with an eye on making SPARQL queries against the data easy.
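To illustrate why an explicitly declared structure makes querying easier, here is a minimal sketch in plain Python rather than RDF. It is not the vocabulary itself: the dictionary STRUCTURE only plays the role that a qb:DataStructureDefinition plays in Data Cube, and the property names (ex:refArea, ex:refPeriod, ex:population), the helper functions and all figures are hypothetical and chosen for illustration only.

```python
# Hypothetical stand-in for a qb:DataStructureDefinition: it names the
# dimensions and the measure that every observation is expected to carry.
STRUCTURE = {
    "dimensions": ("ex:refArea", "ex:refPeriod"),
    "measure": "ex:population",
}

# Hypothetical observations (made-up numbers), one value per dimension each.
OBSERVATIONS = [
    {"ex:refArea": "A", "ex:refPeriod": "2010", "ex:population": 100},
    {"ex:refArea": "A", "ex:refPeriod": "2011", "ex:population": 110},
    {"ex:refArea": "B", "ex:refPeriod": "2010", "ex:population": 200},
    {"ex:refArea": "B", "ex:refPeriod": "2011", "ex:population": 190},
]


def validate(structure, observations):
    """Return True if every observation has all declared components."""
    required = set(structure["dimensions"]) | {structure["measure"]}
    return all(required.issubset(obs) for obs in observations)


def select(structure, observations, fixed):
    """Return the measure values of observations matching the fixed dimensions."""
    unknown = set(fixed) - set(structure["dimensions"])
    if unknown:
        raise KeyError(f"not a declared dimension: {sorted(unknown)}")
    measure = structure["measure"]
    return [
        obs[measure]
        for obs in observations
        if all(obs[dim] == value for dim, value in fixed.items())
    ]


if __name__ == "__main__":
    print(validate(STRUCTURE, OBSERVATIONS))
    print(select(STRUCTURE, OBSERVATIONS, {"ex:refArea": "A"}))
```

Because the dimensions are declared up front, a query only needs to fix values for some of them and read off the measure; without such a declaration, the query author would first have to work out which properties act as dimensions.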

Further reading and material

Current users and published datasets

Work items

  • Document a set of use cases
  • Define exactly the subset of SDMX that should be covered
  • Update the specification to reflect SDMX 2.1 (it's currently based on 2.0)
  • Get in touch with SDMX sponsors to encourage participation in the GLD WG
  • Better documentation on the interplay with other vocabularies: SKOS, metadata, standard government URI sets
  • Discuss whether the SDMX Content-Oriented Guidelines are in scope
  • Discuss document structure – pure specification vs. cookbook/tutorial/guide

Current discussion topics on the publishing-statistical-data Google Group

These should be transformed into issues for the W3C tracker.

  • Representing aggregates using QB
  • Features introduced by ISO Extensions to SKOS
    • Classification levels. When to use a new level and when to reuse one. E.g., is skosclass:depth intrinsic to a level?
    • Renaming of skosclass:hasDomainConcept and skosclass:hasRangeConcept
  • Should qb:DimensionProperty be a subClassOf qb:CodedProperty?