The Data Cube Vocabulary is an RDF vocabulary for representing multi-dimensional “data cubes”. It builds on SKOS and is a simplified version of the core of the SDMX Information Model. The GLD Working Group is working towards advancing this vocabulary to a W3C Recommendation, in order to address the following work item from its charter:
“The group will produce a vocabulary, compatible with SDMX, for expressing some kinds of statistical data. This need not be as expressive as all of SDMX, but may provide a subset as in the RDF Data Cube vocabulary. It may also include ways to annotate data to indicate its assumptions and comparability.”
Documents and Deliverables
- The RDF Data Cube Vocabulary (Editor's Draft)
- Use Cases and Requirements for the Data Cube Vocabulary (Editor's Draft)
- wiki version: Data Cube Vocabulary: Use Cases
Discussion and feedback
- Comments mailing list: firstname.lastname@example.org (archive). All feedback is welcome!
- Working Group mailing list: email@example.com (archive). Only Working Group members can post to this list.
- Publishing Statistical Data Group: Data Cube can be discussed, like any other topic surrounding the publication of statistical data on the Web, in the publishing-statistical-data Google Group.
The following WG members are interested in this work:
The history of Data Cube, SDMX-RDF and SCOVO
SDMX-RDF started as an effort to translate the SDMX standard to RDF. But SDMX is massive, and many parts of the standard are not really relevant for the kind of web-based data publishing that RDF excels in. So we identified a core of SDMX that seemed most relevant for data publishing, called that core “Data Cube”, and published it separately. We modified the SDMX-RDF work-in-progress so that it uses Data Cube for these core parts, and SDMX-RDF native classes and properties for the other parts of SDMX. But work on SDMX-RDF has been pretty much dormant since, and all efforts have been focused on Data Cube.
SCOVO is an earlier effort at designing an RDF vocabulary for statistical data. It is very simple and has seen some adoption, but it has significant limitations: it describes only the contents of the data cube, not its structure, and it describes the contents in a way that makes recovering the structure very hard. As a result, querying SCOVO data with SPARQL is rather difficult. SDMX-RDF and Data Cube are both designed to make SPARQL queries against the data easy.
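To illustrate why an explicit structure description helps querying, here is a minimal sketch in plain Python. It is a stand-in for a SPARQL query, not something taken from the specification. The qb: terms (qb:DataSet, qb:structure, qb:component, qb:dimension, qb:measure, qb:Observation, qb:dataSet) come from the Data Cube vocabulary; everything in the ex: namespace, the helper function names, and the numeric values are hypothetical and made up for illustration.

```python
# Triples are plain (subject, predicate, object) tuples.
# All "ex:" names and all values below are hypothetical.
TRIPLES = {
    # Structure: which properties are dimensions and which are measures.
    ("ex:dataset", "rdf:type", "qb:DataSet"),
    ("ex:dataset", "qb:structure", "ex:dsd"),
    ("ex:dsd", "rdf:type", "qb:DataStructureDefinition"),
    ("ex:dsd", "qb:component", "ex:c1"),
    ("ex:c1", "qb:dimension", "ex:refArea"),
    ("ex:dsd", "qb:component", "ex:c2"),
    ("ex:c2", "qb:dimension", "ex:refPeriod"),
    ("ex:dsd", "qb:component", "ex:c3"),
    ("ex:c3", "qb:measure", "ex:population"),
    # Contents: each observation carries its dimension values directly.
    ("ex:o1", "rdf:type", "qb:Observation"),
    ("ex:o1", "qb:dataSet", "ex:dataset"),
    ("ex:o1", "ex:refArea", "ex:A"),
    ("ex:o1", "ex:refPeriod", "2010"),
    ("ex:o1", "ex:population", 10),
    ("ex:o2", "rdf:type", "qb:Observation"),
    ("ex:o2", "qb:dataSet", "ex:dataset"),
    ("ex:o2", "ex:refArea", "ex:A"),
    ("ex:o2", "ex:refPeriod", "2011"),
    ("ex:o2", "ex:population", 12),
    ("ex:o3", "rdf:type", "qb:Observation"),
    ("ex:o3", "qb:dataSet", "ex:dataset"),
    ("ex:o3", "ex:refArea", "ex:B"),
    ("ex:o3", "ex:refPeriod", "2010"),
    ("ex:o3", "ex:population", 20),
}


def objects(triples, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(triples, predicate, obj):
    """All subjects of triples with the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def components(triples, dataset, kind):
    """Properties the dataset's structure declares as `kind`
    ("qb:dimension" or "qb:measure")."""
    found = set()
    for dsd in objects(triples, dataset, "qb:structure"):
        for comp in objects(triples, dsd, "qb:component"):
            found |= objects(triples, comp, kind)
    return found


def slice_cube(triples, dataset, fixed):
    """Observations of `dataset` whose dimensions match `fixed`,
    returned as dicts ordered by observation identifier."""
    dims = components(triples, dataset, "qb:dimension")
    wanted = dims | components(triples, dataset, "qb:measure")
    unknown = set(fixed) - dims
    if unknown:
        raise ValueError(f"not declared as dimensions: {sorted(unknown)}")
    rows = []
    for obs in sorted(subjects(triples, "qb:dataSet", dataset)):
        row = {p: o for s, p, o in triples if s == obs and p in wanted}
        if all(row.get(d) == v for d, v in fixed.items()):
            rows.append(row)
    return rows


rows = slice_cube(TRIPLES, "ex:dataset", {"ex:refArea": "ex:A"})
print([row["ex:population"] for row in rows])  # prints [10, 12]
```

Because the dimensions and measures are declared in the structure, the slicing code never has to guess which properties of an observation are dimensions. Without that declaration, as in the SCOVO case described above, a consumer would first have to reconstruct this information from the observations themselves.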
Further reading and material
- Dave Reynolds' slideset (presented at SemTech 2011)
- Richard Cyganiak's slideset (presented at first GLD WG face-to-face meeting)
- Epimorphics Ltd. white paper
- Code repository for SDMX-RDF and Data Cube
- Google Group for SDMX-RDF and Data Cube
- There is a chapter on statistical data and Data Cube in the Linked Government Data book
Current users and published datasets
- List of datasets using the RDF Data Cube Vocabulary
- Datasets tagged “qb” on the Data Hub
- DERI's Eurostat conversion
- (there are more; Richard keeps a list somewhere)
- Document a set of use cases
- Define exactly the subset of SDMX that should be covered
- Update the specification to reflect the latest updates in SDMX 2.1 (it's currently based on 2.0)
- Get in touch with SDMX sponsors to encourage participation in the GLD WG
- Better documentation on the interplay with other vocabularies: SKOS, metadata, standard government URI sets
- Discuss whether the SDMX Content-Oriented Guidelines are in scope
- Discuss document structure – pure specification vs. cookbook/tutorial/guide
Current discussion topics in the publishing-statistical-data Google Group
These should be transformed into issues for the W3C tracker.
- Representing aggregates using QB
- Features introduced by the ISO extensions to SKOS
- Classification levels: when to use a new level and when to reuse one. E.g., is skosclass:depth intrinsic to a level?
- Renaming of skosclass:hasDomainConcept and skosclass:hasRangeConcept
- Should qb:DimensionProperty be a subclass of qb:CodedProperty?