[QB] ISSUE-29 Well-formedness

ISSUE-29 [1] is the open issue that entails the most work, but it is
one we don't want to drop.

The issue is about defining constraints on what a "well formed" Data 
Cube should look like when published, which consuming software can then 
rely on.

This is about more than the usual semantic constraints on the vocabulary
because in this domain we also want some notion of closed-world
completeness. If expected assertions are missing, that is not an
inconsistency from an OWL point of view (under the open-world assumption
they are simply unknown), but it is an interoperability problem for
Data Cube.

I think we need to agree on:
   1. roughly what constraints we wish to impose
   2. the approach for expressing constraints
   3. then the precise details of the set of constraints

This note is to sketch out a rough approach before getting too far down 
into the details.

# Outline approach

1. Define a set of expansion rules which can derive a full Data Cube
from an abbreviated form.

2. Define a "well formed abbreviated Data Cube" as one which, when 
expanded using the expansion rules, is a "well formed Data Cube".

3. The expansion rules will be expressed as SPARQL 1.1 CONSTRUCT
queries, similar to what we did for ORG. We define an order (possibly
iterative, but that may not be needed) in which the CONSTRUCT queries
are applied. At each stage the result is the union of the source Data
Cube graph and the graph generated by the CONSTRUCT.

4. The primary expansion rules will be:
    - expansion of components which have been abbreviated through use of 
qb:componentAttachment
    - propagation of dimensions given on a qb:Slice to each
qb:Observation within that slice
    - deduction of some implicit rdf:type assertions from domain/range 
constraints
    - (possibly) deduction of a default value for qb:componentRequired
for any component which is not explicitly marked as optional (some
details around measures on multi-measure cubes to sort out here)

5. A "well formed Data Cube" is an RDF graph which uses elements from 
the RDF Data Cube vocabulary and for which every SPARQL ASK query in a 
set of validation rules (see later) returns false. In addition the RDF 
graph should be consistent under RDF D-entailment using the XSD datatype 
map (as defined in RDF Semantics).

6. Within the context of a particular data interchange, additional
constraints may be imposed beyond Data Cube well-formedness. In
particular, such an interchange may require that the Data Cube graph,
together with some import closure of ontologies, also be consistent
under OWL with RDF or DL semantics. The Data Cube specification itself
does not require this, nor does it specify any mechanism to declare the
relevant semantics or ontologies to import beyond those available in
existing W3C standards.
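
To make steps 3 and 4 a little more concrete, here is a minimal sketch
of the "apply rules in order, union each result with the source" idea.
It is Python over a plain set of triples standing in for SPARQL 1.1
CONSTRUCT, not the proposed mechanism itself. The ex: names, the
function names and the representation of a graph as a set of
(subject, predicate, object) tuples are all assumptions made for
illustration only.

```python
# A graph is a set of (subject, predicate, object) tuples.
# Prefixed names are plain strings; the ex: terms are hypothetical.

def propagate_slice_dimensions(graph):
    """CONSTRUCT-style rule: copy dimension values from a slice to its observations."""
    dimensions = {s for (s, p, o) in graph
                  if p == "rdf:type" and o == "qb:DimensionProperty"}
    derived = set()
    for (slice_, p, obs) in graph:
        if p != "qb:observation":
            continue
        for (s, dim, value) in graph:
            if s == slice_ and dim in dimensions:
                derived.add((obs, dim, value))
    return derived


def expand(graph, rules):
    """Apply rules in order; each stage yields the union of input and derived triples."""
    result = set(graph)
    for rule in rules:
        result |= rule(result)
    return result


source = {
    ("ex:refArea", "rdf:type", "qb:DimensionProperty"),
    ("ex:slice1", "ex:refArea", "ex:Wales"),
    ("ex:slice1", "qb:observation", "ex:obs1"),
    ("ex:slice1", "qb:observation", "ex:obs2"),
    ("ex:obs1", "ex:lifeExpectancy", "76.7"),
}
expanded = expand(source, [propagate_slice_dimensions])
```

Because every stage only adds triples, the source graph is always a
subset of the expanded graph, which matches the "union" wording in
step 3. The other expansion rules in step 4 would be further functions
in the same list.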

# Sketch of validation checks

a. Every qb:Observation has precisely one qb:dataSet property (no 
orphaned observations).

b. Every qb:DataSet has precisely one qb:structure property (all data
sets have a data structure definition).

c. For every qb:Observation o :-
      For every qb:component (cp) within the qb:DataStructureDefinition 
of the qb:DataSet of o which is marked as qb:componentRequired true :-
        o has a value for cp

[This will need some modifications in the case of multi-measure cubes
which use MeasureDimensions; I am still working through the details.]

d. For each qb:Slice which has a qb:sliceStructure value (sk) :-
     for each qb:componentProperty of sk (cp) :-
        the qb:Slice should have a value for cp

e. If the Data Cube is a measure dimension multi-measure cube then every 
qb:Observation has a value for qb:measureType and a value for only one 
measure.

f. Every qb:DimensionProperty must have a declared rdfs:range, and if
that range is skos:Concept it must have an associated qb:codeList. The
range may be an xsd data type.
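
As an illustration of how checks like (a) and (b) fit the "every ASK
returns false" definition, here is a minimal sketch. Again it is Python
over a set of triples standing in for SPARQL ASK queries; the helper
names and ex: terms are hypothetical, and it only covers the two
cardinality checks, not D-entailment consistency or the remaining
checks.

```python
# A graph is a set of (subject, predicate, object) tuples.
# Each check mimics an ASK query: it returns True when a violation exists.

def ask_not_exactly_one(graph, rdf_class, prop):
    """True if some instance of rdf_class does not have exactly one value for prop."""
    instances = {s for (s, p, o) in graph
                 if p == "rdf:type" and o == rdf_class}
    for node in instances:
        values = {o for (s, p, o) in graph if s == node and p == prop}
        if len(values) != 1:
            return True
    return False


CHECKS = {
    "a": lambda g: ask_not_exactly_one(g, "qb:Observation", "qb:dataSet"),
    "b": lambda g: ask_not_exactly_one(g, "qb:DataSet", "qb:structure"),
}


def failed_checks(graph):
    """Names of the checks whose ASK-style query found a violation."""
    return sorted(name for name, ask in CHECKS.items() if ask(graph))


def is_well_formed(graph):
    """Well formed (for these checks only) when every check returns False."""
    return not failed_checks(graph)


good = {
    ("ex:obs1", "rdf:type", "qb:Observation"),
    ("ex:obs1", "qb:dataSet", "ex:ds1"),
    ("ex:ds1", "rdf:type", "qb:DataSet"),
    ("ex:ds1", "qb:structure", "ex:dsd1"),
}
orphan = good | {("ex:obs2", "rdf:type", "qb:Observation")}
```

Note that the checks only see nodes with an explicit rdf:type, which is
one reason the rdf:type deduction in the expansion rules needs to run
before validation.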

Does that seem like a reasonable approach?

Any immediately obvious problems or holes with the outline checks?

Dave

[1] http://www.w3.org/2011/gld/track/issues/29

Received on Thursday, 28 February 2013 15:04:32 UTC