Re: [QB] ISSUE-29 Well-formedness

On 28 Feb 2013, at 15:04, Dave Reynolds wrote:
> # Outline approach
> 
> 1. Define a set of expansion rules which can derive a full Data Cube from abbreviated format.
> 
> 2. Define a "well formed abbreviated Data Cube" as one which, when expanded using the expansion rules, is a "well formed Data Cube".
> 
> 3. The expansion rules will be expressed as SPARQL 1.1 CONSTRUCT expressions. Similar to what we did on ORG. We define an order (possibly iterative but that may not be needed) in which the CONSTRUCT expressions are applied. At each stage the result is the union of the source Data Cube graph and the graph generated by the CONSTRUCT.
> 
> 4. The primary expansion rules will be:
>   - expansion of components which have been abbreviated through use of qb:componentAttachment
>   - propagation of dimensions given on a qb:Slice to each qb:observation within that slice

+1 till here

>   - deduction of some implicit rdf:type assertions from domain/range constraints

I would prefer not to reverse engineer RDFS entailment in SPARQL CONSTRUCT queries. Can we treat RDFS entailment as orthogonal to the Cube rules?

>   - (possibly) deduction of default value for qb:componentRequired for any component which is not explicitly marked as optional (some details around measures on multi-measure cubes to sort here)

Hm, I never realized that the default for this property is "true". Would have preferred it to be "false". In other words, if it's not explicitly stated, then you cannot assume that it's required. Have to think more about this one.

> 5. A "well formed Data Cube" is an RDF graph which uses elements from the RDF Data Cube vocabulary and for which every SPARQL ASK query in a set of validation rules (see later) returns false. In addition the RDF graph should be consistent under RDF D-entailment using the XSD datatype map (as defined in RDF Semantics).

Isn't it simpler to define the rules in a way that the cube is valid if the query returns *true*? If false is indeed simpler, then fine :-)

I'd prefer if we say that the datatype map must contain all datatypes actually used in the cube. Thus, if I use some literals with custom datatypes, these literals must be valid too. (An implementation that doesn't know about your custom datatype would probably display a warning that it didn't attempt to validate these literals.)

> 6. Within the context of a particular data interchange additional constraints may be imposed beyond Data Cube well formedness. In particular such interchange may require that the Data Cube graph, together with some import closure of ontologies, also be consistent under OWL with RDF or DL semantics. The Data Cube specification itself does not require this or specify any mechanism to declare the relevant semantics or ontologies to import beyond those available in existing W3C standards.

+1

> # Sketch of validation checks
> 
> a. Every qb:Observation has precisely one qb:dataSet property (no orphaned observations).
> 
> b. Every qb:DataSet has precisely one qb:structure property (all data sets have a data structure definition)
> 
> c. For every qb:Observation o :-
>     For every qb:component (cp) within the qb:DataStructureDefinition of the qb:DataSet of o which is marked as qb:componentRequired true :-
>       o has a value for cp

Exactly one value?
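
To make the "exactly one" reading concrete, here is a rough sketch of check (c) in plain Python rather than SPARQL. It is only an illustration under assumptions that are mine, not anything agreed: the graph is a set of (subject, predicate, object) tuples with prefixed names as plain strings, each component specification points at its property via qb:componentProperty, and only components explicitly marked qb:componentRequired true are checked (so the question of the default is left open). The function names are hypothetical.

```python
def objects(graph, s, p):
    """All objects o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


def required_value_violations(graph):
    """Return (observation, property, value_count) for every required
    component that does not have exactly one value on an observation."""
    violations = []
    for (obs, p, ds) in graph:
        if p != "qb:dataSet":
            continue
        for dsd in objects(graph, ds, "qb:structure"):
            for spec in objects(graph, dsd, "qb:component"):
                # Assumption: only explicitly required components are checked.
                if True not in objects(graph, spec, "qb:componentRequired"):
                    continue
                for prop in objects(graph, spec, "qb:componentProperty"):
                    n = len(objects(graph, obs, prop))
                    if n != 1:
                        violations.append((obs, prop, n))
    return sorted(violations)
```

An empty result means the check passes; a count of 0 is a missing value and a count above 1 is the case I am asking about.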

> [This will need some modifications in the case of multi-measure cubes which use MeasureDimensions, working through the details.]

Yeah... I'm also scratching my head about multi-measure cubes.

> d. For each qb:Slice which has a qb:sliceStructure value (sk) :-
>    for each qb:componentProperty of sk (cp) :-
>       the qb:Slice should have a value for cp
> 
> e. If the Data Cube is a measure dimension multi-measure cube then every qb:Observation has a value for qb:measureType and a value for only one measure.
> 
> g. Every qb:DimensionProperty must have a declared rdfs:range and if that range is skos:Concept

... or a subclass of skos:Concept ... ?

> it must have an associated qb:codeList. The range may be an xsd data type

I quite strongly feel that custom data types should be allowed too.

> Does that seem like a reasonable approach?
> 
> Any immediately obvious problems or holes with the outline checks?

If a dimension property has a qb:codeList, then the value of the dimension property on every observation must be in the code list.

There must be at least one measure among the components of a structure.

Two observations in the same cube must not have exactly the same values for all dimensions.

I think a cube where a dimension is optional doesn't make any sense... So a well-formed cube should have componentRequired true for all dimensions.

Slice must have exactly one sliceStructure (modulo the other issue)

SliceKey must be connected to DSD

SliceKey components must be a subset of the DSD's components

If A qb:slice B and B qb:observation C then C qb:dataSet A

qb:order on components within a dataset must be consecutive integers starting with 1

That's all I can think of at this time. Multi-measure cube stuff is still missing. If any of those things are hard to formalize in SPARQL, at least we should be really clear about the rule in prose.
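
In case it helps pin down the prose, here is a rough sketch of three of the rules above (no duplicate observations, slice/data set consistency, and qb:order) in plain Python rather than SPARQL. It is an illustration only, under assumptions of mine: the graph is a set of (subject, predicate, object) tuples with prefixed names as plain strings, the list of dimension properties is passed in by the caller, and the qb:order rule is read as applying to whichever qb:order values are actually present. All function names are hypothetical.

```python
def objects(graph, s, p):
    """All objects o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


def duplicate_observations(graph, dimensions):
    """Pairs of observations in the same data set whose values agree
    on every one of the given dimension properties."""
    seen, dupes = {}, []
    pairs = sorted((s, o) for (s, p, o) in graph if p == "qb:dataSet")
    for obs, ds in pairs:
        key = (ds, tuple(frozenset(objects(graph, obs, d)) for d in dimensions))
        if key in seen:
            dupes.append((seen[key], obs))
        else:
            seen[key] = obs
    return dupes


def slice_dataset_violations(graph):
    """(A, B, C) where A qb:slice B and B qb:observation C,
    but C qb:dataSet A is missing."""
    out = []
    for (a, p, b) in graph:
        if p != "qb:slice":
            continue
        for c in objects(graph, b, "qb:observation"):
            if (c, "qb:dataSet", a) not in graph:
                out.append((a, b, c))
    return sorted(out)


def order_is_consecutive(graph, dsd):
    """True if the qb:order values on the DSD's components are 1, 2, ..., n.
    Assumption: components without a qb:order are simply ignored."""
    orders = sorted(
        o
        for spec in objects(graph, dsd, "qb:component")
        for o in objects(graph, spec, "qb:order")
    )
    return orders == list(range(1, len(orders) + 1))
```

The first two return the offending resources rather than a boolean, which is closer to the "query returns false when the cube is fine" style: an empty result means the rule holds.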

Best,
Richard



> 
> Dave
> 
> [1] http://www.w3.org/2011/gld/track/issues/29
> 
> 

Received on Saturday, 2 March 2013 19:44:24 UTC