Re: [QB] ISSUE-29 Well-formedness

On 02/03/13 19:30, Richard Cyganiak wrote:
> On 28 Feb 2013, at 15:04, Dave Reynolds wrote:
>> # Outline approach
>>
>> 1. Define a set of expansion rules which can derive a full Data Cube from abbreviated format.
>>
>> 2. Define a "well formed abbreviated Data Cube" as one which, when expanded using the expansion rules, is a "well formed Data Cube".
>>
>> 3. The expansion rules will be expressed as SPARQL 1.1 CONSTRUCT expressions. Similar to what we did on ORG. We define an order (possibly iterative but that may not be needed) in which the CONSTRUCT expressions are applied. At each stage the result is the union of the source Data Cube graph and the graph generated by the CONSTRUCT.
>>
>> 4. The primary expansion rules will be:
>>    - expansion of components which have been abbreviated through use of qb:componentAttachment
>>    - propagation of dimensions given on a qb:Slice to each qb:observation within that slice
>
> +1 till here
>
>>    - deduction of some implicit rdf:type assertions from domain/range constraints
>
> I would prefer not to reverse engineer RDFS entailment in SPARQL construct queries. Can we treat RDFS entailment as orthogonal to the Cube rules?

We certainly could.

My concern is that if we require readers to understand RDFS entailment 
in order to understand what a normalized Data Cube looks like, then that 
might be off-putting.

In essence I just want to say that you can omit rdf:type statements on 
observations, slices and on dsd components that are referenced using the 
qb:dimension/qb:measure/qb:attribute properties.

My preference would be to include specific closure rules for those cases 
just to make it clear that those types may be omitted, but to include a 
narrative note that they are implied by normal RDFS closure.

Would that be acceptable?
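
For concreteness, one such closure rule might look like the following 
(just a sketch of the idea, not proposed final wording; the analogous 
rules for qb:Slice and for components referenced via 
qb:dimension/qb:measure/qb:attribute would follow the same pattern):

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>

# Sketch: make the implicit rdf:type of observations explicit.
# Anything with a qb:dataSet value is a qb:Observation.
CONSTRUCT {
    ?obs a qb:Observation .
} WHERE {
    ?obs qb:dataSet ?ds .
}
```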

>>    - (possibly) deduction of default value for qb:componentRequired for any component which is not explicitly marked as optional (some details around measures on multi-measure cubes to sort here)
>
> Hm, I never realized that the default for this property is "true". Would have preferred it to be "false". In other words, if it's not explicitly stated, then you cannot assume that it's required. Have to think more about this one.

My reading of the first bullet in 6.4 [1] is that we are saying if it is 
optional you must specify "false" which seems to imply that the default 
is true. However, it isn't clear - which is one reason for wanting to 
pin things down in closure rules :)

[1] https://dvcs.w3.org/hg/gld/raw-file/default/data-cube/index.html#dsd-dsd

I assume that "required" ought to be "true" for all Dimensions at least. 
We could have a different default for Dimensions than for Attributes but 
that might be confusing.
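
If we do pin it down in a closure rule, it could be as simple as this 
sketch (assuming a uniform default of true; the multi-measure subtleties 
would still need handling separately):

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>

# Sketch: default qb:componentRequired to true for any component
# specification that does not state it explicitly.
CONSTRUCT {
    ?comp qb:componentRequired true .
} WHERE {
    ?dsd qb:component ?comp .
    FILTER NOT EXISTS { ?comp qb:componentRequired ?req }
}
```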

>> 5. A "well formed Data Cube" is an RDF graph which uses elements from the RDF Data Cube vocabulary and for which every SPARQL ASK query in a set of validation rules (see later) returns false. In addition the RDF graph should be consistent under RDF D-entailment using the XSD datatype map (as defined in RDF Semantics).
>
> Isn't it simpler to define the rules in a way that the cube is valid if the query returns *true*? If false is indeed simpler, then fine :-)

I'll reserve the right to pick whichever version proves easiest when I 
work through the details :) (hopefully later today).

> I'd prefer if we say that the datatype map must contain all datatypes actually used in the cube. Thus, if I use some literals with custom datatypes, these literals must be valid too. (An implementation that doesn't know about your custom data type would probably display a warning that it didn't attempt to validate these literals.)

+1

>> 6. Within the context of a particular data interchange additional constraints may be imposed beyond Data Cube well formedness. In particular such interchange may require that the Data Cube graph, together with some import closure of ontologies, also be consistent under OWL with RDF or DL semantics. The Data Cube specification itself does not require this or specify any mechanism to declare the relevant semantics or ontologies to import beyond those available in existing W3C standards.
>
> +1
>
>> # Sketch of validation checks
>>
>> a. Every qb:Observation has precisely one qb:dataSet property (no orphaned observations).
>>
>> b. Every qb:DataSet has precisely one qb:structure property (all data sets have a data structure definition)
>>
>> c. For every qb:Observation o :-
>>      For every qb:component (cp) within the qb:DataStructureDefinition of the qb:DataSet of o which is marked as qb:componentRequired true :-
>>        o has a value for cp
>
> Exactly one value?

True.
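
For the record, check (c) as an ASK might look like this sketch 
(following the convention that the query matches violations; whether 
true or false signals "well formed" is still open per the above):

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>

# Sketch of check (c): matches if some observation lacks a value for
# a required component of its data set's structure.
ASK {
    ?obs qb:dataSet ?ds .
    ?ds qb:structure ?dsd .
    ?dsd qb:component ?comp .
    ?comp qb:componentRequired true ;
          qb:componentProperty ?cp .
    FILTER NOT EXISTS { ?obs ?cp [] }
}
```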

>> [This will need some modifications in the case of multi-measure cubes which use MeasureDimensions, working through the details.]
>
> Yeah... I'm also scratching my head about multi-measure cubes.
>
>> d. For each qb:Slice which has a qb:sliceStructure value (sk) :-
>>     for each qb:componentProperty of sk (cp) :-
>>        the qb:Slice should have a value for cp
>>
>> e. If the Data Cube is a measure dimension multi-measure cube then every qb:Observation has a value for qb:measureType and a value for only one measure.
>>
>> g. Every qb:DimensionProperty must have a declared rdfs:range and if that range is skos:Concept
>
> ... or a subclass of skos:Concept ... ?

True.

>> it must have an associated qb:codeList. The range may be an xsd data type
>
> I quite strongly feel that custom data types should be allowed too.

OK.
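
With both amendments folded in, the code-list part of check (g) might be 
sketched as (the subclass closure via rdfs:subClassOf* assumes the 
relevant class hierarchy is present in the graph being checked):

```sparql
PREFIX qb:   <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Sketch of check (g): matches if some dimension whose range is
# skos:Concept (or a subclass of it) lacks an associated qb:codeList.
ASK {
    ?dim a qb:DimensionProperty ;
         rdfs:range ?range .
    ?range rdfs:subClassOf* skos:Concept .
    FILTER NOT EXISTS { ?dim qb:codeList ?cl }
}
```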

>> Does that seem like a reasonable approach?
>>
>> Any immediately obvious problems or holes with the outline checks?
>
> If a dimension property has a qb:codeList, then the value of the dimension property on every observation must be in the code list.
>
> There must be at least one measure among the components of a structure.
>
> Two observations in the same cube must not have exactly the same values for all dimensions.
>
> I think a cube where a dimension is optional doesn't make any sense... So a well-formed cube should have componentRequired true for all dimensions.
>
> Slice must have exactly one sliceStructure (modulo the other issue)
>
> SliceKey must be connected to DSD
>
> SliceKey components must be subset of the DSD's component
>
> if A qb:slice B and B qb:observation C then C qb:dataSet A
>
> qb:order on components within a dataset must be consecutive integers starting with 1

Good list, thanks.
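
Several of those should translate straightforwardly. For example, the 
no-duplicate-observations rule might be sketched as (it matches when two 
distinct observations in the same data set agree on every dimension 
that both have a value for, so it would need refinement once we settle 
the required-dimension question):

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>

# Sketch: matches if two distinct observations in the same data set
# have no dimension on which their values differ.
ASK {
    ?obs1 qb:dataSet ?ds .
    ?obs2 qb:dataSet ?ds .
    FILTER (?obs1 != ?obs2)
    FILTER NOT EXISTS {
        ?ds qb:structure/qb:component/qb:dimension ?dim .
        ?obs1 ?dim ?v1 .
        ?obs2 ?dim ?v2 .
        FILTER (?v1 != ?v2)
    }
}
```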

> That's all I can think of at this time. Multi-measure cube stuff is still missing. If any of those things are hard to formalize in SPARQL, at least we should be really clear about the rule in prose.

Agreed.

Dave

Received on Sunday, 3 March 2013 15:52:08 UTC