RDF Datacube for Coverages

From Spatial Data on the Web Working Group
Now a W3C Note

<https://www.w3.org/TR/qb4st/>

Original discussion follows:

Introduction to QB4ST

QB4ST is an extension to RDF-QB to provide mechanisms for defining spatio-temporal aspects of dimension and measure descriptions.

QB4ST is intended to enable the development of semantic descriptions of specific spatio-temporal data elements by appropriate communities of interest, rather than to enumerate a static list of such definitions. It provides a minimal ontology of spatio-temporal properties and defines abstract classes for datacube components (i.e. dimensions and measures) that use these, to allow classification and discovery of specialised component definitions using general terms.

QB4ST does not enumerate all possible specialised QB components, or define how such components may be managed or published as Web resources. It is, however, designed to support the publication of re-usable and comparable definitions using a registration model that allows governance to be delegated to appropriate communities of practice.

Current work is accessible via GitHub at <https://github.com/rob-metalinkage/sdw/tree/gh-pages/coverages/qb4st>.

A proof-of-concept implementation of a registry of QB-component specialisations is deployed at <http://resources.opengeospatial.org/def/qbcomponents>. This implementation uses Linked Data principles to offer several possible views of such definitions. These mechanisms are not the subject of this Note and are subject to change.

Rationale:

The RDF Datacube vocabulary (QB) was designed to support the publication of statistical data (specifically, data within the scope of the SDMX standard) as RDF. It provides some basic concepts that apply to all multi-dimensional data, and envisions such use:

"This cube model is very general and so the Data Cube vocabulary can be used for other data sets such as survey data, spreadsheets and OLAP data cubes [OLAP]."

From <https://www.w3.org/TR/vocab-data-cube/>

QB contains definitions to support the attachment of metadata to observations, directly or indirectly through "structural metadata":

"Structural metadata. Having located an observation, we need certain metadata in order to be able to interpret it. What is the unit of measurement? Is it a normal value or a series break? Is the value measured or estimated? These metadata are provided as attributes and can be attached to individual observations, or to higher levels."

From <https://www.w3.org/TR/vocab-data-cube/>

By defining "higher level structural metadata" one can use QB to describe aspects of the structure of a set of observations, independently of the encoding of the observations themselves. If we treat the observation values as URL references, QB can be used to exchange information about any multidimensional data accessible on the Web, in any format.
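A minimal sketch of an observation whose value is a URL reference might look like the following. The `ex:` namespace, the `ex:coverageData` measure, and the file URL are hypothetical names introduced only for illustration; only the `qb:` terms come from the Data Cube vocabulary, and the data structure definition is omitted for brevity.

```turtle
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# Hypothetical measure whose value is a link to externally encoded data.
ex:coverageData a qb:MeasureProperty ;
    rdfs:label "coverage data"@en ;
    rdfs:range rdfs:Resource .

ex:dataset1 a qb:DataSet .

# The observation carries a URL instead of an inline literal value,
# so the referenced data can be in any format.
ex:obs1 a qb:Observation ;
    qb:dataSet ex:dataset1 ;
    ex:coverageData <http://example.org/data/tile-42.nc> .
```

The structural description stays in RDF while the bulk data remains in whatever encoding the publisher already uses.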

The semantics of QB distinguishes between three different types of properties of an observation:


"A cube is organized according to a set of dimensions, attributes and measures. We collectively call these components. The dimension components serve to identify the observations. A set of values for all the dimension components is sufficient to identify a single observation. Examples of dimensions include the time to which the observation applies, or a geographic region which the observation covers. The measure components represent the phenomenon being observed. The attribute components allow us to qualify and interpret the observed value(s). They enable specification of the units of measure, any scaling factors and metadata such as the status of the observation (e.g. estimated, provisional)."

From <https://www.w3.org/TR/vocab-data-cube/#cubes-model>

Components share a set of properties (all optional in the QB model), including:

  • qb:concept: the underlying semantic concept being represented, such as time or space
  • qb:codeList: a binding to a set of (potentially hierarchical) terms that denote values within the space defined by qb:concept
  • rdfs:range: an RDFS datatype designator for component values


Note: qb:concept can be multivalued, referencing concepts from many possible vocabularies. It is less obvious what the semantics of a multi-valued qb:codeList would be. Multiple values of rdfs:range must all hold, so the effective range is the intersection of the sets.
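As an illustration, two component definitions using these properties could be sketched as follows. All `ex:` names are hypothetical; the `qb:`, `skos:`, `rdfs:` and `xsd:` terms are from the published vocabularies.

```turtle
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# Hypothetical concepts and code list.
ex:timeConcept a skos:Concept ; skos:prefLabel "time"@en .
ex:areaConcept a skos:Concept ; skos:prefLabel "reference area"@en .
ex:regions     a skos:ConceptScheme .

# A dimension with literal values: concept plus datatype range.
ex:refTime a qb:DimensionProperty ;
    qb:concept ex:timeConcept ;
    rdfs:range xsd:dateTime .

# A coded dimension: values are drawn from a code list.
ex:refArea a qb:DimensionProperty, qb:CodedProperty ;
    qb:concept ex:areaConcept ;
    qb:codeList ex:regions ;
    rdfs:range skos:Concept .
```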

Spatio-temporal coverages that use grids (irregular or regular) would thus have dimensions defined for spatial and temporal dimensions. Discrete coverages would have space and/or time described using measure properties. Each type of property would need to share the same attribute property definitions for concept, codeList and rdfs:range etc.
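The contrast between the two cases can be sketched with two data structure definitions. This is only one possible reading of the paragraph above: every `ex:` property, and in particular the `ex:featureId` dimension used to identify discrete observations, is an assumption made for the example.

```turtle
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix ex: <http://example.org/ns#> .   # hypothetical namespace

# Hypothetical component properties.
ex:x           a qb:DimensionProperty .
ex:y           a qb:DimensionProperty .
ex:time        a qb:DimensionProperty .
ex:featureId   a qb:DimensionProperty .
ex:position    a qb:MeasureProperty .
ex:sampleTime  a qb:MeasureProperty .
ex:temperature a qb:MeasureProperty .

# Gridded coverage: space and time identify each observation.
ex:gridStructure a qb:DataStructureDefinition ;
    qb:component [ qb:dimension ex:x ;    qb:order 1 ] ,
                 [ qb:dimension ex:y ;    qb:order 2 ] ,
                 [ qb:dimension ex:time ; qb:order 3 ] ,
                 [ qb:measure ex:temperature ] .

# Discrete coverage: space and time are observed values (measures).
ex:discreteStructure a qb:DataStructureDefinition ;
    qb:component [ qb:dimension ex:featureId ] ,
                 [ qb:measure ex:position ] ,
                 [ qb:measure ex:sampleTime ] ,
                 [ qb:measure ex:temperature ] .
```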

If such descriptions were to be made for arbitrary encodings of observation value sets, instead of defining the specific RDF encoding, then one additional piece of information is required for the QB model to be applied - the ability to designate which structural element of the encoding a ComponentSpecification

The definition of qb:concept has important implications. The definition is:

qb:concept a rdf:Property, owl:ObjectProperty;
   rdfs:label "concept"@en;
   rdfs:comment "gives the concept which is being measured or indicated by a ComponentProperty"@en;
   rdfs:domain qb:ComponentProperty;
   rdfs:range skos:Concept .

This definition makes a statement about any qb:ComponentProperty: because the range of qb:concept is skos:Concept, SKOS is the canonical vocabulary for defining the sets of terms that components refer to.

Thus, a set of SKOS concepts defining spatio-temporal concepts is necessary to create interoperable definitions of data using QB. This can be achieved in the context of canonical Time and Spatial ontologies by simply declaring each defined rdfs:Class in these ontologies to also be a skos:Concept.
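A sketch of such declarations follows. The choice of OWL-Time and GeoSPARQL as the "canonical" ontologies is an assumption made for the example, and the `ex:` component properties are hypothetical.

```turtle
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix geo:  <http://www.opengis.net/ont/geosparql#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# Declare existing ontology classes to also be SKOS concepts,
# so they satisfy the range of qb:concept.
time:Instant a skos:Concept .
geo:Geometry a skos:Concept .

# Hypothetical components can then reference them directly.
ex:refTime  a qb:DimensionProperty ; qb:concept time:Instant .
ex:location a qb:MeasureProperty ;   qb:concept geo:Geometry .
```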

By defining dimensions and measures that support both fully described coordinates and usages where this information is implicit (e.g. embedded WGS84 latitude and longitude properties), the relationship between these can be formally defined and equivalences (and differences) asserted.

We can define a set of "baseline" spatio-temporal components relevant to most mass-market and OGC service-oriented applications:

  • x, y and z dimensions
  • (x,y) and (x,y,z) measures
  • Time dimension
  • Time measure
  • Nested grid dimensions

Each of these is a specialised class, and instances may be defined to specify particular content. Mass-market application profiles can be defined as well-known instances of such dimensions.
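The class-and-instance pattern could be sketched as below. The class `ex:LatitudeDimension`, the instance `ex:wgs84Latitude`, and the `ex:crs` property are hypothetical names used for illustration and are not taken from the QB4ST vocabulary itself.

```turtle
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical namespace

# Hypothetical specialised class of dimension.
ex:LatitudeDimension a rdfs:Class ;
    rdfs:subClassOf qb:DimensionProperty ;
    rdfs:label "latitude dimension"@en .

# Hypothetical property linking a component to its reference system.
ex:crs a rdf:Property .

# A well-known instance that a mass-market profile could reuse.
ex:wgs84Latitude a ex:LatitudeDimension ;
    rdfs:label "WGS84 latitude"@en ;
    rdfs:range xsd:double ;
    ex:crs <http://www.opengis.net/def/crs/EPSG/0/4326> .
```

Because the instance is typed by the specialised class, a client can discover it by querying for the general term and still obtain the specific reference system.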