Abstract

This document describes how dense geospatial raster data can be represented using the W3C RDF Data Cube (QB) ontology [vocab-data-cube] in concert with other popular ontologies including the W3C/OGC Semantic Sensor Network ontology (SSN) [vocab-ssn], the W3C/OGC Time ontology (Time) [owl-time], the W3C Simple Knowledge Organization System (SKOS) [skos-reference], W3C PROV-O [prov-o] and the W3C/OGC QB4ST [qb4st]. It offers general methods supported by worked examples that focus on Earth observation imagery. Current triple stores, as the default database architecture for RDF, are not suitable for storing voluminous data like imagery derived from Landsat satellite sensors. We show how SPARQL queries can be served through an OGC Discrete Global Grid System for observations, coupled with a triple store for observational metadata. While the approach may also be suitable for other forms of coverage data, we leave the application to such data as an exercise for the reader.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

For OGC, this is a Public Draft of a document prepared by the Spatial Data on the Web Working Group (SDWWG), a joint W3C-OGC project (see charter). The document is prepared following W3C conventions. The document is released at this time to solicit public comment.

The methods discussed in this note relate to other deliverables from SDWWG, notably the Use Cases and Requirements, the Best Practices, the Semantic Sensor Network Ontology, QB4ST and OWL-Time. Over the coming 3-4 months we expect to resolve any differences and update the note accordingly. All additional work planned, including that above, is marked as an issue in this Note.

This document was published by the Spatial Data on the Web Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to public-sdw-comments@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

1. Introduction

Publishing data on the Web using Linked Data technologies makes it more accessible, easier to discover, and machine-readable. RDF is appropriate for sparse statistical data, but publishers of denser coverage data (such as Landsat imagery) may be justifiably reluctant to embrace the size explosion that accompanies converting data to RDF. While such a conversion provides maximum machine-readability, most of the benefits of Linked Data can be realized with a compromise approach where only the metadata is expressed in RDF, or where the RDF is generated on the fly to service requests.

This document will refer to extracts of a small example to illustrate the approach. The complete source file is the ANU-LED example.

Add references throughout, including to other published linked-data-ish approaches to coverages.

2. The RDF Data Cube

The RDF Data Cube is an existing standard for representing data as RDF. It is typically used for data that is associated with statistical regions (e.g. suburbs). Common practice includes using the SKOS vocabulary to define the concepts being measured [Observed property in coverage]. The RDF data model allows the user to define all the relevant components of their data and the concepts they measure, including “attribute” components (metadata such as sensor, resolution and uncertainty [Quality per sample, Uncertainty in observations]), “measure” components (the data proper) and “dimension” components (the domain of the dataset, e.g. the time and statistical region each datum refers to). These techniques can easily be adapted to coverages, as the data model is flexible enough to define the appropriate attributes.

Example 1
:lat a qb:DimensionProperty ;
    rdfs:subPropertyOf geo:lat .

:long a qb:DimensionProperty ;
    rdfs:subPropertyOf geo:long .

:time a qb:DimensionProperty ;
    rdfs:range xsd:dateTime ;
    qb:concept sdmx-concept:timePeriod .

:dataValue a qb:MeasureProperty ;
    rdfs:range xsd:integer ;
    qb:concept :reflectance ;
    qb:concept sdmx-concept:obsValue .
    
:resolution a qb:AttributeProperty ;
    rdfs:range :pixelsPerDegree .
    
:pixelsPerDegree rdf:type rdfs:Datatype ;
    owl:equivalentClass xsd:double .

The QB4ST ontology [qb4st] extends the Data Cube for extra power and consistency when describing spatio-temporal aspects of data [Georeferenced spatial data]. Any number of such dimensions can be defined, allowing for 1D, 2D, 3D or 4D coverages [Support for 3D, Time series, 4D model of space-time].

Example 2
:lat a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:lat ;
    qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/4326> .

:long a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:long ;
    qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/4326> .

:time a qb:DimensionProperty, qb4st:TemporalProperty ;
    rdfs:range xsd:dateTime ;
    qb:concept sdmx-concept:timePeriod .

2.1 Metadata and data

There is traditionally a distinction between data (the observations proper e.g. Landsat pixels) and metadata (which adds context to the observations e.g. resolution). In the Linked Data world, this distinction is not strict. However, it is still possible to separate the two in a typical Data Cube.

The value of an RDF Data Cube component can be attached to each individual observation or to the dataset as a whole. Dataset-wide metadata can therefore be distinguished from the rest of the dataset, because it is attached to the dataset object. This also makes it easy to fetch the metadata alone with a SPARQL query. This dataset-wide description alone is already a useful (and web-of-data friendly) approach to publishing spatial data [Spatial metadata].

Example 3
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    :instrument :OLI ;
    :satellite :landsat-8 ;
    :band "4" ;
    :coverageSpatialDomain "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral ;
    :coverageTemporalDomain :timeDomain ;
    prov:wasGeneratedBy :ANU-led-resampling ;
    prov:wasDerivedFrom :AGDC .
    
:p1 a :Pixel ;
    qb:dataSet :exampleDataset ;
    :lat "41.2444" ;
    :long "90.5556" ;
    :time "2001-10-26T21:32:52"^^xsd:dateTime ;
    :dataValue "15"^^xsd:integer ;
    :resolution "2.7"^^:pixelsPerDegree ;
    :dggsCell "R00004" ;
    :bounds  "POLYGON((90.37 41.45, 90.74 41.45, 90.74 41.04, 90.37 41.04, 90.37 41.45))"^^ogc:wktLiteral ;
    prov:wasDerivedFrom :example-tile .

Using the RDF Data Cube also opens up the possibility of more detailed metadata, such as individually tracing the provenance of each observation. It is not practical to serve Landsat imagery with RDF metadata attached to each pixel. However, it is reasonable to attach such metadata to tiles of a certain size. Thus, the RDF data model can be used, as long as RDF Data Cube observations are whole tiles rather than individual pixels [Support for tiling].

Example 4
:dataValue a qb:MeasureProperty ;
    rdfs:range [owl:unionOf(xsd:anyURI xsd:integer)] ;
    qb:concept :reflectance ;
    qb:concept sdmx-concept:obsValue .
    
:s1 a :GridSquare ;
    qb:dataSet :exampleDataset ;
    :lat "40.0270" ;
    :long "91.6667" ;
    :time "2001-10-26T21:32:52"^^xsd:dateTime ;
    :dataValue "http://www.example.org/led-example-image-R000" ;
    :resolution "0.9"^^:pixelsPerDegree ;
    :dggsCell "R000" ;
    :dggsLevelSquare "3" ;
    :dggsLevelPixel "4" ;
    :bounds  "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral ;
    prov:wasDerivedFrom :example-tile .

3. A spectrum of linkiness

In the ideal web of data, every single observation has a unique URI, can be queried using SPARQL, and has metadata attached to it. Upon hearing this, anyone familiar with Landsat data would be forgiven for rejecting the whole enterprise as entirely impractical. But all is not lost! Most of the benefits of Linked Data (namely, linkability, enhanced discoverability, machine-readability) can be realized by just publishing the dataset-wide metadata in this format. More 'linkiness' provides diminishing returns along with increasing costs. It is up to the publisher to decide where along this spectrum is the compromise position appropriate for them.

To characterize the spectrum, we can broadly define three applications of RDF for coverages. From most to least costly, these are: to store a coverage dataset, to serve a coverage (“serialization”), and to describe the metadata of a coverage (“description”).

3.1 Storing a coverage

RDF data is stored natively in a triple store. The Data Cube, and RDF in general, are too verbose to be viable for large coverages.

3.2 Serving a coverage

In this model, coverage data is stored in some more appropriate format (such as HDF5). Specialized middleware receives SPARQL queries from a client and responds with dynamically generated RDF. Such a response is fairly verbose, but the cost is much smaller than actually storing the whole coverage in RDF. Further optimization is still necessary for this to be viable. It is possible to use tiles as the “observations” in the RDF Data Cube, rather than individual pixels [Support for tiling]. This significantly reduces the blowup that comes from encoding data as RDF [Compressible].
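To give a rough sense of the scale involved, the arithmetic can be sketched in Python. The figures are assumptions chosen for illustration: a tile of 4000 × 4000 pixels (suggested by the "1deg_0.00025pixel" naming of the source tile referenced in the PROV-O example), 10 triples per pixel-level observation (counted from :p1 in Example 3) and 12 triples per tile-level observation (counted from :s1 in Example 4).

```python
# Rough size estimate: per-pixel versus per-tile RDF observations.
# All constants are illustrative assumptions, not measurements.

PIXELS_PER_SIDE = 4000      # assumed: 1 degree / 0.00025 degrees per pixel
TRIPLES_PER_PIXEL = 10      # triples attached to :p1 in Example 3
TRIPLES_PER_TILE = 12       # triples attached to :s1 in Example 4


def triples_needed(observations: int, triples_each: int) -> int:
    """Number of RDF triples needed to describe the given observations."""
    return observations * triples_each


pixels_per_tile = PIXELS_PER_SIDE * PIXELS_PER_SIDE

# Every pixel is its own qb:Observation.
per_pixel = triples_needed(pixels_per_tile, TRIPLES_PER_PIXEL)

# The whole tile is a single qb:Observation whose value is an image URI.
per_tile = triples_needed(1, TRIPLES_PER_TILE)

print(f"pixel-level: {per_pixel:,} triples")
print(f"tile-level:  {per_tile:,} triples")
```

Under these assumptions a single tile of a single band costs 160 million triples at pixel granularity, against a dozen at tile granularity. The pixel values themselves are still transferred, but as an image fetched from the tile's URI rather than as RDF.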

The advantage of serving a coverage in RDF is that the entire coverage, and individual tiles within it, become linkable [Linkability]; this could be a major contribution to the Linked Data Web. With sufficiently advanced middleware, SPARQL queries over the dataset could be served just as if the data were stored in RDF, but for a fraction of the storage cost. The publisher can thus leverage the full power of Linked Data.

It is common to want only a chunk of the available data: for example, all observations near (within 10 km of) Canberra in the past year. The RDF Data Cube provides only for “slices”, that is, predefined chunks that hold one or more dimensions constant. Unfortunately, this is an area where the Data Cube is insufficiently powerful for coverages. When data is stored and served in some other format, it is up to the publisher to provide ways for the consumer to acquire only certain chunks of the data. If serving coverage data using RDF, a possible approach is to use SPARQL queries to define the appropriate chunks [Reference data chunks] (for example, a query returning all tiles of a certain resolution within a certain spatial rectangle). Even if using this method, publishers should still make it easy for a user to select chunks without the use of SPARQL, e.g. by providing an interface to generate the appropriate query using a few predefined operators.
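One way such a query-generating interface might look is sketched below in Python. This is an illustration only: the function name chunk_query and the led: namespace URI are hypothetical, the property names are borrowed from Example 4, and the sketch assumes that :lat, :long and :dggsLevelSquare are published as numeric literals so that the FILTER comparisons are meaningful.

```python
def chunk_query(level: int, lat_min: float, lat_max: float,
                long_min: float, long_max: float) -> str:
    """Build a SPARQL query selecting tiles at one DGGS level whose
    centroid falls inside a latitude/longitude rectangle.

    The led: namespace below is a placeholder, not a published URI.
    """
    if lat_min > lat_max or long_min > long_max:
        raise ValueError("bounding box minimum exceeds maximum")
    return f"""PREFIX led: <http://example.org/led#>
SELECT ?tile ?val WHERE {{
    ?tile a led:GridSquare ;
        led:dggsLevelSquare {level} ;
        led:lat ?lat ;
        led:long ?long ;
        led:dataValue ?val .
    FILTER (?lat >= {lat_min} && ?lat <= {lat_max}
        && ?long >= {long_min} && ?long <= {long_max})
}}"""


# A user picks a level and a rectangle; the SPARQL is generated for them.
print(chunk_query(3, -36.0, -35.0, 148.0, 150.0))
```

The consumer supplies only a level and a rectangle, so the set of "operators" stays small and the publisher keeps control over the shape of the queries the middleware has to answer.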

3.3 Describing a coverage

A large portion of the benefits of Linked Data may be realized by describing only the metadata of a coverage in RDF. Then the dataset can be linked to [Linkability], and its essential properties are naturally machine-readable [Discoverability, Machine to machine]. The coverage itself can remain in whatever efficient format the publisher prefers.

No matter what approach is taken, it should be as easy as possible for the user to grab just the metadata, without having to figure out how to write an appropriate query. The definition of a qb:DataSet and the associated qb:DataStructureDefinition can serve this role, but it is still up to the publisher to make it easy for the user to download those definitions. This is also the appropriate place to think about the experience of Web crawlers on the publisher's Web page [Crawlability, Machine to machine].

It is also helpful if the user can easily identify the domain of a coverage (the spatial and temporal area where measurements are taken) [Spatial metadata]. QB4ST [qb4st] does not currently have a term for that, but it may in the future.

Example 5
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    :instrument :OLI ;
    :satellite :landsat-8 ;
    :band "4" ;
    :coverageSpatialDomain "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral ;
    :coverageTemporalDomain :timeDomain ;
    prov:wasGeneratedBy :ANU-led-resampling ;
    prov:wasDerivedFrom :AGDC .
    
    

:exampleStructure a qb4st:SpatioTemporalDSD ;
    qb:component :spatialDomainComponent ,
                 :temporalDomainComponent ,
                 :latitudeComponent ,
                 :longitudeComponent ,
                 :timeComponent ,
                 :satelliteComponent ,
                 :instrumentComponent ,
                 :bandComponent ,
                 :dataComponent ,
                 :dggsCellComponent ,
                 :dggsLevelSquareComponent ,
                 :dggsLevelPixelComponent ,
                 :resolutionComponent ,
                 :boundsComponent .

:spatialDomainComponent a qb4st:SpatialComponentSpecification ;
    qb:attribute :coverageSpatialDomain .
    
:temporalDomainComponent a qb4st:TemporalComponentSpecification ;
    qb:attribute :coverageTemporalDomain .

:latitudeComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :lat .

:longitudeComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :long .

:timeComponent a qb4st:TemporalComponentSpecification ;
    qb:dimension :time .

:satelliteComponent a qb:ComponentSpecification ;
    qb:attribute :satellite .

:instrumentComponent a qb:ComponentSpecification ;
    qb:attribute :instrument .

:bandComponent a qb:ComponentSpecification ;
    qb:attribute :band .

:dataComponent a qb:ComponentSpecification ;
    qb:measure :dataValue .

:dggsCellComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :dggsCell .

:dggsLevelSquareComponent a qb:ComponentSpecification ;
    qb:dimension :dggsLevelSquare .

:dggsLevelPixelComponent a qb:ComponentSpecification ;
    qb:dimension :dggsLevelPixel .

:resolutionComponent a qb:ComponentSpecification ;
    qb:attribute :resolution .

:boundsComponent a qb4st:SpatialComponentSpecification ;
    qb:attribute :bounds .
    
    
    
:coverageSpatialDomain a qb:AttributeProperty, qb4st:SpatialProperty ;
    rdfs:subPropertyOf :bounds .
    
:coverageTemporalDomain a qb:AttributeProperty, qb4st:TemporalProperty ;
    rdfs:range time:DateTimeInterval ;
    qb:concept sdmx-concept:timePeriod .

:lat a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:lat ;
    qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/4326> .

:long a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:long ;
    qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/4326> .

:time a qb:DimensionProperty, qb4st:TemporalProperty ;
    rdfs:range xsd:dateTime ;
    qb:concept sdmx-concept:timePeriod .

:satellite a qb:AttributeProperty ;
    rdfs:range ssn:Platform ;
    qb:concept sdmx-concept:collMethod .

:instrument a qb:AttributeProperty ;
    rdfs:range ssn:Sensor ;
    qb:concept sdmx-concept:collMethod .

:band a qb:AttributeProperty ;
    rdfs:range xsd:integer .

:dataValue a qb:MeasureProperty ;
    rdfs:range [owl:unionOf(xsd:anyURI xsd:integer)] ;
    qb:concept :reflectance ;
    qb:concept sdmx-concept:obsValue .

:dggsCell a qb4st:SpatialDimension ;
    qb4st:crs "rHEALPix WGS84 Ellipsoid" ;
    rdfs:range xsd:string ;
    qb:concept sdmx-concept:refArea .

:dggsLevelSquare a qb:DimensionProperty ;
    rdfs:range xsd:integer .

:dggsLevelPixel a qb:DimensionProperty ;
    rdfs:range xsd:integer .

:resolution a qb:AttributeProperty ;
    rdfs:range :pixelsPerDegree .

:bounds a qb:AttributeProperty, qb4st:SpatialProperty ;
    rdfs:subPropertyOf ogc:asWKT ;
    rdfs:domain :GridSquare ;
    qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/4326> ;
    qb:concept sdmx-concept:refArea .

4. Discrete Global Grid Systems

Discrete global grid systems are a family of spatial reference systems that subdivide the Earth's surface into a hierarchy of cells. Larger cells are subdivided into smaller cells deeper in the hierarchy. Instead of a latitude and longitude, a location is specified by a cell id. Smaller cells are more precise, so choosing a cell forces the publisher to include a measure of uncertainty for any spatial measure. Cells are also appropriate units of tiling for gridded coverages. Each pixel in a tile covering a larger cell can represent a measurement made on a smaller cell. The OGC is currently in the process of standardizing a method to represent DGGSs.

The ANU-LED example in this document does not depend on the use of a DGGS. However, the DGGS has some convenient properties that make it particularly suitable for a Linked Data representation. First, each DGGS cell has a unique identifier, so it is easy to generate natural URIs for each piece of data. Second, the DGGS we use (rHEALPix, PDF) defines cell geometries so that cells at the same level of the hierarchy have equal areas. This makes rHEALPix a suitable format for storing multiple datasets at different resolutions, or several different resolution views of the same dataset. The equal-area constraint means different resolution pixels are directly comparable, and no resampling is required [Avoid coordinate transformations]. Third, the hierarchical nature of the DGGS makes it natural to implement spatial optimizations when responding to queries, by pruning the tree early to eliminate whole regions of unpromising cells that fall outside the desired area. Data structures other than DGGS, such as n-dimensional gridded data, whether geospatial or not, and hierarchies of data such as tile sets, octree or quadtree structures, are also amenable to these approaches.
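The pruning idea can be illustrated with a short Python sketch. It assumes that the identifier of a child cell is the identifier of its parent followed by one more digit, as the identifiers R000 and R00004 in the examples suggest; under that assumption every descendant of a cell shares the cell's identifier as a prefix. The function name and the stored identifiers are hypothetical.

```python
from bisect import bisect_left


def cells_within(sorted_ids: list[str], parent: str) -> list[str]:
    """Return stored cell ids equal to, or nested inside, the parent cell.

    Assumes a child id is its parent's id plus one more digit, so all
    descendants of a cell form one contiguous run of the sorted list.
    """
    start = bisect_left(sorted_ids, parent)   # skip everything before the subtree
    result = []
    for cell_id in sorted_ids[start:]:
        if not cell_id.startswith(parent):
            break                             # past the subtree: the rest is pruned
        result.append(cell_id)
    return result


# Hypothetical identifiers of the cells for which data is held.
stored = sorted(["N12", "R000", "R0004", "R00004", "R001", "R1", "S55"])
print(cells_within(stored, "R000"))
```

A binary search locates the start of the subtree and the scan stops at the first identifier outside it, so cells in other parts of the hierarchy are never examined.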

4.1 Implementation

A proof of concept demonstrating the ANU-LED example with a SPARQL query system to retrieve satellite imagery has been implemented. This section briefly describes some of the strategies employed to make the implementation efficient. All code referenced here is available on GitHub.

As discussed previously, scalable implementations of a data cube for Earth observations must grapple with the verbosity of RDF representations relative to specialized coverage formats like GeoTIFF. This precludes materializing the entire dataset as RDF, storing it on disk, and serving it using an off-the-shelf triple store. Instead, implementations must employ a “virtual graph”, which can be used to service SPARQL queries without materializing all triples at once.

For the purpose of illustrating how triple stores service SPARQL queries — regardless of whether they are backed by virtual or materialized graphs — consider the query below.

Example 6
SELECT ?s ?v WHERE {
    ?s a :egType ;
        rdfs:label "Example" ;
        :value ?v .
    FILTER (?v < 15)
}

The heart of the query above is a Basic Graph Pattern (BGP) which specifies the triples to be accessed. In this case, the BGP contains three patterns. Written explicitly, they are:

Example 7
?s a :egType .
?s rdfs:label "Example" .
?s :value ?v .

A typical triple store will begin servicing the query above by iterating through each triple pattern in turn. First, a set of bindings for ?s will be generated that are consistent with ?s a :egType. That set of bindings will then be filtered by matching them against the pattern ?s rdfs:label "Example". The final ?s :value ?v will further filter the bindings for ?s by considering only subjects ?s with a :value property; it will also introduce a corresponding set of bindings for ?v. Having generated all bindings relevant to the BGP, a typical triple store will then apply the FILTER condition to each. This general approach works for both traditional storage backends (like on-disk RDF databases) and non-traditional ones (like virtual graphs).
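The pattern-by-pattern procedure just described can be sketched in a few lines of Python. This is a toy illustration over an in-memory list of triples, not the code of any particular triple store; the data values and function names are hypothetical.

```python
def is_var(term):
    """Variables are written as strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:   # already bound to something else
                return None
        elif p != t:                        # constant term does not match
            return None
    return new


def evaluate_bgp(patterns, triples):
    """Naive evaluation: process the patterns one at a time, in order."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


triples = [
    (":a", "rdf:type", ":egType"), (":a", "rdfs:label", "Example"), (":a", ":value", 12),
    (":b", "rdf:type", ":egType"), (":b", "rdfs:label", "Example"), (":b", ":value", 20),
    (":c", "rdf:type", ":egType"), (":c", "rdfs:label", "Other"), (":c", ":value", 3),
]
bgp = [
    ("?s", "rdf:type", ":egType"),
    ("?s", "rdfs:label", "Example"),
    ("?s", ":value", "?v"),
]

# The FILTER (?v < 15) is applied only after all bindings are generated.
rows = [b for b in evaluate_bgp(bgp, triples) if b["?v"] < 15]
print(rows)
```

Note that every pattern is checked against every triple for every surviving binding, and the FILTER sees the bindings only at the very end; both costs are what the extensions below aim to reduce.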

There are a number of ways in which this naive, pattern-by-pattern approach can be improved. We have implemented two simple extensions:

  1. User-supplied triple patterns often generate useful constraints on the data returned by a SPARQL query. For example, the pattern ?s :dggsLevelSquare 5 . might allow a virtual graph implementation to ignore all observations not corresponding to cells at the fifth level of the DGGS hierarchy. In a naive implementation, only one such constraint can be considered at a time; this makes heuristics like query ordering essential. In contrast, a virtual graph implementation can simultaneously consider all supplied constraints in conjunction. For instance, if the user specifies ?s :dggsLevelSquare 5; :etmBand 3, then the virtual graph implementation can safely narrow its search to observations at level 5 of the DGGS hierarchy which correspond to Landsat's third ETM band.
  2. Consumers of spatial datasets typically want only a small spatial slice of the available data. In SPARQL, such a slice can be identified by a FILTER statement restricting the appropriate location properties. By inspecting the contents of FILTER statements, virtual graph implementations can preemptively narrow the set of bindings they generate to include only bindings which are spatially relevant. In general, this approach can yield excellent gains when the spatial extent of queries is small relative to the spatial extent of the overall dataset.
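Both extensions can be sketched together in Python. The sketch below is a simplified stand-in for a virtual graph, assuming observations are held as plain records and only turned into triples on demand; the record layout, the values and the function names are hypothetical, while the property names follow Example 8.

```python
def triples_for(record):
    """Materialize the triples of one observation on demand."""
    subject = record["id"]
    return [(subject, key, value) for key, value in record.items() if key != "id"]


def candidates(records, constants, lat_min=None, long_max=None):
    """Yield only records satisfying every constant constraint and the box.

    constants maps a property name to its required value; all entries are
    checked together (first extension). The two bounds stand for a
    FILTER (?latMin > lat_min && ?longMax < long_max) that has been
    inspected before any bindings are generated (second extension).
    """
    for record in records:
        if any(record.get(k) != v for k, v in constants.items()):
            continue
        if lat_min is not None and not record["latMin"] > lat_min:
            continue
        if long_max is not None and not record["longMax"] < long_max:
            continue
        yield record


records = [
    {"id": ":p1", "etmBand": 1, "dggsLevelSquare": 5, "latMin": -35.20, "longMax": 149.10, "value": 15},
    {"id": ":p2", "etmBand": 1, "dggsLevelSquare": 5, "latMin": -35.40, "longMax": 149.10, "value": 22},
    {"id": ":p3", "etmBand": 3, "dggsLevelSquare": 5, "latMin": -35.20, "longMax": 149.10, "value": 9},
    {"id": ":p4", "etmBand": 1, "dggsLevelSquare": 4, "latMin": -35.20, "longMax": 149.10, "value": 31},
]

# Without the extensions, every observation is turned into triples.
everything = [t for r in records for t in triples_for(r)]

# With them, only observations that can contribute to the answer are.
narrowed = [
    t
    for r in candidates(records, {"etmBand": 1, "dggsLevelSquare": 5},
                        lat_min=-35.3082, long_max=149.1244)
    for t in triples_for(r)
]
print(len(everything), len(narrowed))
```

In this toy dataset only one of the four observations survives the combined constraints, so a quarter of the triples are generated; the saving grows as the queried region shrinks relative to the dataset.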

These simple optimizations can improve query time substantially. Consider the following SPARQL query, which fetches the intensity (?val) and URI (?s) associated with each single-pixel observation in a satellite imagery database.

Example 8
SELECT DISTINCT ?s ?val WHERE {
    ?s a led:Pixel
        ; led:etmBand "1"^^xsd:int
        ; led:dggsLevelSquare "5"^^xsd:int
        ; led:latMin ?latMin
        ; led:longMax ?longMax
        ; led:value ?val.
    # Everything north-west of Parliament House
    FILTER (?latMin > -35.3082
        && ?longMax < 149.1244)
}

The above query was executed on a 500 MB HDF5 dataset containing over 4000 distinct observations. Repeating the query a thousand times with ten concurrent clients on a desktop machine yielded the mean running times listed below. The “naive” implementation simply iterates through the BGP specified above on a pattern-by-pattern basis, subsequently passing results to the SPARQL engine for evaluation against the filter constraint. “Multiple-pattern matching” corresponds to the first optimization identified above, and “additional spatial optimizations” refers to a combination of the first and second optimizations.

Implementation                             Mean runtime (± standard deviation)
Naive                                      378 ms (± 65.5 ms)
…with multiple-pattern matching            35 ms (± 22.2 ms)
…with additional spatial optimizations     17 ms (± 11.8 ms)

“Multiple-pattern matching” is a relatively simple optimization, yet is sufficient to improve query performance tenfold. Accounting for the bounding box constraint specified in the query improves performance by another factor of two. It is likely that further performance gains could be found with more sophisticated optimizations. In particular, servicing queries with spatial restrictions could be further improved by employing an R-tree or some other specialized spatial data structure.

Could also point to the client implementation, with a screen dump and a pointer to the GitHub code? The running implementation itself may be too unstable in the long term to reference.

5. Use of existing ontologies

RDF makes it easy to re-use terms defined in external ontologies and some of the most widely applicable are explained here. See the ANU-LED example for some specific examples of these.

Note
Several ontologies are being looked at by the Spatial Data on the Web working group. The final best practice for using these ontologies will depend on the outputs of the group.
References to SSN, Latitude/Longitude, QB4ST, OWL-Time and other relevant Best Practices should be updated to reflect the outputs of the working group.

5.1 SSN

The Semantic Sensor Network ontology [vocab-ssn] defines terms for describing the sensors used to collect the data [Sensor metadata]. The ANU-LED example illustrates a minimal description of Landsat 8 OLI observations using SSN [SSN-like representation]. Much more detailed descriptions are possible. In particular, SSN descriptions can be attached to individual tiles [Quality per sample].

Example 9
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    :instrument :OLI ;
    :satellite :landsat-8 ;
    :coverageSpatialDomain "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral .
    
:landsat-8 a ssn:Platform ;
    owl:sameAs cci-platform:plat_landsat_8 .

:OLI a ssn:Sensor ;
    ssn:onPlatform :landsat-8 ;
    ssn:hasMeasurementCapability :oli-capability ;
    ssn:observes :reflectance ;
    owl:sameAs cci-sensor:sens_oli .

:oli-capability a ssn:MeasurementCapability ;
    ssn:forProperty :reflectance .

:reflectance a ssn:Property, skos:Concept ;
    owl:sameAs sweet:Reflectance ;
    owl:sameAs cci-dataType:dtype_sr .

5.2 PROV-O

The Provenance ontology [prov-o] allows the provenance of data to be traced [Provenance]. It provides terms for describing what entities the data is based on, what processes were used to convert those entities into each other and into the final data, and what individuals and organisations were responsible for these processes. PROV-O descriptions can be attached at the dataset level, or even at the individual observation/tile level to indicate precisely which source material each observation is derived from.

Example 10
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    prov:wasGeneratedBy :ANU-led-resampling ;
    prov:wasDerivedFrom :AGDC .
    
:ANU-led-resampling a prov:Activity ;
    prov:wasAssociatedWith :DmitryBrizhinev ;
    prov:used :AGDC .

:DmitryBrizhinev a prov:Agent, prov:Person ;
    foaf:givenName "Dmitry"^^xsd:string ;
    foaf:mbox      <mailto:dmitry.brizhinev@anu.edu.au> .

:AGDC a prov:Collection ;
    prov:wasAssociatedWith :GeoscienceAustralia ;
    prov:hadMember :example-tile .

:example-tile a prov:Entity ;
    prov:alternateOf <http://dapds00.nci.org.au/thredds/catalog/rs0/tiles/EPSG4326_1deg_0.00025pixel/LS8_OLI_TIRS/148_-035/2016/catalog.html?dataset=rs0/tiles/EPSG4326_1deg_0.00025pixel/LS8_OLI_TIRS/148_-035/2016/LS8_OLI_TIRS_FC_148_-035_2016-01-12T23-55-57.tif> .

:GeoscienceAustralia a prov:Agent, prov:Organization .

:s1 a :GridSquare ;
    qb:dataSet :exampleDataset ;
    :lat "40.0270" ;
    :long "91.6667" ;
    :dataValue "http://www.example.org/led-example-image-R000" ;
    prov:wasDerivedFrom :example-tile .

5.3 Latitude and longitude

The working group wishes to discourage unqualified uses of “latitude” and “longitude”. Most commonly, these terms refer to the WGS-84 Coordinate Reference System (CRS), but published data should always make its CRS explicit [Georectification]. In RDF, the WGS-84 geo vocabulary is often used, with its provided geo:lat and geo:long properties. The working group intends to standardize better properties, which allow the use of other CRSs. Once these exist, they should always be used in place of the geo properties. QB4ST defines the qb4st:crs property which will inherit from those [CRS definition, Spatial metadata]. The RDF Data Cube and QB4ST make it easy to define several CRSs and use them simultaneously, providing clients with several views of the data [Multiple CRSs]. In the example below, a grid square can be identified by the latitude and longitude of its centroid, by its boundary, or by its rHEALPix cell.

Example 11
:lat a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:lat ;
    qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/4326> .

:long a qb4st:SpatialDimension ;
    rdfs:subPropertyOf geo:long ;
    qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/4326> .
    
:dggsCell a qb4st:SpatialDimension ;
    qb4st:crs "rHEALPix WGS84 Ellipsoid" ;
    rdfs:range xsd:string .
    
:bounds a qb:AttributeProperty, qb4st:SpatialProperty ;
    rdfs:subPropertyOf ogc:asWKT ;
    qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/4326> .
    
:latitudeComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :lat .

:longitudeComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :long .
    
:dggsCellComponent a qb4st:SpatialComponentSpecification ;
    qb:dimension :dggsCell .
    
:boundsComponent a qb4st:SpatialComponentSpecification ;
    qb:attribute :bounds .
    
:s1 a :GridSquare ;
    :lat "40.0270" ;
    :long "91.6667" ;
    :dggsCell "R000" ;
    :bounds  "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral .

5.4 GeoSPARQL

The GeoSPARQL ontology [GeoSPARQL] defines some terms and predicates for reasoning about objects and shapes in space [Spatial operators]. It allows for the use of several encodings, including WKT, to describe polygons [Encoding for vector geometry]. The ANU-LED example uses these terms to define the area covered by individual tiles in the coverage, and the entire spatial domain.

Example 12
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    :coverageSpatialDomain "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral .

:bounds a qb:AttributeProperty, qb4st:SpatialProperty ;
    rdfs:subPropertyOf ogc:asWKT ;
    rdfs:domain :GridSquare ;
    qb4st:crs <http://www.opengis.net/def/crs/EPSG/0/4326> ;
    qb:concept sdmx-concept:refArea .

:s1 a :GridSquare ;
    qb:dataSet :exampleDataset ;
    :lat "40.0270" ;
    :long "91.6667" ;
    :dataValue "http://www.example.org/led-example-image-R000" ;
    :bounds  "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral .

5.5 SKOS concepts

The RDF Data Cube is commonly used in conjunction with a SKOS [skos-reference] concept scheme (such as SDMX-RDF and its concept scheme) to define the meanings of the components [Observed property in coverage]. It is appropriate to use this for coverages also, but appropriate SKOS concepts do not always exist. They may need to be published along with the data proper.

Example 13
:reflectance a ssn:Property, skos:Concept ;
    owl:sameAs sweet:Reflectance ;
    owl:sameAs cci-dataType:dtype_sr .
    
:time a qb:DimensionProperty, qb4st:TemporalProperty ;
    rdfs:range xsd:dateTime ;
    qb:concept sdmx-concept:timePeriod .

:satellite a qb:AttributeProperty ;
    rdfs:range ssn:Platform ;
    qb:concept sdmx-concept:collMethod .

:instrument a qb:AttributeProperty ;
    rdfs:range ssn:Sensor ;
    qb:concept sdmx-concept:collMethod .
    
:dataValue a qb:MeasureProperty ;
    rdfs:range [owl:unionOf(xsd:anyURI xsd:integer)] ;
    qb:concept :reflectance ;
    qb:concept sdmx-concept:obsValue .

:dggsCell a qb4st:SpatialDimension ;
    qb4st:crs "rHEALPix WGS84 Ellipsoid" ;
    rdfs:range xsd:string ;
    qb:concept sdmx-concept:refArea .

5.6 OWL-Time

Coverages should be annotated appropriately with the times observations were taken [Coverage temporal extent]. OWL-Time [owl-time] defines terms for time intervals that are useful for expressing the temporal domain of the dataset. It also allows temporal reference systems other than the Gregorian calendar. However, for Gregorian time instants, a datatype property using the built-in xsd:dateTime datatype is sufficient.

QB4ST defines terms that work nicely with OWL-Time.

Example 14
:coverageTemporalDomain a qb:AttributeProperty, qb4st:TemporalProperty ;
    rdfs:range time:DateTimeInterval ;
    qb:concept sdmx-concept:timePeriod .
    
:time a qb:DimensionProperty, qb4st:TemporalProperty ;
    rdfs:range xsd:dateTime ;
    qb:concept sdmx-concept:timePeriod .
    
:exampleDataset a qb:DataSet, prov:Entity ;
    qb:structure :exampleStructure ;
    :coverageSpatialDomain "POLYGON((90 41.87, 93.33 41.87, 93.33 38.18, 90 38.18, 90 41.87))"^^ogc:wktLiteral ;
    :coverageTemporalDomain :timeDomain .

:timeDomain a time:Interval ;
    time:hasBeginning :timeBeginning ;
    time:hasEnd :timeEnd .
    
:timeBeginning a time:Instant ;
    time:inXSDDateTime "2001-10-26T21:32:52"^^xsd:dateTime .

:timeEnd a time:Instant ;
    time:inXSDDateTime "2001-10-26T21:32:52"^^xsd:dateTime .

:s1 a :GridSquare ;
    qb:dataSet :exampleDataset ;
    :time "2001-10-26T21:32:52"^^xsd:dateTime .

A. Acknowledgements

This spec would not have been possible without the TechLauncher program of the Australian National University and its ardent convenor, Shayne Flint. We also thank Matthew Purss of Geoscience Australia for participating in the program and supporting this project. Finally, Ed Parsons of Google, Robert Woodcock of CSIRO, and Robert Atkinson of the OGC provided valuable discussions and feedback.

B. References

B.1 Informative references

[GeoSPARQL]
Matthew Perry; John Herring. GeoSPARQL - A Geographic Query Language for RDF Data. 10 September 2012. URL: http://www.opengeospatial.org/standards/geosparql
[owl-time]
Simon Cox; Chris Little. W3C. Time Ontology in OWL. 12 July 2016. W3C Working Draft. URL: https://www.w3.org/TR/owl-time/
[prov-o]
Timothy Lebo; Satya Sahoo; Deborah McGuinness. W3C. PROV-O: The PROV Ontology. 30 April 2013. W3C Recommendation. URL: https://www.w3.org/TR/prov-o/
[skos-reference]
Alistair Miles; Sean Bechhofer. W3C. SKOS Simple Knowledge Organization System Reference. 18 August 2009. W3C Recommendation. URL: https://www.w3.org/TR/skos-reference
[vocab-data-cube]
Richard Cyganiak; Dave Reynolds. W3C. The RDF Data Cube Vocabulary. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/vocab-data-cube/
[qb4st]
Rob Atkinson. W3C/OGC. QB4ST: RDF Data Cube extensions for spatio-temporal components. 5 January 2017. W3C/OGC Working Draft. URL: https://www.w3.org/TR/qb4st/
[vocab-ssn]
Kerry Taylor; Krzysztof Janowicz; Danh Le Phuoc; Armin Haller. W3C. Semantic Sensor Network Ontology. 31 May 2016. W3C Working Draft. URL: https://www.w3.org/TR/vocab-ssn/