Beginnings of a W3C note

From Spatial Data on the Web Working Group

The discussion in the Coverage subgroup seems to be moving towards publishing a note on CoverageJSON and another on the use of RDF and the RDF datacube. This note would also be an OGC discussion paper. The ANU team has offered to lead the editing of this note. Anyone is welcome to contribute ideas to this page. Other important pages are listed in Coverage subgroup.

The formal draft of this note is now located here [1].

Example approaches

Two main examples currently exist of extensions to the RDF Datacube for coverage data.

Rob Atkinson is working on QB4ST. See the source [2].

The ANU team has their own ontology. See the source [3] and a minimal example of using it [4].

Three possible uses

We can broadly define three applications of RDF for coverages.

These are: to describe a coverage dataset (its metadata), to serve a coverage, and to store a coverage. Alternatively we can divide this into "description" (the first) and "serialisation" (the last two).

Storing a coverage

RDF data can be stored natively in a triple store. However, the RDF Datacube, like RDF in general, is too verbose for this to be viable for large coverages.

Serving a coverage

This is the approach currently being tested by the ANU team. In this model, coverage data is stored in a more appropriate format (such as HDF5). Specialised middleware receives SPARQL queries from a client and answers them with dynamically generated RDF. Such a response is fairly verbose, but the cost is much smaller than that of storing the whole coverage in RDF. The ANU team feels that an important contribution of their approach is using tiles, rather than individual pixels, as "observations" in the RDF datacube. This significantly reduces the blowup that comes from encoding data as RDF.
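To give a feel for the size of that reduction, here is a minimal arithmetic sketch. The grid and tile sizes are hypothetical and are not taken from the ANU implementation; the function simply counts how many observations are needed to cover a grid. Each observation carries several triples, so the triple count shrinks by roughly the same factor.

```python
import math

def observation_count(width, height, tile_size=1):
    """Count the observations needed to cover a width x height pixel grid
    when each observation holds one tile_size x tile_size tile."""
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)

# Hypothetical grid and tile sizes, chosen only for illustration.
WIDTH, HEIGHT, TILE = 8000, 8000, 500

per_pixel = observation_count(WIDTH, HEIGHT)        # one observation per pixel
per_tile = observation_count(WIDTH, HEIGHT, TILE)   # one observation per tile

print(f"per-pixel observations: {per_pixel}")
print(f"per-tile observations:  {per_tile}")
print(f"reduction factor:       {per_pixel // per_tile}")
```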

The advantage of serving a coverage in RDF is that the entire coverage, and individual tiles within it, become linkable; this could be a major contribution to the linked data web. With sufficiently advanced middleware, SPARQL queries over the dataset could be served just as if the data were stored in RDF, but for a fraction of the storage cost. We can thus leverage the full power of linked data. The ANU team is currently optimising their implementation of such a middleware.

Describing a coverage

A large portion of the benefits of linked data may be realised by describing only the metadata of a coverage in RDF. Then the dataset can be linked to, and its essential properties are naturally machine-readable. The coverage itself can remain in whatever efficient format the publisher prefers.

Rob has identified a requirement that, no matter what approach is taken, it should be as easy as possible for the user to grab *just* the metadata, without having to figure out how to write an appropriate query.
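As an illustration of a metadata-only description, the sketch below generates a small Turtle document for a dataset. The qb:, dct: and dcat: namespaces are real vocabularies, but the ex: namespace, the dataset name, the title and the download URL are placeholders, and the choice of DCAT to point at the underlying file is an assumption rather than something the group has agreed on.

```python
# Real vocabulary namespaces; "ex" is a placeholder for the publisher's own.
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "ex": "http://example.org/coverage/",
}

def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def describe_dataset(local_name, title, download_url):
    """Return a Turtle document holding only dataset-level metadata."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines += [
        "",
        f"ex:{local_name} a qb:DataSet ;",
        f'    dct:title "{escape(title)}" ;',
        "    dcat:distribution [",
        "        a dcat:Distribution ;",
        f"        dcat:downloadURL <{download_url}>",
        "    ] .",
    ]
    return "\n".join(lines)

# Hypothetical dataset; the coverage itself stays in its own format.
print(describe_dataset("scene1", "Example coverage",
                       "http://example.org/data/scene1.h5"))
```

A publisher could serve a document like this at the dataset's own URI, so that a client gets the metadata with a plain HTTP request and never needs to write a query.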

The RDF Datacube

The RDF Datacube [5] is an existing standard for representing data as RDF. It is typically used for data that is associated with statistical regions (e.g. suburbs). Common practice includes using the SKOS vocabulary to define the concepts being measured. The Datacube data model allows the user to define all the relevant components of their data and the concepts they measure, including "attribute" components (metadata such as sensor or resolution), "measure" components (the data proper) and "dimension" components (the domain of the dataset, e.g. the statistical region each datum refers to). We see no reason why these techniques cannot be used for coverages.
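The three kinds of component can be sketched as plain triples. The example below builds a data structure definition for a hypothetical tiled coverage. The qb: terms are those of the RDF Datacube vocabulary, while the ex: namespace and every component name are invented for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
EX = "http://example.org/coverage/"   # hypothetical namespace

# Hypothetical components: name -> kind of component in the Datacube model.
COMPONENTS = {
    "tileX": "dimension",
    "tileY": "dimension",
    "time": "dimension",
    "reflectance": "measure",
    "sensor": "attribute",
    "resolution": "attribute",
}

KIND_TO_CLASS = {
    "dimension": "DimensionProperty",
    "measure": "MeasureProperty",
    "attribute": "AttributeProperty",
}

def structure_triples(dsd="structure"):
    """Return (subject, predicate, object) triples for the structure definition."""
    triples = [(EX + dsd, RDF_TYPE, QB + "DataStructureDefinition")]
    for name, kind in COMPONENTS.items():
        spec = f"_:spec_{name}"   # blank node for the component specification
        triples.append((EX + name, RDF_TYPE, QB + KIND_TO_CLASS[kind]))
        triples.append((EX + dsd, QB + "component", spec))
        triples.append((spec, QB + kind, EX + name))
    return triples

for triple in structure_triples():
    print(triple)
```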

The value of each component can be attached to each individual observation, to pre-defined "slices" of the data, or to the dataset as a whole. It is not practical to serve Landsat imagery with RDF metadata attached to each pixel. However, it is reasonable to attach such metadata to tiles of a certain size. Thus, the Datacube data model can be used, as long as RDF datacube observations are whole tiles rather than individual pixels. The data model is flexible enough to define the appropriate attributes. The extension proposed by Rob (QB4ST) adds extra power to the datacube for describing gridded data.
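The effect of attaching values at different levels can be sketched with ordinary dictionaries. The lookup order used here (observation first, then slice, then dataset) is an assumption made for the sketch, and all attribute names and values are hypothetical.

```python
def resolve_attribute(name, observation, slice_=None, dataset=None):
    """Look up an attribute on the observation, then its slice, then the dataset."""
    for level in (observation, slice_, dataset):
        if level is not None and name in level:
            return level[name]
    raise KeyError(f"attribute {name!r} is not attached at any level")

# Hypothetical values: shared metadata is stated once, not once per tile.
dataset = {"sensor": "OLI", "resolution_m": 30}
band_slice = {"band": 4}
tile = {"tileX": 3, "tileY": 7, "cloud_cover": 0.12}

print(resolve_attribute("cloud_cover", tile, band_slice, dataset))  # from the tile
print(resolve_attribute("band", tile, band_slice, dataset))         # from the slice
print(resolve_attribute("sensor", tile, band_slice, dataset))       # from the dataset
```

With tiles as observations, only values that genuinely vary from tile to tile need to be stated on the tile itself.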

Our own work consists of an ontology that essentially just defines an observation which is a tile [6]. Our example [7] shows how the properties of a coverage can be defined using the RDF Datacube model. This approach can be applied to dataset-wide properties only (in which case it becomes a description of the metadata), or to individual tiles as well.

Use of existing ontologies

Several ontologies are being looked at by the working group. The final best practice for using these ontologies will depend on the outputs of the group.

SSN

The Semantic Sensor Network (SSN) ontology defines terms for describing the sensors used to collect the data. See our example [8] for a minimal description of Landsat 8 OLI observations using SSN. Much more detailed descriptions are possible.

PROV-O

The PROV-O ontology allows the provenance of data to be traced. It provides terms for describing what entities the data is based on, what processes were used to convert those entities into each other and into the final data, and what individuals and organisations were responsible for these processes. PROV-O descriptions can be attached at the dataset level, or even at the individual observation/tile level to indicate precisely which source material each observation is derived from. See our example [9].
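Tile-level provenance can be sketched as a handful of triples plus a walk along prov:wasDerivedFrom. The PROV-O property names are real, but the ex: resources (tile, scene, raw data, activity, agent) are hypothetical and are not taken from our example [9].

```python
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/coverage/"   # hypothetical namespace

# Hypothetical provenance statements for one tile.
TRIPLES = [
    (EX + "tile/3/7", PROV + "wasDerivedFrom", EX + "scene/A"),
    (EX + "tile/3/7", PROV + "wasGeneratedBy", EX + "activity/tiling"),
    (EX + "scene/A", PROV + "wasDerivedFrom", EX + "raw/A"),
    (EX + "activity/tiling", PROV + "wasAssociatedWith", EX + "agent/publisher"),
]

def trace_sources(entity, triples):
    """Return every entity reachable via prov:wasDerivedFrom, nearest first."""
    found, frontier = [], [entity]
    while frontier:
        current = frontier.pop(0)
        for subject, predicate, obj in triples:
            is_derivation = predicate == PROV + "wasDerivedFrom"
            if subject == current and is_derivation and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found

for source in trace_sources(EX + "tile/3/7", TRIPLES):
    print(source)
```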

Latitude and longitude

The working group wishes to discourage unqualified uses of "latitude" and "longitude". Most commonly, these terms refer to the WGS-84 CRS (coordinate reference system), but published data should always make its CRS explicit. In RDF, the WGS-84 geo vocabulary [10] is often used, with its provided geo:lat and geo:long properties. The working group intends to standardise better properties, which allow the use of other CRSs. Once these exist, they should always be used in place of the geo properties.
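Until such properties exist, a publisher's tooling can at least refuse to emit geo:lat and geo:long for coordinates that are not in WGS-84. The sketch below assumes the caller names the CRS with the string "EPSG:4326"; that convention, the ex: namespace and the point name are illustrative assumptions.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
WGS84 = "EPSG:4326"   # assumption: how the caller identifies WGS-84

def wgs84_point(local_name, lat, lon, crs):
    """Return Turtle for a geo:Point, but only for explicitly WGS-84 input."""
    if crs != WGS84:
        raise ValueError(f"geo:lat/geo:long imply WGS-84, got CRS {crs!r}")
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates are outside the valid WGS-84 range")
    return "\n".join([
        f"@prefix geo: <{GEO}> .",
        "@prefix ex: <http://example.org/coverage/> .",
        "",
        f"ex:{local_name} a geo:Point ;",
        f'    geo:lat "{lat}" ;',
        f'    geo:long "{lon}" .',
    ])

# Hypothetical corner point of a tile.
print(wgs84_point("corner", -35.28, 149.13, "EPSG:4326"))
```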

SKOS concepts

The datacube is commonly used in conjunction with a SKOS concept scheme (such as SDMX-RDF [11] [12]) to define the meanings of the components. It is appropriate to use this for coverages also (see our example [13]), but appropriate SKOS concepts do not always exist. They may need to be published along with the data proper.
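One way to handle missing concepts is to work out which ones no existing scheme provides and publish just those. The sketch below does this for a hypothetical set of component concepts. The SKOS terms are real, while the ex: namespace, the scheme name and the concept names are invented for illustration.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

def concepts_to_publish(needed, already_published):
    """Return the concept names that no existing scheme provides, in order."""
    return [name for name in needed if name not in already_published]

def concept_turtle(names, scheme="scheme"):
    """Return Turtle defining one skos:Concept per name in a new scheme."""
    lines = [
        f"@prefix skos: <{SKOS}> .",
        "@prefix ex: <http://example.org/concept/> .",
        "",
        f"ex:{scheme} a skos:ConceptScheme .",
    ]
    for name in names:
        lines += [
            f"ex:{name} a skos:Concept ;",
            f'    skos:prefLabel "{name}"@en ;',
            f"    skos:inScheme ex:{scheme} .",
        ]
    return "\n".join(lines)

# Hypothetical concepts; assume only "obsTime" already exists elsewhere.
needed = ["surfaceReflectance", "obsTime", "cloudCover"]
existing = {"obsTime"}
missing = concepts_to_publish(needed, existing)
print(concept_turtle(missing))
```

In the RDF Datacube vocabulary, a component property can then be tied to its concept with qb:concept.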