Warning:
This wiki has been archived and is now read-only.

Input to §10 How to use these best practices

From Spatial Data on the Web Working Group

@jtandy's request for input to §10 "How to use these best practices":

What we need is the hard-won experience from you experts in the working group, distilled / synthesised into simple statements that non-experts can follow: the steps and decisions that a spatial data publisher should make or consider. I can edit things together (for the WD release after next); what I _really_ need is for you to write down the steps you go through when publishing spatial data. I especially need you to think about why you chose format X or vocabulary Y ... it seems we often intuit this step (based on some upstream thinking or perhaps a dogmatic approach!) so please be introspective and try to unpick your thought processes.

Thank you in advance...

UPDATE: 7-May-2017 (Jeremy) Having reviewed the "How to use" section and the proposals made here, I concluded that this information was largely covered elsewhere in the document. So I have removed this section and slotted anything that wasn't already said elsewhere into the relevant parts of the BP document.


Introduction

The steps below are relevant to organisations that may already be publishing data directly via some form of spatial data portal or service. In many cases these steps will be in addition to the current process and will provide a mechanism to make the data more accessible and useful beyond the existing user community. If an organisation does not yet publish data on the web, it can publish spatial data in legacy formats on its web site (five-star scheme level ≤ 3) with little overhead (see https://w3c.github.io/dwbp/bp.html).

Providing access to spatial data with point geometry

If your data concerns points only (latitude, longitude and altitude) and WGS84 is an acceptable coordinate reference system, use the W3C WGS84 vocabulary (http://www.w3.org/2003/01/geo/wgs84_pos#) and you are done. Spatial data represented in the W3C WGS84 vocabulary can easily be integrated with existing RDF data.
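As a rough illustration, the sketch below serialises one point as Turtle using the `geo:lat`, `geo:long` and optional `geo:alt` properties of the W3C WGS84 vocabulary. It is a minimal sketch using only string formatting; the URI and the sample coordinates are hypothetical, and a real publisher would more likely use an RDF library.

```python
# Minimal sketch: describe a single point with the W3C WGS84 (Basic Geo) vocabulary.
# The URI and coordinates used below are hypothetical examples.

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def point_to_turtle(uri, lat, lon, alt=None):
    """Return Turtle for one point with WGS84 latitude/longitude and optional altitude."""
    props = [f"geo:lat {lat}", f"geo:long {lon}"]
    if alt is not None:
        props.append(f"geo:alt {alt}")  # altitude in decimal metres
    body = " ;\n    ".join(props)
    return f"@prefix geo: <{GEO}> .\n\n<{uri}> a geo:Point ;\n    {body} ."


print(point_to_turtle("http://example.org/id/point/1", 53.3498, -6.2603))
```

Note that the same URI carries both the identity of the thing and its coordinates; the vocabulary offers no separate Geometry resource.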

Keep in mind that the W3C WGS84 vocabulary is quite simple: first, it can only represent points (no regions), and second, its modelling does not distinguish between Feature and Geometry (see the next section).

@@@Is there a difference between region and geometry? In this document we use both and I get the impression that they are used synonymously. If there is a difference, we need to elaborate what it is.

Distinguishing between Feature and Geometry

Step: make sure your model distinguishes between Feature and Geometry, and decide on identifiers for both (if the geometry is fairly small, it can be expressed as a blank node and does not need its own identifier).
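The step above can be sketched with GeoSPARQL terms (`Feature`, `Geometry`, `hasGeometry`, `asWKT`, `wktLiteral`): the Feature always gets a URI, while the Geometry gets either its own URI or, if small, a blank node. This is a minimal sketch; the URIs and the WKT value are hypothetical, and the prefix label `gsp` is an arbitrary choice. WKT literals in GeoSPARQL default to longitude-latitude order (CRS84).

```python
# Minimal sketch: separate Feature and Geometry using GeoSPARQL terms.
# URIs and WKT values are hypothetical; the prefix label "gsp" is arbitrary.

GSP = "http://www.opengis.net/ont/geosparql#"


def feature_to_turtle(feature_uri, wkt, geometry_uri=None):
    """Emit Turtle for a Feature linked to one Geometry."""
    literal = f'"{wkt}"^^gsp:wktLiteral'
    lines = [f"@prefix gsp: <{GSP}> .", "", f"<{feature_uri}> a gsp:Feature ;"]
    if geometry_uri is None:
        # Small geometry: anonymous blank node, no identifier of its own.
        lines.append(f"    gsp:hasGeometry [ a gsp:Geometry ; gsp:asWKT {literal} ] .")
    else:
        # Geometry with its own URI, so it can be referenced and served separately.
        lines.append(f"    gsp:hasGeometry <{geometry_uri}> .")
        lines.append(f"<{geometry_uri}> a gsp:Geometry ;")
        lines.append(f"    gsp:asWKT {literal} .")
    return "\n".join(lines)


print(feature_to_turtle("http://example.org/id/feature/1", "POINT(-6.2603 53.3498)"))
print(feature_to_turtle("http://example.org/id/feature/1", "POINT(-6.2603 53.3498)",
                        "http://example.org/id/geometry/1"))
```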

In GIS, Features and Geometries are often treated separately (e.g., https://gis.stackexchange.com/questions/137946/difference-between-feature-and-geometry). @@@what is _the_ GIS link for Feature vs. Geometry?

The distinction between Feature and Geometry may be implicit. For example, consider administrative areas such as electoral districts. There are two cases:

  • First, the district may be represented only by the geometry of its boundary, and there is a single canonical version of the geometry.
  • Second, the district may be represented by different geometries of its boundary (due to different scale of data capture or different precision), so there are multiple versions of the geometry representing the same district.

As an example, consider OSi Electoral Divisions. The example shows... @@@please elaborate what the link illustrates.

The distinction between Feature and Geometry becomes relevant when integrating data. Consider two datasets, A and B, that both describe the country of Ireland (the Feature). Each dataset provides a region geometry for Ireland. While it is possible to declare the two features equivalent (say, A:Ireland and B:Eire) and thus integrate the statements made about both, the two geometries are very likely different and should be treated separately.
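The integration scenario can be sketched with plain dictionaries: once `B:Eire` is declared equivalent to `A:Ireland`, descriptive statements are merged under one identifier, but each geometry is kept as a separate entry tagged with its source. This is a minimal sketch; the data structures, the `integrate` helper and the geometry identifiers are hypothetical.

```python
# Minimal sketch: merge equivalent Features, keep their Geometries separate.
# Dataset contents and identifiers are hypothetical.

dataset_a = {"A:Ireland": {"props": {"name": "Ireland"}, "geometries": ["A:Ireland_geom"]}}
dataset_b = {"B:Eire": {"props": {"name_ga": "Éire"}, "geometries": ["B:Eire_geom"]}}


def integrate(datasets, same_as):
    """Merge features declared equivalent; never merge their geometries.

    datasets: source label -> {feature id -> {"props": dict, "geometries": list}}
    same_as:  feature id -> canonical feature id it is equivalent to
    """
    merged = {}
    for source, data in datasets.items():
        for fid, feature in data.items():
            canonical = same_as.get(fid, fid)
            entry = merged.setdefault(canonical, {"props": {}, "geometries": []})
            # Conflicting property values are simply overwritten here for brevity.
            entry["props"].update(feature["props"])
            # Each geometry stays a distinct resource, recorded with its source.
            for gid in feature["geometries"]:
                entry["geometries"].append((source, gid))
    return merged


result = integrate({"A": dataset_a, "B": dataset_b}, {"B:Eire": "A:Ireland"})
print(result)
```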

Providing access to spatial data with region geometry

If your spatial data concerns regions, you have several options for the mode of publication. Do you want to publish your data via a dump file, a queryable endpoint or an API (such as Linked Data)? @@@Does this assume that the spatial data is regions-only or that the data dump contains features and regions? If it contains features, too, it can be made 5-star data by connecting the features.

  • Data dump: you may publish at different levels (cf. the five-star scheme): Shapefiles... (five-star scheme level ≤ 3); an RDF dump using the GeoSPARQL vocabulary with geometries as RDF literals (five-star scheme level 4, or level 5 if the Features are connected, e.g. :GreatBritain dct:isPartOf :EuropeanUnion, assuming that :GreatBritain and :EuropeanUnion have geometries).
  • Queryable endpoint: GeoSPARQL is a suitable choice, but be aware that running a SPARQL endpoint is expensive in terms of computational resources (five-star scheme level 4, possibly 5 if the RDF data is genuinely linked, that is, if you can traverse from the index file to all other entities).

@@@Is there a triple pattern fragments implementation that supports GeoSPARQL? That would reduce the expensiveness.
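For the data dump option, the sketch below writes GeoSPARQL WKT polygon literals and connects features with `dct:isPartOf`, the kind of link that moves a dump from level 4 towards level 5. It is a minimal sketch: the helper names are hypothetical, the coordinates are placeholders, and it assumes the GeoSPARQL default of longitude-latitude order for WKT literals. WKT polygon rings must be closed, so the first vertex is repeated when needed.

```python
# Minimal sketch: an RDF dump with WKT polygon literals and dct:isPartOf links.
# Helper names, URIs and coordinates are hypothetical placeholders.

GSP = "http://www.opengis.net/ont/geosparql#"
DCT = "http://purl.org/dc/terms/"


def polygon_wkt(ring):
    """Return WKT for a polygon from (lon, lat) pairs, closing the ring if needed."""
    pts = list(ring)
    if pts and pts[0] != pts[-1]:
        pts.append(pts[0])  # WKT polygon rings must be closed
    if len(pts) < 4:
        raise ValueError("a polygon ring needs at least three distinct vertices")
    coords = ", ".join(f"{lon} {lat}" for lon, lat in pts)
    return f"POLYGON(({coords}))"


def feature_record(uri, ring, part_of=None):
    """Turtle for one Feature with a blank-node geometry and an optional isPartOf link."""
    lines = [f"<{uri}> a gsp:Feature ;"]
    if part_of is not None:
        lines.append(f"    dct:isPartOf <{part_of}> ;")  # link to another feature
    lines.append(f'    gsp:hasGeometry [ gsp:asWKT "{polygon_wkt(ring)}"^^gsp:wktLiteral ] .')
    return "\n".join(lines)


def rdf_dump(records):
    header = f"@prefix gsp: <{GSP}> .\n@prefix dct: <{DCT}> .\n"
    return header + "\n" + "\n\n".join(records) + "\n"


print(rdf_dump([
    feature_record("http://example.org/id/EuropeanUnion", [(0, 0), (4, 0), (4, 4), (0, 4)]),
    feature_record("http://example.org/id/GreatBritain", [(0, 0), (1, 0), (1, 1), (0, 1)],
                   part_of="http://example.org/id/EuropeanUnion"),
]))
```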

  • API: provide Features as Linked Data (URIs for entities, data in RDF on HTTP GET, links to other URIs); use content negotiation on Geometry URIs (URIs for geometries, data in GML, KML, SVG... on HTTP GET); consider providing access to spatial relations (RCC, the Region Connection Calculus) via the API (five-star scheme level 5).
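The content negotiation suggested for Geometry URIs can be sketched as a function that picks a representation from the HTTP `Accept` header. This is a minimal sketch, assuming the server offers GML, KML, SVG and Turtle in that order of preference; the q-value precedence is simplified relative to the HTTP specification, and `negotiate` and its helpers are hypothetical names.

```python
# Minimal sketch: choose a representation for a Geometry URI from an Accept header.
# The offered media types and their preference order are assumptions.

OFFERED = [
    "application/gml+xml",
    "application/vnd.google-earth.kml+xml",
    "image/svg+xml",
    "text/turtle",
]


def parse_accept(header):
    """Yield (media_range, q) pairs from an Accept header."""
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        yield media, q


def matches(media_range, media_type):
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def negotiate(accept_header, offered=OFFERED):
    """Return the best offered media type, or None (the server would answer 406)."""
    ranges = list(parse_accept(accept_header or "*/*"))
    best, best_q = None, 0.0
    for media_type in offered:
        q, specificity = None, -1
        for media_range, range_q in ranges:
            if not matches(media_range, media_type):
                continue
            # The most specific matching range decides the q-value.
            spec = 2 if media_range == media_type else (0 if media_range == "*/*" else 1)
            if spec > specificity:
                specificity, q = spec, range_q
        if q is not None and q > best_q:
            best, best_q = media_type, q
    return best


print(negotiate("image/*;q=0.5, application/gml+xml;q=0.9"))
```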

@@@Need to explain what RCC is
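RCC refers to the Region Connection Calculus, a set of topological relations between regions; its eight-relation form, RCC8, is one of the relation families defined in GeoSPARQL. To give a feel for the eight relations, the sketch below classifies two axis-aligned rectangles. It is a minimal sketch, assuming closed rectangles with positive area; the `Box` type and `rcc8` function are hypothetical, and real boundaries would need a geometry library or a GeoSPARQL-capable store.

```python
# Minimal sketch: RCC8 relation between two axis-aligned rectangles.
# Assumes closed rectangles with positive area; names are hypothetical.
from typing import NamedTuple


class Box(NamedTuple):
    minx: float
    miny: float
    maxx: float
    maxy: float


def _inside(a, b, strict):
    """True if box a lies within box b (strictly: inside b's interior)."""
    if strict:
        return b.minx < a.minx and a.maxx < b.maxx and b.miny < a.miny and a.maxy < b.maxy
    return b.minx <= a.minx and a.maxx <= b.maxx and b.miny <= a.miny and a.maxy <= b.maxy


def rcc8(a, b):
    """Return the RCC8 relation name (DC, EC, PO, EQ, TPP, NTPP, TPPi, NTPPi)."""
    closures_meet = (a.minx <= b.maxx and b.minx <= a.maxx and
                     a.miny <= b.maxy and b.miny <= a.maxy)
    if not closures_meet:
        return "DC"  # disconnected
    interiors_meet = (a.minx < b.maxx and b.minx < a.maxx and
                      a.miny < b.maxy and b.miny < a.maxy)
    if not interiors_meet:
        return "EC"  # externally connected: boundaries touch only
    if a == b:
        return "EQ"
    if _inside(a, b, strict=True):
        return "NTPP"  # non-tangential proper part
    if _inside(a, b, strict=False):
        return "TPP"  # tangential proper part
    if _inside(b, a, strict=True):
        return "NTPPi"
    if _inside(b, a, strict=False):
        return "TPPi"
    return "PO"  # partial overlap


print(rcc8(Box(0, 0, 2, 2), Box(1, 1, 3, 3)))
```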

Data Formats

@@@Currently, the how-to-use only talks about JSON and not about GeoJSON. Does it make sense to introduce both as early as possible in the document?

Is this section too technical?

§12.4 Parse that! states: "Imagery formats JPEG [JPEG2000] and PNG [PNG] can also be coerced to carry data; providing 3 or 4 channels of 8-bit data values."

An end user would need to understand that imagery is typically presented as a set of bands of data, where each band covers a segment of the electromagnetic spectrum.

Taking the example of the 7 bands in a Landsat 7 scene, typically only three bands are presented in a JPEG/PNG file, as RGB values. These RGB values may actually depict a different band combination from the image, e.g. data from the Blue, NIR and MIR bands.
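The band-to-channel mapping can be sketched as follows: three bands are interleaved into RGB pixel bytes, and the mapping is recorded alongside so that users know, for example, that the "R" channel actually holds NIR values. This is a minimal sketch, assuming the band values are already 8-bit (0-255); the band names, pixel values, mapping and `compose_rgb` helper are hypothetical.

```python
# Minimal sketch: pack three 8-bit bands into RGB pixels and record the mapping.
# Band names, values and the chosen mapping are hypothetical.


def compose_rgb(bands, channel_map):
    """Interleave three equally sized bands into RGB pixel bytes.

    bands: band name -> flat, row-major list of 8-bit values
    channel_map: band names for the (R, G, B) channels, in that order
    Returns (pixel_bytes, metadata) where metadata records which band each channel holds.
    """
    if len(channel_map) != 3:
        raise ValueError("exactly three bands are needed for RGB")
    r, g, b = (bands[name] for name in channel_map)
    if not len(r) == len(g) == len(b):
        raise ValueError("bands must have the same number of pixels")
    pixels = bytearray()
    for values in zip(r, g, b):
        if any(not 0 <= v <= 255 for v in values):
            raise ValueError("values must fit in 8 bits")
        pixels.extend(values)
    metadata = dict(zip(("R", "G", "B"), channel_map))
    return bytes(pixels), metadata


scene = {
    "blue": [10, 20, 30, 40],
    "nir": [200, 210, 220, 230],
    "mir": [90, 80, 70, 60],
}
pixels, meta = compose_rgb(scene, ("nir", "mir", "blue"))
print(meta)
```

Publishing the mapping (here the `meta` dictionary) with the image is what lets a consumer interpret the channels correctly.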

The paragraph goes on to discuss the need to avoid compressed data, but does not distinguish between lossy and lossless compression algorithms.

Lossy compression should be avoided (as stated), but lossless may well be OK.

However, this also depends on the intended use: if the data is only to be used as a 'pretty picture' background, a lossy format is probably acceptable.

The JPEG2000 image format used as an example is typically compressed with a wavelet transform. This is usually lossy, unless the data creator has explicitly created a lossless image.
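To illustrate why lossless compression is acceptable for carrying data values, the sketch below writes 8-bit RGB values into a minimal PNG (which uses DEFLATE, a lossless algorithm) and reads them back, checking that every value survives exactly. It is a minimal sketch built on the standard library only; `write_png_rgb` and `read_png_rgb` are hypothetical helpers, and the reader only understands files written by this writer (no interlacing, filter type 0).

```python
# Minimal sketch: lossless round trip of 8-bit RGB values through a PNG.
import struct
import zlib


def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    return (struct.pack(">I", len(data)) + kind + data +
            struct.pack(">I", zlib.crc32(kind + data) & 0xFFFFFFFF))


def write_png_rgb(width, height, pixels):
    """Encode row-major 8-bit RGB pixel bytes as a PNG byte string."""
    if len(pixels) != width * height * 3:
        raise ValueError("pixel buffer does not match width * height * 3")
    stride = width * 3
    # Each scanline is prefixed with filter type 0 (None).
    raw = b"".join(b"\x00" + bytes(pixels[y * stride:(y + 1) * stride])
                   for y in range(height))
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), compression, filter, interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr) +
            _chunk(b"IDAT", zlib.compress(raw, 9)) + _chunk(b"IEND", b""))


def read_png_rgb(data):
    """Decode a PNG produced by write_png_rgb back to (width, height, pixels)."""
    pos, idat, width, height = 8, b"", 0, 0
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        kind = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        if kind == b"IHDR":
            width, height = struct.unpack(">II", body[:8])
        elif kind == b"IDAT":
            idat += body
        pos += 12 + length  # length + type + data + CRC
    raw = zlib.decompress(idat)
    row = width * 3 + 1  # one filter byte per scanline
    return width, height, b"".join(raw[y * row + 1:(y + 1) * row] for y in range(height))


values = bytes([200, 90, 10, 210, 80, 20, 220, 70, 30, 230, 60, 40])
png = write_png_rgb(2, 2, values)
assert read_png_rgb(png) == (2, 2, values)  # every data value survives unchanged
```

A lossy encoder (such as baseline JPEG, or JPEG2000 with its irreversible wavelet) would not guarantee that this equality holds.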