Spatial Data on the Web Best Practices

4. Best Practices Summary

This document contains a variety of best practices related to the publication and usage of spatial data on the Web. First, it continues with several more in-depth introductions on spatial things and geometry, coverages, spatial relations, coordinate reference systems, linked data, and Spatial Data Infrastructures. Then it describes how these best practices can be used, depending on your starting point and context. After that, the best practices themselves are described. They are about metadata, quality, versioning, identifiers, vocabularies, (API) access, linking, and large datasets.

The following best practices can be found in this document:


Best Practice 1: Include spatial metadata in dataset metadata
Best Practice 2: Provide context required to interpret data values
Best Practice 3: Specify Coordinate Reference System for high-precision applications
Best Practice 4: Make your spatial data indexable by search engines
Best Practice 5: Describe the positional accuracy of spatial data
Best Practice 6: How to describe properties that change over time
Best Practice 7: Use globally unique persistent HTTP URIs for spatial things
Best Practice 8: Provide geometries on the Web in a usable way
Best Practice 9: How to describe relative positions
Best Practice 10: Use spatial semantics for Spatial Things
Best Practice 11: Expose spatial data through 'convenience APIs'
Best Practice 12: Include search capability in your data access API
Best Practice 13: Provide subsets for large spatial datasets
Best Practice 14: Publish links from spatial things to related resources
Best Practice 15: Use links in spatial datasets to find related data
Best Practice 16: Provide a minimum set of information about spatial things for your intended application
Best Practice 17: Describe the location according to a Coordinate Reference System

5. Spatial Things, Features and Geometry

In spatial data standards from the Open Geospatial Consortium (OGC) and in the 19100 series of geographic information standards from ISO/TC 211, the primary entity is the feature. [ISO-19101] defines a feature as an “abstraction of real world phenomena”.

This terse definition is a little confusing, so let’s unpack it.

Firstly, it talks about “real world phenomena”; that’s everything from highways to helicopters, parking meters to postcode areas, water bodies to weather fronts and more. These can be physical things that you can touch (e.g. a phone box) or an abstract concept that has spatial extent (e.g. a postcode area). Features can even be fictional (e.g. “Dickensian London”) and may even lack any concrete location information such as the mythical Atlantis.

The key point is that these “features” are things that one talks about in the universe of discourse - which is defined in [ISO-19101] as the “view of the real or hypothetical world that includes everything of interest”.

Secondly, the definition of feature talks about “abstraction”. Take the example of Eddystone Lighthouse. A helicopter pilot might see it as a “vertical obstruction” and be interested in attributes such as its height and precise location, whereas a sailor may see it as a “maritime navigation aid” and need information about its light characteristic and general location. Depending on one’s set of concerns, only a subset of the attributes of a given “real world phenomenon” is relevant. In the case of Eddystone Lighthouse, we defined two separate “abstractions”. As is common practice in many information modelling activities, the common sets of attributes for a given “abstraction” are used to define classes. In the parlance of [ISO-19101], such a class is known as a “feature type”.

Note

Although the exact semantics differ a little, there is a good correlation between the concept of “feature type” as defined in spatial data standards and the concept of “class” defined in [RDF-SCHEMA]. The former is an information modelling construct that binds a fixed set of attributes to an identified resource, whereas the latter defines the set of all resources that share the same group of attributes.

When combined with the open-world assumption embraced by RDF Schema and the Web Ontology Language (OWL) [OWL2-OVERVIEW], the set-based approach to classes provides more flexibility when combining information from multiple sources. For example, the “Eddystone Lighthouse” resource can be seen as both a “vertical obstruction” and a “maritime navigation aid” as it meets the criteria for membership of both sets. Conversely, this flexibility makes it much more difficult to build software applications as there is no guarantee that an information resource will specify a given attribute. Web standards such as the Shapes Constraint Language [SHACL] are being defined to remedy this issue.

However, the term “feature” is also commonly used to mean a capability of a system, application or component. Also, in some domains and/or applications no distinction is made between "feature" and the corresponding real-world phenomena.

To avoid confusion, we adopt the term “spatial thing” throughout the remainder of this best practice document. “Spatial thing” is defined in [W3C-BASIC-GEO] as “Anything with spatial extent, i.e. size, shape, or position. e.g. people, places, bowling balls, as well as abstract areas like cubes”.

The concept of “spatial thing” is considered to include both "real-world phenomena" and their abstractions (e.g. “feature” as defined in [ISO-19101]). Furthermore, we treat it as inclusive of other commonly used definitions; e.g. Feature from [NeoGeo], described as “A geographical feature, capable of holding spatial relations”.

Note

A spatial thing may move. We must take care not to oversimplify our concept of spatial thing by assuming that it is equivalent to definitions such as Location (from [DCTERMS]) or Place (from [SCHEMA-ORG]), which are respectively described as “A spatial region or named place” and "Entities that have a somewhat fixed, physical extension".

Issue 382

How do we ensure alignment with the terminology being used in the further development of GeoSPARQL? We expect a new spatial ontology to be published which will contain clear and unambiguous definitions for the terms used therein.

Looking more closely, it is important to note that geometry is typically a property of a spatial thing.

Example 2: Eddystone Lighthouse geometry (encoded as GeoJSON)

{
  "geometry": {
    "type": "Point",
    "coordinates": [-4.268, 50.184]
  }
}

In actual fact, this is only one geometry that may be used to describe Eddystone Lighthouse. Other geometries might include a 2D polygon that defines the footprint of the lighthouse in a horizontal plane and a 3D solid describing the volumetric shape of the lighthouse.

Furthermore, these geometries may be subject to change due to, say, a resurvey of the lighthouse. In such a situation, the geometry object would be updated, but the spatial thing that we are talking about is still Eddystone Lighthouse. Following the best practices presented below, we use an HTTP URI to unambiguously identify Eddystone Lighthouse: http://d-nb.info/gnd/1067162240 (URI sourced from Deutsche Nationalbibliothek).

We say that the spatial thing is disjoint from the geometry object. The spatial thing, Eddystone Lighthouse (http://d-nb.info/gnd/1067162240), is the “real world phenomenon” about which we want to state facts (such as that its focal height is 41 meters above sea level) and which we want to link to other real world phenomena (for example, that it is located at Eddystone Rocks, Cornwall; another spatial thing identified as http://sws.geonames.org/2650253 by GeoNames).
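
The separation between a spatial thing and its geometry can be sketched in a few lines of Python. This is only an illustration, not part of any standard: the class names SpatialThing and Geometry are hypothetical, and the "resurveyed" coordinates are made-up values chosen for the example. Coordinates are written in GeoJSON order (longitude, latitude).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Geometry:
    """A geometry object; coordinates are (longitude, latitude)."""
    type: str
    coordinates: tuple


@dataclass
class SpatialThing:
    """A spatial thing identified by a URI; geometry is one of its properties."""
    uri: str
    properties: dict = field(default_factory=dict)
    geometry: Optional[Geometry] = None


lighthouse = SpatialThing(
    uri="http://d-nb.info/gnd/1067162240",
    properties={"name": "Eddystone Lighthouse"},
    geometry=Geometry("Point", (-4.268, 50.184)),
)

uri_before = lighthouse.uri

# Hypothetical resurvey: the geometry object is replaced (made-up values),
# but the identity of the spatial thing does not change.
lighthouse.geometry = Geometry("Point", (-4.2681, 50.1841))
```

After the update, lighthouse.uri is still the same identifier, so any links that other datasets have made to the lighthouse remain valid.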

6. Coverages: describing properties that vary with location (and time)

Many aspects of spatial things can be described with single-valued, static properties. However, in some applications it is more useful to describe the variation of property values in space and time. Such descriptions are formalized as coverages. Users of spatial information may employ both viewpoints.

So what is a coverage? As defined by [ISO-19123] it is simply a data structure that maps points in space and time to property values. For example, an aerial photograph can be thought of as a coverage that maps positions on the ground to colors. A river gauge maps points in time to flow values. A weather forecast maps points in space and time to values of temperature, wind speed, humidity and so forth. One way to think of a coverage is as a mathematical function, where data values are a function of coordinates in space and time.

Note

Sometimes you’ll hear the word “coverage” used synonymously with “gridded data” or “raster data” but this isn’t really accurate. You can see from the above paragraph that non-gridded data (like a river gauge measurement) can also be modelled as coverages. Nevertheless, you will often find a bias toward gridded data in discussions (and software) that concern coverages.

A coverage is not itself a spatial thing. The definition above presents a coverage as a data construct - in which case, it does not exist in the real world. Accordingly, we might say in the hydrology example, where a river gauge measures flow values at regular sampling times, that the “river segment” (a spatial thing) has a property “flow rate” that is expressed as coverage data.

Spatial things and coverages may be related in several ways:

signals in coverages may be used to provide the evidence for the existence, location and type of spatial things; for example, within a geophysical borehole the variation in soil/rock type may be used to infer the presence of particular rock-units at underground locations
as the property value of a spatial thing whose value varies within the extent of that spatial thing; for example, the varying strength of mobile-network coverage throughout the UK
the values of a common property for a distributed set of spatial things provide a discrete sampling of a coverage; for example, the measurement of soil moisture based at a set of sampling stations can be compiled to show the spatial variation of soil moisture across the region where the sampling stations are located

A coverage can be defined using three main pieces of information:

The domain of the coverage is the set of points in space and time for which we have data values. For example, in a river gauge measurement, the domain is the set of times at which the flow was measured. In a satellite image, the domain is the set of pixels. In a weather forecast, the domain is a set of grid cells.
The range of the coverage is the set of measured, simulated or observed data values. A single coverage may record values for lots of different quantities; for example a weather forecast predicts values for many things (temperature, humidity etc.) on the same domain. So the range of a coverage often consists of a number of lists of data values, one for each measured variable. Each element within each list corresponds with one of the elements of the domain (e.g. a pixel or grid cell).
The range metadata describes the range of the coverage, to help users to understand what the data values mean. This may include links to definitions of variables, units of measure and other bits of useful information.
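
As a rough illustration of these three pieces of information, the river gauge example might be modelled as follows. This is a minimal sketch in Python under stated assumptions: the Coverage class, its field names and the flow readings are all hypothetical and are not taken from [ISO-19123] or any coverage encoding.

```python
from dataclasses import dataclass


@dataclass
class Coverage:
    domain: list          # points in space and/or time for which we have values
    ranges: dict          # variable name -> list of values, one per domain element
    range_metadata: dict  # variable name -> description of what the values mean

    def value_at(self, point, variable):
        """Return the value of `variable` at a point that belongs to the domain."""
        index = self.domain.index(point)
        return self.ranges[variable][index]


# Hypothetical river gauge: the domain is the set of sampling times.
gauge = Coverage(
    domain=["2016-01-01T00:00Z", "2016-01-01T06:00Z", "2016-01-01T12:00Z"],
    ranges={"flow": [12.5, 14.1, 13.2]},
    range_metadata={"flow": {"label": "river flow rate", "unit": "m3/s"}},
)

flow_at_six = gauge.value_at("2016-01-01T06:00Z", "flow")
```

The lookup makes the "coverage as a function" idea concrete: given a position in the domain, the coverage returns the corresponding value from the range.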

Usually, the most complex piece of information in the coverage is the definition of the domain. This can vary quite widely from coverage type to coverage type, as the list above shows. For this reason, coverages are often defined by the spatiotemporal geometry of their domain. You will hear people talking about “multidimensional grid coverages” or “time-series coverages” or “vertical profile coverages” for example.

7. Spatial relations

A spatial relation specifies how an object is located in space in relation to a reference object. Commonly used types of spatial relations are: topological, directional and distance relations.

Topological relations describe the relationships between geometric objects that are invariant to rotation, translation and scaling. As such, topological relations can support qualitative spatial reasoning without reference to the geometries themselves; for example to assert that object A touches object B. These relations, also known as “spatial predicates”, include concepts such as: equals, disjoint, intersects, touches, within, contains, overlaps and crosses.
Directional relations specify the relative direction between object and reference. Examples include: left, in front of and astern.
Distance relations specify how far the object is from the reference object. Examples include: at, nearby and far away.
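
To make the topological relations more concrete, the following Python sketch evaluates a few of them for axis-aligned rectangles. It is a simplification made for illustration only: real geometries need a proper geometry library, the Box class and the function names are hypothetical, and the rectangles are assumed to have positive width and height.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """An axis-aligned rectangle with positive width and height."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def disjoint(a, b):
    """True if the boxes share no point at all."""
    return (a.maxx < b.minx or b.maxx < a.minx
            or a.maxy < b.miny or b.maxy < a.miny)


def intersects(a, b):
    """True if the boxes share at least one point."""
    return not disjoint(a, b)


def _interiors_intersect(a, b):
    return (a.minx < b.maxx and b.minx < a.maxx
            and a.miny < b.maxy and b.miny < a.maxy)


def touches(a, b):
    """True if the boxes share boundary points only."""
    return intersects(a, b) and not _interiors_intersect(a, b)


def within(a, b):
    """True if box a lies entirely inside box b."""
    return (b.minx <= a.minx and a.maxx <= b.maxx
            and b.miny <= a.miny and a.maxy <= b.maxy)


def contains(a, b):
    """True if box a entirely contains box b."""
    return within(b, a)


park = Box(0, 0, 2, 2)
road = Box(2, 0, 4, 2)      # shares an edge with the park
pond = Box(0.5, 0.5, 1, 1)  # lies inside the park
island = Box(5, 5, 6, 6)    # far away from the park
```

Once such relations have been asserted (for example, "the road touches the park"), they can be reasoned over qualitatively without going back to the coordinates.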

Issue 383

Do we also need to talk about spatial relationships? And how they are related to spatial things and geometries?

8. Coordinate Reference Systems (CRS)

Issue 392

Introduction to CRS does not yet cover non-geographic cases.

Best Practice scope is "spatial data" - which includes non-geographic location (e.g. where things aren't positioned relative to the Earth). For example, we have a microscopy use case where the locations of cells are described.

One of the most fundamental aspects of publishing spatial data, that is, data about location, is how to express and share the location in a consistent way. In almost all cases where you are publishing data for use by the wider Web community, the use of latitude and longitude is most appropriate. Latitude and longitude measurements are global and offer a level of precision well suited to many applications; for example, they can express a location to within a few metres, which is perfect for locating a Starbucks, geocoding a photograph or capturing an augmented reality Pokemon hiding in your local park.

As with everything to do with spatial data, of course things can get more complicated. There is not complete agreement over the order in which to present the measurements (Lat/Long or Long/Lat), or over whether to express them in decimal degrees or as degrees, minutes and seconds.
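
The relationship between the two notations is simple arithmetic: one degree is 60 minutes and one minute is 60 seconds. The Python sketch below shows the conversion; the function name dms_to_decimal is hypothetical, and the degrees, minutes and seconds values were derived from the decimal coordinates used for Eddystone Lighthouse in this document, not taken from a survey.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes and seconds to signed decimal degrees.

    Southern latitudes and western longitudes are returned as negative values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# 50 degrees 11' 02.4" N, 4 degrees 16' 04.8" W
position = {
    "lat": dms_to_decimal(50, 11, 2.4, "N"),
    "lon": dms_to_decimal(4, 16, 4.8, "W"),
}
```

Returning the result with explicit "lat" and "lon" keys, rather than as a bare pair of numbers, avoids the ambiguity over axis order described above.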

Therefore it is very important to provide explicit information to your users. For example, this snippet of results from the Google Geocoding API makes explicit which is the latitude and which is the longitude coordinate. See Best Practice 17: Describe the location according to a Coordinate Reference System for more information.

Example 3: A Google Geocoding API result snippet

"formatted_address" : "1600 Amphitheatre Parkway, Mountain View, CA 94043, USA",
        "geometry" : {
           "location" : {
              "lat" : 37.4224764,
              "lng" : -122.0842499
           },
           "location_type" : "ROOFTOP",
           "viewport" : {
              "northeast" : {
                 "lat" : 37.4238253802915,
                 "lng" : -122.0829009197085
              },
              "southwest" : {
                 "lat" : 37.4211274197085,
                 "lng" : -122.0855988802915
              }
           }
        },

The following is a little more technical; in most cases this should only be for information.

The Long/Lat measurements are of course angular measurements expressing a position on the surface of a sphere. We are assuming that the sphere in question is (usually) the planet Earth, and that the sphere is actually a sphere. To make this more explicit we need to use a defined reference system and geodetic datum: in simple terms this tells us where we make the angular measurements from (e.g. the Equator and Greenwich Meridian) and gives us an agreed definition of the size and shape of the sphere (it turns out the Earth isn’t one, though it is often approximated as a flattened sphere).

In almost all cases when you find Long/Lat measurements they are using a reference system and geodetic datum called WGS84. WGS84 was defined to support the GPS system, so that’s handy for all those mobile apps.

90% of people can stop reading now, but of course there are going to be a few cases where WGS84 Long/Lat is not appropriate.

In many parts of the world location data has been collected using local coordinate systems that are specific to particular countries or regions. These local coordinate systems often use projected measurements defined on a flat, two-dimensional surface, which are easier to use than angular measurements and are needed if you are making topographic maps. (But be aware that, being flattened, these projected maps distort the true size of countries, and also distance and angular measurements.)

So it may be that you have information in a local Coordinate Reference System (CRS), rather than Long/Lat - what should you do? You can publish information in a local CRS as it is, but you need to tell users which particular CRS is being used, because there are a great many CRSs in use. A good directory of them is maintained by the EPSG, an oil industry organisation. It is common for a CRS to be identified by its EPSG code; EPSG:27700 is the UK National Grid (formally “OSGB 1936 / British National Grid”), for example.

Alternatively you can re-project your coordinates to WGS84 Long/Lat using one of the many tools available online. So for example the location at 516076,170953 in UK National Grid coordinates is -0.331841, 51.425708 in WGS84 Long/Lat. This conversion is a useful step as it makes your data more accessible to global users, so if possible it is helpful to publish data in both local and global coordinates.

So we are now at the point where 99.9% of people can stop reading, but for the remaining few people that have more specific requirements in terms of higher precision there are a few more topics. If you need to be able to measure in terms of a few centimetres or less then things are more complicated. With this level of precision required you need to take into account a more sophisticated model of the shape of the earth and take into account plate tectonics.

For these use cases more complex reference systems and geodetic datums are used; for example, in Europe a system called ETRS89 can be used instead of WGS84, and in North America a similar system called NAD83 is used. So it might be that you have measurements made using these reference systems. Again, best practice is to be explicit in describing their use, and in these use cases be careful when re-projecting to different systems, as the required accuracy may be lost.

Finally another issue is that points on the surface of the earth are actually moving relative to the coordinate system, due to geologic processes. You may think this is of interest only to geologists, but when I tell you that Australia has moved around 1.5m since the framework was last reset 20 years ago, and remind you that we are entering the age of self-driving cars, then you will probably think again. Re-calculating the datum from time to time, or maybe continuously, really does matter for some applications. See Best Practice 3: Specify Coordinate Reference System for high-precision applications for more information.

9. Linked Data

The term ‘Linked Data’ refers to an approach to publishing data that puts linking at the heart of the notion of data, and uses the linking technologies provided by the Web to enable the weaving of a global distributed database. By naming real world entities - be they web resources, physical objects such as the Eiffel Tower, or even more abstract things such as relations or concepts - with URLs data can be published and linked in the same way web pages can. [LDP-PRIMER]

The 5-star scheme at 5 Star Data states:

★ make your stuff available on the Web (whatever format) under an open license

★★ make it available as structured data (e.g., Excel instead of image scan of a table)

★★★ make it available in a non-proprietary open format (e.g., CSV as well as Excel)

★★★★ use URIs to denote things, so that people can point at your stuff

★★★★★ link your data to other data to provide context

We think that the concept of Linked Data is fundamental to the publishing of spatial data on the Web: it is the links that connect data together that are foundational to the Web of data.

These best practices promote a Linked Data approach.

Sources such as the Best Practices for Publishing Linked Data [LD-BP] assert a strong association between Linked Data and the Resource Description Framework (RDF) [RDF11-PRIMER]. Yet we believe that Linked Data requires only that the formats used to publish data support Web linking (see [WEBARCH] §4.4 Hypertext). 5 Star Data (based on [5STAR-LOD]) asserts only that data formats be open and non-proprietary (★★★); and implies the need for data formats to support use of URIs as identifiers (★★★★) and Web linking (★★★★★).

Within this document we include examples that use RDF and related technologies such as triple stores and SPARQL because we see evidence of its use in real world applications that support Linked Data. However, we must make clear to readers that there is no requirement for all publishers of spatial data on the Web to embrace the wider suite of technologies associated with the Semantic Web; we recognize that in many cases, a Web developer has little or no interest in the toolchains associated with Semantic Web due to the addition of complexity to any Web-centric solution.

Although we think that Linked Data need not necessarily require the use of RDF, RDF is probably the most common representation. We note that [JSON-LD] provides a bridge between those worlds by providing a data format that is compatible with RDF but relies on standard JSON tooling.
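
The sketch below illustrates that point using Python's standard json module: a small JSON-LD description of Eddystone Lighthouse is parsed as ordinary JSON, and the link to a related spatial thing is read like any other value. The choice of schema.org terms (Place, name, containedInPlace) is an assumption made for this example and is not a recommendation of this document; the two URIs are the ones used earlier in the text.

```python
import json

# A JSON-LD document is also a plain JSON document.
document_text = """
{
  "@context": "http://schema.org/",
  "@id": "http://d-nb.info/gnd/1067162240",
  "@type": "Place",
  "name": "Eddystone Lighthouse",
  "containedInPlace": {"@id": "http://sws.geonames.org/2650253"}
}
"""

document = json.loads(document_text)

# The link to the related resource is just another value in the parsed object.
subject_uri = document["@id"]
linked_uri = document["containedInPlace"]["@id"]
```

A JSON-LD aware processor could interpret the same document as RDF, but a Web developer who only needs the name and the link never has to leave familiar JSON tooling.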

Furthermore, as the examples in this document illustrate, we often see a ‘hybrid’ approach being used in real-world applications; using RDF to work with graphs of information that interlink resources, while relying on other technologies to query and process the spatial aspects of that information for performance reasons.

10. Why are traditional Spatial Data Infrastructures not enough?

Finding, accessing and using data disseminated through spatial data infrastructures (SDI) based on OGC web services is difficult for non-expert users. There are several reasons, including:

In spatial data infrastructures, catalog services are intended to be used for discovering spatial assets, not the general purpose search engines of the Web. OGC web services do not address indexing of their content by those search engines.
By design, the catalog services only provide access to metadata - and in general metadata that is focused on the needs of expert users - not the data itself.
Users cannot just “follow links” to access data, it is typically necessary to construct some kind of query to access data. Often these queries are complex to define, requiring in depth knowledge both of the data structure and the domain-specific query language.
In addition, it is often difficult for non-expert users to understand and use the data. Part of this is due to domain-specific complexities that are difficult for non-experts (e.g., handling of coordinates in different coordinate reference systems) but hard to avoid entirely. Beyond that, the datasets often address requirements of expert communities with diverse needs, resulting in comprehensive but complex specifications that also cover many edge cases. And the data is typically available in formats that are not easy to process for non-expert users.

However, spatial data infrastructures are a key component of the broader spatial data ecosystem. Such infrastructures typically include workflows and tools related to the management and curation of spatial datasets, and provide mechanisms to support the rich set of capabilities required by the expert community. Our goal is to help spatial data publishers build on these foundations to enable the spatial data from SDIs to be fully integrated with the Web of data.

When your starting point is a spatial data infrastructure, you should at least read the following best practices. These provide the most important extra steps that should be taken in order to bring spatial data from spatial data infrastructures to the Web:

Best Practice 4: Make your spatial data indexable by search engines
Best Practice 7: Use globally unique persistent HTTP URIs for spatial things
Best Practice 11: Expose spatial data through 'convenience APIs'

The rest of the best practices provide more detail on specific aspects of publishing spatial data on the Web, such as metadata, geometries, CRS information, versioned data, and so on.

11. How to use these best practices

Issue 381

Section 11. How to use these best practices is incomplete.

We estimate that this section covers only a quarter of the "spatial data publication pathway" that we are trying to help would-be spatial data publishers navigate. More material describing the full range of considerations when publishing spatial data on the Web will be added in the next public draft.

11.1 What are the starting points?

Preparations for publishing spatial data on the Web need to start somewhere. Typically, your spatial data will be in the following places:

plain text documents; e.g. historical texts, government reports, blog posts etc.
data files containing structured content or markup; e.g. geospatial vector data in Shapefile or GML format, statistical data in tabular CSV format or a spreadsheet, as GPX data with “waypoints” and “tracks”, satellite imagery in GeoTIFF, climate simulations in CF-NetCDF etc.
a data repository; e.g. PostGIS (a spatially enabled relational database), Elasticsearch (a document-oriented noSQL repository based on Apache Lucene), Apache Jena’s TDB (an RDF triple store)
exposed via an existing API; including OGC-compliant web services such as WFS and WCS

If your spatial data is managed within a software system it is likely that you will be able to access that data through one or more of the methods identified above; as structured data from a bulk extract (e.g. a “data dump”), via direct access to the underpinning data repository or through a bespoke or standards-compliant API provided by the system.

As working with specific spatial data management systems is beyond the scope of this best practice document we will assume that one of the four methods identified above is your starting point.

Each of these starting points has its own challenges, but working with plain text documents can be particularly tricky as you will need to parse the natural language to identify the spatial things and their properties before you can proceed any further. Natural Language Processing (NLP) is also beyond the scope of this best practice document - so we will assume that you’ve already completed this step and have parsed any plain text documents into structured data records of some kind.

11.2 What are you talking about?

The Web is an information space in which the items of interest, referred to as resources, are identified by URIs ([WEBARCH] §1. Introduction). The spatial data you want to publish is one such resource. Depending on the nature of your spatial data, it may be a single dataset or a collection of datasets. [VOCAB-DCAT] provides a useful definition of dataset: “A collection of data, published or curated by a single agent, and available for access or download in one or more formats.”

Deciding whether your spatial data is a single dataset or not is somewhat arbitrary. To decide this, it is often useful to consider attributes such as the license under which the data will be made available, the refresh or publication schedules, the quality of the data and the governance regime applied in managing the data. Typically, all of these attributes should be consistent within a single dataset.

As a first step in publishing your spatial data on the Web, we need to stitch your data into the Web’s information space by assigning a URI to each dataset (see [DWBP] Best Practice 9: Use persistent URIs as identifiers of datasets). Furthermore, if you anticipate your data changing over time and you want users to be able to refer to a particular version of your dataset you should also consider assigning a URI to each version of the dataset (see [DWBP] Best Practice 11: Assign URIs to dataset versions and series).

Note

[DWBP] section 8.6 Data Versioning provides further guidance on working with versioned resources: providing metadata to indicate the particular version of a given dataset resource (see [DWBP] Best Practice 7: Provide a version indicator) and enabling users to browse the history of your dataset (see [DWBP] Best Practice 8: Provide version history).

We also need to look inside the datasets at the resources described within your data. If you want these resources to be visible within the Web’s information space, by which we mean that others can refer to or talk about those resources, then they must also be assigned URIs (see [DWBP] Best Practice 10: Use persistent URIs as identifiers within datasets). These URIs are like 'Web-scale foreign keys' that enable information from different sources to be stitched together.

In spatial data, our primary concern is always the spatial things; these are the things with spatial extent (i.e. size, shape, or position) that we talk about in our data - anything from physical things like people, places and post boxes to abstractions such as administrative areas. Spatial things should always be assigned URIs (see Best Practice 7: Use globally unique persistent HTTP URIs for spatial things) - potentially reusing existing URIs that are already in common usage. A common pattern used when assigning URIs to spatial things is to append the locally-scoped identifiers used within the dataset to a URI path within an internet DNS domain where one has administrative rights to publish content.
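
The pattern of appending local identifiers to a URI path might be sketched as follows in Python. The base URI http://data.example.org/id/lighthouse/ and the local identifiers are hypothetical, and the function name mint_uri is an invention for this example; percent-encoding is applied so that identifiers containing spaces or slashes still yield a valid URI.

```python
from urllib.parse import quote

# Hypothetical URI path in a domain the publisher controls.
BASE_URI = "http://data.example.org/id/lighthouse/"


def mint_uri(local_id, base=BASE_URI):
    """Append a locally-scoped identifier to the base URI path."""
    return base + quote(str(local_id), safe="")


uris = [mint_uri(local_id) for local_id in ("EDD 001", 42, "north/tower")]
```

Because the URI is derived from the identifier already used inside the dataset, it is easy to map between the published URIs and the records held in the source system.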

Depending on how you organize your data, it may also be helpful to give your geometry objects URIs. For example, you may want to reuse a line string when describing the boundaries of adjacent administrative areas, or you may need to serve geometry data from an alternate URL because property data and geometry data are managed in different systems. Essentially, if you want to refer to a resource on the Web, you need to assign a URI to it.

11.3 Who is your audience?

Once you have determined the subjects of your spatial data, you should then consider your users - and the software tools, applications and capabilities they might have at their disposal.

Your objective should be to reduce the “friction” created for users to work with your data by providing it in a form that is closest to what their chosen software environment supports.

It is likely that you will be able to identify your intended “community of use” - and on that basis discern how best to publish data for them. However, increasingly data is being repurposed to derive insight in ways that the original publisher had never foreseen. This “unanticipated re-use” can add significant value to your data (e.g. because you didn’t know that your data could be used that way!) but this introduces the challenge of working with a large set of unknown users, developers and devices.

So while you should always prioritize your known users when publishing spatial data on the Web (often, because they are your stakeholders and their happiness can lead to continued funding!), it will often reap dividends to “design for the masses”: providing your spatial data in a way that is most readily usable with the (geo)spatial JavaScript libraries commonly employed across the Web.

Things that you should consider when choosing how to publish your spatial data on the Web are described next …

11.4 Parse that!

For users to work with your data, software agents (a.k.a. the “machines”) need to be able to parse it - to resolve the serialized data into its component parts. You should make your data available in machine-readable, standardized data formats (see [DWBP] Best Practice 12: Use machine-readable standardized data formats); e.g. JSON [RFC7159], XML [XML11], CSV [RFC4180] and other tabular data formats, YAML [YAML], protocol-buffers [PROTO3] etc. According to the 5 Star Data [5STAR-LOD] scheme, using open and non-proprietary structured data formats yields a 3-star rating (★★★), so you’re well on your way to good practice.

Considering that Web applications are most often written in JavaScript, probably the most “frictionless” data format for Web developers is JSON. That said, it is reasonably simple to parse other formats for use in JavaScript using widely available libraries. In some cases, there are even standards to define how this should be done (for example: [CSV2JSON]).
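As a rough illustration of how little code such a conversion can need, the following minimal sketch turns CSV text with a header row into a JSON array of objects using only the Python standard library. It is not an implementation of [CSV2JSON]; the function name and the sample rows are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes one object keyed by the column headers.
    rows = [dict(row) for row in reader]
    return json.dumps(rows)


# Hypothetical sample data, for illustration only.
sample = "name,lat,lon\nSite A,52.37,4.88\nSite B,51.92,4.48\n"
print(csv_to_json(sample))
```

Note that every value stays a string in this sketch; a real conversion would need type information, which CSV itself does not carry.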

You should also consider whether there are any attributes of these machine-readable standardized data formats that offset a little inconvenience for your data user. For example, protocol-buffers [PROTO3] and CBOR [RFC7049] (“Concise Binary Object Representation”) provide a significantly more compact encoding than JSON. The inconvenience of having to use additional libraries to parse these binary formats is offset by the convenience of much faster load times.

Imagery formats JPEG [JPEG2000] and PNG [PNG] can also be coerced to carry data, providing 3 or 4 channels of 8-bit data values. This can be an attractive way to encode gridded coverage data values as it is highly compact, so long as you don’t apply lossy compression algorithms to the “image”: while such compression retains visual integrity, it can ruin your data integrity. Experience indicates that network providers often do apply compression to image formats - even if you don’t want that. The key point is to ensure that you choose formats that are unaffected by the transport network.
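To make the risk concrete, here is a minimal sketch of how one data value might be spread over three 8-bit channels, and of what a one-step change in a single channel does to it. The packing scheme, the function names and the sample value are assumptions chosen for illustration, not a scheme prescribed by this document.

```python
def pack_rgb(value: int) -> tuple[int, int, int]:
    """Split an integer in [0, 2**24) into three 8-bit channel values."""
    if not 0 <= value < 2**24:
        raise ValueError("value does not fit in three 8-bit channels")
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def unpack_rgb(r: int, g: int, b: int) -> int:
    """Reassemble the integer from its three channel values."""
    return (r << 16) | (g << 8) | b


# Hypothetical grid cell value, e.g. an elevation in millimeters.
elevation_mm = 1_234_567
r, g, b = pack_rgb(elevation_mm)
assert unpack_rgb(r, g, b) == elevation_mm

# A change of one step in the most significant channel is invisible
# to the eye, but it shifts the decoded value dramatically.
corrupted = unpack_rgb(r + 1, g, b)
print(corrupted - elevation_mm)  # 65536
```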

When selecting the data format, make sure that your community of use has access to libraries or other software components required to work with that format. Let’s take [GeoTIFF] as an “anti-example”: it’s the de facto format for encoding geo-referenced imagery data - such as that available from satellites - but the lack of widely available libraries for working with it in a JavaScript application makes it unsuitable for publishing spatial data on the Web. Although a developer could write a byte-level parser, it puts an additional burden on any re-use.

12. The Best Practices

12.1 Spatial Metadata

[DWBP] provides best practices discussing the provision of metadata to support discovery and reuse of data (see [DWBP] section 8.2 Metadata for more details). Providing metadata at the dataset level supports a mode of discovery well aligned with the practices used in Spatial Data Infrastructure (SDI) where a user begins their search for spatial data by submitting a query to a catalog. Once the appropriate dataset has been located, the information provided by the catalog enables the user to find a service end-point from which to access the data itself - which may be as simple as providing a mechanism to download the entire dataset for local usage or may provide a rich API enabling the users to request only the required parts for their needs. The dataset-level metadata is used by the catalog to match the appropriate dataset(s) with the user's query.

This section includes best practices for including the spatial extent and the CRS of the dataset in the metadata. These are the extra metadata items needed to make spatial datasets discoverable and reusable. A third best practice in this section helps you go a step further: exposing spatial data on the web in such a way that the individual entities within the dataset are discoverable.

Best Practice 1: Include spatial metadata in dataset metadata

The description of datasets that have spatial features should include explicit metadata about the spatial coverage.

Note

This best practice extends [DWBP] Best Practice 2: Provide descriptive metadata.

Why

For spatial data, it is often necessary to describe the spatial details of the dataset - such as spatial coverage or extent of the dataset or, put in simpler terms, which area of the world the data is about. This information is used, for example, by SDI catalog services that offer spatial querying to find data - but also by users to understand the nature of the dataset. In some cases, for example when dealing with crowd-sourced data, provenance information is important as well.

Intended Outcome

Dataset metadata should include the information necessary to enable spatial queries within catalog services such as those provided by SDIs.

Dataset metadata should include the information required for a user to evaluate whether the spatial data is suitable for their intended application.
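The kind of spatial query a catalog might run against dataset-level metadata can be sketched as a bounding box intersection test. This is a minimal illustration only: the `BBox` type, the function name and the coordinates are hypothetical, and the test assumes that neither box crosses the antimeridian.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Bounding box in longitude/latitude degrees."""
    west: float
    south: float
    east: float
    north: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if the boxes share at least one point (no antimeridian crossing)."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


# Hypothetical extents, for illustration only.
dataset_extent = BBox(3.3, 50.7, 7.3, 53.6)
query_near_amsterdam = BBox(4.7, 52.3, 5.0, 52.4)
query_near_paris = BBox(2.2, 48.8, 2.5, 48.9)

print(intersects(dataset_extent, query_near_amsterdam))  # True
print(intersects(dataset_extent, query_near_paris))      # False
```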

Possible Approach to Implementation

When publishing a dataset, provide as much spatial metadata as necessary, but at least the spatial coverage. Other examples of spatial metadata are:

number of dimensions (1D, 2D, 3D)
spatial representation type (e.g. grid, vector, text table)
positioning system used for acquiring location
Coordinate Reference System(s) - see Best Practice 3: Specify Coordinate Reference System for high-precision applications and Best Practice 17: Describe the location according to a Coordinate Reference System
spatial resolution - see Best Practice 5: Describe the positional accuracy of spatial data

In Spatial Data Infrastructures the accepted standard for describing metadata is [ISO19115].

To provide information about the spatial attributes of the dataset on the web one can:

As shown in [DWBP] Best Practice 2: Provide descriptive metadata: Include the spatial coverage of the features described by the dataset using [VOCAB-DCAT] and a reference to a named place in a common vocabulary for geospatial semantics (e.g. GeoNames).
Again, use [VOCAB-DCAT], but instead of a reference to a named place, use a set of coordinates to specify the boundaries of the area either as a bounding box or a polygon.
Use the spatial extension of [VOCAB-DCAT], [GeoDCAT-AP], to specify spatial attributes that are not available in [VOCAB-DCAT]. GeoDCAT-AP provides an RDF syntax binding for the metadata elements defined in the core profile of [ISO19115] and in the INSPIRE metadata schema [INSPIRE-MD].
Use geospatial ontologies (see W3C Geospatial Incubator Group (GeoXG)'s report) to describe the spatial data for the datasets.
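The coordinate-based option in the list above can be sketched as follows: a bounding box is turned into a WKT polygon and attached to the dataset with dct:spatial. This is a minimal sketch, not a normative pattern. The use of locn:geometry with a GeoSPARQL WKT literal is an assumption that should be checked against the [GeoDCAT-AP] specification; the function names, dataset URI and coordinates are hypothetical.

```python
def bbox_to_wkt(west: float, south: float, east: float, north: float) -> str:
    """Return the bounding box as a closed WKT polygon (longitude latitude order)."""
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = ",".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


def spatial_coverage_turtle(dataset_uri: str, west: float, south: float,
                            east: float, north: float) -> str:
    """Build a Turtle description of a dataset's spatial coverage.

    Assumption: the geometry is expressed with locn:geometry and a
    gsp:wktLiteral, which is one pattern among several possible ones.
    """
    wkt = bbox_to_wkt(west, south, east, north)
    return (
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
        "@prefix dct: <http://purl.org/dc/terms/> .\n"
        "@prefix locn: <http://www.w3.org/ns/locn#> .\n"
        "@prefix gsp: <http://www.opengis.net/ont/geosparql#> .\n\n"
        f"<{dataset_uri}> a dcat:Dataset ;\n"
        "  dct:spatial [ a dct:Location ;\n"
        f'    locn:geometry "{wkt}"^^gsp:wktLiteral ] .\n'
    )


# Hypothetical dataset URI and extent, for illustration only.
print(spatial_coverage_turtle("http://example.org/dataset/1", 3.3, 50.7, 7.3, 53.6))
```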

Example 4

Spatial representation type

GeoDCAT-AP models this information by using adms:representationTechnique + URIs corresponding to the items in the appropriate ISO 19115 code list.

Example: GeoDCAT-AP specification of a dataset using a vector spatial representation type

a:Dataset a dcat:Dataset ;
  adms:representationTechnique <http://inspire.ec.europa.eu/metadata-codelist/SpatialRepresentationTypeCode/vector> .

Example: GeoDCAT-AP specification of a dataset using a grid spatial representation type

a:Dataset a dcat:Dataset ;
  adms:representationTechnique <http://inspire.ec.europa.eu/metadata-codelist/SpatialRepresentationTypeCode/grid> .

Note

The URIs in the example, denoting the spatial representation type, are part of a register yet to be added to the INSPIRE Registry. Therefore, they currently do not resolve.

How to Test

Check if the spatial metadata for the dataset itself includes the overall features of the dataset in a human-readable format.

Check if the descriptive spatial metadata is available in a valid machine-readable format.

Evidence

Relevant requirements: R-Discoverability, R-Compatibility, R-BoundingBoxCentroid, R-Crawlability, R-SpatialMetadata and R-Provenance.

Benefits

Reuse
Discoverability

Best Practice 2: Provide context required to interpret data values

Data values should be linked to spatial, temporal and thematic information that describes them.

Issue 384

This best practice is under review by the WG to see if it is sufficiently covered in DWBP. (see action)

Why

For users of spatial or temporal data it should always be possible to look up spatial, temporal or thematic metadata about a given value. This allows them to determine, for example, which reference system (CRS or TRS) and unit of measure (UoM) is used for a numeric value, the accuracy of the data value, and so on. Such metadata may be attached to metadata for collections, as described in Best Practice 1: Include spatial metadata in dataset metadata, or to individual values. The latter is necessary when this metadata is important for processing and interpreting the data, but varies from one value to the next. This information should be specified as explicit semantic data and/or be provided as linked to other resources.

Intended Outcome

The contextual data will specify spatial, temporal and thematic data and other information that can help to interpret data values; this can include information related to quality, location, time, topic, type, etc.

Possible Approach to Implementation

To provide the context required to interpret data values:

Specify explicit semantics that describe temporal, spatial and thematic information related to an entity
Provide links to other related resources that can describe contextual information related to an entity
Specify provenance and other related information

How to Test

...

Evidence

Relevant requirements: R-CRSDefinition, R-Provenance, R-QualityPerSample, R-SpatialMetadata, R-SensorMetadata.

Best Practice 3: Specify Coordinate Reference System for high-precision applications

A Coordinate Reference System (CRS) should be specified for high-precision applications to locate geospatial entities.

Why

The CRS is a special metadata attribute of spatial data that must be known for users to judge whether the data is usable for them. Clients or users must always be able to determine what CRS is used. Sometimes the CRS is left implicit: it is then determined by the specification of the data format that is used. Preferably, the CRS is specified at least as part of the metadata so that clients and users can judge whether the data is usable, and can find spatial data with a specific CRS.

The choice of CRS is sensitive to the intended domain of application for the spatial data. For the majority of applications a common global CRS (WGS84) is fine, but high precision applications (such as precision agriculture and defense) require spatial referencing to be accurate to a few meters or even centimeters. Specific, highly accurate CRS exist to provide a coordinate system for a specific region of the world (often a specific country). Spatial data from France is never going to use the Dutch coordinate system and vice versa.

Different CRS exist mainly because the positions on the surface of the earth relative to each other are constantly changing. For example, North America and Europe are receding from each other by a couple of centimeters per year, whereas Australia is moving several centimeters per year north-eastwards. So, for better than one meter accuracy in Europe, the European Terrestrial Reference System 1989 (ETRS89) was devised and it is frequently revised to take account of the drifting European tectonic plate. Consequently, coordinates in the ETRS89 system will change by a couple of centimeters per year with respect to WGS84.
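A back-of-the-envelope calculation shows why this matters for high-precision work. The sketch below assumes a constant drift of 2.5 centimeters per year (an illustrative figure standing in for "a couple of centimeters") and assumes that the two systems coincided in 1989; both values, and the function name, are assumptions for illustration rather than geodetic reference data.

```python
# Assumptions for illustration only, not authoritative geodetic values.
DRIFT_CM_PER_YEAR = 2.5
REFERENCE_YEAR = 1989


def accumulated_offset_cm(year: int) -> float:
    """Approximate offset between the plate-fixed and global systems, in cm."""
    return (year - REFERENCE_YEAR) * DRIFT_CM_PER_YEAR


for year in (1999, 2009, 2017):
    print(year, accumulated_offset_cm(year), "cm")
```

Under these assumptions the offset reaches 70 cm by 2017: negligible for many Web maps, but significant for an application that needs centimeter-level accuracy.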

Even if a CRS, tied to a tectonic plate, is used, local coordinates in some areas may still change over time, if the plate is rotating with respect to the rest of the earth. Many existing useful maps pre-date GPS and WGS84 based mapping, so that location errors of tens of meters, or more, may exist when compared to the same location derived from a different technology, and these errors may vary in size across the extent of a single map.

Another reason why different CRS exist has nothing to do with tectonic drift, but with projecting the 3D globe on a flat, 2D map: Cartesian projections. These are useful e.g. for calculating areas.

Note

The misuse of spatial data, because of confusion about the CRS, can have catastrophic consequences; e.g. both the bombing of the Chinese Embassy in Belgrade during the Balkan conflict and fatal incidents along the East Timor border are generally attributed to spatial referencing problems.

Intended Outcome

Clients or users can determine which CRS is used. Also, a Coordinate Reference System (CRS) sensitive to the intended domain of application (e.g. high precision applications) for the spatial data should be chosen.

Possible Approach to Implementation

Recommendations about CRS referencing should consider:

If your goal is to make data available to mass-market web users, make it available in WGS84. This CRS is suitable for many applications, but be aware of (and perhaps publish) the limitations of doing so.
If your goal is high accuracy, choose the best local CRS for your data.
Publishing data in multiple CRSs is fine, and may help users to combine your data with other sources, as well as serving multiple types of user.
It is preferable to explicitly state which CRS(s) you are using. For convenience, the CRS is often designated within the data format or vocabulary specification (e.g. W3C WGS84 Geo Positioning vocabulary) and, therefore, does not appear in the data itself. This is often considered as a default CRS. Data publishers and consumers should make sure they are aware of the specified CRS and any limitations that this may pose regarding the use of the data.
Where a specific CRS is required, the data publisher should choose a vocabulary where the CRS can be defined explicitly within the data.

Example 7

In [GeoDCAT-AP] metadata, the "link" between data and the relevant CRS(s) is made with dct:conformsTo - just like conformance with a "standard" is expressed in [VOCAB-DQV].

Example: GeoDCAT-AP specification of a dataset using coordinate reference system "WGS 84 / UTM zone 30N"

a:Dataset a dcat:Dataset ;
  dct:conformsTo <http://www.opengis.net/def/crs/EPSG/0/32630> .

<http://www.opengis.net/def/crs/EPSG/0/32630> a dct:Standard, skos:Concept ;
  dct:type <http://inspire.ec.europa.eu/glossary/SpatialReferenceSystem> ;
  dct:identifier "http://www.opengis.net/def/crs/EPSG/0/32630"^^xsd:anyURI ;
  skos:prefLabel "WGS 84 / UTM zone 30N"@en ;
  skos:inScheme <http://www.opengis.net/def/crs/EPSG/0/> .

How to Test

...

Evidence

Relevant requirements: R-DeterminableCRS

Benefits

Comprehension

Best Practice 4: Make your spatial data indexable by search engines

Search engines should be able to crawl spatial data on the Web and index spatial things for direct discovery by users.

Why

In SDIs information about spatial datasets is published as authoritative metadata records and collated in Web-based catalogues. This approach causes a number of problems:

the catalogues are often designed to primarily support expert users - people may not even be aware of their existence;
once you have discovered a dataset that meets your needs and identified where it is available from, a second step is required to access the data itself - often requiring the use of unfamiliar protocols or complex API requests; and
the data itself is not indexed - discovery relies on the metadata records that are often sparsely populated or out of date.

Search engines are the common, widely understood starting point for people looking for content on the Web. By publishing spatial data in a way that enables their crawlers to index spatial datasets including each spatial thing, the fidelity of search results should improve. Users will be able to directly search for specific entities rather than having to look for a dataset and then parse through it; e.g. to search for "Anne Frank’s House" (https://g.co/kg/m/02s5hd) rather than looking for a dataset about "Cultural Heritage in Amsterdam" and hoping that it contains a reference to what you’re interested in.

Note

At present, spatial information is not widely exploited by search engines. However, by increasing the volume of spatial information presented to search engines, and the consistency with which it is provided, we expect search engines to begin offering spatial search functions. We already see evidence of this in the form of contextual search, such as prioritization of search results from nearby entities. In addition, search engines are beginning to offer more structured, custom searches that return only results that include certain [SCHEMA-ORG] types, like Dataset, Place or City.

Intended Outcome

Information about spatial datasets and things is indexed by search engines.

Users can find spatial things using common search engines.

Possible Approach to Implementation

In general, you need to:

publish an HTML Web-page for the spatial dataset and each spatial thing that it describes; and
make sure that those pages can be crawled.

The Web-page for the dataset is an entry-point for humans to browse and for the search engines to crawl your data. This landing page should provide descriptive metadata that helps users evaluate whether the dataset meets their needs (see Best Practice 1: Include spatial metadata in dataset metadata and [DWBP] Best Practice 2: Provide descriptive metadata), and may provide links to other service end-points, APIs or tools that will help a user work with the dataset. The landing page should be indexable by the search engines so that it can be discovered too!

To enable humans and Web-crawlers to find HTML pages for the spatial things, the "landing page" needs to include hyperlinks that can be followed. Where you have a larger collection of spatial things, you should support paging through the collection.

You may also consider using Sitemaps to direct the Web-crawler; noting that sitemaps currently are limited to several thousands of entries and will not work for larger datasets.
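For a dataset small enough to fit, a sitemap can be generated directly from the list of page URLs. The following is a minimal sketch using the Python standard library; it emits only the required urlset, url and loc elements of the Sitemaps XML format, and the function name and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a sitemap document listing one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page URLs, one per spatial thing.
pages = [f"http://example.org/places/{n}" for n in range(1, 4)]
print(build_sitemap(pages))
```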

For very large datasets paging through thousands of pages is not useful for a human either. Consider supporting filtering and/or organise the spatial things into subsets, as described in Best Practice 13: Provide subsets for large spatial datasets.

A pre-condition for this best practice is Best Practice 7: Use globally unique persistent HTTP URIs for spatial things as persistent identifiers are essential to support reliable indexing and linking. Traditionally spatial datasets have not been maintained with stable identifiers for spatial things, but to share spatial data on the Web stable identifiers are a must. Sharing spatial data is more than "just" making the dataset available on the Web.

Each Web-page can likely be generated programmatically from the data you hold about the spatial thing, either directly from the data or by using an API that makes the data available on the Web.

It is important to keep in mind that the HTML representations should not mainly be designed for the search engines, but they should present the data in a clear and understandable way to human users. The page about the spatial thing should be useful to a user and encourage others to link to the page when they share other information about the spatial thing. This typically will also improve the ranking of these pages in search results.

Example 10

The Property Search in the City of Nanaimo, Canada provides a landing page and one page per property. The landing page offers a search capability and the option to browse by street. This data is indexed; a search for, for example, "2100 AARON WAY, NANAIMO, BC" in a popular search engine returns the Nanaimo data for this spatial thing as one of the first results.

The Bathing Water Quality Explorer for England provides a landing page and one page per site. Sites can be searched, selected from a list or in a map.

In both cases, the pages of the spatial things are generated from the underlying data at request time.

The property Web-pages in Nanaimo also use [MICRODATA] annotations using [SCHEMA-ORG], which is discussed below.

In addition to exposing the spatial data as linked HTML Web-pages, indexing by search engines can be further enhanced by incorporating a description of the spatial thing as structured markup (in particular [MICRODATA] or [JSON-LD] annotations using [SCHEMA-ORG]) as this enables the search engines to make more detailed assumptions about your resource. It is important to note that this is not only helpful to search engines, but also to other tools that want to understand more about the semantics of the resource, for example, its location.

In [SCHEMA-ORG], a spatial dataset is a Dataset and a spatial thing is in general a Place or an Event. For some types of spatial things, more specific sub-types exist, for example City or Mountain.

Location information about a spatial thing is typically provided using a geometry (GeoCoordinates or GeoShape) or a PostalAddress. [SCHEMA-ORG] coordinates are restricted to WGS 84 with longitude and latitude. Supported geometry types are points, line strings, polygons, boxes and circles.

Through the use of [SCHEMA-ORG] annotations, search engines and others can connect location information with other information, e.g. about the nature of the spatial thing, opening hours, contact details, etc.

The use of [SCHEMA-ORG] for spatial data is in its early days and has to be understood as an "emerging practice".

Example 11

This code-snippet illustrates a [JSON-LD] annotation using a [SCHEMA-ORG] Dataset for an address dataset in the Netherlands that may be embedded in the HTML of the Web-page. It includes a name, a description, the spatial coverage using a bounding box, the URL of the Web-page, and a link to another dataset containing this dataset. The same annotation could also be provided using [MICRODATA], but we use [JSON-LD] here as this presents the structured data in a more human-readable way.

<script type="application/ld+json">
{
  "@context" : {
    "@vocab" : "http://schema.org/"
  },
  "@type" : "Dataset",
  "@id" : "http://www.ldproxy.net/bag/inspireadressen/",
  "name" : "Adressen",
  "description" : "INSPIRE Adressen afkomstig uit de basisregistratie Adressen, beschikbaar voor heel Nederland",
  "url" : "http://www.ldproxy.net/bag/inspireadressen/",
  "isPartOf" : {
    "@type" : "Dataset",
    "url" : "http://www.ldproxy.net/bag/"
  },
  "keywords" : "Adressen",
  "spatialCoverage" : {
    "@type" : "Place",
    "geo" : {
      "@type" : "GeoShape",
      "box" : "3.053,47.975 7.24,53.504"
    }
  }
}
</script>

This code-snippet illustrates a [JSON-LD] annotation using a [SCHEMA-ORG] Place for the address of "Anne Frank’s House" in that dataset. It includes the location, the URL of the Web-page, and the structured postal address information.

<script type="application/ld+json">
{
  "@context" : {
    "@vocab" : "http://schema.org/"
  },
  "@type" : "Place",
  "@id" : "http://www.ldproxy.net/bag/inspireadressen/inspireadressen.3329155",
  "url" : "http://www.ldproxy.net/bag/inspireadressen/inspireadressen.3329155",
  "geo" : {
    "@type" : "GeoCoordinates",
    "longitude" : "4.8839893538143055",
    "latitude" : "52.37520202332491"
  },
  "name": "Anne Franks House",
  "description": "Museum house where Anne Frank & her family hid from the Nazis in a secret annex, during WWII.",
  "address" : {
    "@type" : "PostalAddress",
    "streetAddress" : "Prinsengracht 267",
    "addressLocality" : "Amsterdam",
    "postalCode" : "1016GV"
  }
}
</script>
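Annotations like these can be generated from the data held about each spatial thing when its Web-page is rendered. The following is a minimal sketch of that step, assuming a simple record with a URI, a name and a WGS 84 position; the function name and the sample values are hypothetical, and only a small subset of the properties shown above is produced.

```python
import json


def place_jsonld_script(uri: str, name: str, latitude: float, longitude: float) -> str:
    """Return a <script> element carrying a schema.org Place annotation."""
    doc = {
        "@context": {"@vocab": "http://schema.org/"},
        "@type": "Place",
        "@id": uri,
        "url": uri,
        "name": name,
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": latitude,
            "longitude": longitude,
        },
    }
    body = json.dumps(doc, indent=2, ensure_ascii=False)
    # Escape "</" so that a value cannot close the script element early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical record, for illustration only.
print(place_jsonld_script("http://example.org/places/1", "Example Place", 52.0, 4.0))
```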

The Web-pages should also provide a mechanism to download data in the formats you decide to support. [DWBP] Best Practice 14: Provide data in multiple formats provides guidance.

Typically multiple formats for a resource are supported using two mechanisms: HTTP content negotiation and format-specific file extensions added to the resource URI, like ".json", ".xml" or ".ttl". Content negotiation is the standard mechanism of HTTP and the format-specific URIs enable the use of clickable links to the resource in a specific format.
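How a server might combine the two mechanisms can be sketched as follows: an explicit file extension wins, otherwise the Accept header decides. This is a simplified illustration, not a complete implementation of HTTP content negotiation: wildcards such as */* are not matched and simply fall through to the default, and the format table and function name are hypothetical.

```python
# Hypothetical table of supported representations.
FORMATS = {
    "text/html": ".html",
    "application/json": ".json",
    "application/xml": ".xml",
    "text/turtle": ".ttl",
}


def choose_media_type(path: str, accept_header: str, default: str = "text/html") -> str:
    """Pick a media type from the URI extension, else from the Accept header."""
    # 1. A format-specific extension on the URI takes precedence.
    for media_type, extension in FORMATS.items():
        if path.endswith(extension):
            return media_type

    # 2. Otherwise parse the Accept header into (q, media type) pairs.
    candidates = []
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip().lower()
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        candidates.append((q, media_type))

    # Highest q first; the stable sort keeps header order among ties.
    for q, media_type in sorted(candidates, key=lambda c: -c[0]):
        if q > 0 and media_type in FORMATS:
            return media_type
    return default


print(choose_media_type("/places/1.ttl", "text/html"))
print(choose_media_type("/places/1", "text/html;q=0.5, application/json"))
```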

Search engines may also index resource representations in formats other than HTML.

Note

In 2016, these topics were analysed in a testbed organised by Geonovum in the Netherlands. More details can be found in reports from the testbed: Spatial Data on the Web using the current SDI and Crawlable geospatial data using the ecosystem of the Web and Linked Data.

The use of [SCHEMA-ORG] for describing spatial information is continually evolving; spatial data publishers should familiarise themselves with current practices. A useful Introduction to Structured Data is provided in Google's developer portal.

How to Test

Using a Web browser,

search for the landing page of your dataset, and
check that you can browse to human-readable HTML pages for each spatial thing that the dataset describes.

Monitor the search consoles of the search engines to track progress in indexing your Web-pages and their structured data. If any errors are reported, try to fix them.

Evidence

Relevant requirements: R-BoundingBoxCentroid, R-Crawlability, R-Discoverability, R-Linkability, R-MachineToMachine.

Benefits

Discoverability

12.2 Spatial Data Quality

[DWBP] provides a best practice discussing how the quality of data on the web should be described (see [DWBP] section 8.5 Data Quality for more details). This section is based on the Data Quality section from [DWBP] and adds a best practice specific for spatial data.

In the Spatial Metadata section we provided a Best Practice on how to deal with CRS in spatial data on the web. There is also a clear link between CRS and data quality, because the accuracy of spatial data depends for a large part on the CRS used. This can be seen as conformance of data with a "standard" - in this case, a (spatial or temporal) reference system. This is how you can describe spatial data quality using different vocabularies. We will provide an example in this section.

Best Practice 5: Describe the positional accuracy of spatial data

Accuracy and precision of spatial data should be specified in machine-interpretable and human-readable form.

Why

The amount of detail that is provided in spatial data and the resolution of the data can vary. No measurement system is infinitely precise and in some cases the spatial data can be intentionally generalized (e.g. merging entities, reducing the details, and aggregation of the data) [Veregin].

Note

It is important to understand the difference between precision and accuracy. Seven decimal places of a degree of latitude correspond to about one centimeter. Whatever the precision of the specified coordinates, the accuracy of positioning on the actual earth's surface using WGS84 will only approach about a meter horizontally and may have apparent errors of up to 100 meters vertically, because of assumptions about reference systems, tectonic plate movements and which definition of the earth's 'surface' is used.
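The relationship between decimal places and distance can be checked with simple arithmetic. The sketch below assumes that one degree of latitude is roughly 111 kilometers, a rounded average used for illustration only; the function name is hypothetical.

```python
# Rounded average length of one degree of latitude; an assumption
# for illustration, since the true value varies slightly with latitude.
METERS_PER_DEGREE_LAT = 111_000


def latitude_step_m(decimal_places: int) -> float:
    """Distance represented by the last decimal place of a latitude, in meters."""
    return METERS_PER_DEGREE_LAT / 10 ** decimal_places


for places in range(0, 8):
    print(f"{places} decimal places: about {latitude_step_m(places):g} m")
```

Writing a coordinate with seven decimal places therefore states a precision of about a centimeter, but says nothing about whether the position is accurate to that level.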

Intended Outcome

When known, the resolution and precision of spatial data should be specified in a way that makes consumers of the data aware of the resolution and level of detail considered in the specifications.

Possible Approach to Implementation

Describe the accuracy of spatial data in a way that is understandable for humans.

In addition, describe the accuracy of spatial data in a machine-readable format. [VOCAB-DQV] is such a format. It is a vocabulary for describing data quality, including the details of quality metrics and measurements.

Issue 125

We need some explanations for the approaches to describe positional (in)accuracy.

Example 14: GeoDCAT-AP specification of a dataset conformance with the INSPIRE Regulation on spatial data and services interoperability

a:Dataset a dcat:Dataset ;
  dct:conformsTo <http://data.europa.eu/eli/reg/2010/1089/oj> .

<http://data.europa.eu/eli/reg/2010/1089/oj> a dct:Standard , foaf:Document ;
  dct:title "COMMISSION REGULATION (EU) No 1089/2010 of 23 November 2010
             implementing Directive 2007/2/EC of the European Parliament
             and of the Council as regards interoperability of spatial
             data sets and services"@en ;
  dct:issued "2010-12-08"^^xsd:date .

The following example shows how DQV can express the precision of a spatial dataset:

Example 15: DQV specification of data quality

:myDataset a dcat:Dataset ;
   dqv:hasQualityMeasurement :myDatasetPrecision, :myDatasetAccuracy .

:myDatasetPrecision a dqv:QualityMeasurement ;
   dqv:isMeasurementOf :spatialResolutionAsDistance ;
   dqv:value "1000"^^xsd:decimal ;
   sdmx-attribute:unitMeasure  <http://www.wurvoc.org/vocabularies/om-1.8/metre>
   .

:spatialResolutionAsDistance  a  dqv:Metric;
    skos:definition "Spatial resolution of a dataset expressed as distance"@en ;
    dqv:expectedDataType xsd:decimal ;
    dqv:inDimension dqv:precision
    .

This example was taken from [VOCAB-DQV]. For more examples of expressing spatial data precision and accuracy see DQV, Express dataset precision and accuracy.

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-QualityPerSample.

Benefits

Reuse
Trust

12.3 Spatial Data Versioning

Spatial things and their attributes can change over time. For example, a lake may grow or shrink due to changes in climate, water extraction or any number of reasons. For many applications, it is important that information about spatial things is kept up to date. When new information is available, the data publisher may make this available on the Web according to their update schedule and policies. [DWBP] section 8.6 Data Versioning and Best Practice 21: Provide data up to date provide directly applicable guidance.

When dealing with change to a spatial thing, you should consider its lifecycle; in particular, how much change is acceptable before a spatial thing can no longer be considered as the same resource. Consider Eddystone Lighthouse for example: the “Eddystone Light”, a maritime navigation aid, has existed in (more or less) the same place on Eddystone Rocks since 1698. A single HTTP URI (such as http://dbpedia.org/resource/Eddystone_Lighthouse) is used to identify “the lighthouse on Eddystone rocks” for all that period. The lighthouse's attributes (such as its focal height, visible range and light characteristic) have changed over that period, but we still consider it to be the same lighthouse. However, if our interest is historic buildings, we would identify the four different structures that have stood on that site as different spatial things, from Winstanley's Eddystone Lighthouse (the first incarnation) to Douglass' Eddystone Lighthouse (the 4th and current incarnation). Incremental change for these structures during the entire period from 1698 is not appropriate; one structure replaces another and so each structure should be assigned a unique identifier. In summary, different things are important to different people!

Essentially, the decision to assign a new identifier in response to change depends on how domain experts think about the lifecycle of the spatial thing, which then manifests in a data modelling choice. [DWBP] section 8.9 Data Vocabularies and section 12.5 Spatial Data Vocabularies provide further guidance on the topic of data modelling; determining which concepts and relationships should be used to describe your area of interest.

Data publishers should not attempt to guess all the purposes for which someone might use or reference their data - ending up with a super-complex data model that tries to cover every possible use case. Instead, data publishers should try to help data consumers make informed decisions about the best way to use the data by providing good metadata. When it comes to spatial things, or any resource, that changes over time, it is important to provide metadata about the life cycle of those entities and the resources used to describe them. Given that information, data consumers can make considered choices about which resource they want to link to. [DWBP] section 8.2 Metadata provides useful guidance.

All that said, if you consider that the change affects the fundamental nature of the spatial thing, then you should assign a new identifier. See section 12.4 Spatial Data Identifiers for more details. Otherwise, read on for guidance on how to describe properties that change over time.

Best Practice 6: How to describe properties that change over time

Spatial data should include metadata that allows a user to determine the time period for which it is valid.

Why

Spatial things and their attributes change over time. Mostly, users are interested in current information. They need to be able to determine whether the published description of a spatial thing meets their needs. For example, is the published geographic extent of the City of Amsterdam relevant for a land-usage study of the nineteenth century? (Gemeentegeschiedenis.nl, "Parish History", illustrates how the extent of Amsterdam has changed during the past 200 years, in HTML and GeoJSON). Where the information is available, a user may want to browse older versions of the published information to understand the nature of any changes or to find historical information.

Intended Outcome

Users are provided with the most recent version of information about a spatial thing and its attributes by default.

Users are able to determine the time period for which data is applicable.

If a version history of changes is available, users are able to browse through a set of changes to see how a spatial thing and its attributes have changed over time.

Possible Approach to Implementation

When publishing information about a spatial thing that is subject to change there are three main approaches to consider:

simply updating the description of the spatial thing in response to a change;
providing a series of immutable snapshots that describe the spatial thing at various points in its lifecycle; and
capturing a time-series of data values within an attribute of the spatial thing.

Whichever approach is chosen, publishers of spatial data should consider how dataset metadata plays an important part in helping users determine whether a dataset is fit for their use. Particularly where the contents of a dataset change with time, statements about the (most recent) publication date, the frequency of update and the time period for which the dataset is relevant (i.e. temporal extent) should be provided. Please refer to [DWBP] section 8.2 Metadata for more details about dataset metadata.
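As a rough illustration of the temporal statements listed above, the following sketch builds a JSON-LD style metadata record in Python. It assumes the Dublin Core terms (dct:issued, dct:modified, dct:accrualPeriodicity, dct:temporal) and the DCAT startDate/endDate properties are an acceptable choice for your dataset; the function name, the dates and the frequency URI passed in by the caller are illustrative only.

```python
import json
from datetime import date


def dataset_temporal_metadata(issued, modified, start, end, frequency_uri):
    """Return a JSON-LD style dict holding the temporal statements for a dataset."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:issued": issued.isoformat(),        # first publication date
        "dct:modified": modified.isoformat(),    # most recent publication date
        "dct:accrualPeriodicity": {"@id": frequency_uri},  # frequency of update
        "dct:temporal": {                        # temporal extent of the dataset
            "@type": "dct:PeriodOfTime",
            "dcat:startDate": start.isoformat(),
            "dcat:endDate": end.isoformat(),
        },
    }


if __name__ == "__main__":
    # All values below are made up for illustration.
    record = dataset_temporal_metadata(
        issued=date(2016, 1, 1),
        modified=date(2017, 3, 1),
        start=date(1800, 1, 1),
        end=date(2016, 12, 31),
        frequency_uri="http://purl.org/cld/freq/annual",
    )
    print(json.dumps(record, indent=2))
```

The record can be embedded in a dataset landing page or served alongside the data; which metadata vocabulary to use is a publisher decision.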

A description of the lifecycle of the spatial things (e.g. what triggers a change and whether those changes are versioned etc.) should also be provided in either the dataset's metadata, schema or specification. For example, the UK's Digital National Framework policy states that data publishers must provide these lifecycle rules.

Approach (1) is lightweight and should only be used where there are no user requirements that require access to older descriptions of the spatial things. Data publishers simply replace the old description of the spatial thing with the amended description and keep users informed about updates by providing the appropriate metadata (e.g. when the data was changed). This may be achieved using dataset metadata (as outlined above) or by including the metadata attributes in the description of each spatial thing.

Where users are anticipated to need to understand how a spatial thing has changed over time, approaches (2) and (3) must be considered.

Approach (2) requires the data publisher to publish immutable resources that describe the spatial thing at specific points in time (i.e. "snapshots") and provide a mechanism for users to browse between those snapshots. Given that each snapshot of the spatial thing is published as a separate resource, this approach is suited to infrequent changes so that the number of snapshots does not become unwieldy.

The URI for the spatial thing, the base URI, should resolve to provide the current information and a link to its version history of snapshots. [DWBP] Best Practice 8: Provide version history describes how a version history may be implemented. Each snapshot resource within the version history must be uniquely identified; a common approach is to append a date/time stamp to the base URI as a version indicator. [DWBP] Best Practice 7: Provide a version indicator provides relevant guidance.
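The date/time stamp convention mentioned above could be sketched as follows. This is only one possible URI layout: the helper names, the example.org base URI and the compact UTC timestamp format are assumptions made for illustration, not a pattern prescribed by [DWBP].

```python
from datetime import datetime, timezone


def snapshot_uri(base_uri, valid_from):
    """Append a compact UTC timestamp to the base URI as a version indicator."""
    if valid_from.tzinfo is None:
        raise ValueError("valid_from must be timezone-aware")
    stamp = valid_from.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{base_uri.rstrip('/')}/{stamp}"


def version_history(base_uri, change_times):
    """Return snapshot URIs in chronological order; the last one is the current version."""
    return [snapshot_uri(base_uri, t) for t in sorted(change_times)]


if __name__ == "__main__":
    base = "http://data.example.org/id/building/42"  # hypothetical base URI
    changes = [
        datetime(2017, 3, 1, 12, 0, tzinfo=timezone.utc),
        datetime(2015, 6, 15, 9, 30, tzinfo=timezone.utc),
    ]
    for uri in version_history(base, changes):
        print(uri)
```

Because each snapshot URI is derived only from the base URI and the time of change, the same URI is produced every time, which suits resources that are meant to be immutable.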

Approach (3) is suitable where a spatial thing has a small number of attributes that are frequently updated. Examples include the GPS position of a runner, or data streamed from a sensor such as the water level from a stream gauge.

With this approach, the description of the spatial thing must include a property that contains a sequentially-ordered set of data-points, each of which defines a time-stamp and the values for the time-varying attribute(s). By definition, this property can be considered as a time-series coverage. Standard data encodings are available for time-series data, including: [TIMESERIESML] for GML, plus [COVERAGE-JSON] and [SENSORTHINGS] for JSON.
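To make the idea of a sequentially-ordered set of data-points concrete, here is a minimal in-memory sketch in Python. It is not an implementation of any of the encodings cited above; the class names, the water_level_m property and the example URI are hypothetical.

```python
from bisect import insort
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(order=True)
class DataPoint:
    time: datetime                        # time-stamp of the observation
    value: float = field(compare=False)   # observed value; ignored when ordering


@dataclass
class StreamGauge:
    uri: str                              # identifier of the spatial thing
    water_level_m: list = field(default_factory=list)  # the time-series property

    def record(self, time, value):
        """Insert a data-point while keeping the series ordered by time."""
        insort(self.water_level_m, DataPoint(time, value))

    def latest(self):
        """Return the most recent data-point, or None if the series is empty."""
        return self.water_level_m[-1] if self.water_level_m else None


if __name__ == "__main__":
    gauge = StreamGauge("http://data.example.org/id/gauge/7")
    gauge.record(datetime(2017, 3, 1, 12, tzinfo=timezone.utc), 1.42)
    gauge.record(datetime(2017, 3, 1, 10, tzinfo=timezone.utc), 1.30)
    print(gauge.latest())
```

The spatial thing keeps a single identifier while only the time-varying property grows, which is what distinguishes this approach from publishing snapshots.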

Note

The OGC [MOVING-FEATURES-XML] and [MOVING-FEATURES-CSV] specifications follow the pattern described above. A trajectory element is used to describe the position of a spatial thing, and varying attributes (such as orientation or rotation) can be added alongside the tuples in the trajectory. However, there is limited evidence of adoption outside of Japan.

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-MovingFeatures, R-Streamable

Benefits

Comprehension
Trust
Access

12.4 Spatial Data Identifiers

The primary topics of any spatial dataset are spatial things, each described by a set of attributes and usually at least one geometry. How your spatial data is structured will depend on the vocabulary or data model you use (see section 12.5 Spatial Data Vocabularies for further details on vocabulary choice). This will determine the types of entities that, along with the spatial things themselves, are important enough to be given identifiers so that statements can be made about them. Geometry objects are an example of an entity that is often assigned a unique identifier so that they can be referenced or reused.

To publish spatial data on the Web, we need to stitch the spatial things and their corresponding entities into the Web’s information space; contributing to the Web of data. First: [WEBARCH] Good Practice: Identify with URIs states that "agents should provide URIs as identifiers for resources". Second: the 5 Star Data scheme states: "★★★★ use URIs to denote things, so that people can point at your stuff".

[DWBP] Best Practice 10: Use persistent URIs as identifiers within datasets provides directly applicable guidance. When identifying resources, it advises:

Seek and reuse existing URIs, ensuring that the URIs are persistent and they are published by a trusted group or organization; or
Create your own persistent URIs.

Furthermore, given ubiquitous use of the Hyper Text Transfer Protocol (HTTP) on the Web, we SHOULD use HTTP URIs to identify resources in spatial data.

Note

We consider identifiers in the Web’s information space to be unaffected by the choice to serve HTTP content securely or not. For example, http://example.org/country/suriname and https://example.org/country/suriname both identify the same spatial thing - in this case the South American country of Suriname.

Resources identified with HTTP URIs can be specified as the target of links within the Web’s global information space, enabling information from different sources to be related and combined. This is the fundamental basis of 5★ Linked Data: "★★★★★ link your data to other data to provide context".

Best Practice 7: Use globally unique persistent HTTP URIs for spatial things

Use stable HTTP URIs to identify spatial things, re-using commonly used URIs where they exist and it is appropriate to do so.

Why

The Web works with resources that are identified using HTTP URIs. We want spatial things to be first-class resources on the Web, so that we can make statements about them and relate them to other resources. To do this, spatial things need to be addressable resources in the Web’s global information space, which means they must be identified using HTTP URIs.

This is a fundamentally different data publication approach to what is typical today where the dataset is (often) globally identified, but individual spatial things, or "features" in SDI parlance, are not - at least not with a persistent identifier.

The HTTP URIs used to identify spatial things need to be stable or persistent so that relationships that link them to other resources don’t break.

Intended Outcome

Spatial things become part of the Web’s global information space, enabling them to be linked with other spatial things and other resources, and for those links to be durable. In other words, spatial data becomes part of the Web of Data.

Possible Approach to Implementation

The Web of data is made up of subjects and objects; the things we talk about and the things we refer to. For example, we could say that Anne Frank's House (the subject) is within the Municipality of Amsterdam (the object). In RDF this looks like:

<https://g.co/kg/m/02s5hd> schema:containedInPlace <http://sws.geonames.org/2759793> .

When considering HTTP URIs for objects (e.g. the target of our hyperlinks) it makes sense to reuse existing identifiers. After all, you are trying to stitch your spatial data into the Web so that we can "link your data to other data" and achieve a ★★★★★ rating! Organizations such as DBPedia, GeoNames and government mapping and cadastral authorities (that publish national registers of addresses, buildings, etc.) are good sources of stable, authoritative URIs. Appendix B. Authoritative sources of geographic identifiers lists sources of URIs for spatial things, and the steps described for discovering existing vocabularies [LD-BP] can be readily adapted to find more. For more details about how you might link to these authoritative identifiers, see section 12.7 Linking Spatial Data.

However, HTTP URIs for subjects (e.g. the resource that we want to make statements about) can be a bit more tricky. If you are working purely with data then you can reuse existing URIs minted by other authorities for your subject URIs. But publishing spatial data on the Web means that the URIs for each spatial thing should resolve to Web pages or data resources that provide useful information (see ). An HTTP request will be directed to a host Web server, identified by the internet domain name (or IP address) in the requested URI. If you use a URI with an internet domain name where you have no control over how the Web server behaves, then there is no way for your statements to be included in the Web server's response.

To take control of how information about spatial things is presented, data publishers need to assign their subject spatial things HTTP URIs from an internet domain name where they have authority over how the Web server responds. Typically, this means minting new HTTP URIs. It's also worth considering that the use of a particular internet domain may reinforce the authority of the information served. For example, a URI for Anne Frank's House is: https://monumentenregister.cultureelerfgoed.nl/monuments?MonumentId=4296. The use of the internet domain registered to the Cultural Heritage Agency of the Netherlands gives the definition authenticity.

Note

The need to control what information is provided about a given spatial thing means that it is not uncommon for a spatial thing to be identified by multiple HTTP URIs. The equality between two URIs that refer to the same resource can be stated using a property such as owl:sameAs. Care must always be taken when using owl:sameAs to determine that the two URIs actually refer to the same resource, rather than two resources that are similar. Warning: don't assert owl:sameAs if you're not sure it's true!

For more information about the types of properties that can be used to link between spatial things, and between spatial things and other resources, see section 12.7 Linking Spatial Data.

When minting your own URIs, [DWBP] Best Practice 10: Use persistent URIs as identifiers within datasets cites the advice from GS1's SmartSearch Implementation Guideline [GS1] which suggests that your URIs should include the type of resource that is being identified to help human readability. Also, given the need for the HTTP URIs for spatial things to be used throughout their lifetime (and perhaps beyond) you should give some thought to designing a URI that is persistent.

Example 17

This URI identifies the Amsterdam Central train station:

https://brt.basisregistraties.overheid.nl/top10nl/id/gebouw/102625209

This URI was minted using the recommendations in the Dutch URI strategy. Although minted by the Kadaster, they chose to use the domain ‘basisregistraties.overheid.nl’ (which translates to ‘base registries . government . nl’) because this is expected to be a more persistent name than ‘kadaster.nl’. Even though the Kadaster is over 100 years old, organization names are not considered persistent in general as organizations may merge or their names may change. ‘top10nl’ is the name of the dataset, and ‘gebouw’ means ‘building’ – giving the human reader of this URI a clue of what is being identified. The last part of the URI is the building number from the dataset.

[DWBP] Best Practice 9: Use persistent URIs as identifiers of datasets cites the European Commission's Study on Persistent URIs [PURI] as a good source from which to gain insights about designing persistent URIs.

When an HTTP URI is resolved, the server will respond with a sequence of bytes: by its nature, HTTP can only serve information resources such as Web pages or JSON documents. Yet a spatial thing is actually a real or conceptual phenomenon - a lake is made from water not information! Using a single URI to refer to both the spatial thing and the page/document that describes the spatial thing introduces a URI collision. This can impose a cost in communication due to the effort required to resolve ambiguities. [URLs-in-data] has more to say on this subject, including recommending URI design patterns that enable differentiation between the spatial thing and the page/document that describes it.

However, in most cases using a single URI for both spatial thing and the page/document is simpler to implement and meets the expectations of most end-users. As stated in [WEBARCH] section 2.2.3 Indirect Identification, identifiers are commonly used in this way. There is no obligation to distinguish between the spatial thing and the page/document unless your application requires this.

Note

While there is a cost to this conflation, problems can be mitigated by avoiding making statements that confuse spatial thing and the page/document, such as “Uluru is available in KML format”; e.g. <http://sws.geonames.org/7645281> dc:hasFormat ex:kml .

This statement is clearly not true; an ancient monolith covering more than 3 km² cannot be provided in XML!

Issue 208

There is a level of discomfort in the wider community (based on discussion with Platform Linked Data Nederland folks amongst others) about whether this best practice should recommend "indirect identifiers" (where spatial thing and page/document both share the same URI) while the TAG Guidance (albeit from 2005) states that an HTTP 303 (see other) response should be provided by servers resolving the URI of a non-information resource (such as a spatial thing), referring the user agent to the corresponding information resource (i.e. the /id and /doc pattern that is in widespread use but often seems to confuse users and even some experts).

We pretty much agreed that use of indirect identifiers was OK during our discussion at TPAC-2016. That said, we didn't record a resolution.

If we want to stick with the TAG guidance, we suggest removing the paragraph beginning

However, in most cases using a single URI for both spatial thing and the page/document [...]

and the following note, and replace with:

Dereferencing URIs for spatial things should result in an HTTP 303 (see other) response that redirects the user agent to the corresponding page/document. This means that the spatial thing and the page/resource MUST have different URIs. It is common to use /id as part of the URI for non-information resources, and /doc for the corresponding page/document.

That said, [URLs-in-data] provides other alternatives such as using a #id fragment.

HTTP URIs for spatial things should not include any indication of the data format used to encode the page/document as this may change as your systems evolve. That said, you may wish to provide a set of complementary resources that specify a particular format as part of your content negotiation strategy. For example, the URI http://sws.geonames.org/7645281/about.rdf resolves to provide an RDF/XML encoding of the information about Uluru in the Northern Territory of Australia (http://sws.geonames.org/7645281).

[DWBP] Best Practice 10: Use persistent URIs as identifiers within datasets notes that URIs can be long. You may need to define identifiers that are locally unique within your spatial dataset and provide a mechanism to programmatically convert each local identifier to a URI. For example, the Metadata Vocabulary for Tabular Data [TABULAR-METADATA] achieves this using URI Templates as described in [RFC6570].
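As an illustration of converting local identifiers to URIs, the sketch below implements only the simplest form of [RFC6570] expansion (Level 1, simple string expansion), in which each {name} expression is replaced by the percent-encoded value of the variable and an undefined variable expands to an empty string. The function name, the template and the identifier values are hypothetical.

```python
import re
from urllib.parse import quote


def expand(template, variables):
    """Expand {name} expressions using RFC 6570 Level 1 simple string expansion."""
    def substitute(match):
        value = variables.get(match.group(1))
        if value is None:
            return ""                      # undefined variables expand to nothing
        # Percent-encode everything except unreserved characters.
        return quote(str(value), safe="")
    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


if __name__ == "__main__":
    template = "http://data.example.org/id/building/{building_id}"
    for local_id in ("102625209", "block A/7"):
        print(expand(template, {"building_id": local_id}))
```

Keeping the template in the dataset metadata means the data itself can carry short local identifiers while consumers can still derive the full HTTP URI.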

It is also good practice to use a redirection service to hide complex and potentially changing service end-point URLs, such as those of a Web Feature Service, behind well-designed URIs. This means that users don’t need to be aware of the complexities of the API or changes in endpoint URIs or API versions in order to request information about a particular spatial thing. For example, the URI http://data.example.org/aan/id/perceel/aan.2528 could be used as a proxy for the WFS GetFeature request http://geodata.nationaalgeoregister.nl/aan/wfs?VERSION=2.0.0&SERVICE=WFS&REQUEST=GetFeature&featureID=aan.2528.
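The mapping at the heart of such a redirection service might look like the following sketch. It covers only the translation from the path of the well-designed URI to the WFS request given in the example above; the function name is hypothetical, and whether the server then issues an HTTP redirect or proxies the response is left open.

```python
from urllib.parse import urlencode

# Endpoint and path prefix taken from the example in the text.
WFS_ENDPOINT = "http://geodata.nationaalgeoregister.nl/aan/wfs"
ID_PREFIX = "/aan/id/perceel/"


def wfs_target(path):
    """Translate a persistent URI path into the WFS GetFeature URL, or None if unknown."""
    if not path.startswith(ID_PREFIX) or len(path) == len(ID_PREFIX):
        return None
    feature_id = path[len(ID_PREFIX):]
    query = urlencode({
        "VERSION": "2.0.0",
        "SERVICE": "WFS",
        "REQUEST": "GetFeature",
        "featureID": feature_id,
    })
    return f"{WFS_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(wfs_target("/aan/id/perceel/aan.2528"))
```

If the WFS endpoint or version changes, only the constants in the service need updating; the published URIs stay the same.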

Finally, while it is simple to use a query-pattern URL to serve information about a resource identified with a URI from a third-party internet domain, e.g. http://example.org/museums?q=http://sws.geonames.org/6618987, these URLs are unsuitable as persistent identifiers. More often than not, your intended users will dereference the "official" URI, e.g. http://sws.geonames.org/6618987. That said, this kind of search operation does provide a useful mechanism to find particular spatial things. See Best Practice 12: Include search capability in your data access API for further details.

How to Test

Check that, within the data, spatial things such as countries, regions and people are referred to by HTTP URIs or by short identifiers that can be converted to HTTP URIs. Ideally the URIs should resolve; however, they have value as globally scoped variables whether they resolve or not.

Evidence

Relevant requirements: R-Linkability, R-GeoReferencedData, R-IndependenceOnReferenceSystems.

Benefits

12.5 Spatial Data Vocabularies

In this document there is no section on formats for publishing spatial data on the web. The formats are basically the same as for publishing any other data on the web: XML, JSON, CSV, RDF, etc. Refer to [DWBP] section 8.6 Data Formats for more information and best practices.

That being said, it is important to publish your spatial data with clear semantics, i.e. to provide information about the contents of your data. The primary use case for this is when you have information about a collection of SpatialThings and you want to publish precise information about their attributes and how they are inter-related. Another use case is the publication on the Web of a dataset that has a spatial component in a form that search engines will understand.

Depending on the format you use, the semantics may already be described in some form. For example, in GeoJSON [RFC7946] this description is present in the specification. When using JSON it is possible to add semantics using a JSON-LD @context. For providing semantics to search engines, using schema.org is a good option, as explained in Best Practice 4: Make your spatial data indexable by search engines.
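For instance, a plain JSON record can be given explicit semantics by adding a JSON-LD @context. The sketch below maps every key to the schema.org vocabulary using @vocab and types the record as a schema.org Place with GeoCoordinates. The function name, the input field names (name, lat, lon) and the coordinate values are assumptions for illustration; the coordinates are approximate.

```python
import json


def place_as_jsonld(record):
    """Wrap a simple {name, lat, lon} record as a schema.org Place in JSON-LD."""
    return {
        "@context": {"@vocab": "https://schema.org/"},  # keys resolve to schema.org terms
        "@type": "Place",
        "name": record["name"],
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": record["lat"],
            "longitude": record["lon"],
        },
    }


if __name__ == "__main__":
    # Approximate, illustrative coordinates.
    doc = place_as_jsonld({"name": "Anne Frank House", "lat": 52.3752, "lon": 4.8840})
    print(json.dumps(doc, indent=2))
```

The same data remains ordinary JSON for consumers that ignore the @context, while JSON-LD aware consumers can interpret each key unambiguously.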

In a linked data setting, the attributes of a spatial thing can be described using existing vocabularies, where each term has a published definition. [DWBP] Best Practice 15: Reuse vocabularies, preferably standardized ones recommends using terms from an established widely used vocabulary. If you can't find a suitable existing vocabulary term, you should create your own, and publish a clear definition for the new term (see [LD-BP]). We recommend that you link your own vocabulary to commonly used existing ones because this increases its usefulness. We provide the mapping between some commonly used spatial vocabularies.

Issue 225

We must avoid being overly focused on RDF.

The [LD-BP] reference makes this section very RDF dependent. Is there a need / justification for this? Are we saying that RDF is the only recommended way to publish data and models on the Web? Web developers may not care about RDF vocabularies and maybe they prefer a Swagger document (just to pick an example)?

To reduce the RDF focus, some text was added.

Note

The current list of RDF vocabularies / OWL ontologies for spatial data being considered by the SDW WG is provided below. Some of these will be used in examples. Full details, including mapping between vocabularies, pointers about inconsistencies in vocabularies (if any are evident), and recommendations avoiding their use as these may lead to confusion, will be published in a complementary NOTE: Comparison of geospatial vocabularies.

The NOTE will be concerned with helping data publishers choose the right spatial data format or vocabulary. It provides a methodology for making that choice. We do this rather than recommending one vocabulary because this recommendation would not be durable as vocabularies are released or amended.

Vocabularies can be discovered from Linked Open Vocabularies (LOV), using search terms like 'location' or Tags place, Geography, Geometry and Time.

W3C WGS84 Position
schema.org
OGC GeoSPARQL
GeoSPARQL spatial ontology update proposal
neogeo
DBpedia (including dbpedia:Place)
UK Ordnance Survey geometry
UK Ordnance Survey spatial relations
UK Office for National Statistics http://statistics.data.gov.uk/def/statistical-geography# and http://statistics.data.gov.uk/def/statistical-entity# (URIs do not resolve)
XKOS (used for geographical hierarchies in some examples)
Dublin core metadata 'DC Terms' (including dct:spatial and dct:coverage etc.)
BBC's Place
ISA Programme Location Core Vocabulary (LOCN)
SWPortal ontology includes definition of location
VCard ontology includes definition of location
GeoRSS OWL
stSPARQL from Strabon system (see publications iswc-strabon.pdf and eswc2013.pdf)
gndo:#PlaceOrGeographicName
IGN ontology for describing coordinate systems
IGN ontology for describing geometry (re-using NeoGeo and geosparql)
IGN ontology for describing administrative units
IGN ontology for describing main features on the territory
INSEE ontology for describing geometry

No attempts have yet been made to rank these vocabularies; e.g. in terms of expressiveness, adoption etc.

Note

The motivation behind the ISA Programme Location Core Vocabulary was establishing a minimal core common to existing spatial vocabularies. However, experience suggests that such a minimal common core is not very useful as one quickly needs to employ specific semantics to meet one's application needs.

Issue 37

Do we need a subclass of SpatialThing for entities that do not have a clearly defined spatial extent; or a property that expresses the fuzziness of the extent?

12.5.1 Describing location

Location information is often a common thread running through such data and can be an important 'hook' for finding information and for integrating different datasets. There are different ways of describing the location of spatial things. You can use and/or refer to the name of a well known named place, provide the location's coordinates as a geometry or describe it in relation to another location. These last two options are described in this section.

Best Practice 8: Provide geometries on the Web in a usable way

Geometry data should be expressed in a way that allows its publication and use on the Web.

Why

This best practice helps with choosing the right format for describing geometry based on aspects like performance and tool support. It also helps when deciding on whether or not using literals for geometric representations is a good idea.

Intended Outcome

The format chosen to express geometry data should:

Support the dimensionality of the geometry;
Be supported by the software tools used within the data user community;
Keep geometry descriptions to a size that is convenient for Web applications;
Support the CRS you need.

Possible Approach to Implementation

Steps to follow:

Decide on the geometric data representations based on performance; often the geometry data is a large proportion of the total size of a dataset.
Determine the dimensionality of geometry data (0d 'point' to 3d 'volume').
Determine in which coordinate reference system(s) (CRS) the data should be published (ref to section about CRS). Not all formats and vocabularies support the use of other CRS besides the most common one on the Web, WGS84.
Determine which format(s) are supported by software tools that you anticipate your user community to employ; where multiple formats are required, consider offering as many representations as you can - balancing the benefit of ease of use against the cost of the additional storage or additional processing if converting on-the-fly. See [DWBP] Best Practice 19: Use content negotiation for serving data available in multiple formats for more information.
Choose the right format and decide when to use geometry literals. For geometry literals, several solutions are available, like Well-Known Text (WKT) representations, GeoHash and other geocoding representations. The alternative is to use structured geometry objects as is possible, for example, in [GeoSPARQL].
There are also several suitable binary data formats (e.g. Google's protocol buffers for vector tiling); however, some binary formats do not (effectively) work on the Web as there are no software tools for working with those formats from within a typical Web application; to work with data in such formats, you must first download the data and then work with it locally.
There are widespread practices for representing geometric data as linked data, such as using W3C WGS84 Geo Positioning vocabulary (geo) geo:lat and geo:long that are used extensively for describing geo:Point objects.
Concrete geometry types are available, such as those defined in the OpenGIS [Simple-Features] Specification, namely 0-dimensional Point and MultiPoint; 1-dimensional curve LineString and MultiLineString; 2-dimensional surface Polygon and MultiPolygon; and the heterogeneous GeometryCollection.
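To illustrate the difference between a geometry literal and a structured geometry object, the following sketch encodes the same simple polygon both as a Well-Known Text (WKT) string and as a GeoJSON [RFC7946] geometry object. It assumes WGS84 coordinates in longitude, latitude order; the function names and the coordinate values are illustrative only.

```python
import json


def _check_closed(ring):
    """A polygon ring needs at least four positions, with first equal to last."""
    if len(ring) < 4 or tuple(ring[0]) != tuple(ring[-1]):
        raise ValueError("ring must be closed and have at least four positions")


def polygon_wkt(ring):
    """Encode a single-ring polygon as a WKT literal."""
    _check_closed(ring)
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


def polygon_geojson(ring):
    """Encode a single-ring polygon as a GeoJSON geometry object."""
    _check_closed(ring)
    return {"type": "Polygon", "coordinates": [[list(p) for p in ring]]}


if __name__ == "__main__":
    # Illustrative ring (lon, lat), listed counterclockwise and explicitly closed.
    ring = [(4.88, 52.37), (4.90, 52.37), (4.90, 52.38), (4.88, 52.38), (4.88, 52.37)]
    print(polygon_wkt(ring))
    print(json.dumps(polygon_geojson(ring)))
```

The WKT form is compact and easy to embed as a single value, whereas the GeoJSON form can be traversed directly by Web applications without a dedicated parser.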

Example 18

Example(s) to be added; including:

bounding box, centroid (ref: R-BoundingBoxCentroid)
show Coordinate Reference System (CRS) definition (ref: R-DeterminableCRS)
vector geometry (ref: R-EncodingForVectorGeometry)
non-geospatial example e.g. microscopy (ref: R-IndependenceOnReferenceSystems)
encoding information in different ways; e.g. height … above mean sea level, or floor 5 of The Shard
provision of multiple geometries for a single feature; e.g. a single reference point (for "pin on map"), a bounding box (for search), a simple geometry (for coarse spatial analysis), a detailed geometry (for resolving cadastral boundary disputes) etc.
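As a partial sketch of the first and last items above, the code below derives two simpler geometries from one detailed polygon ring: a bounding box and an area-weighted centroid. It treats coordinates as planar (x, y) values, which is an approximation for geographic coordinates and is only reasonable over small extents; the function names are hypothetical.

```python
def bounding_box(ring):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) positions."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def centroid(ring):
    """Area-weighted centroid of a simple closed ring, using the shoelace formula."""
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("ring has zero area")
    # The centroid formula divides by 6 * area, i.e. 3 * twice_area.
    return (cx / (3 * twice_area), cy / (3 * twice_area))


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
    print(bounding_box(square))
    print(centroid(square))
```

A publisher could offer the centroid as a "pin on map" reference point and the bounding box for search, alongside the detailed geometry.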

How to Test

...

Evidence

Relevant requirements: R-BoundingBoxCentroid, R-Compressible, R-CRSDefinition, R-EncodingForVectorGeometry, R-IndependenceOnReferenceSystems, R-MachineToMachine, R-SpatialMetadata, R-3DSupport, R-TimeDependentCRS, R-TilingSupport.

Benefits

Best Practice 9: How to describe relative positions

Provide a relative positioning capability in which the entities can be linked to a specific position.

Why

In some cases, it is necessary to describe the location of an entity in relation to another location or in relation to the location of another entity. For example: "South-West of Guildford" or "close to London Bridge".

Intended Outcome

It should be possible to describe the location of an entity in relation to another entity or in relation to a specific location, instead of specifying a geometry.

The relative positioning descriptions should be machine-interpretable and/or human-readable.

Possible Approach to Implementation

The relative positioning should be provided as:

A positioning capability to describe the position of entities with explicit links to a specific location and/or other entities.
Semantic descriptions for relative positions and relative associations to an explicit or absolute positioning capability.

Issue 121

Do we need this as a best practice; if yes, this best practice needs more content

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-SamplingTopology.

Benefits

12.5.2 Publishing data with clear semantics

In most cases, the effective use of information resources requires understanding thematic concepts in addition to the spatial ones; "spatial" is just a facet of the broader information space. For example, when the Dutch Fire Service responded to an incident at a day care center, they needed to evacuate the children. In this case, the 2nd closest alternative day care center was preferred because it was operated by the same organization as the one that was subject of the incident, and they knew who all the children were.

This best practice document provides mechanisms for determining how places and locations are related - but determining the compatibility or validity of thematic data elements is beyond our scope; we're not attempting to solve the problem of different views on the same/similar resources.

That said, there is one aspect of thematic semantics that must be mentioned. The most important semantic statement you can make when publishing spatial data - or any data - is to specify the type of a resource. For spatial things, there are a number of types that define "spatialness" (see section 12.5.1 Describing location). But you should also consider non-spatial aspects when designating the type of a spatial thing. For example, should a fire incident occur at Amsterdam Central railway station, it might seem sensible for the Municipal Fire Department to designate a type such as Building or Station (the Dutch Government Base Registry, which identifies Amsterdam Central railway station as https://brt.basisregistraties.overheid.nl/top10nl/id/gebouw/102625209, designates both of these types). However, the Fire Department are concerned with a fire incident - not the railway station itself. The fire incident is a spatial thing (it has spatial extent) but it is not the station. For example, the fire may spread to adjacent buildings. The Fire Department might designate their spatial thing as having type FireIncident or similar. Advice on how to assign a persistent identifier to the fire incident is provided in Best Practice 7: Use globally unique persistent HTTP URIs for spatial things, and section 12.7 Linking Spatial Data provides guidance on how one might relate the fire incident to other coincident spatial things such as Amsterdam Central railway station.

Note

Thematic semantics are out of scope for this best practice document. For associated best practices, please refer to [DWBP] section 8.2 Metadata, Best Practice 3: Provide structural metadata; and [DWBP] section 8.9 Data Vocabularies, Best Practice 15: Reuse vocabularies, preferably standardized ones and Best Practice 16: Choose the right formalization level.

Benefits

Issue 38

We might publish in the Best Practice Note or a complementary Note a set of statements mapping the set of available vocabularies about spatial things. There are mappings available e.g. GeoNames has a mapping with schema.org. http://www.geonames.org/ontology/mappings_v3.01.rdf

12.5.3 Temporal aspects of spatial data

Temporal relationship types will be described here and be entered eventually as link relationship types into the IANA registry, Link relations, just like the spatial relationships.

In the same sense as with spatial data, temporal data can be fuzzy.

Note

Retain section; point to where temporal data is discussed in detail elsewhere in this document.

12.6 Spatial Data Access

SDIs have long been used to provide access to spatial data via web services; typically using open standard specifications from the Open Geospatial Consortium (OGC). With the exception of the Web Map Service, these OGC Web service specifications have not seen widespread adoption beyond the geospatial expert community. In parallel, we have seen widespread emergence of Web applications that use spatial data.

[DWBP] provides best practices discussing access to data using Web infrastructure (see [DWBP] section 8.10 Data Access). This section provides additional insight for publishers of spatial data.

Making data available on the Web requires data publishers to provide some form of access to the data. There are numerous mechanisms available, each providing varying levels of utility and incurring differing levels of effort and cost to implement and maintain. Publishers of spatial data should make their data available on the Web using affordable mechanisms in order to ensure long-term, sustainable access to their data.

When determining the mechanism to be used to provide Web access to data, publishers need to assess utility against cost. In order of increasing usefulness and cost:

Bulk-download or streaming of the entire dataset (see [DWBP] Best Practice 17: Provide bulk download) - note that providing bulk download is useful in any case
Generalized query API (for spatial data: WFS or [GeoSPARQL] )
Bespoke API designed to support a particular application (see [DWBP] Best Practice 23: Make data available through an API)

Note

Read [DWBP] Best Practice 23: Make data available through an API, Best Practice 24: Use Web Standards as the foundation of APIs, Best Practice 25: Provide complete documentation for your API, and Best Practice 26: Avoid Breaking Changes to Your API for general recommendations about publishing data using APIs.

Best Practice 11: Expose spatial data through 'convenience APIs'

If you have a specific application in mind for publishing your data, tailor the spatial data API to meet that goal.

Why

When access to spatial data is provided by bulk download or through a generalized query service, users need to understand how the data is structured in order to work effectively with that data. Given that spatial data may be arbitrarily complex, this burdens the data user with significant effort before they can even perform simple queries. In addition, spatial datasets tend to be large. Convenience APIs are tailored to meet a specific goal; enabling a user to engage with arbitrarily complex data structures using (a set of) simple queries.

Intended Outcome

The API provides a coherent set of queries and operations, including spatial ones, that help users get working with the data quickly to achieve common tasks. The API provides both machine readable data and human readable HTML markup; the latter is used by search engines to index the spatial data.

Possible Approach to Implementation

The API should offer both machine readable data and human readable HTML that includes the structured metadata required by search engines seeking to index content (see Best Practice 4: Make your spatial data indexable by search engines for more details).

1. Reuse your existing spatial data infrastructure

In the geospatial domain there are a lot of WFS services providing data. A RESTful API as a wrapper, proxy or a shim layer can be created around WFS services. Content from the WFS service can be provided in this way as linked data, JSON or another Web friendly format. This approach is similar to the use of Z39.50 in the library community; that protocol is still used but 'modern' Web sites and web services are wrapped around it.

There are examples of this approach of creating a convenience API that works dynamically on top of WFS such as the experimental ldproxy. This requires relatively little effort and is an attractive option for quickly exposing spatial data from existing WFS services on the Web. The approach is to create an intermediate layer by introducing a proxy on top of the WFS (data service) and CSW (metadata service) so the contained resources are made available. The proxy maps the data and metadata to schema.org according to a provided mapping scheme; assigns URIs to all resources based on a pattern; makes each resource available in HTML, XML, JSON-LD, GML, GeoJSON, and RDF/XML (metadata only); and generates links to data in other datasets using SPARQL queries.
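As a minimal sketch of the wrapper approach described above, the following Python function translates a simple REST-style path into a WFS 2.0 GetFeature request URL. The endpoint address, the path layout (/collections/{type}/{id}) and the choice of JSON as output format are assumptions made for illustration; they are not taken from ldproxy or from any particular service.

```python
from urllib.parse import urlencode

# Hypothetical WFS endpoint; replace with the address of a real service.
WFS_ENDPOINT = "https://example.org/geoserver/wfs"


def wfs_url_for(path: str) -> str:
    """Translate a REST-style path into a WFS 2.0 GetFeature request URL.

    Supported (hypothetical) path shapes:
      /collections/{feature_type}       -> a first page of features
      /collections/{feature_type}/{id}  -> a single feature
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) not in (2, 3) or parts[0] != "collections":
        raise ValueError(f"unsupported path: {path}")

    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": parts[1],
        # Assumption: the server can emit JSON; supported values vary by server.
        "outputFormat": "application/json",
    }
    if len(parts) == 3:
        params["resourceId"] = parts[2]  # request one feature by identifier
    else:
        params["count"] = "10"           # keep collection responses small
    return f"{WFS_ENDPOINT}?{urlencode(params)}"


print(wfs_url_for("/collections/buildings/b42"))
```

A real proxy would also fetch the response, convert it to the requested representation and assign stable URIs to each feature; this sketch only covers the request mapping.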

[Add description of another example using a similar approach: PDOK]

2. Provide web-friendly access to the data as an alternative

A more effective route may be to provide an alternative 'Linked Data friendly' access path to the data source by creating a new, complementary service endpoint; e.g. expose the underpinning PostGIS database via a SPARQL endpoint (using something like ontop-spatial) and a Linked Data API.

3. To limit the amount of modifications and load on your SDI but still maintain a direct link between the data provided through the SDI and the web friendly version of the spatial features, use 'rdf_seealso' as spatial feature attribute to point to the web friendly representation.

Example of providing an alternative access path to WFS GML source data — Fig. 1 Providing an alternative 'Linked Data friendly' access path to a WFS data source.

Example 22

Example(s):

An event based API for traffic information, that tells you when a certain speed threshold is reached; e.g. traffic is too slow (using AMQP or MQTT protocol?).
A RESTful API as a wrapper or a shim layer around a WFS service, providing GML content as Linked Data (Geluidskaarten API in SwaggerHub, based on a WFS service published in the Dutch geoportal Publieke Dienstverlening op de Kaart (PDOK)).
http://geo.resc.info:8080/geoserver/netage/wms?service=WMS&version=1.1.0&request=GetMap&layers=netage:provincie&styles=&bbox=10426.282,306846.198,278026.09,621876.3&width=652&height=768&srs=EPSG:28992&format=application/openlayers [server does not resolve (21 Dec 2016) - to be checked (see GitHub ISSUE 482 )]: example of an API providing an alternative access path to the data source complementing an existing WFS end-point (to be provided by @bartvanleeuwen).
The OGC Sensor Things API.
'triple pattern fragments' from Linked Data Fragments provide a very efficient, lightweight way of querying linked data.
Environment Agency Bathing Water Quality API (implemented using Epimorphics' ELDA implementation of the Linked Data API; enables configured queries against (general) SPARQL endpoints to be exposed as RESTful web services).
OGC Web Processing Service (WPS) encapsulates the underlying complexity of complex services like OGC WCS. This is probably the only way such complex services will ever be exposed in operational environments.
Paginating large responses from an API; using W3C Linked Data Platform Paging 1.0 or Hydra pagination.
Requesting a subset of a coverage or an aggregate set of SpatialThings; for example, using a well-known pattern like OpenSearch Geo and Temporal extensions.
More examples to be added.

How to Test

...

Evidence

Relevant requirements: R-Compatibility, R-LightweightAPI.

Benefits

Best Practice 12: Include search capability in your data access API

If you publish an API to access your data, make sure it allows users to search for specific data.

Issue 186

Should best practice "Include search capability in your data access API" move or be removed?

Why

It can be hard to find a particular resource within a dataset, requiring either prior knowledge of the respective identifier for that resource and/or some intelligent manual guesswork. It is likely that users will not know the URI of the resource that they are looking for, but may know (at least part of) the name of the resource or some other details. A search capability will help a user to determine the identifier for the resource(s) they need using the limited information they have.

Intended Outcome

A user can do a text search on the name, label or other property of an entity that they are interested in to help them find the URI of the related resource.
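The intended outcome can be illustrated with a minimal sketch: a case-insensitive substring search over the labels of resources that returns the matching URIs. The URIs and labels below are hypothetical, and a production API would more likely delegate to a full-text index than scan an in-memory dictionary.

```python
from typing import Dict, List

# Hypothetical index of resource URIs and their known names.
LABELS: Dict[str, List[str]] = {
    "https://example.org/id/station/amsterdam-centraal": [
        "Amsterdam Centraal",
        "Amsterdam Central railway station",
    ],
    "https://example.org/id/station/rotterdam-centraal": ["Rotterdam Centraal"],
    "https://example.org/id/museum/rijksmuseum": ["Rijksmuseum"],
}


def search(term: str, labels: Dict[str, List[str]] = LABELS) -> List[str]:
    """Return the URIs of resources with a name containing `term`."""
    needle = term.casefold().strip()
    if not needle:
        return []
    return sorted(
        uri
        for uri, names in labels.items()
        if any(needle in name.casefold() for name in names)
    )


print(search("centraal"))
```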

Possible Approach to Implementation

to be added

How to Test

...

Evidence

Relevant requirements: {... hyperlinked list of use cases ...}

Benefits

Best Practice 13: Provide subsets for large spatial datasets

Identify subsets of large spatial data resources that are a convenient size for applications to work with

Issue 195

Is the term "subset" correct?

Why

[DWBP] Best Practice 18: Provide Subsets for Large Datasets explains why providing subsets is important. Spatial datasets, particularly coverages such as satellite imagery, sensor measurement time-series and climate prediction data, are often very large. In these cases it is useful to provide subsets by having identifiers for conveniently sized subsets of large datasets that Web applications can work with.

Intended Outcome

Being able to refer to subsets of a large spatial data resource that are sized for convenient usage in Web applications.

Possible Approach to Implementation

Two possible approaches are described below:

Create named subsets.
- Determine how users may seek to access the dataset, determining subsets of the larger dataset that are appropriate for the intended application. A data provider may consider a general approach to improve accessibility of a dataset, while a data user might want to publish details of a workflow for others to reuse referencing only the relevant parts of the large dataset.
- Given the anticipated access pattern, create new resources and mint a new identifier for each subset.
- Provide metadata to indicate how a given subset resource is related to the original large dataset, with reference to an identified Mutually Exclusive Collectively Exhaustive (MECE) set where appropriate.
Map a URI pattern to an existing data-access API.

Note

A Web service URL in general does not provide a good URI for a resource as it is unlikely to be persistent. A Web service URL is often technology and implementation dependent and both are very likely to change with time. For example, consider oft used parameters such as ?version=. Good practice is to use URIs that will resolve if the resource is relevant and may be referenced by others, therefore identifiers for subsets should be protocol independent.
- Identify the service end-point that provides access to the larger dataset.
- Determine which parameters offered by the service end-point are required to construct a meaningful subset.
- Map these parameters into a URI pattern and configure an HTTP server to apply the necessary URL-rewrite.
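The URI-pattern approach can be sketched as follows. The path layout (/subsets/{coverage}/{bbox}/{date}), the service address and the query parameter names are all hypothetical; the point is only that a protocol-independent identifier is mapped onto whatever parameters the underlying service currently requires, so the mapping can change without changing the identifier.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Hypothetical data-access service behind the stable subset URIs.
SERVICE_ENDPOINT = "https://example.org/data-service"

NUMBER = r"-?\d+(?:\.\d+)?"
SUBSET_PATTERN = re.compile(
    rf"^/subsets/(?P<coverage>[A-Za-z0-9_-]+)"
    rf"/(?P<west>{NUMBER}),(?P<south>{NUMBER}),(?P<east>{NUMBER}),(?P<north>{NUMBER})"
    rf"/(?P<date>\d{{4}}-\d{{2}}-\d{{2}})$"
)


def rewrite(path: str) -> Optional[str]:
    """Map a subset URI path onto a service request URL, or None if no match."""
    match = SUBSET_PATTERN.match(path)
    if match is None:
        return None
    query = {
        # Parameter names are assumptions; use those of the actual service.
        "coverage": match["coverage"],
        "bbox": ",".join(match[k] for k in ("west", "south", "east", "north")),
        "time": match["date"],
    }
    return f"{SERVICE_ENDPOINT}?{urlencode(query)}"


print(rewrite("/subsets/sst/3.0,50.5,7.5,54.0/2016-12-21"))
```

In practice the same mapping would typically live in the HTTP server's URL-rewrite configuration rather than in application code.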

How to Test

...

Evidence

Relevant requirements: R-Compatibility, R-Linkability, R-Provenance, R-ReferenceDataChunks.

Benefits

Issue 113

More content needed for this best practice.

12.7 Linking Spatial Data

Earlier in this document, the Linked Data section explained that we believe that Linked Data requires only that the formats used to publish data support Web linking. In other words, linking spatial data does not automatically mean the use of RDF; links can also be created, for example, using GML, HTML or JSON-LD.

Links, in whatever machine-readable form, are important. In the wider Web, it is links that enable the discovery of web pages: from user-agents following a hyperlink to find related information to search engines using links to prioritize and refine search results. This section is concerned with the creation and use of those links to support discovery of the SpatialThings described in spatial datasets.

For data to be on the web the resources it describes need to be connected, or linked, to other resources. The connectedness of data is one of the fundamentals of the Linked Data approach that these best practices build upon. The 5-star rating for Linked Open Data asserts that to achieve the fifth star you must "link your data to other data to provide context". The benefits for consumers and publishers of linking to other data are listed as:

You can discover more (related) data while consuming the data.
You can directly learn about the data schema.
You make your data discoverable.
You increase the value of your data.
Your own organization will gain the same benefits from the links as the [other] consumers.

Just like any type of data, spatial data benefits massively from linking when publishing on the web. The widespread use of links within data is regarded as one of the most significant departures from contemporary practices used within SDIs. That's why this topic is included in this Best Practice.

Crucially, the use of links is predicated on the ability to identify the origin and target, or beginning and end, of the link. Best Practice 7: Use globally unique persistent HTTP URIs for spatial things is a prerequisite.

This section extends [DWBP] by providing a best practice about creating links between the resources described inside datasets.

Note

[DWBP] identifies Linkability as one of the benefits gained from implementing the Data on the Web best practices (see [DWBP] section 8.7 Data Identifiers Best Practice 9: Use persistent URIs as identifiers of datasets and Best Practice 10: Use persistent URIs as identifiers within datasets). However, no discussion is provided about how to create the links that use those persistent URIs.

Best Practice 14: Publish links from spatial things to related resources

The data should be published with explicit links, including spatial links, to spatial things or other resources, both in the same dataset and in other datasets.

Why

Exposing entity-level links to Web applications, user-agents and Web crawlers allows the relationships between resources to be found without the data user needing to download the entire dataset for local analysis. Entity-level links, preferably meaningful links, provide explicit description of the relationships between resources and enable users to find related data and determine whether the related data is worth accessing. Entity-level links can be used to combine information from different sources; for example, to determine correlations in statistical data relating to the same location.

Data publishers should assert the relationships that they know about. Relationships between resources with spatial extent (i.e. size, shape, or position; SpatialThings) can often be inferred from their spatial properties, but this is a complex task. It requires both complex spatial processing (e.g. region connection calculus) and some degree of understanding about the semantics of the two, potentially related, resources in order to determine how they are related, if at all. This should not be left to the data user.

When your spatial resources are linked to those in common usage it will be easier to discover them. For example, a data user interested in air quality data about the place they live might begin by searching for that place in popular data repositories such as GeoNames, Wikidata or DBpedia. Once the user finds the resource that describes the correct place, they can search for data that refers to the identified resource that, in this case, relates to air quality. Furthermore, by referring to resources in common usage, it becomes even easier to find those resources as search engines will prioritize resources that are linked to more often.

Note

It is not always feasible to link your spatial things to other resources in common usage. For example, if you were maintaining a registry of cultural heritage in Amsterdam, it would be reasonably simple to look up identifiers for the city's 50 or so museums and map these to your spatial things. But it would be a huge task for, say, a topographic mapping agency to cross-reference their entire catalogue of named places containing tens of thousands of spatial things with third-party resources (although in the spirit of crowd-sourcing, if someone else found those links useful, they may take on the task of relating the spatial things and publishing those relationships to the Web as a complementary resource!).

In essence, you should only manage the data that you have the resources to maintain.

Intended Outcome

It should be possible for humans and for machine agents to understand, interpret and follow the entity-level links between resources. Preferably, the definition of the meaning of a given link is precise and explicit.

It should be possible for humans and machine agents to find spatial relationships between Things without performing post geometric processing.

Spatial things should be related to commonly used resources.

Possible Approach to Implementation

Steps:

1. Choose one of the following general methods to provide explicit entity-level links:

Publish the data with links (to uniquely identified objects) embedded in the data.
Publish sets of links as a complementary resource.
Publish summaries of links for the dataset so that the semantics of the links can be evaluated and accessed if deemed appropriate.

Issue 36

The use of Linksets needs further discussion as evidence indicates that it is not yet a widely adopted best practice. It may be appropriate to publish such details in a Note co-authored with the DWBP Working Group.

Example 25

example(s) to be added; including ...

Encoding of entity-level links in a number of formats;
- [JSON-LD]
- [GML] (using [XLINK11])
- [BEACON] link dump format
Linking to a Thing as it was at a particular time
Publish summary of links for dataset using [VoID] Linksets;
- Spatial Identifier Reference Framework (SIRF) in which a summary of links for the data set is published as a Linkset; individual links are cached or computed at run-time and may be discovered via "technical features" of the Linkset.
- Open PHACTS Discovery Platform; bringing together pharmacological data resources in an integrated, interoperable infrastructure.
Third-party provision of links: sameas.org provides a (collection of) Link-set(s).
Meaningful predicates in RDF statements defined in a resolvable RDF vocabulary / ontology.
Use of SKOS to define hierarchical relationships between resources (see [skos-primer]).
Formally defined XML elements defined in a resolvable XML schema.
Use formally registered IANA Link relations within HTTP link headers.
Use a 'profile' link relation type, as specified in [RFC6906], to refer to online documentation that describes the links and properties used within the referring resource.
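As one illustration of the first item in the list above, the sketch below builds a small JSON-LD description in which the links to other resources are embedded in the data. The incident URI and the document URI are hypothetical; the station URI is the one quoted earlier in this document, and geo:sfIntersects is used here only as an example of a spatial link property.

```python
import json

GEO = "http://www.opengis.net/ont/geosparql#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def describe_with_links(subject, label, intersects, see_also):
    """Return a JSON-LD document whose link properties hold URIs, not strings."""
    return {
        "@context": {
            "label": RDFS + "label",
            # "@type": "@id" tells JSON-LD processors the values are links.
            "intersects": {"@id": GEO + "sfIntersects", "@type": "@id"},
            "seeAlso": {"@id": RDFS + "seeAlso", "@type": "@id"},
        },
        "@id": subject,
        "label": label,
        "intersects": intersects,
        "seeAlso": see_also,
    }


incident = describe_with_links(
    subject="https://example.org/id/incident/2016-0042",  # hypothetical
    label="Fire incident at Amsterdam Central railway station",
    intersects=[
        "https://brt.basisregistraties.overheid.nl/top10nl/id/gebouw/102625209"
    ],
    see_also=["https://example.org/doc/incident/2016-0042"],  # hypothetical
)
print(json.dumps(incident, indent=2))
```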

Note

[GML] adopted the [XLINK11] standard to represent links between resources. At the time of adoption, XLink was the only W3C-endorsed standard mechanism for describing links between resources within XML documents. The Open Geospatial Consortium anticipated broad adoption of XLink over time - and, with that adoption, provision of support within software tooling. While XML Schema, XPath, XSLT and XQuery etc. have seen good software support over the years, this never happened with XLink. The authors of GML note that, given the lack of widespread support, use of XLink within GML provided no significant advantage over and above use of a bespoke mechanism tailored to the needs of GML.

Note

[VoID] provides guidance on how to discover VoID descriptions (including Linksets), both by browsing the VoID dataset descriptions themselves to find related datasets, and by using /.well-known/void (as described in [RFC5785]).

Issue 162

How would a (user) agent discover that these 3rd-party link-sets existed? Is there evidence of usage in the wild?

Issue 163

Does the [BEACON] link dump format allow the use of wild cards / regex in URLs (e.g. URI Template as defined in [RFC6570])?

Issue 166

How do we know what is at the end of a link - and what can I do with it / can it do for me (e.g. the 'affordances' of the target resource).

How to describe the 'affordances' of the target resource?

2. Make spatial relationships explicit

It is a good idea to express spatial relationships between things explicitly in your data. A possible approach for this is to find out if they have corresponding geometries using spatial functions, and then express these correspondences as explicit relationships. If the spatial datasets you want to reconcile are managed in a Geographic Information System (GIS) or a spatial database, you can use the GIS spatial functions to find related spatial things. If your spatial things are expressed as Linked Data, you can use [GeoSPARQL], which has a set of spatial query functions available.
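The idea of deriving explicit relationships from geometries can be sketched in plain Python. This is a stand-in for the spatial functions of a GIS, a spatial database or [GeoSPARQL], not a replacement for them: it assumes planar coordinates, simple polygons without holes, and does not define the result for points lying exactly on a boundary. All identifiers and coordinates are hypothetical.

```python
SF_WITHIN = "http://www.opengis.net/ont/geosparql#sfWithin"


def point_in_polygon(point, polygon):
    """Ray-casting test: is `point` inside the simple polygon `polygon`?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def explicit_links(points, areas):
    """Return (subject, property, object) statements for containment."""
    return [
        (point_uri, SF_WITHIN, area_uri)
        for point_uri, point in points.items()
        for area_uri, polygon in areas.items()
        if point_in_polygon(point, polygon)
    ]


# Hypothetical data: two monitoring sites and one district outline.
sites = {"ex:site/1": (2.0, 2.0), "ex:site/2": (5.0, 2.0)}
districts = {"ex:district/a": [(0, 0), (4, 0), (4, 4), (0, 4)]}

for statement in explicit_links(sites, districts):
    print(statement)
```

Once computed, such statements can be published with the data so that users do not have to repeat the geometric processing.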

Note

The mechanics of how to decide when two resources are related, if they don't have geometric or topological properties that allow you to determine this, are beyond the scope of this best practice. Tools (e.g. OpenRefine and Silk Linked Data Integration Framework) are available to assist with reconciliation based on e.g. geographical names or addresses present in the data and may provide further insight.

Note

Where possible, existing identifiers should be reused when referring to resources (see Best Practice 7: Use globally unique persistent HTTP URIs for spatial things). However, the use of multiple identifiers for the same resource is commonplace, for example, where data publishers from different jurisdictions refer to the same SpatialThing. In this special case, properties such as owl:sameAs can be used to declare that multiple identifiers refer to the same resource. It is often the case that data published from different sources about the same physical or conceptual resource may provide different view points. Note that you shouldn't use owl:sameAs to indicate that two spatial things are only similar, while not exactly the same.

Issue 224

Maintaining links to *all* related resources doesn't scale. Redraft required.

3. Decide which spatial relationships to use

To let user agents know what is at the end of a link, it's a good idea to use explicitly defined relationship types to link between resources. [DWBP] section 8.9 Data Vocabularies provides information on general relationship types described in well-defined vocabularies (see [DWBP] Best Practice 15: Reuse vocabularies, preferably standardized ones).

Describing the spatial relationships between SpatialThings can be based on relationships such as topological (e.g. contains), geographical (e.g. nearby) and hierarchical (e.g. part of) links.

A good place to find spatial relationships to use is in spatial vocabularies. See Best Practice 10: Use spatial semantics for Spatial Things.
The geographical, topological and social hierarchy should be described with clear semantics and registered with IANA Link relations.
Hierarchical relationships (i.e. part of), for example between administrative regions, have a specific need for defining a Mutually Exclusive Collectively Exhaustive (MECE) set.
Topological relationships that are described for an entity should have references to concepts such as over, under etc.
Spatial relationships can use concepts from the Region Connection Calculus (RCC8), or "spatial predicates" such as contains, overlaps, touches, intersects and adjacent to; similar concepts for temporal relationships are available from Allen Calculus.
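To make the RCC8 relations mentioned above concrete, the following sketch classifies the relation between two axis-aligned rectangles as one of the eight RCC8 base relations (DC, EC, PO, EQ, TPP, NTPP, TPPi, NTPPi). Restricting the input to rectangles with non-zero area is a simplifying assumption made for illustration; real geometries need a proper spatial library.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def rcc8(a: Rect, b: Rect) -> str:
    """Return the RCC8 base relation that holds from rectangle a to b."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    if a == b:
        return "EQ"  # equal
    closures_meet = ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2
    if not closures_meet:
        return "DC"  # disconnected
    interiors_meet = ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2
    if not interiors_meet:
        return "EC"  # externally connected: boundaries touch only
    a_in_b = bx1 <= ax1 and ax2 <= bx2 and by1 <= ay1 and ay2 <= by2
    b_in_a = ax1 <= bx1 and bx2 <= ax2 and ay1 <= by1 and by2 <= ay2
    if a_in_b:
        strict = bx1 < ax1 and ax2 < bx2 and by1 < ay1 and ay2 < by2
        return "NTPP" if strict else "TPP"  # (non-)tangential proper part
    if b_in_a:
        strict = ax1 < bx1 and bx2 < ax2 and ay1 < by1 and by2 < ay2
        return "NTPPi" if strict else "TPPi"  # inverse relations
    return "PO"  # partially overlapping


print(rcc8((0, 0, 1, 1), (1, 0, 2, 1)))
```

[GeoSPARQL] defines properties for the RCC8 family (for example geo:rcc8ec), which could be used to publish the resulting relation as an explicit link.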

Social relationships can be defined based on perception; e.g. "samePlaceAs", nearby, south of. These relationships can also be defined based on temporal concepts such as: after, around, etc. In current practice, there is no such property as samePlaceAs to express the social notion of place; such a property would enable communities to unambiguously indicate that they are referring to the same place without getting hung up on the semantic complexities inherent in the use of owl:sameAs or similar.

Issue 215

Include details of which spatial relationships are published as IANA Link relations.

Issue 34

Which vocabularies out there have social spatial relationships? FOAF, GeoNames, ...

4. Determine the things to link to

The links should connect spatial things rather than information resource(s) that describe them. In many cases, different identifiers are used to describe the SpatialThing and the information resource that describes that SpatialThing. For example, within DBpedia, the city of Sapporo, Japan, is identified as http://dbpedia.org/resource/Sapporo, while the information resource that describes this resource is identified as http://dbpedia.org/page/Sapporo. Care should be taken to select the identifier for the SpatialThing rather than the information resource that describes it; in the example above, this is http://dbpedia.org/resource/Sapporo.
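For the DBpedia case described above, the selection of the SpatialThing identifier can be sketched as a small normalization step applied before a link is stored. The function only covers the two DBpedia URI prefixes named in the text; the equivalent rule for other sources is an assumption that would need checking per source.

```python
DBPEDIA_PAGE = "http://dbpedia.org/page/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def thing_uri(uri: str) -> str:
    """Return the identifier of the thing, given a thing or page URI."""
    if uri.startswith(DBPEDIA_PAGE):
        # Swap the page prefix for the resource prefix; keep the local name.
        return DBPEDIA_RESOURCE + uri[len(DBPEDIA_PAGE):]
    return uri


print(thing_uri("http://dbpedia.org/page/Sapporo"))
```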

Besides making the links to things within the same dataset explicit, data publishers should also relate their data to commonly used spatial resources such as GeoNames using links.

A list of sources of commonly used spatial resources is provided in section B. Authoritative sources of geographic identifiers.

Note

This best practice is concerned with the connections between SpatialThings. When describing an individual SpatialThing itself, it is often desirable to decompose the information into several uniquely identified objects. For example, the geometry of an administrative area may be managed as a separate resource due to the large number of geometric points required to describe the boundary.

Example 28

Observation and measurement data usually relates to a feature of interest: some thing or phenomenon in the real world that is being observed and measured. Linking the observation and measurement data to its feature of interest helps users to interpret and use the data more effectively, and makes explicit how the data relates to concepts in the real world.

More example(s) to be added; including ...

Link between (social) places with indistinct boundaries; e.g. relate an event to its colloquially identified location (such as 'Trafalgar Square').
Link between two SpatialThings with well-defined boundaries; e.g. hierarchical administrative regions.
Link to a SpatialThing as it was at a particular time.
Bathing Water resources used within the Environment Agency's Bathing Water Quality application are related to administrative units published by the Ordnance Survey, the UK's national mapping agency. For example, Instow bathing water is linked to North Devon using the property district (http://statistics.data.gov.uk/def/administrative-geography/district). See the bathing water linked data page for Instow for further details.

How to Test

...

Evidence

Relevant requirements: R-Linkability, R-MachineToMachine, R-SamplingTopology, R-SpatialRelationships, R-SpatialOperators, R-Crawlability, R-Discoverability.

Benefits

Best Practice 15: Use links in spatial datasets to find related data

Data related to a spatial dataset and its individual data items should be discoverable by browsing the links.

Why

In much the same way as the document Web allows one to find related content by following hyperlinks, the links between spatial datasets, SpatialThings described in those datasets and other resources on the Web enable humans and software agents to explore rich and diverse content without the need to download a collection of datasets for local processing in order to determine the relationships between resources.

Spatial data is typically well structured; datasets contain SpatialThings that can be uniquely identified. This means that spatial data is well suited to the use of links to find related content.

Issue 217

Are we missing a best practice describing how to discover and annotate information within unstructured resources?

Note

The emergency response to natural disasters is often delayed by the need to download and correlate spatial datasets before effective planning can begin. Not only is the initial response hampered, but often the correlations between resources in datasets are discarded once the emergency response is complete because participants have not been able to capture and republish those correlations for subsequent usage.

Intended Outcome

It should be possible for humans to explore the links between a spatial dataset (or its individual items) and other related data on the Web.

It should be possible for software agents to automatically discover related data by exploring the links between a spatial dataset (or its individual items) and other resources on the Web.

It should be possible for a human or software agent to determine which links are relevant to them and which links can be considered trustworthy.

Issue 66

What do we expect user-agents to do with a multitude of links from a single resource? A document hyperlink has just one target; but in data, a resource may be related to many things.

Possible Approach to Implementation

For a given subject resource find all the related resources that it refers to.
Evaluate the property type that is used to relate each resource in order to determine relevance (see Best Practice 14: Publish links from spatial things to related resources).
Use the metadata for the dataset within which the subject resource is described in order to determine which links to "trust" (e.g. whether to use the data or not); owner / publisher, quality information, community annotations (“likes”), publication date etc.
Aggregate links from trusted sources into a database. Referring URLs can be indexed to determine which resources refer to the subject resource, i.e. "what points to me?". These referring links are sometimes called back-links. Dataset-level metadata may provide information regarding the frequency of update for the information sources, enabling one to determine a mechanism for keeping the aggregated link-set fresh.

Note

These "back-links" can be traversed to find related information and also help a publisher assess the value of their content by making it possible to see who is using (or referencing) their data.
Use network / graph analysis algorithms to determine related information that is not directly connected; i.e. resources that are connected via a chain of links and intermediate resources.
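The last two steps above (indexing back-links and finding indirectly connected resources) can be sketched with a few lines of Python. Links are modelled as (subject, property, object) tuples; the identifiers are hypothetical, and a breadth-first search is used here as one simple example of a graph analysis algorithm.

```python
from collections import defaultdict, deque


def build_backlinks(links):
    """Index 'what points to me?': map each target to the set of its referrers."""
    index = defaultdict(set)
    for subject, _prop, obj in links:
        index[obj].add(subject)
    return dict(index)


def reachable(links, start, max_hops):
    """Return {resource: hops} for resources within max_hops links of start.

    Links are followed in both directions, i.e. forward links and back-links.
    """
    neighbours = defaultdict(set)
    for subject, _prop, obj in links:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]
    return hops


# Hypothetical aggregated links from trusted sources.
links = [
    ("ex:instow", "ex:district", "ex:north-devon"),
    ("ex:north-devon", "ex:within", "ex:devon"),
    ("ex:barnstaple-station", "ex:district", "ex:north-devon"),
]
print(build_backlinks(links)["ex:north-devon"])
print(reachable(links, "ex:instow", 2))
```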

How to Test

...

Evidence

Relevant requirements: {... hyperlinked list of use cases ...}

Benefits

12.8 Dealing with large datasets

When publishing large datasets on the web, and designing APIs for accessing those datasets, data publishers must be aware of the constraints of operating in a Web environment. Providing access to large spatial datasets, such as coverages, is a particular challenge. The API should provide mechanisms to request subsets of the dataset that are a convenient size for client applications to manage.

There are several best practices in this document dealing with large datasets and coverages:

Best Practice 13: Provide subsets for large spatial datasets
Best Practice 8: Provide geometries on the Web in a usable way

Issue 149

Should we discuss scalability issues here?

13. Other best practices

Note

This section is a placeholder for best practices that were in the FPWD but have not yet been placed in the new doc structure. They may be removed, merged, or moved.

Best Practice 16: Provide a minimum set of information about spatial things for your intended application

When someone looks up a URI for a SpatialThing, provide useful information, using the common representation formats

Why

This allows SpatialThings to be distinguished from one another by looking at their properties; e.g. type, label. It also allows basic information about a SpatialThing to be obtained by looking up its URI.

Intended Outcome

Looking up the URI of a SpatialThing should return a minimum set of information about it. In general, this allows users to look up the properties and features of a SpatialThing, and to get information from machine-interpretable and/or human-readable descriptions.

Possible Approach to Implementation

This requirement specifies that useful information should be returned when a resource is referenced. This can include:

Expressing properties and features of interest for a SpatialThing using common semantic descriptions.
Expressing names of places; provide multiple names for your SpatialThings if they are known. These could be toponyms (names that appear on a map) or colloquial names (that local people use for a place). This part will explain in more detail how to provide the names/labels for the spatial things that are referred to (e.g. one way to do this could be rdfs:label).
A 'place' may have an indistinct (or even undefined) boundary. It is often useful to identify spatial things even though they are fuzzy. For example: 'Renaissance Italy' or 'the American West'.
Information (about a SpatialThing; a place) should be provided with information about authority (owner, publisher), timeliness (i.e. is it valid now? is it historical data?) and, (if applicable) quality. It is common, for example, that there exist many maps of a place - none of them the same. In that case users need to know who produced each one, to be able to choose the right one to use.

How to Test

...

Evidence

Relevant requirements: R-MachineToMachine, R-MultilingualSupport, R-SpatialVagueness

The best practices described in this document will incorporate practice from both Observations and Measurements [OandM] and W3C Semantic Sensor Network Ontology [VOCAB-SSN].

See also W3C Generic Sensor API and OGC Sensor Things API. These are more about interacting with sensor devices.

Best Practice 17: Describe the location according to a Coordinate Reference System

When publishing location data, the particular Coordinate Reference System (CRS) being used should also be specified.

Why

Spatial data can be published and shared between different communities through the use of common standards. However, there are several standards and different ways to represent spatial data, e.g. describing coordinates based on WGS84 Long/Lat. Each community structures its spatial data according to its own standards, domain of interest and CRS. This variety can create confusion and inconsistencies in using and interpreting the spatial data. To allow end-users (i.e. human users and machines) to interpret the spatial information correctly and consistently, the CRS in which the data is represented should also be provided and described using (machine-interpretable) metadata.

Intended Outcome

The Coordinate Reference System should be specified and published as machine-readable and machine-interpretable metadata, according to common vocabularies.

It should be possible for human and machine users to access and interpret the CRS that is used to describe the spatial data.

Possible Approach to Implementation

There is no complete agreement on whether to present coordinates as LAT/LONG or LONG/LAT, or on whether to express them in decimal degrees or as degrees, minutes and seconds; in addition, the data may be represented in a different CRS altogether. There is a growing consensus to use the convention longitude followed by latitude using decimal degrees (as used in GeoJSON).
You need to tell users what particular CRS is being used, as there are many CRSs in use. A good directory of CRSs is maintained by the EPSG, an oil industry organization. A CRS can be described by its EPSG code: for example, the UK National Grid has code EPSG:27700.
Alternatively you can re-project your coordinates to WGS84 Long/Lat using one of the many tools available online. For example, a location in UK National Grid coordinates can be 516076,170953 and in WGS84 Long/Lat it will be -0.331841, 51.425708.
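The two notational choices mentioned above (degrees, minutes and seconds versus decimal degrees, and latitude-first versus longitude-first) can be sketched in a few lines of Python. This is a minimal illustration, not a re-projection tool: it assumes the input is already expressed in WGS84, and the function names and sample values are invented here.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N, S, E or W to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


def geojson_point(latitude, longitude):
    """Return a GeoJSON Point; GeoJSON orders coordinates longitude, latitude."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude out of range")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude out of range")
    return {"type": "Point", "coordinates": [longitude, latitude]}


# Hypothetical sample values, roughly 51.425 N and 0.33 W.
lat = dms_to_decimal(51, 25, 30, "N")
lon = dms_to_decimal(0, 19, 48, "W")
point = geojson_point(latitude=lat, longitude=lon)
```

Passing the coordinates as keyword arguments is a deliberate choice in this sketch: it keeps the axis order explicit at the call site, which is where LAT/LONG mix-ups tend to happen.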

The following describes some of the key considerations in choosing a CRS.

Consider the audience: As a Publisher one can make decisions on behalf of the users. Who are the primary and secondary users? Target those whose requirements match the data. Is the primary audience known, and does it work within a limited, well-defined spatial extent, say an urban region? Then it is likely that there is a local projected CRS in which data is commonly shared in that region. Share the data in this projection.
Consider sharing the data in more than one projection: Listen to the customers. Metadata should provide contacts to publishers. Encourage feedback particularly for new secondary users. This feedback may give insight to projections you should support.
Consider the Data: Vector data is quite easy for web mapping tools to project from geographic coordinates. Raster data requires much more intensive computation for this task. Raster is commonly published in a projected CRS instead of geographic coordinates. Raster data should be published in the CRS in which it was produced and if possible and useful, also in web Mercator.
Consider the Use: The seeming multitude of CRSs exists because of different requirements. Some projections are better for measuring area, some are better for measuring distance, some are better for measuring angles, and some simply look better when displayed and make for a better user interface. Are there aesthetic and readability issues associated with the data? What projections display the data so as to best convey the information you wish to share? Which way is north is affected by the projection chosen. Spatial data are often used to provide a user interface via a map. Some map projections may support this use better than others. While it is likely true that the majority of users may be satisfied with WGS 84 and Web Mercator, the most value may be gained by those users requiring other projections.
Consider how the data is distributed: Different requirements may also occur if the data is downloaded as a whole dataset or accessed via a Web service.
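To make the remark about Web Mercator more concrete, the sketch below applies the standard spherical Mercator forward formulas (the projection commonly identified as EPSG:3857) to WGS 84 longitude and latitude. It is offered as an illustration of how cheaply vector coordinates can be projected, under the assumption that the input really is WGS 84. The function name and the latitude cut-off constant are choices made for this example.

```python
import math

# Web Mercator treats the Earth as a sphere whose radius equals the
# WGS 84 semi-major axis.
EARTH_RADIUS_M = 6378137.0
# Approximate latitude beyond which Web Mercator is usually not used.
MAX_LATITUDE_DEG = 85.051129


def wgs84_to_web_mercator(longitude, latitude):
    """Project WGS 84 longitude/latitude (decimal degrees) to Web Mercator metres."""
    if abs(latitude) > MAX_LATITUDE_DEG:
        raise ValueError("latitude outside the usable Web Mercator range")
    x = EARTH_RADIUS_M * math.radians(longitude)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(latitude) / 2))
    return x, y


x0, y0 = wgs84_to_web_mercator(0.0, 0.0)
x180, _ = wgs84_to_web_mercator(180.0, 0.0)
```

The projection distorts area and distance increasingly towards the poles, which is one reason why users who need to measure may prefer a local projected CRS.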

Example 31

This is a snippet of results that is taken from the Google Geocoding API:

    "formatted_address" : "1600 Amphitheatre Parkway, Mountain View, CA 94043, USA",
    "geometry" : {
        "location" : {
            "lat" : 37.4224764,
            "lng" : -122.0842499
        },
        "location_type" : "ROOFTOP",
        "viewport" : {
            "northeast" : {
                "lat" : 37.4238253802915,
                "lng" : -122.0829009197085
            },
            "southwest" : {
                "lat" : 37.4211274197085,
                "lng" : -122.0855988802915
            }
        }
    },
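The snippet above names its axes ("lat", "lng") but does not state a CRS. The following Python sketch converts such a result into a GeoJSON Feature, making the axis order explicit. Treating the values as WGS 84 is an assumption made here, and the function name is invented for this example.

```python
def to_geojson_feature(result):
    """Convert a geocoding result with lat/lng members into a GeoJSON Feature."""
    location = result["geometry"]["location"]
    viewport = result["geometry"]["viewport"]
    return {
        "type": "Feature",
        # GeoJSON bounding boxes are ordered west, south, east, north.
        "bbox": [
            viewport["southwest"]["lng"],
            viewport["southwest"]["lat"],
            viewport["northeast"]["lng"],
            viewport["northeast"]["lat"],
        ],
        # GeoJSON positions are ordered longitude, latitude.
        "geometry": {
            "type": "Point",
            "coordinates": [location["lng"], location["lat"]],
        },
        "properties": {
            "formatted_address": result["formatted_address"],
            "location_type": result["geometry"]["location_type"],
        },
    }


# Values copied from the snippet above.
geocoding_result = {
    "formatted_address": "1600 Amphitheatre Parkway, Mountain View, CA 94043, USA",
    "geometry": {
        "location": {"lat": 37.4224764, "lng": -122.0842499},
        "location_type": "ROOFTOP",
        "viewport": {
            "northeast": {"lat": 37.4238253802915, "lng": -122.0829009197085},
            "southwest": {"lat": 37.4211274197085, "lng": -122.0855988802915},
        },
    },
}
feature = to_geojson_feature(geocoding_result)
```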

How to Test

...

Evidence

...

14. Narrative - the Nieuwhaven flooding

This narrative introduces a flooding scenario as a background story for the Best Practice.

Note

Names and places used in this scenario are fictional, procedures and practices may not reflect those used in the real world. Our intent is to provide a coherent context within which the best practices can be illustrated. We do not attempt to provide best practice for management of flooding events. However, many of the procedures discussed are based on information from flood-risk-and-water-management-in-the-netherlands.

Nieuwhaven is a flourishing coastal city in the Netherlands. In common with much of the Netherlands, the low-lying nature of Nieuwhaven makes it prone to flooding from both rivers and the North Sea. To mitigate or reduce risks to homes and businesses, significant investment has been made in flood control and water management infrastructure.

Flood Risk Management and Water Management are integrated in the Netherlands. Because responsibilities for daily water management and flood risk management are combined, the same people, who have detailed knowledge of their water systems and flood defenses, are involved in both.

Fig. 2 “Multi-layer safety” for Flood Risk Management (source: §2.3 Flood Risk and Water Management in the Netherlands)

Flood risk management can be separated into three layers:

Flood alerts, evacuation, response and recovery (civil protection issues); both organizational and physical measures (e.g. identifying, checking, repairing and signaling evacuation routes).
Spatial planning issues; reducing the impact of flooding through planning measures.
Flood protection; constructing flood defenses to reduce the probability [of inundation and the impact of flooding]

Our scenario concentrates on element (3).

14.1 Describing predicted flood extent using vector geometries

The Nieuwhaven Water Board is the independent local government body responsible for regional water management; maintaining the system of dikes, drainage, canals and pumping stations that are designed to keep the city and surrounding environment from flooding.

Based on an assessment of historical flooding events, the Nieuwhaven Water Board is able to determine the extent of flooding that would occur as the result of hypothetical storm surge and river flooding events.

Example

tbd add example including:

inundation extent from hypothetical scenarios
vector geometries for the inundation extent based on assessment against high-resolution DEM (Digital Elevation Model) / DTM (Digital Terrain Model) derived from photogrammetry & lidar
API enables users to define the geometry resolution (from 1m resolution up to 50m?) they need for their application using a query parameter [e.g. to manage the volume of complex geometries]

14.2 Publishing statistical data for geographic areas

Municipal emergency services, public health authorities and water boards are grouped according to a “safety region” in order to establish a multi-disciplinary “emergency team” for crisis management. This helps to ensure that there is effective communication between those responsible for public safety and those responsible for flood control and water management.

Each safety region prepares systematically for its own specific characteristics, based on available capabilities. This plan, the “Flood Response Plan”, includes evacuation strategies that are developed in response to hypothetical flooding events. Scenarios are prepared beforehand and carefully considered. The emergency team must be prepared at all times to deliver an assessment on a disaster / incident scenario and advise on proposed interventions, e.g. evacuation and deployment of temporary flood defenses.

In developing the Flood Response Plan, the numbers of citizens impacted by each hypothetical flooding event are determined by cross-referencing the areas affected by surface water flooding with census data.

Statistics Netherlands (CBS) publishes reliable and coherent statistical information that responds to the needs of Dutch society, and is responsible for compiling official national statistics.

CBS makes use of OData, the Open Data Protocol v4, to provide open datasets for use by third parties. Furthermore, CBS provides a search interface to help a user find the dataset of interest. Data is also provided in CSV format.

CBS provides metadata for the census dataset, in both human- and machine-readable forms. A download of each dataset is available, but may leave the data user with more data than they can conveniently work with. Users are often interested only in the subset of areas that are relevant to them - such as those affected by flooding. CBS provides an API enabling the user to retrieve the relevant data by selecting the area of interest and, optionally, choosing specific dimensions of the statistical data.
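A request for such a subset might be built as sketched below, using the OData system query options $filter (to select areas) and $select (to choose dimensions). The service root, the entity set name "Census", the property names "RegionCode" and "Population" and the region codes are all hypothetical; they do not describe the actual CBS service.

```python
from urllib.parse import quote, urlencode


def build_odata_query(service_root, entity_set, region_codes, columns):
    """Build an OData URL that selects some columns for some regions."""
    # One equality test per region, combined with "or".
    filter_expr = " or ".join(f"RegionCode eq '{code}'" for code in region_codes)
    query = urlencode(
        {"$filter": filter_expr, "$select": ",".join(columns)},
        quote_via=quote,   # encode spaces as %20 rather than "+"
        safe="$,'",        # keep OData punctuation readable
    )
    return f"{service_root.rstrip('/')}/{entity_set}?{query}"


url = build_odata_query(
    "https://example.org/odata",    # hypothetical service root
    "Census",                       # hypothetical entity set
    ["WK000001", "WK000002"],       # hypothetical region codes
    ["RegionCode", "Population"],   # hypothetical property names
)
print(url)
```

A production client would also need to escape quote characters inside the region codes; that is omitted here for brevity.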

Census data naturally takes the form of a statistical 'data cube', with statistical dimensions of area, time, gender, age range etc. A useful standards-based approach to making the data available would be to represent it as RDF, using the RDF Data Cube Vocabulary [VOCAB-DATA-CUBE]. This offers a standards based way to represent statistical data and associated metadata as RDF. API access to the data could be provided via a SPARQL endpoint, or a more specific API. The Linked Data API, implemented by Epimorphics’ ELDA, provides a useful mechanism to expose simple RESTful APIs on top of RDF/SPARQL.

14.3 Publishing data about administrative areas

Population data from a census is typically broken down by area, gender, age (and perhaps other statistical dimensions) and relates to a particular time.

CBS uses established URLs to identify each administrative area for which population data is available. Details of the administrative areas for Nieuwhaven are published by the municipal government. This information includes the geometry for each administrative area.

Data about administrative areas are often useful - perhaps they represent one of the most popular spatial datasets. In this case they are useful for coordinating the emergency response, i.e. predicting and tracking which neighborhoods or districts are threatened. Because the names of local administrative areas such as neighborhoods are very well known they are also useful for communication with citizens, i.e. letting them know if their neighborhood is threatened by the flood or not.

Because the administrative area datasets are quite popular, all kinds of data users will want to use them - not only GIS experts. To enable these users to find the data on the Web, it was published in such a way that search engines can crawl it, making the data findable using popular search engines.

Example

tbd add example including:

publish administrative areas with geometry
geometries published with national CRS via SDI (this could be converted in the browser using proj4.js)

14.4 Correlating statistical and geographic areas to assess impact

By cross-referencing the population statistics, administrative areas and surface water flooding extent (e.g. by calculating the intersection of the flood with administrative areas), the number of citizens impacted by each hypothetical flooding event can be estimated.
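The intersection-based estimate described above can be sketched as follows. This is a deliberately simplified illustration: it assumes coordinates in a projected CRS with metre units, a flood extent reduced to an axis-aligned rectangle, and a population spread uniformly over each administrative area. Polygon clipping uses the Sutherland-Hodgman algorithm; all names and numbers are invented for the example.

```python
def polygon_area(ring):
    """Area of a simple polygon (shoelace formula); ring lists each vertex once."""
    total = 0.0
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip_to_rectangle(ring, xmin, ymin, xmax, ymax):
    """Clip a polygon against an axis-aligned rectangle (Sutherland-Hodgman)."""

    def clip(points, inside, intersect):
        result = []
        for i, current in enumerate(points):
            previous = points[i - 1]
            if inside(current):
                if not inside(previous):
                    result.append(intersect(previous, current))
                result.append(current)
            elif inside(previous):
                result.append(intersect(previous, current))
        return result

    def cut_at_x(x):
        # Only called for segments that cross the line, so q[0] != p[0].
        return lambda p, q: (x, p[1] + (x - p[0]) / (q[0] - p[0]) * (q[1] - p[1]))

    def cut_at_y(y):
        # Only called for segments that cross the line, so q[1] != p[1].
        return lambda p, q: (p[0] + (y - p[1]) / (q[1] - p[1]) * (q[0] - p[0]), y)

    edges = [
        (lambda p: p[0] >= xmin, cut_at_x(xmin)),
        (lambda p: p[0] <= xmax, cut_at_x(xmax)),
        (lambda p: p[1] >= ymin, cut_at_y(ymin)),
        (lambda p: p[1] <= ymax, cut_at_y(ymax)),
    ]
    for inside, intersect in edges:
        ring = clip(ring, inside, intersect)
    return ring


def estimate_affected(population, area_ring, flood_bbox):
    """Area-weighted estimate of the residents inside the flood rectangle."""
    clipped = clip_to_rectangle(area_ring, *flood_bbox)
    if len(clipped) < 3:
        return 0
    return round(population * polygon_area(clipped) / polygon_area(area_ring))


# Hypothetical district: a 1 km square with 4000 residents.
district = [(0, 0), (1000, 0), (1000, 1000), (0, 1000)]
affected = estimate_affected(4000, district, (500, 0, 2000, 1000))
print(affected)
```

Real flood extents are irregular polygons, so a general-purpose geometry library would normally be used instead; the uniform-density assumption is also a simplification that finer-grained census geography would reduce.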

Once the number of citizens that need refuge has been determined, the emergency teams can designate public buildings, such as schools and sports centers, as evacuation points and define safe transit routes to get to those points.

14.5 Publishing data about topographic features with associated discovery metadata

The municipal government published details of the built infrastructure within Nieuwhaven, including public buildings and transport infrastructure.

Example

tbd add example including:

each feature is uniquely identified
each feature is indexed by search engines
dataset is published as vector tile-set (like OSM)

The municipal government also publishes metadata describing each dataset (DWBP-BP1) that, besides free text descriptions (e.g., title, abstract), include the following information:

the type of objects/features described - e.g., with a thematic classification (DWBP-BP2)
spatial coverage / temporal coverage - to identify if data match the area of interest
coordinate reference system(s) used - to correctly interpret geometries
spatial resolution - to identify data with the right level of detail
distribution format(s) and API to get access to the data (at a different level of granularity) - to identify those datasets consumable by the intended application(s) (DWBP-BP4, DWBP-BP13)
date of last modification - to see whether data are up to date (DWBP-BP8)
the parties responsible for the creation and maintenance of the data - to verify data authoritativeness (DWBP-BP6)

To facilitate data discoverability, metadata should be published via different channels and formats (DWBP-BP22). Nowadays, such metadata are typically maintained in geospatial catalogues, encoded based on [ISO19115] - the standard for spatial metadata. In addition to this, such metadata can be served in RDF, and made queryable via a SPARQL endpoint; e.g. [GeoDCAT-AP] provides an XSLT-based mechanism to automatically transform ISO 19115 metadata into RDF, following a schema based on the W3C Data Catalog Vocabulary [VOCAB-DCAT].
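The metadata elements listed above could be serialized roughly as in the following Python sketch, which builds a JSON-LD document using terms from the W3C Data Catalog Vocabulary and Dublin Core. The mapping of each element to a property (for example dct:conformsTo for the CRS and locn:geometry for the spatial coverage) is an assumption made for illustration, and the titles, URLs and dates are hypothetical.

```python
import json


def dataset_metadata(title, theme_uri, bbox_wkt, crs_uri, modified,
                     publisher, access_url, media_type):
    """Build a minimal DCAT-style description of a spatial dataset."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "locn": "http://www.w3.org/ns/locn#",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:theme": {"@id": theme_uri},            # type of features described
        "dct:spatial": {                             # spatial coverage
            "@type": "dct:Location",
            "locn:geometry": bbox_wkt,
        },
        "dct:conformsTo": {"@id": crs_uri},          # coordinate reference system
        "dct:modified": modified,                    # date of last modification
        "dct:publisher": publisher,                  # responsible party
        "dcat:distribution": {                       # format and access point
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": access_url},
            "dcat:mediaType": media_type,
        },
    }


meta = dataset_metadata(
    title="Public buildings in Nieuwhaven",
    theme_uri="http://example.org/themes/buildings",        # hypothetical
    bbox_wkt="POLYGON((4.2 51.9, 4.4 51.9, 4.4 52.1, 4.2 52.1, 4.2 51.9))",
    crs_uri="http://www.opengis.net/def/crs/EPSG/0/28992",  # illustrative choice
    modified="2016-10-25",                                  # hypothetical
    publisher="Municipality of Nieuwhaven",
    access_url="http://example.org/data/buildings",         # hypothetical
    media_type="application/gml+xml",
)
print(json.dumps(meta, indent=2))
```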

This solution can be further enhanced by making data discoverable and indexable via search engines. The advantage is that this would allow data consumers to discover the data even though they do not know the relevant catalogue(s), and to find alternative data sources.

This can be achieved, following Search Engine Optimization (SEO) techniques, by embedding metadata in catalogue’s Web pages, with mechanisms like HTML-RDFa, Microdata, and Microformats. Examples of this approach include the following ones:

In the Geonovum testbed, dataset pages from a spatial catalogue embed metadata, represented by using the [SCHEMA-ORG] vocabulary, directly generated from the relevant ISO 19115 records.
The experimental GeoDCAT-AP API allows data publishers to serve ISO 19115 records in different RDF serialization formats, including HTML+RDFa, on top of a spatial catalogue and/or an OGC Catalog Service for the Web [CSW].

Example

tbd add example including:

publish dataset metadata

14.6 Using spatial relations

Temporary flood defenses are common where roads and railways cross permanent flood defenses or are built up on boulevards along rivers. Temporary flood defenses are also deployed where dikes have not passed their annual visual inspection or 5-yearly assessment. Information regarding the condition of dikes cannot be incorporated into the plan, and must be considered during an actual flood event.

Example

tbd add examples including:

individual transport network segments and flood defense features are uniquely identified
spatial relations are used to define where transport infrastructure crosses flood defenses, and hence quickly determine where to deploy temporary flood defenses without the need for detailed spatial analysis … this can be used to demonstrate 3rd-party linking; e.g. the spatial relations are published by an organization that owns neither of the target datasets
locations for temporary flood defenses are provided to the relevant emergency services teams

14.7 Publishing coverage data about predicted water depth

Storm surge and river flood warning services are provided by the National Water Management Centre (WMCN) at Rijkswaterstaat, who are responsible for the design, construction, management and maintenance of the main infrastructure facilities in the Netherlands such as the main road network, the main waterway network and water systems.

The storm surge warning service is triggered by a storm surge alert from the Royal Netherlands Meteorological Institute (KNMI), the Dutch national weather service. A forecast combination of heavy rainfall, high tide and storm makes it likely that flooding will occur in the next 120 hours. Specialists use meteorological, hydrological and urban flood prediction models within the Flood Early Warning System (FEWS) to estimate peak water levels, when these will occur and which area will likely be flooded.

Storm surge warnings consist of predicted maximum water levels and a general description of wind and tide. 10-minute water level forecasts are computed and distributed, including details of wave run-up and overtopping for dikes.

Every 6 hours, new meteorological predictions are incorporated into the flood prediction, resulting in a new version of the 10-minute water level forecast dataset being made available.

Example

tbd add example including:

landing page (with descriptive metadata) for each forecast dataset
entire forecast dataset available in a number of (compact binary) formats for offline use (e.g. NetCDF, HDF5)
"current" forecast uniquely identified
exposed via a self-describing RESTful API; subdivided by time (each time-step listed in metadata)
use RDF Data Cube to describe the dataset structure
whole time-slice available as covjson
simple point data extraction (in WGS84 coordinates); covjson point feature with water depth time series
simple bbox extraction (in WGS84 coordinates); covjson
use covjson data to illustrate (1) depth of flooding, (2) changes in inundation through time

14.8 Using an API to make it easy to work with spatial data

The emergency team for the Nieuwhaven safety region compare the predictions for the forecast flooding event against the hypothetical scenarios developed in the Flood Response Plan to determine which of the prepared response plans to execute.

Based on this assessment, the imminent flooding event requires a number of temporary flood defenses to be deployed and evacuation of some districts of Nieuwhaven.

The emergency team identify where additional temporary flood defenses are required due to any dikes that are in a state of disrepair (e.g. having failed their annual or 5-yearly assessments).

Example

tbd add example:

cross reference the location of each damaged dike with predicted high-water level, determined via an API call into the 10-minute water level forecasts to extract a water level time-series for a given point to determine if the water level is predicted to exceed a threshold, in which case, temporary flood defenses will be required.
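The threshold check itself is simple once a water level time series has been obtained. The sketch below assumes the API call has already returned a list of (timestamp, water level in metres) pairs for the location of a damaged dike; the API is not specified here, and the sample values and threshold are invented.

```python
from datetime import datetime


def first_exceedance(time_series, threshold_m):
    """Return the time of the first forecast step above the threshold, or None."""
    for timestamp, level_m in sorted(time_series):
        if level_m > threshold_m:
            return timestamp
    return None


# Hypothetical 10-minute forecast steps for one point.
forecast = [
    (datetime(2016, 1, 1, 12, 0), 2.10),
    (datetime(2016, 1, 1, 12, 10), 2.45),
    (datetime(2016, 1, 1, 12, 20), 2.80),
]

exceeded_at = first_exceedance(forecast, threshold_m=2.5)
needs_temporary_defense = exceeded_at is not None
```

Returning the time of the first exceedance, rather than only a yes/no answer, lets the emergency team prioritise the dikes that are predicted to be threatened soonest.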

14.9 Making spatial data consumable by both humans and machines

Details of the emergency and the evacuation plan must be communicated to citizens. They are kept informed during and prior to the event using multiple channels:

local radio and television networks
news and media agencies; television, radio and on-line
official national Government website www.crisis.nl, including specific information about flooding events
cell broadcasting via the Government’s NL-Alert system, providing SMS message alerts to all mobile phones in the vicinity of a life and health threatening emergency
air raid sirens

The evacuation plan must be discoverable by the public. The intent is for each plan to be both human (primarily) and machine readable. The requirement for machine readability is mostly to support automated discovery of the content via web search. The URL itself ideally should also be "human friendly" as it should be easy to share verbally in addition to being embedded and linked to from other web pages.

Making the plans clear and understandable to human readers is well understood (and beyond the scope of this best practices document!); the challenge is to make the content machine readable. The use of a simple tag-based schema using Microdata, RDFa or JSON-LD is recommended. A simple first step might be to use the schema.org "schema:Event" item type, e.g. <div class="event-wrapper" itemscope itemtype="http://schema.org/Event">, which has useful generic properties such as date, location and duration. The places of evacuation refuges (e.g. schools, sports centers etc.) should be tagged using the generic "schema:Place".
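As one possible way of producing such markup, the Python sketch below generates a JSON-LD description of an evacuation event, using the schema.org types Event, Place and GeoCoordinates, and wraps it in a script element for embedding in the plan's Web page. The plan URL, names, date and coordinates are hypothetical, and JSON-LD is chosen here only as one of the three syntaxes mentioned.

```python
import json


def evacuation_event_jsonld(plan_url, name, start, refuge_name, latitude, longitude):
    """Describe an evacuation as a schema.org Event located at a refuge."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "@id": plan_url,
        "name": name,
        "startDate": start,
        "location": {
            "@type": "Place",
            "name": refuge_name,
            "geo": {
                "@type": "GeoCoordinates",
                "latitude": latitude,
                "longitude": longitude,
            },
        },
    }


def as_script_tag(data):
    """Serialize JSON-LD for embedding in an HTML page."""
    # Escape "</" so that the JSON cannot close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


event = evacuation_event_jsonld(
    plan_url="http://example.org/evacuation/plan-7",  # hypothetical URL
    name="Evacuation of district Oost",               # hypothetical district
    start="2016-01-01T14:00:00+01:00",                # hypothetical time
    refuge_name="Sports center De Dijk",              # hypothetical refuge
    latitude=52.0,
    longitude=4.3,
)
print(as_script_tag(event))
```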

Example

tbd add example including:

publish simple, authoritative Web pages that describe the evacuation plans; include structured mark-up to help search engines index the rich content; each evacuation plan must be uniquely identified
the evacuation plans link to the spatial things (e.g. schools, sports centers, administrative areas etc.) designated as refuges etc.

Details of the evacuation route should be provided ideally as a textual description (perhaps machine readable using the schema.org "TravelAction" item, although this is rather limited) and a graphical representation. Potentially route information could be encoded using a format such as OpenLR but this has not achieved widespread adoption.

Example

tbd add example including:

describe transit routes

14.10 Providing simple access to spatial data through 'convenience' APIs

News and media agencies provide Web applications that help communicate the evacuation to citizens as effectively as possible; e.g. by creating simple Web applications that direct one to the correct evacuation plan based on their postal code or online mapping tools. Media agencies may cross-reference evacuation plans with Features that have non-official identifiers; e.g. from What3Words (W3W) or GeoNames.

Example

tbd add example including:

simple App to help people determine if they will be flooded ... simple lookup based on postcode area; x-ref postcode area with predicted surface water extent (from forecast dataset) via spatial analysis of geometries; use API into forecast dataset to extract water-level time series for a given location

14.11 Publishing geographic information in simple tabular data

During a flood event, the Flood Response Plan indicates that emergency services will have to focus their efforts on reducing the number of fatalities. This means that if an evacuation order is given, the efforts of the emergency services will be focused on traffic control and on non-self-reliant groups.

As the flood event progresses, the emergency services provide evacuation assistance for the vulnerable, such as the residents of care homes.

The municipal public health authority publishes details of care homes and other health care facilities on-line as open data, using a simple CSV format.

Example

tbd add example including:

CSV formatted spatial data … either using well-known-text encoded geometry _or_ providing an address that can be geocoded? … It is important that the structure and meaning of the data is documented, by providing a definition for each column header and information on the type of data to be expected in the cells. This should follow the approach defined in the W3C Metadata Vocabulary for [TABULAR-METADATA]
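A minimal sketch of the first option (well-known-text geometry in a CSV column) is given below, together with a small metadata description in the style of the W3C tabular metadata vocabulary that documents each column. The file name, column names, facility names and coordinates are hypothetical, and the WKT points are assumed to be ordered longitude, latitude in WGS84.

```python
import csv
import io
import re

# Hypothetical content of a file "care-homes.csv".
CSV_TEXT = (
    "name,type,geometry_wkt\n"
    "Zonnehof,care home,POINT (4.3001 52.0102)\n"
    "Huize Waterkant,hospice,POINT (4.3125 52.0051)\n"
)

# Column documentation following the tabular metadata approach.
CSV_METADATA = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "care-homes.csv",
    "tableSchema": {
        "columns": [
            {"name": "name", "titles": "name", "datatype": "string",
             "dc:description": "Name of the health care facility"},
            {"name": "type", "titles": "type", "datatype": "string",
             "dc:description": "Kind of facility"},
            {"name": "geometry_wkt", "titles": "geometry_wkt", "datatype": "string",
             "dc:description": "Location as a WKT point, longitude then latitude, WGS84"},
        ]
    },
}

POINT_PATTERN = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt):
    """Parse a two-dimensional WKT POINT into an (x, y) tuple of floats."""
    match = POINT_PATTERN.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    x, y = match.groups()
    return float(x), float(y)


def read_facilities(text):
    """Read the CSV and attach a parsed (longitude, latitude) pair to each row."""
    return [
        {"name": row["name"], "type": row["type"],
         "lon_lat": parse_wkt_point(row["geometry_wkt"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


facilities = read_facilities(CSV_TEXT)
```

The regular expression only handles simple points; other geometry types would call for a proper WKT parser.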

14.12 Publishing data about moving objects

The position of each vehicle used by the emergency services is tracked in near real-time using GPS. The coordinators within the emergency team are able to view both current position and where the vehicles have been; gauging the evacuation progress against the Flood Response Plan.

Example

tbd add example including:

moving features; where “geometry” changes with time

14.13 Publishing real-time data streams from sensors

During floods and storm surges, professionals (often aided by trained volunteers) constantly monitor all flood defenses. Measurements include: water level, wave height, wind speed and direction.

The emergency team use these observations to monitor the rising water levels to ensure that these are consistent with the predictions (both in terms of timing and peak water-level) in case additional interventions, such as evacuating more districts, are required.

A real-time data stream of water level at a specific location within Nieuwhaven’s canals is published from an automated monitoring system operated by the Water Board; e.g. a Web-enabled sensor.

Example

Example tbd including:

metadata about the data stream enabling discovery and interpretation of the data stream values; e.g. what quantity kind is being measured with which units of measurement, what is the sensor etc.
relate the sensor (and the data-stream it provides) to the water body whose water level it is intended to monitor
describe the sensor location
SensorThings API example; what about use of protocols other than HTTP; e.g. MQTT?

14.14 Describing location using relative positions

Fortunately, the prediction is sufficiently accurate that the evacuation plan remains effective. However, the emergency team notice that the water level in one particular sector is higher than predicted, and rising. Further analysis indicates that an automated control gate has malfunctioned.

A team is dispatched to use the manual override. The manual control is located using relative positioning.

Example

tbd add example including:

the manual override / control is located using relative positioning
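One simple form of relative positioning is "a given distance from a known feature, in a given direction". The sketch below resolves such a description to coordinates, assuming a projected CRS with metre units and a bearing measured clockwise from grid north. The reference position, bearing and distance are invented for illustration.

```python
import math


def offset_position(easting, northing, bearing_deg, distance_m):
    """Position reached from a reference point along a bearing for a distance."""
    bearing = math.radians(bearing_deg)
    return (
        easting + distance_m * math.sin(bearing),
        northing + distance_m * math.cos(bearing),
    )


# Hypothetical: the manual override is 20 m due east of the control gate.
gate_easting, gate_northing = 1000.0, 2000.0
override_x, override_y = offset_position(gate_easting, gate_northing, 90.0, 20.0)
```

Over the short distances involved in locating a control next to a gate, ignoring map projection distortion in this way is usually acceptable; that too is an assumption of the sketch.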

An attempt at matrix of the common formats (GeoJSON, GML, RDF, JSON-LD) and what you can or can't achieve with it. (source: @eparsons)
Format	Openness	Binary/text	Usage	Discoverability	Granular links	CRS Support	Verbosity	Semantics vocab?	Streamable	3D Support
ESRI Shape	Open'ish	Binary	Geometry only attributes and metadata in linked DB files	Poor	In Theory?	Yes	Lightweight	No	No	Yes
GeoJSON [RFC7946]	Open	Text	Geometry and attributes inline array	Good ?	In Theory?	No	Lightweight	No	No	No
DXF	Proprietary	Binary	Geometry only attributes and metadata in linked DB files	Poor	Poor	No	Lightweight	No	No	Yes
GML	Open	Text	Geometry and attributes inline or xlinked	Good ?	In Theory ?	Yes	Verbose	No	No	Yes
KML	Open	Text	Geometry and attributes inline or xlinked	Good ?	In Theory ?	No	Lightweight	No	Yes?	Yes

An attempt at a matrix of the formats for spatial data in current use and detailed aspects. (source: @portele)
	GML	GML-SF0	JSON-LD	GeoSPARQL (vocabulary)	schema.org	GeoJSON	KML	GeoPackage	Shapefile	GeoServices / Esri JSON	Mapbox Vector Tiles
Governing Body	OGC, ISO	OGC	W3C	OGC	Google, Microsoft, Yahoo, Yandex	Authors (now in IETF process)	OGC	OGC	Esri	Esri	Mapbox
Based on	XML	GML	JSON	RDF	HTML with RDFa, Microdata, JSON-LD	JSON	XML	SQLite, SF SQL	dBASE	JSON	Google protocol buffers
Requires authoring of a vocabulary/schema for my data (or use of existing ones)	Yes (using XML Schema)	Yes (using XML Schema)	Yes (using @context)	Yes (using RDF schema)	No, schema.org specifies a vocabulary that should be used	No	No	Implicitly (SQLite tables)	Implicitly (dBASE table)	No	No
Supports reuse of third party vocabularies for features and properties	Yes	Yes	Yes	Yes	Yes	No	No	No	No	No	No
Supports extensions (geometry types, metadata, etc.)	Yes	No	Yes	Yes	Yes	No (under discussion in IETF)	Yes (rarely used except by Google)	Yes	No	No	No
Supports non-simple property values	Yes	No	Yes	Yes	Yes	Yes (in practice: not used)	No	No	No	No	No
Supports multiple values per property	Yes	No	Yes	Yes	Yes	Yes (in practice: not used)	No	No	No	No	No
Supports multiple geometries per feature	Yes	Yes	n/a	Yes	Yes (but probably not in practice?)	No	Yes	No	No	No	No
Support for Coordinate Reference Systems	any	any	n/a	many	WGS84 latitude, longitude	WGS84 longitude, latitude with optional elevation	WGS84 longitude, latitude with optional elevation	many	many	many	WGS84 spherical mercator projection
Support for non-linear interpolations in curves	Yes	Only arcs	n/a	Yes (using GML)	No	No	No	Yes, in an extension	No	No	No
Support for non-planar interpolations in surfaces	Yes	No	n/a	Yes (using GML)	No	No	No	No	No	No	No
Support for solids (3D)	Yes	Yes	n/a	Yes (using GML)	No	No	No	No	No	No	No
Feature in a feature collection document has URI (required for ★★★★)	Yes, via XML ID	Yes, via XML ID	Yes, via @id keyword	Yes	Yes, via HTML ID	No	Yes, via XML ID	No	No	No	No
Support for hyperlinks (required for ★★★★★)	Yes	Yes	Yes	Yes	Yes	No	No	No	No	No	No
Media type	application/gml+xml	application/gml+xml with profile parameter	application/ld+json	application/rdf+xml, application/ld+json, etc.	text/html	application/vnd.geo+json	application/vnd.google-earth.kml+xml, application/vnd.google-earth.kmz	-	-	-	-
Remarks	comprehensive and supporting many use cases, but requires strong XML skills	simplified profile of GML	no support for spatial data, a GeoJSON-LD is under discussion	GeoSPARQL also specifies related extension functions for SPARQL; other spatial vocabularies exist, see ???	schema.org markup is indexed by major search engines	supported by many mapping APIs	focussed on visualisation of and interaction with spatial data, typically in Earth browsers like Google Earth	used to support "native" access to spatial data across all enterprise and personal computing environments, including mobile devices	supported by almost all GIS	mainly used via the GeoServices REST API	used for sharing spatial data in tiles, mainly for display in maps

Cross reference of requirements against best practices
UC Requirements	Best Practice
Discoverability	Include spatial metadata in dataset metadata; Make your spatial data indexable by search engines; Publish links from spatial things to related resources
Compatibility	Include spatial metadata in dataset metadata; Expose spatial data through 'convenience APIs'; Provide subsets for large spatial datasets
BoundingBoxCentroid	Include spatial metadata in dataset metadata; Make your spatial data indexable by search engines; Provide geometries on the Web in a usable way
Crawlability	Include spatial metadata in dataset metadata; Make your spatial data indexable by search engines; Publish links from spatial things to related resources
SpatialMetadata	Include spatial metadata in dataset metadata; Provide context required to interpret data values; Provide geometries on the Web in a usable way
Provenance	Include spatial metadata in dataset metadata; Provide context required to interpret data values; Provide subsets for large spatial datasets
CRSDefinition	Provide context required to interpret data values; Provide geometries on the Web in a usable way
QualityPerSample	Provide context required to interpret data values; Describe the positional accuracy of spatial data
SensorMetadata	Provide context required to interpret data values
DeterminableCRS	Specify Coordinate Reference System for high-precision applications
Linkability	Make your spatial data indexable by search engines; Use globally unique persistent HTTP URIs for spatial things; Provide subsets for large spatial datasets; Publish links from spatial things to related resources
MachineToMachine	Make your spatial data indexable by search engines; Describe the positional accuracy of spatial data; How to describe properties that change over time; Provide geometries on the Web in a usable way; How to describe relative positions; Use spatial semantics for Spatial Things; Publish links from spatial things to related resources; Provide a minimum set of information about spatial things for your intended application
MovingFeatures	How to describe properties that change over time; Use spatial semantics for Spatial Things
Streamable	How to describe properties that change over time
GeoreferencedData	Use globally unique persistent HTTP URIs for spatial things
IndependenceOnReferenceSystems	Use globally unique persistent HTTP URIs for spatial things; Provide geometries on the Web in a usable way
Compressible	Provide geometries on the Web in a usable way
EncodingForVectorGeometry	Provide geometries on the Web in a usable way
3DSupport	Provide geometries on the Web in a usable way
TimeDependentCRS	Provide geometries on the Web in a usable way
TilingSupport	Provide geometries on the Web in a usable way
SamplingTopology	How to describe relative positions; Publish links from spatial things to related resources
MobileSensors	Use spatial semantics for Spatial Things
LightweightAPI	Expose spatial data through 'convenience APIs'
ReferenceDataChunks	Provide subsets for large spatial datasets
SpatialRelationships	Publish links from spatial things to related resources
SpatialOperators	Publish links from spatial things to related resources
MultilingualSupport	Provide a minimum set of information about spatial things for your intended application
SpatialVagueness	Provide a minimum set of information about spatial things for your intended application

F.1 Changes since the first public working draft of 19 January 2016

The document has undergone substantial changes since the first public working draft. Below are some of the changes made:

Focusing the document to suit the needs of practitioners: those either publishing spatial data themselves, or developing software tools to support the publication of spatial data (see sections 2. Audience and 3. Scope)
Addition of new introductory material to explain the fundamentals of spatial data to readers (see sections 5. Spatial Things, Features and Geometry, 6. Coverages: describing properties that vary with location (and time), 7. Spatial relations, 9. Linked Data and 10. Why are traditional Spatial Data Infrastructures not enough?)
Consolidation of the best practices from 30 down to 17 - based on merging duplicate or closely related best practices and focusing our scope only on spatial data concerns so that, for example, best practices relating to handling sensor data are removed (we expect these subjects to be included in future iterations of the Sensor Network deliverables of the working group)
Alignment of the remaining best practices with those from [DWBP] - including organizing them according to the same sub-section headings
As a consequence of the consolidation and [DWBP] alignment, the best practices are renumbered (although the fragment-identifiers remain unchanged from those used in the first public working draft) and the cross-reference to the Requirements from [SDW-UCR] (section C. Cross reference of use case requirements against best practices) has been updated
Addition of a new, partially complete section (see 11. How to use these best practices) that is intended to help readers understand the steps they should take and the questions they should consider when publishing spatial data on the Web; referencing both the general [DWBP] best practices and those specific to spatial data described in this document
Improvements to the D. Glossary

Section 14. Narrative - the Nieuwhaven flooding has also been introduced in this draft to provide context for the best practices through an end-to-end narrative based on an urban flooding event. It introduces a number of case studies that illustrate how the challenges of publishing different kinds of spatial data on the Web may be approached. This draft includes only the overview for each case study and does not (yet) describe the activities that each actor should undertake in order to publish their spatial data.

F.2 Changes since working draft of 25 October 2016

Significant updates to:

Best Practice 4: Make your spatial data indexable by search engines
Best Practice 6: How to describe properties that change over time
Best Practice 7: Use globally unique persistent HTTP URIs for spatial things
Best Practice 11: Expose spatial data through 'convenience APIs'

(further updates to these best practices are expected in the next WD release, circa end January 2017)

Plus minor changes, including the addition to section 9 of a list of the most important best practices for data publishers who start from an existing SDI, and changes to a few best practice titles to include the word "spatial".

F.3 Changes since working draft of 5 January 2017

Significant updates to:

Section 8. Coordinate Reference Systems (CRS)
Best Practice 4: Make your spatial data indexable by search engines
Best Practice 6: How to describe properties that change over time
Best Practice 7: Use globally unique persistent HTTP URIs for spatial things
Best Practice 14: Publish links from spatial things to related resources

Also:

The BP summary has been moved and is now section 4 of the document;
Best Practice 17, "How to work with crowd-sourced observations", was removed; and
the Best Practices Template section, explaining the template used to describe Best Practices, was removed.

G.1 Informative references

[5STAR-LOD]: Tim Berners-Lee. Is your Linked Open Data 5 Star? URL: https://www.w3.org/DesignIssues/LinkedData#fivestar
[BEACON]: J. Voß; M. Schindler. BEACON link dump format. 6 July 2014. URL: https://gbv.github.io/beaconspec/beacon.html
[COVERAGE-JSON]: Jon Blower; Maik Riechert; Bill Roberts. The CoverageJSON Format Specification (Editors Draft). 2 February 2017. URL: http://w3c.github.io/sdw/coverage-json/
[CSV2JSON]: Jeremy Tandy; Ivan Herman. W3C. Generating JSON from Tabular Data on the Web. 17 December 2015. W3C Recommendation. URL: https://www.w3.org/TR/csv2json/
[CSW]: Douglas Nebert; Uwe Voges; Lorenzo Bigagli. Catalogue Services 3.0 - General Model. 10 June 2016. URL: http://www.opengeospatial.org/standards/cat
[DCTERMS]: Dublin Core metadata initiative. DCMI Metadata Terms. 14 June 2012. DCMI Recommendation. URL: http://dublincore.org/documents/dcmi-terms/
[DWBP]: Bernadette Farias Loscio; Caroline Burle; Newton Calegari. W3C. Data on the Web Best Practices. 15 December 2016. W3C Proposed Recommendation. URL: https://www.w3.org/TR/dwbp/
[GML]: Open Geospatial Consortium Inc. Geography Markup Language (GML) Encoding Standard. URL: http://www.opengeospatial.org/standards/gml
[GS1]: Mark Harrison; Ken Traub. GS1. SmartSearch Implementation Guideline. November 2015. URL: http://www.gs1.org/gs1-smartsearch/guideline/gtin-web-implementation-guideline
[GeoDCAT-AP]: GeoDCAT-AP: A geospatial extension for the DCAT application profile for data portals in Europe. 23 December 2015. URL: https://joinup.ec.europa.eu/node/139283/
[GeoSPARQL]: Matthew Perry; John Herring. GeoSPARQL - A Geographic Query Language for RDF Data. 10 September 2012. URL: http://www.opengeospatial.org/standards/geosparql
[GeoTIFF]: Niles Ritter; Mike Ruth. GeoTIFF Format Specification. 28 December 2000. URL: http://web.archive.org/web/20160403164508/http://www.remotesensing.org/geotiff/spec/geotiffhome.html
[INSPIRE-MD]: INSPIRE Metadata Implementing Rules: Technical Guidelines based on EN ISO 19115 and EN ISO 19119. 29 October 2013. URL: http://inspire.jrc.ec.europa.eu/documents/Metadata/MD_IR_and_ISO_20131029.pdf
[ISO-19101]: ISO/TC 211. ISO. ISO 19101-1:2014 Geographic information -- Reference model -- Part 1: Fundamentals. 15 November 2014. URL: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=59164
[ISO-19109]: ISO/TC 211. ISO. ISO 19109:2015 Geographic information -- Rules for application schema. 15 December 2015. URL: http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=59193
[ISO-19123]: ISO/TC 211. ISO. ISO 19123:2005 Geographic information -- Schema for coverage geometry and functions. 15 August 2005. URL: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=40121
[ISO19115]: ISO/TC 211. ISO. Geographic information -- Metadata. 2003. URL: http://www.iso.org/iso/catalogue_detail?csnumber=26020
[JPEG2000]: Joint Photographic Experts Group (JPEG). JPEG 2000 image coding system. 9 December 1999. Report (draft) ISO/IEC CD15444-1:1999. URL: http://www.jpeg.org/cd15444-1.pdf
[JSON-LD]: Manu Sporny; Gregg Kellogg; Markus Lanthaler. W3C. JSON-LD 1.0. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/json-ld/
[LD-BP]: Bernadette Hyland; Ghislain Auguste Atemezing; Boris Villazón-Terrazas. W3C. Best Practices for Publishing Linked Data. 9 January 2014. W3C Note. URL: https://www.w3.org/TR/ld-bp/
[LDP-PRIMER]: Nandana Mihindukulasooriya; Roger Menday. W3C. Linked Data Platform 1.0 Primer. 23 April 2015. W3C Note. URL: https://www.w3.org/TR/ldp-primer/
[MICRODATA]: Ian Hickson. W3C. HTML Microdata. 29 October 2013. W3C Note. URL: https://www.w3.org/TR/microdata/
[MOVING-FEATURES-CSV]: Akinori Asahara; Ryosuke Shibasaki; Nobuhiro Ishimaru; David Burggraf. OGC ® Moving Features Encoding Extension: Simple Comma Separated Values (CSV). 17 February 2015. URL: http://www.opengeospatial.org/standards/movingfeatures
[MOVING-FEATURES-XML]: Akinori Asahara; Ryosuke Shibasaki; Nobuhiro Ishimaru; David Burggraf. OGC ® Moving Features Encoding Part I: XML Core. 17 February 2015. URL: http://www.opengeospatial.org/standards/movingfeatures
[N-TRIPLES]: Gavin Carothers; Andy Seaborne. W3C. RDF 1.1 N-Triples. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/n-triples/
[NeoGeo]: Barry Norton; Luis M. Vilches; Alexander De León; John Goodwin; Claus Stadler; Suchith Anand; Dominic Harries; Boris Villazón-Terrazas; Ghislain A. Atemezing. NeoGeo Vocabulary Specification. 5 February 2012 (Madrid Edition). URL: http://geovocab.org/doc/neogeo/
[OWL2-OVERVIEW]: W3C OWL Working Group. W3C. OWL 2 Web Ontology Language Document Overview (Second Edition). 11 December 2012. W3C Recommendation. URL: https://www.w3.org/TR/owl2-overview/
[OandM]: Simon Cox. Observations and Measurements - XML Implementation. 22 March 2011. URL: http://www.opengeospatial.org/standards/om
[PNG]: Tom Lane. W3C. Portable Network Graphics (PNG) Specification (Second Edition). 10 November 2003. W3C Recommendation. URL: https://www.w3.org/TR/PNG
[PROTO3]: Google. Protocol Buffers. 23 August 2016. URL: https://developers.google.com/protocol-buffers/docs/reference/proto3-spec
[PURI]: Phil Archer; Nikos Loutas; Stijn Goedertier; Saky Kourtidis. Study On Persistent URIs. 17 December 2012. URL: http://philarcher.org/diary/2013/uripersistence/
[R2RML]: Souripriya Das; Seema Sundara; Richard Cyganiak. W3C. R2RML: RDB to RDF Mapping Language. 27 September 2012. W3C Recommendation. URL: https://www.w3.org/TR/r2rml/
[RDF-SCHEMA]: Dan Brickley; Ramanathan Guha. W3C. RDF Schema 1.1. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/rdf-schema/
[RDF-SYNTAX-GRAMMAR]: Dave Beckett. W3C. RDF 1.1 XML Syntax. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/rdf-syntax-grammar/
[RDF11-PRIMER]: Guus Schreiber; Yves Raimond. W3C. RDF 1.1 Primer. 24 June 2014. W3C Note. URL: https://www.w3.org/TR/rdf11-primer/
[RFC3986]: T. Berners-Lee; R. Fielding; L. Masinter. IETF. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet Standard. URL: https://tools.ietf.org/html/rfc3986
[RFC3987]: M. Duerst; M. Suignard. IETF. Internationalized Resource Identifiers (IRIs). January 2005. Proposed Standard. URL: https://tools.ietf.org/html/rfc3987
[RFC4180]: Y. Shafranovich. IETF. Common Format and MIME Type for Comma-Separated Values (CSV) Files. October 2005. Informational. URL: https://tools.ietf.org/html/rfc4180
[RFC5758]: Q. Dang; S. Santesson; K. Moriarty; D. Brown; T. Polk. IETF. Internet X.509 Public Key Infrastructure: Additional Algorithms and Identifiers for DSA and ECDSA. January 2010. Proposed Standard. URL: https://tools.ietf.org/html/rfc5758
[RFC5988]: M. Nottingham. IETF. Web Linking. October 2010. Proposed Standard. URL: https://tools.ietf.org/html/rfc5988
[RFC6570]: J. Gregorio; R. Fielding; M. Hadley; M. Nottingham; D. Orchard. IETF. URI Template. March 2012. Proposed Standard. URL: https://tools.ietf.org/html/rfc6570
[RFC6906]: E. Wilde. IETF. The 'profile' Link Relation Type. March 2013. Informational. URL: https://tools.ietf.org/html/rfc6906
[RFC7049]: C. Bormann; P. Hoffman. IETF. Concise Binary Object Representation (CBOR). October 2013. Proposed Standard. URL: https://tools.ietf.org/html/rfc7049
[RFC7159]: T. Bray, Ed.. IETF. The JavaScript Object Notation (JSON) Data Interchange Format. March 2014. Proposed Standard. URL: https://tools.ietf.org/html/rfc7159
[RFC7946]: H. Butler; M. Daly; A. Doyle; S. Gillies; S. Hagen; T. Schaub. IETF. The GeoJSON Format. August 2016. Proposed Standard. URL: https://tools.ietf.org/html/rfc7946
[SCHEMA-ORG]: Schema.org. URL: http://schema.org/
[SDW-UCR]: Frans Knibbe; Alejandro Llaves. W3C. Spatial Data on the Web Use Cases & Requirements. 25 October 2016. W3C Note. URL: https://www.w3.org/TR/sdw-ucr/
[SENSORTHINGS]: Steve Liang; Chih-Yuan Huang; Tania Khalafbeigi. OGC ® SensorThings API Part 1: Sensing. 26 July 2016. URL: http://www.opengeospatial.org/standards/sensorthings
[SHACL]: Holger Knublauch; Dimitris Kontokostas. W3C. Shapes Constraint Language (SHACL). 14 August 2016. W3C Working Draft. URL: https://www.w3.org/TR/shacl/
[SPARQL11-OVERVIEW]: The W3C SPARQL Working Group. W3C. SPARQL 1.1 Overview. 21 March 2013. W3C Recommendation. URL: https://www.w3.org/TR/sparql11-overview/
[Simple-Features]: John Herring. Simple Feature Access - Part 1: Common Architecture. 28 May 2011. URL: http://www.opengeospatial.org/standards/sfa
[TABULAR-METADATA]: Jeni Tennison; Gregg Kellogg. W3C. Metadata Vocabulary for Tabular Data. 17 December 2015. W3C Recommendation. URL: https://www.w3.org/TR/tabular-metadata/
[TIMESERIESML]: James Tomkins; Dominic Lowe. TimeseriesML 1.0 – XML Encoding of the Timeseries Profile of Observations and Measurements. 9 September 2016. URL: http://www.opengeospatial.org/standards/tsml
[TURTLE]: Eric Prud'hommeaux; Gavin Carothers. W3C. RDF 1.1 Turtle. 25 February 2014. W3C Recommendation. URL: https://www.w3.org/TR/turtle/
[URLs-in-data]: Jeni Tennison. W3C. URLs in Data Primer. 4 June 2013. W3C Working Draft. URL: https://www.w3.org/TR/urls-in-data/
[VOCAB-DATA-CUBE]: Richard Cyganiak; Dave Reynolds. W3C. The RDF Data Cube Vocabulary. 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/vocab-data-cube/
[VOCAB-DCAT]: Fadi Maali; John Erickson. W3C. Data Catalog Vocabulary (DCAT). 16 January 2014. W3C Recommendation. URL: https://www.w3.org/TR/vocab-dcat/
[VOCAB-DQV]: Riccardo Albertoni; Antoine Isaac. W3C. Data on the Web Best Practices: Data Quality Vocabulary. 15 December 2016. W3C Note. URL: https://www.w3.org/TR/vocab-dqv/
[VOCAB-SSN]: Kerry Taylor; Krzysztof Janowicz; Danh Le Phuoc; Armin Haller. W3C. Semantic Sensor Network Ontology. 5 January 2017. W3C Working Draft. URL: https://www.w3.org/TR/vocab-ssn/
[Veregin]: H. Veregin. Data quality parameters. In: Geographical Information Systems: Principles, Techniques, Management and Applications. URL: http://www.geos.ed.ac.uk/~gisteac/gis_book_abridged/files/ch12.pdf
[VoID]: Keith Alexander; Richard Cyganiak; Michael Hausenblas; Jun Zhao. W3C. Describing Linked Datasets with the VoID Vocabulary. 3 March 2011. W3C Note. URL: https://www.w3.org/TR/void/
[W3C-BASIC-GEO]: Dan Brickley. W3C Semantic Web Interest Group. Basic Geo (WGS84 lat/long) Vocabulary. 1 February 2006. URL: https://www.w3.org/2003/01/geo/
[WCS]: Peter Baumann. OGC. WCS 2.0 Interface Standard- Core. 12 July 2012. OGC Interface Standard. URL: http://www.opengis.net/doc/IS/wcs-core-2.0.1
[WEBARCH]: Ian Jacobs; Norman Walsh. W3C. Architecture of the World Wide Web, Volume One. 15 December 2004. W3C Recommendation. URL: https://www.w3.org/TR/webarch/
[WFS]: Panagiotis (Peter) A. Vretanos. OGC. Web Feature Service 2.0 Interface Standard. 10 July 2014. OGC Interface Standard. URL: http://www.opengis.net/doc/IS/wfs/2.0.2
[WMS]: Jeff de la Beaujardiere. OGC. Web Map Server Implementation Specification. 15 March 2006. OpenGIS Implementation Standard. URL: http://portal.opengeospatial.org/files/?artifact_id=14416
[WPS]: Matthias Mueller; Benjamin Pross. OGC. Web Processing Service 2.0 Interface Standard. 5 October 2015. OGC Implementation Standard. URL: http://docs.opengeospatial.org/is/14-065/14-065.html
[XLINK11]: Steven DeRose; Eve Maler; David Orchard; Norman Walsh et al. W3C. XML Linking Language (XLink) Version 1.1. 6 May 2010. W3C Recommendation. URL: https://www.w3.org/TR/xlink11/
[XML11]: Tim Bray; Jean Paoli; Michael Sperberg-McQueen; Eve Maler; François Yergeau; John Cowan et al. W3C. Extensible Markup Language (XML) 1.1 (Second Edition). 16 August 2006. W3C Recommendation. URL: https://www.w3.org/TR/xml11/
[YAML]: Oren Ben-Kiki; Clark Evans; Ingy döt Net. YAML Ain’t Markup Language (YAML™) Version 1.2. 1 October 2009. URL: http://yaml.org/spec/1.2/spec.html
