Copyright © 2017 OGC & W3C ® (MIT, ERCIM, Keio, Beihang), W3C liability, trademark and document use rules apply.
This document advises on best practices related to the publication and usage of spatial data on the Web; the use of Web technologies as they may be applied to location. The best practices are intended for practitioners, including Web developers and geospatial experts, and are compiled based on evidence of real-world application. These best practices suggest a significant change of emphasis from traditional Spatial Data Infrastructures by adopting a Linked Data approach. As location is often the common factor across multiple datasets, spatial data is an especially useful addition to the Linked Data cloud; the 5 Stars of Linked Data paradigm is promoted where relevant.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This Working Draft updates several of the best practices and represents a general consolidation of the document. Our aim remains to provide actionable advice and guidance to practitioners (e.g. those directly publishing spatial data on the Web themselves, or those developing software tools to assist that publication) - which means that the omission of examples in many of the best practices will be resolved before final publication. We intend to refer to examples “in the wild” in an effort to provide evidence that each Best Practice is being applied. All substantive changes made since the 5 January 2017 publication are listed in section F3.
Looking to future releases, the editors anticipate:
Specifically, we ask reviewers to consider ISSUE 208 in Best Practice 7: Use globally unique persistent HTTP URIs for spatial things regarding the use of “indirect identifiers” (as discussed in [WEBARCH] section 2.2.3 Indirect Identification).
Changes made in this and previous releases are recorded in Appendix F - Changes since previous versions.
The editors would like to thank everyone for their feedback - and to encourage reviewers to continue this critique.
For OGC: This is a Public Draft of a document prepared by the Spatial Data on the Web Working Group (SDWWG) - a joint W3C-OGC project (see charter). The document is prepared following W3C conventions. The document is released at this time to solicit public comment.
This document was published by the Spatial Data on the Web Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them to public-sdw-wg@w3.org (subscribe, archives). All comments are welcome.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 September 2015 W3C Process Document.
This section is non-normative.
Increasing numbers of Web applications provide a means of accessing data. From simple visualizations to sophisticated interactive tools, there is a growing reliance on data. The open data movement has led to many national, regional and local governments publishing their data through portals. Scientific and cultural heritage data is increasingly published on the Web for reuse by others. Crowd-sourced and social media data are abundant on the Web. Sensors, connected devices and services from domains such as energy, transport, manufacturing and healthcare are becoming commonly integrated using the Web as a common data sharing platform.
The Data on the Web Best Practices [DWBP] provide a set of recommendations that are applicable to the publication of all types of data on the Web. Those best practices cover aspects including data formats, data access, data identifiers, metadata, licensing and provenance.
Location information, or spatial data, is often a common thread running through such data; describing how things are positioned relative to the Earth in terms of coordinates and/or topology.
Within this document our focus is the somewhat broader concern of spatial data; data that describes anything with spatial extent (i.e. size, shape or position) whether or not it is positioned relative to the Earth.
As with the challenges identified in [DWBP] for publishing data on the Web in general, there is a lack of consistency in how people publish spatial data, which means that the full potential of the Web as a data sharing platform is not being used.
It is not that there is a lack of spatial data on the Web; the maps, satellite and street level images offered by search engines are familiar and there are many more examples of spatial data being used in Web applications.
However, the data that has been published is difficult to find and often problematic to access for non-specialist users. The key problems we are trying to solve in this document are discoverability, accessibility and interoperability. Our overarching goal is to enable spatial data to be integrated within the wider Web of data; providing standard patterns and solutions that help solve these problems.
This section is non-normative.
Our goal in writing this best practice document is to support the practitioners who are responsible for publishing their spatial data on the Web or developing tools to make it easy for others to work with spatial data.
We expect readers to be familiar both with the fundamental concepts of the architecture of the Web [WEBARCH] and the generalized best practices related to the publication and usage of data on the Web [DWBP].
We aim to provide two primary pathways into these best practices:
In each case, we aim to help them provide incremental value to their data through application of these best practices.
This document provides a wide range of examples that illustrate how these best practices may be applied using specific technologies. We do not expect readers to be familiar with all the technologies used herein; rather that readers can identify with the activities being undertaken in the various examples and, in doing so, find relevant technologies that they are already aware of or discover technologies that are new to them.
In this document we focus on the needs of data publishers and the developers that provide tools for them. That said, we recognize that value can only be gained from publishing the spatial data when people use it! Although we do not directly address the needs of those users, we ask that data publishers and developers reading this document do not forget about them; moreover, that they always consider the needs of users when publishing spatial data or developing the supporting tools. All of our best practices are intended to provide guidance about publishing spatial data to improve ease of use. As we said above: the key problems we are trying to solve in this document are discoverability, accessibility and interoperability. We hope that the examples included in the urban flooding scenario will help illustrate how users may benefit from the application of these best practices.
This section is non-normative.
All of the best practices described in [DWBP] are relevant to publication of spatial data on the Web. Some, such as [DWBP] Best Practice 4: Provide data license information need no further elaboration in the context of spatial data. However, other best practices from [DWBP] are further refined in this document to provide more specific guidance for spatial data.
The best practices described below are intended to meet requirements derived from the scenarios in [SDW-UCR] that describe how spatial data is commonly published and used on the Web.
In line with the charter, this document provides advice on:
As stated in the charter, discussion of activities relating to rendering spatial data as maps is explicitly out of scope.
We extend [DWBP] to cover aspects specifically relating to spatial data, introducing new best practices only where necessary. In particular, we consider the individual resources, or Spatial Things, that are described within a dataset.
We consider a ‘programmable web’, formed by a network of connected services, products, data and devices, each doing what it is good at, to be the future of the Web. So whether we are talking about Big, Crawlable, Linked, Open, Closed, Small, Spatial or Structured Data; our starting point is that it needs to be machine-friendly and developer-friendly: making it as programmable as possible.
There are many situations where the location of a person is very useful; from using a taxi hailing service to geocoding a selfie. Technology makes this location information easy to collect and share. However, spatial data has particular characteristics which make its use potentially more complex. For example, a single location of an anonymous tracked mobile phone may cause few privacy concerns; however, the same phone tracked over a few days could provide enough information to make the identification of its user possible. As with all personally identifiable information, great care must be taken, as the collection, management and security of such information are the subject of legal frameworks. We do not attempt to provide guidance as to legal aspects of storing potentially personally identifiable spatial information; expert legal advice should be obtained. In summary: legal and privacy considerations relating to spatial data are out of scope.
The best practices described in this document are compiled based on evidence of real-world application. Where the Working Group have identified issues that inhibit the use or interoperability of spatial data on the Web, yet no evidence of real-world application is available, the editors present these issues to the reader for consideration, along with any approaches recommended by the Working Group. Such recommendations will be clearly distinguished as such in order to ensure that they are not confused with evidence-based best practice.
Devise a way to make best versus emerging practices clearly recognizable in this document.
The normative element of each best practice is the intended outcome. Possible implementations are suggested and, where appropriate, these recommend the use of a particular technology.
We intend these best practices to be durable; that is, to remain relevant for many years to come as the specific technologies change. However, in order to provide actionable guidance, i.e. to provide readers with the technical information they need to get their spatial data on the Web, we try to balance between durable advice (that is necessarily general) and examples using currently available technologies that illustrate how these best practices can be implemented. We expect that readers will continue to be able to derive insight from the examples even when those specifically mentioned technologies are no longer in common usage, understanding that technology ‘y’ has replaced technology ‘x’.
This document contains a variety of best practices related to the publication and usage of spatial data on the Web. First, it continues with several more in-depth introductions on spatial things and geometry, coverages, spatial relations, coordinate reference systems, linked data, and Spatial Data Infrastructures. Then it describes how these best practices can be used, depending on your starting point and context. After that, the best practices themselves are described. They are about metadata, quality, versioning, identifiers, vocabularies, (API) access, linking, and large datasets.
The following best practices can be found in this document:
In spatial data standards from the Open Geospatial Consortium (OGC) and the 19100 series of ISO geographic information standards from ISO/TC 211 the primary entity is the feature. [ISO-19101] defines a feature as an: “abstraction of real world phenomena”.
This terse definition is a little confusing, so let’s unpack it.
Firstly, it talks about “real world phenomena”; that’s everything from highways to helicopters, parking meters to postcode areas, water bodies to weather fronts and more. These can be physical things that you can touch (e.g. a phone box) or an abstract concept that has spatial extent (e.g. a postcode area). Features can even be fictional (e.g. “Dickensian London”) and may even lack any concrete location information such as the mythical Atlantis.
The key point is that these “features” are things that one talks about in the universe of discourse - which is defined in [ISO-19101] as the “view of the real or hypothetical world that includes everything of interest”.
Secondly, the definition of feature talks about “abstraction”. Take the example of Eddystone Lighthouse. A helicopter pilot might see it as a “vertical obstruction” and be interested in attributes such as its height and precise location, whereas a sailor may see it as a “maritime navigation aid” and need information about its light characteristic and general location. Depending on one’s set of concerns, only a subset of the attributes of a given “real world phenomenon” are relevant. In the case of Eddystone Lighthouse, we defined two separate “abstractions”. As is common practice in many information modelling activities, the common sets of attributes for a given “abstraction” are used to define classes. In the parlance of [ISO-19101], such a class is known as a “feature type”.
Although the exact semantics differ a little, there is a good correlation between the concept of “feature type” as defined in spatial data standards and the concept of “class” defined in [RDF-SCHEMA]. The former is an information modelling construct that binds a fixed set of attributes to an identified resource, whereas the latter defines the set of all resources that share the same group of attributes.
When combined with the open-world assumption embraced by RDF Schema and the Web Ontology Language (OWL) [OWL2-OVERVIEW], the set-based approach to classes provides more flexibility when combining information from multiple sources. For example, the “Eddystone Lighthouse” resource can be seen as both a “vertical obstruction” and a “maritime navigation aid” as it meets the criteria for membership of both sets. Conversely, this flexibility makes it much more difficult to build software applications as there is no guarantee that an information resource will specify a given attribute. Web standards such as the Shapes Constraint Language [SHACL] are being defined to remedy this issue.
However, the term “feature” is also commonly used to mean a capability of a system, application or component. Also, in some domains and/or applications no distinction is made between "feature" and the corresponding real-world phenomena.
To avoid confusion, we adopt the term “spatial thing” throughout the remainder of this best practice document. “Spatial thing” is defined in [W3C-BASIC-GEO] as “Anything with spatial extent, i.e. size, shape, or position. e.g. people, places, bowling balls, as well as abstract areas like cubes”.
The concept of “spatial thing” is considered to include both "real-world phenomena" and their abstractions (e.g. “feature” as defined in [ISO-19101]). Furthermore, we treat it as inclusive of other commonly used definitions; e.g. Feature from [NeoGeo], described as “A geographical feature, capable of holding spatial relations”.
A spatial thing may move. We must take care not to oversimplify our concept of spatial thing by assuming that it is equivalent to definitions such as Location (from [DCTERMS]) or Place (from [SCHEMA-ORG]), which are respectively described as “A spatial region or named place” and "Entities that have a somewhat fixed, physical extension".
How do we ensure alignment with the terminology being used in the further development of GeoSPARQL? We expect a new spatial ontology to be published which will contain clear and unambiguous definitions for the terms used therein.
Looking more closely, it is important to note that geometry is typically a property of a spatial thing.
{ "geometry": { "type": "Point", "coordinates": [50.184, -4.268] } }
In actual fact, this is only one geometry that may be used to describe Eddystone Lighthouse. Other geometries might include a 2D polygon that defines the footprint of the lighthouse in a horizontal plane and a 3D solid describing the volumetric shape of the lighthouse.
Furthermore, these geometries may be subject to change due to, say, a resurvey of the lighthouse. In such a situation, the geometry object would be updated - but the spatial thing that we are talking about is still Eddystone Lighthouse. Following the best practices presented below, we use an HTTP URI to unambiguously identify Eddystone Lighthouse:
http://d-nb.info/gnd/1067162240
(URI sourced from Deutsche Nationalbibliothek).
We say that the spatial thing is disjoint from the geometry object. The spatial thing, Eddystone Lighthouse (http://d-nb.info/gnd/1067162240), is the “real world phenomenon” about which we want to state facts (such as that its focal height is 41 meters above sea level) and link to other real world phenomena (for example, that it is located at Eddystone Rocks, Cornwall; another spatial thing identified as http://sws.geonames.org/2650253 by GeoNames).
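The separation between a spatial thing and its geometries can be sketched in code. The following Python fragment is only an illustration: the `SpatialThing` class, its field names and the "resurveyed" coordinate values are assumptions made for this example, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class SpatialThing:
    """A spatial thing identified by a URI; geometries are properties of it."""
    uri: str
    name: str
    geometries: dict = field(default_factory=dict)  # label -> geometry object


lighthouse = SpatialThing(
    uri="http://d-nb.info/gnd/1067162240",
    name="Eddystone Lighthouse",
)

# One possible geometry: the point given in the text (values copied as published).
lighthouse.geometries["point"] = {"type": "Point", "coordinates": [50.184, -4.268]}

uri_before = lighthouse.uri

# A hypothetical resurvey replaces the geometry object (values are invented)...
lighthouse.geometries["point"] = {"type": "Point", "coordinates": [50.1841, -4.2681]}

# ...but the identity of the spatial thing is unchanged.
assert lighthouse.uri == uri_before
```

Other geometries, such as a footprint polygon or a 3D solid, could be stored under further labels without affecting the identifier.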
Many aspects of spatial things can be described with single-valued, static properties. However, in some applications it is more useful to describe the variation of property values in space and time. Such descriptions are formalized as coverages. Users of spatial information may employ both viewpoints.
So what is a coverage? As defined by [ISO-19123] it is simply a data structure that maps points in space and time to property values. For example, an aerial photograph can be thought of as a coverage that maps positions on the ground to colors. A river gauge maps points in time to flow values. A weather forecast maps points in space and time to values of temperature, wind speed, humidity and so forth. One way to think of a coverage is as a mathematical function, where data values are a function of coordinates in space and time.
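The "coverage as a function" view can be illustrated with a minimal Python sketch of the river gauge case. The sample readings are invented, and the use of linear interpolation between samples is an assumption of this example; a coverage need not interpolate at all.

```python
from bisect import bisect_right

# Hypothetical river gauge readings: (hours since start, flow in cubic metres per second).
samples = [(0.0, 12.0), (1.0, 14.0), (2.0, 20.0), (3.0, 17.0)]


def flow_at(t):
    """Return the flow value at time t, interpolating linearly between samples."""
    times = [s[0] for s in samples]
    if t < times[0] or t > times[-1]:
        raise ValueError("t is outside the domain of the coverage")
    i = bisect_right(times, t)
    if i == len(times):  # t is exactly the last sample time
        return samples[-1][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

Here the list of sample times plays the role of the points in time, and the flow values are what those points are mapped to.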
Sometimes you’ll hear the word “coverage” used synonymously with “gridded data” or “raster data” but this isn’t really accurate. You can see from the above paragraph that non-gridded data (like a river gauge measurement) can also be modelled as coverages. Nevertheless, you will often find a bias toward gridded data in discussions (and software) that concern coverages.
A coverage is not itself a spatial thing. The definition above presents a coverage as a data construct - in which case, it does not exist in the real world. Accordingly, we might say in the hydrology example, where a river gauge measures flow values at regular sampling times, that the “river segment” (a spatial thing) has a property “flow rate” that is expressed as coverage data.
Spatial things and coverages may be related in several ways:
A coverage can be defined using three main pieces of information:
Usually, the most complex piece of information in the coverage is the definition of the domain. This can vary quite widely from coverage type to coverage type, as the list above shows. For this reason, coverages are often defined by the spatiotemporal geometry of their domain. You will hear people talking about “multidimensional grid coverages” or “time-series coverages” or “vertical profile coverages” for example.
A spatial relation specifies how an object is located in space in relation to a reference object. Commonly used types of spatial relations are: topological, directional and distance relations.
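As a rough illustration of the three kinds of relation, the following Python sketch evaluates one of each for points given as latitude and longitude in decimal degrees. The function names are hypothetical, the distance uses the haversine formula on a spherical Earth (an approximation), and the bounding-box test assumes the box does not cross the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def distance_km(lat1, lon1, lat2, lon2):
    """Distance relation: great-circle distance using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_north_of(lat_a, lat_b):
    """Directional relation: True if A lies north of the reference B."""
    return lat_a > lat_b


def within_bbox(lat, lon, south, west, north, east):
    """Topological relation: True if the point lies inside the bounding box."""
    return south <= lat <= north and west <= lon <= east
```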
Do we also need to talk about spatial relationships? And how they are related to spatial things and geometries?
Introduction to CRS does not yet cover non-geographic cases.
Best Practice scope is "spatial data" - which includes non-geographic location (e.g. where things aren't positioned relative to the Earth). For example, we have a microscopy use case where the locations of cells are described.
One of the most fundamental aspects of publishing spatial data, data about location, is how to express and share the location in a consistent way. In almost all cases where you are publishing data for use by the wider web community the use of Latitude and Longitude is most appropriate. Lat and Long measurements are global and offer a level of precision well suited for many applications; e.g. they can express a location to within a few metres, which is perfect for locating a Starbucks, geocoding a photograph or capturing an augmented reality Pokemon hiding in your local park.
As with everything to do with spatial data, of course things can get more complicated. There is not complete agreement over the order in which to present the measurements LAT/LONG or LONG/LAT or whether to express them in decimal degrees or as degrees, minutes and seconds.
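To illustrate the difference between the two notations, here is a small Python sketch that converts degrees, minutes and seconds to decimal degrees. The function name and the convention of a hemisphere letter (N, S, E, W) are assumptions of this example.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return -value if hemisphere in ("S", "W") else value
```

For example, `dms_to_decimal(4, 15, 0, "W")` gives `-4.25`, making the sign convention explicit rather than leaving it to the reader.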
Therefore it is very important to provide explicit information to your users. For example, this snippet of results from the Google Geocoding API makes explicit which is the latitude and which is the longitude coordinate. See Best Practice 18: Describe the location according to a Coordinate Reference System for more information.
"formatted_address" : "1600 Amphitheatre Parkway, Mountain View, CA 94043, USA",
"geometry" : {
   "location" : {
      "lat" : 37.4224764,
      "lng" : -122.0842499
   },
   "location_type" : "ROOFTOP",
   "viewport" : {
      "northeast" : {
         "lat" : 37.4238253802915,
         "lng" : -122.0829009197085
      },
      "southwest" : {
         "lat" : 37.4211274197085,
         "lng" : -122.0855988802915
      }
   }
},
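Because the keys in the snippet above are named, a consumer can pick out each coordinate without guessing the order. The Python sketch below parses a trimmed copy of that fragment (wrapped in braces so that it is valid JSON) and re-expresses the position as a GeoJSON point, where the order is longitude first, then latitude. The variable names are illustrative only.

```python
import json

# A trimmed copy of the result fragment, wrapped in braces so that it parses.
result = json.loads("""
{
  "formatted_address": "1600 Amphitheatre Parkway, Mountain View, CA 94043, USA",
  "geometry": {"location": {"lat": 37.4224764, "lng": -122.0842499}}
}
""")

location = result["geometry"]["location"]

# The named keys remove any ambiguity about which value is which.
latitude, longitude = location["lat"], location["lng"]

# GeoJSON (RFC 7946) expects positions as [longitude, latitude].
geojson_point = {"type": "Point", "coordinates": [longitude, latitude]}
```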
The following is a little more technical; in most cases this should only be for information.
The Long/Lat measurements are of course angular measurements expressing a position on the surface of a sphere. We are assuming that the sphere in question is (usually) the planet Earth, and that the Earth is actually a sphere. To make this more explicit we need to use a defined reference system and geodetic datum: in simple terms this tells us where we make the angular measurements from (e.g. the Equator and Greenwich Meridian) and gives us an agreed definition of the size and shape of the sphere (it turns out the Earth isn’t one, though it is often approximated as a flattened sphere).
In almost all cases when you find Long/Lat measurements they are using a reference system and geodetic datum called WGS84. WGS84 was defined to support the GPS system, so that’s handy for all those mobile apps.
90% of people can stop reading now, but of course there are going to be a few cases where WGS84 Long/Lat is not appropriate.
In many parts of the world location data has been collected using local coordinate systems that are specific to particular countries or regions. These local coordinate systems often use projected measurements defined on a flat, two-dimensional surface which are easier to use than angular measurements and are needed if you are making topographic maps. (But be aware that being flattened, these projected maps distort the true size of countries, and also distance and angular measurements.)
So it may be that you have information in a local Coordinate Reference System (CRS), rather than Long/Lat - what should you do? You can publish information in a local CRS as it is, but you need to tell users which particular CRS is being used, because there are a great many CRSs in use. A good directory of them is maintained by the EPSG, an oil industry organisation. It is common for a CRS to be described by its EPSG code; EPSG:27700 is the UK National Grid, for example.
Alternatively you can re-project your coordinates to WGS84 Long/Lat using one of the many tools available online. So for example the location at 516076,170953 in UK National Grid Coordinates is -0.331841, 51.425708 in WGS84 Long/Lat. This conversion is a useful step as it makes your data more accessible to global users, so where possible it is helpful to publish data in both local and global coordinates.
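One way to publish the same position in both local and global coordinates, while stating the CRS explicitly each time, is sketched below in Python. The record layout and key names are hypothetical, not a standard format. The CRS identifiers are OGC-style URIs: EPSG code 27700 identifies the British National Grid, and CRS84 identifies WGS84 with longitude/latitude axis order. The coordinate values are those quoted in the text above.

```python
# Hypothetical record structure; the key names are illustrative, not a standard.
record = {
    "name": "Example location",
    "positions": [
        {
            "crs": "http://www.opengis.net/def/crs/EPSG/0/27700",  # British National Grid
            "axes": ["easting", "northing"],
            "coordinates": [516076, 170953],
        },
        {
            "crs": "http://www.opengis.net/def/crs/OGC/1.3/CRS84",  # WGS84, lon/lat order
            "axes": ["longitude", "latitude"],
            "coordinates": [-0.331841, 51.425708],
        },
    ],
}


def coordinates_in(rec, crs_uri):
    """Return the coordinates published for the requested CRS, or None if absent."""
    for position in rec["positions"]:
        if position["crs"] == crs_uri:
            return position["coordinates"]
    return None
```

A consumer that only understands Long/Lat can then ask for the CRS84 position and ignore the rest.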
So we are now at the point where 99.9% of people can stop reading, but for the remaining few people that have more specific requirements in terms of higher precision there are a few more topics. If you need to be able to measure in terms of a few centimetres or less then things are more complicated. With this level of precision required you need to take into account a more sophisticated model of the shape of the earth and take into account plate tectonics.
For these use cases more complex reference systems and geodetic datums are used; for example, in Europe a system called ETRS89 can be used instead of WGS84, and in North America a similar system called NAD83 is used. If you have measurements made using these reference systems, best practice is again to be explicit in describing their use. In these use cases, be careful when re-projecting to different systems, as the required accuracy may be lost.
Finally, another issue is that points on the surface of the earth are actually moving relative to the coordinate system, due to geologic processes. You may think this is of interest only to geologists, but when you learn that Australia has moved around 1.5m since the framework was last reset 20 years ago, and remember that we are entering the age of self-driving cars, then you will probably think again. Re-calculating the datum from time to time, or maybe continuously, really does matter for some applications. See Best Practice 3: Specify Coordinate Reference System for high-precision applications for more information.
The term ‘Linked Data’ refers to an approach to publishing data that puts linking at the heart of the notion of data, and uses the linking technologies provided by the Web to enable the weaving of a global distributed database. By naming real world entities - be they web resources, physical objects such as the Eiffel Tower, or even more abstract things such as relations or concepts - with URLs data can be published and linked in the same way web pages can. [LDP-PRIMER]
The 5-star scheme at 5 Star Data states:
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ make it available in a non-proprietary open format (e.g., CSV as well as Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
We think that the concept of Linked Data is fundamental to the publishing of spatial data on the Web: it is the links that connect data together that are foundational to the Web of data.
These best practices promote a Linked Data approach.
Sources such as the Best Practices for Publishing Linked Data [LD-BP] assert a strong association between Linked Data and the Resource Description Framework (RDF) [RDF11-PRIMER]. Yet we believe that Linked Data requires only that the formats used to publish data support Web linking (see [WEBARCH] §4.4 Hypertext). 5 Star Data (based on [5STAR-LOD]) asserts only that data formats be open and non-proprietary (★★★); and implies the need for data formats to support use of URIs as identifiers (★★★★) and Web linking (★★★★★).
Within this document we include examples that use RDF and related technologies such as triple stores and SPARQL because we see evidence of their use in real world applications that support Linked Data. However, we must make clear to readers that there is no requirement for all publishers of spatial data on the Web to embrace the wider suite of technologies associated with the Semantic Web; we recognize that in many cases, a Web developer has little or no interest in the toolchains associated with the Semantic Web because of the complexity they add to any Web-centric solution.
Although we think that Linked Data need not necessarily require the use of RDF, it is probably the most commonly used representation. We note that [JSON-LD] provides a bridge between those worlds by providing a data format that is compatible with RDF but relies on standard JSON tooling.
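As a hedged illustration of that bridge, the Python sketch below builds a small JSON-LD document using only the standard `json` module. The two URIs are the ones used earlier in this document for Eddystone Lighthouse and Eddystone Rocks; the choice of the schema.org `name` and `containedInPlace` properties is an assumption made for this example, not a recommendation of a particular vocabulary.

```python
import json

doc = {
    "@context": {
        "name": "http://schema.org/name",
        # Declaring the value type as "@id" makes the value a link, not a string.
        "containedInPlace": {
            "@id": "http://schema.org/containedInPlace",
            "@type": "@id",
        },
    },
    "@id": "http://d-nb.info/gnd/1067162240",
    "name": "Eddystone Lighthouse",
    "containedInPlace": "http://sws.geonames.org/2650253",
}

# Ordinary JSON tooling is enough to serialize and parse the document.
serialized = json.dumps(doc, indent=2)
parsed = json.loads(serialized)
```

The `@id` gives the thing a URI (★★★★) and the `containedInPlace` value links it to data published by someone else (★★★★★).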
Furthermore, as the examples in this document illustrate, we often see a ‘hybrid’ approach being used in real-world applications; using RDF to work with graphs of information that interlink resources, while relying on other technologies to query and process the spatial aspects of that information for performance reasons.
Finding, accessing and using data disseminated through spatial data infrastructures (SDI) based on OGC web services is difficult for non-expert users. There are several reasons, including:
However, spatial data infrastructures are a key component of the broader spatial data ecosystem. Such infrastructures typically include workflows and tools related to the management and curation of spatial datasets, and provide mechanisms to support the rich set of capabilities required by the expert community. Our goal is to help spatial data publishers build on these foundations to enable the spatial data from SDIs to be fully integrated with the Web of data.
When your starting point is a spatial data infrastructure, you should at least read the following best practices. These provide the most important extra steps that should be taken in order to bring spatial data from spatial data infrastructures to the Web:
The rest of the best practices provide more detail on specific aspects of publishing spatial data on the Web, such as metadata, geometries, CRS information, versioned data, and so on.
Section 11. How to use these best practices is incomplete.
Estimate that this covers only a quarter of the "spatial data publication pathway" that we are trying to help would-be spatial data publishers navigate. More material to be added describing the full range of considerations when publishing spatial data on the Web in the next public draft.
Preparations for publishing spatial data on the Web need to start somewhere. Typically, your spatial data will be in the following places:
If your spatial data is managed within a software system it is likely that you will be able to access that data through one or more of the methods identified above; as structured data from a bulk extract (e.g. a “data dump”), via direct access to the underpinning data repository or through a bespoke or standards-compliant API provided by the system.
As working with specific spatial data management systems is beyond the scope of this best practice document we will assume that one of the four methods identified above is your starting point.
Each of these starting points has its own challenges, but working with plain text documents can be particularly tricky as you will need to parse the natural language to identify the spatial things and their properties before you can proceed any further. Natural Language Processing (NLP) is also beyond the scope of this best practice document - so we will assume that you’ve already completed this step and have parsed any plain text documents into structured data records of some kind.
The Web is an information space in which the items of interest, referred to as resources, are identified by URIs ([WEBARCH] §1. Introduction). The spatial data you want to publish is one such resource. Depending on the nature of your spatial data, it may be a single dataset or a collection of datasets. [VOCAB-DCAT] provides a useful definition of dataset: “A collection of data, published or curated by a single agent, and available for access or download in one or more formats.”
Deciding whether your spatial data is a single dataset or not is somewhat arbitrary. To decide this, it is often useful to consider attributes such as the license under which the data will be made available, the refresh or publication schedules, the quality of the data and the governance regime applied in managing the data. Typically, all of these attributes should be consistent within a single dataset.
As a first step in publishing your spatial data on the Web, we need to stitch your data into the Web’s information space by assigning a URI to each dataset (see [DWBP] Best Practice 9: Use persistent URIs as identifiers of datasets). Furthermore, if you anticipate your data changing over time and you want users to be able to refer to a particular version of your dataset you should also consider assigning a URI to each version of the dataset (see [DWBP] Best Practice 11: Assign URIs to dataset versions and series).
[DWBP] section 8.6 Data Versioning provides further guidance on working with versioned resources: providing metadata to indicate the particular version of a given dataset resource (see [DWBP] Best Practice 7: Provide a version indicator) and enabling users to browse the history of your dataset (see [DWBP] Best Practice 8: Provide version history).
We also need to look inside the datasets at the resources described within your data. If you want these resources to be visible within the Web’s information space, by which we mean that others can refer to or talk about those resources, then they must also be assigned URIs (see [DWBP] Best Practice 10: Use persistent URIs as identifiers within datasets). These URIs are like 'Web-scale foreign keys' that enable information from different sources to be stitched together.
In spatial data, our primary concern is always the spatial things; these are the things with spatial extent (i.e. size, shape, or position) that we talk about in our data - anything from physical things like people, places and post boxes to abstractions such as administrative areas. Spatial things should always be assigned URIs (see Best Practice 7: Use globally unique persistent HTTP URIs for spatial things) - potentially reusing existing URIs that are already in common usage. A common pattern used when assigning URIs to spatial things is to append the locally-scoped identifiers used within the dataset to a URI path within an internet DNS domain where one has administrative rights to publish content.
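The URI pattern described above can be sketched in a few lines of Python. This is only an illustration: the function name, the base path http://example.org/id/lighthouse/ and the local identifier are hypothetical, and a real publisher would substitute a path within a domain they control.

```python
from urllib.parse import quote


def mint_spatial_thing_uri(base_path: str, local_id: str) -> str:
    """Append a dataset-local identifier to a URI path the publisher controls."""
    # Percent-encode the identifier so that spaces or slashes in the local
    # key cannot change the structure of the resulting URI path.
    return base_path.rstrip("/") + "/" + quote(str(local_id), safe="")


# Hypothetical base path and local identifier, for illustration only.
uri = mint_spatial_thing_uri("http://example.org/id/lighthouse/", "LH 0042")
print(uri)
```

Keeping the local identifier intact inside the URI makes it straightforward to map between the published URI and the record in the source system.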
Depending on how you organize your data, it may also be helpful to give your geometry objects URIs. For example, you may want to reuse a line string when describing the boundaries of adjacent administrative areas, or you may need to serve geometry data from an alternate URL because property data and geometry data are managed in different systems. Essentially, if you want to refer to a resource on the Web, you need to assign a URI to it.
Once you have determined the subjects of your spatial data, you should then consider your users - and the software tools, applications and capabilities they might have at their disposal.
Your objective should be to reduce the “friction” created for users to work with your data by providing it in a form that is closest to what their chosen software environment supports.
It is likely that you will be able to identify your intended “community of use” - and on that basis discern how best to publish data for them. However, increasingly data is being repurposed to derive insight in ways that the original publisher had never foreseen. This “unanticipated re-use” can add significant value to your data (e.g. because you didn’t know that your data could be used that way!) but this introduces the challenge of working with a large set of unknown users, developers and devices.
So while you should always prioritize your known users when publishing spatial data on the Web (often, because they are your stakeholders and their happiness can lead to continued funding!), it will often reap dividends to “design for the masses”: providing your spatial data in a way that is most readily usable with the (geo)spatial JavaScript libraries commonly employed across the Web.
Things that you should consider when choosing how to publish your spatial data on the Web are described next …
For users to work with your data, software agents (a.k.a. the “machines”) need to be able to parse it - to resolve the serialized data into its component parts. You should make your data available in machine-readable, standardized data formats (see [DWBP] Best Practice 12: Use machine-readable standardized data formats); e.g. JSON [RFC7159], XML [XML11], CSV [RFC4180] and other tabular data formats, YAML [YAML], protocol-buffers [PROTO3] etc. According to the 5 Star Data [5STAR-LOD] scheme, using open and non-proprietary structured data formats yields a 3-star rating (★★★), so you’re well on your way to good practice.
Considering that Web applications are most often written in JavaScript, JSON is probably the most “frictionless” data format for Web developers. That said, it is reasonably simple to parse other formats for use in JavaScript using widely available libraries. In some cases, there are even standards that define how this should be done (for example, [CSV2JSON]).
You should also consider whether there are any attributes of these machine-readable standardized data formats that offset a little inconvenience for your data user. For example, protocol-buffers [PROTO3] and CBOR [RFC7049] (“Concise Binary Object Representation”) provide a significantly more compact encoding than JSON. The inconvenience of having to use additional libraries to parse these binary formats is offset by the convenience of much faster load times.
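To give a feel for the size difference, the sketch below compares a JSON encoding of three coordinate pairs with a fixed-width binary packing. Note that this uses Python's struct module as a simple stand-in for a binary format; it is not protocol-buffers or CBOR, and the coordinate values are illustrative.

```python
import json
import struct

# Illustrative (latitude, longitude) pairs.
points = [(52.3752, 4.8840), (51.9244, 4.4777), (52.0907, 5.1214)]

# Text encoding: every digit and delimiter costs one byte.
as_json = json.dumps(points).encode("utf-8")

# Binary encoding: two little-endian 32-bit floats (8 bytes) per point.
as_binary = b"".join(struct.pack("<2f", lat, lon) for lat, lon in points)

print(len(as_json), "bytes as JSON;", len(as_binary), "bytes as packed floats")

# Decoding requires knowledge of the layout, which is the "inconvenience".
first_lat, first_lon = struct.unpack("<2f", as_binary[:8])
```

The 32-bit floats used here keep only about seven significant digits, so the saving in size comes with a loss of precision that may or may not be acceptable for your data.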
The imagery formats JPEG [JPEG2000] and PNG [PNG] can also be coerced to carry data, providing 3 or 4 channels of 8-bit data values. This can be an attractive way to encode gridded coverage data values as it is highly compact, so long as no lossy compression is applied to the “image”: while lossy compression retains visual integrity, it can ruin your data integrity. Experience indicates that network providers often do apply compression to image formats - even if you don’t want that. The key point is to ensure that you choose formats that are unaffected by the transport network.
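The following sketch shows why this matters. It packs an integer data value into three 8-bit channels, as one might when storing a grid cell in an RGB pixel, and then simulates a lossy codec nudging the red channel by a single step. The function names and the sample value are hypothetical.

```python
from typing import Tuple


def pack_rgb(value: int) -> Tuple[int, int, int]:
    """Split a 24-bit unsigned integer across three 8-bit channels."""
    if not 0 <= value < 2 ** 24:
        raise ValueError("value does not fit in three 8-bit channels")
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def unpack_rgb(r: int, g: int, b: int) -> int:
    """Reassemble the 24-bit value from its three channels."""
    return (r << 16) | (g << 8) | b


pixel = pack_rgb(1_234_567)
restored = unpack_rgb(*pixel)

# A change of one step in the red channel is invisible to the eye,
# but it shifts the decoded data value by 65536.
r, g, b = pixel
corrupted = unpack_rgb(r + 1, g, b)
print(restored, corrupted)
```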
When selecting the data format, make sure that your community of use has access to the libraries or other software components required to work with that format. Let’s take [GeoTIFF] as an “anti-example”: it’s the de facto format for encoding geo-referenced imagery data - such as that available from satellites - but the lack of widely available libraries for working with it in a JavaScript application makes it unsuitable for publishing spatial data on the Web. Although a developer could write a byte-level parser, it puts an additional burden on any re-use.
[DWBP] provides best practices discussing the provision of metadata to support discovery and reuse of data (see [DWBP] section 8.2 Metadata for more details). Providing metadata at the dataset level supports a mode of discovery well aligned with the practices used in Spatial Data Infrastructure (SDI) where a user begins their search for spatial data by submitting a query to a catalog. Once the appropriate dataset has been located, the information provided by the catalog enables the user to find a service end-point from which to access the data itself - which may be as simple as providing a mechanism to download the entire dataset for local usage or may provide a rich API enabling the users to request only the required parts for their needs. The dataset-level metadata is used by the catalog to match the appropriate dataset(s) with the user's query.
This section includes best practices for including the spatial extent and the CRS of the dataset in the metadata. These are the extra metadata items needed to make spatial datasets discoverable and reusable. A third best practice in this section helps you go a step further: exposing spatial data on the web in such a way that the individual entities within the dataset are discoverable.
Best Practice 1: Include spatial metadata in dataset metadata
The description of datasets that have spatial features should include explicit metadata about the spatial coverage
This best practice extends [DWBP] Best Practice 2: Provide descriptive metadata.
Why
For spatial data, it is often necessary to describe the spatial details of the dataset - such as spatial coverage or extent of the dataset or, put in simpler terms, which area of the world the data is about. This information is used, for example, by SDI catalog services that offer spatial querying to find data - but also by users to understand the nature of the dataset. In some cases, for example when dealing with crowd-sourced data, provenance information is important as well.
Intended Outcome
Dataset metadata should include the information necessary to enable spatial queries within catalog services such as those provided by SDIs.
Dataset metadata should include the information required for a user to evaluate whether the spatial data is suitable for their intended application.
Possible Approach to Implementation
When publishing a dataset, provide as much spatial metadata as necessary, but at least the spatial coverage. Other examples of spatial metadata are:
In Spatial Data Infrastructures the accepted standard for describing metadata is [ISO19115].
To provide information about the spatial attributes of the dataset on the web one can:
How to Test
Check if the spatial metadata for the dataset itself includes the overall features of the dataset in a human-readable format.
Check if the descriptive spatial metadata is available in a valid machine-readable format.
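A machine-readability check of this kind could be automated along the following lines. This is a minimal sketch that assumes the metadata is JSON and that the spatial coverage is carried in a hypothetical "bbox" property ordered west, south, east, north in WGS 84 degrees; real metadata vocabularies use their own property names and encodings.

```python
import json


def has_valid_bounding_box(metadata_json: str) -> bool:
    """Return True if the metadata parses and carries a plausible bounding box."""
    try:
        metadata = json.loads(metadata_json)
        # "bbox" is an assumed property name: [west, south, east, north].
        west, south, east, north = (float(v) for v in metadata["bbox"])
    except (ValueError, KeyError, TypeError):
        return False
    # West may exceed east when the box crosses the antimeridian,
    # so only the individual ranges and the south/north order are checked.
    return (-180 <= west <= 180 and -180 <= east <= 180
            and -90 <= south <= north <= 90)


# Illustrative values only.
print(has_valid_bounding_box('{"bbox": [3.31, 50.75, 7.23, 53.56]}'))
```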
Evidence
Relevant requirements: R-Discoverability, R-Compatibility, R-BoundingBoxCentroid, R-Crawlability, R-SpatialMetadata and R-Provenance.
Best Practice 2: Provide context required to interpret data values
Data values should be linked to spatial, temporal and thematic information that describes them.
This best practice is under review by the WG to see if it is sufficiently covered in DWBP. (see action)
Why
For users of spatial or temporal data it should always be possible to look up spatial, temporal or thematic metadata about a given value. This allows them to determine, for example, which reference system (CRS or TRS) and unit of measure (UoM) is used for a numeric value, the accuracy of the data value, and so on. Such metadata may be attached to metadata for collections, as described in Best Practice 1: Include spatial metadata in dataset metadata, or to individual values. The latter is necessary when this metadata is important for processing and interpreting the data, but varies from one value to the next. This information should be specified as explicit semantic data and/or be provided as linked to other resources.
Intended Outcome
The contextual data will specify spatial, temporal and thematic data and other information that can assist to interpret data values; this can include information related to quality, location, time, topic, type, etc.
Possible Approach to Implementation
The context required to interpret data values will require:
How to Test
...
Evidence
Relevant requirements: R-CRSDefinition, R-Provenance, R-QualityPerSample, R-SpatialMetadata, R-SensorMetadata.
Best Practice 3: Specify Coordinate Reference System for high-precision applications
A coordinate referencing system (CRS) should be specified for high-precision applications to locate geospatial entities.
Why
The CRS is a special metadata attribute of spatial data that must be known for users to judge if the data is usable to them. Clients or users must always be able to determine what CRS is used. Sometimes the CRS is left implicit: it is then determined by the specification of the data format that is used. Preferably, the CRS is specified at least as part of the metadata so that clients and users can judge if the data is usable, and can find spatial data with a specific CRS.
The choice of CRS is sensitive to the intended domain of application for the spatial data. For the majority of applications a common global CRS (WGS84) is fine, but high precision applications (such as precision agriculture and defense) require spatial referencing to be accurate to a few meters or even centimeters. Specific, highly accurate CRS exist to provide a coordinate system for a specific region of the world (often a specific country). Spatial data from France is never going to use the Dutch coordinate system and vice versa.
Different CRS exist mainly because the positions on the surface of the earth relative to each other are constantly changing. For example, North America and Europe are receding from each other by a couple of centimeters per year, whereas Australia is moving several centimeters per year north-eastwards. So, for better than one meter accuracy in Europe, the European Terrestrial Reference System 1989 (ETRS89) was devised and it is frequently revised to take account of the drifting European tectonic plate. Consequently, coordinates in the ETRS89 system will change by a couple of centimeters per year with respect to WGS84.
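As a rough illustration of the scale involved, the offset that accumulates between a plate-fixed reference frame and a global one can be estimated by multiplying the drift rate by the elapsed time. The rate of 2.5 centimeters per year, the reference epoch of 1989.0 and the assumption of constant, linear drift are simplifications made for this sketch only.

```python
def accumulated_offset_m(rate_m_per_year: float,
                         reference_epoch: float,
                         epoch: float) -> float:
    """Estimate the offset built up since the two frames last coincided."""
    # Assumes a constant drift rate, which is only a first approximation.
    return rate_m_per_year * (epoch - reference_epoch)


# Assumed values: 2.5 cm/year drift, frames aligned at epoch 1989.0.
offset = accumulated_offset_m(0.025, 1989.0, 2017.0)
print(f"approximate offset: {offset:.2f} m")
```

Under these assumptions the offset is well over half a meter after 28 years, which is negligible for many Web applications but significant for high-precision ones.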
Even if a CRS tied to a tectonic plate is used, local coordinates in some areas may still change over time if the plate is rotating with respect to the rest of the earth. Many existing useful maps pre-date GPS and WGS84-based mapping, so location errors of tens of meters or more may exist when compared to the same location derived from a different technology, and these errors may vary in size across the extent of a single map.
Another reason why different CRS exist has nothing to do with tectonic drift, but with projecting the 3D globe on a flat, 2D map: Cartesian projections. These are useful e.g. for calculating areas.
The misuse of spatial data because of confusion about the CRS can have catastrophic consequences; e.g. both the bombing of the Chinese Embassy in Belgrade during the Balkan conflict and fatal incidents along the East Timor border are generally attributed to spatial referencing problems.
Intended Outcome
Clients or users can determine which CRS is used. Also, a Coordinate Reference System (CRS) sensitive to the intended domain of application (e.g. high precision applications) for the spatial data should be chosen.
Possible Approach to Implementation
Recommendations about CRS referencing should consider:
How to Test
...
Evidence
Relevant requirements: R-DeterminableCRS
Best Practice 4: Make your spatial data indexable by search engines
Search engines should be able to crawl spatial data on the Web and index spatial things for direct discovery by users.
Why
In SDIs information about spatial datasets is published as authoritative metadata records and collated in Web-based catalogues. This approach causes a number of problems:
Search engines are the common starting point for people looking for content on the Web that is widely understood. By publishing spatial data in a way that enables their crawlers to index spatial datasets including each spatial thing, the fidelity of search results should improve. Users will be able to directly search for specific entities rather than having to look for a dataset and then parse through it; e.g. to search for "Anne Frank’s House" (https://g.co/kg/m/02s5hd) rather than looking for a dataset about "Cultural Heritage in Amsterdam" and hoping that it contains a reference to what you’re interested in.
At present, spatial information is not widely exploited by search engines. However, by increasing the volume of spatial information presented to search engines, and the consistency with which it is provided, we expect search engines to begin offering spatial search functions. We already see evidence of this in the form of contextual search, such as prioritization of search results from nearby entities. In addition, search engines are beginning to offer more structured, custom searches that return only results that include certain [SCHEMA-ORG] types, like Dataset, Place or City.
Intended Outcome
Information about spatial datasets and things is indexed by search engines.
Users can find spatial things using common search engines.
Possible Approach to Implementation
In general, you need to:
The Web-page for the dataset is an entry-point for humans to browse and for the search engines to crawl your data. This landing page should provide descriptive metadata that helps users evaluate whether the dataset meets their needs (see Best Practice 1: Include spatial metadata in dataset metadata and [DWBP] Best Practice 2: Provide descriptive metadata), and may provide links to other service end-points, APIs or tools that will help a user work with the dataset. The landing page should be indexable by the search engines so that it can be discovered too!
To enable humans and Web-crawlers to find HTML pages for the spatial things, the "landing page" needs to include hyperlinks that can be followed. Where you have a larger collection of spatial things, you should support paging through the collection.
You may also consider using Sitemaps to direct the Web-crawler; noting that sitemaps currently are limited to several thousands of entries and will not work for larger datasets.
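Where a collection is too large for one sitemap, the URIs can be split across several files. The sketch below generates sitemap documents in chunks using Python's standard library. The chunk size of 1000 is an arbitrary assumption, so check the sitemap protocol for the limits that actually apply, and note that the example URIs are hypothetical.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemaps(uris, max_urls_per_file=1000):
    """Split a list of URIs into one or more sitemap XML documents."""
    documents = []
    for start in range(0, len(uris), max_urls_per_file):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for uri in uris[start:start + max_urls_per_file]:
            # Each spatial thing gets a <url><loc>...</loc></url> entry.
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = uri
        documents.append(ET.tostring(urlset, encoding="unicode"))
    return documents


# Hypothetical URIs for 2500 spatial things.
things = [f"http://example.org/id/thing/{i}" for i in range(2500)]
sitemaps = build_sitemaps(things)
print(len(sitemaps), "sitemap files")
```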
For very large datasets paging through thousands of pages is not useful for a human either. Consider supporting filtering and/or organise the spatial things into subsets, as described in Best Practice 13: Provide subsets for large spatial datasets.
A pre-condition for this best practice is Best Practice 7: Use globally unique persistent HTTP URIs for spatial things as persistent identifiers are essential to support reliable indexing and linking. Traditionally spatial datasets have not been maintained with stable identifiers for spatial things, but to share spatial data on the Web stable identifiers are a must. Sharing spatial data is more than "just" making the dataset available on the Web.
Each Web-page can likely be generated programmatically from the data you hold about the spatial thing, either directly from the data or by using an API that makes the data available on the Web.
It is important to keep in mind that the HTML representations should not mainly be designed for the search engines, but they should present the data in a clear and understandable way to human users. The page about the spatial thing should be useful to a user and encourage others to link to the page when they share other information about the spatial thing. This typically will also improve the ranking of these pages in search results.
In addition to exposing the spatial data as linked HTML Web-pages, indexing by search engines can be further enhanced by incorporating a description of the spatial thing as structured markup (in particular [MICRODATA] or [JSON-LD] annotations using [SCHEMA-ORG]) as this enables the search engines to make more detailed assumptions about your resource. It is important to note that this is not only helpful to search engines, but also to other tools that want to understand more about the semantics of the resource, for example, its location.
In [SCHEMA-ORG], a spatial dataset is a Dataset and a spatial thing is in general a Place or an Event. For some types of spatial things, more specific sub-types exist, for example City or Mountain.
Location information about a spatial thing is typically provided using a geometry (GeoCoordinates or GeoShape) or a PostalAddress. [SCHEMA-ORG] coordinates are restricted to WGS 84 with longitude and latitude. Supported geometry types are points, line strings, polygons, boxes and circles.
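As an illustration of such markup, the sketch below builds a [SCHEMA-ORG] Place with a GeoCoordinates point as [JSON-LD] and wraps it in the script element used to embed it in an HTML page. The URI is hypothetical and the coordinates are approximate values chosen for the example.

```python
import json


def place_jsonld(uri: str, name: str, latitude: float, longitude: float) -> dict:
    """Describe a spatial thing as a schema.org Place with a point location."""
    return {
        "@context": "https://schema.org",
        "@type": "Place",
        "@id": uri,
        "name": name,
        # schema.org coordinates are WGS 84 latitude and longitude.
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": latitude,
            "longitude": longitude,
        },
    }


# Hypothetical URI; approximate coordinates for illustration only.
place = place_jsonld("http://example.org/id/place/anne-frank-house",
                     "Anne Frank House", 52.3752, 4.8840)
markup = json.dumps(place, indent=2)
script = '<script type="application/ld+json">\n' + markup + "\n</script>"
print(script)
```

Generating the annotation from the same record that drives the HTML page helps keep the human-readable and machine-readable descriptions consistent.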
Through the use of [SCHEMA-ORG] annotations, search engines and others can connect location information with other information, e.g. about the nature of the spatial thing, opening hours, contact details, etc.
The use of [SCHEMA-ORG] for spatial data is in its early days and has to be understood as an "emerging practice".
The Web-pages should also provide a mechanism to download data in the formats you decide to support. [DWBP] Best Practice 14: Provide data in multiple formats provides guidance.
Typically multiple formats for a resource are supported using two mechanisms: HTTP content negotiation and by adding format-specific file extensions to the resource URI like ".json", ".xml" or ".ttl". Content negotiation is the standard mechanism of HTTP and the format-specific URIs enable the use of clickable links to the resource in a specific format.
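The two mechanisms can be combined in a single selection routine: a format-specific extension wins if present, otherwise the Accept header decides. The sketch below is deliberately simplified (it ignores wildcards such as */* and any media type parameters other than q), and the function name and the set of supported formats are assumptions.

```python
# Assumed set of supported formats, keyed by file extension.
FORMATS = {
    "html": "text/html",
    "json": "application/json",
    "xml": "application/xml",
    "ttl": "text/turtle",
}


def choose_media_type(path: str, accept: str = "", default: str = "text/html") -> str:
    """Pick a representation from a URI extension or an Accept header."""
    # 1. A format-specific extension on the URI takes priority.
    _, dot, extension = path.rpartition(".")
    if dot and extension in FORMATS:
        return FORMATS[extension]
    # 2. Otherwise rank the Accept header entries by their q value.
    ranked = []
    for position, item in enumerate(accept.split(",")):
        media_type, _, params = item.strip().partition(";")
        quality = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            ranked.append((-quality, position, media_type.strip()))
    for _, _, media_type in sorted(ranked):
        if media_type in FORMATS.values():
            return media_type
    # 3. Fall back to the default representation.
    return default


print(choose_media_type("/id/place/42", "text/html;q=0.5, application/json"))
```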
Search engines may also index resource representations in other formats than HTML.
In 2016, these topics were analysed in a testbed organised by Geonovum in the Netherlands. More details can be found in reports from the testbed: Spatial Data on the Web using the current SDI and Crawlable geospatial data using the ecosystem of the Web and Linked Data.
The use of [SCHEMA-ORG] for describing spatial information is continually evolving; spatial data publishers should familiarise themselves with current practices. A useful Introduction to Structured Data is provided in Google's developer portal.
How to Test
Using a Web browser,
Monitor the search consoles provided by the search engines to track their progress in indexing your Web-pages and the structured data they contain. If any errors are reported, try to fix them.
Evidence
Relevant requirements: R-BoundingBoxCentroid, R-Crawlability, R-Discoverability, R-Linkability, R-MachineToMachine.
[DWBP] provides a best practice discussing how the quality of data on the web should be described (see [DWBP] section 8.5 Data Quality for more details). This section is based on the Data Quality section from [DWBP] and adds a best practice specific for spatial data.
In the Spatial Metadata section we provided a Best Practice on how to deal with CRS in spatial data on the web. There is also a clear link between CRS and data quality, because the accuracy of spatial data depends for a large part on the CRS used. This can be seen as conformance of data with a "standard" - in this case, a (spatial or temporal) reference system. This is how you can describe spatial data quality using different vocabularies. We will provide an example in this section.
Best Practice 5: Describe the positional accuracy of spatial data
Accuracy and precision of spatial data should be specified in machine-interpretable and human-readable form.
Why
The amount of detail that is provided in spatial data and the resolution of the data can vary. No measurement system is infinitely precise and in some cases the spatial data can be intentionally generalized (e.g. merging entities, reducing the details, and aggregation of the data) [Veregin].
It is important to understand the difference between precision and accuracy. Seven decimal places of a latitude degree correspond to about one centimeter. Whatever the precision of the specified coordinates, the accuracy of positioning on the actual earth's surface using WGS84 will only approach about a meter horizontally and may have apparent errors of up to 100 meters vertically, because of assumptions about reference systems, tectonic plate movements and which definition of the earth's 'surface' is used.
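The relationship between decimal places and distance on the ground can be checked with simple arithmetic. The sketch below assumes that one degree of latitude spans roughly 111 kilometers, a rounded figure that is good enough for this purpose.

```python
# Rounded assumption: one degree of latitude is about 111 km on the ground.
METERS_PER_DEGREE_LATITUDE = 111_000.0


def latitude_resolution_m(decimal_places: int) -> float:
    """Distance represented by the last decimal place of a latitude value."""
    return METERS_PER_DEGREE_LATITUDE / 10 ** decimal_places


for places in (3, 5, 7):
    print(places, "decimal places:", latitude_resolution_m(places), "m")
```

Seven decimal places therefore express a precision of about a centimeter, which says nothing about whether the position is accurate to that level.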
Intended Outcome
When known, the resolution and precision of spatial data should be specified so that consumers of the data are aware of the resolution and level of detail it provides.
Possible Approach to Implementation
Describe the accuracy of spatial data in a way that is understandable for humans.
In addition, describe the accuracy of spatial data in a machine-readable format. [VOCAB-DQV] is such a format. It is a vocabulary for describing data quality, including the details of quality metrics and measurements.
We need some explanations for the approaches to describe positional (in)accuracy.
a:Dataset a dcat:Dataset ;
  dct:conformsTo <http://data.europa.eu/eli/reg/2010/1089/oj> .

<http://data.europa.eu/eli/reg/2010/1089/oj> a dct:Standard , foaf:Document ;
  dct:title "COMMISSION REGULATION (EU) No 1089/2010 of 23 November 2010 implementing Directive 2007/2/EC of the European Parliament and of the Council as regards interoperability of spatial data sets and services"@en ;
  dct:issued "2010-12-08"^^xsd:date .
The following example shows how DQV can express the precision of a spatial dataset:
:myDataset a dcat:Dataset ;
  dqv:hasQualityMeasurement :myDatasetPrecision, :myDatasetAccuracy .

:myDatasetPrecision a dqv:QualityMeasurement ;
  dqv:isMeasurementOf :spatialResolutionAsDistance ;
  dqv:value "1000"^^xsd:decimal ;
  sdmx-attribute:unitMeasure <http://www.wurvoc.org/vocabularies/om-1.8/metre> .

:spatialResolutionAsDistance a dqv:Metric ;
  skos:definition "Spatial resolution of a dataset expressed as distance"@en ;
  dqv:expectedDataType xsd:decimal ;
  dqv:inDimension dqv:precision .
This example was taken from [VOCAB-DQV]. For more examples of expressing spatial data precision and accuracy see DQV, Express dataset precision and accuracy.
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-QualityPerSample.
Spatial things and their attributes can change over time. For example, a lake may grow or shrink due to changes in climate, water extraction or any number of reasons. For many applications, it is important that information about spatial things is kept up to date. When new information is available, the data publisher may make this available on the Web according to their update schedule and policies. [DWBP] section 8.6 Data Versioning and Best Practice 21: Provide data up to date provide directly applicable guidance.
When dealing with change to a spatial thing, you should consider its lifecycle; in particular, how much change is acceptable before a spatial thing can no longer be considered as the same resource. Consider Eddystone Lighthouse for example: the “Eddystone Light”, a maritime navigation aid, has existed in (more or less) the same place on Eddystone Rocks since 1698. A single HTTP URI (such as http://dbpedia.org/resource/Eddystone_Lighthouse) is used to identify “the lighthouse on Eddystone rocks” for all that period. The lighthouse's attributes (such as its focal height, visible range and light characteristic) have changed over that period, but we still consider it to be the same lighthouse. However, if our interest is historic buildings, we would identify the four different structures that have stood on that site as different spatial things, from Winstanley's Eddystone Lighthouse (the first incarnation) to Douglass' Eddystone Lighthouse (the 4th and current incarnation). Incremental change for these structures during the entire period from 1698 is not appropriate; one structure replaces another and so each structure should be assigned a unique identifier. In summary, different things are important to different people!
Essentially, the decision to assign a new identifier in response to change depends on how domain experts think about the lifecycle of the spatial thing, which then manifests in a data modelling choice. [DWBP] section 8.9 Data Vocabularies and section 12.5 Spatial Data Vocabularies provide further guidance on the topic of data modelling; determining which concepts and relationships should be used to describe your area of interest.
Data publishers should not attempt to guess all the purposes for which someone might use or reference their data - ending up with a super-complex data model that tries to cover every possible use case. Instead, data publishers should try to help data consumers make informed decisions about the best way to use the data by providing good metadata. When it comes to spatial things, or any resource, that changes over time, it is important to provide metadata about the life cycle of those entities and the resources used to describe them. Given that information, data consumers can make considered choices about which resource they want to link to. [DWBP] section 8.2 Metadata provides useful guidance.
All that said, if you consider that the change affects the fundamental nature of the spatial thing, then you should assign a new identifier. See section 12.4 Spatial Data Identifiers for more details. Otherwise, read on for guidance on how to describe properties that change over time.
Best Practice 6: How to describe properties that change over time
Spatial data should include metadata that allows a user to determine the time period for which it is valid.
Why
Spatial things and their attributes change over time. Mostly, users are interested in current information. They need to be able to determine whether the published description of a spatial thing meets their needs. For example, is the published geographic extent of the City of Amsterdam relevant for a land-usage study of the nineteenth century? (Gemeentegeschiedenis.nl, "Parish History", illustrates how the extent of Amsterdam has changed during the past 200 years, in HTML and GeoJSON.) Where the information is available, a user may want to browse older versions of the published information to understand the nature of any changes or to find historical information.
Intended Outcome
Users are provided with the most recent version of information about a spatial thing and its attributes by default.
Users are able to determine the time period for which data is applicable.
If a version history of changes is available, users are able to browse through a set of changes to see how a spatial thing and its attributes have changed over time.
Possible Approach to Implementation
When publishing information about a spatial thing that is subject to change there are three main approaches to consider:
Whichever approach is chosen, publishers of spatial data should consider how dataset metadata plays an important part in helping users determine whether a dataset is fit for their use. Particularly where the contents of a dataset change with time, statements about the (most recent) publication date, the frequency of update and the time period for which the dataset is relevant (i.e. temporal extent) should be provided. Please refer to [DWBP] section 8.2 Metadata for more details about dataset metadata.
A description of the lifecycle of the spatial things (e.g. what triggers a change and whether those changes are versioned etc.) should also be provided in either the dataset's metadata, schema or specification. For example, the UK's Digital National Framework policy states that data publishers must provide these lifecycle rules.
Approach (1) is lightweight and should only be used where there are no user requirements that require access to older descriptions of the spatial things. Data publishers simply replace the old description of the spatial thing with the amended description and keep users informed about updates by providing the appropriate metadata (e.g. when the data was changed). This may be achieved using dataset metadata (as outlined above) or by including the metadata attributes in the description of each spatial thing.
Where users are anticipated to need to understand how a spatial thing has changed over time, approaches (2) and (3) must be considered.
Approach (2) requires the data publisher to publish immutable resources that describe the spatial thing at specific points in time (i.e. "snapshots") and provide a mechanism for users to browse between those snapshots. Given that each snapshot of the spatial thing is published as a separate resource, this approach is suited to infrequent changes so that the number of snapshots does not become unwieldy.
The URI for the spatial thing, the base URI, should resolve to provide the current information and a link to its version history of snapshots. [DWBP] Best Practice 8: Provide version history describes how a version history may be implemented. Each snapshot resource within the version history must be uniquely identified; a common approach is to append a date/time stamp to the base URI as a version indicator. [DWBP] Best Practice 7: Provide a version indicator provides relevant guidance.
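The pattern of appending a date/time stamp to the base URI might be sketched as follows. The stamp format, the function name and the base URI are assumptions made for this example; any unambiguous, sortable stamp would serve.

```python
from datetime import datetime, timezone


def snapshot_uri(base_uri: str, valid_from: datetime) -> str:
    """Build a snapshot URI by appending a UTC time stamp to the base URI."""
    if valid_from.tzinfo is None:
        # Refuse naive datetimes so the stamp is never ambiguous.
        raise ValueError("valid_from must be timezone-aware")
    stamp = valid_from.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return base_uri.rstrip("/") + "/" + stamp


# Hypothetical base URI for a spatial thing.
base = "http://example.org/id/lighthouse/eddystone"
snapshot = snapshot_uri(base, datetime(2017, 3, 30, 12, 0, tzinfo=timezone.utc))
print(snapshot)
```

Because the stamp sorts lexicographically in time order, a list of snapshot URIs can be ordered without parsing the dates.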
Approach (3) is suitable where a spatial thing has a small number of attributes that are frequently updated; for example, the GPS position of a runner, or data streamed from a sensor such as the water level from a stream gauge.
With this approach, the description of the spatial thing must include a property that contains a sequentially-ordered set of data-points, each of which defines a time-stamp and the values for the time-varying attribute(s). By definition, this property can be considered as a time-series coverage. Standard data encodings are available for time-series data, including: [TIMESERIESML] for GML, plus [COVERAGE-JSON] and [SENSORTHINGS] for JSON.
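In data-structure terms, approach (3) amounts to a spatial thing carrying a time-ordered list of data points. The sketch below is a generic illustration with hypothetical class and property names; it does not follow the encoding of [TIMESERIESML], [COVERAGE-JSON] or [SENSORTHINGS].

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class DataPoint:
    time: datetime   # time stamp of the observation
    value: float     # value of the time-varying attribute


@dataclass
class StreamGauge:
    uri: str
    name: str
    # The time-varying property: a sequentially ordered set of data points.
    water_level_m: List[DataPoint] = field(default_factory=list)

    def record(self, time: datetime, value: float) -> None:
        """Add an observation, keeping the series in time order."""
        self.water_level_m.append(DataPoint(time, value))
        self.water_level_m.sort(key=lambda point: point.time)

    def latest(self) -> Optional[DataPoint]:
        """Return the most recent observation, if any."""
        return self.water_level_m[-1] if self.water_level_m else None


# Hypothetical gauge and readings, deliberately recorded out of order.
gauge = StreamGauge("http://example.org/id/gauge/17", "Example gauge")
gauge.record(datetime(2017, 3, 30, 13, 0, tzinfo=timezone.utc), 1.42)
gauge.record(datetime(2017, 3, 30, 12, 0, tzinfo=timezone.utc), 1.37)
print(gauge.latest())
```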
The OGC [MOVING-FEATURES-XML] and [MOVING-FEATURES-CSV] specifications follow the pattern described above. A trajectory element is used to describe the position of a spatial thing, and varying attributes (such as orientation or rotation) can be added alongside the tuples in the trajectory. However, there is limited evidence of adoption outside of Japan.
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-MovingFeatures, R-Streamable
The primary topics of any spatial dataset are spatial things, each described by a set of attributes and usually at least one geometry. How your spatial data is structured will depend on the vocabulary or data model you use (see section 12.5 Spatial Data Vocabularies for further details on vocabulary choice). This will determine the types of entities that, along with the spatial things themselves, are important enough to be given identifiers so that statements can be made about them. Geometry objects are an example of an entity that is often assigned a unique identifier so that they can be referenced or reused.
To publish spatial data on the Web, we need to stitch the spatial things and their corresponding entities into the Web’s information space; contributing to the Web of data. First: [WEBARCH] Good Practice: Identify with URIs states that "agents should provide URIs as identifiers for resources". Second: the 5 Star Data scheme states: "★★★★ use URIs to denote things, so that people can point at your stuff".
[DWBP] Best Practice 10: Use persistent URIs as identifiers within datasets provides directly applicable guidance. When identifying resources, it advises:
Furthermore, given ubiquitous use of the Hyper Text Transfer Protocol (HTTP) on the Web, we SHOULD use HTTP URIs to identify resources in spatial data.
We consider identifiers in the Web’s information space to be unaffected by the choice to serve HTTP content securely or not. For example, http://example.org/country/suriname and https://example.org/country/suriname both identify the same spatial thing - in this case the South American country of Suriname.
Resources identified with HTTP URIs can be specified as the target of links within the Web’s global information space, enabling information from different sources to be related and combined. This is the fundamental basis of 5★ Linked Data: "★★★★★ link your data to other data to provide context".
Best Practice 7: Use globally unique persistent HTTP URIs for spatial things
Use stable HTTP URIs to identify spatial things, re-using commonly used URIs where they exist and it is appropriate to do so.
Why
The Web works with resources that are identified using HTTP URIs. We want spatial things to be first-class resources on the Web: resources that we can make statements about and relate to other resources. To do this, spatial things need to be addressable resources in the Web’s global information space, which means they must be identified using HTTP URIs.
This is a fundamentally different data publication approach to what is typical today where the dataset is (often) globally identified, but individual spatial things, or "features" in SDI parlance, are not - at least not with a persistent identifier.
The HTTP URIs used to identify spatial things need to be stable or persistent so that relationships that link them to other resources don’t break.
Intended Outcome
Spatial things become part of the Web’s global information space, enabling them to be linked with other spatial things and other resources, and for those links to be durable. In other words, spatial data becomes part of the Web of Data.
Possible Approach to Implementation
The Web of data is made up of subjects and objects; the things we talk about and the things we refer to. For example, we could say that Anne Frank's House (the subject) is within the Municipality of Amsterdam (the object). In RDF this looks like:
<https://g.co/kg/m/02s5hd> schema:containedInPlace <http://sws.geonames.org/2759793> .
When considering HTTP URIs for objects (e.g. the target of our hyperlinks) it makes sense to reuse existing identifiers. After all, you are trying to stitch your spatial data into the Web so that we can "link your data to other data" and achieve a ★★★★★ rating! Organizations such as DBPedia, GeoNames and government mapping and cadastral authorities (that publish national registers of addresses, buildings, etc.) are good sources of stable, authoritative URIs. Appendix B. Authoritative sources of geographic identifiers lists sources of URIs for spatial things, and the steps described for discovering existing vocabularies [LD-BP] can be readily adapted to find more. For more details about how you might link to these authoritative identifiers, see section 12.7 Linking Spatial Data.
However, HTTP URIs for subjects (e.g. the resource that we want to make statements about) can be a bit more tricky. If you are working purely with data then you can reuse existing URIs minted by other authorities for your subject URIs. But publishing spatial data on the Web means that the URIs for each spatial thing should resolve to Web pages or data resources that provide useful information (see ). An HTTP request will be directed to a host Web server, identified by the internet domain name (or IP address) in the requested URI. If you use a URI with an internet domain name where you have no control over how the Web server behaves, then there is no way for your statements to be included in the Web server's response.
To take control of how information about spatial things is presented, data publishers need to assign their subject spatial things HTTP URIs from an internet domain name where they have authority over how the Web server responds. Typically, this means minting new HTTP URIs. It's also worth considering that the use of a particular internet domain may reinforce the authority of the information served. For example, a URI for Anne Frank's House is: https://monumentenregister.cultureelerfgoed.nl/monuments?MonumentId=4296. The use of the internet domain registered to the Cultural Heritage Agency of the Netherlands gives the definition authenticity.
The need to control what information is provided about a given spatial thing means that it is not uncommon for a spatial thing to be identified by multiple HTTP URIs. The equality between two URIs that refer to the same resource can be stated using a property such as owl:sameAs. Care must always be taken when using owl:sameAs to determine that the two URIs actually refer to the same resource, rather than two resources that are similar. Warning: don't say owl:sameAs if you're not sure it's true!
For more information about the types of properties that can be used to link between spatial things, and between spatial things and other resources, see section 12.7 Linking Spatial Data.
When minting your own URIs, [DWBP] Best Practice 10: Use persistent URIs as identifiers within datasets cites the advice from GS1's SmartSearch Implementation Guideline [GS1] which suggests that your URIs should include the type of resource that is being identified to help human readability. Also, given the need for the HTTP URIs for spatial things to be used throughout their lifetime (and perhaps beyond) you should give some thought to designing a URI that is persistent.
[DWBP] Best Practice 9: Use persistent URIs as identifiers of datasets cites the European Commission's Study on Persistent URIs [PURI] as a good source from which to gain insights about designing persistent URIs.
When an HTTP URI is resolved, the server will respond with a sequence of bytes: by its nature, HTTP can only serve information resources such as Web pages or JSON documents. Yet a spatial thing is actually a real or conceptual phenomenon - a lake is made from water not information! Using a single URI to refer to both the spatial thing and the page/document that describes the spatial thing introduces a URI collision. This can impose a cost in communication due to the effort required to resolve ambiguities. [URLs-in-data] has more to say on this subject, including recommending URI design patterns that enable differentiation between the spatial thing and the page/document that describes it.
However, in most cases using a single URI for both spatial thing and the page/document is simpler to implement and meets the expectations of most end-users. As stated in [WEBARCH] section 2.2.3 Indirect Identification, identifiers are commonly used in this way. There is no obligation to distinguish between the spatial thing and the page/document unless your application requires this.
While there is a cost to this conflation, problems can be mitigated by avoiding making statements that confuse spatial thing and the page/document, such as “Uluru is available in KML format”; e.g. <http://sws.geonames.org/7645281> dc:hasFormat ex:kml .
This statement is clearly not true; an ancient monolith covering more than 3 km² cannot be provided in XML!
There is a level of discomfort in the wider community (based on discussion with Platform Linked Data Nederland folks amongst others) about whether this best practice should recommend "indirect identifiers" (where spatial thing and page/document both share the same URI) while the TAG Guidance (albeit from 2005) states that a HTTP 303 (see other) response should be provided by servers resolving the URI of a non-information resource (such as a spatial thing), referring the user agent to the corresponding information resource (i.e. the /id and /doc pattern that is in widespread use but often seems to confuse users and even some experts).
We pretty much agreed that use of indirect identifiers was OK during our discussion at TPAC-2016. That said, we didn't record a resolution.
If we want to stick with the TAG guidance, suggest that we remove the paragraph beginning
However, in most cases using a single URI for both spatial thing and the page/document [...]
and the following note, and replace with:
Dereferencing URIs for spatial things should result in a HTTP 303 (see other) response that redirects the user agent to the corresponding page/document. This means that the spatial thing and the page/resource MUST have different URIs. It is common to use /id as part of the URI for non-information resources, and /doc for the corresponding page/document.
That said, [URLs-in-data] provides other alternatives such as using a #id fragment.
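To make the /id and /doc pattern discussed in this issue concrete, the following Python sketch shows the decision a server would make for each request. It is an illustration under stated assumptions, not a recommendation: the path prefixes and the function name `see_other` are hypothetical, and a real deployment would sit behind a Web framework or server configuration.

```python
def see_other(request_path: str):
    """Return (status, headers) for a request under the /id and /doc pattern.

    Assumption: URIs for spatial things start with /id/ and the page or
    document describing each one lives at the same path under /doc/.
    """
    if request_path.startswith("/id/"):
        # The spatial thing itself cannot be served: point at its document.
        location = "/doc/" + request_path[len("/id/"):]
        return 303, {"Location": location}
    # Anything else is treated as an ordinary information resource.
    return 200, {}


status, headers = see_other("/id/perceel/aan.2528")
```

With this layout, statements about the spatial thing use the /id URI, while statements about the page (such as its format or modification date) use the /doc URI.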
HTTP URIs for spatial things should not include any indication of the data format used to encode the page/document as this may change as your systems evolve. That said, you may wish to provide a set of complementary resources that specify a particular format as part of your content negotiation strategy. For example, the URI http://sws.geonames.org/7645281/about.rdf resolves to provide an RDF/XML encoding of the information about Uluru in the Northern Territory of Australia (http://sws.geonames.org/7645281).
[DWBP] Best Practice 10: Use persistent URIs as identifiers within datasets notes that URIs can be long. You may need to define identifiers that are locally unique within your spatial dataset and provide a mechanism to programmatically convert each local identifier to a URI. For example, the Metadata Vocabulary for Tabular Data [TABULAR-METADATA] achieves this using URI Templates as described in [RFC6570].
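A minimal Python sketch of converting a local identifier to a URI is shown below. It implements only the simplest form of [RFC6570] expansion (a `{name}` variable replaced by its percent-encoded value) and none of the RFC's operators; the template, the variable name `local_id` and the function name `expand` are assumptions for illustration.

```python
from urllib.parse import quote

# Hypothetical URI Template for spatial things in a dataset.
TEMPLATE = "http://example.org/id/parcel/{local_id}"


def expand(template: str, **variables: str) -> str:
    """Replace each {name} in the template with its percent-encoded value.

    This covers simple string expansion only; it is not a full RFC 6570
    implementation.
    """
    uri = template
    for name, value in variables.items():
        uri = uri.replace("{" + name + "}", quote(value, safe=""))
    return uri


print(expand(TEMPLATE, local_id="aan.2528"))
# prints http://example.org/id/parcel/aan.2528
```

Publishing the template alongside the data lets consumers store the short local identifiers and still reconstruct the full HTTP URI when they need to link or dereference.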
It is also good practice to use a redirection service to hide complex and potentially changing service end-point URLs, such as for a Web Feature Service, behind well-designed URIs. This means that users don’t need to be aware of the complexities of the API or changes in endpoint URIs or API versions in order to request information about a particular spatial thing. For example, the URI http://data.example.org/aan/id/perceel/aan.2528 could be used as proxy for the WFS GetFeature request http://geodata.nationaalgeoregister.nl/aan/wfs?VERSION=2.0.0&SERVICE=WFS&REQUEST=GetFeature&featureID=aan.2528.
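The mapping performed by such a redirection service can be sketched in Python as follows. The endpoint and query parameters are taken from the example above; the path prefix handling and the function name `wfs_target` are assumptions, and a real service would also need to issue the redirect or proxy the response.

```python
from typing import Optional
from urllib.parse import urlencode

# Service end-point from the example above; it may change over time,
# which is exactly why it is hidden behind the stable URI.
WFS_ENDPOINT = "http://geodata.nationaalgeoregister.nl/aan/wfs"
URI_PREFIX = "/aan/id/perceel/"  # path part of the stable URIs


def wfs_target(path: str) -> Optional[str]:
    """Translate the path of a stable URI into a WFS GetFeature request URL.

    Returns None when the path does not identify a feature.
    """
    if not path.startswith(URI_PREFIX):
        return None
    feature_id = path[len(URI_PREFIX):]
    if not feature_id:
        return None
    query = urlencode({
        "VERSION": "2.0.0",
        "SERVICE": "WFS",
        "REQUEST": "GetFeature",
        "featureID": feature_id,
    })
    return f"{WFS_ENDPOINT}?{query}"


print(wfs_target("/aan/id/perceel/aan.2528"))
```

If the WFS version or host changes, only the constants in the redirection service need updating; the published URIs stay the same.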
Finally, while it is simple to use a query-pattern URL to serve information about a resource identified with a URI from a third-party internet domain, e.g. http://example.org/museums?q=http://sws.geonames.org/6618987, these URLs are unsuitable as persistent identifiers. More often than not, your intended users will dereference the "official" URI, e.g. http://sws.geonames.org/6618987. That said, this kind of search operation does provide a useful mechanism to find particular spatial things. See Best Practice 12: Include search capability in your data access API for further details.
How to Test
Check that, within the data, spatial things such as countries, regions and people are referred to by HTTP URIs or by short identifiers that can be converted to HTTP URIs. Ideally the URIs should resolve; however, they have value as globally scoped variables whether they resolve or not.
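Part of this check can be automated. The Python sketch below tests only the syntactic condition (the identifier is an absolute HTTP or HTTPS URI); it does not attempt to resolve the URI or to judge its persistence. The function name `is_http_uri` is an assumption for illustration.

```python
from urllib.parse import urlsplit


def is_http_uri(identifier: str) -> bool:
    """Return True if the identifier is an absolute http(s) URI with a host."""
    parts = urlsplit(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


identifiers = [
    "http://sws.geonames.org/2759793",  # HTTP URI
    "aan.2528",                         # local identifier, needs conversion
]
not_http = [i for i in identifiers if not is_http_uri(i)]
```

Identifiers reported in `not_http` are candidates for conversion, for example by expanding them with a published URI Template.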
Evidence
Relevant requirements: R-Linkability, R-GeoReferencedData, R-IndependenceOnReferenceSystems.
In this document there is no section on formats for publishing spatial data on the web. The formats are basically the same as for publishing any other data on the web: XML, JSON, CSV, RDF, etc. Refer to [DWBP] section 8.6 Data Formats for more information and best practices.
That being said, it is important to publish your spatial data with clear semantics, i.e. to provide information about the contents of your data. The primary use case for this is you have information about a collection of SpatialThings and you want to publish precise information about their attributes and how they are inter-related. Another use case is the publication on the Web of a dataset that has a spatial component in a form that search engines will understand.
Depending on the format you use, the semantics may already be described in some form. For example, in GeoJSON [RFC7946] this description is present in the specification. When using JSON it is possible to add semantics using a JSON-LD @context. For providing semantics to search engines, using schema.org is a good option, as explained in Best Practice 4: Make your spatial data indexable by search engines.
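As an illustration of adding semantics with a JSON-LD @context, the Python sketch below builds a small JSON document in which two plain JSON keys are mapped to schema.org terms. The choice of keys and the overall document shape are assumptions made for this example; the URIs are the ones used for Anne Frank's House and the Municipality of Amsterdam elsewhere in this document.

```python
import json

doc = {
    "@context": {
        # Map plain JSON keys to schema.org terms.
        "name": "http://schema.org/name",
        "containedInPlace": {
            "@id": "http://schema.org/containedInPlace",
            "@type": "@id",  # the value is a URI, not a text string
        },
    },
    "@id": "https://monumentenregister.cultureelerfgoed.nl/monuments?MonumentId=4296",
    "name": "Anne Frank's House",
    "containedInPlace": "http://sws.geonames.org/2759793",
}

print(json.dumps(doc, indent=2))
```

A consumer that ignores the @context still sees ordinary JSON, while a JSON-LD processor can interpret the same document as statements about the spatial thing.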
In a linked data setting, the attributes of a spatial thing can be described using existing vocabularies, where each term has a published definition. [DWBP] Best Practice 15: Reuse vocabularies, preferably standardized ones recommends using terms from an established widely used vocabulary. If you can't find a suitable existing vocabulary term, you should create your own, and publish a clear definition for the new term (see [LD-BP]). We recommend that you link your own vocabulary to commonly used existing ones because this increases its usefulness. We provide the mapping between some commonly used spatial vocabularies.
We must avoid being overly focused on RDF.
The [LD-BP] reference makes this section very RDF dependent. Is there a need / justification for this? Are we saying that RDF is the only recommended way to publish data and models on the Web? Web developers may not care about RDF vocabularies and maybe they prefer a Swagger document (just to pick an example)?
To reduce the RDF focus, some text was added.
The current list of RDF vocabularies / OWL ontologies for spatial data being considered by the SDW WG are provided below. Some of these will be used in examples. Full details, including mapping between vocabularies, pointers about inconsistencies in vocabularies (if any are evident), and recommendations avoiding their use as these may lead to confusion, will be published in a complementary NOTE: Comparison of geospatial vocabularies.
The NOTE will be concerned with helping data publishers choose the right spatial data format or vocabulary. It provides a methodology for making that choice. We do this rather than recommending one vocabulary because this recommendation would not be durable as vocabularies are released or amended.
Vocabularies can be discovered from Linked Open Vocabularies (LOV), using search terms like 'location' or the tags Place, Geography, Geometry and Time.
http://statistics.data.gov.uk/def/statistical-geography# and http://statistics.data.gov.uk/def/statistical-entity# (URIs do not resolve)
No attempts have yet been made to rank these vocabularies; e.g. in terms of expressiveness, adoption etc.
The motivation behind the ISA Programme Location Core Vocabulary was establishing a minimal core common to existing spatial vocabularies. However, experience suggests that such a minimal common core is not very useful as one quickly needs to employ specific semantics to meet one's application needs.
Do we need a subclass of SpatialThing for entities that do not have a clearly defined spatial extent; or a property that expresses the fuzziness of the extent?
Location information is often a common thread running through such data and can be an important 'hook' for finding information and for integrating different datasets. There are different ways of describing the location of spatial things. You can use and/or refer to the name of a well known named place, provide the location's coordinates as a geometry or describe it in relation to another location. These last two options are described in this section.
Best Practice 8: Provide geometries on the Web in a usable way
Geometry data should be expressed in a way that allows its publication and use on the Web.
Why
This best practice helps with choosing the right format for describing geometry based on aspects like performance and tool support. It also helps when deciding on whether or not using literals for geometric representations is a good idea.
Intended Outcome
The format chosen to express geometry data should:
Possible Approach to Implementation
Steps to follow:
geo:lat and geo:long that are used extensively for describing geo:Point objects.
How to Test
...
Evidence
Relevant requirements: R-BoundingBoxCentroid, R-Compressible, R-CRSDefinition, R-EncodingForVectorGeometry, R-IndependenceOnReferenceSystems, R-MachineToMachine, R-SpatialMetadata, R-3DSupport, R-TimeDependentCRS, R-TilingSupport.
Best Practice 9: How to describe relative positions
Provide a relative positioning capability in which the entities can be linked to a specific position.
Why
In some cases, it is necessary to describe the location of an entity in relation to another location or in relation to location of another entity. For example, South-West of Guildford, close to London Bridge.
Intended Outcome
It should be possible to describe the location of an entity in relation to another entity or in relation to a specific location, instead of specifying a geometry.
The relative positioning descriptions should be machine-interpretable and/or human-readable.
Possible Approach to Implementation
The relative positioning should be provided as:
Do we need this as a best practice; if yes, this best practice needs more content
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-SamplingTopology.
In most cases, the effective use of information resources requires understanding thematic concepts in addition to the spatial ones; "spatial" is just a facet of the broader information space. For example, when the Dutch Fire Service responded to an incident at a day care center, they needed to evacuate the children. In this case, the 2nd closest alternative day care center was preferred because it was operated by the same organization as the one that was subject of the incident, and they knew who all the children were.
This best practice document provides mechanisms for determining how places and locations are related - but determining the compatibility or validity of thematic data elements is beyond our scope; we're not attempting to solve the problem of different views on the same/similar resources.
That said, there is one aspect of thematic semantics that must be mentioned. The most important semantic statement you can make when publishing spatial data - or any data - is to specify the type of a resource. For spatial things, there are a number of types that define "spatialness" (see section 12.5.1 Describing location). But you should also consider non-spatial aspects when designating the type of a spatial thing. For example, should a fire incident occur at Amsterdam Central railway station, it might seem sensible for the Municipal Fire Department to designate a type such as Building or Station (the Dutch Government Base Registry, which identifies Amsterdam Central railway station as https://brt.basisregistraties.overheid.nl/top10nl/id/gebouw/102625209, designates both of these types). However, the Fire Department are concerned with a fire incident - not the railway station itself. The fire incident is a spatial thing (it has spatial extent) but it is not the station. For example, the fire may spread to adjacent buildings. The Fire Department might designate their spatial thing as having type FireIncident or similar. Advice on how to assign a persistent identifier to the fire incident is provided in Best Practice 7: Use globally unique persistent HTTP URIs for spatial things, and section 12.7 Linking Spatial Data provides guidance on how one might relate the fire incident to other coincident spatial things such as Amsterdam Central railway station.
Thematic semantics are out of scope for this best practice document. For associated best practices, please refer to [DWBP] section 8.2 Metadata, Best Practice 3: Provide structural metadata; and [DWBP] section 8.9 Data Vocabularies, Best Practice 15: Reuse vocabularies, preferably standardized ones and Best Practice 16: Choose the right formalization level.
See also [LD-BP] Vocabularies.
Best Practice 10: Use spatial semantics for Spatial Things
The best vocabulary should be chosen to describe the available SpatialThings.
Why
[DWBP] Best Practice 15: Reuse vocabularies, preferably standardized ones recommends the use of existing vocabularies. There are several vocabularies available in which you can describe SpatialThings. A robust methodology or an informed decision making process should be adopted to choose the best available vocabulary to describe the entities.
There is nothing to stop publishers describing their SpatialThings multiple times using different vocabularies, thereby maximizing the potential for interoperability and letting the consumers choose which is the most useful. The intent for this best practice is to provide a mechanism that helps the data publisher choose the right vocabulary or vocabularies that best meet their intended goal.
Intended Outcome
Entities and their properties are described using common and reusable vocabularies to increase and improve the interoperability of the descriptions.
Possible Approach to Implementation
There are various vocabularies that provide common information (semantics) about spatial things, such as Basic Geo vocabulary, [GeoSPARQL] or schema.org.
Work is underway on an update for the GeoSPARQL spatial ontology. This will provide an agreed spatial ontology conformant to the ISO 19107 abstract model and will be based on existing available ontologies such as GeoSPARQL, the W3C basic geo vocabulary, NeoGeo and the ISA Core Location vocabulary. This best practice will recommend the use of this spatial ontology.
This best practice provides a method for selecting the right vocabulary (or vocabularies) for your intended goal. The semantic description of entities and their properties should use the existing common vocabularies in their descriptions to increase the interoperability with other descriptions that may refer to the same vocabularies. The steps required to choose the best vocabularies are:
The Basic Geo vocabulary has a class SpatialThing which has a very broad definition. This can be applicable (as a generic concept) to most of the common use-cases.
For some use cases we might need something more specific.
Methodology for selecting a spatial vocabulary is not yet defined.
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-MobileSensors, R-MovingFeatures.
We might publish in the Best Practice Note or a complementary Note a set of statements mapping the set of available vocabularies about spatial things. Mappings are already available; for example, GeoNames has a mapping with schema.org: http://www.geonames.org/ontology/mappings_v3.01.rdf
Temporal relationship types will be described here and be entered eventually as link relationship types into the IANA registry, Link relations, just like the spatial relationships.
In the same sense as with spatial data, temporal data can be fuzzy.
Retain section; point to where temporal data is discussed in detail elsewhere in this document.
SDIs have long been used to provide access to spatial data via web services; typically using open standard specifications from the Open Geospatial Consortium (OGC). With the exception of the Web Map Service, these OGC Web service specifications have not seen widespread adoption beyond the geospatial expert community. In parallel, we have seen widespread emergence of Web applications that use spatial data.
[DWBP] provides best practices discussing access to data using Web infrastructure (see [DWBP] section 8.10 Data Access). This section provides additional insight for publishers of spatial data.
Making data available on the Web requires data publishers to provide some form of access to the data. There are numerous mechanisms available, each providing varying levels of utility and incurring differing levels of effort and cost to implement and maintain. Publishers of spatial data should make their data available on the Web using affordable mechanisms in order to ensure long-term, sustainable access to their data.
When determining the mechanism to be used to provide Web access to data, publishers need to assess utility against cost. In order of increasing usefulness and cost:
Read [DWBP] Best Practice 23: Make data available through an API, Best Practice 24: Use Web Standards as the foundation of APIs, Best Practice 25: Provide complete documentation for your API, and Best Practice 26: Avoid Breaking Changes to Your API for general recommendations about publishing data using APIs.
Best Practice 11: Expose spatial data through 'convenience APIs'
If you have a specific application in mind for publishing your data, tailor the spatial data API to meet that goal.
Why
When access to spatial data is provided by bulk download or through a generalized query service, users need to understand how the data is structured in order to work effectively with that data. Given that spatial data may be arbitrarily complex, this burdens the data user with significant effort before they can even perform simple queries. In addition, spatial datasets tend to be large. Convenience APIs are tailored to meet a specific goal; enabling a user to engage with arbitrarily complex data structures using (a set of) simple queries.
Intended Outcome
The API provides a coherent set of queries and operations, including spatial ones, that help users get working with the data quickly to achieve common tasks. The API provides both machine readable data and human readable HTML markup; the latter is used by search engines to index the spatial data.
Possible Approach to Implementation
The API should offer both machine readable data and human readable HTML that includes the structured metadata required by search engines seeking to index content (see Best Practice 4: Make your entity-level data indexable by search engines for more details).
1. Reuse your existing spatial data infrastructure
In the geospatial domain there are a lot of WFS services providing data. A RESTful API as a wrapper, proxy or a shim layer can be created around WFS services. Content from the WFS service can be provided in this way as linked data, JSON or another Web friendly format. This approach is similar to the use of Z39.50 in the library community; that protocol is still used but 'modern' Web sites and web services are wrapped around it.
There are examples of this approach of creating a convenience API that works dynamically on top of WFS such as the experimental ldproxy. This requires relatively little effort and is an attractive option for quickly exposing spatial data from existing WFS services on the Web. The approach is to create an intermediate layer by introducing a proxy on top of the WFS (data service) and CSW (metadata service) so the contained resources are made available. The proxy maps the data and metadata to schema.org according to a provided mapping scheme; assigns URIs to all resources based on a pattern; makes each resource available in HTML, XML, JSON-LD, GML, GeoJSON, and RDF/XML (metadata only); and generates links to data in other datasets using SPARQL queries.
[Add description of another example using a similar approach: PDOK]
2. Provide web-friendly access to the data as an alternative
A more effective route may be to provide an alternative 'Linked Data friendly' access path to the data source by creating a new, complementary service endpoint; e.g. expose the underpinning PostGIS database via a SPARQL endpoint (using something like ontop-spatial) and a Linked Data API.
3. To limit the amount of modifications and load on your SDI but still maintain a direct link between the data provided through the SDI and the web friendly version of the spatial features, use 'rdf_seealso' as spatial feature attribute to point to the web friendly representation.
How to Test
...
Evidence
Relevant requirements: R-Compatibility, R-LightweightAPI.
Best Practice 12: Include search capability in your data access API
If you publish an API to access your data, make sure it allows users to search for specific data.
Should best practice "Include search capability in your data access API" move or be removed?
Why
It can be hard to find a particular resource within a dataset, requiring either prior knowledge of the respective identifier for that resource and/or some intelligent manual guesswork. It is likely that users will not know the URI of the resource that they are looking for, but may know (at least part of) the name of the resource or some other details. A search capability will help a user to determine the identifier for the resource(s) they need using the limited information they have.
Intended Outcome
A user can do a text search on the name, label or other property of an entity that they are interested in to help them find the URI of the related resource.
Possible Approach to Implementation
to be added
How to Test
...
Evidence
Relevant requirements: {... hyperlinked list of use cases ...}
Best Practice 13: Provide subsets for large spatial datasets
Identify subsets of large spatial data resources that are a convenient size for applications to work with
Is the term "subset" correct?
Why
[DWBP] Best Practice 18: Provide Subsets for Large Datasets explains why providing subsets is important. Spatial datasets, particularly coverages such as satellite imagery, sensor measurement time-series and climate prediction data, are often very large. In these cases it is useful to provide subsets by having identifiers for conveniently sized subsets of large datasets that Web applications can work with.
Intended Outcome
Being able to refer to subsets of a large spatial data resource that are sized for convenient usage in Web applications.
Possible Approach to Implementation
Two possible approaches are described below:
A Web service URL in general does not provide a good URI for a resource as it is unlikely to be persistent. A Web service URL is often technology and implementation dependent and both are very likely to change with time. For example, consider oft used parameters such as ?version=. Good practice is to use URIs that will resolve if the resource is relevant and may be referenced by others, therefore identifiers for subsets should be protocol independent.
How to Test
...
Evidence
Relevant requirements: R-Compatibility, R-Linkability, R-Provenance, R-ReferenceDataChunks.
More content needed for this best practice.
Earlier in this document, the Linked Data section explained that we believe that Linked Data requires only that the formats used to publish data support Web linking. In other words, linking spatial data does not automatically mean the use of RDF; links can also be created, for example, using GML, HTML or JSON-LD.
Links, in whatever machine-readable form, are important. In the wider Web, it is links that enable the discovery of web pages: from user-agents following a hyperlink to find related information to search engines using links to prioritize and refine search results. This section is concerned with the creation and use of those links to support discovery of the SpatialThings described in spatial datasets.
For data to be on the web the resources it describes need to be connected, or linked, to other resources. The connectedness of data is one of the fundamentals of the Linked Data approach that these best practices build upon. The 5-star rating for Linked Open Data asserts that to achieve the fifth star you must "link your data to other data to provide context". The benefits for consumers and publishers of linking to other data are listed as:
Just like any type of data, spatial data benefits massively from linking when publishing on the web. The widespread use of links within data is regarded as one of the most significant departures from contemporary practices used within SDIs. That's why this topic is included in this Best Practice.
Crucially, the use of links is predicated on the ability to identify the origin and target, or beginning and end, of the link. Best Practice 7: Use globally unique persistent HTTP URIs for spatial things is a prerequisite.
This section extends [DWBP] by providing a best practice about creating links between the resources described inside datasets.
[DWBP] identifies Linkability as one of the benefits gained from implementing the Data on the Web best practices (see [DWBP] section 8.7 Data Identifiers, Best Practice 9: Use persistent URIs as identifiers of datasets and Best Practice 10: Use persistent URIs as identifiers within datasets). However, no discussion is provided about how to create the links that use those persistent URIs.
Best Practice 14: Publish links from spatial things to related resources
The data should be published with explicit links, including spatial links, to spatial things or other resources, both in the same dataset and in other datasets.
Why
Exposing entity-level links to Web applications, user-agents and Web crawlers allows the relationships between resources to be found without the data user needing to download the entire dataset for local analysis. Entity-level links, preferably meaningful links, provide explicit description of the relationships between resources and enable users to find related data and determine whether the related data is worth accessing. Entity-level links can be used to combine information from different sources; for example, to determine correlations in statistical data relating to the same location.
Data publishers should assert the relationships that they know about. Relationships between resources with spatial extent (i.e. size, shape, or position; SpatialThings) can often be inferred from their spatial properties, but this is a complex task. It requires both complex spatial processing (e.g. region connection calculus) and some degree of understanding about the semantics of the two, potentially related, resources in order to determine how they are related, if at all. This should not be left to the data user.
When your spatial resources are linked to those in common usage it will be easier to discover them. For example, a data user interested in air quality data about the place they live might begin by searching for that place in popular data repositories such as GeoNames, Wikidata or DBpedia. Once the user finds the resource that describes the correct place, they can search for data that refers to the identified resource that, in this case, relates to air quality. Furthermore, by referring to resources in common usage, it becomes even easier to find those resources as search engines will prioritize resources that are linked to more often.
It is not always feasible to link your spatial things to other resources in common usage. For example, if you were maintaining a registry of cultural heritage in Amsterdam, it would be reasonably simple to look up identifiers for the city's 50 or so museums and map these to your spatial things. But it would be a huge task for, say, a topographic mapping agency to cross-reference their entire catalogue of named places containing tens of thousands of spatial things with third-party resources (although in the spirit of crowd-sourcing, if someone else found those links useful, they may take on the task of relating the spatial things and publishing those relationships to the Web as a complementary resource!).
In essence, you should only manage the data that you have the resources to maintain.
Intended Outcome
It should be possible for humans and for machine agents to understand, interpret and follow the entity-level links between resources. Preferably, the definition of the meaning of a given link is precise and explicit.
It should be possible for humans and machine agents to find spatial relationships between Things without performing geometric post-processing.
Spatial things should be related to commonly used resources.
Possible Approach to Implementation
Steps:
1. Choose one of the following general methods to provide explicit entity-level links:
The use of Linksets needs further discussion as evidence indicates that it is not yet a widely adopted best practice. It may be appropriate to publish such details in a Note co-authored with the DWBP Working Group.
[GML] adopted the [XLINK11] standard to represent links between resources. At the time of adoption, XLink was the only W3C-endorsed standard mechanism for describing links between resources within XML documents. The Open Geospatial Consortium anticipated broad adoption of XLink over time and, with that adoption, provision of support within software tooling. While XML Schema, XPath, XSLT and XQuery etc. have seen good software support over the years, this never happened with XLink. The authors of GML note that, given the lack of widespread support, use of XLink within GML provided no significant advantage over and above use of a bespoke mechanism tailored to the needs of GML.
[VoID] provides guidance on how to discover VoID descriptions (including Linksets), both by browsing the VoID dataset descriptions themselves to find related datasets, and by using /.well-known/void (as described in [RFC5785]).
How would a (user) agent discover that these 3rd-party link-sets existed? Is there evidence of usage in the wild?
Does the [BEACON] link dump format allow the use of wild cards / regex in URLs (e.g. URI Template as defined in [RFC6570])?
How do we know what is at the end of a link - and what can I do with it / can it do for me (e.g. the 'affordances' of the target resource).
How to describe the 'affordances' of the target resource?
2. Make spatial relationships explicit
It is a good idea to express spatial relationships between things explicitly in your data. A possible approach for this is to find out if they have corresponding geometries using spatial functions, and then express these correspondences as explicit relationships. If the spatial datasets you want to reconcile are managed in a Geographic Information System (GIS) or a spatial database, you can use the GIS spatial functions to find related spatial things. If your spatial things are expressed as Linked Data, you can use [GeoSPARQL], which has a set of spatial query functions available.
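As a minimal sketch of this approach, the following Python derives "within" relationships from geometry and writes them out as explicit links. It assumes point locations and simple polygon rings in a shared planar coordinate system, uses a plain ray-casting test rather than a GIS or [GeoSPARQL] engine, and serializes the result as N-Triples using the GeoSPARQL sfWithin property. All identifiers and coordinates are hypothetical.

```python
GEO_SF_WITHIN = "http://www.opengis.net/ont/geosparql#sfWithin"


def point_in_polygon(point, ring):
    """Ray-casting test: True if point (x, y) lies inside the polygon ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def within_links(points, areas):
    """Return N-Triples lines asserting sfWithin for each contained point."""
    triples = []
    for point_uri, coords in points.items():
        for area_uri, ring in areas.items():
            if point_in_polygon(coords, ring):
                triples.append(f"<{point_uri}> <{GEO_SF_WITHIN}> <{area_uri}> .")
    return triples


# Hypothetical identifiers and coordinates, for illustration only.
areas = {
    "http://example.org/id/district/centrum": [(0, 0), (4, 0), (4, 4), (0, 4)],
}
points = {
    "http://example.org/id/museum/1": (1, 1),
    "http://example.org/id/museum/2": (5, 5),
}
links = within_links(points, areas)
print("\n".join(links))
```

Once computed, the links can be published alongside the data so that data users do not have to repeat the geometric processing themselves.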
The mechanics of how to decide when two resources are related, if they don't have geometric or topological properties that allow you to determine this, are beyond the scope of this best practice. Tools (e.g. OpenRefine and Silk Linked Data Integration Framework) are available to assist with reconciliation based on e.g. geographical names or addresses present in the data and may provide further insight.
Where possible, existing identifiers should be reused when referring to resources (see Best Practice 7: Use globally unique persistent HTTP URIs for spatial things). However, the use of multiple identifiers for the same resource is commonplace, for example, where data publishers from different jurisdictions refer to the same SpatialThing. In this special case, properties such as owl:sameAs can be used to declare that multiple identifiers refer to the same resource. It is often the case that data published from different sources about the same physical or conceptual resource may provide different view points. Note that you shouldn't use owl:sameAs to indicate that two spatial things are only similar, while not exactly the same.
Maintaining links to *all* related resources doesn't scale. Redraft required.
3. Decide which spatial relationships to use
To let user agents know what is at the end of a link, it's a good idea to use explicitly defined relationship types to link between resources. [DWBP] section 8.9 Data Vocabularies provides information on general relationship types described in well-defined vocabularies (see [DWBP] Best Practice 15: Reuse vocabularies, preferably standardized ones).
Describing the spatial relationships between SpatialThings can be based on relationships such as topological (e.g. contains), geographical (e.g. nearby) and hierarchical (e.g. part of) links.
Social relationships can be defined based on perception; e.g. "samePlaceAs", nearby, south of. These relationships can also be defined based on temporal concepts such as: after, around, etc. In current practice, there is no such property as samePlaceAs to express the social notion of place; such a property would enable communities to unambiguously indicate that they are referring to the same place without getting hung up on the semantic complexities inherent in the use of owl:sameAs or similar.
Include details of which spatial relationships are published as IANA Link relations.
Which vocabularies out there have social spatial relationships? FOAF, GeoNames, ...
4. Determine the things to link to
The links should connect spatial things rather than the information resource(s) that describe them. In many cases, different identifiers are used to describe the SpatialThing and the information resource that describes that SpatialThing. For example, within DBpedia, the city of Sapporo, Japan, is identified as http://dbpedia.org/resource/Sapporo, while the information resource that describes this resource is identified as http://dbpedia.org/page/Sapporo. Care should be taken to select the identifier for the SpatialThing rather than the information resource that describes it; in the example above, this is http://dbpedia.org/resource/Sapporo.
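The distinction between the two kinds of identifier can be sketched in Python as follows. The helper name is hypothetical, and the code assumes only the DBpedia /page/ and /resource/ path convention described in the Sapporo example; other publishers may use different patterns, so this is an illustration rather than a general rule.

```python
from urllib.parse import urlsplit, urlunsplit


def dbpedia_thing_uri(uri):
    """Map a DBpedia page URL to the URI of the thing it describes.

    URIs that do not match the /page/ pattern are returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.netloc == "dbpedia.org" and parts.path.startswith("/page/"):
        path = "/resource/" + parts.path[len("/page/"):]
        return urlunsplit(
            (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
        )
    return uri


link_target = dbpedia_thing_uri("http://dbpedia.org/page/Sapporo")
print(link_target)
```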
Besides making the links to things within the same dataset explicit, data publishers should also relate their data to commonly used spatial resources such as GeoNames using links.
A list of sources of commonly used spatial resources is provided in section B. Authoritative sources of geographic identifiers.
This best practice is concerned with the connections between SpatialThings. When describing an individual SpatialThing itself, it is often desirable to decompose the information into several uniquely identified objects. For example, the geometry of an administrative area may be managed as a separate resource due to the large number of geometric points required to describe the boundary.
How to Test
...
Evidence
Relevant requirements: R-Linkability, R-MachineToMachine, R-SamplingTopology, R-SpatialRelationships, R-SpatialOperators, R-Crawlability, R-Discoverability.
Data related to a spatial dataset and its individual data items should be discoverable by browsing the links.
Why
In much the same way as the document Web allows one to find related content by following hyperlinks, the links between spatial datasets, SpatialThings described in those datasets and other resources on the Web enable humans and software agents to explore rich and diverse content without the need to download a collection of datasets for local processing in order to determine the relationships between resources.
Spatial data is typically well structured; datasets contain SpatialThings that can be uniquely identified. This means that spatial data is well suited to the use of links to find related content.
Are we missing a best practice describing how to discover and annotate information within unstructured resources?
The emergency response to natural disasters is often delayed by the need to download and correlate spatial datasets before effective planning can begin. Not only is the initial response hampered, but often the correlations between resources in datasets are discarded once the emergency response is complete because participants have not been able to capture and republish those correlations for subsequent usage.
Intended Outcome
It should be possible for humans to explore the links between a spatial dataset (or its individual items) and other related data on the Web.
It should be possible for software agents to automatically discover related data by exploring the links between a spatial dataset (or its individual items) and other resources on the Web.
It should be possible for a human or software agent to determine which links are relevant to them and which links can be considered trustworthy.
What do we expect user-agents to do with a multitude of links from a single resource? A document hyperlink has just one target; but in data, a resource may be related to many things.
Possible Approach to Implementation
These "back-links" can be traversed to find related information and also help a publisher assess the value of their content by making it possible to see who is using (or referencing) their data.
How to Test
...
Evidence
Relevant requirements: {... hyperlinked list of use cases ...}
When publishing large datasets on the web, and designing APIs for accessing those datasets, data publishers must be aware of the constraints of operating in a Web environment. Providing access to large spatial datasets, such as coverages, is a particular challenge. The API should provide mechanisms to request subsets of the dataset that are a convenient size for client applications to manage.
There are several best practices in this document dealing with large datasets and coverages:
Should we discuss scalability issues here?
This section is a placeholder for best practices that were in the FPWD but have not yet been placed in the new doc structure. They may be removed, merged, or moved.
Best Practice 16: Provide a minimum set of information about spatial things for your intended application
When someone looks up a URI for a SpatialThing, provide useful information, using the common representation formats
Why
This allows SpatialThings to be distinguished from one another by looking at their properties; e.g. type, label. It also allows basic information about a SpatialThing to be obtained by looking up its URI.
Intended Outcome
A minimum set of information about a SpatialThing should be served against its URI. In general, this makes it possible to look up the properties and features of a SpatialThing, and to get information from machine-interpretable and/or human-readable descriptions.
Possible Approach to Implementation
This requirement specifies that useful information should be returned when a resource is referenced. This can include:
rdfs:label
How to Test
...
Evidence
Relevant requirements: R-MachineToMachine, R-MultilingualSupport, R-SpatialVagueness
The best practices described in this document will incorporate practice from both Observations and Measurements [OandM] and W3C Semantic Sensor Network Ontology [VOCAB-SSN].
See also W3C Generic Sensor API and OGC Sensor Things API. These are more about interacting with sensor devices.
Best Practice 17: Describe the location according to a Coordinate Reference System
When publishing location data, the Coordinate Reference System (CRS) being used should also be specified.
Why
Spatial data can be published and shared between different communities through the use of common standards. However, there are several standards and different ways to represent spatial data; e.g. describing coordinates based on WGS84 Long/Lat. Each community structures spatial data according to its own standards, domain of interest and CRS. This variety can create confusion and inconsistencies in using and interpreting spatial data. To allow end-users (i.e. human users and machines) to interpret the spatial information correctly and consistently, the CRS in which the data is represented should also be provided and described using (machine-interpretable) metadata.
Intended Outcome
The Coordinate Reference System should be specified and published as machine-readable and interpretable metadata, according to common vocabularies.
It should be possible for human and machine users to access and interpret the CRS that is used to describe the spatial data.
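One way to make the CRS explicit, sketched here under stated assumptions, is to carry a CRS URI together with the coordinates, in the manner of the [GeoSPARQL] WKT literal. The helper name and the sample coordinates are hypothetical. The two OGC CRS URIs are used only to show that the same position is written with a different axis order depending on the CRS, which is why the CRS has to accompany the data.

```python
# CRS URIs from the OGC definitions server. The axis order differs.
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"    # longitude, latitude
EPSG_4326 = "http://www.opengis.net/def/crs/EPSG/0/4326"  # latitude, longitude


def wkt_point(first_axis, second_axis, crs_uri):
    """Build a GeoSPARQL-style WKT literal that names its CRS explicitly."""
    return f"<{crs_uri}> POINT({first_axis} {second_axis})"


# The same (hypothetical) position expressed in two CRSs.
lon, lat = 4.89, 52.37
as_crs84 = wkt_point(lon, lat, CRS84)
as_epsg_4326 = wkt_point(lat, lon, EPSG_4326)

print(as_crs84)
print(as_epsg_4326)
```

Without the CRS URI, a consumer could not tell whether 4.89 is a longitude or a latitude.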
Possible Approach to Implementation
EPSG:4277. The following describes some of the key considerations in choosing a CRS.
How to Test
...
Evidence
...
This narrative introduces a flooding scenario as a background story for the Best Practice.
Names and places used in this scenario are fictional, procedures and practices may not reflect those used in the real world. Our intent is to provide a coherent context within which the best practices can be illustrated. We do not attempt to provide best practice for management of flooding events. However, many of the procedures discussed are based on information from flood-risk-and-water-management-in-the-netherlands.
Nieuwhaven is a flourishing coastal city in the Netherlands. In common with much of the Netherlands, the low-lying nature of Nieuwhaven makes it prone to flooding from both rivers and the North Sea. To mitigate or reduce risks to homes and businesses, significant investment has been made in flood control and water management infrastructure.
Flood Risk Management and Water Management are integrated in the Netherlands. By combining responsibilities for daily water management and flood risk management, the same people are involved who have a detailed knowledge of their water systems and flood defenses.
Flood risk management can be separated into three layers:
Our scenario concentrates on element (3).
The Nieuwhaven Water Board is the independent local government body responsible for regional water management; maintaining the system of dikes, drainage, canals and pumping stations that are designed to keep the city and surrounding environment from flooding.
Based on assessment of historical flooding events, Nieuwhaven Water Board is able to determine the extent of flooding that would occur as the result of hypothetical storm surge and river flooding events.
tbd add example including:
Municipal emergency services, public health authorities and water boards are grouped according to a “safety region” in order to establish a multi-disciplinary “emergency team” for crisis management. This helps to ensure that there is effective communication between those responsible for public safety and those responsible for flood control and water management.
Each safety region prepares systematically for its own specific characteristics, based on available capabilities. This plan, the “Flood Response Plan”, includes evacuation strategies that are developed in response to hypothetical flooding events. Scenarios are prepared beforehand and carefully considered. The emergency team must be prepared at all times to deliver an assessment on a disaster / incident scenario and advise on proposed interventions, e.g. evacuation and deployment of temporary flood defenses.
In developing the Flood Response Plan, the numbers of citizens impacted by each hypothetical flooding event are determined by cross-referencing the areas affected by surface water flooding with census data.
Statistics Netherlands (CBS) publishes reliable and coherent statistical information which responds to the needs of Dutch society and is responsible for compiling official national statistics.
CBS makes use of OData, the Open Data Protocol v4, to provide open datasets for use by third parties. Furthermore, CBS provides a search interface to help a user find the dataset of interest. Data is also provided in CSV format.
CBS provides metadata for the census dataset, in both human- and machine-readable forms. A download of each dataset is available, but may leave the data user with more data than they can conveniently work with. Users are often interested only in the subset of areas that are relevant to them, such as those affected by flooding. CBS provides an API enabling the user to retrieve the relevant data by selecting the area of interest and, optionally, choosing specific dimensions of the statistical data.
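A request for such a subset might be built as in the following Python sketch, which uses the standard OData v4 $filter and $select query options. The service URL, the entity set and the property names (RegionCode, Population) are hypothetical and are not taken from the actual CBS API.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def odata_subset_url(base_url, region_codes, fields):
    """Build an OData URL selecting some regions and some properties.

    RegionCode is a hypothetical property name used for illustration.
    """
    region_filter = " or ".join(
        f"RegionCode eq '{code}'" for code in region_codes
    )
    query = urlencode(
        {"$filter": region_filter, "$select": ",".join(fields)},
        quote_via=quote,  # encode spaces as %20 rather than '+'
        safe="$',",       # keep the OData punctuation readable
    )
    return f"{base_url}?{query}"


# Hypothetical endpoint, region codes and property names.
url = odata_subset_url(
    "https://example.org/odata/Population",
    ["NH01", "NH02"],
    ["RegionCode", "Population"],
)
params = parse_qs(urlsplit(url).query)
print(url)
```

The client receives only the rows and columns it asked for, rather than a download of the whole dataset.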
Census data naturally takes the form of a statistical 'data cube', with statistical dimensions of area, time, gender, age range etc. A useful standards-based approach to making the data available would be to represent it as RDF, using the RDF Data Cube Vocabulary [VOCAB-DATA-CUBE]. This offers a standards based way to represent statistical data and associated metadata as RDF. API access to the data could be provided via a SPARQL endpoint, or a more specific API. The Linked Data API, implemented by Epimorphics’ ELDA, provides a useful mechanism to expose simple RESTful APIs on top of RDF/SPARQL.
Population data from a census is typically broken down by area, gender, age (and perhaps other statistical dimensions) and relates to a particular time.
CBS uses established URLs to identify each administrative area for which population data is available. Details of the administrative areas for Nieuwhaven are published by the municipal government. This information includes the geometry for each administrative area.
Data about administrative areas are often useful - perhaps they represent one of the most popular spatial datasets. In this case they are useful for coordinating the emergency response, i.e. predicting and tracking which neighborhoods or districts are threatened. Because the names of local administrative areas such as neighborhoods are very well known they are also useful for communication with citizens, i.e. letting them know if their neighborhood is threatened by the flood or not.
Because the administrative area datasets are quite popular, all kinds of data users will want to use them, not only GIS experts. To enable these users to find the data on the web, it was published in such a way that search engines can crawl it, making the data findable using popular search engines.
tbd add example including:
By cross-referencing the population statistics, administrative areas and surface water flooding extent (e.g. by calculating the intersection of the flood with administrative areas), the number of citizens impacted by each hypothetical flooding event can be estimated.
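The cross-referencing step can be sketched as an area-weighted estimate. The Python below is a deliberate simplification: it approximates the flood extent and each administrative area by axis-aligned rectangles in a projected CRS and assumes the population is spread evenly within each area. A real implementation would use the actual geometries and a spatial library or database. All names and figures are hypothetical.

```python
def rect_area(rect):
    """Area of a rectangle given as (min_x, min_y, max_x, max_y)."""
    return (rect[2] - rect[0]) * (rect[3] - rect[1])


def rect_intersection_area(a, b):
    """Area of the overlap between two rectangles (0 if they are disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def estimate_affected(flood_extent, districts):
    """Estimate affected citizens, assuming uniform density per district."""
    total = 0.0
    for district in districts:
        overlap = rect_intersection_area(flood_extent, district["bbox"])
        share = overlap / rect_area(district["bbox"])
        total += share * district["population"]
    return round(total)


# Hypothetical flood extent and districts, for illustration only.
flood = (0, 0, 10, 5)
districts = [
    {"name": "District A", "bbox": (0, 0, 10, 10), "population": 2000},
    {"name": "District B", "bbox": (10, 0, 20, 10), "population": 3000},
    {"name": "District C", "bbox": (5, 2, 9, 6), "population": 400},
]
affected = estimate_affected(flood, districts)
print(affected)
```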
Once the number of citizens that need refuge has been determined, the emergency teams can designate public buildings, such as schools and sports centers, as evacuation points and define safe transit routes to get to those points.
The municipal government published details of the built infrastructure within Nieuwhaven, including public buildings and transport infrastructure.
tbd add example including:
The municipal government also publishes metadata describing each dataset (DWBP-BP1) that, besides free text descriptions (e.g., title, abstract), include the following information:
To facilitate data discoverability, metadata should be published via different channels and formats (DWBP-BP22). Nowadays, such metadata are typically maintained in geospatial catalogues, encoded based on [ISO19115] - the standard for spatial metadata. In addition to this, such metadata can be served in RDF, and made queryable via a SPARQL endpoint; e.g. [GeoDCAT-AP] provides an XSLT-based mechanism to automatically transform ISO 19115 metadata into RDF, following a schema based on the W3C Data Catalog Vocabulary [VOCAB-DCAT].
This solution can be further enhanced by making data discoverable and indexable via search engines. The advantage is that this would allow data consumers to discover the data even though they do not know the relevant catalogue(s), and to find alternative data sources.
This can be achieved, following Search Engine Optimization (SEO) techniques, by embedding metadata in the catalogue's Web pages, with mechanisms like HTML-RDFa, Microdata, and Microformats. Examples of this approach include the following:
tbd add example including:
publish dataset metadata
Temporary flood defenses are common where roads and railways cross permanent flood defenses or are built up on boulevards along rivers. Temporary flood defenses are also deployed where dikes have not passed their annual visual inspection or 5-yearly assessment. Information regarding the condition of dikes cannot be incorporated into the plan, and must be considered during an actual flood event.
tbd add examples including:
Storm surge and river flood warning services are provided by the National Water Management Centre (WMCN) at Rijkswaterstaat, who are responsible for the design, construction, management and maintenance of the main infrastructure facilities in the Netherlands such as the main road network, the main waterway network and water systems.
The storm surge warning service is triggered by a storm surge alert from the Royal Netherlands Meteorological Institute (KNMI), the Dutch national weather service. A forecast combination of heavy rainfall, high tide and storm makes it likely that flooding will occur in the next 120 hours. Specialists use meteorological, hydrological and urban flood prediction models within the Flood Early Warning System (FEWS) to estimate peak water levels, when these will occur and which areas are likely to be flooded.
Storm surge warnings consist of predicted maximum water levels and a general description of wind and tide. 10-minute water level forecasts are computed and distributed, including details of wave run-up and overtopping for dikes.
Every 6 hours, new meteorological predictions are incorporated into the flood prediction, resulting in a new version of the 10-minute water level forecast dataset being made available.
tbd add example including:
The emergency team for the Nieuwhaven safety region compare the predictions for the forecast flooding event against the hypothetical scenarios developed in the Flood Response Plan to determine which of the prepared response plans to execute.
Based on this assessment, the imminent flooding event requires a number of temporary flood defenses to be deployed and evacuation of some districts of Nieuwhaven.
The emergency team identify where additional temporary flood defenses are required due to any dikes that are in a state of disrepair (e.g. having failed their annual or 5-yearly assessments).
tbd add example:
cross reference the location of each damaged dike with predicted high-water level, determined via an API call into the 10-minute water level forecasts to extract a water level time-series for a given point to determine if the water level is predicted to exceed a threshold, in which case, temporary flood defenses will be required.
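The threshold check described here could look like the following Python sketch. It assumes that the forecast API has already returned a water level time-series for the point of interest as (time, level in metres) pairs. The dike identifiers, threshold heights and forecast values are hypothetical, and a single series is reused for both dikes purely to keep the example short.

```python
from datetime import datetime, timedelta


def first_exceedance(series, threshold_m):
    """Return the first (time, level) above the threshold, or None."""
    for timestamp, level in series:
        if level > threshold_m:
            return timestamp, level
    return None


# Hypothetical 10-minute forecast for one location (levels in metres).
start = datetime(2017, 1, 1, 6, 0)
levels = [2.1, 2.4, 2.8, 3.1, 2.9]
forecast = [
    (start + timedelta(minutes=10 * i), level)
    for i, level in enumerate(levels)
]

# Hypothetical damaged dikes and the level each can still withstand.
damaged_dikes = {"dike-17": 3.0, "dike-42": 3.5}

needs_defense = {
    dike_id: first_exceedance(forecast, threshold)
    for dike_id, threshold in damaged_dikes.items()
}
for dike_id, hit in needs_defense.items():
    print(dike_id, "temporary defense required" if hit else "no action")
```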
Details of the emergency and the evacuation plan must be communicated to citizens. They are kept informed during and prior to the event using multiple channels:
The evacuation plan must be discoverable by the public. The intent is for each plan to be both human (primarily) and machine readable. The requirement for machine readability is mostly to support automated discovery of the content via web search. The URL itself ideally should also be "human friendly" as it should be easy to share verbally in addition to being embedded and linked to from other web pages.
Making the plans clear and understandable to human readers is well understood (and beyond the scope of this best practices document!); the challenge is to make the content machine readable. The use of a simple tag-based schema using microdata, RDFa or JSON-LD is recommended. A simple first step might be to use the schema.org "schema:Event" item tag <div class="event-wrapper" itemscope itemtype="http://schema.org/Event">, which has useful generic properties of date, location, duration etc. The places of evacuation refuges (e.g. schools, sports centers etc.) should be tagged using the generic "schema:Place".
tbd add example including:
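As one possible illustration, the Python below generates a JSON-LD description of an evacuation as a schema.org Event whose location is a schema.org Place, wrapped in a script element for embedding in the plan's Web page. The function name, the event details and the refuge are all fictional, and the choice of properties (name, startDate, endDate, location, address) is an assumption about what a publisher might want to expose.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def evacuation_event_jsonld(name, start, end, refuge_name, refuge_address):
    """Describe an evacuation as a schema.org Event located at a Place."""
    return {
        "@context": "http://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start,
        "endDate": end,
        "location": {
            "@type": "Place",
            "name": refuge_name,
            "address": refuge_address,
        },
    }


# Fictional values, for illustration only.
doc = evacuation_event_jsonld(
    name="Evacuation of Nieuwhaven harbour district",
    start="2017-01-01T08:00:00+01:00",
    end="2017-01-02T20:00:00+01:00",
    refuge_name="Nieuwhaven Sports Center",
    refuge_address="Sportlaan 1, Nieuwhaven",
)
script = SCRIPT_OPEN + json.dumps(doc, indent=2) + SCRIPT_CLOSE
print(script)
```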
Details of the evacuation route should be provided ideally as a textual description (perhaps machine readable using the schema.org "TravelAction" item, although this is rather limited) and a graphical representation. Potentially route information could be encoded using a format such as OpenLR but this has not achieved widespread adoption.
tbd add example including:
News and media agencies provide Web applications that help communicate the evacuation to citizens as effectively as possible; e.g. by creating simple Web applications that direct one to the correct evacuation plan based on their postal code or online mapping tools. Media agencies may cross-reference evacuation plans with Features that have non-official identifiers; e.g. from What3Words (W3W) or GeoNames.
tbd add example including:
During a flood event, the Flood Response Plan indicates that emergency services will have to focus their efforts on reducing the number of fatalities. This means that if an evacuation order is given, the efforts of the emergency services will be focused on traffic control and on non-self-reliant groups.
As the flood event progresses, the emergency services provide evacuation assistance for the vulnerable, such as the residents of care homes.
The municipal public health authority publishes details of care homes and other health care facilities on-line as open data, using a simple CSV format.
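A consumer of such a CSV file might select the facilities in the evacuated districts as in this Python sketch. The column names, facility names and district names are hypothetical; the real file layout would be defined by the publisher.

```python
import csv
import io

# Hypothetical CSV content, for illustration only.
CSV_TEXT = """name,district,latitude,longitude,residents
Huize Zeezicht,Havenkwartier,52.001,4.301,48
Zorgcentrum De Linde,Binnenstad,52.010,4.312,73
Woonzorg Duinrand,Havenkwartier,52.003,4.298,35
"""


def facilities_in_districts(csv_text, districts):
    """Return the rows whose district is in the given set of districts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["district"] in districts]


to_evacuate = facilities_in_districts(CSV_TEXT, {"Havenkwartier"})
for row in to_evacuate:
    print(row["name"], row["latitude"], row["longitude"], row["residents"])
```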
tbd add example including:
The position of each vehicle used by the emergency services is tracked in near real-time using GPS. The coordinators within the emergency team are able to view both current position and where the vehicles have been; gauging the evacuation progress against the Flood Response Plan.
tbd add example including:
During floods and storm surges, professionals (often aided by trained volunteers) constantly monitor all flood defenses. Measurements include: water level, wave height, wind speed and direction.
The emergency team use these observations to monitor the rising water levels to ensure that these are consistent with the predictions (both in terms of timing and peak water-level) in case additional interventions, such as evacuating more districts, are required.
A real-time data stream of water level at a specific location within Nieuwhaven’s canals is published from an automated monitoring system operated by the Water Board; e.g. a Web-enabled sensor.
Example tbd including:
Fortunately, the prediction is sufficiently accurate that the evacuation plan remains effective. However, the emergency team notice that the water level in one particular sector is higher than predicted, and rising. Further analysis indicates that an automated control gate has malfunctioned.
A team is dispatched to use the manual override. The manual control is located using relative positioning.
tbd add example including:
The Spatial Data on the Web working group is working on recommendations about the use of formats for publishing spatial data on the web, specifically about selecting the most appropriate format. There may not be one most appropriate format: which format is best may depend on many things. This section gives two tables that both aim to be helpful in selecting the right format in a given situation. These tables may in future be merged or reworked in other ways.
The first table is a matrix of the common formats, showing in general terms how well these formats help achieve goals such as discoverability, granularity etc.
Format | Openness | Binary/text | Usage | Discoverability | Granular links | CRS Support | Verbosity | Semantics vocab? | Streamable | 3D Support |
---|---|---|---|---|---|---|---|---|---|---|
ESRI Shape | Open'ish | Binary | Geometry only attributes and metadata in linked DB files | Poor | In Theory? | Yes | Lightweight | No | No | Yes |
GeoJSON [RFC7946] | Open | Text | Geometry and attributes inline array | Good ? | In Theory? | No | Lightweight | No | No | No |
DXF | Proprietary | Binary | Geometry only attributes and metadata in linked DB files | Poor | Poor | No | Lightweight | No | No | Yes |
GML | Open | Text | Geometry and attributes inline or xlinked | Good ? | In Theory ? | Yes | Verbose | No | No | Yes |
KML | Open | Text | Geometry and attributes inline or xlinked | Good ? | In Theory ? | No | Lightweight | No | Yes? | Yes |
The second table is much more detailed, listing the formats currently in wide use for spatial data and scoring each format against many detailed aspects.
Aspect | GML | GML-SF0 | JSON-LD | GeoSPARQL (vocabulary) | schema.org | GeoJSON | KML | GeoPackage | Shapefile | GeoServices / Esri JSON | Mapbox Vector Tiles |
---|---|---|---|---|---|---|---|---|---|---|---|
Governing Body | OGC, ISO | OGC | W3C | OGC | Google, Microsoft, Yahoo, Yandex | Authors (now in IETF process) | OGC | OGC | Esri | Esri | Mapbox |
Based on | XML | GML | JSON | RDF | HTML with RDFa, Microdata, JSON-LD | JSON | XML | SQLite, SF SQL | dBASE | JSON | Google protocol buffers |
Requires authoring of a vocabulary/schema for my data (or use of existing ones) | Yes (using XML Schema) | Yes (using XML Schema) | Yes (using @context) | Yes (using RDF schema) | No, schema.org specifies a vocabulary that should be used | No | No | Implicitly (SQLite tables) | Implicitly (dBASE table) | No | No |
Supports reuse of third party vocabularies for features and properties | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
Supports extensions (geometry types, metadata, etc.) | Yes | No | Yes | Yes | Yes | No (under discussion in IETF) | Yes (rarely used except by Google) | Yes | No | No | No |
Supports non-simple property values | Yes | No | Yes | Yes | Yes | Yes (in practice: not used) | No | No | No | No | No |
Supports multiple values per property | Yes | No | Yes | Yes | Yes | Yes (in practice: not used) | No | No | No | No | No |
Supports multiple geometries per feature | Yes | Yes | n/a | Yes | Yes (but probably not in practice?) | No | Yes | No | No | No | No |
Support for Coordinate Reference Systems | any | any | n/a | many | WGS84 latitude, longitude | WGS84 longitude, latitude with optional elevation | WGS84 longitude, latitude with optional elevation | many | many | many | WGS84 spherical mercator projection |
Support for non-linear interpolations in curves | Yes | Only arcs | n/a | Yes (using GML) | No | No | No | Yes, in an extension | No | No | No |
Support for non-planar interpolations in surfaces | Yes | No | n/a | Yes (using GML) | No | No | No | No | No | No | No |
Support for solids (3D) | Yes | Yes | n/a | Yes (using GML) | No | No | No | No | No | No | No |
Feature in a feature collection document has URI (required for ★★★★) | Yes, via XML ID | Yes, via XML ID | Yes, via @id keyword | Yes | Yes, via HTML ID | No | Yes, via XML ID | No | No | No | No |
Support for hyperlinks (required for ★★★★★) | Yes | Yes | Yes | Yes | Yes | No | No | No | No | No | No |
Media type | application/gml+xml | application/gml+xml with profile parameter | application/ld+json | application/rdf+xml, application/ld+json, etc. | text/html | application/geo+json (formerly application/vnd.geo+json) | application/vnd.google-earth.kml+xml, application/vnd.google-earth.kmz | - | - | - | - |
Remarks | comprehensive and supporting many use cases, but requires strong XML skills | simplified profile of GML | no support for spatial data, GeoJSON-LD is under discussion | GeoSPARQL also specifies related extension functions for SPARQL; other spatial vocabularies exist, see ??? | schema.org markup is indexed by major search engines | supported by many mapping APIs | focussed on visualisation of and interaction with spatial data, typically in Earth browsers like Google Earth | used to support "native" access to spatial data across all enterprise and personal computing environments, including mobile devices | supported by almost all GIS | mainly used via the GeoServices REST API | used for sharing spatial data in tiles, mainly for display in maps |
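The coordinate order listed for GeoJSON in the table above (WGS84 longitude, latitude, with optional elevation) is a common source of mistakes. The following is a minimal sketch, not a definitive implementation, of building a GeoJSON point feature in Python. The helper name `point_feature` and the approximate coordinates used for Edinburgh are assumptions made for illustration only.

```python
import json


def point_feature(lon, lat, elevation=None, properties=None):
    """Build a GeoJSON Feature for one WGS84 point (hypothetical helper).

    GeoJSON positions are ordered longitude, latitude, then optional
    elevation, so the arguments are taken in that same order.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range: %r" % (lon,))
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range: %r" % (lat,))
    coordinates = [lon, lat]
    if elevation is not None:
        coordinates.append(elevation)
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": coordinates},
        "properties": properties or {},
    }


# Approximate, illustrative coordinates only (assumption).
feature = point_feature(-3.19, 55.95, properties={"name": "Edinburgh"})
print(json.dumps(feature, sort_keys=True))
```

The range checks catch some, but not all, swapped latitude and longitude values: a swapped pair is only rejected when the value passed as latitude falls outside -90 to 90.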
As noted in item 4 of https://www.w3.org/DesignIssues/LinkedData.html, it is useful for people to link their data to other related data. In this context we are most frequently talking about Spatial Things, their geometry, or both.
There are many useful sets of identifiers for spatial things and which ones are most useful will depend on context. This involves discovering relevant URIs that you might want to connect to.
Relevant URIs for spatial things can be found in many places. This list gives the first places you should check:
Finding out which national open spatial datasets are available, and how they can be accessed, currently requires prior knowledge in most cases because these datasets are often not easily discoverable. Look for national data portals / geoportals such as Nationaal Georegister (Dutch national register of spatial datasets) or Dataportaal van de Nederlandse overheid (Dutch national governmental data portal).
As an example, let's take Edinburgh. In some recent work with the Scottish Government, we have an identifier for the City of Edinburgh Council Area - i.e. the geographical area that Edinburgh City Council is responsible for:
http://statistics.gov.scot/id/statistical-geography/S12000036
(note that this URI doesn't resolve yet but it will in the next couple of months once the system goes properly live)
The UK government provides an identifier for Edinburgh and/or information about it that we might want to link to:
http://statistics.data.gov.uk/id/statistical-geography/S12000036
The Scottish identifier is directly based on this one, but the Scottish Government wanted the ability to create something dereferenceable, potentially with additional or different info to the data.gov.uk one. These two are owl:sameAs.
DBpedia also includes a resource about Edinburgh. Relationship: "more or less the same as" but probably not the strict semantics of owl:sameAs.
http://data.ordnancesurvey.co.uk/id/50kGazetteer/81482
This Edinburgh resource is found by querying the OS gazetteer search service for 'Edinburgh' then checking the labels of the results that came up. OS give it a type of 'NamedPlace' and give it some coordinates.
http://data.ordnancesurvey.co.uk/id/50kGazetteer/81483
This Edinburgh airport resource was also found using the same OS gazetteer search service for 'Edinburgh'. This is clearly not the same as the original spatial thing, but you might want to say something like 'within' or 'hasAirport'.
http://data.ordnancesurvey.co.uk/id/7000000000030505
This resource is in the OS 'Boundary Line' service that contains administrative and statistical geography areas in the UK. It's probably safe to say the original identifier is owl:sameAs this one.
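The owl:sameAs relationships described so far could be published as RDF statements. The sketch below serializes them as N-Triples using only the Python standard library. The helper name `same_as_triples` is hypothetical, and the link to the OS Boundary Line resource reflects the "probably safe" judgement above rather than a confirmed equivalence. The looser relationships (DBpedia, the gazetteer entries) are deliberately left out because the text does not settle on a predicate for them.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(subject, targets):
    """Return one N-Triples line per owl:sameAs link (hypothetical helper).

    Assumes the URIs contain no characters that need escaping in N-Triples.
    """
    return ["<%s> <%s> <%s> ." % (subject, OWL_SAME_AS, target)
            for target in targets]


scottish = "http://statistics.gov.scot/id/statistical-geography/S12000036"
equivalents = [
    # UK government identifier, stated to be owl:sameAs in the text.
    "http://statistics.data.gov.uk/id/statistical-geography/S12000036",
    # OS Boundary Line identifier, judged "probably safe" in the text.
    "http://data.ordnancesurvey.co.uk/id/7000000000030505",
]

triples = same_as_triples(scottish, equivalents)
print("\n".join(triples))
```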
http://sws.geonames.org/2650225
This is the GeoNames resource for Edinburgh found using the search service:
http://api.geonames.org/search?name=Edinburgh&type=rdf&username=demo
Once you have found a place in GeoNames, there are other useful services to find things that are nearby.
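The GeoNames search URL shown above can be assembled programmatically so that place names are correctly URL-encoded. This is a minimal sketch that only builds the request URL and makes no network call. The function name `geonames_search_url` is hypothetical, and the parameters (`name`, `type`, `username`) are simply those that appear in the example URL.

```python
from urllib.parse import urlencode

GEONAMES_SEARCH = "http://api.geonames.org/search"


def geonames_search_url(name, username="demo", result_type="rdf"):
    """Build a GeoNames search URL (hypothetical helper).

    Only the query parameters that appear in the example URL are used.
    """
    query = urlencode({"name": name, "type": result_type, "username": username})
    return GEONAMES_SEARCH + "?" + query


url = geonames_search_url("Edinburgh")
print(url)
```

The `demo` username is the one used in the example URL. Real use would presumably substitute a registered account name.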
Glossary section needs improving; see existing sources of definitions.
Consider adopting definitions from, or aligning definitions with the ISO/TC 211 Glossary - see linked data prototype.
For example, the Coverage definition is considered unclear and potentially inconsistent with the ISO definition of Coverage.
Need consistency in how we cite existing specifications.
Why do some references go to the glossary (e.g. WFS) and some to the references (e.g. SPARQL)? Maybe WFS etc. should be added to the references, too, and the text should include a link both to the glossary and the references?
Coverage: A coverage is a function that describes characteristics of real-world phenomena that vary over space and/or time. Typical examples are temperature, elevation and precipitation. A coverage is typically represented as a data structure containing a set of such values, each associated with one of the elements in a spatial, temporal or spatiotemporal domain. Typical spatial domains are point sets (e.g. sensor locations), curve sets (e.g. contour lines), grids (e.g. orthoimages, elevation models), etc. A property whose value varies as a function of time may be represented as a temporal coverage or time-series [ISO-19109] §8.8.
Coordinate Reference System (CRS): A coordinate-based local, regional or global system used to locate geographical entities. Also known as Spatial Reference System. Compare with the ISO definition.
Ellipsoid: An ellipsoid is a closed quadric surface that is a three-dimensional analogue of an ellipse. In geodesy a reference ellipsoid is a mathematically defined surface that approximates the geoid.
Extent: The area covered by something. Within this document we always imply spatial extent; e.g. size or shape that may be expressed using coordinates.
Feature: Abstraction of real world phenomena. Compare with the ISO definition. [ISO-19101] §4.11
Geocoding: Forward geocoding, often just referred to as geocoding, is the process of converting addresses into geographic coordinates. Reverse geocoding is the opposite process; converting geographic coordinates to addresses. Compare with the ISO definition.
Geohash: A geocoding system with a hierarchical spatial data structure which subdivides space into buckets. Geohashes offer properties like arbitrary precision and the possibility of gradually removing characters from the end of the code to reduce its size (and gradually lose precision). As a consequence of the gradual precision degradation, nearby places will often (but not always) present similar prefixes. The longer a shared prefix is, the closer the two places are. (wikipedia)
Geographic information (also geospatial data): Information concerning phenomena implicitly or explicitly associated with a location relative to the Earth. Compare with the ISO definition. [ISO-19101] §4.16.
Geographic information system (GIS): An information system dealing with information concerning phenomena associated with location relative to the Earth. Compare with the ISO definition. [ISO-19101] §4.18
Geometry: An ordered set of n-dimensional points; can be used to model the spatial extent or shape of a spatial thing.
Geoid: An equipotential surface where the gravitational field of the Earth has the same value at all locations. This surface is perpendicular to a plumb line at all points on the Earth's surface and is roughly equivalent to the mean sea level excluding the effects of winds and permanent currents such as the Gulf Stream.
Hypermedia: to be added
Internet of Things (IoT): The network of physical objects or "things" embedded with electronics, software, sensors, and network connectivity, which enables these objects to be controlled remotely and to collect and exchange data.
JavaScript Object Notation (JSON): A lightweight, text-based, language-independent data interchange format defined in [RFC7159]. It was derived from the ECMAScript Programming Language Standard. JSON defines a small set of formatting rules for the portable representation of structured data.
Link: A typed connection between two resources that are identified by Internationalized Resource Identifiers (IRIs) [RFC3987], and is comprised of: (i) a context IRI, (ii) a link relation type, (iii) a target IRI, and (iv) optionally, target attributes. Note that in the common case, the IRI will also be a URI [RFC3986], because many protocols (such as HTTP) do not support dereferencing IRIs [RFC5988].
Linked data: The term ‘Linked Data’ refers to an approach to publishing data that puts linking at the heart of the notion of data, and uses the linking technologies provided by the Web to enable the weaving of a global distributed database [LDP-PRIMER].
Open-world assumption (OWA): In a formal system of logic used for knowledge representation, the open-world assumption asserts that the truth value of a statement may be true irrespective of whether or not it is known to be true. This assumption codifies the informal notion that in general no single agent or observer has complete knowledge. In essence, from the absence of a statement alone, a deductive reasoner cannot (and must not) infer that the statement is false. (wikipedia)
Resource Description Framework (RDF): A directed, labeled graph data model for representing information in the Web. It may be serialized in a number of data formats such as N-Triples [N-TRIPLES], XML [RDF-SYNTAX-GRAMMAR], Terse Triple Language (“turtle” or TTL) [TURTLE] and JSON-LD [JSON-LD].
Semantic web: The term “Semantic Web” refers to the World Wide Web Consortium's vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data.
SPARQL: A query language for RDF; it can be used to express queries across diverse data sources [SPARQL11-OVERVIEW].
Spatial data: Data describing anything with spatial extent; i.e. size, shape or position. In addition to describing things that are positioned relative to the Earth (also see geospatial data), spatial data may also describe things using other coordinate systems that are not related to position on the Earth, such as the size, shape and positions of cellular and sub-cellular features described using the 2D or 3D Cartesian coordinate system of a specific tissue sample.
Spatial database: A spatial database, or geodatabase, is a database that is optimized to store and query data that represents objects defined in a geometric space. Most spatial databases allow representation of simple geometric objects such as points, lines and polygons and provide functions to determine spatial relationships (overlaps, touches etc.).
Spatial Data Infrastructure (SDI): An ecosystem of geographic data, metadata, tools, applications, policies and users that are necessary to acquire, process, distribute, use, maintain, and preserve spatial data. Due to its nature (size, cost, number of interactors) an SDI is often government-related.
Spatial thing: Anything with spatial extent, i.e. size, shape, or position; e.g. people, places, bowling balls, as well as abstract regions like cubes. Compare with the ISO definition for Spatial Object. [W3C-BASIC-GEO]
Temporal thing: Anything with temporal extent, i.e. duration; e.g. the taking of a photograph, a scheduled meeting, a GPS time-stamped track-point. [W3C-BASIC-GEO]
Triple-store (or quadstore): A triple-store or RDF store is a purpose-built database for the storage and retrieval of RDF subject-predicate-object “triples” through semantic queries. Many implementations are actually “quad-stores” as they also hold the name of the graph within which a triple is stored.
Universe of discourse: View of the real or hypothetical world that includes everything of interest. Compare with the ISO definition. [ISO-19101] §4.29
Web Coverage Service (WCS): A service offering multi-dimensional coverage data for access over the Internet [WCS]
Web Feature Service (WFS): A standardized HTTP interface allowing requests for geographical features across the web using platform-independent calls [WFS].
Web Map Service (WMS): A standardized HTTP interface for requesting geo-registered map images from one or more distributed spatial databases [WMS].
Well Known Text (WKT): A text markup language for representing vector geometry objects on a map, spatial reference systems of spatial objects and transformations between spatial reference systems. (wikipedia)
Web Processing Service (WPS): An interface standard which provides rules for standardizing inputs and outputs (requests and responses) for invoking spatial processing services, such as polygon overlay, as a Web service [WPS].
Extensible Markup Language (XML): A simple, very flexible text-based markup language derived from SGML (ISO 8879). It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. [XML11]
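To illustrate the Geohash entry in the glossary above, the following sketch encodes a latitude and longitude by repeatedly halving the coordinate ranges and interleaving the resulting bits, starting with longitude. It assumes the commonly used base-32 alphabet for geohashes. The function name `geohash_encode` and the sample coordinates are illustrative and are not taken from this document.

```python
# Base-32 alphabet commonly used for geohashes (omits a, i, l and o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a WGS84 position as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0          # bits collected for the current character
    bit_count = 0     # number of bits collected so far (0..5)
    use_lon = True    # bits alternate between longitude and latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid      # keep the upper half of the range
        else:
            bits = bits << 1
            rng[1] = mid      # keep the lower half of the range
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:    # every 5 bits yield one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


short = geohash_encode(57.64911, 10.40744, precision=4)
longer = geohash_encode(57.64911, 10.40744, precision=9)
print(short)                      # u4pr
print(longer.startswith(short))   # True
```

Because each extra character only refines the same bisection, truncating a geohash yields the code of a larger enclosing cell, which is the prefix property described in the definition.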
The editors gratefully acknowledge the contributions made to this document by all members of the working group and the chairs: Kerry Taylor and Ed Parsons.
A full change-log is available on GitHub.
The document has undergone substantial changes since the first public working draft. Below are some of the changes made:
Section 14. Narrative - the Nieuwhaven flooding has also been introduced in this draft in an attempt to provide a context for the best practices using an end-to-end narrative based on an urban flooding event. This introduces a number of case studies that are intended to illustrate how the challenges associated with publishing different kinds of spatial data on the Web may be approached. This draft includes only the overview for each case study and does not (yet) describe the activities that each actor should undertake in order to publish their spatial data.
Significant updates to:
(further updates to these best practices are expected in the next WD release, circa end January 2017)
Plus minor changes that include adding to section 9 a list of the most important best practices for data publishers that start from an existing SDI, and changing a few best practice titles to include the word "spatial".
Significant updates to:
Also: