Technology talk 1

From Spatial Data on the Web Working Group

Observations on the current state of affairs based on publication of a spatial dataset as linked data

This is a summary of the Tech Talk held during the teleconference on 2015-05-20. The talk was given by Frans Knibbe (Geodan), who made some observations on the current state of affairs of spatial data on the web, using as an example the publication of a dataset containing spatial data as Linked Data. The dataset is based on the Dutch Base Registry of Buildings and Addresses (Basisregistratie Adressen en Gebouwen, or BAG) and was published using Virtuoso Open Source 7.1. The dataset consists of about 700 million RDF triples. The demonstration consisted of dereferencing URIs and executing SPARQL queries in a web browser.

The URI of the dataset is http://lod.geodan.nl/basisreg/bag/. As is good practice, it gives access to metadata describing the dataset. There are some spatial things in the metadata:

  1. dcterms:subject is used to indicate that the dataset is spatial
  2. dcterms:spatial is used to indicate the spatial coverage

How these properties were used was open for interpretation. Perhaps it would be good to have some more standardisation. Some things were not possible to express in the metadata in a standardised way:

  1. What kind of spatial data does the dataset contain?
  2. Which CRS(s) is/are used?
  3. What is the spatial resolution of the data set (or parts of the dataset)?

Following one of the expressions of spatial coverage leads to http://lod.geodan.nl/basisreg/bag/SpatialExtent, which describes the spatial extent as a geometry. The geometry is encoded as a WKT literal, according to the GeoSPARQL vocabulary. It was noted that all geometries in the dataset use WGS84 coordinates. Originally, the coordinates used the Dutch national grid CRS, but they were transformed to WGS84 for web publication. It was felt that GeoSPARQL drives publishers to WGS84, because that is the CRS assumed when none is specified (strictly speaking the default is CRS84, which is WGS84 with longitude-latitude axis order). If a CRS is specified, its URI has to be prepended to the geometry inside the WKT literal, which has some undesirable effects. To avoid these, WGS84 was used. This seemed acceptable because of the relatively low accuracy of the coordinates. If the coordinates had been accurate to better than the meter level, WGS84 could not have been used.

An example of a GeoSPARQL geometry that includes a CRS specification can be found in this question on Stack Overflow.
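To make the concatenation of CRS URI and geometry concrete, here is a minimal Python sketch of how the lexical form of a GeoSPARQL WKT literal can be built and taken apart again. The helper names (make_wkt_literal, split_wkt_literal) are made up for this illustration, and the EPSG:28992 URI and coordinates are only an assumed example of the Dutch national grid; neither was shown in the talk.

```python
import re

# CRS that GeoSPARQL assumes when a wktLiteral carries no CRS URI.
DEFAULT_CRS = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Optional "<crs-uri>" at the start, followed by the WKT text itself.
_WKT_LITERAL = re.compile(r"^\s*(?:<([^>]+)>\s*)?(.+?)\s*$", re.DOTALL)


def make_wkt_literal(wkt, crs_uri=None):
    """Return the lexical form of a wktLiteral, prefixing the CRS URI if given."""
    return f"<{crs_uri}> {wkt}" if crs_uri else wkt


def split_wkt_literal(literal):
    """Split a wktLiteral lexical form into (crs_uri, wkt)."""
    match = _WKT_LITERAL.match(literal)
    if match is None:
        raise ValueError("not a wktLiteral lexical form")
    crs_uri, wkt = match.groups()
    # No explicit CRS means the default applies.
    return crs_uri or DEFAULT_CRS, wkt


# Assumed example: a point in the Dutch national grid (EPSG:28992).
rd_crs = "http://www.opengis.net/def/crs/EPSG/0/28992"
literal = make_wkt_literal("POINT(121000 487000)", rd_crs)
print(literal)
print(split_wkt_literal("POINT(4.9 52.3)"))
```

One of the undesirable effects is visible here: a client that only wants the geometry first has to strip the CRS URI from the string, and a client that only wants the CRS has to parse the literal to find it.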

The dataset metadata also indicate the SPARQL endpoint for the dataset: http://lod.geodan.nl/sparql. This was used for the following SPARQL queries.

The first query is an example of data enrichment and geocoding (finding the coordinates of a location). It finds the location of Beursplein 5, Amsterdam (the well-known address of the Dutch stock exchange), together with some additional data.

prefix bag: <http://lod.geodan.nl/vocab/bag#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?postalCode ?area ?purpose ?status ?geometry
where {
	?woonplaatsmut a bag:Woonplaatsmutatie .
	?woonplaatsmut bag:woonplaatsnaam "Amsterdam"^^xsd:string .
	?woonplaatsmut bag:woonplaats ?woonplaats .
	?openbareruimtemut a bag:Openbareruimtemutatie .
	?openbareruimtemut bag:woonplaats ?woonplaats .
	?openbareruimtemut bag:openbareruimtenaam "Beursplein"^^xsd:string .
	?openbareruimtemut bag:openbareRuimte ?openbareruimte .
	?numaandmut a bag:Nummeraanduidingmutatie .
	?numaandmut bag:huisnummer "5"^^xsd:integer .
	?numaandmut bag:openbareRuimte ?openbareruimte .
	?numaandmut bag:nummeraanduiding ?nummeraanduiding .
	?numaandmut bag:postcode ?postalCode .
	?adresseerbaarobjectmut a bag:Adresseerbaarobjectmutatie .
	?adresseerbaarobjectmut bag:hoofdadres ?nummeraanduiding .
	?adresseerbaarobjectmut bag:oppervlakte ?area .
	?adresseerbaarobjectmut bag:gebruiksdoel ?purpose .
	?adresseerbaarobjectmut bag:verblijfsobjectstatus ?status .
	?adresseerbaarobjectmut bag:geometrie ?geometry .
}
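In the talk the queries were run from a web browser. As a rough sketch of how the same could be done from a script, the Python below builds a SPARQL Protocol GET request and reads one column from a JSON result document. The helper names are invented for this illustration, the endpoint may no longer be online, and nothing here is sent over the network unless the commented-out lines are enabled.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint named in the dataset metadata; it may no longer be reachable.
ENDPOINT = "http://lod.geodan.nl/sparql"


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def column(results, variable):
    """Collect the values bound to one variable in a SPARQL JSON result document."""
    bindings = results["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


request = build_sparql_request(ENDPOINT, "select * where { ?s ?p ?o } limit 1")
print(request.full_url)

# To actually run the query (requires a live endpoint):
# import json
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(column(json.load(response), "s"))
```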

The next query shows reverse geocoding (find the location name or address from coordinates) and examples of spatial functions:

prefix bag: <http://lod.geodan.nl/vocab/bag#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select str(?name)
from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
where {
 ?wpmut a bag:Woonplaatsmutatie .
 ?wpmut bag:lastKnown "true"^^xsd:boolean .
 ?wpmut bag:geometrie ?geom .
 ?wpmut bag:woonplaatsnaam ?name
 filter (bif:st_within(?geom, bif:st_point (6.5,52)))
}

It was noted that although this query uses well-known OGC spatial functions (like st_within), the functions are not defined in an OGC namespace, but in a Virtuoso-specific one (indicated by the "bif" prefix).

The query above gives the expected result: the place name (city name) of the location identified by the coordinates (6.5,52). The query below is similar, but uses other coordinates:

prefix bag: <http://lod.geodan.nl/vocab/bag#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select str(?name)
from <http://lod.geodan.nl/basisreg/bag/woonplaats/>
where {
 ?wpmut a bag:Woonplaatsmutatie .
 ?wpmut bag:lastKnown "true"^^xsd:boolean .
 ?wpmut bag:geometrie ?geom .
 ?wpmut bag:woonplaatsnaam ?name
 filter (bif:st_within(?geom, bif:st_point (4.9,52.3)))
}

When these coordinates are used, the query returns two place names. This is an incorrect result, because a point can only be in one place. The cause is that this version of Virtuoso (which is currently not the latest version) does not use the actual geometries for spatial calculations, but their bounding boxes. This in turn shows that it could be a good idea to be able to test software for compliance with standards.
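The difference between a bounding box test and a test against the actual geometry can be reproduced with a small Python sketch. The two triangular "places" A and B below are invented for this illustration and have nothing to do with the real BAG geometries; they share the same bounding box but do not overlap. The exact test uses plain ray casting, which is one common way to do a point-in-polygon check and not necessarily what any particular triple store implements.

```python
def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) of a list of (x, y) vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def in_bounding_box(point, polygon):
    """Cheap test: is the point inside the polygon's bounding box?"""
    x, y = point
    min_x, min_y, max_x, max_y = bounding_box(polygon)
    return min_x <= x <= max_x and min_y <= y <= max_y


def in_polygon(point, polygon):
    """Exact test by ray casting: count edge crossings of a ray going right."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):  # edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical places: two triangles that split a square along its diagonal.
places = {
    "A": [(0, 0), (4, 0), (0, 4)],
    "B": [(4, 0), (4, 4), (0, 4)],
}
point = (1, 1)

bbox_hits = sorted(n for n, poly in places.items() if in_bounding_box(point, poly))
exact_hits = sorted(n for n, poly in places.items() if in_polygon(point, poly))
print(bbox_hits)   # both places match on bounding boxes
print(exact_hits)  # only one place actually contains the point
```

A check like this, run against known geometries, is also the kind of small test case that a compliance test for spatial functions could contain.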

A final example also consists of two similar queries. The first one requests all addresses having postal code "1079MB". It uses semantics from the vocabulary that was created for the BAG: http://lod.geodan.nl/vocab/bag.

prefix bag: <http://lod.geodan.nl/vocab/bag#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select *
from <http://lod.geodan.nl/basisreg/bag/nummeraanduiding/>
where {
  ?adres a bag:Nummeraanduidingmutatie .
  ?adres bag:postcode "1079MB"^^xsd:string .
}

The second query is similar to the one above, only the vocabulary that is used is different. The query below uses semantics from the Location Core Vocabulary (LOCN) (http://www.w3.org/ns/locn):

prefix locn: <http://www.w3.org/ns/locn#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select *
from <http://lod.geodan.nl/basisreg/bag/nummeraanduiding/>
where {
  ?address a locn:Address .
  ?address locn:postCode "1079MB"^^xsd:string .
}

This demonstrates that it is possible to query the same dataset using different models, in this case a national and a global model. This is made possible because some BAG resources are defined as subclasses of LOCN resources, and because Virtuoso Open Source has support for inference. The triples using LOCN semantics are inferred from the relationship between the specific vocabulary and the general vocabulary. This example shows that having simple global vocabularies for spatial data can be very beneficial for interoperability between different data sets.
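How such inferred triples come about can be sketched in a few lines of Python. The example applies the two RDFS rules for subclasses and subproperties to a tiny in-memory set of triples. The two mapping statements and the resource ex:adres1 are assumptions made for this illustration; the actual BAG vocabulary may define the relationship to LOCN differently, and Virtuoso's own inference machinery is not shown here.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
SUBPROPERTY_OF = "rdfs:subPropertyOf"


def infer(triples):
    """Apply the RDFS subclass and subproperty rules until nothing new appears."""
    closed = set(triples)
    while True:
        sub_class = {(s, o) for s, p, o in closed if p == SUBCLASS_OF}
        sub_prop = {(s, o) for s, p, o in closed if p == SUBPROPERTY_OF}
        new = set()
        for s, p, o in closed:
            # An instance of a subclass is also an instance of the superclass.
            if p == RDF_TYPE:
                new |= {(s, RDF_TYPE, sup) for sub, sup in sub_class if sub == o}
            # A statement using a subproperty also holds for the superproperty.
            new |= {(s, sup, o) for sub, sup in sub_prop if sub == p}
        if new <= closed:
            return closed
        closed |= new


data = {
    # Assumed mappings from the national to the global vocabulary.
    ("bag:Nummeraanduidingmutatie", SUBCLASS_OF, "locn:Address"),
    ("bag:postcode", SUBPROPERTY_OF, "locn:postCode"),
    # Hypothetical address resource, described with BAG terms only.
    ("ex:adres1", RDF_TYPE, "bag:Nummeraanduidingmutatie"),
    ("ex:adres1", "bag:postcode", "1079MB"),
}

inferred = infer(data)

# The LOCN-style query now matches, although no LOCN triples were stored.
matches = sorted(
    s for s, p, o in inferred
    if p == "locn:postCode" and o == "1079MB"
    and (s, RDF_TYPE, "locn:Address") in inferred
)
print(matches)
```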