SmOD INSPIRE Vocabularies

Authors
Tatiana Tarasova, Spaziodati
Jindřich Mynarz, University of Economics, Prague
Phil Archer, W3C/ERCIM
Last Updated
19 August 2015

This document is also available in Turtle and RDF/XML.

Abstract

The SmartOpenData project, SmOD, developed a Linked Data model based on the European Union's INSPIRE data specifications. The SmOD work led to the creation of a set of very small vocabularies whose classes and properties mirror those parts of INSPIRE that were useful to a series of pilots focusing on the rural economy, tourism, protected sites, etc.

This document describes and aggregates the set of SmOD-INSPIRE vocabularies.

Status of this Document

This vocabulary is stable. Definitions may be updated to clarify semantics if appropriate but the basic definitions will not change. If you wish to add terms from the INSPIRE model that are not included here, please contact Phil Archer.

This is not a W3C standard and has not been endorsed by the W3C Membership.


Introduction

The INSPIRE data model is large, complex, and designed for use in a Geospatial Information System rather than for Linked Data. Rather than trying to replicate the whole model in RDF, SmOD takes a much more Linked Data centric approach, re-using concepts wherever necessary but simplifying them where possible. An important reference in this work was the Study on RDF and PIDs for INSPIRE by Diederik Tirry and Danny Vandenbroucke. It summarised work by three experts: Clemens Portele, Linda van den Brink and Stuart Williams.

In addition to the vocabularies described here, the Smart Open Data Project also developed a specific vocabulary for its pilot projects and a SKOS concept scheme for the Corine Land Cover Nomenclature.

Namespaces

The first recommendation of the Study on RDF and PIDs is that RDF namespaces should be aligned with the XML namespaces so that, for example, the namespace used for Protected Sites should be http://inspire.jrc.ec.europa.eu/schemas/ps/3.0/. However, the project received anecdotal information that it would be unwise to wait for the RDF schemas to be developed and published at those URLs. Therefore the decision was taken to use the W3C infrastructure with namespaces of the form http://www.w3.org/2015/03/inspire/{xx} where xx is the INSPIRE theme. The W3C website is extremely stable and these namespaces should be considered persistent, although they are not the product of any W3C working group. If the European Commission's Joint Research Centre were to publish its own schemas then the ones created by SmOD would be deprecated in favour of them. If, however, the JRC or other parties wish to extend the vocabularies hosted at W3C then this would be possible, particularly through the Locations and Addresses Community Group.
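As a sketch, the theme namespaces can be declared in Turtle as shown here. The prefixes and the trailing '#' follow the namespace given for each theme later in this document. Nothing else is assumed.

```turtle
# Prefix declarations for the SmOD-INSPIRE theme vocabularies,
# all following the pattern http://www.w3.org/2015/03/inspire/{xx}
@prefix ps: <http://www.w3.org/2015/03/inspire/ps#> .  # Protected Sites
@prefix lu: <http://www.w3.org/2015/03/inspire/lu#> .  # Land Use
@prefix au: <http://www.w3.org/2015/03/inspire/au#> .  # Administrative Units
@prefix br: <http://www.w3.org/2015/03/inspire/br#> .  # Bio-Geographical Regions
@prefix sd: <http://www.w3.org/2015/03/inspire/sd#> .  # Species Distribution
@prefix lc: <http://www.w3.org/2015/03/inspire/lc#> .  # Corine Land Cover
@prefix ef: <http://www.w3.org/2015/03/inspire/ef#> .  # Environmental Monitoring Facilities
@prefix cp: <http://www.w3.org/2015/03/inspire/cp#> .  # Cadastral Parcels
```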

The GCM & Geographical Names

The most visible difference between the INSPIRE model and the SmOD interpretation is the elimination of the Geographical Names theme. The full INSPIRE model supports the provision of multiple spellings of place names, using multiple scripts, linked to audio files for pronunciation and more. Initial work in the project used a model that mirrored this. However, the result was a lot of complexity in the data, with many of the properties unused and several unnecessary blank nodes. The recommendation in section 3.2.12 of the Study on RDF and PIDs is simply to use rdfs:label (a string with an optional language tag). This simplification risks losing some of the rich data that might be available in some situations but is absent in the SmOD pilots. Making this change was in line with the overall recommendation of 'thinking Linked Data' and, at a stroke, made the data in the pilots much easier to work with.

The SmOD partners have also used GeoSPARQL's gsp:SpatialObject class in preference to creating a new class of gcm:SpatialObject (INSPIRE has a 'generic concept model' at its core). The definition of gsp:SpatialObject is: "The class Spatial Object represents everything that can have a spatial representation. It is superclass of feature and geometry." [GeoSPARQL, PDF, p6]. This very general class therefore fits very well within INSPIRE and the SmOD pilots and there is no gain in defining an INSPIRE-specific class of the same name.
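A minimal sketch of the resulting pattern is a gsp:SpatialObject named with language-tagged rdfs:label values. The resource URI is hypothetical, and gsp is assumed to be bound to the standard GeoSPARQL namespace.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix gsp:  <http://www.opengis.net/ont/geosparql#> .

# Hypothetical resource, for illustration only
<http://example.org/place/brussels> a gsp:SpatialObject ;
    # One label per language replaces the full Geographical Names structure
    rdfs:label "Brussels"@en , "Bruxelles"@fr , "Brussel"@nl .
```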

The Model, Theme by Theme

This section describes the decisions made when incorporating each INSPIRE theme in the SmartOpenData model.

The SmartOpenData model - a Linked Data interpretation of several INSPIRE themes used in the project

Protected Sites

Namespace: http://www.w3.org/2015/03/inspire/ps#
Class: ps:ProtectedSite
Object Properties: ps:legalFoundationDocument, ps:siteDesignation, ps:siteProtectionClassification, ps:isManagedBy
Datatype property: ps:legalFoundationDate
Concept Schemes used: INSPIRE Registry, Protection Classification
External vocabularies used: FOAF, ORG

Protected Sites are defined by a document that details the relevant protection. This might be legislation but is more usually some sort of order or notice. Using ps:legalFoundationDocument to link to a class describing a document, such as the gcm:DocumentCitation class, is quite awkward from a Linked Data point of view. The natural thing to do is simply to link to the document itself, and that document will be an instance of the well-used foaf:Document class. The ps:legalFoundationDocument property has this as its range. But such documents may not be available online (or their URL may be unknown), so a method of referring to offline documents needs to be provided. SmOD simplifies the various properties of the INSPIRE gcm:DocumentCitation class (title, shortname, date etc.) down to the dcterms:bibliographicCitation property. Where a legal foundation document is not available online, a blank node is created in the graph with this property, giving a reference to the actual document. Where the document does exist online, it is linked to directly.
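The two cases might be sketched as follows. The site and document URIs and the citation text are hypothetical, and typing the blank node as foaf:Document is an assumption based on the stated range of the property.

```turtle
@prefix ps:      <http://www.w3.org/2015/03/inspire/ps#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# Case 1: the legal foundation document is online, so link to it directly
<http://example.org/site/1> a ps:ProtectedSite ;
    ps:legalFoundationDocument <http://example.org/legislation/order-2009-12> .

# Case 2: the document is offline, so a blank node carries a citation instead
<http://example.org/site/2> a ps:ProtectedSite ;
    ps:legalFoundationDocument [
        a foaf:Document ;
        dcterms:bibliographicCitation "Example Nature Reserve Designation Order, 12 March 2009"
    ] .
```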

The Protected Sites theme is the first of many in the SmartOpenData model that makes use of the SKOS concept schemes published in the INSPIRE Registry.

The ps:siteDesignation property can point to one or more of the specialisations of the Designation Value code list type (http://inspire.ec.europa.eu/codelist/DesignationValue/). These are:

So for example, if a site were designated as an area of special conservation under Natura 2000, the value of the ps:siteDesignation property would be http://inspire.ec.europa.eu/codelist/Natura2000DesignationValue/specialAreaOfConservation.

The provision of SKOS concept schemes makes this easy and avoids this or any other project writing its own version of designation schemes like Natura 2000. However, the Registry does not provide SKOS concept schemes for all aspects of INSPIRE or for some of the closely related data models. For example, the ps:siteProtectionClassification property in the Protected Sites theme takes one of 7 enumerated values:

In XML-centric systems these would be provided as strings but in Linked Data they are better rendered as SKOS concepts so that they can be pointed to via their URI, given multilingual labels etc. A very simple SKOS concept scheme, with the namespace http://www.w3.org/2015/03/inspire/ProtectionClassification#, was created to provide such URIs for the values in the ProtectedClassificationValue enumeration. Replacing the namespace URI with the prefix pspc, each of the terms in the list can be referred to as pspc:natureConservation, pspc:archaeological etc. Note that the lower camel case capitalisation has been preserved from the original, rather than the more usual practice in Linked Data of naming classes using title case.
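Putting the two properties together, a site description could look like the sketch below. The site URI is hypothetical. The designation URI and the pspc term are the ones mentioned in the text.

```turtle
@prefix ps:   <http://www.w3.org/2015/03/inspire/ps#> .
@prefix pspc: <http://www.w3.org/2015/03/inspire/ProtectionClassification#> .

# Hypothetical site, for illustration only
<http://example.org/site/1> a ps:ProtectedSite ;
    # Value taken from the INSPIRE Registry code list
    ps:siteDesignation <http://inspire.ec.europa.eu/codelist/Natura2000DesignationValue/specialAreaOfConservation> ;
    # Value taken from the SmOD concept scheme for the enumeration
    ps:siteProtectionClassification pspc:natureConservation .
```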

The SmOD data model makes use of the Protected Sites Simple data model from INSPIRE but takes one extra class and relationship from Protected Sites Full, namely ps:isManagedBy. This has a range of foaf:Agent to keep the vocabulary as general as possible but it is expected that in practice org:Organization (or one of its sub classes) will be used. org:Organization is a sub class of foaf:Agent. In some cases foaf:Group or even foaf:Person will be better; both of these are also sub classes of foaf:Agent. The ORG ontology is recommended because, as well as recording basic information like the organisation's name, it has the following features:

The latter aspect matches the Responsible Agency class in Protected Sites Full that includes properties for recording the beginning and end of the agency's lifespan.
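A minimal sketch of ps:isManagedBy pointing at an organisation described with ORG follows. Both URIs and the name are hypothetical. skos:prefLabel is used for the name because that is the labelling property the ORG ontology itself recommends.

```turtle
@prefix ps:   <http://www.w3.org/2015/03/inspire/ps#> .
@prefix org:  <http://www.w3.org/ns/org#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical site and managing body, for illustration only
<http://example.org/site/1> a ps:ProtectedSite ;
    ps:isManagedBy <http://example.org/org/parks-agency> .

# org:Organization is a sub class of foaf:Agent, the range of ps:isManagedBy
<http://example.org/org/parks-agency> a org:Organization ;
    skos:prefLabel "Example Parks Agency"@en .
```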

Land Use

Namespace: http://www.w3.org/2015/03/inspire/lu#
Class: lu:ExistingLandUseObject
Object Property: lu:hilucsLandUse

In the same way that the INSPIRE Registry is used as a source of SKOS concepts as values for the ps:siteDesignation property, the lu:hilucsLandUse property can link a Spatial Object to one of the values from the code list at http://inspire.ec.europa.eu/codelist/HILUCSValue/. The domain of lu:hilucsLandUse is lu:ExistingLandUseObject and so systems can, at least in theory, infer that any Spatial Object that has a lu:hilucsLandUse property is also an instance of lu:ExistingLandUseObject.
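The inference can be sketched as follows. The parcel URI is hypothetical, and the code list value 1_PrimaryProduction is assumed to exist in the Registry and should be checked against the published code list.

```turtle
@prefix lu:   <http://www.w3.org/2015/03/inspire/lu#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# The domain declaration described in the text
lu:hilucsLandUse rdfs:domain lu:ExistingLandUseObject .

# Hypothetical data; the HILUCS value URI is an assumption
<http://example.org/parcel/42>
    lu:hilucsLandUse <http://inspire.ec.europa.eu/codelist/HILUCSValue/1_PrimaryProduction> .

# An RDFS-aware system may then infer the following triple:
# <http://example.org/parcel/42> a lu:ExistingLandUseObject .
```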

Administrative Units

Namespace: http://www.w3.org/2015/03/inspire/au#
Class: au:AdministrativeUnit
Object Properties: au:nationalLevel, au:country
Datatype property: au:nationalCode
Concept Schemes used: INSPIRE Registry, EEA Codelist for bio-geographical regions, Europe 2011

From a SmartOpenData perspective, the Administrative Units theme is very simple. The au:AdministrativeUnit class itself is defined as a sub class of the gsp:SpatialObject class so it inherits rdfs:label as the property for its name and the usual means of providing boundary information (via gsp:Geometry). Administrative Units typically have a national code associated with them and this is provided as a string value for the au:nationalCode property which is defined as a sub property of skos:notation.

The INSPIRE Registry provides a SKOS concept scheme for the Administrative Hierarchy Level (http://inspire.ec.europa.eu/codelist/AdministrativeHierarchyLevel/) with URIs for the 6 levels as
http://inspire.ec.europa.eu/codelist/AdministrativeHierarchyLevel/1stOrder/
http://inspire.ec.europa.eu/codelist/AdministrativeHierarchyLevel/2ndOrder/ etc. These URIs are the value for the au:nationalLevel property.

Finally, SmOD can make use of the Metadata Registry (MDR) provided by the European Publications Office as a source of URIs as values for au:country. This URI set provides the names of all countries in the world in all official languages of the EU and follows a predictable pattern, based on a country's three-letter ISO 3166-1 code:
http://publications.europa.eu/resource/authority/country/FIN
http://publications.europa.eu/resource/authority/country/GBR
etc.

The downside of using these URIs is that, for now, they are not resolvable. The Publications Office is known to be working on making them so but at the time of writing they are not following Linked Data principles – something the Publications Office is very aware of.
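Taken together, an administrative unit might be described as in this sketch. The unit URI, label and national code are hypothetical. The level and country URIs follow the patterns quoted in the text.

```turtle
@prefix au:   <http://www.w3.org/2015/03/inspire/au#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical administrative unit, for illustration only
<http://example.org/au/uusimaa> a au:AdministrativeUnit ;
    rdfs:label "Uusimaa"@fi ;
    # National code as a plain string (au:nationalCode is a sub property of skos:notation)
    au:nationalCode "01" ;
    au:nationalLevel <http://inspire.ec.europa.eu/codelist/AdministrativeHierarchyLevel/2ndOrder/> ;
    au:country <http://publications.europa.eu/resource/authority/country/FIN> .
```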

Bio-Geographical Regions

Namespace: http://www.w3.org/2015/03/inspire/br#
Class: br:Bio-geographicalRegion
Object Properties: br:regionClassification, br:regionClassificationLevel
Datatype property: au:nationalCode
Concept Schemes used: INSPIRE Registry, MDR (countries)

INSPIRE recognises 4 regional classification schemes within this theme:

The Natura 2000 And Emerald Bio-geographical Region Classification is the one of most interest for SmartOpenData. The European Environment Agency maintains this list and publishes it in a variety of formats.

The Natura 2000 And Emerald Bio-geographical Region Classification
Code                 Name                                   Region                   pre_2012
alpine               Alpine Bio-geographical Region         Bio-geographical Region  ALP
Anatolian            Anatolian Bio-geographical Region      Bio-geographical Region  ANA
arctic               Arctic Bio-geographical Region         Bio-geographical Region  ARC
atlantic             Atlantic Bio-geographical Region       Bio-geographical Region  ATL
blackSea             Black Sea Bio-geographical Region      Bio-geographical Region  BLS
boreal               Boreal Bio-geographical Region         Bio-geographical Region  BOR
continental          Continental Bio-geographical Region    Bio-geographical Region  CON
macaronesian         Macaronesian Bio-geographical Region   Bio-geographical Region  MAC
marineAtlantic       Marine Atlantic Region                 Marine Region            MATL
marineBaltic         Marine Baltic Region                   Marine Region            MBAL
marineBlackSea       Marine Region Black Sea                Marine Region            MBLS
marineMacaronesian   Marine Macaronesian Region             Marine Region            MMAC
marineMediterranean  Marine Mediterranean Region            Marine Region            MMED
Mediterranean        Mediterranean Bio-geographical Region  Bio-geographical Region  MED
pannonian            Pannonian Bio-geographical Region      Bio-geographical Region  PAN
steppic              Steppic Bio-geographical Region        Bio-geographical Region  STE

An RDF vocabulary at http://rdfdata.eionet.europa.eu/eea/biogeographic-regions2011.rdf provides URIs for each of the classifications in the form http://rdfdata.eionet.europa.eu/eea/biogeographic-regions2011/{code} and these can be used as values for the br:regionClassification property. Since this is not published as a SKOS Concept scheme per se, the range of br:regionClassification is undefined. The property could therefore also be used to link to Concept schemes for this or any of the other regional classification schemes. In common with lu:hilucsLandUse, the domain of br:regionClassification is defined. In this case the domain is br:Bio-geographicalRegion, which allows systems to infer that a gsp:SpatialObject with the property is also an instance of its subclass, br:Bio-geographicalRegion. This is not shown in the diagram to aid readability.

The br:regionClassificationLevel property links directly to the concept scheme in the INSPIRE Registry at http://inspire.ec.europa.eu/codelist/RegionClassificationLevelValue/ that provides URIs for the 4 possible values of International, Local, National and Regional in the form http://inspire.ec.europa.eu/codelist/RegionClassificationLevelValue/{code} where {code} is the term from the list, all in lower case.
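As a sketch, both properties can be applied to a region as shown here. The region URI is hypothetical. The two value URIs are built from the patterns given in the text, using the code 'alpine' and the lower-case term 'international'.

```turtle
@prefix br: <http://www.w3.org/2015/03/inspire/br#> .

# Hypothetical region, for illustration only
<http://example.org/region/alps> a br:Bio-geographicalRegion ;
    # EEA bio-geographical regions vocabulary, pattern .../biogeographic-regions2011/{code}
    br:regionClassification <http://rdfdata.eionet.europa.eu/eea/biogeographic-regions2011/alpine> ;
    # INSPIRE Registry code list, pattern .../RegionClassificationLevelValue/{code}
    br:regionClassificationLevel <http://inspire.ec.europa.eu/codelist/RegionClassificationLevelValue/international> .
```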

Species Distribution

Namespace: http://www.w3.org/2015/03/inspire/sd#
Class: sd:SpeciesDistributionUnit, sd:Species
Object Properties: sd:eunisSpeciesCode, sd:occurenceCategory, sd:hasSpecies
Datatype properties: sd:eunomenID, smod:eunomenPage
Concept Schemes used: INSPIRE Registry, EUNIS/EEA, EU-NOMEN

EU-Nomen: 97523, EUNIS: 1023 (Yellowhammer). Photo credit Saxifraga

The Species Distribution theme provides a framework to support detailed information about population densities, counting methodologies etc. For SmOD, and again, when 'thinking Linked Data,' it is sufficient to use a simpler model.

The sd:SpeciesDistributionUnit class uses sd:hasSpecies to link to a class that represents any species of interest. This is equivalent to INSPIRE's Species Name Type. Species can be identified in multiple ways.

The European Environment Agency maintains its European Nature Information System, EUNIS, as a URI set for species and serves the data as HTML or RDF/XML using content negotiation. The URIs are of the form http://eunis.eea.europa.eu/species/{species No} so that, for example, the yellowhammer is identified by http://eunis.eea.europa.eu/species/1023.

The data returned from the EUNIS system is comprehensive, providing the species' vernacular name in the official languages of the EU and equivalent identifiers from many other schemes. SmOD defines the domain of both sd:eunisSpeciesCode and sd:occurenceCategory as sd:Species. The range of sd:occurenceCategory is SKOS Concept and the INSPIRE Registry provides the relevant concept scheme at http://inspire.ec.europa.eu/codelist/OccurrenceCategoryValue/ but for sd:eunisSpeciesCode, the range is eunis:SpeciesSynonym, the type defined in the EUNIS data.

One of the other identifiers included in the EUNIS data is the EU-Nomen identifier which is present in some of the data used in the SmOD pilots. This identifier can be included directly in SmOD data using the sd:eunomenID property, which is defined as a subProperty of skos:notation. The literal value, e.g. 97523, is typed as such. A further property, smod:eunomenPage, links the species to its EU-Nomen Web page, e.g. http://www.eu-nomen.eu/portal/taxon.php?GUID=urn:lsid:faunaeur.org:taxname:97523

This page, indeed EU-Nomen as a whole, is not Linked Data friendly since non-URI identifiers are used and the associated information is only available as a Web page, not as RDF. The EUNIS system uses the eunis:sameSynonymFaEu property to provide the EU-Nomen species number and the property is defined as having a domain of eunis:SameSynonym and a range of rdfs:Literal. This is semantically close enough to define sd:eunomenID as a sub property of this as well as of skos:notation. The two together provide the detailed semantics we need: that the literal value is of a specific type and that the value can also be matched against values of the eunis:sameSynonymFaEu property.

As with any class, properties like rdfs:label may be used to give the name of the species as a string literal if needed.
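The species pattern can be sketched as follows, using the yellowhammer identifiers quoted above. The distribution unit and species URIs are hypothetical. The EU-Nomen identifier is written as a plain literal because its datatype is not specified here.

```turtle
@prefix sd:   <http://www.w3.org/2015/03/inspire/sd#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical distribution unit linking to a species of interest
<http://example.org/distribution/7> a sd:SpeciesDistributionUnit ;
    sd:hasSpecies <http://example.org/species/yellowhammer> .

# Hypothetical species resource, identified via EUNIS and EU-Nomen
<http://example.org/species/yellowhammer> a sd:Species ;
    rdfs:label "Yellowhammer"@en ;
    sd:eunisSpeciesCode <http://eunis.eea.europa.eu/species/1023> ;
    # Plain literal; the exact datatype is an open assumption
    sd:eunomenID "97523" .
```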

Corine Land Cover

Namespace: http://www.w3.org/2015/03/inspire/lc#
Class: None
Object Properties: lc:corineLandCover
Concept Schemes used: Corine Land Cover

The Corine Land Cover taxonomy is defined by EIONET and published as a set of Web pages. At the time of writing, the SmOD partners understand that plans are in place to publish it as a SKOS Concept scheme but that has not yet happened. Therefore, a scheme was created and published at http://www.w3.org/2015/03/corine. The definition text for each concept is taken from the EIONET pages and served in HTML, RDF/XML and Turtle. The RDF serialisations also include labels and definitions in Spanish and Slovak as well as English, the latter taken from SAŽP's website. SAŽP also supplied the RGB colours associated with each Corine Land Cover type and these are used in the HTML page.

An issue to highlight in this work is the choice of identifiers for each CLC type. In available data these usually appear as three digit numbers, sometimes with and sometimes without separating dots (e.g. 111 or 1.1.1). It therefore proved much easier to use these numbers in the URIs than to use the names to create URIs like http://www.w3.org/2015/03/corine#ContinuousUrbanFabric. However, XML, and therefore RDF, requires that class names begin with either a letter or an underscore, so each class name begins with 'clc', something that can easily be inserted when processing input data.
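A sketch of how a CLC code from input data could end up in the graph is given here. The parcel URI is hypothetical, and the fragment form #clc111 is inferred from the naming rule described above rather than quoted from the published scheme.

```turtle
@prefix lc: <http://www.w3.org/2015/03/inspire/lc#> .

# Input code "1.1.1" (or "111"): drop the dots and prepend 'clc'
# Hypothetical parcel; the concept URI form is an assumption
<http://example.org/parcel/42>
    lc:corineLandCover <http://www.w3.org/2015/03/corine#clc111> .
```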

It should be noted that Clemens Portele created a vocabulary for recording Corine Land Cover datasets. This captures the full complexity of the original model that, again, goes beyond what is required for SmOD. The equivalent property to lc:corineLandCover in Portele's work is lcv:class defined thus:

lcv:class a owl:ObjectProperty ;
  rdfs:comment "The range is a type for which no RDF representation is known: LandCoverClassValue"@en ;
  rdfs:range owl:Class ;
  skos:definition  "The assignment of a land cover class through a classification code identifier"@en ;
  skos:notation "class"^^xsd:NCName ;
  skos:prefLabel "class"@en ;
  skos:scopeNote "The identifier, eg 1, 1.1.2, ... (for CORINE LC classes) allow to access to the value and the definition or narrative description of the corresponding class."@en .

Rather than create the missing SKOS Concept scheme, Portele defines a general object property (confusingly called 'class') that has a range of owl:Class. SmOD would like to refer to this work but to keep the data simple, the lc:corineLandCover property is defined as a sub property of lcv:class and has a range of skos:Concept.

Environmental Monitoring Facilities

Namespace: http://www.w3.org/2015/03/inspire/ef#
Class: ef:EnvironmentalMonitoringFacility
Object Property: ef:mediaMonitored
Datatype Properties: ef:specialisedEMFType, ef:purpose
Concept Schemes used: INSPIRE Registry

The EF vocabulary was created based on the INSPIRE Environmental Monitoring Facilities (EMF) theme (PDF). The scope includes the monitoring facilities and the observations linked to them. The latter are defined in a separate document. However, it is worth noting here that the RDF Data Cube vocabulary has been used extensively, as it provides a method for recording statistical hypercube data in RDF.

One of the specific use cases for using the EF theme is to combine Protected Sites from Natura2000 and various water measurements from Waterbase (Lakes, Rivers and Underground Waters). Several possible user queries were defined that needed to be addressed within the SmOD project, some of which involved further datasets. These queries dictated the requirements for the model as follows:

To satisfy these requirements, the following classes and properties were defined.

ef:EnvironmentalMonitoringFacility is a spatial entity that collects or processes data about real-world objects whose properties (physical, chemical, biological or other aspects of environmental conditions) are observed or measured. Within the SmOD vocabularies, it is defined as a sub-class of gsp:SpatialObject.

ef:specialisedEMFType provides categorisation of EMFs, such as platform, site, station, sensor, etc. INSPIRE has a code list for this; however, it is empty and therefore adds little value. For the ARPA pilot, the need is to represent the EMF type for humans, not to integrate or link it with similar datasets, so ef:specialisedEMFType is a datatype property that takes a literal value.

The code list for 'purpose', defined in SmOD as ef:purpose, is also empty so, again, SmOD defines this as a datatype property.

The INSPIRE Registry does, however, include a SKOS Concept Scheme that offers values for the ef:mediaMonitored property (air, biota, etc.). This object property therefore has a range of skos:Concept and a domain of ef:EnvironmentalMonitoringFacility.
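The three properties might be combined as in the following sketch. The facility URI and the literal values are hypothetical, and the MediaValue code list URI is assumed to be where the Registry publishes the media concepts.

```turtle
@prefix ef: <http://www.w3.org/2015/03/inspire/ef#> .

# Hypothetical monitoring facility, for illustration only
<http://example.org/emf/station-12> a ef:EnvironmentalMonitoringFacility ;
    # Datatype properties: free-text literals, since the INSPIRE code lists are empty
    ef:specialisedEMFType "station" ;
    ef:purpose "water quality monitoring" ;
    # Object property: a SKOS concept from the Registry (URI is an assumption)
    ef:mediaMonitored <http://inspire.ec.europa.eu/codelist/MediaValue/water> .
```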

Cadastral Parcels

Namespace: http://www.w3.org/2015/03/inspire/cp#
Class: cp:CadastralParcel
Datatype Property: cp:nationalCadastralReference

This very simple vocabulary introduces the Cadastral Parcel class, defined as a sub class of gsp:SpatialObject. Just one property is defined, cp:nationalCadastralReference, the value of which should be the thematic identifier at national level, generally the full national code of the basic property unit. It must ensure the link to the national cadastral register or equivalent.
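A minimal sketch of a cadastral parcel follows. The parcel URI and the reference value are hypothetical, since the actual format depends on the national cadastral register.

```turtle
@prefix cp: <http://www.w3.org/2015/03/inspire/cp#> .

# Hypothetical parcel; the reference string format is an assumption
<http://example.org/parcel/42> a cp:CadastralParcel ;
    cp:nationalCadastralReference "EX-0042-0007" .
```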