LOCN extension: Metadata
Scope
Andrea Perego: To be reviewed / revised.
This extension is meant to propose how to represent spatial (Andrea Perego: and temporal?) features of datasets (Andrea Perego: and services?), complementing existing vocabularies such as DCAT and VoID.
Metadata can play an important role in the discovery and usability of Linked Data datasets. Metadata provide a compact summary of a dataset, which makes them well suited for indexing any collection of datasets that can be found on the web, and such indexes can greatly improve the discoverability of data. Once a dataset has been discovered, its metadata can tell data consumers how to interact with the data. If a dataset contains spatial data, various additional facts about it can be useful. We want to list those facts and find the best way of expressing them.
Background
Related use cases
- Linked Data based thematic web mapping
- Space and Time
- Sub-properties for locn:geometry
- CRS specification
- INSPIRE metadata
Related proposals
Proposal for extension of LOCN with properties for Coordinate Reference System and Level of Detail
Discussion
- A real world example: Dutch registry of buildings and addresses
- LOCN extension for dataset metadata
- Should time be modelled too?
- A proposal for two additional properties for LOCN
Relevant vocabularies
- Data Catalog Vocabulary (DCAT). W3C Recommendation 16 January 2014
- Describing Linked Datasets with the VoID Vocabulary. W3C Interest Group Note 03 March 2011
- OWL representation of ISO 19115 (Geographic Information - Metadata). Simon Cox, CSIRO. November 2013.
- OWL representation of ISO 19108 (Geographic Information - Temporal Schema). Simon Cox, CSIRO. May 2014.
- Dublin Core Metadata Initiative
Relevant dataset features
Extent/coverage - spatial and temporal
Predicates
There is general agreement on specifying this information by using, as predicates, dct:coverage and its sub-properties dct:spatial (spatial coverage) and dct:temporal (temporal coverage).
Encoding / representation
Spatial coverage
Current solutions for the representation / encoding of spatial coverage can be grouped into two main classes:
- Geographical names, expressed by using
- URIs, such as those operated by GeoNames or DBpedia (see the example included in the DCAT specification).
- Code lists - e.g., country codes, NUTS.
- Free text labels.
- Geometries, expressed by using one of the existing options (WKT, GML, KML, GeoJSON, GeoSPARQL, NeoGeo). Geometries can also be denoted by URIs, resolving to a given geometry encoding/representation and/or including the geometry encoding in one of their components (typically, the path URI component). Examples of the latter are the "geo" URI scheme, GeoHash, and URIs following the proposal made by Ian Davis in URIs for Places and Times.
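To make the idea of a geometry carried inside an identifier more concrete, the following is a minimal Python sketch of GeoHash encoding (the usual interleaved-bit algorithm over WGS 84 longitude/latitude), with a decoder that returns the cell a GeoHash denotes and a helper that writes that cell as a WKT bounding-box polygon. The function names are hypothetical, and the longitude-first axis order in the WKT output is an assumption (as in CRS84); this is an illustration, not a recommended encoding.

```python
# GeoHash base32 alphabet: digits and lowercase letters without a, i, l, o.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a WGS 84 point as a GeoHash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    even = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:  # upper half: emit 1 and keep the upper interval
            bits = (bits << 1) | 1
            rng[0] = mid
        else:             # lower half: emit 0 and keep the lower interval
            bits <<= 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_bbox(geohash: str) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) of the GeoHash cell."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    even = True
    for ch in geohash:
        idx = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            rng = lon_range if even else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            even = not even
    return lon_range[0], lat_range[0], lon_range[1], lat_range[1]


def bbox_to_wkt(bbox) -> str:
    """Write a bounding box as a closed WKT POLYGON (longitude first)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    ring = [(min_lon, min_lat), (max_lon, min_lat), (max_lon, max_lat),
            (min_lon, max_lat), (min_lon, min_lat)]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"
```

Shorter GeoHashes denote larger cells, so truncating a GeoHash is one simple way to express a coarser spatial coverage.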
Recommended solutions should satisfy the following requirements (Andrea Perego: thinking aloud...):
- Being interoperable/compatible.
- Supporting different levels of precision and uncertainty.
- Supporting different spatial reference systems.
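The "geo" URI scheme (RFC 5870) is one existing encoding that touches the last two requirements: it carries an optional uncertainty (";u=", in metres) and an optional CRS label (";crs=", defaulting to WGS 84). A minimal, deliberately lenient Python parser might look as follows; the function name and the shape of the returned dictionary are hypothetical, and parameter order is not enforced.

```python
from urllib.parse import unquote


def parse_geo_uri(uri: str) -> dict:
    """Split a geo URI into coordinates, CRS label, uncertainty and other params."""
    if not uri.lower().startswith("geo:"):
        raise ValueError(f"not a geo URI: {uri!r}")
    path, *params = uri[4:].split(";")
    # With the default wgs84 CRS the order is latitude, longitude[, altitude].
    coords = [float(c) for c in path.split(",")]
    if len(coords) not in (2, 3):
        raise ValueError("expected 2 or 3 coordinates")
    result = {
        "coordinates": coords,
        "crs": "wgs84",         # default when no ;crs= parameter is given
        "uncertainty_m": None,  # unknown when no ;u= parameter is given
        "other": {},
    }
    for param in params:
        name, _, value = param.partition("=")
        name = name.lower()  # parameter names are case-insensitive
        if name == "crs":
            result["crs"] = value.lower()
        elif name == "u":
            result["uncertainty_m"] = float(value)
        else:
            result["other"][name] = unquote(value)
    return result
```

Note that, unlike WKT in CRS84, the geo URI puts latitude before longitude, which is one of the interoperability pitfalls the first requirement refers to.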
Temporal coverage
Formally, the range of dct:temporal is dct:PeriodOfTime. Although the Dublin Core User Guide on Publishing Metadata states that "the property must only be used with non-literal values", no specific terms are defined or recommended for this purpose.
Current solutions include:
- Defining ad hoc properties, denoting start/end dates. To promote standardization, the ADMS vocabulary recommends the use of schema:startDate and schema:endDate, respectively.
- Using URIs denoting "time intervals" (see the example included in the DCAT specification). URI spaces for time instants and intervals have been proposed by Ian Davis back in 2003 (see URIs for Places and Times), and are currently operated, e.g., by reference.data.gov.uk.
- Names for geologic time intervals are standardized by the International Commission on Stratigraphy. URIs for these are provided by the IUGS Commission for Geoscience Information, and delivered as linked data and at a SPARQL endpoint.
- Using the W3C Time Ontology.
- Using literals expressing time intervals by using a given syntax - e.g., the DCMI Period Encoding Scheme, xsd:dateTime (start date) + separator + xsd:dateTime (end date), or xsd:dateTime + separator + xsd:duration (NB: quite often, xsd:date is used instead of xsd:dateTime).
- Using names denoting time periods.
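Of the options above, literals are the easiest to produce but the hardest to consume, because each syntax needs its own parser. The following minimal Python sketch handles two of the syntaxes mentioned: the DCMI Period Encoding Scheme ("name=...; start=...; end=...; scheme=...") and a slash-separated "start/end" or "start/duration" form. The use of "/" as separator (as in ISO 8601 intervals) is an assumption, since the text leaves the separator open; the function name is hypothetical, and missing parts come back as None.

```python
# Components defined by the DCMI Period Encoding Scheme.
_DCMI_KEYS = ("name", "start", "end", "scheme")


def parse_period_literal(literal: str) -> dict:
    """Parse a period literal into name, start, end, scheme and duration."""
    period = {"name": None, "start": None, "end": None,
              "scheme": None, "duration": None}
    text = literal.strip()
    if "=" in text:  # DCMI Period: semicolon-separated name=value pairs
        for component in text.split(";"):
            key, sep, value = component.partition("=")
            key, value = key.strip().lower(), value.strip()
            if sep and key in _DCMI_KEYS and value:
                period[key] = value
        return period
    # Assumed "start/end" or "start/duration" form; either side may be empty.
    first, _, second = text.partition("/")
    period["start"] = first.strip() or None
    second = second.strip()
    if second.startswith("P"):  # xsd:duration, e.g. P1Y6M
        period["duration"] = second
    else:
        period["end"] = second or None
    return period
```

An empty side (e.g. "/2014-12-31") yields an open-ended period, which is one way literals could accommodate missing start or end dates.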
Recommended solutions should satisfy the following requirements (Andrea Perego: thinking aloud...):
- Being interoperable/compatible.
- Supporting different levels of precision (year, year+month, year+month+day, year+month+day+hour, etc.).
- Allowing missing start or end dates (Andrea Perego: ??).
- Supporting different temporal reference systems (Andrea Perego: ??).
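With ISO 8601 / W3C-DTF style values, the precision requirement can be met simply by truncating the value, so a consumer only needs to detect how far a value goes. A minimal Python sketch follows; the pattern is a lenient assumption (it accepts an hour without minutes, which W3C-DTF itself does not, and it does not validate month or day ranges), and the function name is hypothetical.

```python
import re

# Lenient ISO 8601 / W3C-DTF pattern: each finer component is optional.
_DTF = re.compile(
    r"^(?P<year>-?\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2})"
    r"(?::(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2}(?:\.\d+)?))?)?"
    r"(?:Z|[+-]\d{2}:\d{2})?"  # optional time zone designator
    r")?)?)?$"
)
_LEVELS = ("year", "month", "day", "hour", "minute", "second")


def temporal_precision(value: str) -> str:
    """Return the finest component present in a date or date-time value."""
    match = _DTF.match(value.strip())
    if match is None:
        raise ValueError(f"not an ISO 8601 / W3C-DTF value: {value!r}")
    precision = "year"
    for level in _LEVELS:  # components are nested, so the last match is finest
        if match.group(level) is not None:
            precision = level
    return precision
```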
Level of detail - spatial/temporal resolution
Predicates
A previous version of the DCAT vocabulary included a property, dcat:granularity, that might have been suitable for specifying the level of detail of a dataset.
VoID, on the other hand, includes a property, void:feature, which is meant to specify, in the words of the specification, "certain technical features of a dataset, such as its supported RDF serialization formats". It is unclear whether this definition covers the level of detail of a dataset.
Encoding / Representation
Specifying the level of detail includes the ability to represent quantities and units of measure.
Discussion on the LOCADD mailing list of Frans Knibbe's proposal identified a few possible options:
- use literals, with a default unit of measure
- encode quantity + unit of measure in a literal (see mail from Andrea Perego, 4 Sep 2014)
- use the GoodRelations vocabulary (see mail from Bart van Leeuwen, 5 Sep 2014)
- use the QUDT ontologies (see mail from Frans Knibbe, 10 Sep 2014)
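The first two options differ only in whether the unit is written out, so a consumer can support both at once. The following minimal Python sketch reads a spatial resolution literal such as "25 m" or "0.5 km", falling back to a default unit when none is given. The literal syntax (number, optional whitespace, unit symbol) and the hard-coded conversion table are assumptions for illustration; a real solution would rather rely on a units vocabulary such as QUDT.

```python
import re

# Hypothetical conversion table to metres (stand-in for a units vocabulary).
_TO_METRES = {"mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0}

# A number, optional whitespace, then an optional unit symbol.
_QUANTITY = re.compile(r"^\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]*)\s*$")


def resolution_in_metres(literal: str, default_unit: str = "m") -> float:
    """Convert a 'quantity [unit]' literal to metres, using a default unit."""
    match = _QUANTITY.match(literal)
    if match is None:
        raise ValueError(f"cannot parse quantity: {literal!r}")
    unit = match.group("unit") or default_unit  # option: literal without unit
    try:
        factor = _TO_METRES[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None
    return float(match.group("value")) * factor
```

The default-unit convention only works if every publisher agrees on the default, which is an argument for always writing the unit out.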
Coordinate Reference System (CRS) - spatial and temporal
A dataset containing geometries can use one or more coordinate reference systems for their coordinates. It is useful for a user agent to know which CRS is used, because the CRS determines whether the data can be plotted on a map, or spatially combined with other data, without the need for a coordinate transformation.
Being able to express the Coordinate Reference System(s) used in a dataset requires two ingredients:
- A predicate to indicate CRS
- A stable dataset or vocabulary covering all (or most) coordinate reference systems, giving each CRS a URI
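As an illustration of how CRS URIs could be used once available, the following minimal Python sketch reads the CRS from GeoSPARQL-style WKT literals, where an optional CRS IRI precedes the WKT text and CRS84 is assumed when it is absent, and then checks whether two datasets can be combined without transformation. It sidesteps the open question of the dataset-level predicate; the function names and the sample literals (EPSG:28992, the Dutch national grid, and a CRS84 point) are assumptions chosen for illustration.

```python
# OGC definition URI used by GeoSPARQL as the default CRS of wktLiteral.
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


def crs_of_wkt_literal(literal: str) -> str:
    """Return the CRS IRI of a WKT literal, or CRS84 if none is given."""
    text = literal.strip()
    if text.startswith("<"):
        return text[1:text.index(">")]
    return CRS84


def dataset_crss(geometries) -> set[str]:
    """Collect the distinct CRS IRIs used by a dataset's geometry literals."""
    return {crs_of_wkt_literal(g) for g in geometries}


def needs_transformation(crss_a, crss_b) -> bool:
    """True unless both datasets use (at most) one and the same CRS."""
    return len(set(crss_a) | set(crss_b)) > 1
```

If the set of CRS IRIs were published in the dataset metadata, a user agent could make this decision without retrieving any geometry.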
The spatial nature of the data
To help with indexing datasets and discovering spatial data, it is useful to have a well-known indication that the data are of a spatial nature, i.e., that they contain data on locations, possibly expressed as geometries.
In the VoID specification there is an example where dcterms:subject is used to indicate that a dataset is about locations (http://www.w3.org/TR/void/#subject):
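@prefix : <http://example.org/> .  # placeholder namespace, not part of the VoID example
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
:Geonames a void:Dataset; dcterms:subject <http://dbpedia.org/resource/Location> .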
Questions about this approach:
- Can a link to a DBpedia resource be considered stable enough, or would a link to a vocabulary resource be preferable?
- Besides indicating that a dataset contains spatial data, it could be desirable to distinguish further between geometries (sets of coordinates) and other location data, such as addresses. How could that be achieved?
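An index of dataset descriptions could apply the VoID convention roughly as in the following minimal Python sketch. It assumes the descriptions have already been parsed into (subject, predicate, object) tuples of full IRIs; the function name is hypothetical. The set of "spatial" subject IRIs is a parameter, so the DBpedia resource could be replaced by, or complemented with, a vocabulary term if that proves more stable.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID_DATASET = "http://rdfs.org/ns/void#Dataset"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
LOCATION = "http://dbpedia.org/resource/Location"


def spatial_datasets(triples, spatial_subjects=frozenset({LOCATION})) -> set[str]:
    """Return void:Dataset IRIs whose dcterms:subject marks them as spatial."""
    triples = list(triples)  # allow any iterable, scanned twice below
    datasets = {s for s, p, o in triples if p == RDF_TYPE and o == VOID_DATASET}
    return {s for s, p, o in triples
            if s in datasets and p == DCT_SUBJECT and o in spatial_subjects}
```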