Warning:
This wiki has been archived and is now read-only.

LOCN extension: Metadata

From Locations and Addresses Community Group

Scope

Andrea Perego (talk) To be reviewed / revised

This extension proposes how to represent the spatial (Andrea Perego (talk) and temporal?) features of datasets (Andrea Perego (talk) and services?), complementing existing vocabularies such as DCAT and VoID.

Metadata can play an important role in the discovery and usability of Linked Data datasets. Because metadata provide a compact summary of a dataset, they are well suited to indexing collections of datasets found on the web, and such indexes can greatly improve the discoverability of data. Once a dataset has been discovered, its metadata can tell data consumers how to interact with the data. When a dataset contains spatial data, several additional facts about it become useful. We want to list those facts and find the best way of expressing them.

Background

Related use cases

Related proposals

Proposal for extension of LOCN with properties for Coordinate Reference System and Level of Detail

Discussion

Relevant vocabularies

Relevant dataset features

Extent/coverage - spatial and temporal

Predicates

There is general agreement that this information should be specified using the predicate dct:coverage and its sub-properties dct:spatial (spatial coverage) and dct:temporal (temporal coverage).

Encoding / representation

Spatial coverage

Current solutions for the representation / encoding of spatial coverage can be grouped into two main classes:

  1. Geographical names, expressed by using
    • URIs - such as those operated by GeoNames or DBpedia (see the example included in the DCAT specification).
    • Code lists - e.g., country codes, NUTS.
    • Free text labels.
  2. Geometries, expressed by using one of the existing options (WKT, GML, KML, GeoJSON, GeoSPARQL, NeoGeo). Geometries can also be denoted by URIs, resolving to a given geometry encoding/representation and/or including the geometry encoding in one of their components (typically, the path URI component). Examples of the latter are the "geo" URI scheme, GeoHash, and URIs following the proposal made by Ian Davis in URIs for Places and Times.
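
As a rough illustration of the two classes above, the following Turtle sketch describes the spatial coverage of two datasets via dct:spatial, once with a place URI and once with a geometry. The ex: namespace, the dataset URIs and the bounding-box coordinates are made up for illustration. The GeoNames URI is the one that appears in the DCAT specification example, and the combination of locn:geometry with a GeoSPARQL WKT literal is only one of the possible geometry encodings listed above, not a recommendation.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix locn: <http://www.w3.org/ns/locn#> .
@prefix gsp:  <http://www.opengis.net/ont/geosparql#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# Class 1: geographical name, denoted by a URI
ex:dataset-a a dcat:Dataset ;
  dct:spatial <http://www.geonames.org/6695072> .

# Class 2: geometry, here an invented bounding box encoded as WKT
# (no CRS URI in the literal, so longitude/latitude order is assumed)
ex:dataset-b a dcat:Dataset ;
  dct:spatial [
    a dct:Location ;
    locn:geometry "POLYGON((3.3 50.7, 7.3 50.7, 7.3 53.6, 3.3 53.6, 3.3 50.7))"^^gsp:wktLiteral
  ] .
```

Nothing prevents a publisher from providing both forms for the same dataset, which may help consumers that understand only one of them.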

Recommended solutions should satisfy the following requirements (Andrea Perego (talk) thinking aloud...):

  • Being interoperable/compatible.
  • Supporting different levels of precision and uncertainty.
  • Supporting different spatial reference systems.

Temporal coverage

Formally, the range of dct:temporal is dct:PeriodOfTime. Although the Dublin Core User Guide on Publishing Metadata states that "the property must only be used with non-literal values", no specific terms are defined or recommended for this purpose.

Current solutions include:

  1. Defining ad hoc properties, denoting start/end dates. To promote standardization, the ADMS vocabulary recommends the use of schema:startDate and schema:endDate, respectively.
  2. Using URIs denoting "time intervals" (see the example included in the DCAT specification). URI spaces for time instants and intervals were proposed by Ian Davis in 2003 (see URIs for Places and Times), and are currently operated, e.g., by reference.data.gov.uk.
  3. Using the W3C Time Ontology.
  4. Using literals that express time intervals in a given syntax - e.g., the DCMI Period Encoding Scheme, xsd:dateTime (start date) + separator + xsd:dateTime (end date), or xsd:dateTime + separator + xsd:duration (NB: quite often, xsd:date is used instead of xsd:dateTime).
  5. Using names denoting time periods.
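
To make the first four options easier to compare, here is a minimal Turtle sketch that expresses the same period (the first quarter of 2006) in each of them. The ex: namespace and the dataset URIs are hypothetical, and the exact shapes are an assumption about how each option would typically be written rather than an agreed pattern. The interval URI is the one used in the DCAT specification example.

```turtle
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix schema: <http://schema.org/> .
@prefix time:   <http://www.w3.org/2006/time#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:     <http://example.org/> .   # hypothetical namespace

# Option 1: start/end date properties, as recommended by ADMS
ex:dataset-a a dcat:Dataset ;
  dct:temporal [
    a dct:PeriodOfTime ;
    schema:startDate "2006-01-01"^^xsd:date ;
    schema:endDate   "2006-03-31"^^xsd:date
  ] .

# Option 2: URI denoting a time interval
ex:dataset-b a dcat:Dataset ;
  dct:temporal <http://reference.data.gov.uk/id/quarter/2006-Q1> .

# Option 3: W3C Time Ontology interval with explicit begin and end instants
ex:dataset-c a dcat:Dataset ;
  dct:temporal [
    a time:Interval ;
    time:hasBeginning [ a time:Instant ;
                        time:inXSDDateTime "2006-01-01T00:00:00Z"^^xsd:dateTime ] ;
    time:hasEnd       [ a time:Instant ;
                        time:inXSDDateTime "2006-03-31T23:59:59Z"^^xsd:dateTime ]
  ] .

# Option 4: literal in the DCMI Period Encoding Scheme
ex:dataset-d a dcat:Dataset ;
  dct:temporal "start=2006-01-01; end=2006-03-31;"^^dct:Period .
```

Note that the fourth shape uses a literal as the object of dct:temporal, which is at odds with the Dublin Core guidance quoted above.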

Recommended solutions should satisfy the following requirements (Andrea Perego (talk) thinking aloud...):

  • Being interoperable/compatible.
  • Supporting different levels of precision (year, year+month, year+month+day, year+month+day+hour, etc.).
  • Allowing missing start or end dates (Andrea Perego (talk) ??).
  • Supporting different temporal reference systems (Andrea Perego (talk) ??).

Level of detail - spatial/temporal resolution

Predicates

A previous version of the DCAT vocabulary included a property, dcat:granularity, which might have been suitable for specifying the level of detail of a dataset.

On the other hand, VoID includes a property, void:feature, which is meant to be used to specify, quoting, "certain technical features of a dataset, such as its supported RDF serialization formats". It is unclear whether this definition covers the level of detail of a dataset.

Encoding / Representation

Specifying the level of detail requires the ability to represent quantities and units of measure.

Discussion on the LOCADD mailing list concerning Frans Knibbe's proposal identified a few possible options:

  • use literals, with a default unit of measure
  • encode quantity + unit of measure in a literal (see mail from Andrea Perego, 4 Sep 2014)
  • use the GoodRelations vocabulary (see mail from Bart van Leeuwen, 5 Sep 2014)
  • use the QUDT ontologies (see mail from Frans Knibbe, 10 Sep 2014)
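
The following Turtle sketch shows what the first three options might look like for a spatial resolution of 10 metres. It is only an illustration under loud assumptions: no predicate for level of detail has been agreed, so ex:spatialResolutionInMetres, ex:spatialResolution and the ex:quantity datatype (including its "10 m" syntax) are all invented here, and they are not taken from the mails cited above. Only the GoodRelations terms are existing vocabulary.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix gr:   <http://purl.org/goodrelations/v1#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace; every ex: term is invented

# Plain literal; the unit (metres) is a default fixed by the property definition
ex:dataset-a a dcat:Dataset ;
  ex:spatialResolutionInMetres "10"^^xsd:decimal .

# Quantity and unit encoded together in one literal (hypothetical datatype and syntax)
ex:dataset-b a dcat:Dataset ;
  ex:spatialResolution "10 m"^^ex:quantity .

# GoodRelations: structured value, unit given as a UN/CEFACT Common Code ("MTR" = metre)
ex:dataset-c a dcat:Dataset ;
  ex:spatialResolution [
    a gr:QuantitativeValue ;
    gr:hasValue "10"^^xsd:decimal ;
    gr:hasUnitOfMeasurement "MTR"^^xsd:string
  ] .
```

The first shape is the simplest to query but fixes the unit once and for all. The structured shape is more verbose but keeps value and unit separately addressable.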

Coordinate Reference System (CRS) - spatial and temporal

A dataset containing geometry can use one or more coordinate reference systems for the coordinates in the geometry. A user agent benefits from knowing which CRS is used, because the CRS determines whether the data can be plotted on a map or spatially combined with other data without a coordinate transformation.

Being able to express the Coordinate Reference System(s) used in a dataset requires two ingredients:

  1. A predicate to indicate the CRS
  2. A stable dataset or vocabulary on all (or most) coordinate reference systems, resulting in each CRS having a URI
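
The two ingredients could fit together as in the Turtle sketch below. The predicate ex:crs and the dataset URI are hypothetical, since no predicate has been chosen. For the second ingredient, the sketch assumes that the CRS URIs published under the OGC definition namespace (http://www.opengis.net/def/crs/) are acceptable, which is an assumption of this example rather than a decision recorded on this page.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# ex:crs is an invented predicate; each object is a URI identifying one CRS
ex:dataset-a a dcat:Dataset ;
  ex:crs <http://www.opengis.net/def/crs/EPSG/0/4326> ,    # WGS 84 (EPSG:4326)
         <http://www.opengis.net/def/crs/EPSG/0/28992> .   # Amersfoort / RD New (EPSG:28992)
```

Listing two values reflects the statement above that a dataset can use more than one coordinate reference system.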

The spatial nature of the data

To help the indexing of datasets and the discoverability of spatial data, it is good to have a well-known indication that the data is of a spatial nature, i.e. that it contains data on locations, possibly expressed as geometry.

In the VoID specification there is an example where dcterms:subject is used to indicate that a dataset is about location (http://www.w3.org/TR/void/#subject):

 :Geonames a void:Dataset;
   dcterms:subject <http://dbpedia.org/resource/Location> .

Questions about this approach:

  1. Can a link to a DBpedia resource be considered stable enough? Or would a link to a vocabulary resource be preferable?
  2. Next to indicating that a dataset contains spatial data, it could be desirable to further distinguish between geometry (sets of coordinates) and other location data, like addresses. How could that be achieved?