Abstract

This document provides Best Practices related to the publication and usage of data on the Web designed to help support a self-sustaining ecosystem. Data should be discoverable and understandable by humans and machines. Where data is used in some way, whether by the originator of the data or by an external party, such usage should also be discoverable and the efforts of the data publisher recognized. In short, following these Best Practices will facilitate interaction between publishers and consumers.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

The Working Group believes that this document is now complete and ready to advance to Candidate Recommendation (call for implementations). If you have comments to make before that step is taken, please make them before Sunday 12 June 2016.

This document was published by the Data on the Web Best Practices Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-dwbp-comments@w3.org ( subscribe , archives ). All comments are welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy . W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .

This document is governed by the 1 September 2015 W3C Process Document .

1. Introduction

This section is non-normative.

The Best Practices described below have been developed to encourage and enable the continued expansion of the Web as a medium for the exchange of data. The growth in online sharing of open data by governments across the world [ OKFN-INDEX ] [ ODB ], the increasing online publication of research data encouraged by organizations like the Research Data Alliance [ RDA ], the harvesting, analysis and online publishing of social media data, crowd-sourcing of information, the increasing presence on the Web of important cultural heritage collections such as at the Bibliothèque nationale de France [ BNF ] and the sustained growth in the Linked Open Data Cloud [ LODC ], provide some examples of this growth in the use of the Web for publishing data.

However, this growth is not consistent in style and in many cases does not make use of the full potential of the Open Web Platform's ability to link one fact to another, to discover related resources and to create interactive visualizations.

In broad terms, data publishers aim to share data either openly or with controlled access. Data consumers (who may also be producers themselves) want to be able to find, use and link to the data, especially if it is accurate, regularly updated and guaranteed to be available at all times. This creates a fundamental need for a common understanding between data publishers and data consumers. Without this agreement, data publishers' efforts may be incompatible with data consumers' desires.

The openness and flexibility of the Web create new challenges for data publishers and data consumers, such as how to represent, describe and make data available in a way that it will be easy to find and to understand. In contrast to conventional databases, for example, where there is a single data model to represent the data and a database management system (DBMS) to control data access, data on the Web allows for the existence of multiple ways to represent and to access data. For more details about the challenges see the section Data on the Web Challenges .

In this context, it becomes crucial to provide guidance to publishers that will improve consistency in the way data is managed. Such guidance will promote the reuse of data and foster trust in the data among developers, whatever technology they choose to use, increasing the potential for genuine innovation.

Not all data and metadata should be shared openly, however. Security, commercial sensitivity and, above all, individuals' privacy need to be taken into account. It is for data publishers to determine policy on which data should be shared and under what circumstances. Data sharing policies are likely to assess the exposure risk and determine the appropriate security measures to be taken to protect sensitive data, such as secure authentication and authorization.

Depending on circumstances, sensitive information about individuals might include full name, home address, email address, national identification number, IP address, vehicle registration plate number, driver's license number, face, fingerprints, handwriting, credit card numbers, digital identity, date of birth, birthplace, genetic information, telephone number, login name, screen name, nickname, health records etc. Although it is likely to be safe to share some of that information openly, and even more within a controlled environment, publishers should bear in mind that combining data from multiple sources may allow inadvertent identification of individuals.

A general Best Practice to publish Data on the Web is to use standards. Different types of organizations specify standards that are specific to the publishing of datasets related to particular domains & applications, involving communities of users interested in that data. These standards define a common way of communicating information among the users of these communities. For example, there are two standards that can be used to publish transport timetables: the General Transit Feed Specification [ GTFS ] and the Service Interface for Real Time Information [ SIRI ]. These specify, in a mixed way, standardized terms, standardized data formats and standardized data access. The Best Practices set out in this document serve a general purpose of publishing and using Data on the Web and are domain & application independent. They can be extended or complemented by other Best Practices documents or standards that cover more specialized contexts.

Best Practices cover different aspects related to data publishing and consumption, like data formats, data access, data identifiers and metadata. In order to delimit the scope and elicit the required features for Data on the Web Best Practices, the DWBP working group compiled a set of use cases [ DWBP-UCR ] that represent scenarios of how data is commonly published on the Web and how it is used. The set of requirements derived from these use cases were used to guide the development of the Best Practices.

The Best Practices proposed in this document are intended to serve a more general purpose than the practices suggested in, for example, Best Practices for Publishing Linked Data [ LD-BP ] since DWBP is domain-independent. Whilst DWBP recommends the use of Linked Data, it also promotes best practices for data on the Web in other open formats such as CSV and JSON. The Best Practices related to the use of vocabularies incorporate practices that stem from Best Practices for Publishing Linked Data where appropriate.

In order to encourage data publishers to adopt the DWBP , a number of distinct benefits were identified: comprehension; processability; discoverability; reuse; trust; linkability; access; and interoperability. They are described in the section Best Practices Benefits .

2. Audience

This section is non-normative.

This document sets out Best Practices tailored primarily for those who publish data on the Web. The Best Practices are designed to meet the needs of information management staff, developers, and wider groups such as scientists interested in sharing and reusing research data on the Web. While data publishers are our primary audience, we encourage all those engaged in related activities to become familiar with it. Every attempt has been made to make the document as readable and usable as possible while still retaining the accuracy and clarity needed in a technical specification.

Readers of this document are expected to be familiar with some fundamental concepts of the architecture of the Web [ WEBARCH ], such as resources and URIs, as well as a number of data formats. The normative element of each Best Practice is the intended outcome . Possible implementations are suggested and, where appropriate, these recommend the use of a particular technology. A basic knowledge of vocabularies and data models would be helpful to better understand some aspects of this document.

3. Scope

This section is non-normative.

This document is concerned solely with Best Practices that:

As noted above, whether a Best Practice has or has not been followed should be judged against the intended outcome , not the possible approach to implementation which is offered as guidance. A Best Practice is always subject to improvement as we learn and evolve the Web together.

4. Context

This section is non-normative.

The following diagram illustrates the context considered in this document. In general, the Best Practices proposed for publication and usage of Data on the Web refer to datasets and distributions . Data is published in different distributions, which are specific physical forms of a dataset. By data, "we mean known facts that can be recorded and that have implicit meaning" [ Navathe ]. These distributions facilitate the sharing of data on a large scale, which allows datasets to be used by several groups of data consumers , without regard to purpose, audience, interest, or license. Given this heterogeneity and the fact that data publishers and data consumers may be unknown to each other, it is necessary to provide some information about the datasets and distributions that may also contribute to trustworthiness and reuse, such as: structural metadata, descriptive metadata, access information, data quality information, provenance information, license information and usage information.

An important aspect of publishing and sharing data on the Web concerns the architectural basis of the Web as discussed in [ WEBARCH ]. A relevant aspect of this is the identification principle that says that URIs should be used to identify resources. In our context, a resource may be a whole dataset or a specific item of a given dataset. All resources should be published with stable URIs, so that they can be referenced and linked, via URIs, between two or more resources. The following diagram illustrates the dataset composition (data values and metadata) together with other components related to the dataset publication and usage. Data values correspond to the data itself and may be available in one or more distributions, which should be defined by the publisher considering data consumers' expectations. The Metadata component corresponds to the additional information that describes the dataset and dataset distributions, helping the manipulation and the reuse of the data. In order to allow easy access to the dataset and its corresponding distributions, multiple Dataset Access mechanisms should be available. Finally, to promote interoperability among datasets it is important to adopt data vocabularies and standards.

Our Context

5. Namespaces

This section is non-normative.

The namespace prefixes used throughout this document are listed in the table below.
Prefix Namespace IRI
cnt http://www.w3.org/2011/content#
dcat http://www.w3.org/ns/dcat#
dct http://purl.org/dc/terms/
dqv http://www.w3.org/ns/dqv#
duv http://www.w3.org/ns/duv#
foaf http://xmlns.com/foaf/0.1/
oa http://www.w3.org/ns/oa#
owl http://www.w3.org/2002/07/owl#
pav http://pav-ontology.github.io/pav/
prov http://www.w3.org/ns/prov#
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
skos http://www.w3.org/2004/02/skos/core#

6. Best Practices Template

This section presents the template used to describe Data on the Web Best Practices.

Best Practice Template

Short description of the BP

Why

This section answers two crucial questions:

  • Why is this unique to publishing or reusing data on the Web?
  • How does this encourage publication or reuse of data on the Web?

A full text description of the problem addressed by the Best Practice may also be provided. It can be any length but is likely to be no more than a few sentences.

Intended Outcome

What it should be possible to do when a data publisher follows the Best Practice.

Possible Approach to Implementation

A description of a possible implementation strategy is provided. This represents the best advice available at the time of writing but specific circumstances and future developments may mean that alternative implementation methods are more appropriate to achieve the intended outcome.

How to Test

Information on how to test that the Best Practice has been met. This might or might not be machine testable.

Evidence

Information about the relevance of the Best Practice. It is described by one or more relevant requirements as documented in the Data on the Web Best Practices Use Cases & Requirements document [ DWBP-UCR ].

Benefits

A benefit represents an improvement in the way datasets are made available on the Web. A Best Practice can have one or more benefits.
  • Reuse
  • Comprehension
  • Linkability
  • Discoverability
  • Trust
  • Access
  • Interoperability
  • Processability

7. Best Practices Summary

8. The Best Practices

This section contains the Best Practices to be used by data publishers in order to help them and data consumers to overcome the different challenges faced when publishing and consuming data on the Web. One or more Best Practices were proposed for each one of the challenges, which are described in the section Data on the Web Challenges .

Each Best Practice is related to one or more requirements from the Data on the Web Best Practices Use Cases & Requirements document [ DWBP-UCR ] which guided their development. Each Best Practice has at least one of these requirements as evidence of its relevance.

8.1 Running Example

This example serves as a basis for the elaboration of the Best Practices described in subsequent sections. It helps to illustrate how the Best Practices may be applied.

Example 1

John works for the Transport Agency of MyCity and he is in charge of publishing data about the public transport of the city. John wants to publish this data for different types of data consumers such as developers interested in creating applications and also software agents. It is important that both humans and software agents can easily understand and process the data, which should be kept up to date and be easily discoverable on the Web.

RDF examples will be used to show the result of the application of some Best Practices. RDF examples in this document are shown using Turtle syntax [ Turtle ].

8.2 Metadata

The Web is an open information space, where the absence of a specific context, such a company's internal information system, means that the provision of metadata is a fundamental requirement. Data will not be discoverable or reusable by anyone other than the publisher if insufficient metadata is provided. Metadata provides additional information that helps data consumers better understand the meaning of data, its structure, and to clarify other issues, such as rights and license terms, the organization that generated the data, data quality, data access methods and the update schedule of datasets.

Metadata can be used to help tasks such as dataset discovery and reuse, and can be assigned considering different levels of granularity from a single property of a resource to a whole dataset, or all datasets from a specific organization.

Metadata can be of different types. These types can be classified in different taxonomies, with different grouping criteria. For example, a specific taxonomy could define three metadata types according to descriptive, structural and administrative features. Descriptive metadata serves to identify a dataset, structural metadata serves to understand the structure in which the dataset is distributed and administrative metadata serves to provide information about the version, update schedule etc. A different taxonomy could define metadata types with a scheme according to tasks where metadata are used, for example, discovery and reuse.

Best Practice 1: Provide metadata

Provide metadata for both human users and computer applications.

Why

Providing metadata is a fundamental requirement when publishing data on the Web because data publishers and data consumers may be unknown to each other. Therefore, it is essential to provide information that helps human users and computer applications to understand the data as well as other important aspects that describe a dataset or a distribution.

Intended Outcome

Humans will be able to understand the metadata and computer applications, notably user agents, will be able to process it.

Possible Approach to Implementation

Possible approaches to provide human-readable metadata:

  • to provide metadata as part of an HTML Web page
  • to provide metadata as a separate text file

Possible approaches to provide machine-readable metadata:

  • machine-readable metadata may be provided in a serialization format such as Turtle and JSON, or it can be embedded in the HTML page using [ HTML-RDFA ] or [ JSON-LD ]. If multiple formats are published separately, they should be served from the same URL using content negotiation and made available under separate URIs, distinguished by filename extension. Maintenance of multiple formats is best achieved by generating each available format on the fly based on a single source of the metadata.
  • when defining machine-readable metadata, reusing existing standard terms and popular vocabularies is strongly recommended. For example, Dublin Core Metadata (DCMI) terms [ DCTERMS ] and Data Catalog Vocabulary [ VOCAB-DCAT ] can be used to provide descriptive metadata.
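As a sketch of what such machine-readable metadata could look like for the running example, descriptive terms from DCAT and Dublin Core can be combined in Turtle. The dataset URI, publisher URI and property values below are hypothetical:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical metadata for the bus timetable dataset of the running example
<http://data.mycity.example.com/transport/bus-timetable>
    a dcat:Dataset ;
    dct:title "MyCity Bus Timetable" ;
    dct:description "Bus timetables for the public transport of MyCity." ;
    dct:publisher <http://data.mycity.example.com/transport-agency> .
```

The same statements could equally be embedded in an HTML page using RDFa or published as JSON-LD.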

How to Test

Check if human-readable metadata is available.

Check if the metadata is available in a valid machine-readable format and without syntax errors.

Evidence

Relevant requirements : R-MetadataAvailable, R-MetadataDocum, R-MetadataMachineRead

Benefits

  • Reuse
  • Comprehension
  • Discoverability
  • Processability

Best Practice 2: Provide descriptive metadata

Provide metadata that describes the overall features of datasets and distributions.

Why

Explicitly providing dataset descriptive information allows user agents to automatically discover datasets available on the Web and it allows humans to understand the nature of the dataset and its distributions.

Intended Outcome

Humans will be able to interpret the nature of the dataset and its distributions, and software agents will be able to automatically discover datasets and distributions.

Possible Approach to Implementation

Descriptive metadata can include the following overall features of a dataset:

  • The title and a description of the dataset.
  • The keywords describing the dataset.
  • The date of publication of the dataset.
  • The entity responsible (publisher) for making the dataset available.
  • The contact point for the dataset.
  • The spatial coverage of the dataset.
  • The temporal period that the dataset covers.
  • The themes/categories covered by a dataset.

Descriptive metadata can include the following overall features of a distribution:

  • The title and a description of the distribution.
  • The date of publication of the distribution.
  • The media type of the distribution.

The machine-readable version of the descriptive metadata can be provided using the vocabulary recommended by W3C to describe datasets, i.e. the Data Catalog Vocabulary [ VOCAB-DCAT ]. This provides a framework in which datasets can be described as abstract entities.
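For instance, descriptive metadata for the running example might be expressed with DCAT as follows. All URIs, dates and values are illustrative only:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical descriptive metadata for the dataset...
<http://data.mycity.example.com/transport/bus-timetable>
    a dcat:Dataset ;
    dct:title "MyCity Bus Timetable" ;
    dct:description "Bus timetables for the public transport of MyCity." ;
    dcat:keyword "transport", "bus", "timetable" ;
    dct:issued "2016-06-01"^^xsd:date ;
    dct:publisher <http://data.mycity.example.com/transport-agency> ;
    dcat:distribution <http://data.mycity.example.com/transport/bus-timetable.csv> .

# ...and for one of its distributions
<http://data.mycity.example.com/transport/bus-timetable.csv>
    a dcat:Distribution ;
    dct:title "CSV distribution of the bus timetable dataset" ;
    dct:issued "2016-06-01"^^xsd:date ;
    dcat:mediaType "text/csv" .
```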

How to Test

Check if the metadata for the dataset itself includes the overall features of the dataset in a human-readable format.

Check if the descriptive metadata is available in a valid machine-readable format.

Evidence

Relevant requirements : R-MetadataAvailable , R-MetadataMachineRead , R-MetadataStandardized

Benefits

  • Reuse
  • Comprehension
  • Discoverability

Best Practice 3: Provide locale parameters metadata

Provide metadata about locale parameters (date, time, and number formats, language).

Why

Providing locale parameters metadata helps humans and computer applications to work accurately with things like dates, currencies and numbers that may look similar but have different meanings in different locales. For example, the 'date' 4/7 can be read as 7th of April or 4th of July depending on where the data was created, and €2,000 is either two thousand Euros or an over-precise representation of two Euros. Making the locale and language explicit allows users to determine how readily they can work with the data and may enable automated translation services.

Intended Outcome

Humans and software agents will be able to interpret the meaning of strings representing dates, times, currencies and numbers etc. accurately.

Possible Approach to Implementation

Locale parameters metadata can include the following information:

  • The language(s) of the dataset.
  • The formats used for numeric values, dates and time.

The machine-readable version of the locale parameters metadata may be provided according to the vocabulary recommended by W3C to describe datasets, i.e. the Data Catalog Vocabulary [ VOCAB-DCAT ].
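As an illustration, the language of the running example's dataset could be stated with dct:language, which DCAT reuses. The URIs below are hypothetical except for the Library of Congress language URI; DCAT itself has no dedicated properties for date, time and number formats, so in this sketch these are explained in the human-readable description:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<http://data.mycity.example.com/transport/bus-timetable>
    a dcat:Dataset ;
    # The dataset is published in English
    dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ;
    dct:description "Bus timetables for MyCity. All times are local to MyCity and given in 24-hour hh:mm format." .
```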

How to Test

Check if the metadata for the dataset itself includes information about locale parameters (i.e. date, time, and number formats, language) in a human-readable format.

Check if the locale information is available in a valid machine-readable format and without syntax errors.

Evidence

Relevant requirements : R-FormatLocalize , R-MetadataAvailable , R-GeographicalContext

Benefits

  • Reuse
  • Comprehension

Best Practice 4: Provide structural metadata

Provide metadata that describes the schema and internal structure of a distribution.

Why

Providing information about the internal structure of a distribution is essential for others wishing to explore or query the dataset. It also helps people to understand the meaning of the data.

Intended Outcome

Humans will be able to interpret the schema of a dataset and software agents will be able to automatically process the structural metadata about distributions.

Possible Approach to Implementation

Human-readable structural metadata usually provides the properties or columns of the dataset schema.

Machine-readable structural metadata is available according to the format of a specific distribution and it may be provided within separate documents or embedded into the document. For more details see the links below.
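
A sketch of machine-readable structural metadata for a hypothetical CSV distribution, loosely following the pattern of the W3C Metadata Vocabulary for Tabular Data; the file URL and column names are invented.

```python
import json

# Structural metadata for a CSV distribution, in the style of a CSVW
# table schema (a sketch, not a complete CSVW document).
schema = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "temperatures.csv",            # hypothetical distribution
    "tableSchema": {
        "columns": [
            {"name": "date", "datatype": "date"},
            {"name": "city", "datatype": "string"},
            {"name": "celsius", "datatype": "decimal"},
        ]
    },
}
print(json.dumps(schema, indent=2))
```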

How to Test

Check if the structural metadata of the dataset is provided in a human-readable format.

Check if the structural metadata of the dataset is provided in a machine-readable format and without syntax errors.

Evidence

Relevant requirements : R-MetadataAvailable

Benefits

  • Reuse
  • Comprehension
  • Processability

8.3 Data Licenses

A license is a very useful piece of information to be attached to data on the Web. According to the type of license adopted by the publisher, there might be more or fewer restrictions on sharing and reusing data. In the context of data on the Web, the license of a dataset can be specified within the data, metadata, or outside of it, in a separate document to which it is linked.

Best Practice 5: Provide data license information

Provide a link to or copy of the license agreement that controls use of the data.

Why

The presence of license information is essential for data consumers to assess the usability of data. User agents may use the presence/absence of license information as a trigger for inclusion or exclusion of data presented to a potential consumer.

Intended Outcome

Humans will be able to understand data license information describing possible restrictions placed on the use of a given distribution and software agents will be able to automatically detect the data license of a distribution.

Possible Approach to Implementation

Data license information can be available via a link to, or embedded copy of, a human-readable license agreement. It can also be made available for processing via a link to, or embedded copy of, a machine-readable license agreement.

A number of machine-readable rights languages include properties for linking to a license, for example:

  • The Creative Commons Rights Expression Language [ CCREL ]
  • The Open Digital Rights Language [ ODRL21-model ]
  • The Open Data Rights Statement Vocabulary [ ODRS ]
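
As a sketch of both halves of this Best Practice, the snippet below attaches a license link to a distribution via the Dublin Core dct:license property and shows the kind of presence check a user agent might perform; all URIs are examples.

```python
# License metadata for a hypothetical distribution, using dct:license to
# link to a machine-readable license URI (sketch only).
metadata = {
    "@context": {"dct": "http://purl.org/dc/terms/"},
    "@id": "http://example.com/dataset.csv",
    "dct:license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
}

def detect_license(meta):
    """Return the declared license URI, or None if no license is present."""
    lic = meta.get("dct:license")
    return lic.get("@id") if isinstance(lic, dict) else None

print(detect_license(metadata))  # https://creativecommons.org/licenses/by/4.0/
```

A user agent could use a None result to exclude the distribution from results shown to a potential consumer.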

How to Test

Check if the metadata for the dataset itself includes the data license information in a human-readable format.

Check if a user agent can automatically detect/discover the data license of the dataset.

Evidence

Relevant requirements : R-LicenseAvailable , R-MetadataMachineRead , R-LicenseLiability

Benefits

  • Reuse
  • Trust

8.4 Data Provenance

Data provenance becomes particularly important when data is shared between collaborators who might not have direct contact with one another, either due to distance or because the published data outlives the lifespan of the projects or organizations that provided it. The Web brings together business, engineering, and scientific communities, creating collaborative opportunities that were previously unimaginable. The challenge in publishing data on the Web is providing an appropriate level of detail about its origin. The data producer may not necessarily be the data provider, so collecting and conveying the corresponding metadata is particularly important. Without provenance, consumers have no inherent way to trust the integrity and credibility of the data being shared. Data publishers in turn need to be aware of the needs of prospective consumer communities to know how much provenance detail is appropriate.

Best Practice 6: Provide data provenance information

Provide complete information about the origins of the data and any changes you have made.

Why

Provenance is one means by which consumers of a dataset judge its quality. Understanding its origin and history helps one determine whether to trust the data and provides important interpretive context.

Intended Outcome

Humans will know the origin or history of the dataset and software agents will be able to automatically process the provenance information.

Possible Approach to Implementation

The machine-readable version of the data provenance can be provided using an ontology recommended to describe provenance information, such as W3C 's Provenance Ontology [ PROV-O ].
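
A minimal sketch of machine-readable provenance in the spirit of [ PROV-O ], with invented dataset and agent URIs.

```python
import json

# Minimal provenance statements attached to a dataset description,
# using PROV-O property names (sketch; URIs are illustrative).
provenance = {
    "@context": {"prov": "http://www.w3.org/ns/prov#"},
    "@id": "http://example.com/dataset",
    "prov:wasAttributedTo": {"@id": "http://example.com/org/transport-dept"},
    "prov:generatedAtTime": "2015-06-03T09:00:00Z",
}
print(json.dumps(provenance, indent=2))
```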

How to Test

Check that the metadata for the dataset itself includes provenance information about the dataset in a human-readable format.

Check if a computer application can automatically process the provenance information about the dataset.

Evidence

Relevant requirements : R-ProvAvailable , R-MetadataAvailable

Benefits

  • Reuse
  • Comprehension
  • Trust

8.5 Data Quality

The quality of a dataset can have a big impact on the quality of applications that use it. As a consequence, the inclusion of data quality information in data publishing and consumption pipelines is of primary importance. Usually, the assessment of quality involves different kinds of quality dimensions, each representing groups of characteristics that are relevant to publishers and consumers. The Data Quality Vocabulary defines concepts such as measures and metrics to assess the quality for each quality dimension [ VOCAB-DQV ]. There are heuristics designed to fit specific assessment situations that rely on quality indicators, namely, pieces of data content, pieces of data meta-information, and human ratings that give indications about the suitability of data for some intended use.

Best Practice 7: Provide data quality information

Provide information about data quality and fitness for particular purposes.

Why

Data quality might seriously affect the suitability of data for specific applications, including applications very different from the purpose for which it was originally generated. Documenting data quality significantly eases the process of dataset selection, increasing the chances of reuse. Independently of domain-specific peculiarities, the quality of data should be documented and known quality issues should be explicitly stated in metadata.

Intended Outcome

Humans and software agents will be able to assess the quality and therefore suitability of a dataset for their application.

Possible Approach to Implementation

The machine-readable version of the dataset quality metadata may be provided using the Data Quality Vocabulary developed by the DWBP working group [ VOCAB-DQV ].
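
A sketch of a single quality measurement using property names from the Data Quality Vocabulary; the dataset URI and the value (a hypothetical completeness score) are invented.

```python
import json

# One quality measurement in the style of DQV, attached to a dataset
# (sketch; values and URIs are illustrative).
measurement = {
    "@context": {"dqv": "http://www.w3.org/ns/dqv#"},
    "@type": "dqv:QualityMeasurement",
    "dqv:computedOn": {"@id": "http://example.com/dataset"},
    "dqv:value": 0.95,
}
print(json.dumps(measurement))
```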

How to Test

Check that the metadata for the dataset itself includes quality information about the dataset.

Check if a computer application can automatically process the quality information about the dataset.

Evidence

Relevant requirements : R-QualityMetrics , R-DataMissingIncomplete , R-QualityOpinions

Benefits

  • Reuse
  • Trust

8.6 Data Versioning

Datasets published on the Web may change over time. Some datasets are updated on a scheduled basis, and other datasets are changed as improvements in collecting the data make updates worthwhile. In order to deal with these changes, new versions of a dataset may be created. Unfortunately, there is no consensus about when changes to a dataset should cause it to be considered a different dataset altogether rather than a new version. In the following, we present some scenarios where most publishers would agree that the revision should be considered a new version of the existing dataset.

In general, multiple datasets that represent time series or spatial series, e.g. the same kind of data for different regions or for different years, are not considered multiple versions of the same dataset. In this case, each dataset covers a different set of observations about the world and should be treated as a new dataset rather than a new version of an existing dataset. This is also the case with a dataset that collects data about weekly weather forecasts for a given city, where every week a new dataset is created to store data about that specific week.

Scenarios 1 and 2 might trigger a major version, whereas Scenario 3 would likely trigger only a minor version. But how you decide whether versions are minor or major is less important than that you avoid making changes without incrementing the version indicator. Even for small changes, it is important to keep track of the different dataset versions to make the dataset trustworthy. Publishers should remember that a given dataset may be in use by one or more data consumers, and they should take reasonable steps to inform those consumers when a new version is released. For real-time data, an automated timestamp can serve as a version identifier. For each dataset, the publisher should take a consistent, informative approach to versioning, so data consumers can understand and work with the changing data.

Best Practice 8: Provide a version indicator

Assign and indicate a version number or date for each dataset.

Why

Version information makes a revision of a dataset uniquely identifiable. Uniqueness can be used by data consumers to determine whether and how data has changed over time and to determine specifically which version of a dataset they are working with. Good data versioning enables consumers to understand if a newer version of a dataset is available. Explicit versioning allows for repeatability in research, enables comparisons, and prevents confusion. Using unique version numbers that follow a standardized approach can also set consumer expectations about how the versions differ.

Intended Outcome

Humans and software agents will easily be able to determine which version of a dataset they are working with.

Possible Approach to Implementation

The best method for providing versioning information will vary according to the context; however, there are some basic guidelines that can be followed, for example:

  • Include a unique version number or date as part of the metadata for the dataset.
  • Use a consistent numbering scheme with a meaningful approach to incrementing digits, such as [ SchemaVer ].
  • If the data is made available through an API , the URI used to request the latest version of the data should not change as the versions change, but it should be possible to request a specific version through the API .
  • Use Memento [ RFC7089 ], or components thereof, to express temporal versioning of a dataset and to access the version that was operational at a given datetime. The Memento protocol aligns closely with the approach for assigning URIs to versions that is used for W3C specifications, described below.

The Web Ontology Language [ OWL2-QUICK-REFERENCE ] and the Provenance, Authoring and Versioning ontology [ PAV ] provide a number of annotation properties for version information.
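
The SchemaVer-style numbering mentioned in the guidelines above (MODEL-REVISION-ADDITION digits) can be compared programmatically; a minimal sketch:

```python
# SchemaVer-style version numbers, e.g. "1-0-1"; comparing them as
# integer tuples shows which release is newer (sketch only).
def parse_version(version):
    model, revision, addition = (int(part) for part in version.split("-"))
    return (model, revision, addition)

assert parse_version("1-0-1") < parse_version("1-1-0")  # addition < revision bump
print(parse_version("2-0-0"))  # (2, 0, 0)
```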

How to Test

Check if the metadata for the dataset/distribution provides a unique version number or date in a human-readable format.

Check if a computer application can automatically detect/discover the unique version number or date of a dataset or distribution.

Evidence

Relevant requirements : R-DataVersion

Benefits

  • Reuse
  • Trust

Best Practice 9: Provide version history

Provide a complete version history that explains the changes made in each version.

Why

In creating applications that use data, it can be helpful to understand the variability of that data over time. Interpreting the data is also enhanced by an understanding of its dynamics. Determining how the various versions of a dataset differ from each other is typically very laborious unless a summary of the differences is provided.

Intended Outcome

Humans and software agents will be able to understand how the dataset typically changes from version to version and how any two specific versions differ.

Possible Approach to Implementation

Provide a list of published versions and a description for each version that explains how it differs from the previous version. An API can expose a version history with a single dedicated URL that retrieves the latest version of the complete history.

How to Test

Check that a list of published versions is available, as well as a change log describing precisely how each version differs from the previous one.

Evidence

Relevant requirements : R-DataVersion

Benefits

  • Reuse
  • Trust

Best Practice 10: Avoid Breaking Changes to Your API

Avoid changes to your API that break client code, and communicate any changes in your API to your developers when evolution happens.

Why

When developers implement a client for your API, they may rely on specific characteristics that you have built into it, such as the schema or the details of each response. Avoiding breaking changes in your API minimizes breakage to client code. Communicating changes when they do occur allows developers to take action.

Intended Outcome

Developer code will continue to work. If changes are made, developers will have sufficient time and information to adapt their code, enabling them to address changes that would otherwise cause breakage.

Possible Approach to Implementation

When improving your API, focus on adding new calls rather than changing how existing calls work. Existing clients can ignore such changes and will continue functioning. If using a fully RESTful style, you should be able to avoid changes that affect developers by keeping home resource URIs constant and changing only elements that your users do not call directly. If you need to change your data in ways that are not compatible with the extension points that you initially designed, then a completely new design is required, and this will be a breaking change. In that case, it’s best to implement the changes as a new API.

If using any other architectural style, use versioning to indicate changes that affect client code. Indicate the version in the response header. Major version numbers should be reflected in your URIs or in request headers. When versioning in URIs, include the version number as far to the left as possible. Keep the previous version available for developers whose code has not yet been adapted to the new version.
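
A toy sketch of the URI and header conventions described above; the path layout and header name are illustrative, not a standard.

```python
# Major version as far left in the URI path as possible; the full
# version travels in a response header (names are invented examples).
def versioned_uri(base, major, resource):
    return f"{base}/v{major}/{resource}"

def version_headers(major, minor):
    return {"API-Version": f"{major}.{minor}"}

print(versioned_uri("http://example.com/api", 2, "stops"))  # http://example.com/api/v2/stops
print(version_headers(2, 3))
```
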
How to Test

Check that client code still works after changes are made, and ask developers for feedback.

Evidence

Relevant requirements : R-DataVersion

Note: This BP will be complemented.

8.7 Data Identifiers

Identifiers take many forms and are used extensively in every information system. Data discovery, usage and citation on the Web depend fundamentally on the use of HTTP (or HTTPS) URIs: globally unique identifiers that can be looked up by dereferencing them over the Internet [ RFC3986 ]. It is perhaps worth emphasizing some key points about URIs in the current context.

  1. URIs are 'dumb strings', that is, they carry no semantics. Their function is purely to identify a resource.
  2. Although the previous point is accurate, it would be perverse for a URI such as http://example.com/dataset.csv to return anything other than a CSV file. Human readability is helpful.
  3. When de-referenced (looked up), a single URI may offer the same resource in more than one format. http://example.com/dataset may offer the same data in, say, CSV, JSON and XML. The server returns the most appropriate format based on content negotiation .
  4. One URI may redirect to another.
  5. De-referencing a URI triggers a computer program to run on a server so that the URI acts as a call to an API . The server may therefore do something as simple as return a single, static file, or it may carry out complex processing. Precisely what processing is carried out, i.e. the software on the server, is completely independent of the URI itself.
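
Point 3 above relies on content negotiation. A toy negotiation step might look like the following; it ignores q-values and wildcards, so it is a sketch of the idea only.

```python
# Pick the first media type from the client's Accept header that the
# server offers, falling back to a default representation.
def negotiate(accept_header, offered, default):
    accepted = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in accepted:
        if media_type in offered:
            return media_type
    return default

offered = ["text/csv", "application/json"]
print(negotiate("application/json, */*", offered, "text/csv"))  # application/json
print(negotiate("text/html", offered, "text/csv"))              # text/csv
```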

Best Practice 11: Use persistent URIs as identifiers of datasets

Identify each dataset by a carefully chosen, persistent URI.

Why

Adopting a common identification system enables basic data identification and comparison processes by any stakeholder in a reliable way. This is an essential pre-condition for proper data management and reuse.

Developers may build URIs into their code and so it is important that those URIs persist and that they dereference to the same resource over time without the need for human intervention.

Intended Outcome

Datasets, or information about datasets, will be discoverable and citable through time, regardless of the status, availability or format of the data.

Possible Approach to Implementation

To be persistent, URIs must be designed as such. A lot has been written on this topic, see, for example, the European Commission's Study on Persistent URIs [ PURI ], which in turn links to many other resources.

Where a data publisher is unable or unwilling to manage a URI space directly for persistence, an alternative approach is to use a redirection service such as Permanent Identifiers for the Web or purl.org . These provide persistent URIs that can be redirected as required so that the eventual location can be ephemeral. The software behind such services is freely available so that it can be installed and managed locally if required.

Digital Object Identifiers (DOIs) offer a similar alternative. These identifiers are defined independently of any Web technology but can be appended to a 'URI stub'. DOIs are an important part of the digital infrastructure for research data and libraries.

How to Test

Check that each dataset in question is identified using a URI designed for persistence. Ideally the relevant Web site includes a description of the design scheme and a credible pledge of persistence should the publisher no longer be able to maintain the URI space themselves.

Evidence

Relevant requirements : R-UniqueIdentifier , R-Citable

Benefits

  • Reuse
  • Linkability
  • Discoverability
  • Interoperability

Best Practice 12: Use persistent URIs as identifiers within datasets

Reuse other people's URIs as identifiers within datasets where possible.

Why

The power of the Web lies in the Network effect . The first telephone only became useful when the second telephone meant there was someone to call; the third telephone made both of them more useful yet. Data becomes more valuable if it refers to other people's data about the same thing, the same place, the same concept, the same event, the same person, and so on. That means using the same identifiers across datasets and making sure that your identifiers can be referred to by other datasets. When those identifiers are HTTP URIs, they can be looked up and more data discovered.

These ideas are at the heart of the 5 Stars of Linked Data, where one data point links to another, and of Hypermedia, where links may be to further data or to services (or more generally 'affordances') that can act on or relate to the data in some way. Examples include bug reporting mechanisms, processors, a visualization engine, a sensor, an actuator etc. In both Linked Data and Hypermedia, the emphasis is put on the ability for machines to traverse from one resource to another following links that express relationships.

That's the Web of Data.

Intended Outcome

Data items will be related to others across the Web, creating a global information space accessible to humans and machines alike.

Possible Approach to Implementation

This is a topic in itself and a general document such as this can only include superficial detail.

Developers know that very often the problem they're trying to solve will have already been solved by other people. In the same way, if you're looking for a set of identifiers for obvious things like countries, currencies, subjects, species, proteins, cities and regions, Nobel prize winners and products – someone's done it already. The steps described for discovering existing vocabularies [ LD-BP ] can readily be adapted.

  • ensure URI sets you use are published by a trusted group or organization;
  • ensure URI sets have persistent URIs.

If you can't find an existing set of identifiers that meet your needs then you'll need to create your own, following the patterns for URI persistence so that others will add value to your data by linking to it.

URIs can be long. In a dataset of even moderate size, storing each URI is likely to be repetitive and obviously wasteful. Instead, define locally unique identifiers for each element and provide data that allows them to be converted to globally unique URIs programmatically. The Metadata Vocabulary for Tabular Data [ Tabular-Metadata ] provides mechanisms for doing this within tabular data such as CSV files, in particular using URI template properties such as the aboutUrl property.
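
A simplified sketch of such template expansion: real CSVW metadata uses RFC 6570 URI templates, with plain Python string formatting standing in here, and the template and identifiers are invented.

```python
# Expanding an aboutUrl-style template for each row of a table, turning
# short local identifiers into globally unique URIs (sketch only).
about_url = "http://example.com/stop/{stop_id}"
rows = [{"stop_id": "S1"}, {"stop_id": "S2"}]

uris = [about_url.format(**row) for row in rows]
print(uris)  # ['http://example.com/stop/S1', 'http://example.com/stop/S2']
```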

How to Test

Check that, within the dataset, things that don't change or that change slowly, such as countries, regions, organizations and people, are referred to by URIs or by short identifiers that can be appended to a URI stub. Ideally the URIs should resolve; however, they have value as globally scoped variables whether they resolve or not.

Evidence

Relevant requirements : R-UniqueIdentifier

Benefits

  • Reuse
  • Linkability
  • Discoverability
  • Interoperability

Best Practice 13: Assign URIs to dataset versions and series

Assign URIs to individual versions of datasets as well as to the overall series.

Why

Like documents, many datasets fall into natural series or groups. For example:

  • bus stops in MyCity (that change over time);
  • a list of elected officials in MyCity;
  • evolving versions of a document through to completion.

In different circumstances, it will be appropriate to refer to the current situation (the current set of bus stops, the current elected officials etc.). In others, it may be appropriate to refer to the situation as it existed at a specific time.

Intended Outcome

Humans and software agents will be able to refer to specific versions of a dataset and to concepts such as a 'dataset series' and 'the latest version'.

Possible Approach to Implementation

The W3C provides a good example of how to do this. The (persistent) URI for this document is http://www.w3.org/TR/2016/WD-dwbp-20160112/. That identifier points to an immutable snapshot of the document on the day of its publication. The URI for the 'latest version' of this document is http://www.w3.org/TR/dwbp/ which is an identifier for a series of closely related documents that are subject to change over time. At the time of publication, these two URIs both resolve to this document. However, when the next version of this document is published, the 'latest version' URI will be changed to point to that, but the dated URI remains unchanged.

How to Test

Check that each version of a dataset has its own URI, and that there is also a "latest version" URI.

Evidence

Relevant requirements : R-UniqueIdentifier , R-Citable

Benefits

  • Reuse
  • Discoverability
  • Trust

8.8 Data Formats

The format in which data is made available to consumers is a key aspect of making that data usable. The best, most flexible access mechanism in the world is pointless unless it serves data in formats that enable use and reuse. Below we detail Best Practices in selecting formats for your data, both at the level of files and that of individual fields. W3C encourages use of formats that can be used by the widest possible audience and processed most readily by computing systems. Source formats, such as database dumps or spreadsheets used to generate the final published format, are out of scope. This document is concerned with what is actually published rather than internal systems used to generate the published data.

Best Practice 14: Use machine-readable standardized data formats

Make data available in a machine-readable, standardized data format that is well suited to its intended or potential use.

Why

As data becomes more ubiquitous, and datasets become larger and more complex, processing by computers becomes ever more crucial. Posting data in a format that is not machine-readable places severe limitations on the continuing usefulness of the data. Data becomes useful when it has been processed and transformed into information. Note that there is an important distinction between formats that can be read and edited by humans using a computer and formats that are machine-readable. The latter term implies that the data is readily extracted, transformed and processed by a computer.

Using non-standard data formats is costly and inefficient, and the data may lose meaning as it is transformed. On the other hand, standardized data formats enable interoperability as well as future uses, such as remixing or visualization, many of which cannot be anticipated when the data is first published. The use of non-proprietary data formats should also be considered, since it increases the possibilities for use and reuse of data.

Intended Outcome

Machines will easily be able to read and process data published on the Web, and humans will be able to use computational tools typically available in the relevant domain to work with the data.

Data consumers who want to use or reuse the data will be able to do so without investment in proprietary software.

Possible Approach to Implementation

Make data available in a machine-readable, standardized data format that is easily parseable, including but not limited to CSV, XML, NetCDF, HDF5, JSON and RDF serialization syntaxes like RDF/XML, JSON-LD and Turtle.
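
As a sketch, the same in-memory records can be serialized to two of the standardized formats listed above (CSV and JSON); the field names and values are invented.

```python
import csv
import io
import json

# One small dataset, serialized to two machine-readable formats from the
# same in-memory structure.
records = [
    {"city": "MyCity", "date": "2015-06-03", "celsius": 21.0},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["city", "date", "celsius"])
writer.writeheader()
writer.writerows(records)

print(buf.getvalue())      # CSV representation
print(json.dumps(records)) # JSON representation
```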

How to Test

Check if the data format conforms to a known machine-readable data format specification.

Evidence

Relevant requirements : R-FormatMachineRead , R-FormatStandardized , R-FormatOpen

Benefits

  • Reuse
  • Processability

Best Practice 15: Provide data in multiple formats

Make data available in multiple formats when more than one format suits its intended or potential use.

Why

Providing data in more than one format reduces costs incurred in data transformation. It also minimizes the possibility of introducing errors in the process of transformation. If many users need to transform the data into a specific data format, publishing the data in that format from the beginning saves time and money and prevents errors many times over. Lastly it increases the number of tools and applications that can process the data.

Intended Outcome

As many users as possible will be able to use the data without first having to transform it into their preferred format.

Possible Approach to Implementation

Consider the data formats most likely to be needed by intended users, and consider alternatives that are likely to be useful in the future. Data publishers must balance the effort required to make the data available in many formats against the cost of doing so, but providing at least one alternative will greatly increase the usability of the data. In order to serve data in more than one format you can use content negotiation, as described in Best Practice Use content negotiation for serving data available in multiple formats.

A word of warning: local identifiers within the dataset, which may be exposed as fragment identifiers in URIs, must be consistent across the various formats.

How to Test

Check if the complete dataset is available in more than one data format.

Evidence

Relevant requirements : R-FormatMultiple

Benefits

  • Reuse
  • Processability

8.9 Data Vocabularies

Data is often represented in a structured and controlled way, making reference to a range of vocabularies, for example, by defining types of nodes and links in a data graph or types of values for columns in a table, such as the subject of a book, or a relationship “knows” between two persons. Additionally, the values used may come from a limited set of pre-existing values or resources: for example object types, roles of a person, countries in a geographic area, or possible subjects for books. Such vocabularies ensure a level of control, standardization and interoperability in the data. They can also serve to improve the usability of datasets. Say a dataset contains a reference to a concept described in several languages. Such a reference allows applications to localize their display or their search depending on the language of the user. Vocabularies define the concepts and relationships (also referred to as “terms” or “attributes”) used to describe and represent an area of interest. They are used to classify the terms that can be used in a particular application, characterize possible relationships, and define possible constraints on using those terms. Several near-synonyms for 'vocabulary' have been coined, for example, ontology, controlled vocabulary, thesaurus, taxonomy, code list, semantic network.

There is no strict division between the artifacts referred to by these names. “Ontology” tends however to denote the vocabularies of classes and properties that structure the descriptions of resources in (linked) datasets. In relational databases, these correspond to the names of tables and columns; in XML, they correspond to the elements defined by an XML Schema. Ontologies are the key building blocks for inference techniques on the Semantic Web. The first means offered by W3C for creating ontologies is the RDF Schema [ RDF-SCHEMA ] language. It is possible to define more expressive ontologies with additional axioms using languages such as those in The Web Ontology Language [ OWL2-OVERVIEW ].

On the other hand, “controlled vocabularies”, “concept schemes” and “knowledge organization systems” enumerate and define resources that can be employed in the descriptions made with the former kind of vocabulary, i.e. vocabularies that structure the descriptions of resources in (linked) datasets. A concept from a thesaurus, say, “architecture”, will for example be used in the subject field for a book description (where “subject” has been defined in an ontology for books). For defining the terms in these vocabularies, complex formalisms are most often not needed. Simpler models have thus been proposed to represent and exchange them, such as the ISO 25964 data model [ ISO-25964 ] or W3C 's Simple Knowledge Organization System [ SKOS-PRIMER ].
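As an informal illustration of how a concept-scheme value plugs into a description, the snippet below builds a small JSON-LD structure by hand; all identifiers under example.org are invented for the example:

```python
import json

# A hypothetical thesaurus concept, expressed as JSON-LD using SKOS.
concept = {
    "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
    "@id": "http://example.org/thesaurus/architecture",
    "@type": "skos:Concept",
    "skos:prefLabel": [
        {"@value": "architecture", "@language": "en"},
        {"@value": "architecture", "@language": "fr"},
    ],
    "skos:broader": {"@id": "http://example.org/thesaurus/arts"},
}

# A book description that reuses the concept as its subject; "subject"
# here stands in for a property defined by an ontology for books.
book = {
    "@id": "http://example.org/books/123",
    "subject": {"@id": concept["@id"]},
}

print(json.dumps(book))
```

The multilingual prefLabel values are what allow an application to localize its display, as described above.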

Best Practice 15: Reuse vocabularies, preferably standardized ones

Use terms from shared vocabularies, preferably standardized ones, to encode data and metadata.

Why

Use of vocabularies already in use by others captures and facilitates consensus in communities. It increases interoperability and reduces redundancies, thereby encouraging reuse of your own data. In particular, the use of shared vocabularies for metadata (especially structural, provenance, quality and versioning metadata) helps the comparison and automatic processing of both data and metadata. In addition, referring to codes and terms from standards helps to avoid ambiguity and clashes between similar elements or values.

Intended Outcome

Interoperability and consensus among data publishers and consumers will be enhanced.

Possible Approach to Implementation

The Vocabularies section of the W3C Best Practices for Publishing Linked Data [ LD-BP ] provides guidance on the discovery, evaluation and selection of existing vocabularies.

Organizations such as the Open Geospatial Consortium (OGC), ISO , W3C , WMO , libraries and research data services, etc. provide lists of codes, terminologies and Linked Data vocabularies that can be used by everyone. A key point is to make sure the dataset, or its documentation, provides enough (human- and machine-readable) context so that data consumers can retrieve and exploit the standardized meaning of the values. In the context of the Web, using unambiguous, Web-based identifiers (URIs) for standardized vocabulary resources is an efficient way to do this.
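For instance, a dataset description that reuses the standardized DCAT and Dublin Core vocabularies might be sketched as follows; the example.org URIs are invented, but the vocabulary terms and their namespace URIs are the standardized ones:

```python
import json

# Hypothetical dataset description reusing the standardized DCAT and
# Dublin Core vocabularies via their Web-based identifiers (URIs).
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "http://example.org/dataset/bus-stops",
    "@type": "dcat:Dataset",
    "dct:title": "Bus stops of MyCity",
    "dct:license": {"@id": "http://creativecommons.org/licenses/by/4.0/"},
    "dcat:keyword": ["transport", "mobility"],
}

print(json.dumps(dataset, indent=2))
```

Because every property resolves to a well-known namespace URI, a consumer can retrieve the standardized meaning of each term without consulting the publisher.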

How to Test

Using vocabulary repositories like the Linked Open Vocabularies repository or lists of services mentioned in technology-specific Best Practices such as the W3C Best Practices for Publishing Linked Data [ LD-BP ], or the Core Initial Context for RDFa and JSON-LD, check that classes, properties, terms, elements or attributes used to represent a dataset do not replicate those defined by vocabularies used for other datasets.

Check if the terms or codes of the vocabulary to be used are defined in a standards development organization such as IETF, OGC and W3C, or are published by a suitable authority, such as a government agency.

Evidence

Relevant requirements : R-MetadataStandardized , R-MetadataDocum , R-QualityComparable , R-VocabOpen , R-VocabReference

Benefits

  • Reuse
  • Processability
  • Comprehension
  • Trust
  • Interoperability

Best Practice 16: Choose the right formalization level

Opt for a level of formal semantics that fits both data and the most likely applications.

Why

As Albert Einstein may or may not have said: everything should be made as simple as possible, but not simpler.

Formal semantics may help to establish precise specifications that convey detailed meaning, and using a complex vocabulary (ontology) may serve as a basis for tasks such as automated reasoning. On the other hand, such complex vocabularies require more effort to produce and understand, which could hamper their reuse, as well as the comparison and linking of datasets that use them.

If the data is sufficiently rich to support detailed research questions (the fact that A, B and C are true, and that D is not true, leads to the conclusion E) then something like an OWL Profile would clearly be appropriate [ OWL2-PROFILES ].

But there is nothing complicated about a list of bus stops.

Choosing a very simple vocabulary is always attractive but there is a danger: the drive for simplicity might lead the publisher to omit some data that provides important information, such as the geographical location of the bus stops that would prevent showing them on a map. Therefore, a balance has to be struck, remembering that the goal is not simply to share your data, but for others to reuse it.

Intended Outcome

The data will support the most likely application cases with no more complexity than necessary.

Possible Approach to Implementation

Look at what your peers do already. It's likely you'll see that there is a commonly used vocabulary that matches, or nearly matches, your current needs. That's probably the one to use.

You may find a vocabulary that you'd like to use but you notice a semantic constraint that makes it difficult to do so, such as a domain or range restriction that doesn't apply to your case. In that scenario, it's often worth contacting the vocabulary publisher and talking to them about it. They may well be able to lift that restriction and provide further guidance on how the vocabulary is used more broadly.

W3C operates a mailing list at public-vocabs@w3.org [ archive ] where issues around vocabulary usage and development can be discussed.

If you are creating a vocabulary of your own, keep the semantic restrictions to the minimum that works for you, again, so as to increase the possibility of reuse by others. As an example, the designers of the (very widely used) SKOS ontology itself have minimized its ontological commitment by questioning all formal axioms that were suggested for its classes and properties. Often they were rejected because their use, while beneficial to many applications, would have created formal inconsistencies for the data from other applications, making SKOS not usable at all for these. As an example, the property skos:broader was not defined as a transitive property, even though it would have fitted the way hierarchical links between concepts are created for many thesauri [ SKOS-DESIGN ]. Look for evidence of that kind of "design for wide use" when selecting a vocabulary.

Another example of this "design for wide use" can be seen in schema.org . Launched in June 2011, schema.org was massively adopted in a very short time in part because of its informative rather than normative approach to defining the types of objects that properties can be used with. For instance, the values of the property author are only "expected" to be of type Organization or Person . author "can be used" on the type CreativeWork but this is not a strict constraint. Again, that approach to design makes schema.org a good choice as a vocabulary to use when encoding data for sharing.

How to Test

This is almost always a matter of subjective judgement with no objective test. As a general guideline:

  • Are common vocabularies used such as Dublin Core and schema.org?
  • Are simple facts stated simply and retrieved easily?
  • For formal knowledge representation languages, check that applying an inference engine on top of the data that uses a given vocabulary does not produce too many statements that are unnecessary for target applications.

Evidence

Relevant requirements : R-VocabReference , R-QualityComparable

Benefits

  • Reuse
  • Comprehension
  • Interoperability

8.10 Data Access

Providing easy access to data on the Web enables both humans and machines to take advantage of the benefits of sharing data using the Web infrastructure. By default, the Web offers access using Hypertext Transfer Protocol (HTTP) methods. This provides access to data at an atomic transaction level. When data is distributed across multiple files or requires more sophisticated retrieval methods, different approaches like bulk download and API s can be adopted.

In the bulk download approach, data is generally pre-processed server side, where multiple files or directory trees of files are provided as one downloadable file. When bulk data is being retrieved from non-file-system solutions, depending on the data user communities, the data publisher can offer APIs to support a series of retrieval operations representing a single transaction.

For data that is generated in real time or near real time, data publishers should use an automated system to enable immediate access to time-sensitive data, such as emergency information, weather forecasting data, or system monitoring metrics. In general, APIs should be available to allow third parties to automatically search and retrieve such data.

Aside from helping to automate real-time data pipelines, APIs are suitable for all kinds of data on the Web. Though they generally require more work than posting files for download, publishers are increasingly finding that delivering a well documented, standards-based, stable API is worth the effort.

Best Practice 17: Provide bulk download

Enable consumers to retrieve the full dataset with a single request.

Why

When Web data is distributed across many URIs but might logically be organized as one container, accessing the data in bulk can be useful. Bulk access provides a consistent means to handle the data as one dataset. Individually accessing data over many retrievals can be cumbersome and, if used to reassemble the complete dataset, can lead to inconsistent approaches to handling the data.

Intended Outcome

Large file transfers that would require more time than a typical user would consider reasonable will be possible via dedicated file-transfer protocols.

Possible Approach to Implementation

Depending on the nature of the data and consumer needs, possible approaches could include the following:

  • For datasets that exist initially as multiple files, preprocessing a copy of the data into a single file and making the data accessible for download from one URI. For larger datasets, the file can also be compressed.
  • Hosting an API that includes the ability to retrieve a bulk download in addition to dynamic queries. This approach is useful for capturing a complete snapshot of dynamic data.
  • For very large datasets, bulk file transfers can be enabled via means other than http, such as bbcp or GridFTP .

The bulk download should include the metadata describing the dataset. Discovery metadata [ VOCAB-DCAT ] should also be available outside the bulk download.
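The first approach above, packaging a multi-file dataset as a single compressed archive, can be sketched as follows; the file names and contents are hypothetical stand-ins:

```python
import os
import tarfile
import tempfile

# Hypothetical dataset spread across several files; we package a copy
# as a single compressed archive so it can be fetched from one URI.
workdir = tempfile.mkdtemp()
members = ("part1.csv", "part2.csv", "metadata.json")
for name in members:
    with open(os.path.join(workdir, name), "w") as f:
        f.write("placeholder content\n")

archive_path = os.path.join(workdir, "dataset-bulk.tar.gz")
with tarfile.open(archive_path, "w:gz") as tar:
    for name in members:
        # arcname keeps paths inside the archive relative.
        tar.add(os.path.join(workdir, name), arcname=name)

with tarfile.open(archive_path, "r:gz") as tar:
    print(sorted(tar.getnames()))  # ['metadata.json', 'part1.csv', 'part2.csv']
```

Note that the dataset's metadata file travels inside the archive, matching the guidance above that the bulk download should include the metadata describing the dataset.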

How to Test

Check if the full dataset can be retrieved with a single request.

Evidence

Relevant requirements : R-AccessBulk

Benefits

  • Reuse
  • Access

Best Practice 18: Provide Subsets for Large Datasets

If your dataset is large, enable users and applications to readily work with useful subsets of your data.

Why

Large datasets can be difficult to move from place to place. It can also be inconvenient for users to store or parse a large dataset. Users should not have to download a complete dataset if they only need a subset of it. Moreover, Web applications that tap into large datasets will perform better if their developers can take advantage of “lazy loading”, working with smaller pieces of a whole and pulling in new pieces only as needed. The ability to work with subsets of the data also enables offline processing to work more efficiently. Real-time applications benefit in particular, as they can update more quickly.

Intended Outcome

Humans and applications will be able to access subsets of a dataset, rather than the entire thing, with a high ratio of needed to unneeded data for the largest number of users. Static datasets that users in the domain would consider to be too large will be downloadable in smaller pieces. APIs will make slices or filtered subsets of the data available, the granularity depending on the needs of the domain and the demands of performance in a Web application.

Possible Approaches to Implementation

Consider the expected use cases for your dataset and determine what types of subsets are likely to be most useful. An API is usually the most flexible approach to serving subsets of data, as it allows customization of what data is transferred, making the available subsets much more likely to provide the needed data, and little unneeded data, for any given situation. The granularity should be suitable for Web application access speeds. (An API call that returns within one second enables an application to deliver interactivity that feels natural. Data that takes more than ten seconds to deliver will likely cause users to suspect failure.)

Another way to subset a dataset is to simply split it into smaller units and make those units individually available for download or viewing.

It can also be helpful to mark up a dataset so that individual sections through the data (or even smaller pieces, if expected use cases warrant it) can be processed separately. One way to do that is by indicating “slices” with the RDF Data Cube Vocabulary .
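A minimal sketch of serving a large dataset in smaller, individually retrievable units; the page size and records here are hypothetical stand-ins for real API parameters:

```python
# A simple pagination helper: each call returns one slice of the
# dataset plus a flag telling the client whether more pages remain.
def get_page(records, page, page_size=100):
    """Return one slice of the dataset and whether further pages exist."""
    start = (page - 1) * page_size
    chunk = records[start:start + page_size]
    has_more = start + page_size < len(records)
    return {"page": page, "items": chunk, "has_more": has_more}

dataset = list(range(250))          # pretend these are data records
first = get_page(dataset, 1, page_size=100)
last = get_page(dataset, 3, page_size=100)
print(len(first["items"]), first["has_more"])   # 100 True
print(len(last["items"]), last["has_more"])     # 50 False
```

The `has_more` flag is what enables "lazy loading": a client pulls in the next page only when it is actually needed.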

How to Test

Check that the entire dataset can be done recovered by making multiple requests that retrieve smaller units.

Evidence

Relevant requirements : R-Citable , R-GranularityLevels , R-UniqueIdentifier , R-AccessRealTime

Benefits

  • Reuse
  • Linkability
  • Access
  • Processability

Best Practice 19: Use content negotiation for serving data available in multiple formats

Use content negotiation in addition to file extensions for serving data available in multiple formats.

Why

It is possible to serve data in an HTML page mixed with human-readable and machine-readable data, using RDFa for example. However, as the Architecture of the Web [ WEBARCH ] and DCAT [ VOCAB-DCAT ] make clear, a resource, such as a dataset, can have many representations. The same data might be available as JSON, XML, RDF, CSV and HTML. These multiple representations can be made available via an API, but should be made available from the same URL using content negotiation to return the appropriate representation (what DCAT calls a distribution). Specific URIs can be used to identify individual representations of the data directly, by-passing content negotiation.

Intended Outcome

Content negotiation will enable different resources or different representations of the same resource to be served according to the request made by the client.

Possible Approach to Implementation

A possible approach to implementation is to configure the Web server to deal with content negotiation of the requested resource.

  • http://example.org/profile_info.html - personal information represented in HTML + RDFa
  • http://example.org/profile_info.json - the same information represented in JSON-LD
  • http://example.org/profile_info.ttl - the same information represented in Turtle

The specific format of the resource's representation can be selected by the URI or by the Accept header of the HTTP request.
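The negotiation logic itself can be sketched as a small helper that picks the best available representation for a request's Accept header. This is a simplified illustration: it honours q-values but ignores wildcards such as */* and media-type parameters that a production Web server would also handle.

```python
def choose_representation(accept_header, available):
    """Pick the best available media type for an HTTP Accept header."""
    preferences = []
    for part in accept_header.split(","):
        pieces = part.strip().split(";")
        media_type = pieces[0].strip()
        q = 1.0  # per HTTP, a missing q-value means q=1
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                q = float(value)
        preferences.append((q, media_type))
    # Highest q-value first; first match among available types wins.
    for q, media_type in sorted(preferences, key=lambda p: -p[0]):
        if media_type in available and q > 0:
            return media_type
    return None

available = ["text/html", "application/ld+json", "text/turtle"]
print(choose_representation("text/turtle;q=0.9,application/ld+json", available))
```

Here the client prefers JSON-LD (implicit q=1) over Turtle (q=0.9), so the JSON-LD representation is served even though Turtle is listed first.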

How to Test

Check the available representations of the resource and try to get them specifying the accepted content on the HTTP Request header.

Evidence

Relevant requirements :

Benefits

  • Reuse
  • Access

Best Practice 20: Provide real-time access

When data is produced in real time, make it available on the Web in real time or near real time.

Why

The presence of real-time data on the Web enables access to critical time-sensitive data, and encourages the development of real-time Web applications. Real-time access is dependent on real-time data producers making their data readily available to the data publisher. The necessity of providing real-time access for a given application will need to be evaluated on a case-by-case basis considering refresh rates, latency introduced by data post-processing steps, infrastructure availability, and the data needed by consumers. In addition to making data accessible, data publishers may provide additional information describing data gaps, data errors and anomalies, and publication delays.

Intended Outcome

Applications will be able to access time-critical data in real time or near real time, where real time means a range from milliseconds to a few seconds after the data creation.

Possible Approach to Implementation

A possible approach to implementation is for publishers to configure a Web Service that provides a connection so that, as real-time data is received by the Web Service, it can be instantly made available to consumers by polling or streaming.

If data is checked infrequently by consumers, real-time data can be polled upon request. In this case, consumers request the most recent data through an API . The data publishers will provide an API to facilitate these read-only requests.

If data is checked frequently by consumers, a streaming data implementation may be more appropriate, where data is pushed through an API . While streaming techniques are beyond the scope of this best practice, there are many standard protocols and technologies available (for example Server-sent Events, WebSocket, EventSource API) for clients receiving automatic updates from the server.
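As an illustration of the streaming option, the helper below formats a single Server-sent Events message in the text/event-stream wire format; the event name and sensor payload are hypothetical:

```python
import json

def sse_message(event, payload):
    """Format one Server-sent Events message (text/event-stream)."""
    return "event: {}\ndata: {}\n\n".format(event, json.dumps(payload))

# A hypothetical sensor reading pushed to connected clients; a browser
# EventSource client would receive it as a "reading" event.
message = sse_message("reading", {"sensor": "river-gauge-1", "level_m": 2.4})
print(message)
```

A server streaming such messages over a kept-open HTTP response lets clients receive updates the moment data arrives, instead of polling.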

How to Test

To adequately test real-time data access, data will need to be tracked from the time it is initially collected to the time it is published and accessed. [ PROV-O ] can be used to describe these activities. Caution should be used when analyzing real-time access for systems that consist of multiple computer systems. For example, tests that rely on wall clock time stamps may reflect inconsistencies between the individual computer systems as opposed to data publication time latency.

Evidence

Relevant requirements : R-AccessRealTime

Benefits

  • Reuse
  • Access

Best Practice 21: Provide data up to date

Make data available in an up-to-date manner, and make the update frequency explicit.

Why

The availability of data on the Web should closely match the data at creation time, or collection time, or perhaps after it has been processed or changed. Carefully synchronizing data publication to the update frequency encourages data consumer confidence and data reuse.

Intended Outcome

Data on the Web will be updated in a timely manner so that the most recent data available online generally reflects the most recent data released via any other channel. When new data becomes available, it will be published on the Web as soon as practical thereafter.

Possible Approach to Implementation

New versions of the dataset can be posted to the Web on a regular schedule, following the Best Practices for Data Versioning . Posting to the Web can be made a part of the release process for new versions of the data. Making Web publication a deliverable item in the process and assigning an individual person as responsible for the task can help prevent data becoming out of date. To set consumer expectations for updates going forward, you can include human-readable text stating the expected publication frequency, and you can provide machine-readable metadata indicating the frequency as well.

How to Test

Check that the update frequency is stated and that the most recently published copy on the Web is no older than the date predicted by the stated update frequency.
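That freshness check can be sketched mechanically; the dates and the 30-day frequency below are hypothetical, standing in for whatever frequency the publisher has stated:

```python
from datetime import date, timedelta

def is_stale(last_published, frequency_days, today):
    """True if the newest copy is older than the stated update frequency."""
    return today - last_published > timedelta(days=frequency_days)

# Hypothetical dataset stated as updated every 30 days.
print(is_stale(date(2016, 1, 1), 30, date(2016, 3, 1)))   # True: overdue
print(is_stale(date(2016, 2, 20), 30, date(2016, 3, 1)))  # False: fresh
```

The same comparison can be automated against machine-readable frequency metadata, so staleness is detected without a human inspecting the site.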

Evidence

Relevant requirements : R-AccessUptodate

Benefits

  • Reuse
  • Access

Best Practice 22: Provide an explanation for data that is not available

For data that is not available, provide an explanation about how the data can be accessed and who can access it.

Why

Publishing online documentation about unavailable data due to sensitivity issues provides a means for publishers to explicitly identify knowledge gaps. This provides a contextual explanation for consumer communities, thus encouraging use of the data that is available.

Intended Outcome

Consumers will know that data that is referred to from the current dataset is unavailable or only available under different conditions.

Possible Approach to Implementation

Depending on the machine/human context there are a variety of ways to indicate data unavailability. Data publishers may publish an HTML document that gives a human-readable explanation for data unavailability. From a machine application interface perspective, appropriate HTTP status codes with customized human-readable messages can be used. Examples of relevant status codes include 303 (See Other), 410 (Gone) and 503 (Service Unavailable).
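As an illustrative sketch (not a complete policy), the function below maps a dataset's availability state to one of the status codes mentioned above, paired with a customized human-readable message. The state names are invented for this example; a real server would derive them from its own records.

```python
def response_for(status: str, alternate_url: str = None):
    """Map a dataset's availability state to an HTTP status code and a
    human-readable explanation (illustrative states, not a standard)."""
    if status == "moved":
        # 303 See Other: an explanation page should be fetched with GET.
        return 303, f"See other: an explanation is available at {alternate_url}"
    if status == "removed":
        # 410 Gone: the data was intentionally and permanently removed.
        return 410, "This dataset has been permanently removed."
    if status == "temporarily_unavailable":
        # 503 Service Unavailable: the service providing the data is down.
        return 503, "The service providing this data is temporarily unavailable."
    return 200, "OK"

print(response_for("removed"))
```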

How to Test

Where the dataset includes references to data that is no longer available or is not available to all users, check that an explanation of what is missing and instructions for obtaining access (if possible) are given. Check that a legitimate HTTP response code in the 400 or 500 range is returned when trying to get unavailable data.

Evidence

Relevant requirements : R-AccessLevel , R-SensitivePrivacy , R-SensitiveSecurity

Benefits

  • Reuse
  • Trust

8.10.1 Data Access APIs

Best Practice 23: Make data available through an API

Offer an API to serve data if you have the resources to do so.

Why

An API offers the greatest flexibility and processability for consumers of your data. It can enable real-time data usage, filtering on request, and the ability to work with the data at an atomic level. If your dataset is large, frequently updated, or highly complex, an API is likely to be the best option for publishing your data.

Intended Outcome

Developers will have programmatic access to the data for use in their own applications, with data updated without requiring effort on the part of consumers. Web applications will be able to obtain specific data by querying a programmatic interface.

Possible Approach to Implementation

Creating an API is a little more involved than posting data for download. It requires some understanding of how to build a Web application. One need not necessarily build from scratch, however. If you use a data management platform, such as CKAN, you may be able to simply enable an existing API. Many Web development frameworks include support for APIs, and there are also frameworks written specifically for building custom APIs.

Rails, Django, and Express are some example Web development frameworks that offer support for building APIs. Examples of API frameworks include Swagger, Apigility, Apache CXF, Restify, and Restlet.
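To make the idea concrete, here is a minimal, framework-independent sketch of the request-handling core of such an API. The dataset, paths, and handler shape are invented for illustration; a framework like those named above would supply the actual HTTP routing around a function of this kind.

```python
import json

# Invented sample dataset: bus stops keyed by identifier.
DATASET = {
    "stop-001": {"name": "Main St", "lat": 51.05, "lon": -0.12},
    "stop-002": {"name": "High St", "lat": 51.06, "lon": -0.11},
}

def handle_get(path: str):
    """Return (status code, JSON body) for a GET request path."""
    if path == "/stops":
        # Collection resource: list of stop identifiers.
        return 200, json.dumps(sorted(DATASET))
    if path.startswith("/stops/"):
        # Individual resource: one stop, addressed by its own URI.
        stop_id = path[len("/stops/"):]
        if stop_id in DATASET:
            return 200, json.dumps(DATASET[stop_id])
        return 404, json.dumps({"error": "no such stop"})
    return 404, json.dumps({"error": "unknown path"})

print(handle_get("/stops"))
```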

How to Test

Check if a test client can simulate calls and the API returns the expected responses.

Evidence

Relevant requirements : R-AccessRealTime , R-AccessUpToDate

Benefits
  • Reuse
  • Processability
  • Interoperability
  • Access

Best Practice 24: Use Web Standards as the foundation of APIs

When designing APIs, use an architectural style that is founded on the technologies of the Web itself.

Why

APIs that are built on Web standards leverage the strengths of the Web. For example, using HTTP verbs as methods and URIs that map directly to individual resources helps to avoid tight coupling between requests and responses, making for an API that is easy to maintain and can readily be understood and used by many developers. The statelessness of the Web can be a strength in enabling quick scaling, and using hypermedia enables rich interactions with your API.

Intended Outcome

Developers who have some experience with APIs based on Web standards, such as REST, will have an initial understanding of how to use the API . The API will also be easier to maintain.

Possible Approaches to Implementation

REST (REpresentational State Transfer)[ Fielding ][ Richardson ] is an architectural style that, when used in a Web API , takes advantage of the architecture of the Web itself. A full discussion of how to build a RESTful API is beyond the scope of this document, but there are many resources and a strong community that can help in getting started. There are also many RESTful development frameworks available. If you are already using a Web development framework that supports building REST APIs, consider using that. If not, consider an API -only framework that uses REST.

Another aspect of implementation to consider is making a hypermedia API, one that responds with links as well as data. Links are what make the Web a web, and data APIs can be more useful and usable by including links in their responses. The links can offer additional resources, documentation, and navigation. Even for an API that does not meet all the constraints of REST, returning links in responses can make for a service that is rich and self-documenting.

How to Test

Check that the service avoids using HTTP as a tunnel for calls to custom methods, and check that URIs do not contain method names.
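One way to partially automate this test is a heuristic that flags URIs whose path segments look like method names rather than resources. This is only a rough sketch: the verb list is an assumption, and a real review still needs human judgment.

```python
# Illustrative verbs that suggest RPC-style method names in a URI.
RPC_VERBS = {"get", "create", "delete", "update", "do", "run", "exec"}

def looks_like_rpc(uri: str) -> bool:
    """True if any path segment starts with a typical method-name verb,
    e.g. /getStopById, which suggests tunneling calls through HTTP."""
    path = uri.split("://", 1)[-1].split("?", 1)[0]
    segments = path.lower().split("/")[1:]  # drop the host part
    return any(
        any(seg.startswith(verb) for verb in RPC_VERBS)
        for seg in segments if seg
    )

print(looks_like_rpc("http://example.org/getStopById?id=1"))
print(looks_like_rpc("http://example.org/stops/1"))
```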

Evidence

Relevant requirements : R-APIDocumented , R-UniqueIdentifier

Benefits
  • Reuse
  • Linkability
  • Interoperability
  • Discoverability
  • Access
  • Processability

Best Practice 25: Provide complete documentation for your API

Provide complete information on the Web about your API. Update documentation as you add features or make changes.

Why

Developers are the primary consumers of an API, and the documentation is the first clue about its quality and usefulness. When API documentation is complete and easy to understand, developers are probably more willing to continue their journey to use it. Providing comprehensive documentation in one place allows developers to code efficiently. Highlighting changes enables your users to take advantage of new features and adapt their code if needed.

Intended Outcome

Developers will be able to obtain detailed information about each call to the API, including the parameters it takes and what it is expected to return; that is, the whole set of information related to the API. How to use it, notices of recent changes, contact information, and so on should all be described and easily browsable on the Web. Machine-readable documentation will also enable machines to access the API documentation in order to help developers build API client software.

Possible Approach to Implementation

A typical API reference provides a comprehensive list of the calls the API can handle, describing the purpose of each one, detailing the parameters it allows and what it returns, and giving one or more examples of its use. One nice trend in API documentation is to provide a form in which developers can enter specific calls for testing, to see what the API returns for their use case. There are now tools available for quickly creating this type of documentation, such as Swagger , io-docs , OpenApis , and others. It is important to say that the API should be self-documenting as well, so that calls return helpful information about errors and usage. API users should be able to contact the maintainers with questions, suggestions, or bug reports.
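As a small illustration of keeping documentation in sync with the API, the sketch below renders a plain-text reference directly from the same route metadata a server could use for dispatch, so the reference cannot drift from the implementation. All route names, parameters, and descriptions are invented.

```python
# Invented route metadata: purpose, parameters, and return value per call.
ROUTES = [
    {"path": "/stops", "method": "GET",
     "params": {},
     "returns": "JSON array of stop identifiers",
     "purpose": "List all bus stops."},
    {"path": "/stops/{id}", "method": "GET",
     "params": {"id": "stop identifier (required)"},
     "returns": "JSON object describing one stop",
     "purpose": "Retrieve a single stop."},
]

def render_reference(routes) -> str:
    """Render a human-readable API reference from route metadata."""
    lines = []
    for r in routes:
        lines.append(f"{r['method']} {r['path']} - {r['purpose']}")
        for name, desc in r["params"].items():
            lines.append(f"  param {name}: {desc}")
        lines.append(f"  returns: {r['returns']}")
    return "\n".join(lines)

print(render_reference(ROUTES))
```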

The quality of documentation is also related to usage and feedback from developers. Try to get constant feedback from your users about the documentation.

How to Test

Check that every call enabled by your API is described in your documentation. Make sure you provide details of what parameters are required or optional and what each call returns.

Check the Time To First Successful Call (i.e. being capable of doing a successful request to the API within a few minutes will increase the chances that the developer will stick to your API ).

Evidence

Relevant requirements : R-APIDocumented

Benefits
  • Reuse
  • Trust

Best Practice 26: Avoid Breaking Changes to Your API

Avoid changes to your API that break client code, and communicate any changes in your API to your developers when evolution happens.

Why

When developers implement a client for your API , they may rely on specific characteristics that you have built into it, such as the schema or the format of a response. Avoiding breaking changes in your API minimizes breakage to client code. Communicating changes when they do occur enables developers to take advantage of new features and, in the rare case of a breaking change, take action.

Intended Outcome

Developer code will continue to work. Developers will know of improvements you make and be able to make use of them. Breaking changes to your API will be rare, and if they occur, developers will have sufficient time and information to adapt their code. That will enable them to avoid breakage, enhancing trust. Changes to the API will be announced on the API 's documentation site.

Possible Approach to Implementation

When improving your API , focus on adding new calls or new options rather than changing how existing calls work. Existing clients can ignore such changes and will continue functioning.

If using a fully RESTful style, you should be able to avoid changes that affect developers by keeping resource URIs constant and changing only elements that your users do not code to directly. If you need to change your data in ways that are not compatible with the extension points that you initially designed, then a completely new design is called for, and that means changes that break client code. In that case, it’s best to implement the changes as a new REST API , with a different resource URI.

If using an architectural style that does not allow you to make moderately significant changes without breaking client code, use versioning. Indicate the version in the response header. Version numbers should be reflected in your URIs or in request "accept" headers (using content negotiation). When versioning in URIs, include the version number as far to the left as possible. Keep the previous version available for developers whose code has not yet been adapted to the new version.
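The two versioning conventions above can be sketched as follows. The URI layout and the media type with a `version` parameter are invented examples of each convention, not a standard.

```python
import re

def version_from_uri(path: str):
    """Extract a version prefix placed as far left as possible in the
    URI path, e.g. 'v1' from '/v1/stops'; None if absent."""
    m = re.match(r"^/(v\d+)/", path)
    return m.group(1) if m else None

def version_from_accept(accept: str):
    """Extract a version carried via content negotiation in an Accept
    header such as 'application/vnd.example+json; version=2'."""
    m = re.search(r"version=(\d+)", accept)
    return m.group(1) if m else None

print(version_from_uri("/v1/stops"))
print(version_from_accept("application/vnd.example+json; version=2"))
```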

To notify users directly of changes, it's a good idea to create a mailing list and encourage developers to join. You can then announce changes there, and this provides a nice mechanism for feedback as well. It also allows your users to help each other.

How to Test

Release changes initially to a test version of your API before applying them to the production version. Invite developers to test their applications on the test version and provide feedback.

Evidence

Relevant requirements : R-PersistentIdentification , R-APIDocumented

Benefits
  • Trust
  • Interoperability

8.11 Data Preservation

The working group recognizes that it is unrealistic to assume that all data on the Web will be available on demand at all times into the indefinite future. For a wide variety of reasons, data publishers are likely to want or need to remove data from the live Web, at which point it moves out of scope for the current work and into the scope of data archivists. What is in scope here, however, is what is left behind, that is, what steps should publishers take to indicate that data has been removed or archived. Simply deleting a resource from the Web is bad practice. In that circumstance, dereferencing the URI would lead to an HTTP Response code of 404 that tells the user nothing other than that the resource was not found. The following Best Practices offer more productive approaches.

Best Practice 27: Preserve identifiers

When removing data from the Web, preserve the identifier and provide information about the archived resource.

Why

URI dereferencing is the primary interface to data on the Web. If dereferencing a URI leads to the infamous 404 response code (Not Found), the user will not know whether the lack of availability is permanent or temporary, planned or accidental. If the publisher, or a third party, has archived the data, that archived copy is much less likely to be found if the original URI is effectively broken.

Intended Outcome

The URI of a dataset will always dereference to the dataset or redirect to information about it.

Possible Approach to Implementation

There are two scenarios to consider:

  1. the dataset has been deleted entirely and is no longer available via any route;
  2. the dataset has been archived and is only available through a request to the archive.

In the first of these cases, the server should be configured to respond with an HTTP Response code of 410 (Gone) . From the specification:

The 410 response is primarily intended to assist the task of Web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed.

In the second case, where data has been archived, it is more appropriate to redirect requests to a Web page giving information about the archive that holds the data and how a potential user can access it.

In both cases, the original URI continues to identify the dataset and leads to useful information, even though that dataset is no longer directly available.
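The two scenarios can be sketched as server-side logic, assuming an invented registry of archived and deleted datasets; a production server would typically encode the same decisions in its configuration rather than application code.

```python
# Invented registry: datasets archived elsewhere, and datasets deleted.
ARCHIVE_INFO = {
    "timetable-001": "http://example.org/archive/timetable-001",
}
DELETED = {"timetable-002"}

def dereference(dataset_id: str):
    """Return (status code, redirect location or None) for a dataset URI."""
    if dataset_id in DELETED:
        return 410, None                      # Gone: intentionally removed
    if dataset_id in ARCHIVE_INFO:
        return 303, ARCHIVE_INFO[dataset_id]  # See Other: archive info page
    return 200, None                          # still served directly

print(dereference("timetable-002"))
print(dereference("timetable-001"))
```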

How to Test

Check that dereferencing the URI of a dataset that is no longer available returns information about its current status and availability, using either a 410 or 303 Response Code as appropriate.

Evidence

Relevant requirements : R-AccessLevel , R-PersistentIdentification

Benefits

  • Reuse
  • Trust

Best Practice 28: Assess dataset coverage

Assess the coverage of a dataset prior to its preservation.

Why

A chunk of Web data is by definition dependent on the rest of the global graph. This global context influences the meaning of the description of the resources found in the dataset. Ideally, the preservation of a particular dataset would involve preserving all its context. That is the entire Web of Data.

At the time of archiving, the linkage of the dataset dump to already preserved resources, and the vocabularies it uses, needs to be assessed. Datasets for which very few of the vocabularies used and/or resources pointed to are already preserved somewhere should be flagged as being at risk.

Intended Outcome

Users will be able to make use of archived data well into the future.

Possible Approach to Implementation

Check whether all the resources used are either already preserved somewhere or need to be provided along with the dataset being considered for preservation.
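One way to begin such an assessment, sketched below, is to extract the vocabulary namespaces declared in a Turtle dump and flag any that are not on a list of vocabularies known to be preserved. The preserved list and the tiny dump are invented; a real assessment would consult actual digital archives and also examine the resources the data links to.

```python
import re

# Invented list of namespaces assumed to be preserved by some archive.
PRESERVED = {
    "http://xmlns.com/foaf/0.1/",
    "http://purl.org/dc/terms/",
}

# A tiny, invented Turtle dump declaring two vocabulary prefixes.
DUMP = """\
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/myvocab#> .
"""

def at_risk_namespaces(turtle: str):
    """Return declared prefix namespaces not known to be preserved."""
    ns = re.findall(r"@prefix\s+\S+\s+<([^>]+)>", turtle)
    return sorted(n for n in ns if n not in PRESERVED)

print(at_risk_namespaces(DUMP))
```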

How to Test

It is impossible to determine what will be available in, say, 50 years' time. However, one can check that an archived dataset depends only on widely used vocabularies. Check that unique or lesser-used dependencies are preserved as part of the archive.

Evidence

Relevant requirements : R-VocabReference

Benefits

  • Reuse
  • Trust

8.12 Feedback

Publishing data on the Web enables data sharing on a large scale to a wide range of audiences with different levels of expertise. Data publishers want to ensure that the data published is meeting the data consumers' needs, and for this purpose user feedback is crucial. Feedback has benefits for both publishers and consumers, helping data publishers to improve the integrity of their published data, as well as encouraging the publication of new data. Feedback allows data consumers to have a voice describing usage experiences (e.g. applications using data), preferences and needs. When possible, feedback should also be publicly available for other data consumers to examine. Making feedback publicly available allows users to become aware of other data consumers, supports a collaborative environment, and allows user community experiences, concerns or questions to be addressed.

From a user interface perspective there are different ways to gather feedback from data consumers, including site registration, contact forms, quality ratings selection, surveys and comment boxes. From a machine perspective the data publisher can also record metrics on data usage or information about specific applications that use the data. Feedback such as this establishes a communication channel between data publishers and data consumers. In order to quantify and analyze usage feedback, it should be recorded in a machine-readable format. Publicly available feedback should be displayed in a human-readable form.

This section provides some Best Practices to be followed by data publishers in order to enable data consumers to provide feedback. This feedback can be for humans or machines.

Best Practice 29: Gather feedback from data consumers

Provide a readily discoverable means for consumers to offer feedback.

Why

Obtaining feedback helps publishers understand the needs of their data consumers and can help them improve the quality of their published data. It also enhances trust by showing consumers that the publisher cares about addressing their needs. Specifying a clear feedback mechanism removes the barrier of having to search for a way to provide feedback.

Intended Outcome

Data consumers will be able to provide feedback and ratings about datasets and distributions.

Possible Approach to Implementation

Provide data consumers with one or more feedback mechanisms including, but not limited to, a registration form, contact form, point-and-click data quality rating buttons, or a comment box. In order to make the most of feedback received from consumers, it's a good idea to collect the feedback with a tracking system that captures each item in a database, enabling quantification and analysis. It is also a good idea to capture the type of each item of feedback, i.e., its motivation (editing, classifying [rating], commenting or questioning), so that each item can be expressed using the Dataset Usage Vocabulary [ VOCAB-DUV ].
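As a sketch of capturing one item of feedback as structured data, the fragment below records the motivation alongside the comment and a link back to the dataset, so that each item could later be expressed with the Dataset Usage Vocabulary. The field names here are simplified stand-ins, not actual DUV terms; the exact classes and properties should be checked against [VOCAB-DUV].

```python
import json
from datetime import date

ALLOWED_MOTIVATIONS = {"editing", "classifying", "commenting", "questioning"}

def make_feedback(dataset_uri, author, motivation, text, when):
    """Build one structured feedback item (field names are illustrative)."""
    assert motivation in ALLOWED_MOTIVATIONS
    return {
        "dataset": dataset_uri,    # URL linking back to the data described
        "author": author,
        "motivation": motivation,  # cf. motivations listed in [VOCAB-DUV]
        "text": text,
        "date": when.isoformat(),
    }

item = make_feedback(
    "http://example.org/dataset/bus-stops", "adrianne",
    "questioning", "Is stop 42 still in service?", date(2016, 6, 1))
print(json.dumps(item))
```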

How to Test

Check that at least one feedback mechanism is provided and readily discoverable by data consumers.

Evidence

Relevant requirements : R-UsageFeedback , R-QualityOpinions

Benefits

  • Reuse
  • Comprehension
  • Trust

Best Practice 30: Make feedback available

Make consumer feedback about datasets and distributions publicly available.

Why

By sharing feedback with consumers, publishers can demonstrate to users that their concerns are being addressed, and they can avoid submission of duplicate bug reports. Sharing feedback also helps consumers understand any issues that may affect their ability to use the data, and it can foster a sense of community among them.

Intended Outcome

Consumers will be able to assess the kinds of errors that affect the dataset, review other users' experiences with it, and be reassured that the publisher is actively addressing issues as needed. Consumers will also be able to determine whether other users have already provided similar feedback, saving them the trouble of submitting unnecessary bug reports and sparing the maintainers from having to deal with duplicates.

Possible Approach to Implementation

Feedback can be available as part of an HTML Web page, but it can also be provided in a machine-readable format using the Dataset Usage Vocabulary [ VOCAB-DUV ].

How to Test

Check that any feedback given by data consumers for a specific dataset or distribution is publicly available.

Evidence

Relevant requirements : R-UsageFeedback , R-QualityOpinions

Benefits

  • Reuse
  • Trust

8.13 Data Enrichment


Data enrichment refers to a set of processes that can be used to enhance, refine or otherwise improve raw or previously processed data. This idea and other similar concepts contribute to making data a valuable asset for almost any modern business or enterprise. It is a diverse topic in itself, details of which are beyond the scope of the current document. However, it is worth noting that some of these techniques should be approached with caution, as ethical concerns may arise. In scientific research, care must be taken to avoid enrichment that distorts results or statistical outcomes. For data about individuals, privacy issues may arise when combining datasets. That is, enriching one dataset with another, when neither contains sufficient information about any individual to identify them, may yield a combined dataset that compromises privacy. Furthermore, these techniques can be carried out at scale, which in turn highlights the need for caution.

This section provides some advice to be followed by data publishers in order to enable data consumers to enrich data.

Best Practice 31: Enrich data by generating new data

Enrich your data by generating new data from the raw data when doing so will enhance its value.

Why

Enrichment can greatly enhance processability, particularly for unstructured data. Under some circumstances, missing values can be filled in, and new attributes and measures can be added. Publishing more complete datasets can enhance trust, if done properly and ethically. Deriving additional values that are of general utility saves users time and encourages more kinds of reuse. There are many intelligent techniques that can be used to enrich data, making the dataset an even more valuable asset.

Intended Outcome

Datasets with missing values will be enhanced by filling those values. Structure will be conferred and utility enhanced if relevant measures or attributes are added, but only if the addition does not distort analytical results, significance, or statistical power.

Possible Approaches to Implementation

Techniques for data enrichment are complex and go well beyond the scope of this document, which can only highlight the possibilities.

Machine learning can readily be applied to the enrichment of data. Methods include those focused on data categorization, disambiguation, entity recognition, sentiment analysis and topification, among others. New data values may be derived as simply as performing a mathematical calculation across existing columns. Other examples include visual inspection to identify features in spatial data and cross-referencing external databases for demographic information.

Values generated by inference-based techniques should be labeled as such, and it should be possible to retrieve any original values replaced by enrichment.

Whenever licensing permits, the code used to enrich the data should be made available along with the dataset. Sharing such code is particularly important for scientific data.
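A minimal sketch of enrichment by calculation across existing fields follows: deriving an on-time percentage from scheduled versus actual arrival counts. The sample rows are invented. Derived values are labeled as such, and the original rows are left unchanged so the raw values remain retrievable.

```python
# Invented raw data: per-route arrival counts.
raw_rows = [
    {"route": "12", "arrivals": 200, "on_time": 188},
    {"route": "15", "arrivals": 150, "on_time": 120},
]

def enrich(rows):
    """Return new rows with a derived on-time percentage, labeled as
    derived; the input rows are not modified."""
    enriched = []
    for row in rows:
        new = dict(row)
        new["on_time_pct"] = round(100 * row["on_time"] / row["arrivals"], 1)
        new["on_time_pct_derived"] = True  # label generated values as such
        enriched.append(new)
    return enriched

print(enrich(raw_rows)[0]["on_time_pct"])
```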


How to Test

Look for missing values in the dataset or additional fields likely to be needed by others. Check that any data added by inferential enrichment techniques is identified as such and that any replaced data is still available. Check that code used to enrich the data is available.


Evidence

Relevant requirements: R-DataEnrichment , R-FormatMachineRead , R-ProvAvailable

Benefits

  • Reuse
  • Comprehension
  • Trust
  • Processability

Best Practice 32: Provide Complementary Presentations

Enrich data by presenting it in complementary, immediately informative ways, such as visualizations, tables, Web applications, or summaries.

Why

Data published online is meant to inform others about its subject. But only posting datasets for download or API access puts the burden on consumers to interpret it. The Web offers unparalleled opportunities for presenting data in ways that let users learn and explore without having to create their own tools.

Intended Outcome

Complementary data presentations will enable human consumers to have immediate insight into the data by presenting it in ways that are readily understood.

Possible Approaches to Implementation

One very simple way to provide immediate insight is to publish an analytical summary in an HTML page. Including summative data in graphs or tables can help users scan the summary and quickly understand the meaning of the data.

If you have the means to create interactive visualizations or Web applications that use the data, you can give consumers of your data greater ability to understand it and discover patterns in it. These approaches also demonstrate its suitability for processing and encourage reuse.
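As a sketch of the analytical-summary approach (the dataset, column names, and caption are hypothetical), a short script can render a summary table with an aggregate figure:

```python
import csv
import io
from statistics import mean

# Hypothetical dataset: monthly rainfall readings.
CSV_DATA = """month,rainfall_mm
Jan,78
Feb,64
Mar,91
"""

def summary_page(csv_text):
    """Build a small HTML table with a summative caption so visitors
    can grasp the data without downloading the full dataset."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(r["rainfall_mm"]) for r in rows]
    cells = "".join(
        f"<tr><td>{r['month']}</td><td>{r['rainfall_mm']}</td></tr>" for r in rows
    )
    return (f"<table><caption>Rainfall, mean {mean(values):.1f} mm</caption>"
            f"{cells}</table>")

html = summary_page(CSV_DATA)
```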

How to Test

Check that the dataset is accompanied by some additional interpretive content that can be perceived without downloading the data or invoking an API.

Evidence

Relevant requirements: R-DataEnrichment

Benefits

  • Reuse
  • Comprehension
  • Access
  • Trust

8.14 Republication

Reusing data is another way of publishing data; it's simply republishing. It can take the form of combining existing data with other datasets, creating Web applications or visualizations, or repackaging the data in a new form, such as a translation. Data republishers have some responsibilities that are unique to that form of publishing on the Web. This section provides advice to be followed when republishing data.

Best Practice 33: Provide Feedback to the Original Publisher

Let the original publisher know when you are reusing their data. If you find an error or have suggestions or compliments, let them know.

Why

Publishers generally want to know whether the data they publish has been useful. Moreover, they may be required to report usage statistics in order to allocate resources to data publishing activities. Reporting your usage helps them justify putting effort toward data releases. Providing feedback repays the publishers for their efforts by directly helping them to improve their dataset for future users.

Intended Outcome

Better communication will make it easier for original publishers to determine how the data they post is being used, which in turn helps them justify publishing the data. Publishers will also be made aware of steps they can take to improve their data. This leads to more and better data for everyone.

Possible Approach to Implementation

When you begin using a dataset in a new product, make a note of the publisher’s contact information, the URI of the dataset you used, and the date on which you contacted them. This can be done in comments within your code where the dataset is used. Follow the publisher’s preferred route to provide feedback. If they do not provide a route, look for contact information for the Web site hosting the data.
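A minimal sketch of such a record, kept next to the code that consumes the dataset (all values hypothetical):

```python
from datetime import date

# Hypothetical record of reuse and of the feedback sent to the publisher.
REUSE_RECORD = {
    "dataset_uri": "http://data.example.org/dataset/rainfall",
    "publisher_contact": "opendata@example.org",
    "contacted_on": date(2016, 5, 20).isoformat(),
}
```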

How to Test

Check that you have a record of at least one communication informing the publisher of your use of the data.

Evidence

Relevant requirements: R-TrackDataUsage, R-UsageFeedback, R-QualityOpinions

Benefits

  • Reuse
  • Interoperability
  • Trust

Best Practice 34: Follow Licensing Terms

Find and follow the licensing requirements from the original publisher of the dataset.

Why

Licensing provides a legal framework for using someone else’s work. By adhering to the original publisher’s requirements, you keep the relationship between yourself and the publisher friendly. You don’t need to worry about legal action from the original publisher if you are following their wishes. Understanding the initial license will help you determine what license to select for your reuse.

Intended Outcome

Data publishers will be able to trust that their work is being reused in accordance with their licensing requirements, which will make them more likely to continue to publish data. Reusers of data will themselves be able to properly license their derivative works.

Possible Approach to Implementation

Read the original license and adhere to its requirements. If the license calls for specific licensing of derivative works, choose your license to be compatible with that requirement. If no license is given, contact the original publisher and ask what the license is.

How to Test

Read through the original license and check that your use of the data does not violate any of the terms.

Evidence

Relevant requirements: R-LicenseAvailable, R-LicenseLiability

Benefits

  • Reuse
  • Trust

Best Practice 35: Cite the Original Publication

Acknowledge the source of your data in metadata. If you provide a user interface, include the citation visibly in the interface.

Why

Data is only useful when it is trustworthy. Identifying the source is a major indicator of trustworthiness in two ways: first, the user can judge the trustworthiness of the data from the reputation of the source, and second, citing the source suggests that you yourself are trustworthy as a republisher. In addition to informing the end user, citing helps publishers by crediting their work. Publishers who make data available on the Web deserve acknowledgment and are more likely to continue to share data if they find they are credited. Citation also maintains provenance and helps still others to work with the data.

Intended Outcome

End users will be able to assess the trustworthiness of the data they see and the efforts of the original publishers will be recognized. The chain of provenance for data on the Web will be traceable back to its original publisher.

Possible Approach to Implementation

You can present the citation to the original source in a user interface by providing bibliographic text and a working link.
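In metadata, the acknowledgment can be expressed with Dublin Core terms [DCTERMS]. This sketch uses a plain JSON object with a hypothetical dataset URI and citation text; `dct:source` and `dct:bibliographicCitation` are DCTERMS properties:

```python
import json

# Hypothetical machine-readable citation of the original source.
citation = {
    "dct:source": "http://data.example.org/dataset/rainfall",
    "dct:bibliographicCitation": "Example Agency (2016). Rainfall readings.",
}
serialized = json.dumps(citation)
```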

How to Test

Check that the original source of any reused data is cited in the metadata provided. Check that a human-readable citation is readily visible in any user interface.

Evidence

Relevant requirements: R-Citable , R-ProvAvailable , R-MetadataAvailable , R-TrackDataUsage

Benefits

  • Reuse
  • Discoverability
  • Trust

9. Glossary

This section is non-normative.

Dataset

A dataset is defined as a collection of data, published or curated by a single agent, and available for access or download in one or more formats. A dataset does not have to be available as a downloadable file.

From: Data Catalog Vocabulary (DCAT)

Citation

A Citation may be either direct and explicit (as in the reference list of a journal article), indirect (e.g. a citation to a more recent paper by the same research group on the same topic), or implicit (e.g. as in artistic quotations or parodies, or in cases of plagiarism).

From: CiTO, the Citation Typing Ontology.

Data consumer

For the purposes of this WG, a Data Consumer is a person or group accessing, using, and potentially performing post-processing steps on data.

From: Strong, Diane M., Yang W. Lee, and Richard Y. Wang. "Data quality in context." Communications of the ACM 40.5 (1997): 103-110.

Data format

Data Format is defined as a specific convention for data representation, i.e. the way that information is encoded and stored for use in a computer system, possibly constrained by a formal data type or set of standards.

From: Digital Humanities Curation Guide

Data producer

Data Producer is a person or group responsible for generating and maintaining data.

From: Strong, Diane M., Yang W. Lee, and Richard Y. Wang. "Data quality in context." Communications of the ACM 40.5 (1997): 103-110.

Data representation

Data representation is any convention for the arrangement of symbols in such a way as to enable information to be encoded by a data producer and later decoded by data consumers.

From: Digital Humanities Curation Guide

Distribution

A distribution represents a specific available form of a dataset. Each dataset might be available in different forms; these forms might represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API or an RSS feed.

From: Data Catalog Vocabulary (DCAT)

Feedback

A feedback forum is used to collect messages posted by consumers about a particular topic. Messages can include replies to other consumers. Datetime stamps are associated with each message and the messages can be associated with a person or submitted anonymously.

From: Semantically-Interlinked Online Communities (SIOC) and the Web Annotation Data Model [Annotation-Model]

To better understand why an annotation was created, a SKOS Concept Scheme [SKOS-PRIMER] is used to show inter-related annotations between communities with more meaningful distinctions than a simple class/subclass tree.

Data preservation

Data Preservation is defined by the Alliance for Permanent Access Network as "The processes and operations in ensuring the technical and intellectual survival of objects through time". This is part of a data management plan focusing on preservation planning and metadata. Whether it is worthwhile to put effort into preservation depends on the (future) value of the data, the resources available and the opinion of the designated community of stakeholders.

Data archiving

Data Archiving is the set of practices around the storage and monitoring of the state of digital material over the years.

These tasks are the responsibility of a Trusted Digital Repository (TDR), also sometimes referred to as Long-Term Archive Service (LTA) . Often such services follow the Open Archival Information System [ OAIS ] which defines the archival process in terms of ingest, monitoring and reuse of data.

Data Provenance

Provenance originates from the French term "provenir" (to come from), which is used to describe the curation process of artwork as art is passed from owner to owner. Data provenance, in a similar way, is metadata that allows data providers to pass details about the data history to data users.

Data Quality

Data quality is commonly defined as “fitness for use” for a specific application or use case.

File format

File Format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.

Examples of file formats include: plain text (in a specified character encoding, ideally UTF-8), Comma Separated Variable (CSV) [RFC4180], Portable Document Format (PDF), XML, JSON [RFC4627], Turtle [Turtle] and HDF5.

License

A license is a legal document giving official permission to do something with the data with which it is associated.

From: DCTERMS

Locale

A locale is a set of parameters that clarifies aspects of the data that may be interpreted differently in different geographic locations, such as language and the formatting used for numeric values or dates.

Machine-Readable Data

Machine-readable data is data in a standard format that can be read and processed automatically by a computing system. Traditional word processing documents and portable document format (PDF) files are easily read by humans but typically are difficult for machines to interpret and manipulate. Formats such as XML, JSON, HDF5, RDF and CSV are machine-readable data formats.

Adapted from: Wikipedia

Near Real-Time

The term "near real-time" or "nearly real-time" (NRT), in telecommunications and computing, refers to the time delay introduced, by automated data processing or network transmission, between the occurrence of an event and the use of the processed data, such as for display or feedback and control purposes. For example, a near-real-time display depicts an event or situation as it existed at the current time minus the processing time, as nearly the time of the live event.

From: Wikipedia

Sensitive Data

Sensitive data is any designated data or metadata that is used in limited ways and/or intended for limited audiences. Sensitive data may include personal data and corporate or government data; mishandling of published sensitive data may lead to damage to individuals or organizations.

Vocabulary

A vocabulary is a collection of "terms" for a particular purpose. Vocabularies can range from simple, such as the widely used RDF Schema [RDF-SCHEMA], FOAF [FOAF] and Dublin Core Metadata Element Set [DCTERMS], to complex vocabularies with thousands of terms, such as those used in healthcare to describe symptoms, diseases and treatments. Vocabularies play a very important role in Linked Data, specifically to help with data integration. The use of this term overlaps with Ontology.

From: Linked Data Glossary

Structured data

Structured Data refers to data that conforms to a fixed schema. Relational databases and spreadsheets are examples of structured data.

10. Data on the Web Challenges

This section is non-normative.

The following diagram summarizes some of the main challenges faced when publishing or consuming data on the Web. These challenges were identified from the DWBP Use Cases and Requirements [DWBP-UCR] and, as presented in the diagram, each is addressed by one or more Best Practices.

11. Best Practices Benefits

This section is non-normative.

The list below describes the main benefits of applying the DWBP. Each benefit represents an improvement in the way datasets are made available on the Web.

The following table relates Best Practices and Benefits.

Best Practices and Benefits
Best Practice Benefits
Best Practice 1: Provide metadata
  • Reuse
  • Comprehension
  • Discoverability
  • Processability
Best Practice 2: Provide descriptive metadata
  • Reuse
  • Comprehension
  • Discoverability
Best Practice 3: Provide locale parameters metadata
  • Reuse
  • Comprehension
Best Practice 4: Provide structural metadata
  • Reuse
  • Comprehension
  • Processability
Best Practice 5: Provide data license information
  • Reuse
  • Trust
Best Practice 6: Provide data provenance information
  • Reuse
  • Comprehension
  • Trust
Best Practice 7: Provide data quality information
  • Reuse
  • Trust
Best Practice 8: Provide a version indicator
  • Reuse
  • Trust
Best Practice 9: Provide version history
  • Reuse
  • Trust
Best Practice 10: Use persistent URIs as identifiers of datasets
  • Reuse
  • Linkability
  • Discoverability
  • Interoperability
Best Practice 11: Use persistent URIs as identifiers within datasets
  • Reuse
  • Linkability
  • Discoverability
  • Interoperability
Best Practice 12: Assign URIs to dataset versions and series
  • Reuse
  • Discoverability
  • Trust
Best Practice 13: Use machine-readable standardized data formats
  • Reuse
  • Processability
Best Practice 14: Provide data in multiple formats
  • Reuse
  • Processability
Best Practice 15: Reuse vocabularies, preferably standardized ones
  • Reuse
  • Processability
  • Comprehension
  • Trust
  • Interoperability
Best Practice 16: Choose the right formalization level
  • Reuse
  • Comprehension
  • Interoperability
Best Practice 17: Provide bulk download
  • Reuse
  • Access
Best Practice 18: Provide Subsets for Large Datasets
  • Reuse
  • Linkability
  • Access
  • Processability
Best Practice 19: Use content negotiation for serving data available in multiple formats
  • Reuse
  • Access
Best Practice 20: Provide real-time access
  • Reuse
  • Access
Best Practice 21: Provide data up to date
  • Reuse
  • Access
Best Practice 22: Provide an explanation for data that is not available
  • Reuse
  • Trust
Best Practice 23: Make data available through an API
  • Reuse
  • Processability
  • Interoperability
  • Access
Best Practice 24: Use Web Standards as the foundation of APIs
  • Reuse
  • Linkability
  • Interoperability
  • Discoverability
  • Access
  • Processability
Best Practice 25: Provide complete documentation for your API
  • Reuse
  • Trust
Best Practice 26: Avoid Breaking Changes to Your API
  • Trust
  • Interoperability
Best Practice 27: Preserve identifiers
  • Reuse
  • Trust
Best Practice 28: Assess dataset coverage
  • Reuse
  • Trust
Best Practice 29: Gather feedback from data consumers
  • Reuse
  • Comprehension
  • Trust
Best Practice 30: Make feedback available
  • Reuse
  • Trust
Best Practice 31: Enrich data by generating new data
  • Reuse
  • Comprehension
  • Trust
  • Processability
Best Practice 32: Provide Complementary Presentations
  • Reuse
  • Comprehension
  • Access
  • Trust
Best Practice 33: Provide Feedback to the Original Publisher
  • Reuse
  • Interoperability
  • Trust
Best Practice 34: Follow Licensing Terms
  • Reuse
  • Trust
Best Practice 35: Cite the Original Publication
  • Reuse
  • Discoverability
  • Trust

The figure below shows the benefits that data publishers will gain with adoption of the Best Practices.

Reuse

All Best Practices

12. Use Cases Requirements x Best Practices

This section is non-normative.

Requirements x Best Practices
Requirement Best Practices
R-MetadataAvailable

Best Practice 1: Provide metadata

Best Practice 2: Provide descriptive metadata

Best Practice 3: Provide locale parameters metadata

Best Practice 4: Provide structural metadata

Best Practice 6: Provide data provenance information

Best Practice 35: Cite the Original Publication

R-MetadataDocum

Best Practice 1: Provide metadata

Best Practice 15: Reuse vocabularies, preferably standardized ones

R-MetadataMachineRead

Best Practice 1: Provide metadata

Best Practice 2: Provide descriptive metadata

Best Practice 5: Provide data license information

R-MetadataStandardized

Best Practice 2: Provide descriptive metadata

Best Practice 15: Reuse vocabularies, preferably standardized ones

R-FormatLocalize

Best Practice 3: Provide locale parameters metadata

R-GeographicalContext

Best Practice 3: Provide locale parameters metadata

R-LicenseAvailable

Best Practice 5: Provide data license information

Best Practice 34: Follow Licensing Terms

R-LicenseLiability

Best Practice 5: Provide data license information

Best Practice 34: Follow Licensing Terms

R-ProvAvailable

Best Practice 6: Provide data provenance information

Best Practice 31: Enrich data by generating new data

Best Practice 35: Cite the Original Publication

R-QualityMetrics

Best Practice 7: Provide data quality information

R-DataMissingIncomplete

Best Practice 7: Provide data quality information

R-QualityOpinions

Best Practice 7: Provide data quality information

Best Practice 29: Gather feedback from data consumers

Best Practice 30: Make feedback available

Best Practice 33: Provide Feedback to the Original Publisher

R-DataVersion

Best Practice 8: Provide a version indicator

Best Practice 9: Provide version history

R-UniqueIdentifier

Best Practice 10: Use persistent URIs as identifiers of datasets

Best Practice 11: Use persistent URIs as identifiers within datasets

Best Practice 12: Assign URIs to dataset versions and series

Best Practice 18: Provide Subsets for Large Datasets

Best Practice 24: Use Web Standards as the foundation of APIs

R-Citable

Best Practice 10: Use persistent URIs as identifiers of datasets

Best Practice 12: Assign URIs to dataset versions and series

Best Practice 18: Provide Subsets for Large Datasets

Best Practice 35: Cite the Original Publication

R-FormatMachineRead

Best Practice 13: Use machine-readable standardized data formats

Best Practice 31: Enrich data by generating new data

R-FormatStandardized

Best Practice 13: Use machine-readable standardized data formats

R-FormatOpen

Best Practice 13: Use machine-readable standardized data formats

R-FormatMultiple

Best Practice 14: Provide data in multiple formats

R-QualityComparable

Best Practice 15: Reuse vocabularies, preferably standardized ones

Best Practice 16: Choose the right formalization level

R-VocabOpen

Best Practice 15: Reuse vocabularies, preferably standardized ones

R-VocabReference

Best Practice 15: Reuse vocabularies, preferably standardized ones

Best Practice 16: Choose the right formalization level

Best Practice 28: Assess dataset coverage

R-AccessBulk

Best Practice 17: Provide bulk download

R-GranularityLevels

Best Practice 18: Provide Subsets for Large Datasets

R-AccessRealTime

Best Practice 18: Provide Subsets for Large Datasets

Best Practice 20: Provide real-time access

Best Practice 23: Make data available through an API

R-AccessUpToDate

Best Practice 21: Provide data up to date

Best Practice 23: Make data available through an API

R-AccessLevel

Best Practice 22: Provide an explanation for data that is not available

Best Practice 27: Preserve identifiers

R-SensitivePrivacy

Best Practice 22: Provide an explanation for data that is not available

R-SensitiveSecurity

Best Practice 22: Provide an explanation for data that is not available

R-APIDocumented

Best Practice 24: Use Web Standards as the foundation of APIs

Best Practice 25: Provide complete documentation for your API

Best Practice 26: Avoid Breaking Changes to Your API

R-PersistentIdentification

Best Practice 26: Avoid Breaking Changes to Your API

Best Practice 27: Preserve identifiers

R-UsageFeedback

Best Practice 29: Gather feedback from data consumers

Best Practice 30: Make feedback available

Best Practice 33: Provide Feedback to the Original Publisher

R-DataEnrichment

Best Practice 31: Enrich data by generating new data

Best Practice 32: Provide Complementary Presentations

R-TrackDataUsage

Best Practice 33: Provide Feedback to the Original Publisher

Best Practice 35: Cite the Original Publication

A. Acknowledgements

The editors gratefully acknowledge the contributions made to this document by all members of the working group, especially Annette Greiner's great effort and the contributions received from Antoine Isaac, Eric Stephan and Phil Archer.

This document has benefited from inputs from many members of the Spatial Data on the Web Working Group. Specific thanks are due to Andrea Perego, Dan Brickley, Linda van den Brink and Jeremy Tandy.

The editors would also like to thank those who sent comments, including Adriano Machado, Adriano Veloso, Andreas Kuckartz, Augusto Herrmann, Bart van Leeuwen, Erik Wilde, Giancarlo Guizzardi, Gisele Pappa, Gregg Kellogg, Herbert Van de Sompel, Ivan Herman, Leigh Dodds, Lewis John McGibbney, Makx Dekkers, Manuel Tomas Carrasco-Benitez, Maurino Andrea, Michel Dumontier, Nandana Mihindukulasooriya, Nathalia Sautchuk Patrício, Peter Winstanley, Renato Iannella, Steven Adler, Vagner Diniz and Wagner Meira.

The editors also gratefully acknowledge the chairs of this Working Group: Hadley Beeman, Steve Adler, Yaso Córdova, Deirdre Lee, and the staff contact Phil Archer.

B. Change history

Changes since the previous version include:

C. References

C.1 Informative references

[Annotation-Model]
Robert Sanderson; Paolo Ciccarese; Benjamin Young. W3C. Web Annotation Data Model. 31 March 2016. W3C Working Draft. URL: http://www.w3.org/TR/annotation-model/
[BNF]
Bibliothèque nationale de France. Reference information about authors, works, topics . URL: http://data.bnf.fr/
[CCREL]
Hal Abelson; Ben Adida; Mike Linksvayer; Nathan Yergler. W3C/Creative Commons. ccREL: The Creative Commons Rights Expression Language. 1 May 2008. W3C Member Submission. URL: http://www.w3.org/Submission/ccREL/
[DCTERMS]
Dublin Core metadata initiative. DCMI Metadata Terms. 14 June 2012. DCMI Recommendation. URL: http://dublincore.org/documents/dcmi-terms/
[DWBP-UCR]
Deirdre Lee; Bernadette Farias Loscio; Phil Archer. W3C. Data on the Web Best Practices Use Cases & Requirements. 24 February 2015. W3C Note. URL: http://www.w3.org/TR/dwbp-ucr/
[FOAF]
Dan Brickley; Libby Miller. FOAF project. FOAF Vocabulary Specification 0.99 (Paddington Edition). 14 January 2014. URL: http://xmlns.com/foaf/spec
[Fielding]
Roy Thomas Fielding. University of California, Irvine. Representational State Transfer (REST), Chapter 5 of Architectural Styles and the Design of Network-based Software Architectures. 2000. URL: https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
[GS1]
Mark Harrison; Ken Traub. GS1. SmartSearch Implementation Guideline. November 2015. URL: http://www.gs1.org/gs1-smartsearch/guideline/gtin-web-implementation-guideline
[GTFS]
Pieter Colpaert; Andrew Byrd. General Transit Feed Specification . URL: http://vocab.gtfs.org/terms#
[HCLS-DATASET]
Alasdair Gray; M. Scott Marshall; Michel Dumontier. Dataset Descriptions: HCLS Community Profile. 14 May 2015. W3C Note. URL: http://www.w3.org/TR/hcls-dataset/
[HTML-RDFA]
Manu Sporny. W3C. HTML+RDFa 1.1 - Second Edition . 17 March 2015. W3C Recommendation. URL: http://www.w3.org/TR/html-rdfa/
[ISO-25964]
Stella Dextre Clarke et al. ISO/NISO. ISO 25964 – the international standard for thesauri and interoperability with other vocabularies . URL: http://www.niso.org/schemas/iso25964/
[ISO639-1-LOC]
Library of Congress. ISO 639-1: Codes for the Representation of Names of Languages - Part 1: Two-letter codes for languages . URL: http://id.loc.gov/vocabulary/iso639-1.html
[JSON-LD]
Manu Sporny; Gregg Kellogg; Markus Lanthaler. W3C. JSON-LD 1.0 . 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/json-ld/
[LD-BP]
Bernadette Hyland; Ghislain Auguste Atemezing; Boris Villazón-Terrazas. W3C. Best Practices for Publishing Linked Data . 9 January 2014. W3C Note. URL: http://www.w3.org/TR/ld-bp/
[LODC]
Max Schmachtenberg; Christian Bizer; Anja Jentzsch; Richard Cyganiak. The Linking Open Data Cloud Diagram . URL: http://lod-cloud.net/
[Lexvo]
Lexvo.org. URL: http://www.lexvo.org/
[Navathe]
Ramez Elmasri; Shamkant B. Navathe. Addison Wesley. Fundamentals of Database Systems . 2010.
[OAIS]
ISO/TC 20/SC 13. ISO. Space data and information transfer systems -- Open archival information system (OAIS) -- Reference model . 21 August 2012. ISO Standard. URL: http://www.iso.org/iso/home/store/catalogue_ics/catalogue_detail_ics.htm?csnumber=57284
[ODB]
World Wide Web Foundation. Open Data Barometer . URL: http://opendatabarometer.org
[ODRL21-model]
Renato Iannella; Susanne Guth; Daniel Paehler; Andreas Kasten. W3C. ODRL Version 2.1 Core Model. 5 March 2015. W3C Community Group Final Specification. URL: http://www.w3.org/community/odrl/model/2.1/
[ODRS]
Leigh Dodds. The Open Data Institute. Open Data Rights Statement Vocabulary . 29 July 2013. URL: http://schema.theodi.org/odrs/
[OKFN-INDEX]
Open Knowledge Foundation. Global Open Data Index . URL: http://index.okfn.org/
[OWL2-OVERVIEW]
W3C OWL Working Group. W3C. OWL 2 Web Ontology Language Document Overview (Second Edition) . 11 December 2012. W3C Recommendation. URL: http://www.w3.org/TR/owl2-overview/
[OWL2-PROFILES]
Boris Motik; Bernardo Cuenca Grau; Ian Horrocks; Zhe Wu; Achille Fokoue. W3C. OWL 2 Web Ontology Language Profiles (Second Edition) . 11 December 2012. W3C Recommendation. URL: http://www.w3.org/TR/owl2-profiles/
[OWL2-QUICK-REFERENCE]
Jie Bao; Elisa Kendall; Deborah McGuinness; Peter Patel-Schneider. W3C. OWL 2 Web Ontology Language Quick Reference Guide (Second Edition) . 11 December 2012. W3C Recommendation. URL: http://www.w3.org/TR/owl2-quick-reference/
[PAV]
Paolo Ciccarese; Stian Soiland-Reyes. PAV - Provenance, Authoring and Versioning . 28 August 2014. URL: http://purl.org/pav/
[PROV-O]
Timothy Lebo; Satya Sahoo; Deborah McGuinness. W3C. PROV-O: The PROV Ontology . 30 April 2013. W3C Recommendation. URL: http://www.w3.org/TR/prov-o/
[PURI]
Phil Archer; Nikos Loutas; Stijn Goedertier; Saky Kourtidis. Study On Persistent URIs . 17 December 2012. URL: http://philarcher.org/diary/2013/uripersistence/
[RDA]
Research Data Alliance . URL: http://rd-alliance.org
[RDF-SCHEMA]
Dan Brickley; Ramanathan Guha. W3C. RDF Schema 1.1 . 25 February 2014. W3C Recommendation. URL: http://www.w3.org/TR/rdf-schema/
[RFC3986]
T. Berners-Lee; R. Fielding; L. Masinter. IETF. Uniform Resource Identifier (URI): Generic Syntax . January 2005. Internet Standard. URL: https://tools.ietf.org/html/rfc3986
[RFC4180]
Y. Shafranovich. IETF. Common Format and MIME Type for Comma-Separated Values (CSV) Files . October 2005. Informational. URL: https://tools.ietf.org/html/rfc4180
[RFC4627]
D. Crockford. IETF. The application/json Media Type for JavaScript Object Notation (JSON) . July 2006. Informational. URL: https://tools.ietf.org/html/rfc4627
[RFC7089]
H. Van de Sompel; M. Nelson; R. Sanderson. IETF. HTTP Framework for Time-Based Access to Resource States -- Memento . December 2013. Informational. URL: https://tools.ietf.org/html/rfc7089
[Richardson]
Richardson L.; Sam Ruby. O'Reilly. RESTful Web Services. 2007. URL: http://restfulwebapis.org/rws.html
[SCHEMA-ORG]
Schema.org . URL: http://schema.org/
[SIRI]
CEN. Service Interface for Real Time Information CEN/TS 15531 (prCEN/TS-OO278181 ) . October 2006. URL: http://user47094.vs.easily.co.uk/siri/
[SKOS-DESIGN]
Tom Baker; Sean Bechhofer; Antoine Isaac; Alistair Miles; Guus Schreiber; Ed Summers. Elsevier. Key Choices in the Design of Simple Knowledge Organization System (SKOS). May 2013. Journal of Web Semantics 20: 35-49. URL: http://dx.doi.org/10.1016/j.websem.2013.05.001
[SKOS-PRIMER]
Antoine Isaac; Ed Summers. W3C. SKOS Simple Knowledge Organization System Primer . 18 August 2009. W3C Note. URL: http://www.w3.org/TR/skos-primer
[SchemaVer]
Alex Dean. Introducing SchemaVer for semantic versioning of schemas . 2014. URL: http://snowplowanalytics.com/blog/2014/05/13/introducing-schemaver-for-semantic-versioning-of-schemas/
[Tabular-Metadata]
Jeni Tennison; Gregg Kellogg. W3C. Metadata Vocabulary for Tabular Data. 17 December 2015. W3C Recommendation. URL: http://www.w3.org/TR/tabular-metadata/
[Turtle]
Eric Prud'hommeaux; Gavin Carothers. W3C. RDF 1.1 Turtle. 25 February 2014. W3C Recommendation. URL: http://www.w3.org/TR/turtle/
[UCR]
Deirdre Lee; Bernadette Farias Lóscio; Phil Archer. W3C. Data on the Web Best Practices Use Cases & Requirements. W3C Note. URL: http://www.w3.org/TR/dwbp-ucr/
[URLs-in-data]
Jeni Tennison. W3C. URLs in Data Primer. 4 June 2013. W3C Working Draft. URL: http://www.w3.org/TR/urls-in-data/
[VOCAB-DATA-CUBE]
Richard Cyganiak; Dave Reynolds. W3C. The RDF Data Cube Vocabulary. 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/vocab-data-cube/
[VOCAB-DCAT]
Fadi Maali; John Erickson. W3C. Data Catalog Vocabulary (DCAT). 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/vocab-dcat/
[VOCAB-DQV]
Riccardo Albertoni; Antoine Isaac; Christophe Gueret. W3C. Data on the Web Best Practices: Data Quality Vocabulary. 17 December 2015. W3C Working Draft. URL: http://www.w3.org/TR/vocab-dqv/
[VOCAB-DUV]
Bernadette Farias Loscio; Eric Stephan; Sumit Purohit. W3C. Data on the Web Best Practices: Dataset Usage Vocabulary. 28 January 2016. W3C Working Draft. URL: http://www.w3.org/TR/vocab-duv/
[WEBARCH]
Ian Jacobs; Norman Walsh. W3C. Architecture of the World Wide Web, Volume One. 15 December 2004. W3C Recommendation. URL: http://www.w3.org/TR/webarch/
[XHTML-VOCAB]
XHTML 2 Working Group. W3C. XHTML Vocabulary. 27 October 2010. URL: http://www.w3.org/1999/xhtml/vocab
[cURL]
Daniel Stenberg. cURL, a command line tool and library for transferring data with URL syntax. URL: http://curl.haxx.se/
[ccREL]
Hal Abelson; Ben Adida; Mike Linksvayer; Nathan Yergler. ccREL: The Creative Commons Rights Expression Language. 1 May 2008. W3C Member Submission. URL: http://www.w3.org/Submission/ccREL/