Implementation Report

This is the implementation report template for the Data on the Web Best Practices. Please do not fill this in; instead, make a copy and complete it.

If the reporter gives permission for the report to be public, it can be completed on the wiki. If not, it will need to be archived in W3C Member-only space.

  • Name of reporter
  • Relevant URL(s)
  • Date of submission
  • Add any other details and comments as appropriate.
For each best practice below, the intended outcome and the test(s) to apply are given. Record the result of each test (Pass, Fail, or Partial) and add comments as needed.

Provide metadata
Intended outcome: Humans will be able to understand the metadata and computer applications, notably user agents, will be able to process it.
Tests:
  • Check if human-readable metadata is available.
  • Check if the metadata is available in a valid machine-readable format and without syntax errors (a sketch follows below).

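A minimal sketch of the second check, assuming the metadata is published as RDF (e.g., a DCAT description in Turtle); the URL below is a placeholder. rdflib raises an exception on any syntax error, so a clean parse demonstrates valid machine-readable metadata:

    # Sketch: validate that dataset metadata parses as RDF without syntax errors.
    from rdflib import Graph

    METADATA_URL = "https://example.org/dataset/metadata.ttl"  # hypothetical location

    def metadata_is_valid(url: str) -> bool:
        graph = Graph()
        try:
            graph.parse(url, format="turtle")  # raises on any syntax error
        except Exception as err:
            print(f"FAIL: metadata did not parse: {err}")
            return False
        print(f"PASS: parsed {len(graph)} triples")
        return True

    metadata_is_valid(METADATA_URL)
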
Provide descriptive metadata
Intended outcome: Humans will be able to interpret the nature of the dataset and its distributions, and software agents will be able to automatically discover datasets and distributions.
Tests:
  • Check if the metadata for the dataset itself includes the overall features of the dataset in a human-readable format.
  • Check if the descriptive metadata is available in a valid machine-readable format.

Provide locale parameters metadata
Intended outcome: Humans and software agents will be able to interpret the meaning of strings representing dates, times, currencies, numbers, etc. accurately.
Tests:
  • Check if the metadata for the dataset itself includes information about locale parameters (i.e., date, time, and number formats, and language) in a human-readable format.
  • Check if the metadata with locale information is available in a valid machine-readable format and without syntax errors.

Provide structural metadata
Intended outcome: Humans will be able to interpret the schema of a dataset and software agents will be able to automatically process distributions.
Tests:
  • Check if the structural metadata of the dataset is provided in a human-readable format.
  • Check if the metadata of the distribution includes structural information about the dataset in a machine-readable format and without syntax errors (a sketch follows below).

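One concrete instance of the machine-readable structural check, assuming the distribution is a CSV file described by a CSVW metadata document (the URL is a placeholder): the test is that the document is valid JSON and declares a table schema with named columns.

    # Sketch: confirm a CSVW metadata document parses and declares a table schema.
    import json
    import urllib.request

    CSVW_URL = "https://example.org/data.csv-metadata.json"  # hypothetical location

    with urllib.request.urlopen(CSVW_URL) as response:
        metadata = json.load(response)  # raises on JSON syntax errors

    columns = [c.get("name") for c in metadata.get("tableSchema", {}).get("columns", [])]
    print(f"PASS: schema declares columns {columns}" if columns
          else "FAIL: no tableSchema/columns found")
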
Provide data license information
Intended outcome: Humans will be able to understand data license information describing possible restrictions placed on the use of a given distribution, and software agents will be able to automatically detect the data license of a distribution.
Tests:
  • Check if the metadata for the dataset itself includes the data license information in a human-readable format.
  • Check if a user agent can automatically detect/discover the data license of the distribution (a sketch follows below).

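A sketch of the license-discovery check, assuming DCAT metadata in RDF where the license is stated with dct:license; the metadata URL and dataset URI are placeholders:

    # Sketch: detect a dct:license statement in the dataset's RDF metadata.
    from rdflib import Graph, URIRef
    from rdflib.namespace import DCTERMS

    g = Graph()
    g.parse("https://example.org/dataset/metadata.ttl")  # hypothetical location

    dataset = URIRef("https://example.org/dataset")  # hypothetical dataset URI
    license_uri = g.value(dataset, DCTERMS.license)
    print(f"PASS: license is {license_uri}" if license_uri
          else "FAIL: no dct:license found")
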
Provide data provenance information
Intended outcome: Humans will know the origin or history of the dataset and software agents will be able to automatically process provenance information.
Tests:
  • Check that the metadata for the dataset itself includes the provenance information about the dataset in a human-readable format.
  • Check if a computer application can automatically process the provenance information about the dataset.

Provide data quality information
Intended outcome: Humans and software agents will be able to assess the quality, and therefore the suitability, of a dataset for their application.
Tests:
  • Check that the metadata for the dataset itself includes quality information about the dataset.
  • Check if a computer application can automatically process the quality information about the dataset.

Provide a version indicator
Intended outcome: Humans and software agents will easily be able to determine which version of a dataset they are working with.
Tests:
  • Check if the metadata for the dataset/distribution provides a unique version number or date in a human-readable format.
  • Check if a computer application can automatically detect/discover the unique version number or date of a dataset or distribution (a sketch follows below).

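The automatic version check can follow the same pattern, assuming the metadata records the version with owl:versionInfo or dct:hasVersion (two commonly used properties; the URL and URI are placeholders):

    # Sketch: discover a version indicator in the dataset's RDF metadata.
    from rdflib import Graph, URIRef
    from rdflib.namespace import DCTERMS, OWL

    g = Graph()
    g.parse("https://example.org/dataset/metadata.ttl")  # hypothetical location

    dataset = URIRef("https://example.org/dataset")  # hypothetical dataset URI
    version = g.value(dataset, OWL.versionInfo) or g.value(dataset, DCTERMS.hasVersion)
    print(f"PASS: version indicator is {version}" if version
          else "FAIL: no version indicator found")
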
Provide version history
Intended outcome: Humans and software agents will be able to understand how the dataset typically changes from version to version and how any two specific versions differ.
Test: Check that a list of published versions is available, as well as a change log describing precisely how each version differs from the previous one.

Use persistent URIs as identifiers of datasets
Intended outcome: Datasets or information about datasets will be discoverable and citable through time, regardless of the status, availability or format of the data.
Test: Check that each dataset is identified using a URI that has been designed for persistence. Ideally, the relevant Web site includes a description of the design scheme and a credible pledge of persistence should the publisher no longer be able to maintain the URI space themselves.

Use persistent URIs as identifiers within datasets
Intended outcome: Data items will be related across the Web, creating a global information space accessible to humans and machines alike.
Test: Check that, within the dataset, things that don't change or that change slowly, such as countries, regions, organizations and people, are referred to by URIs or by short identifiers that can be appended to a URI stub. Ideally the URIs should resolve; however, they have value as globally scoped variables whether they resolve or not.

Assign URIs to dataset versions and series
Intended outcome: Humans and software agents will be able to refer to specific versions of a dataset and to concepts such as a 'dataset series' and 'the latest version'.
Test: Check that each version of a dataset has its own URI, and that there is also a "latest version" URI.

Use machine-readable standardized data formats
Intended outcome: Machines will easily be able to read and process data published on the Web, and humans will be able to use computational tools typically available in the relevant domain to work with the data.
Test: Check if the data format conforms to a known machine-readable data format specification.

Provide data in multiple formats
Intended outcome: As many users as possible will be able to use the data without first having to transform it into their preferred format.
Test: Check if the complete dataset is available in more than one data format.

Reuse vocabularies, preferably standardized ones
Intended outcome: Interoperability and consensus among data publishers and consumers will be enhanced.
Tests:
  • Using vocabulary repositories such as the Linked Open Vocabularies repository, the lists of services mentioned in technology-specific Best Practices such as the Best Practices for Publishing Linked Data [LD-BP], or the Core Initial Context for RDFa and JSON-LD, check that the classes, properties, terms, elements or attributes used to represent a dataset do not replicate those defined by vocabularies used for other datasets.
  • Check if the terms or codes in the vocabulary to be used are defined by a standards development organization such as the IETF, OGC or W3C, or are published by a suitable authority, such as a government agency.

Choose the right formalization level
Intended outcome: The most likely application cases will be supported with no more complexity than necessary.
Test: This is almost always a matter of subjective judgement with no objective test. As general guidelines:
  • Are common vocabularies used, such as Dublin Core and schema.org?
  • Are simple facts stated simply and retrieved easily?
  • For formal knowledge representation languages, check that applying an inference engine to data that uses a given vocabulary does not produce large numbers of statements that are unnecessary for target applications.

Provide bulk download
Intended outcome: Large file transfers that would require more time than a typical user would consider reasonable will be possible via dedicated file-transfer protocols.
Test: Check if the full dataset can be retrieved with a single request.

Provide Subsets for Large Datasets
Intended outcome: Humans and applications will be able to access subsets of a dataset, rather than the entire thing, with a high ratio of needed to unneeded data for the largest number of users. Static datasets that users in the domain would consider to be too large will be downloadable in smaller pieces. APIs will make slices or filtered subsets of the data available, the granularity depending on the needs of the domain and the demands of performance in a Web application.
Test: Check that the entire dataset can be recovered by making multiple requests that retrieve smaller units (a sketch follows below).

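A sketch of the multiple-requests check, assuming a hypothetical paged API with page and per_page query parameters that returns a JSON array per page:

    # Sketch: recover the full dataset by iterating over paged subsets.
    import requests

    API_URL = "https://example.org/api/records"  # hypothetical endpoint

    def fetch_all(per_page: int = 1000) -> list:
        records, page = [], 1
        while True:
            resp = requests.get(API_URL, params={"page": page, "per_page": per_page})
            resp.raise_for_status()
            batch = resp.json()
            if not batch:  # an empty page signals the end of the dataset
                break
            records.extend(batch)
            page += 1
        return records

    print(f"Recovered {len(fetch_all())} records via paged requests")
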
Use content negotiation for serving data available in multiple formats
Intended outcome: Content negotiation will enable different resources or different representations of the same resource to be served according to the request made by the client.
Test: Check the available representations of the resource and try to retrieve them, specifying the desired content type in the Accept header of the HTTP request (a sketch follows below).

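A sketch of the content-negotiation check, requesting the same (hypothetical) resource with different Accept headers and comparing the Content-Type of each response:

    # Sketch: verify that one resource URI serves multiple representations.
    import requests

    RESOURCE = "https://example.org/dataset/123"  # hypothetical resource URI

    for accept in ("text/csv", "application/json", "text/turtle"):
        resp = requests.get(RESOURCE, headers={"Accept": accept})
        served = resp.headers.get("Content-Type", "")
        verdict = "PASS" if accept in served else "FAIL"
        print(f"{verdict}: asked for {accept}, got {served or 'nothing'} ({resp.status_code})")
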
Provide real-time access
Intended outcome: Applications will be able to access time-critical data in real time or near real time, where real time means a range from milliseconds to a few seconds after the data creation.
Test: To adequately test real-time data access, data will need to be tracked from the time it is initially collected to the time it is published and accessed. [PROV-O] can be used to describe these activities. Caution should be used when analyzing real-time access for systems that consist of multiple computer systems; for example, tests that rely on wall-clock time stamps may reflect inconsistencies between the individual computer systems rather than data publication latency.

Provide data up to date
Intended outcome: Data on the Web will be updated in a timely manner so that the most recent data available online generally reflects the most recent data released via any other channel. When new data becomes available, it will be published on the Web as soon as practical thereafter.
Test: Check that the update frequency is stated and that the most recently published copy on the Web is no older than the date predicted by the stated update frequency (a sketch follows below).

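A sketch of the timeliness check, assuming the metadata states dct:modified and dct:accrualPeriodicity, and assuming an illustrative mapping from a few frequency values to maximum acceptable ages; the URL, URIs and mapping are all placeholders:

    # Sketch: check the published copy is no older than the stated update
    # frequency allows. The frequency-to-age mapping is an assumption.
    from datetime import datetime, timedelta, timezone
    from rdflib import Graph, URIRef
    from rdflib.namespace import DCTERMS

    FREQ = "http://publications.europa.eu/resource/authority/frequency/"
    MAX_AGE = {FREQ + "DAILY": timedelta(days=1),
               FREQ + "WEEKLY": timedelta(weeks=1),
               FREQ + "MONTHLY": timedelta(days=31)}

    g = Graph()
    g.parse("https://example.org/dataset/metadata.ttl")  # hypothetical location
    dataset = URIRef("https://example.org/dataset")      # hypothetical dataset URI

    modified = datetime.fromisoformat(str(g.value(dataset, DCTERMS.modified)))
    if modified.tzinfo is None:  # treat naive timestamps as UTC
        modified = modified.replace(tzinfo=timezone.utc)
    limit = MAX_AGE.get(str(g.value(dataset, DCTERMS.accrualPeriodicity)))
    age = datetime.now(timezone.utc) - modified
    print("PASS" if limit and age <= limit else "FAIL or UNKNOWN", f"(age {age})")
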
Provide an explanation for data that is not available
Intended outcome: Consumers will know that data that is referred to from the current dataset is unavailable or only available under different conditions.
Tests:
  • Where the dataset includes references to data that is no longer available or is not available to all users, check that an explanation of what is missing and instructions for obtaining access (if possible) are given.
  • Check if a legitimate HTTP response code in the 400 or 500 range is returned when trying to get unavailable data (a sketch follows below).

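A sketch of the response-code check for a URL (hypothetical) that is known to reference unavailable data:

    # Sketch: unavailable data should yield a meaningful 4xx/5xx response,
    # not a misleading 200. The URL is a placeholder.
    import requests

    resp = requests.get("https://example.org/data/withdrawn-item")  # hypothetical
    if 400 <= resp.status_code < 600:
        print(f"PASS: got {resp.status_code} for unavailable data")
    else:
        print(f"FAIL: expected a 4xx/5xx code, got {resp.status_code}")
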
Make data available through an API
Intended outcome: Developers will have programmatic access to the data for use in their own applications, with data updated without requiring effort on the part of consumers. Web applications will be able to obtain specific data by querying a programmatic interface.
Test: Check if a test client can simulate calls and the API returns the expected responses (a sketch follows below).

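A sketch of a minimal test client, assuming a hypothetical JSON API exposing a /records/{id} call; a real test suite would exercise every documented call:

    # Sketch: simulate API calls and assert the expected responses.
    # The endpoint, record id, and expected field are illustrative.
    import requests

    BASE = "https://example.org/api"  # hypothetical base URL

    resp = requests.get(f"{BASE}/records/42")
    assert resp.status_code == 200, f"unexpected status {resp.status_code}"
    assert "id" in resp.json(), "response body missing the expected 'id' field"

    missing = requests.get(f"{BASE}/records/does-not-exist")
    assert missing.status_code == 404, "a missing record should return 404"
    print("PASS: API returned the expected responses")
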
Use Web Standards as the foundation of APIs
Intended outcome: Developers who have some experience with APIs based on Web standards, such as REST, will have an initial understanding of how to use the API. The API will also be easier to maintain.
Test: Check that the service avoids using HTTP as a tunnel for calls to custom methods, and check that URIs do not contain method names (a rough heuristic follows below).

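A rough heuristic for the URI test: scan the API's documented paths (the list below is assumed) for segments that begin with method-like verbs, a sign that HTTP is being used as a tunnel:

    # Sketch: flag URI paths whose segments begin with RPC-style verbs.
    # The path list and verb set are illustrative assumptions.
    import re

    PATHS = ["/datasets/42", "/getDataset?id=42", "/datasets/42/delete"]
    VERBS = ("get", "create", "update", "delete")

    def has_method_name(path: str) -> bool:
        segments = re.split(r"[/?&]", path.lower())
        return any(seg.startswith(verb) for verb in VERBS for seg in segments)

    for path in PATHS:
        print(("FAIL" if has_method_name(path) else "PASS") + f": {path}")
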
Provide complete documentation for your API
Intended outcome: Developers will be able to obtain detailed information about each call to the API, including the parameters it takes and what it is expected to return, i.e., the whole set of information related to the API. This information, including how to use the API, notices of recent changes, contact information, and so on, should be described and easily browsable on the Web. It will also enable machines to access the API documentation in order to help developers build API client software.
Tests:
  • Check that every call enabled by your API is described in your documentation. Make sure you provide details of what parameters are required or optional and what each call returns.
  • Check the Time To First Successful Call (i.e., the time it takes a developer to make a successful request to the API; being able to do so within a few minutes will increase the chances that the developer will stick with your API).

Avoid Breaking Changes to Your API
Intended outcome: Developer code will continue to work. Developers will know of improvements you make and be able to make use of them. Breaking changes to your API will be rare, and if they occur, developers will have sufficient time and information to adapt their code. That will enable them to avoid breakage, enhancing trust. Changes to the API will be announced on the API's documentation site.
Test: Release changes initially to a test version of your API before applying them to the production version. Invite developers to test their applications on the test version and provide feedback.

Preserve identifiers
Intended outcome: The URI of a dataset will always dereference to the dataset or redirect to information about it.
Test: Check that dereferencing the URI of a dataset that is no longer available returns information about its current status and availability, using either a 410 or a 303 response code as appropriate (a sketch follows below).

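A sketch of the dereferencing check, requesting a (hypothetical) retired dataset URI without following redirects so that the raw status code is visible:

    # Sketch: a retired dataset URI should answer 410 Gone or 303 See Other.
    import requests

    URI = "https://example.org/dataset/retired"  # hypothetical retired dataset
    resp = requests.get(URI, allow_redirects=False)
    if resp.status_code in (303, 410):
        print(f"PASS: got {resp.status_code}")
        if resp.status_code == 303:
            print(f"  status information at {resp.headers.get('Location')}")
    else:
        print(f"FAIL: expected 303 or 410, got {resp.status_code}")
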
Assess dataset coverage
Intended outcome: Users will be able to make use of archived data well into the future.
Test: It is impossible to determine what will be available in, say, 50 years' time. However, one can check that an archived dataset depends only on widely used external resources and vocabularies. Check that unique or lesser-used dependencies are preserved as part of the archive.

Gather feedback from data consumers
Intended outcome: Data consumers will be able to provide feedback and ratings about datasets and distributions.
Test: Check that at least one feedback mechanism is provided and readily discoverable by data consumers.

Make feedback available
Intended outcome: Consumers will be able to assess the kinds of errors that affect the dataset, review other users' experiences with it, and be reassured that the publisher is actively addressing issues as needed. Consumers will also be able to determine whether other users have already provided similar feedback, saving them the trouble of submitting unnecessary bug reports and sparing the maintainers from having to deal with duplicates.
Test: Check that any feedback given by data consumers for a specific dataset or distribution is publicly available.

Enrich data by generating new data
Intended outcome: Datasets with missing values will be enhanced by filling those values. Structure will be conferred and utility enhanced if relevant measures or attributes are added, but only if the addition does not distort analytical results, significance, or statistical power.
Tests:
  • Look for missing values in the dataset or additional fields likely to be needed by others.
  • Check that any data added by inferential enrichment techniques is identified as such and that any replaced data is still available.
  • Check that the code used to enrich the data is available.
  • Check whether the metadata being extracted is in accordance with human knowledge and readable by humans.

Provide Complementary Presentations
Intended outcome: Complementary data presentations will enable human consumers to gain immediate insight into the data by presenting it in ways that are readily understood.
Test: Check that the dataset is accompanied by some additional interpretive content that can be perceived without downloading the data or invoking an API.

Provide Feedback to the Original Publisher
Intended outcome: Better communication will make it easier for original publishers to determine how the data they post is being used, which in turn helps them justify publishing the data. Publishers will also be made aware of steps they can take to improve their data. This leads to more and better data for everyone.
Test: Check that you have a record of at least one communication informing the publisher of your use of the data.

Follow Licensing Terms
Intended outcome: Data publishers will be able to trust that their work is being reused in accordance with their licensing requirements, which will make them more likely to continue to publish data. Reusers of data will themselves be able to properly license their derivative works.
Test: Read through the original license and check that your use of the data does not violate any of the terms.

Cite the Original Publication
Intended outcome: End users will be able to assess the trustworthiness of the data they see, and the efforts of the original publishers will be recognized. The chain of provenance for data on the Web will be traceable back to its original publisher.
Tests:
  • Check that the original source of any reused data is cited in the metadata provided.
  • Check that a human-readable citation is readily visible in any user interface.