(Revised) PSI Directive Theme: Quality
Data quality has been one of the major concerns of reusers. It is also among the most difficult goals for public bodies to achieve, since they aim to publish existing datasets as quickly as possible so that they are shared with the community. However, the quality of data is critical to ensure reuse and to support the whole ecosystem of service and app creation. The revised PSI directive states that:
To facilitate re-use, public sector bodies should, where possible and appropriate, make documents available through open and machine-readable formats and together with their metadata, at the best level of precision and granularity, in a format that ensures interoperability […]
It therefore defines quality criteria for the data that is to be made available under the PSI directive implementation. Those criteria relate to formats as well as to the associated metadata.
The Guidelines on recommended standard licences, datasets and charging for the reuse of documents (2014/C 240/01) following the (Revised) PSI directive state that:
In order to maximise the intended benefits of […] ‘high-demand’ datasets, particular attention should be paid to ensuring their availability, quality, usability and interoperability.
Quality in this document relates in particular to updates, granularity, persistence and referenceability, the presence of metadata, and data formats. However, it also highlights the importance of involving re-users in the maintenance of data quality over time. As more datasets have been published and reused, data quality has emerged as a major concern for both data publishers and data consumers. The W3C is working on a Data Quality vocabulary, while the European Commission has funded the Open Data Support project to define the quality of Open Data and associated metadata.
The recommended characteristics of data are illustrated in a set of best practices. These cover taking into account feedback from reusers and potential reusers to improve the data, creating metadata that supports the discovery and reuse of data, providing versioning information, identifying datasets unambiguously and persistently, and defining quality criteria for the datasets. The following Best Practices develop these ideas further.
- Enrich data by generating new metadata
- Gather feedback from data consumers
- Make feedback available
- Provide data quality information
- Preserve identifiers
- Use persistent URIs as identifiers of datasets
- Use persistent URIs as identifiers within datasets
- Provide Complementary Presentations
- Provide Feedback to the Original Publisher
- Provide data provenance information
- Assign URIs to dataset versions and series
- Provide version history
- Provide a version indicator
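As an illustration of how several of these practices might combine in one metadata record, the following is a minimal Python sketch and not a prescribed implementation. It assumes a hypothetical dataset published under the placeholder domain example.org, and it borrows property names from commonly used vocabularies (dct, dcat, owl, pav) purely for illustration. The function names, the `ex:completeness` quality property and all URIs are assumptions introduced here and do not come from the directive or the guidelines.

```python
import json

# Hypothetical persistent base URI for the dataset series (placeholder domain).
BASE = "http://example.org/dataset/bus-stops"


def version_uri(base, version):
    """Build a persistent URI for one specific version of a dataset."""
    return f"{base}/{version}"


def describe_version(base, version, title, modified,
                     previous=None, source=None, completeness=None):
    """Return a JSON-LD-like metadata record for one dataset version.

    The record carries a persistent identifier, a version indicator,
    a link to the dataset series, and optional version history,
    provenance and quality information.
    """
    record = {
        "@id": version_uri(base, version),   # persistent URI of this version
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:modified": modified,
        "owl:versionInfo": version,          # version indicator
        "dct:isVersionOf": {"@id": base},    # link to the dataset series
    }
    if previous is not None:
        # Version history: point at the preceding version.
        record["pav:previousVersion"] = {"@id": version_uri(base, previous)}
    if source is not None:
        # Provenance: where the data was derived from.
        record["dct:source"] = {"@id": source}
    if completeness is not None:
        # Illustrative quality property; "ex:" is a made-up namespace.
        record["ex:completeness"] = completeness
    return record


record = describe_version(
    BASE, "1.1", "Bus stops", "2015-05-05",
    previous="1.0",
    source="http://example.org/registry/transport",
    completeness=0.97,
)
print(json.dumps(record, indent=2))
```

Keeping the series URI and the per-version URIs distinct lets reusers cite either the dataset in general or one exact version. A feedback channel for data consumers could be added to the same record in the same way.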