Quality Requirements From UCR

From Data on the Web Best Practices

Parent page: https://www.w3.org/2013/dwbp/wiki/Data_quality_notes#Scoping_and_requirements_from_DWBP_WG

On the ongoing process

The UCR document lists the following requirements relevant to data quality and granularity (Q&G):

  • R-DataMissingIncomplete: 'Publishers should indicate if data is partially missing or if the dataset is incomplete'
  • R-QualityComparable: 'Data should be comparable with other datasets'
  • R-Data should be complete: 'Data should be complete'
  • R-QualityMetrics: 'Data should be associated with a set of documented, objective and, if available, standardized quality metrics. This set of quality metrics may include user-defined or domain-specific metrics.'
  • R-QualityOpinions: 'Subjective quality opinions on the data should be supported'
  • R-GranularityLevels: 'Data available at different levels of granularity should be accessible and modelled in a common way'

We have to confirm whether the scope of the Q&G work is indeed limited to these "official" Q&G reqs or whether we should go beyond them, e.g. by reflecting the quality of the vocabulary (re-)used, access to datasets, metadata and, more generally, the implementation of our best practices (cf. the "5 stars" thread).

The distinction between intrinsic and extrinsic metadata may help in making choices here. For example, Q&G could be defined with respect to intrinsic properties of the datasets, not extrinsic properties (let alone properties of the metadata for a dataset!).

Analysis of existing requirements

A full listing of requirements in the current UCR doc and their relevance to Q&G is available at https://www.w3.org/2013/dwbp/wiki/Requirements_In_Scope_For_Quality

Analysis of existing UCs

Existing requirements are rather generic, so, to get more material, quality-related quotes have been gathered from each UC description. We can later come back to the reqs and, for each req, aggregate the relevant quotes from the UCs that list it.
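
The aggregation step described above could be sketched as follows. This is only an illustration under stated assumptions: the UC labels ("UC-A", etc.), the quote texts and the function name quotes_by_requirement are hypothetical placeholders, not content from the UCR document; only the requirement identifiers come from the list above.

```python
from collections import defaultdict

# Hypothetical records: (UC label, requirements that the UC lists, quote).
# Labels and quote texts are placeholders, not taken from the UCR document.
uc_quotes = [
    ("UC-A", ["R-QualityMetrics", "R-QualityOpinions"], "placeholder quote 1"),
    ("UC-B", ["R-QualityMetrics"], "placeholder quote 2"),
    ("UC-C", [], "placeholder quote 3"),  # insightful UC listing no official quality req
]


def quotes_by_requirement(records):
    """Group each quote under every requirement listed by the quoting UC."""
    grouped = defaultdict(list)
    for uc, reqs, quote in records:
        for req in reqs:
            grouped[req].append((uc, quote))
    return dict(grouped)


grouped = quotes_by_requirement(uc_quotes)
for req in sorted(grouped):
    print(req)
    for uc, quote in grouped[req]:
        print(f"  {uc}: {quote}")
```

Note that a quote from a UC that lists no official quality req (such as the hypothetical "UC-C") does not appear under any requirement in this grouping, so such quotes would need to be tracked separately.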

This is tedious: a first look at the UCs shows that many of them say they have quality-related requirements but say little about the reasons for these requirements or how they could be tackled (e.g. 4). Conversely, some UCs do not list official quality reqs but deliver some interesting insights (e.g. 3).

There are also some requirements that are not directly listed as Q&G reqs but that UCs may relate to Q&G concerns.

The analysis is at https://www.w3.org/2013/dwbp/wiki/Quality_Aspects_In_Use_Cases

Getting more precise info from UC owners

Quality Questionnaire (work-in-progress)