Quality Aspects In Use Cases

From Data on the Web Best Practices

Which parts of the UC descriptions are relevant for the Quality and Granularity vocabulary?

Parent page: https://www.w3.org/2013/dwbp/wiki/Quality_Requirements_From_UCR

Source: http://www.w3.org/TR/2015/NOTE-dwbp-ucr-20150224/#use-cases-1

The table below has:

  • UC: the UC from which inspiration has been taken;
  • Quotes: *all* sentences about Q&G aspects in the description of the UC;
  • Requirements: reqs for the UC that are potentially in scope for the vocabulary.

NB: right now, reqs are generally listed only the first time they appear.


UC Quotes Requirements
1 ASO: Airborne Snow Observatory Quality: Available in a number of scientific formats to customers and stakeholders based on customer requirements
  • R-GranularityLevels
  • R-DataMissingIncomplete
  • R-ProvAvailable (a general one coming in many requirements)
  • R-QualityMetrics
  • R-MetadataAvailable (as a placeholder for many things, which may be out of scope for Q&G)
2 BBC Quality: High level and domain vocabularies adapted to BBC applications.

R-MetadataStandardized

3 Bio2RDF Quality: Bio2RDF scripts generate provenance records using VoID, PROV and DC. A date-specific dataset IRI is linked to a unique dataset IRI using the PROV predicate wasDerivedFrom such that one can retrieve all provenance records for datasets created on different dates. Each resource in the dataset is linked to the date-unique dataset IRI that is part of the provenance record using the VoID inDataset predicate. Provenance indicates the time at which the RDF was generated, licensing (if available from the data source provider), a dc:creator link to the script on GitHub that was used to generate a dataset, the void:sparqlEndpoint to point to the dataset SPARQL endpoint, and void:dataDump to point to the data download URL.

Dataset metrics:

   total number of triples
   number of unique subjects
   number of unique predicates
   number of unique objects
   number of unique types
   unique predicate-object links and their frequencies
   unique predicate-literal links and their frequencies
   unique subject type-predicate-object type links and their frequencies
   unique subject type-predicate-literal links and their frequencies
   total number of references to a namespace
   total number of inter-namespace references
   total number of inter-namespace-predicate references
  • R-VocabReference (should we try to get an indicator of how much a dataset re-uses standard vocabularies? This may bring us into really broad scope, including all requirements on vocabularies...)
  • R-AccessUpToDate (similar issue: we may end up wanting indicators for all access requirements)
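
A minimal sketch of how some of the Bio2RDF dataset metrics listed above could be computed, assuming triples are available as plain Python tuples. The `Literal` marker, the `dataset_metrics` function, the toy `ex:` data and the reading of "links and their frequencies" as per-pair and per-predicate counts are assumptions made for illustration; this is not the Bio2RDF implementation.

```python
from collections import Counter
from typing import NamedTuple

RDF_TYPE = "rdf:type"


class Literal(NamedTuple):
    """Hypothetical marker: the object is a literal value, not an IRI."""
    value: str


def dataset_metrics(triples):
    """Compute a subset of the metrics for an iterable of (s, p, o) triples."""
    graph = set(triples)  # an RDF graph is a set, so duplicates count once
    return {
        "triples": len(graph),
        "subjects": len({s for s, _, _ in graph}),
        "predicates": len({p for _, p, _ in graph}),
        "objects": len({o for _, _, o in graph}),
        "types": len({o for _, p, o in graph if p == RDF_TYPE}),
        # Assumed reading: how many triples share each (predicate, IRI object) pair.
        "predicate_object": Counter(
            (p, o) for _, p, o in graph if not isinstance(o, Literal)
        ),
        # Assumed reading: how many triples per predicate have a literal object.
        "predicate_literal": Counter(
            p for _, p, o in graph if isinstance(o, Literal)
        ),
    }


# Toy data, invented for this example only.
triples = [
    ("ex:gene1", RDF_TYPE, "ex:Gene"),
    ("ex:gene2", RDF_TYPE, "ex:Gene"),
    ("ex:gene1", "ex:encodes", "ex:protein1"),
    ("ex:protein1", RDF_TYPE, "ex:Protein"),
    ("ex:gene1", "rdfs:label", Literal("gene one")),
    ("ex:gene1", "rdfs:label", Literal("gene one")),  # duplicate triple
]

metrics = dataset_metrics(triples)
for name, value in metrics.items():
    print(name, value)
```

Counts like these are candidate values for R-QualityMetrics; the type-to-type and namespace metrics in the list would need type and namespace lookups that this sketch leaves out.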
4 BuildingEye: SME use of public data Quality: standardized, interoperable across local authorities
  • R-QualityCompleteness
  • R-QualityComparable
5 Dados.gov.br Quality: Authoritative, clean data, vetted and guaranteed.

R-QualityOpinions

6 Digital archiving of Linked Data

R-PersistentIdentification (if we decide some info about preservation and persistence should be part of quality info)

7 Dutch Base Registers Governmental data has to be traceable/trustable as such.


8 GS1 Digital Quality: Very important to have trustworthy authoritative data from respective organizations.

Challenges: An organization (e.g. retailer) might embed authoritative data asserted by another organization (e.g. brand owner) and there is the risk that such embedded information becomes stale if it is not continuously synchronized.

Potential Requirements:

  • The ability to determine who asserted various facts — and whether they are the organization that can assert those facts authoritatively.
  • If the data about a product is inaccurate or out-of-date, we might need to provide some guidance about how liability terms and disclaimers can be expressed in Linked Open Data.

Q: is the first 'potential req' related to granularity?

9 ISO GEO Story Challenges: A unified way to have access to each record within the catalog at different levels: local, regional, national or EU level.

R-GranularityLevels

10 The Land Portal Quality: Every sort of data, from high quality to unverified.

R-GranularityLevels, R-QualityCompleteness, R-QualityMetrics

11 LA Times' Reporting of Ron Galperin's Infographic The methodology used is not explained - making it hard to assess trustworthiness. How can provenance be described?

R-DataProductionContext, R-GeographicalContext, R-QualityMetrics, R-UniqueIdentifier (if relevant for quality)

12 LusTRE: Linked Thesaurus fRamework for Environment

Quality: Largely variable.

Challenges: Assessment and documentation of dataset and linkset quality with domain-dependent quality metrics.

LusTRE considers the heterogeneity in scope and levels of abstraction of existing environmental thesauri as an asset. It includes a review of thesauri and their characteristics in terms of multilingualism, openness and quality. Expressing dataset and linkset quality would be needed to make the quality assessment of thesauri accessible.

Quality of thesauri and linksets is not necessarily limited to the initial review of thesauri; it should be monitored and promptly documented.

http://www.edbt.org/Proceedings/2013-Genova/papers/workshops/a8-albertoni.pdf presents measures for quality of linksets.

R-Citable, R-DataEnrichment, R-DataVersion, R-ProvAvailable, R-QualityComparable, R-QualityCompleteness, R-QualityMetrics, R-QualityOpinions, etc.

13 Machine-readability and Interoperability of Licenses
14 Mass Spectrometry Imaging (MSI) Quality: varies with mass spectrometry instrument used, preparation of sample. Note (AI): this is quality of content (images)!

R-DataEnrichment, R-QualityCompleteness, R-QualityMetrics

15 OKFN Transport WG Perceived liability risks, often associated with data quality issues, prevent operators from opening up their data.

R-DataMissingIncomplete, R-DataProductionContext, R-GeographicalContext, R-QualityComparable, R-QualityCompleteness, R-QualityMetrics

16 Open City Data Pipeline Challenges:
  • Incomplete data (can be overcome using semantic technologies and/or statistical methods).
  • Heterogeneity (indicators, licenses, formats).
  • Metadata is not always uniform, not only titles of columns, but standardization about units, etc.

R-DataMissingIncomplete, R-DataProductionContext, R-GeographicalContext, R-QualityComparable, R-QualityCompleteness

17 Open Experimental Field Studies For measurements to be considered useful and comparable to other findings scientists need to track every aspect of their laboratory and field experiments. This can include: background describing the purpose of the experiment, [...] quality assurance, problem reporting [...] quality control codes selected...

Quality: House keeping data, problem reporting, maintenance history, calibration history.

Negative aspects: When data is published on the Web there is no mechanism for users to rate and review data.

Challenges:

  • Publishing experiments to publicly accessible Web-based archives.
  • Advertising experiments in catalogs that include comprehensive information about the things and services used in the experiment.
  • Providing composite experiment in such a way that it is useful to users that are not fellow collaborators.

R-AccessRealTime, R-DataIrreproducibility, R-DataLifecycleStage, R-DataProductionContext, R-QualityOpinions

18 Resource Discovery for Extreme Scale Collaboration (RDESC) Quality: important for maintaining the correctness and quality of search results.

Challenges:

  • Scalability of such systems.
  • Metadata about Quality of Published Data.
  • Frequency of Data Update.
  • User Feedback for data correction/annotation.

Potential Requirements:

  • Use of Persistent URIs.
  • Requirements to publish quality of published data.

R-AccessRealTime, R-DataMissingIncomplete, R-ProvAvailable, R-SLAAvailable

19 Recife Open Data Portal Quality: Verified and clean data.

Challenges: Automate the data publishing process to keep data up to date and accurate.

R-QualityComparable, R-QualityCompleteness

20 Retrato da Violência (Violence Map) Quality: not guaranteed. (!)

Negative Aspects: the data is already outdated (in 2014)

R-AccessUpToDate, R-QualityCompleteness

21 Share-PSI 2.0 Report from which many requirements can be derived http://www.w3.org/2013/share-psi/workshop/samos/report

R-AccessRealTime, R-AccessUpToDate, R-GeographicalContext, R-ProvAvailable, R-QualityComparable, R-QualityOpinions

22 Tabulae - how to get value out of data Quality: The information must be at least semi-structured (for instance, a spreadsheet).

Challenges:

  • Quality of data and metadata.
  • Inconsistency between different data sources.
  • Internationalization and format issues (e.g., languages, numbers, dates, etc.)

R-AccessUpToDate, R-FormatLocalize, R-ProvAvailable, R-QualityComparable, R-QualityCompleteness

23 UK Open Research Data Forum Quality: Variable - often empirical, often messy. Some of the data may not be repeatable.
R-ProvAvailable
24 Uruguay Open Data Catalog Quality: Most of the data is realized properly, with complete or near complete metadata.

Challenges: Automated publication process using harvesting or similar tools. Alerts or control panels to keep data updated.

R-DataMissingIncomplete, R-AccessLevel

25 Web Observatory Quality: Variable, depends on the data source, can be structured or not.

Challenges: Data velocity; Data variety

R-DataEnrichment, R-GranularityLevels, R-ProvAvailable

26 Wind Characterization Scientific Study The DMF will record all processing history, quality assurance work, problem reporting, and maintenance activities for both instrumentation and data.