
RequirementsFromFPWDBP



On the ongoing requirement elicitation/refinement process. (Note from Riccardo)

I've tried to follow the “eating your own dog food” tradition: I've gone through each BP in the FPWD and considered the requirements that each BP could suggest if we treat quality-related information as a special case of open [meta]data that must be shared on the web.

Of course, this is just one of the possible ways to proceed.

Requirements are listed in the table below; each entry tracks the following pieces of information:

  • Best practice: the BP from which I've taken inspiration when writing the requirement;
  • Requirement: a verbose sentence explaining the requirement. Requirements are extracted by analogy with the BPs written in the FPWD, or drawn from my experience in the field;
  • Competency questions: some of the requirements can be turned into a set of “competency questions” (CQs) for the quality vocabulary. Hopefully, competency questions are close to the “concrete requirements” Antoine had in mind.

I have to admit that in this early phase I have tried to be imaginative, including CQs that might not be in the scope of the quality vocabulary, so I wouldn't be surprised if the group decides to discard a (large!?) part of the requirements I have listed.

In some cases, competency questions might suggest introducing terms that substantially or partially overlap with other known vocabularies (e.g., DAQ, PROV-O, DCAT, RDF Data Cube). In these cases, the corresponding vocabularies are indicated in the Vocabularies to check column, so that the group can decide afterwards, case by case, whether to include overlapping terms as brand-new terms in the quality vocabulary, as specializations of a well-known vocabulary (e.g., by using rdfs:subPropertyOf or rdfs:subClassOf), or by directly reusing the terms offered in the existing vocabulary's namespace.
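To make the specialization option concrete, here is a minimal sketch in Python with rdflib; the ex: namespace and the term ex:QualityMeasurement are hypothetical placeholders for whatever the group ends up defining, while qb:Observation is the existing RDF Data Cube class. The multilingual labels also illustrate requirement 2 below.

  from rdflib import Graph, Namespace, Literal
  from rdflib.namespace import RDF, RDFS

  EX = Namespace("http://example.org/quality#")        # hypothetical quality vocabulary
  QB = Namespace("http://purl.org/linked-data/cube#")  # RDF Data Cube

  g = Graph()
  g.bind("ex", EX)
  g.bind("qb", QB)

  # Option: introduce a new term as a specialization of an existing one.
  g.add((EX.QualityMeasurement, RDF.type, RDFS.Class))
  g.add((EX.QualityMeasurement, RDFS.subClassOf, QB.Observation))
  # Multilingual, human-readable descriptions for the term.
  g.add((EX.QualityMeasurement, RDFS.label, Literal("quality measurement", lang="en")))
  g.add((EX.QualityMeasurement, RDFS.label, Literal("misura di qualità", lang="it")))

  print(g.serialize(format="turtle"))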

TODO

  • to figure out requirements that can be extracted from BP 15 to BP 27;
  • to double-check competency questions extracted so far;
  • are there other competency questions that might make sense?
  • to refine/split/group CQs as needed.

Requirements

Each entry below lists the requirement, its competency questions, the vocabularies to check, and the best practice(s) it was derived from.
Requirement 1: It should be possible for computer applications, notably search tools, to easily locate and process the quality of datasets; this implies that data quality information should be both human- and machine-readable.
  Competency questions: -
  Vocabularies to check: -
  Best practices:
    • Best Practice 2: “Use machine-readable formats to provide metadata”
    • Best Practice 8: “Provide data quality information”
Requirement 2: Quality should be stated via a standardized vocabulary, possibly expressed in RDF, with HTTP URIs and multilingual descriptions for terms. [Of course, other, less Linked-Data-oriented technological approaches (e.g., schema.org / microdata) can also be considered in order to make the quality vocabulary appealing to communities not so interested in Linked Data.]
  Competency questions: -
  Vocabularies to check: -
  Best practice: Best Practice 3: “Use standard terms to define metadata”
Requirement 3: It should be possible for consumers to interpret the meaning of quality; for example, quality scores should be provided together with the scale over which they range, the quality dimensions measured, and the metrics adopted.
  Competency questions:
    • Is the quality score associated with that dataset good or bad?
  Vocabularies to check: DAQ, RDF Data Cube
  Best practice: Best Practice 5: “Provide locale parameters metadata”
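As a rough illustration of what such interpretable scores could look like, here is a minimal sketch in Python with rdflib; every ex: term, the 0-1 scale and the metric/dimension names are hypothetical placeholders, not terms taken from DAQ or any other existing vocabulary.

  from rdflib import Graph, Namespace, Literal, URIRef
  from rdflib.namespace import RDF, RDFS, XSD

  EX = Namespace("http://example.org/quality#")  # hypothetical quality vocabulary
  g = Graph()
  g.bind("ex", EX)

  score = URIRef("http://example.org/assessments/dataset-x/completeness-1")
  g.add((score, RDF.type, EX.QualityMeasurement))                 # hypothetical class
  g.add((score, EX.value, Literal("0.85", datatype=XSD.decimal)))
  g.add((score, EX.onScale, Literal("0-1, higher is better")))    # make the scale explicit
  g.add((score, EX.metric, EX.missingValuesRatio))                # which metric was computed
  g.add((score, EX.dimension, EX.completeness))                   # which dimension it measures
  g.add((EX.missingValuesRatio, RDFS.comment,
         Literal("Share of non-missing cells over all expected cells.", lang="en")))

  print(g.serialize(format="turtle"))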
Requirement 4: Quality information should be licensed; search tools and humans should know if and how they can use it.
  Competency questions: Example
  Vocabularies to check: Example
  Best practice: Best Practice 6: “Provide data license information”
Requirement 5: It should be possible to determine the provenance of quality information.
  Competency questions:
    • Who has provided the quality information?
    • Is the quality information authoritatively provided?
    • Is the quality certified by anyone?
    • Is the quality actively (continuously) evaluated?
    • What service/program has been used to perform the quality assessment?
  Vocabularies to check: PROV-O
  Best practice: Best Practice 7: “Provide data provenance information”
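A minimal sketch, assuming PROV-O, of how the who/what/when questions above could be answered in Python with rdflib; the ex: resources (quality team, assessment run, checker service) are hypothetical, while the prov: terms are standard PROV-O.

  from rdflib import Graph, Namespace, Literal
  from rdflib.namespace import RDF, XSD

  PROV = Namespace("http://www.w3.org/ns/prov#")
  EX = Namespace("http://example.org/")  # hypothetical resources

  g = Graph()
  g.bind("prov", PROV)
  g.bind("ex", EX)

  measurement = EX["assessments/dataset-x/completeness-1"]
  g.add((measurement, RDF.type, PROV.Entity))
  # Who has provided the quality information?
  g.add((measurement, PROV.wasAttributedTo, EX.QualityTeam))
  # What service/program has produced it?
  g.add((measurement, PROV.wasGeneratedBy, EX.assessmentRun42))
  g.add((EX.assessmentRun42, RDF.type, PROV.Activity))
  g.add((EX.assessmentRun42, PROV.wasAssociatedWith, EX.QualityCheckerService))
  # When was the assessment performed?
  g.add((measurement, PROV.generatedAtTime,
         Literal("2014-06-01T12:00:00Z", datatype=XSD.dateTime)))

  print(g.serialize(format="turtle"))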
Requirement 6: Data quality might be expressed according to different quality dimensions, relying on metrics and/or feedback opinions. In particular, (i) results from cross-domain as well as domain-specific metrics/measures should be representable in the quality vocabulary; (ii) the set of quality metrics/measures and quality dimensions considered in a quality assessment should be left open.
  Competency questions:
    • What kind of quality representation is provided (metric-based, feedback opinion, description of known quality issues)?
    • Which quality metric has been deployed?
    • What does that metric stand for? / What does that metric measure?
    • What kind of quality dimensions have been evaluated?
  Vocabularies to check: DAQ
  Best practice: Best Practice 8: “Provide data quality information”
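Competency questions like these are typically checked by turning them into SPARQL queries over the vocabulary. A minimal sketch, reusing the hypothetical ex: terms from the snippets above (none of the property names come from DAQ):

  from rdflib import Graph, Namespace, Literal, URIRef
  from rdflib.namespace import RDF, XSD

  EX = Namespace("http://example.org/quality#")  # hypothetical quality vocabulary
  g = Graph()
  g.bind("ex", EX)

  score = URIRef("http://example.org/assessments/dataset-x/completeness-1")
  g.add((score, RDF.type, EX.QualityMeasurement))
  g.add((score, EX.metric, EX.missingValuesRatio))
  g.add((score, EX.dimension, EX.completeness))
  g.add((score, EX.value, Literal("0.85", datatype=XSD.decimal)))

  # CQ: "Which quality metric has been deployed, and for which dimension?"
  q = """
  PREFIX ex: <http://example.org/quality#>
  SELECT ?measurement ?metric ?dimension ?value WHERE {
    ?measurement a ex:QualityMeasurement ;
                 ex:metric ?metric ;
                 ex:dimension ?dimension ;
                 ex:value ?value .
  }
  """
  for row in g.query(q):
      print(row.measurement, row.metric, row.dimension, row.value)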
Requirement 7: Known quality issues should be documented, at least for human consumers.
  Competency questions: -
  Vocabularies to check: -
  Best practice: Best Practice 8: “Provide data quality information”
Requirement 8: Quality information should be associated with a specific release/distribution of a dataset, and date-time information about when the evaluation was performed should be indicated, so that changes in quality over time can be tracked.
  Competency questions:
    • When was quality [or even a particular measure of quality] last assessed on dataset X [or even on a distribution Y of dataset X]?
    • Does the newest release/distribution of dataset X have better quality than the previous one?
  Vocabularies to check: DAQ, PROV-O, DCAT
  Best practice: Best Practice 10: “Provide version history”
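A minimal sketch, assuming DCAT for the dataset/distribution split and PROV-O for the timestamp; dcat:Dataset, dcat:distribution, dcat:Distribution and prov:generatedAtTime are existing terms, while everything under ex: (including ex:computedOn) is a hypothetical placeholder.

  from rdflib import Graph, Namespace, Literal
  from rdflib.namespace import RDF, XSD

  DCAT = Namespace("http://www.w3.org/ns/dcat#")
  PROV = Namespace("http://www.w3.org/ns/prov#")
  EX = Namespace("http://example.org/")  # hypothetical resources and vocabulary

  g = Graph()
  for prefix, ns in (("dcat", DCAT), ("prov", PROV), ("ex", EX)):
      g.bind(prefix, ns)

  dataset = EX["dataset-x"]
  dist_v2 = EX["dataset-x/v2.csv"]
  measurement = EX["assessments/dataset-x-v2/completeness-1"]

  g.add((dataset, RDF.type, DCAT.Dataset))
  g.add((dataset, DCAT.distribution, dist_v2))
  g.add((dist_v2, RDF.type, DCAT.Distribution))

  # The measurement points at the exact distribution it was computed on
  # and records when the evaluation was performed.
  g.add((measurement, EX.computedOn, dist_v2))  # hypothetical property
  g.add((measurement, PROV.generatedAtTime,
         Literal("2014-06-01T12:00:00Z", datatype=XSD.dateTime)))

  print(g.serialize(format="turtle"))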
Requirement 9: Every first-class quality concept, such as quality dimensions, metrics, etc., should have a unique ID [possibly an HTTP IRI?!].
  Competency questions: -
  Vocabularies to check: -
  Best practice: Best Practice 11: “Use unique identifiers”
Requirement 10: Quality information should be available in non-proprietary formats.
  Competency questions: -
  Vocabularies to check: -
  Best practice: Best Practice 13: “Use open data formats”
Requirement 11: Quality information should be available in multiple formats.
  Competency questions: -
  Vocabularies to check: -
  Best practice: Best Practice 14: “Provide data in multiple formats”
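For requirements 10 and 11, a minimal sketch of publishing the same quality graph in more than one open format with rdflib (the graph content reuses the hypothetical ex: terms from the snippets above):

  from rdflib import Graph, Namespace, Literal, URIRef
  from rdflib.namespace import RDF, XSD

  EX = Namespace("http://example.org/quality#")  # hypothetical quality vocabulary
  g = Graph()
  g.bind("ex", EX)

  score = URIRef("http://example.org/assessments/dataset-x/completeness-1")
  g.add((score, RDF.type, EX.QualityMeasurement))
  g.add((score, EX.value, Literal("0.85", datatype=XSD.decimal)))

  # The same quality information, serialized in two open, non-proprietary formats.
  print(g.serialize(format="turtle"))
  print(g.serialize(format="xml"))  # RDF/XML; JSON-LD is also built into rdflib >= 6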