Abstract

This document provides a framework in which the quality of a dataset can be described, whether by the dataset publisher or by a broader community of users. It does not provide a formal, complete definition of quality; rather, it sets out a consistent means by which information can be provided such that a potential user of a dataset can make their own judgment about its fitness for purpose.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

The model for the Data Quality Vocabulary is nearing maturity, but the Working Group is seeking feedback on a number of specific issues highlighted in the document below.

This document was published by the Data on the Web Best Practices Working Group as a Working Draft. If you wish to make comments regarding this document, please send them to public-dwbp-comments@w3.org ( subscribe , archives ). All comments are welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy . The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .

This document is governed by the 1 September 2015 W3C Process Document .

1. Introduction

This section is non-normative.

The Data on the Web Best Practices Working Draft has pointed out the relevance of publishing information about the quality of data published on the Web. Accordingly, the Data on the Web Best Practices Working Group has been chartered to create a vocabulary for expressing data quality. The Data Quality Vocabulary (DQV) presented in this document is foreseen as an extension to DCAT [vocab-dcat] to cover the quality of the data, how frequently it is updated, whether it accepts user corrections, persistence commitments, etc. When used by publishers, this vocabulary will foster trust in the data amongst developers.

This vocabulary does not seek to determine what "quality" means. We believe that quality lies in the eye of the beholder; that there is no objective, ideal definition of it. Some datasets will be judged as low-quality resources by some data consumers, while they will perfectly fit others' needs. In accordance, we attach a lot of importance to allowing many actors to assess the quality of datasets and publish their annotations, certificates, opinions about a dataset. A dataset's publisher should seek to publish metadata that helps data consumers determine whether they can use the dataset to their benefit. However, publishers should not be the only ones to have a say on the quality of data published in an open environment like the Web. Certification agencies, data aggregators, data consumers can make relevant quality assessments, too.

We want to stimulate this by making it easier to publish, exchange and consume quality metadata at every step of a dataset's lifecycle. This is why, next to expected constructs such as quality measures, the Data Quality Vocabulary puts a lot of emphasis on feedback, annotation and agreements.

Note that DQV elements can be applied not only to express metadata on the quality of datasets; they can also be used to express statements about the quality of the metadata that describes them. This is especially true when it comes to representing the provenance of that metadata or its conformance with respect to established metadata standards.

2. Namespaces

The namespace for DQV is provisionally set as http://www.w3.org/ns/dqv#. DQV, however, seeks to re-use elements from other vocabularies, notably DCAT, following the best practices for data vocabularies identified by the Data on the Web Best Practices Working Group.

Note The Working Group is considering putting all new classes and properties defined in the DWBP Vocabularies in the DCAT namespace. To stimulate reactions that might help in reaching a decision, the Dataset Usage Vocabulary will be moved under the DCAT namespace. If reactions to the DUV choice are positive, the Data Quality Vocabulary might go in the same direction.

The table below indicates the full list of namespaces and prefixes used in this document.

Prefix Namespace
daq http://purl.org/eis/vocab/daq#
dcat http://www.w3.org/ns/dcat#
dcterms http://purl.org/dc/terms/
dqv http://www.w3.org/ns/dqv#
duv http://www.w3.org/ns/duv#
oa http://www.w3.org/ns/oa#
prov http://www.w3.org/ns/prov#
sdmx-attribute http://purl.org/linked-data/sdmx/2009/attribute#
skos http://www.w3.org/2004/02/skos/core#

3. Vocabulary Overview

The following vocabulary is based on DCAT [vocab-dcat], which it extends with a number of additional properties and classes suitable for expressing the quality of a dataset.

The quality of a given dataset or distribution is assessed via a number of observed properties. For instance, one may consider a dataset to be of high quality because it complies with a specific standard, while for other use cases the quality of the data will depend on its level of interlinking with other datasets. To express these properties, an instance of dcat:Dataset or dcat:Distribution can be related to five different types of quality information represented by the following classes:

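DQV defines quality measures as specific instances of Quality Measurements, adapting the daQ quality metrics framework [DaQ], [DaQ-RDFCUBE]. It relies on quality dimensions and quality metrics. For example, a dimension could be "multilinguality" and two metrics could be "ratio of literals with language tags" and "number of different language tags".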

Besides quality measurements, DQV considers certificates, standards, and quality policies, which can also be organized according to dimensions. Quality metadata containers ( dqv:QualityMetadata ) can group together different quality statements, so that their provenance can be tracked jointly.

Fig. 1 Data model showing the main relevant classes and their relations.

N.B.: "containment" refers to the inclusion of quality statements into "containers", which may or may not be treated as (RDF) graphs (see later example and the usage note for the class dqv:QualityMetadata ).
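As a rough illustration of the property-based alternative to graph containment mentioned above, the sketch below links a container to its quality statements with dcterms:hasPart. The resources :myQualityMetadata, :measurement1 and :measurement2 are illustrative names in the http://example.org/ namespace, and using dcterms:hasPart directly (rather than a dedicated subproperty) is only one possible choice, not a recommendation of this document.

```turtle
@prefix :        <http://example.org/> .
@prefix dqv:     <http://www.w3.org/ns/dqv#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# A container whose membership is stated explicitly with dcterms:hasPart
# instead of being expressed through (RDF) graph containment.
:myQualityMetadata
    a dqv:QualityMetadata ;
    dcterms:hasPart :measurement1, :measurement2 .

# Illustrative quality statements grouped by the container.
:measurement1 a dqv:QualityMeasurement .
:measurement2 a dqv:QualityMeasurement .
```

With this approach the provenance statements attached to :myQualityMetadata can still be read as applying jointly to the statements it groups, without requiring named graph support from the consuming application.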

Quality information can be derived from other quality information. For example, a quality annotation can be derived from a standard or a quality measurement. Quality measurements can be derived from other measurements. Metrics can be derived from other metrics. A standard can be built on another standard or a (set of) metrics. DQV models such derivations through the property prov:wasDerivedFrom as illustrated in the diagram below.

Fig. 2 Using the property prov:wasDerivedFrom to interrelate quality metrics and other quality statements.

4. Vocabulary specification

Note

This section is work in progress. More tables specifying individual classes and properties will be included later.

4.1 Class: Quality Measurement

The following properties should be used on this class: dqv:isMeasurementOf, dqv:value, qb:dataSet.

Issue 3 Should (and if yes, how) DQV represent multiple/derived values for a metric (e.g., average or normalized value)? (Issue-222)
Issue 4 Should (and if yes, how) DQV represent parameters for a metric applied for computing a specific quality measurement (e.g., a specific setting of weights)? (Issue-223)

RDF Class: dqv:QualityMeasurement
Definition: A quality measurement represents the evaluation of a given dataset (or dataset distribution) against a specific quality metric.
Subclass of: qb:Observation
Equivalent class daq:Observation
Usage note: The unit of measure in a quality measurement should be specified through the property sdmx-attribute:unitMeasure as recommended by RDF Data Cube [Vocab-Data-Cube]. The Ontology of units of Measure (OM) [RijgersbergEtAl] provides a list of HTTP dereferenceable units of measure which can be used as values for sdmx-attribute:unitMeasure.
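The usage note above can be sketched as follows. This is only an illustration: :responseTimeMeasurement, :responseTimeMetric and :second are hypothetical resources in the http://example.org/ namespace, and in a real description the unit would preferably be a dereferenceable unit URI, for example one taken from OM.

```turtle
@prefix :               <http://example.org/> .
@prefix dqv:            <http://www.w3.org/ns/dqv#> .
@prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
@prefix xsd:            <http://www.w3.org/2001/XMLSchema#> .

# A measurement whose value is qualified by a unit of measure.
:responseTimeMeasurement
    a dqv:QualityMeasurement ;
    dqv:computedOn :myDatasetDistribution ;
    dqv:isMeasurementOf :responseTimeMetric ;   # hypothetical metric
    dqv:value "0.42"^^xsd:double ;              # illustrative value
    sdmx-attribute:unitMeasure :second .        # placeholder for a dereferenceable unit URI
```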

4.1.1 Property: isMeasurementOf

Issue 5 Is dqv:hasMetric correct or should it be dqv:metric? The issue is complicated by the relationship between DQV and both QB and daQ. Issue-231
RDF Property: dqv:isMeasurementOf
Definition: Indicates the metric being observed.
Instance of: qb:DimensionProperty
Domain: qb:Observation
Range: dqv:Metric
Equivalent Property daq:metric

4.1.2 Property: dataSet

RDF Property: qb:dataSet
Definition: Indicates the dataset to which a quality measurement (which is an RDF Data Cube observation) belongs.
Domain: qb:Observation
Range: qb:DataSet

4.1.3 Property: computedOn

RDF Property: dqv:computedOn
Definition: Refers to the resource (e.g., a dataset, a linkset, a graph, a set of triples) on which the quality measurement is performed. In the DQV context, this property is generally expected to be used in statements in which objects are instances of dcat:Dataset and dcat:Distribution .
Instance of: qb:DimensionProperty
Domain: dqv:QualityMeasurement
Equivalent property: daq:computedOn
Inverse property: dqv:hasQualityMeasurement

4.1.4 Property: value

RDF Property: dqv:value
Definition: Refers to the values computed by a metric.
Instance of: qb:MeasureProperty, owl:DatatypeProperty
Domain: dqv:QualityMeasurement
Equivalent property: daq:value

4.2 Class: Metric

The following properties should be used on this class: dqv:inDimension.

Issue 6 In daQ, the property daq:expectedDataType associates each metric to the expected data type for its observed value. Data types for observed values are restricted to xsd:anySimpleType (e.g. xsd:boolean, xsd:double etc…). Is the current practice of using daq:expectedDataType in daQ appropriate? Isn't the restriction to xsd:anySimpleType too narrow? ( Issue-224 )
RDF Class: dqv:Metric
Definition: A standard to measure a quality dimension. An observation (instance of dqv:QualityMeasurement) assigns a value in a given unit to a Metric.
Equivalent class daq:Metric

4.2.1 Property: expectedDataType

RDF Property: dqv:expectedDataType
Definition: Represents the expected data type for a metric's observed value (e.g., xsd:boolean, xsd:double, etc.).
Domain: dqv:Metric
Range: xsd:anySimpleType
Equivalent property: daq:expectedDataType

4.3 Class: Dimension

The following properties should be used on this class: dqv:inCategory.

RDF Class: dqv:Dimension
Definition: Represents criteria relevant for assessing quality. Each quality dimension must have one or more metrics to measure it. A dimension is linked with a category using the dqv:inCategory property.
Subclass of: skos:Concept
Equivalent class daq:Dimension

4.3.1 Property: inCategory

RDF Property: dqv:inCategory
Definition: Represents the category a dimension is grouped in.
Domain: dqv:Dimension
Range: dqv:Category
Inverse: daq:hasDimension
Usage note: Categories are meant to systematically organize dimensions. The Data Quality Vocabulary defines no specific cardinality constraints for dqv:inCategory, since distinct quality frameworks might have different perspectives over a dimension. A dimension may therefore be associated with more than one category. However, those who define new quality metrics should try to avoid this as much as possible and assign only one category to the dimensions they define.

4.4 Class: Category

RDF Class: dqv:Category
Definition: Represents a group of quality dimensions in which a common type of information is used as quality indicator.
Subclass of: skos:Concept
Equivalent class daq:Category
Note

Dimension and category are abstract entities. We represent instances of dqv:Dimension and dqv:Category as instances of skos:Concept, which we think enables features similar to those available for dimensions and categories in daQ. Our representation choice differs more significantly for metrics, however. daQ uses RDFS/OWL classes and subclasses to represent constraints on measurements (e.g., on the type of values). RDFS/OWL, however, makes an 'open world' assumption that does not allow one to capture all constraints entirely. Additionally, languages are currently being defined to represent constraints in more appropriate ways (SHACL). We therefore think it is not appropriate at this stage to recommend treating specific metrics as subclasses of dqv:Metric, and we refer implementers to future progress on SHACL and related technology.

4.5 Class: Quality Measurement Dataset

RDF Class: dqv:QualityMeasurementDataset
Definition: Represents a dataset of quality measurements, i.e., evaluations of a given dataset (or dataset distribution) against a specific quality metric.
Subclass of: qb:DataSet
Equivalent class daq:QualityGraph

4.6 Class: Quality Policy

RDF Class: dqv:QualityPolicy
Definition: Represents a policy or agreement that is chiefly governed by data quality concerns.

4.7 Class: Quality Annotation

RDF Class: dqv:QualityAnnotation
Definition: Represents quality annotations, including ratings, quality certificates and feedback that can be associated with datasets or distributions. Quality annotations must have one oa:motivatedBy statement with an instance of oa:Motivation (and skos:Concept), which reflects a quality assessment purpose. We define this instance as dqv:qualityAssessment.
Subclass of: oa:Annotation
Equivalent class:
EquivalentClasses(
    dqv:QualityAnnotation
    ObjectHasValue( oa:motivatedBy dqv:qualityAssessment )
)

Note

To make the document more self-contained, we might consider describing some properties of oa:Annotation, such as hasBody and hasTarget.

4.8 Class: Quality Certificate

RDF Class: dqv:QualityCertificate
Definition: An annotation that associates a resource (especially, a dataset or a distribution) to another resource (for example, a document) that certifies the resource's quality according to a set of quality assessment rules.
Subclass of: dqv:QualityAnnotation

4.9 Class: User Quality Feedback

RDF Class: dqv:UserQualityFeedback
Definition: Represents feedback users might want to associate with datasets or distributions. Besides dqv:qualityAssessment, which is the motivation required by all quality annotations, one of the predefined instances of oa:Motivation should be indicated as motivation to distinguish among the different kinds of feedback, e.g., classifications, questions.
Subclass of: dqv:QualityAnnotation, duv:UserFeedback

Issue 7 Should we exploit predefined instances of oa:Motivation to further characterize a user's feedback purposes? (Issue-201) Combining the predefined instances of oa:Motivation with dqv:qualityAssessment, we could distinguish different kinds of user feedback, for example: dqv:qualityAssessment plus oa:editing might indicate a request for a modification or edit, which relates to the quality of the target dataset/distribution; dqv:qualityAssessment plus oa:questioning might express a question issued about the quality of the dataset/distribution; dqv:qualityAssessment plus oa:classifying might represent the assignment of a classification type, typically from a controlled vocabulary or list, to the target resource(s). For example, it could be used to classify a dataset/distribution against a rating system (e.g., the 5 Stars linked open data rating system).

4.10 Class: Quality Metadata

RDF Class: dqv:QualityMetadata
Definition: Represents quality metadata. It is defined to group quality certificates, policies, measurements and annotations under a named graph.
Subclass of: rdfg:Graph
Usage note:

QualityMetadata containers do not necessarily include all types of quality statements DQV can support. Implementers decide the granularity of containment. In the current version of DQV, we also leave open the choice of the containment "technique". Implementers can use (RDF) graph containment. They may also use a dedicated property of their choice to link instances of dqv:QualityMetadata with instances of other DQV classes, for example (a subproperty of) dcterms:hasPart.

4.11 Property: In Dimension

RDF Property: dqv:inDimension
Definition: Represents the dimensions a quality metric, certificate and annotation allow a measurement of.
Range: dqv:Dimension
Equivalent to:
SubObjectPropertyOf(
    ObjectInverseOf( daq:hasMetric )
    dqv:inDimension
)
Usage note: Dimensions are meant to systematically organize metrics, quality certificates and quality annotations. The Data Quality Vocabulary defines no specific cardinality constraints for dqv:inDimension, since distinct quality frameworks might have different perspectives over a metric. A metric may therefore be associated with more than one dimension. However, those who define new quality metrics should try to avoid this as much as possible and assign only one dimension to the metrics they define. More than one dimension can be indicated for each quality annotation or certificate.

4.12 Property: Has Quality Measurement

RDF Property: dqv:hasQualityMeasurement
Definition: Refers to the performed quality measurements. Quality measurements can be performed on any kind of resource (e.g., a dataset, a linkset, a graph, a set of triples). However, in the DQV context, this property is generally expected to be used in statements in which subjects are instances of dcat:Dataset and dcat:Distribution.
Range: dqv:QualityMeasurement
Inverse property: dqv:computedOn

4.13 Property: Was Derived From

RDF Property: prov:wasDerivedFrom
Definition: A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.
Domain: prov:Entity
Range: prov:Entity
Usage note:

prov:wasDerivedFrom expresses a quite abstract relation of derivation. More specialized relations of derivation can be defined as subproperties of prov:wasDerivedFrom, whenever this is required by applications.

The section entitled "Express derivation between quality metrics, measurements and annotations" shows some examples to illustrate the uses of this property.

4.14 Instance: Quality Assessment

RDF Instance: dqv:qualityAssessment
Definition: Motivation that must be specified for quality annotations.
Instance of: oa:Motivation
Note

Whenever DQV implementers need to extend the motivations for quality annotations, they should follow the instructions provided by the Web Annotation Data Model, and the concepts in the extension should be defined as specializations of dqv:qualityAssessment.
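A possible extension of the quality annotation motivations is sketched below. It assumes that specialization is expressed with skos:broader, which is how the Web Annotation Data Model suggests relating a new motivation to an existing one; :errorReporting is a hypothetical motivation in the http://example.org/ namespace and is not defined by DQV.

```turtle
@prefix :     <http://example.org/> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical motivation for annotations that report errors in a dataset,
# declared as a specialization of dqv:qualityAssessment.
:errorReporting
    a oa:Motivation ;
    skos:prefLabel "Error reporting"@en ;
    skos:broader dqv:qualityAssessment .
```

An annotation motivated by :errorReporting would still be expected to state oa:motivatedBy dqv:qualityAssessment explicitly, since consumers cannot be assumed to infer it from the skos:broader link.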

4.15 Instance: Precision

RDF Instance: dqv:precision
Definition: Precision is a quality dimension which refers to the recorded level of details. It represents the exactness of measurement or description.
Instance of: dqv:Dimension
Equivalent to: iso:precision

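5. Example Usage

This section is non-normative.

This section shows some examples to illustrate the application of the Data Quality Vocabulary.

Note

This section is still work in progress. Further examples will be provided as soon as some of the pending issues are resolved. We invite the public to contact the editors and submit relevant examples of quality data, even if not yet represented in DQV. We welcome your input!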

NB: in the remainder of this section, the prefix " : " refers to http://example.org/

5.1 Express a quality assessment with quality metrics

Let us consider a dataset myDataset, and its distribution myDatasetDistribution.
:myDataset 

    a  dcat:Dataset ;
    dcterms:title "My dataset" ; 
    dcat:distribution :myDatasetDistribution
    .


:myDatasetDistribution

    a  dcat:Distribution ;
    dcat:downloadURL <http://www.example.org/files/mydataset.csv> ;
    dcterms:title "CSV distribution of dataset" ;
    dcat:mediaType "text/csv" ;
    dcat:byteSize "87120"^^xsd:decimal 
.

An automated quality checker has provided a quality assessment with two (CSV) quality measurements for myDatasetDistribution.

:myDatasetDistribution
    dqv:hasQualityMeasurement :measurement1, :measurement2
    .


:measurement1 

    a dqv:QualityMeasurement ;
    dqv:computedOn :myDatasetDistribution ;
    dqv:isMeasurementOf :downloadURLAvailabilityMetric ;
    dqv:value "true"^^xsd:boolean 
    .


:measurement2

    a dqv:QualityMeasurement ;
    dqv:computedOn :myDatasetDistribution ;
    dqv:isMeasurementOf :csvCompletenessMetric ;
    dqv:value "0.5"^^xsd:double 
    .


#definition of dimensions and metrics
:availability

    a dqv:Dimension ;
    skos:prefLabel "Availability"@en ;
    skos:definition "Availability of a dataset is the extent to which data (or some portion of it) is present, obtainable and ready for use."@en ;
    dqv:inCategory :accessibility
    .


:completeness

    a dqv:Dimension ;
    skos:prefLabel "Completeness"@en ;
    skos:definition "Completeness refers to the degree to which all required information is present in a particular dataset."@en ;
    dqv:inCategory :intrinsicDimensions
    .



:downloadURLAvailabilityMetric 

    a dqv:Metric ;
    skos:definition "It checks if dcat:downloadURL is available and if its value is dereferenceable."@en ;
    dqv:expectedDataType xsd:boolean ;
    dqv:inDimension :availability
    .


:csvCompletenessMetric

    a dqv:Metric ;
    skos:definition "Ratio between the number of objects represented in the csv and the number of objects expected to be represented according to the declared dataset scope."@en ;
    dqv:expectedDataType xsd:double ;
    dqv:inDimension :completeness
    .

Categories and dimensions might be defined more extensively; see the section 'Dimensions and metrics hints' for further examples. Any quality framework is free to define its own dimensions and categories.

Issue 8 Is there any reason for turning the classes dqv:Dimension, dqv:Metric and dqv:Category as well as the properties dqv:hasDimension and dqv:hasCategory into "abstract" classes and properties as they were defined in daQ (see Section "Extending the daQ" here)? (Issue-204)
Issue 9 Should we represent dimensions and categories as instances of skos:Concept? This would allow publishers of quality frameworks to express (hierarchical) relations between dimensions or categories. This could also make it possible to align with quality-focused categorizations less focused on metrics, including the DWBP Best Practices dimensions, or even the parts of DQV about annotations. (Issue-205)

5.2 Document the provenance of the quality metadata

The results of metrics obtained in the previous assessment are stored in the myQualityMetadata graph.

# :myQualityMetadata is a graph

:myQualityMetadata {


:myDatasetDistribution

    dqv:hasQualityMeasurement :measurement1, :measurement2
    .


# The graph contains the rest of the statements presented in the previous example.


}


# :myQualityMetadata has been created by :qualityChecker and it is the result of the
# :qualityChecking activity 


:myQualityMetadata 

    a dqv:QualityMetadata ;
    prov:wasAttributedTo :qualityChecker ;
    prov:generatedAtTime "2015-05-27T02:52:02Z"^^xsd:dateTime ;
    prov:wasGeneratedBy :qualityChecking 
    .


# :qualityChecker is a service computing some quality metrics 	


:qualityChecker

    a prov:SoftwareAgent ;   
    rdfs:label "a quality assessment service"^^xsd:string
    # Further details about quality service/software can be provided, for example,
    # deploying  vocabularies such as Data Usage Vocabulary (DUV), Dublin Core or ADMS.SW
    .


# the :qualityChecking is the activity that has generated :myQualityMetadata starting from 
# :myDatasetDistribution


:qualityChecking

    a prov:Activity;
    rdfs:label "the checking of myDatasetDistribution's quality"^^xsd:string;
    prov:wasAssociatedWith :qualityChecker;
    prov:used              :myDatasetDistribution;
    prov:generated         :myQualityMetadata;
    prov:endedAtTime      "2015-05-27T02:52:02Z"^^xsd:dateTime;
    prov:startedAtTime     "2015-05-27T00:52:02Z"^^xsd:dateTime
    .

5.3 Document the provenance of a single quality measurement

Note

The group has discussed provenance at different levels of granularity (dqv:QualityMeasurement and dqv:QualityMetadata). The previous example showed how to track provenance at the level of quality metadata; the following example tracks provenance for the single quality measurement :measurement.

:myDatasetDistribution
    dqv:hasQualityMeasurement :measurement 
    .


# :measurement has been created by :qualityChecker and it is the result of the
# :qualityChecking activity 



:measurement

    a dqv:QualityMeasurement ;
    dqv:computedOn :myDatasetDistribution ;
    dqv:isMeasurementOf :downloadURLAvailabilityMetric ;
    dqv:value "true"^^xsd:boolean ;
    prov:wasAttributedTo :qualityChecker ;
    prov:generatedAtTime "2015-05-27T02:52:02Z"^^xsd:dateTime ;
    prov:wasGeneratedBy :qualityChecking 
    .


:downloadURLAvailabilityMetric 

    a dqv:Metric ;
    skos:definition "It checks if dcat:downloadURL is available and if its value is dereferenceable."@en ;
    dqv:expectedDataType xsd:boolean ;
    dqv:inDimension :availability
    .

    
# :qualityChecker is a service computing some quality metrics


:qualityChecker

    a prov:SoftwareAgent ;   
    rdfs:label "a quality assessment service"^^xsd:string
    # Further details about quality service/software can be provided, for example,
    # deploying  vocabularies such as Data Usage Vocabulary (DUV), Dublin Core or ADMS.SW
    . 


# the :qualityChecking is the activity that has generated :measurement starting from 
# :myDatasetDistribution


:qualityChecking

    a prov:Activity;
    rdfs:label "the checking of myDatasetDistribution's quality"^^xsd:string;
    prov:wasAssociatedWith :qualityChecker;
    prov:used              :myDatasetDistribution;
    prov:generated         :measurement;
    prov:endedAtTime      "2015-05-27T02:52:02Z"^^xsd:dateTime;
    prov:startedAtTime     "2015-05-27T00:52:02Z"^^xsd:dateTime

.



5.4 Document the provenance of a dataset

Statements similar to the ones applied to the resource myQualityMetadata above can be applied to the resource myDataset to indicate the provenance of the dataset. I.e., a dataset can be generated by a specific software agent, be generated at a certain time, etc. The HCLS Community Profile for describing datasets provides further examples.
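A minimal sketch of such statements is given below. The agent :exportTool and the timestamp are purely illustrative and are not taken from any real dataset description.

```turtle
@prefix :     <http://example.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Provenance attached to the dataset itself rather than to its quality metadata.
:myDataset
    a dcat:Dataset ;
    prov:wasAttributedTo :exportTool ;                             # hypothetical agent
    prov:generatedAtTime "2015-05-20T10:00:00Z"^^xsd:dateTime .    # illustrative timestamp

# Hypothetical software agent that produced the dataset.
:exportTool
    a prov:SoftwareAgent ;
    rdfs:label "a data export tool"^^xsd:string .
```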

5.5 Express that a dataset received an ODI certificate

Let us express that an ODI certificate for the "City of Raleigh Open Government Data" dataset is available at the URL <https://certificates.theodi.org/en/datasets/393/certificate>.

<https://certificates.theodi.org/en/datasets/393> a dcat:Dataset ;
    dqv:hasQualityAnnotation :myDatasetQA .

:myDatasetQA 
    a dqv:QualityCertificate ;
    oa:hasTarget <https://certificates.theodi.org/en/datasets/393> ;
    oa:hasBody  <https://certificates.theodi.org/en/datasets/393/certificate> ;
    oa:motivatedBy dqv:qualityAssessment 
.

5.6 Express a question about dataset quality

Let us ask a question about the completeness of the "City of Raleigh Open Government Data" dataset.

<https://certificates.theodi.org/en/datasets/393> a dcat:Dataset ;

        dqv:hasQualityAnnotation :questionQA .

:questionQA  
    a dqv:UserQualityFeedback ;
    oa:hasTarget <https://certificates.theodi.org/en/datasets/393> ;
    oa:hasBody  :textBody ;
    oa:motivatedBy dqv:qualityAssessment, oa:questioning ;
    dqv:inDimension :completeness
    .

:textBody a cnt:ContentAsText, dctypes:Text ;
    cnt:chars "Could you please provide information about the completeness of your dataset?" ;
    dc:language "en" ; 
    dc:format "text/plain" 
.

5.7 Express that a dataset fits in a quality classification

Let us express that the "City of Raleigh Open Government Data" dataset is classified as a four-star dataset against the 5 Stars linked open data rating system.

<https://certificates.theodi.org/en/datasets/393> a dcat:Dataset ;

        dqv:hasQualityAnnotation :classificationQA .

:classificationQA  
    a dqv:UserQualityFeedback ;
    oa:hasTarget <https://certificates.theodi.org/en/datasets/393> ;
    oa:hasBody  :four_stars ; 
    oa:motivatedBy dqv:qualityAssessment, oa:classifying ;
    dqv:inDimension :availability
     .

:four_stars
   a skos:Concept;
   skos:inScheme :OpenData5Star ;
   skos:prefLabel "Four stars"@en ;
   skos:definition "Dataset available on the web with structured machine-readable non-proprietary format. It uses URIs to denote things."@en
   .
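
The concept :four_stars refers to a concept scheme :OpenData5Star that the example leaves undefined. A minimal sketch, assuming an example.org namespace and a label chosen for illustration, could declare it as follows:

```turtle
@prefix :     <http://example.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical declaration of the rating scheme used by :four_stars.
:OpenData5Star
    a skos:ConceptScheme ;
    skos:prefLabel "5 Stars linked open data rating system"@en
    .
```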

5.8 Express derivation between quality metrics, measurements and annotations

DQV models derivation with the property prov:wasDerivedFrom. For example, the accessibility of the dataset :myDataset can be derived from the accessibility of its distributions :myCSVDatasetDistribution and :mySPARQLDatasetDistribution.

:myDataset 

    a  dcat:Dataset ;
    dcterms:title "My dataset" ; 
    dcat:distribution :myCSVDatasetDistribution, :mySPARQLDatasetDistribution
    .


:myCSVDatasetDistribution

    a  dcat:Distribution ;
    dcat:downloadURL <http://www.example.org/files/mydataset.csv> ;
    dcterms:title "CSV distribution of dataset" ;
    dcat:mediaType "text/csv" ;
    dcat:byteSize "87120"^^xsd:decimal 
    .


:mySPARQLDatasetDistribution

    a  dcat:Distribution ;
    dcat:accessURL <http://www.example.org/sparql> ;
    dcterms:title "SPARQL access to the dataset" ;
    dcat:mediaType "application/sparql-results+json"
    .


#definition of dimensions and metrics
:availability

    a dqv:Dimension ;
    skos:prefLabel "Availability"@en ;
    skos:definition "Availability of a dataset is the extent to which data (or some portion of it) is present, obtainable and ready for use."@en ;
    dqv:inCategory :accessibility
    .


:downloadURLAvailabilityMetric 

    a dqv:Metric ;
    skos:definition "Checks if dcat:downloadURL is available and if its value is dereferenceable."@en ;
    dqv:expectedDataType xsd:boolean ;
    dqv:inDimension :availability
    .


:SPARQLAvailabilityMetric 

    a dqv:Metric ;
    skos:definition "Checks if a URL specified in dcat:accessURL is available and if at that URL a SPARQL endpoint is active."@en ;
    dqv:expectedDataType xsd:boolean ;
    dqv:inDimension :availability
    .


:datasetAvailabilityMetric

    a dqv:Metric ;
    prov:wasDerivedFrom :downloadURLAvailabilityMetric, :SPARQLAvailabilityMetric ;
    skos:definition "Checks the availability of the specified distributions."@en ;
    dqv:expectedDataType xsd:boolean ;
    dqv:inDimension :availability
    .

Depending on the specific application context, the expression of this derivation can be kept at the level of the quality measurements. In the following, the measurement :measurement3 of the availability of :myDataset is derived from :measurement1 and :measurement2.

:myCSVDatasetDistribution dqv:hasQualityMeasurement :measurement1 .


:mySPARQLDatasetDistribution dqv:hasQualityMeasurement :measurement2 .


:myDataset dqv:hasQualityMeasurement :measurement3 .


:measurement1 

    a dqv:QualityMeasurement ;
    dqv:computedOn :myCSVDatasetDistribution ;
    dqv:isMeasurementOf :downloadURLAvailabilityMetric ;
    dqv:value "true"^^xsd:boolean
    .

:measurement2

    a dqv:QualityMeasurement ;
    dqv:computedOn :mySPARQLDatasetDistribution ;
    dqv:isMeasurementOf :SPARQLAvailabilityMetric ;
    dqv:value "false"^^xsd:boolean
    .       


:measurement3 

    a dqv:QualityMeasurement ;
    dqv:computedOn :myDataset ;
    dqv:isMeasurementOf :datasetAvailabilityMetric ;
    prov:wasDerivedFrom :measurement1, :measurement2 ;
    dqv:value "false"^^xsd:boolean
    .

The classification of :myDataset as :three_stars can be derived from the result of a quality measurement, :measurement2:

:myDataset

    dqv:hasQualityAnnotation  :myDatasetClassification .

:myDatasetClassification  

    a dqv:UserQualityFeedback ;
    prov:wasDerivedFrom  :measurement2 ;
    oa:hasTarget :myDataset ;
    oa:hasBody :three_stars ; 
    oa:motivatedBy dqv:qualityAssessment, oa:classifying ;
    dqv:inDimension :availability
    .


:three_stars

   a skos:Concept;
   skos:inScheme :OpenData5Star ;
   skos:prefLabel "Three stars"@en ;
   skos:definition "Dataset available on the web with structured machine-readable non-proprietary format."@en
   .

5.9 Express quality of SKOS concept schemes

Let’s consider :myControlledVocabulary, a controlled vocabulary made available on the Web using SKOS [ SKOS-reference ] and DCAT [ vocab-dcat ].

:myControlledVocabulary 
    a dcat:Dataset ;
    dcterms:title "My controlled vocabulary" 
    .


:myControlledVocabularyDistribution

    a  dcat:Distribution ;
    dcat:downloadURL <http://www.example.org/files/myControlledVocabulary.csv> ;
    dcterms:title "SKOS/RDF distribution of my controlled vocabulary" ;
    dcat:mediaType "text/turtle" ;
    dcat:byteSize "190120"^^xsd:decimal 
.

qSKOS is an open source tool, which detects quality issues affecting SKOS vocabularies [ qSKOS ]. It considers 26 quality issues including, for example, “Incomplete Language Coverage” and “Label Conflicts” which are grouped in the category “Labeling and Documentation issues”. Quality issues addressed by qSKOS can be considered as DQV quality dimensions, whilst the number of concepts in which a quality issue occurs can be the metric deployed for each quality dimension.

# definition of instances for some of the  metrics, dimensions and categories deployed
# in qSKOS. 

:numOfConceptsWithLabelConflicts 

    a dqv:Metric;
    skos:prefLabel "Conflicting concepts"@en ;
    skos:definition "Number of concepts having conflicting labels"@en ;
    dqv:expectedDataType xsd:integer ;
    dqv:inDimension  :LabelConflicts 
    .


:numOfConceptsWithIncompleteLanguageCoverage

    a dqv:Metric;
    skos:prefLabel "Language incomplete concepts"@en ;
    skos:definition "Number of concepts having an incomplete language coverage"@en ;
    dqv:expectedDataType xsd:integer ;
    dqv:inDimension  :incompleteLanguageCoverage .

:LabelConflicts

    a  dqv:Dimension;
    skos:prefLabel "Label Conflicts"@en ;
    skos:definition "Dimension corresponding to the label conflicts quality issue"@en ;
    dqv:inCategory :labelingDocumentationIssues .

:incompleteLanguageCoverage

    a  dqv:Dimension;
    skos:prefLabel "Incomplete Language Coverage"@en ;
    skos:definition "Dimension corresponding to the incomplete language coverage issue"@en ;
    dqv:inCategory :labelingDocumentationIssues .

:labelingDocumentationIssues

    a  dqv:Category ;
    skos:prefLabel "Labeling and Documentation Issues"@en ;
    skos:definition "Category grouping labeling and documentation issues"@en 
    . 

DQV represents the qSKOS quality assessment on myControlledVocabulary for the dimensions “Incomplete Language Coverage” and “Label Conflicts”.

:myControlledVocabulary
    dqv:hasQualityMeasurement :measurement1, :measurement2
    .


:measurement1 

    a dqv:QualityMeasurement ;
    dqv:computedOn :myControlledVocabulary ;
    dqv:isMeasurementOf :numOfConceptsWithLabelConflicts ;
    dqv:value "1500"^^xsd:integer  
    .


:measurement2

    a dqv:QualityMeasurement ;
    dqv:computedOn :myControlledVocabulary ;
    dqv:isMeasurementOf :numOfConceptsWithIncompleteLanguageCoverage ;
    dqv:value "450"^^xsd:integer 
    .

5.10 Express the quality of a linkset

(VoID) linksets are collections of (RDF) links between two datasets. Linksets are as important as datasets when it comes to the joint exploitation of independently served datasets in linked data. The representation of quality for a linkset offers a further example of how DQV can be exploited.

Let’s define three DCAT datasets, including one VoID linkset, which connects the two others:

:myDataset1 
    a dcat:Dataset ;
    dcterms:title "My dataset 1" 

    .


:myDataset2 

    a dcat:Dataset ;
    dcterms:title "My dataset 2"  

    .


:myLinkset 

    a dcat:Dataset, void:Linkset ;
    dcterms:title "A Linkset between My dataset 1 and My dataset 2"; 

    void:linkPredicate skos:exactMatch ;
    void:target :myDataset1 ;
    void:target :myDataset2  
    .

We can represent information about the quality of :myLinkset using the “Multilingual importing” [ MultilingualImporting ] linkset quality metric. This metric works on linksets between datasets that include SKOS concepts [ SKOS-reference ]. It quantifies the information gain when adding the preferred labels or the alternative labels of the concepts from a linked dataset to the descriptions of the concepts from the other dataset with which they have been matched by a skos:exactMatch statement from the linkset. We must first define the proper metric, dimension and category.

# Definition of instances for Metric, Dimension and Category. 

:importingForPropertyPercentage 

    a dqv:Metric ;
    skos:definition "Ratio between novel preferred or alternative labels gained via skos:exactMatch links and preferred or alternative labels already in the dataset."@en ;
    dqv:expectedDataType xsd:double ;
    dqv:inDimension  :completenessGain .

:completenessGain

    a  dqv:Dimension ;
    skos:prefLabel "Completeness Gain"@en ;
    skos:definition "Degree to which a linkset contributes to obtaining all required information in a particular dataset."@en ;
    dqv:inCategory :complementationGain 
    .


:complementationGain

    a  dqv:Category ;
    skos:definition "Category that groups dimensions measuring the data quality gain obtained by exploiting linksets."@en
    .

The quality assessment of the "label importing" can depend on two extra parameters: onProperty and onLanguage, respectively the SKOS property and the language tag considered for measuring the completeness gains. We extend DQV to represent these parameters.

:onLanguage

    a qb:DimensionProperty, owl:DatatypeProperty ;
    rdfs:comment   "language on which label importing is assessed."@en ;
    rdfs:domain    dqv:QualityMeasurement ;
    rdfs:label     "label import assessment language"@en
    .


:onProperty

    a qb:DimensionProperty, rdf:Property ;
    rdfs:comment   "property on which label importing is assessed."@en ;
    rdfs:domain    dqv:QualityMeasurement ;
    rdfs:label     "label import assessment property"@en ;
    rdfs:range     rdf:Property
    .

Issue 10

We need to further evaluate the way we add extra parameters for the metric and extend the DAQ RDF-CUBE data structure (postponed issue).

Let us add actual quality assessments:

:qualityMeasurementDataset  a  dqv:QualityMeasurementDataset ;
        qb:structure  :dsd .

:importingForPropertyPercentage

    dqv:hasObservation :measurement_exactMatchAltLabelItDataset1,
    :measurement_exactMatchAltLabelItDataset2, 
    :measurement_exactMatchAltLabelEnDataset1, 
    :measurement_exactMatchAltLabelEnDataset2,
    :measurement_exactMatchPrefLabelItDataset1, 
    :measurement_exactMatchprefLabelItDataset2 .



#Adding quality observations 
## for Italian alternative labels
:measurement_exactMatchAltLabelItDataset1 

       a 		dqv:QualityMeasurement;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"1.0"^^xsd:double ;
       dqv:isMeasurementOf      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasurementDataset;
       :onLanguage    	"it" ;
       :onProperty    	skos:altLabel .


:measurement_exactMatchAltLabelItDataset2 

       a 		dqv:QualityMeasurement;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"1.0"^^xsd:double ;
       dqv:isMeasurementOf      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasurementDataset;
       :onLanguage    	"it" ;
       :onProperty    	skos:altLabel .


## for English alternative labels
:measurement_exactMatchAltLabelEnDataset1 

       a 		dqv:QualityMeasurement;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"0.1"^^xsd:double ;
       dqv:isMeasurementOf      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasurementDataset;
       :onLanguage    	"en" ;
       :onProperty    	skos:altLabel .


:measurement_exactMatchAltLabelEnDataset2  

       a 		dqv:QualityMeasurement;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"1.0"^^xsd:double ;
       dqv:isMeasurementOf      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasurementDataset;
       :onLanguage    	"en" ;
       :onProperty    	skos:altLabel .      


## for Italian preferred labels
:measurement_exactMatchPrefLabelItDataset1 

       a 		dqv:QualityMeasurement;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"0.5"^^xsd:double ;
       dqv:isMeasurementOf      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasurementDataset;
       :onLanguage    	"it" ;
       :onProperty    	skos:prefLabel .


:measurement_exactMatchprefLabelItDataset2  

       a 		dqv:QualityMeasurement;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"0.5"^^xsd:double ;
       dqv:isMeasurementOf      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasurementDataset;
       :onLanguage    	"it" ;
       :onProperty    	skos:prefLabel .



Let us specify the RDF Data Cube data structure, which includes the two extra parameters:
:dsd  a     qb:DataStructureDefinition ;
##Copying the structure of daq:dsq

    qb:component  [ qb:dimension  dqv:computedOn ;
                    qb:order      2
                  ] ;

    qb:component  [ qb:measure  dqv:value] ;
    qb:component  [ qb:dimension  
        <http://purl.org/linked-data/sdmx/2009/dimension#timePeriod> ;

                    qb:order      3
                  ] ;

    qb:component  [ qb:dimension  dqv:isMeasurementOf ;
                    qb:order      1
                  ] ;


    qb:component [ qb:measure dqv:value ] ;

    # Attribute (here: unit of measurement)
    qb:component [ qb:attribute sdmx-attribute:unitMeasure ;
                   qb:componentRequired false ;
                   qb:componentAttachment qb:DataSet
                 ] ;


    ##Extending  the structure of lds:dsq with two new dimensions

    qb:component  [ qb:dimension  :onProperty ;
                    qb:order      4
                  ] ;

    qb:component  [ qb:dimension  :onLanguage ;
                    qb:order      5
                  ] .

5.11 Express the conformance of a dataset's metadata with a standard

It is often desirable to indicate that metadata about datasets in a catalogue are compliant with a metadata standard, or an application profile of an existing metadata standard. A typical example is the GeoDCAT Application Profile [ GeoDCAT-AP ], an extension of the DCAT vocabulary [ vocab-dcat ] to represent metadata for geospatial data portals. GeoDCAT-AP makes it possible to express that a dataset's metadata conforms to an existing standard, following the recommendations of ISO 19115, ISO 19157 and the EU INSPIRE directive. DCAT partly supports the expression of such metadata conformance statements. The following example illustrates how a (DCAT) catalog record can be said to be conformant with the GeoDCAT-AP standard itself.

:myDataset a dcat:Dataset .

:myDatasetRecord a dcat:CatalogRecord ;

 foaf:primaryTopic :myDataset ;
 dcterms:conformsTo :geoDCAT-AP .

:geoDCAT-AP a dcterms:Standard;

  dcterms:title "GeoDCAT Application Profile. Version 1.0" ;
  rdfs:comment "GeoDCAT-AP is developed in the context of the Interoperability Solutions for European Public Administrations (ISA) Programme"@en ;
  dcterms:issued "2015-12-23"^^xsd:date ;
  foaf:page 
  <https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/geodcat-ap-v10>
  . 

Note that this example does not include the metadata about the dataset :myDataset itself. We assume this is present in an RDF data source accessible via the URI :myDatasetRecord. We also assume that :geoDCAT-AP is a reference URI that denotes the GeoDCAT-AP standard, which can be re-used across many catalog record descriptions, not just a locally introduced URI.

Note

Finer-grained representation of conformance statements can be found in the literature, and applications with more complex requirements may implement them, including for example the requirement of representing 'non-conformance' tested by specific procedures. The GeoDCAT Application Profile, for example, suggests a "provisional mapping" for extended profiles, which re-uses the PROV data model for provenance (see Annex II.14 at [ GeoDCAT-AP ]). Such patterns come however at the cost of having to publish and exchange representations that are much more elaborate. They will also have to be further aligned with the result of other ongoing efforts on data validation and the reporting thereof, as currently discussed around SHACL, for example. We have thus decided to postpone addressing these requirements for now.

5.12 Express the conformance of a dataset with a quality policy

DQV introduces the class dqv:QualityPolicy to express that a Dataset or Distribution follows a policy or agreement that is chiefly defined by data quality concerns. DQV does not provide a complete framework for expressing policies. The class dqv:QualityPolicy is rather meant as an anchor point, through which DQV implementers can relate properties and classes of policy-dedicated vocabularies, such as ODRL [ ODRL ], to the core elements that define quality of datasets and distributions.

The example below specifies that a data provider grants the permission to access a dataset and commits to serve the data with a certain quality, more concretely, 99% availability of a SPARQL endpoint (distribution) associated with the dataset. This is expressed in ODRL as an offer with a duty on the service provider that states a constraint defined using a DQV metric ( sparqlEndpointUptime ), for which measurements have to be at least a certain percentage (99). The odrl:assigner is the issuer of the policy statement; it is also the assignee of the duty to deliver the distribution as the policy requires it. There is no explicitly mentioned recipient for the policy itself, since this example is about a generic data access scenario. Note that instances of dqv:QualityPolicy could be instances of the class odrl:Agreement, in which case an odrl:assignee is likely to appear for the policy.

:serviceProvider a odrl:Party .
:myDataset a  dcat:Dataset ;
  dcat:distribution :myDatasetSparqlDistribution .
:myDatasetSparqlDistribution a dcat:Distribution .


:policy1 a odrl:Offer, dqv:QualityPolicy ;

  odrl:permission [
    a odrl:Permission ;
    odrl:target :myDataset ;
    odrl:action odrl:read ;
    odrl:assigner :serviceProvider;    
    odrl:duty [
      a odrl:Duty; 
      odrl:assignee :serviceProvider;
      odrl:target :myDatasetSparqlDistribution ;
      odrl:constraint [
        a odrl:Constraint ;
        prov:wasDerivedFrom :sparqlEndpointUptime;
        odrl:percentage  "99"^^xsd:double ;
        odrl:operator odrl:gteq
      ]     
    ]
  ]
  .
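
The metric :sparqlEndpointUptime referenced by the constraint is not defined in the example. A minimal sketch of such a definition, assuming an example.org namespace, the :availability dimension introduced in section 5.8 and a definition text invented for illustration, could read:

```turtle
@prefix :     <http://example.org/> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical definition of the metric used in the ODRL constraint.
:sparqlEndpointUptime
    a dqv:Metric ;
    # Illustrative wording, not taken from any published metric catalogue.
    skos:definition "Percentage of time during which the SPARQL endpoint of a distribution answers queries."@en ;
    dqv:expectedDataType xsd:double ;
    dqv:inDimension :availability
    .
```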
Note

The expression of constraints in ODRL seems poorly suited to expressing general constraints on values in RDF graphs, as we would require here. However, ODRL can be easily extended, and is scheduled to undergo refinement in the context of the W3C Permissions & Obligations Expression Working Group. In the future, implementers should investigate whether a general constraint expression language like the coming SHACL [ SHACL ] provides a more appropriate mechanism to be used on top of ODRL permissions and duties.

5.13 Express dataset precision and accuracy

The need for documenting data precision (also sometimes referred to as "resolution") is a common requirement, in particular when dealing with spatial data. The following example shows how DQV can meet this requirement.

:myDataset a dcat:Dataset ;

    dqv:hasQualityMeasurement :myDatasetPrecision, :myDatasetAccuracy .

:myDatasetPrecision a dqv:QualityMeasurement ;

   dqv:isMeasurementOf :spatialResolutionAsDistance ;
   dqv:value "1000"^^xsd:decimal ;
   sdmx-attribute:unitMeasure  <http://www.wurvoc.org/vocabularies/om-1.8/metre> 
   . 

   
:spatialResolutionAsDistance  a  dqv:Metric;

    skos:definition "Spatial resolution of a dataset expressed as distance"@en ;
    dqv:expectedDataType xsd:decimal ;
    dqv:inDimension dqv:precision
    .

Precision can alternatively be expressed without a unit of measure, by specifying spatial resolution by means of an "equivalent scale" with a fraction (e.g., 1:1,000, 1:1,000,000):

:myDataset a dcat:Dataset;

    dqv:hasQualityMeasurement :myDatasetPrecisionES .

:spatialResolutionAsEquivalentScale a dqv:Metric;

    skos:definition "Spatial resolution of a dataset expressed as equivalent scale, by using a representative fraction (e.g., 1:1,000, 1:1,000,000)."@en ;
    dqv:expectedDataType xsd:decimal ;
    dqv:inDimension dqv:precision 
    .

    
:myDatasetPrecisionES a dqv:QualityMeasurement ;

    dqv:isMeasurementOf :spatialResolutionAsEquivalentScale ;
    dqv:value "0.000001"^^xsd:decimal
.

or specifying the angular distance.

:myDataset a dcat:Dataset;

    dqv:hasQualityMeasurement :myDatasetPrecisionAS .

:spatialResolutionAsAngularDistance a dqv:Metric;

     skos:definition "Spatial resolution of a dataset expressed as angular distance"@en ;
     dqv:expectedDataType xsd:decimal ;
     dqv:inDimension dqv:precision 
     .

    
:myDatasetPrecisionAS a dqv:QualityMeasurement ;

     dqv:isMeasurementOf :spatialResolutionAsAngularDistance ;
     dqv:value "3.5"^^xsd:decimal ;
     sdmx-attribute:unitMeasure <http://www.wurvoc.org/vocabularies/om-1.8/degree>
     .

Note that the precision (or resolution) of a dataset is not equivalent to its accuracy. High precision values are not necessarily accurate. High precision values can even be pointless, as when one asserts that Magna Carta was signed at 1215-06-15T00:00:00. Accuracy is nonetheless an important dimension of data quality. Data accuracy metrics and measurements can be represented with DQV, as in the following example:

:myDatasetAccuracy a dqv:QualityMeasurement ;
   dqv:isMeasurementOf :spatialAccuracy ;
   dqv:value "98.2"^^xsd:decimal ;
   sdmx-attribute:unitMeasure <http://www.wurvoc.org/vocabularies/om-1.8/Percentage>
   .

   
:spatialAccuracy   a  dqv:Metric;

    skos:definition "Percentage of spatial elements that are found accurate according to methodology XYZ"@en ;
    dqv:expectedDataType xsd:decimal ;
    dqv:inDimension ldqd:semanticAccuracy 

.

6. Dimensions and metrics hints

This section is non-normative.

This section gathers relevant quality dimensions and ideas for corresponding metrics, which might eventually be represented as instances of dqv:Category, dqv:Dimension and dqv:Metric. The goal of the Data Quality Vocabulary is not to define a normative list of dimensions and metrics. There are already several reference classifications available, which are the result of a lot of community work. Unifying them here seems both hard and not desirable, as fundamental approaches to quality vary between domains or even applications. This section provides instead a set of examples, starting from use cases included in the Use Cases & Requirements document. In particular, we offer the quality dimensions proposed in ISO 25012 [ ISOIEC25012 ] and Zaveri et al. [ ZaveriEtAl ] as two starting points. Ultimately, implementers will need to choose for themselves the approach that best fits their needs. They can extend these starting points, creating their own refinements of categories and dimensions, and of course their own metrics. They can mix existing approaches: we show that the proposals from ISO and Zaveri et al. are not completely disjoint. Implementers can also adopt completely different classifications, if existing ones do not fit their specific application scenarios. They should however be aware that relying on existing classifications and metrics increases interoperability, i.e., the chance that human and machine agents can properly understand and exploit their quality assessments.

6.1 Statistics

The following table gives examples of statistics that can be computed on a dataset and interpreted as quality indicators by the data consumer. Some of them can be relevant for the dimensions listed in the rest of this section. The properties come from the VoID extension created for the Aether tool.

Observation Suggested term
Number of distinct external resources linked to http://ldf.fi/void-ext#distinctIRIReferenceObjects
Number of distinct external resources used (including schema terms) http://ldf.fi/void-ext#distinctIRIReferences
Number of distinct literals http://ldf.fi/void-ext#distinctLiterals
Number of languages used http://ldf.fi/void-ext#languages
Note

The Aether VoID extension represents statistics as direct statements that have a dataset as subject and an integer as object. This pattern, which can be expected to be rather common, is different from the pattern that DQV inherits from daQ. Guidance on how DQV/daQ can work with other quality statistics vocabularies will be provided.
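
The difference between the two patterns can be sketched as follows. This is an illustrative sketch only: the dataset :myDataset, the metric :distinctLiteralsMetric and the value 1042 are hypothetical, and the measurement terms ( dqv:QualityMeasurement , dqv:isMeasurementOf , dqv:computedOn , dqv:value ) are assumed here and should be checked against the vocabulary section of this document.

```turtle
@prefix :         <http://example.org/> .
@prefix dqv:      <http://www.w3.org/ns/dqv#> .
@prefix void-ext: <http://ldf.fi/void-ext#> .

# Aether pattern: the statistic is a direct statement about the dataset.
:myDataset void-ext:distinctLiterals 1042 .

# Measurement pattern (assumed term names): the same figure is carried by
# a separate resource that points at both the dataset and the metric.
:distinctLiteralsMetric a dqv:Metric .

:distinctLiteralsMeasurement
    a dqv:QualityMeasurement ;
    dqv:computedOn :myDataset ;
    dqv:isMeasurementOf :distinctLiteralsMetric ;
    dqv:value 1042 .
```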

6.2 Quality dimensions defined in ISO/IEC 25012

ISO/IEC 25012 provides an example of quality dimensions grouped in three categories that can be adopted to document the quality of datasets. These quality dimensions and categories are listed in the table below.

Category Dimension Definition
Inherent Data Quality Accuracy The degree to which data has attributes that correctly represent the true value of the intended attribute of a concept or event in a specific context of use.
Completeness The degree to which subject data associated with an entity has values for all expected attributes and related entity instances in a specific context of use.
Consistency The degree to which data has attributes that are free from contradiction and are coherent with other data in a specific context of use. It can be either or both among data regarding one entity and across similar data for comparable entities.
Credibility The degree to which data has attributes that are regarded as true and believable by users in a specific context of use. Credibility includes the concept of authenticity (the truthfulness of origins, attributions, commitments).
Currentness The degree to which data has attributes that are of the right age in a specific context of use.
Inherent and System-Dependent Data Quality Accessibility The degree to which data can be accessed in a specific context of use, particularly by people who need supporting technology or special configuration because of some disability.
Compliance The degree to which data has attributes that adhere to standards, conventions or regulations in force and similar rules relating to data quality in a specific context of use.
Confidentiality The degree to which data has attributes that ensure that it is only accessible and interpretable by authorized users in a specific context of use. Confidentiality is an aspect of information security (together with availability, integrity) as defined in ISO/IEC 13335-1:2004.
Efficiency The degree to which data has attributes that can be processed and provide the expected levels of performance by using the appropriate amounts and types of resources in a specific context of use.
Precision The degree to which data has attributes that are exact or that provide discrimination in a specific context of use.
Traceability The degree to which data has attributes that provide an audit trail of access to the data and of any changes made to the data in a specific context of use.
Understandability The degree to which data has attributes that enable it to be read and interpreted by users, and are expressed in appropriate languages, symbols and units in a specific context of use. Some information about data understandability is provided by metadata.
System-Dependent Data Quality Availability The degree to which data has attributes that enable it to be retrieved by authorized users and/or applications in a specific context of use.
Portability The degree to which data has attributes that enable it to be installed, replaced or moved from one system to another preserving the existing quality in a specific context of use.
Recoverability The degree to which data has attributes that enable it to maintain and preserve a specified level of operations and quality, even in the event of failure, in a specific context of use.

DQV can express the dimensions and categories listed in the table above. The following example is only an illustration of the ISO dimensions and categories, which should be authoritatively provided by ISO. Semantic relations defined in SKOS can be exploited to relate categories and dimensions. For example, in the following, skos:broader has been used to define iso:inherentSystemDependentDataQuality as a specialization of iso:inherentDataQuality and iso:systemDependentDataQuality .

# definition of ISO categories

iso:inherentDataQuality a dqv:Category ; 
   skos:prefLabel "Inherent Data Quality"@en. 

iso:systemDependentDataQuality a dqv:Category ; 
   skos:prefLabel "System-Dependent Data Quality"@en. 
   
iso:inherentSystemDependentDataQuality a dqv:Category ; 
   skos:prefLabel "Inherent and System-Dependent Data Quality"@en ;
   skos:broader iso:inherentDataQuality, iso:systemDependentDataQuality .


# definition of ISO dimensions


iso:accuracy a dqv:Dimension ; 
    dqv:inCategory iso:inherentDataQuality ;
    skos:prefLabel "Accuracy"@en;
    skos:definition "The degree to which data has attributes that correctly represent
    the true value of the intended attribute of a concept or event in a specific context 
    of use."@en
    .


iso:completeness a dqv:Dimension ; 
    dqv:inCategory iso:inherentDataQuality ;
    skos:prefLabel "Completeness"@en;
    skos:definition "The degree to which subject data associated with an entity has
    values for all expected attributes and related entity instances in a specific context 
    of use."@en
    .


iso:consistency a dqv:Dimension ; 
    dqv:inCategory iso:inherentDataQuality ;
    skos:prefLabel "Consistency"@en;
    skos:definition "The degree to which data has attributes that are free from
    contradiction and are coherent with other data in a specific context of use.
    It can be either or both among data regarding one entity and across similar 
    data for comparable entities."@en
    .


# ... etc ...


iso:accessibility a  dqv:Dimension ; 
    dqv:inCategory iso:inherentSystemDependentDataQuality ;
    skos:prefLabel "Accessibility"@en;
    skos:definition "The degree to which data can be accessed in a specific context of
    use, particularly by people who need supporting technology or special configuration
    because of some disability."@en
    .


# ... etc ...

6.3 Quality dimensions defined for linked data

Zaveri et al. provide a review of quality dimensions that is specifically suited to linked open data [ ZaveriEtAl ].

Category Dimension Definition
Accessibility dimensions Availability Availability of a dataset is the 5-star scale (although there were opinions that it extent to which data (or some portion of it) is dangerous present, obtainable and ready for use.
Licensing Licensing is defined as the granting of permission for a consumer to attach value re-use a dataset under defined conditions.
Interlinking Interlinking refers to the linking because degree to which entities that represent the data might be good but link same concept are linked to ‘bad’ data) Links each other, be it within or between two or more data sources.
Security Security is the extent to metadata standards used which data is protected against alteration and misuse.
Performance Performance refers to the efficiency of a system that binds to a large dataset, that is, the more performant a data model/schema source is the more efficiently a system can process data.
Intrinsic dimensions Syntactic validity Syntactic validity is defined as the degree to enable automatic processing 6.4 Accuracy which an RDF document conforms to the specification of the serialization format.
Semantic accuracy Semantic accuracy is defined as the degree to which data values correctly representing represent the real-world entity or event? 6.5 real world facts.
Consistency Is Consistency means that a knowledge base is free of (logical/formal) contradictions with respect to particular knowledge representation and inference mechanisms.
Conciseness Conciseness refers to the minimization of redundancy of entities at the schema and the data not containing contradictions? Can I use it readily level.
Completeness Completeness refers to the degree to which all required information is present in an analysis tool? Can I open a particular dataset.
Contextual dimensions Relevancy Relevancy refers to the dataset provision of information which is in R accordance with the task at hand and do some statistical manipulations? Can I open it in Tableau important to the users’ query.
Trustworthiness Trustworthiness is defined as the degree to which the information is accepted to be correct, true, real and make a visualization credible.
Understandability Understandability refers to the ease with which data can be comprehended without doing ambiguity and be used by a lot human information consumer.
Timeliness Timeliness measures how up-to-date data is relative to a specific task.
Representational dimensions Representational-conciseness Representational-conciseness refers to the representation of the data, which is compact and well formatted on the one hand and clear and complete on the other hand.
Interoperability Interoperability is the degree to which the format and structure of the information conforms to previously returned information as well as data from other sources.
Interpretability Interpretability refers to technical aspects of the data, that is, whether information is represented using an appropriate notation and whether the machine is able to process the data.
Versatility Versatility refers to the availability of the data in different representations and in an internationalized way.

DQV can express these dimensions and categories as shown in the following example. The encoding of all the dimensions and categories mentioned above can be found at http://www.w3.org/2016/05/ldqd .

# definition of categories from Zaveri et al

ldqd:accessibilityDimensions a dqv:Category ; 
   skos:prefLabel "Accessibility dimensions"@en. 

ldqd:intrinsicDimensions a dqv:Category ; 
   skos:prefLabel "Intrinsic dimensions"@en. 

ldqd:contextualDimensions a dqv:Category ; 
   skos:prefLabel "Contextual dimensions"@en. 

ldqd:representationalDimensions a dqv:Category ; 
   skos:prefLabel "Representational dimensions"@en. 

#definition of dimensions from Zaveri et al


ldqd:availability
    a dqv:Dimension ; 
    dqv:inCategory ldqd:accessibilityDimensions ;
    skos:prefLabel "Availability"@en;
    skos:definition "Availability of a dataset is the extent to which data (or some
    portion of it) is present, obtainable and ready for use."@en
    .


ldqd:licensing
    a dqv:Dimension ; 
    dqv:inCategory ldqd:accessibilityDimensions  ;
    skos:prefLabel "Licensing"@en;
    skos:definition "Licensing is defined as the granting of permission for a consumer to
    re-use a dataset under defined conditions."@en
    .


ldqd:interlinking
    a dqv:Dimension ; 
    dqv:inCategory ldqd:accessibilityDimensions ;
    skos:prefLabel "Interlinking"@en;
    skos:definition "Interlinking refers to the degree to which entities that represent
    the same concept are linked to each other, be it within or between two or more data
    sources."@en
    .


# ... etc ...

6.3.1 Expressing relations between quality dimensions

In Zaveri et al. [ ZaveriEtAl ], dimensions are not completely independent and may be related. These relationships can be represented in DQV by using the appropriate SKOS properties, or by specializing the SKOS properties if more specific semantics must be expressed. For example, availability is related to performance and interlinking, whilst semanticAccuracy is related to timeliness, trustworthiness, consistency, syntacticValidity and completeness.

ldqd:availability skos:related ldqd:performance ,  
    ldqd:interlinking .

ldqd:semanticAccuracy skos:related ldqd:timeliness , 
    ldqd:trustworthiness , ldqd:consistency , 
    ldqd:syntacticValidity , ldqd:completeness , 
    ldqd:interlinking . 

ldqd:consistency skos:related ldqd:conciseness , 
    ldqd:syntacticValidity , ldqd:interoperability .

ldqd:interoperability skos:related ldqd:conciseness , 
    ldqd:syntacticValidity .

ldqd:conciseness skos:related ldqd:completeness ,
     ldqd:representationalConciseness .

ldqd:interpretability skos:related ldqd:versatility .

# Note: skos:related is a symmetric property, hence from every statement
# ex:subject skos:related ex:object in this example, one can infer that 
# the statement ex:object skos:related ex:subject is true.
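
Where skos:related is too generic, a more specific relation can be declared as a specialization of it. The following is a minimal sketch under stated assumptions: the property ex:affects , its label, and the statement that uses it are hypothetical and are not taken from Zaveri et al.; the ldqd: namespace IRI is assumed from the address given in section 6.3.

```turtle
@prefix ex:   <http://example.org/> .
@prefix ldqd: <http://www.w3.org/2016/05/ldqd#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical directed relation between dimensions, declared as a
# sub-property of skos:related.
ex:affects rdfs:subPropertyOf skos:related ;
    rdfs:label "affects"@en .

# Illustrative statement only; the direction is an assumption.
ldqd:syntacticValidity ex:affects ldqd:consistency .
```

Because ex:affects is a sub-property of skos:related , an RDFS-aware consumer can still infer the generic skos:related link, while the specialized property keeps the direction that the symmetric skos:related does not carry.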

Dimensions can also be related across different categorizations. For example, in the following, we present two possible links between dimensions from ISO/IEC 25012 [ ISOIEC25012 ] and Zaveri et al. Here we assume that completeness is equivalent across both classifications and that ISO's credibility is one specific facet of trustworthiness in Zaveri et al. (see Definition 12 in [ ZaveriEtAl ]). We sketch more such possible relationships in Annex C.

ldqd:completeness skos:exactMatch iso:completeness .
ldqd:trustworthiness skos:narrowMatch iso:credibility .

6.4 Examples of metrics

This section presents examples of metrics inspired by those reviewed in Zaveri et al. [ ZaveriEtAl ], in order to further illustrate how dqv:Metric can be instantiated. Note that they are not all specific to linked data quality, as some dimensions in Zaveri et al. match the dimensions of ISO/IEC 25012 (see the previous sub-section and Annex C).

Note

These examples are just some of the possible ones. They show metrics for different dimensions and kinds of dataset distributions. We might consider reorganizing examples around specific criteria (e.g., include at least a metric for each dimension, or focus on metrics for a specific kind of distribution, e.g., RDF, JSON, CSV). We might also consider adding further examples about derived metrics, multivalued metrics and extra parameters, once we have solved the remaining issues.

	
:downloadURLAvailabilityMetric
    a dqv:Metric ;
    skos:definition "It checks if dcat:downloadURL is available and if its value is
    dereferenceable."@en ;
    dqv:inDimension ldqd:availability ;
    dqv:expectedDataType xsd:boolean
    .


:sparqlAvailabilityMetric

    a dqv:Metric ;
    skos:definition "It checks if a void:sparqlEndpoint is specified for a dataset and 
    if the server responds to a SPARQL query."@en ;
    dqv:inDimension ldqd:availability ;
    dqv:expectedDataType xsd:boolean
    .


:misreportedContentTypeMetric

    a dqv:Metric ;
    skos:definition "It detects whether the HTTP response contains the header field 
    stating the appropriate content type of the returned file, e.g. application/rdf+xml"@en ;
    dqv:inDimension ldqd:availability ;
    dqv:expectedDataType xsd:boolean
    .


:licensingMetric

    a dqv:Metric ;
    skos:definition "It detects the indication of a license in the DCAT/VoID 
    description or in the dataset itself."@en ;
    dqv:inDimension ldqd:licensing ;
    dqv:expectedDataType xsd:boolean
    .


:highThroughput

    a dqv:Metric ;
    skos:definition "It represents the maximum number of answered HTTP-requests per
    second."@en ;
    dqv:inDimension ldqd:performance ;
    dqv:expectedDataType xsd:integer  
    .


:sparqlScalability

    a dqv:Metric ;
    skos:definition "It detects whether the time to answer an amount of ten requests 
    divided by ten is not longer than the time it takes to answer one request."@en ;
    dqv:inDimension ldqd:performance ;
    dqv:expectedDataType xsd:boolean 
    .


:noRDFSyntaxError 

    a dqv:Metric ;
    skos:definition "It returns the number of syntax errors detected by an RDF
    validator."@en ;
    dqv:inDimension ldqd:syntacticValidity; 
    dqv:expectedDataType xsd:integer 
    .


:noJSONSyntaxError 

    a dqv:Metric ;
    skos:definition "It returns the number of syntax errors detected by a JSON
    validator."@en ;
    dqv:inDimension ldqd:syntacticValidity; 
    dqv:expectedDataType xsd:integer 
    .


:populationCompletenessMetric

    a dqv:Metric ;
    skos:definition "Ratio between the number of objects represented in the dataset and
     the number of objects expected to be represented according to the declared dataset
     scope."@en ;
    dqv:inDimension ldqd:completeness ;
    dqv:expectedDataType xsd:double
    .
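
Once a metric is defined, a measurement obtained with it can be attached to a dataset or distribution. The following is a sketch under stated assumptions: :myDistribution and the value 0.85 are made up for illustration, and the measurement terms ( dqv:QualityMeasurement , dqv:computedOn , dqv:isMeasurementOf , dqv:value ) are assumed here and should be checked against the vocabulary section of this document.

```turtle
@prefix :     <http://example.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical distribution being assessed.
:myDistribution a dcat:Distribution .

# One measurement of :populationCompletenessMetric; the datatype of the
# value follows the metric's dqv:expectedDataType (xsd:double).
:completenessMeasurement
    a dqv:QualityMeasurement ;
    dqv:computedOn :myDistribution ;
    dqv:isMeasurementOf :populationCompletenessMetric ;
    dqv:value "0.85"^^xsd:double .
```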

7. Requirements

This section is non-normative.

The UCR document lists relevant requirements for data quality and granularity:

The aforementioned requirements have been further elaborated and extended by new use cases and examples, following discussions on the DWBP WG's mailing list, wiki pages (see here and here ), as well as external contributions during the review process (see the general list of issues that includes external feedback).

Issue 2

The W3C Health Care and Life Sciences Community Group has created a DCAT profile for describing datasets . This work is highly visible and widely used in the HCLS community. DQV should be aligned with this profile if there are overlapping areas. Are there such areas? ( Issue-221 )

A. Acknowledgements

The editors acknowledge the chairs of this Working Group: Hadley Beeman, Yaso Córdova, Deirdre Lee and the staff contact Phil Archer.

The editors also gratefully acknowledge the contributions made to this document by all members of the working group, especially the contributions received from Ghislain Auguste Atemezing, Carlos Laufer, Annette Greiner, Michel Dumontier, Eric Stephan.

The editors would also like to thank the following non-members of this working group for their comments: Andrea Perego, Rachel E. Heaven, Linda van den Brink, Werner Bailer, Jon Blower, Guillaume Duffes, Davide Ceolin, Anisa Rula.

B. Change history

Changes since the previous version include:

C. Correspondences between quality dimensions in ISO/IEC 25012 and Zaveri et al.

The dimensions listed in ISO/IEC 25012 [ ISOIEC25012 ] and Zaveri et al. [ ZaveriEtAl ] are not disjoint. Assuming that dimensions are expressed as instances of skos:Concept , the following table includes some of the correspondences that can be considered between these two classifications.

Dimension from Zaveri et al. Dimension from ISO/IEC 25012 Suggested mapping relation
Availability Availability skos:exactMatch
Completeness Completeness skos:exactMatch
Consistency Consistency skos:exactMatch
Timeliness Currentness skos:exactMatch
Interoperability Portability skos:relatedMatch
Interoperability Compliance skos:relatedMatch
Semantic Accuracy Accuracy skos:broadMatch
Trustworthiness Credibility skos:narrowMatch
Trustworthiness Traceability skos:relatedMatch
Understandability Understandability skos:exactMatch
Interpretability Understandability skos:relatedMatch
Versatility Understandability skos:broadMatch
Syntactic Validity Accuracy skos:broadMatch
Syntactic Validity Compliance skos:broadMatch
Licensing Accessibility skos:relatedMatch
Security Traceability skos:relatedMatch
Security Confidentiality skos:relatedMatch
Performance Efficiency skos:exactMatch
Interlinking Availability skos:broadMatch
Representational-conciseness Compliance skos:broadMatch
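
A few rows of the table above could be written in Turtle as follows. This is only a sketch: the iso: namespace is a placeholder (ISO identifiers should be authoritatively provided by ISO), the local names iso:currentness , iso:portability and iso:compliance are assumed by analogy with iso:accuracy , and the ldqd: namespace IRI is assumed from the address given in section 6.3.

```turtle
@prefix iso:  <http://example.org/iso25012#> .
@prefix ldqd: <http://www.w3.org/2016/05/ldqd#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Timeliness / Currentness
ldqd:timeliness skos:exactMatch iso:currentness .

# Interoperability / Portability and Compliance
ldqd:interoperability skos:relatedMatch iso:portability , iso:compliance .

# Semantic Accuracy / Accuracy: the ISO dimension is the broader one
ldqd:semanticAccuracy skos:broadMatch iso:accuracy .
```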

D. References

D.1 Informative references

[DaQ]
Jeremy Debattista; Christoph Lange; Sören Auer. daQ, an Ontology for Dataset Quality Information . 2014. LDOW 2014. URL: http://ceur-ws.org/Vol-1184/ldow2014_paper_09.pdf
[DaQ-RDFCUBE]
Jeremy Debattista; Christoph Lange; Sören Auer. Representing dataset quality metadata using multi-dimensional views . 2014. SEMANTICS 2014. URL: http://arxiv.org/abs/1408.2468
[GeoDCAT-AP]
ISA Programme. GeoDCAT-AP: A geospatial extension for the DCAT application profile for data portals in Europe . 23 December 2015. Version 1.0. URL: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/geodcat-ap-v10
[ISOIEC25012]
ISO/IEC 25012 - Data Quality model . URL: http://iso25000.com/index.php/en/iso-25000-standards/iso-25012
[MultilingualImporting]
Riccardo Albertoni; Monica De Martino; Paola Podestà. A Linkset Quality Metric Measuring Multilingual Gain in SKOS Thesauri . 2015. LDQ@ESWC 2015. URL: http://ceur-ws.org/Vol-1376/LDQ2015_paper_01.pdf
[ODRL]
Renato Iannella; Susanne Guth; Daniel Paehler; Andreas Kasten. ODRL Version 2.1 Core Model . 5 March 2015. W3C Community Group Specification. URL: https://www.w3.org/community/odrl/model/2.1/
[RijgersbergEtAl]
Hajo Rijgersberg; Mark van Assem; Jan L. Top. Ontology of units of measure and related concepts . Semantic Web, vol. 4, no. 1, pp. 3-13, 2013. URL: https://dx.doi.org/10.3233/SW-2012-0069
[SHACL]
Holger Knublauch; Arthur Ryman. Shapes Constraint Language (SHACL) . 28 January 2016. W3C Working Draft. URL: https://www.w3.org/TR/shacl/
[SKOS-reference]
Alistair Miles; Sean Bechhofer. W3C. SKOS Simple Knowledge Organization System Reference . 18 August 2009. W3C Recommendation. URL: http://www.w3.org/TR/skos-reference
[Vocab-Data-Cube]
Richard Cyganiak; Dave Reynolds. The RDF Data Cube Vocabulary . W3C Recommendation 16 January 2014. URL: https://www.w3.org/TR/vocab-data-cube/
[ZaveriEtAl]
Amrapali Zaveri; Anisa Rula; Andrea Maurino; Ricardo Pietrobon; Jens Lehmann; Sören Auer. Quality assessment for Linked Data: A Survey . Semantic Web, vol. 7, no. 1, pp. 63-93, 2015. URL: https://dx.doi.org/10.3233/SW-150175
[qSKOS]
Christian Mader; Bernhard Haslhofer; Antoine Isaac. Finding Quality Issues in SKOS Vocabularies . 2012. Theory and Practice of Digital Libraries, Lecture Notes in Computer Science, Vol. 7489, pp. 222-233. URL: https://dx.doi.org/10.1007/978-3-642-33290-6_25
[vocab-dcat]
Fadi Maali; John Erickson. W3C. Data Catalog Vocabulary (DCAT) . 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/vocab-dcat/