Abstract

Datasets published on the Web are accessed and experienced by consumers in a variety of ways, but little information about these experiences is typically conveyed. Dataset publishers often lack feedback from consumers about how datasets are used. Consumers lack an effective way to discuss experiences with fellow collaborators and to explore material that cites the dataset. Datasets, as defined by DCAT, are collections of data, published or curated by a single agent, and available for access or download in one or more formats. The Dataset Usage Vocabulary (DUV) is used to describe consumer experiences, citations, and feedback about a dataset from the human perspective.

By specifying a number of foundational concepts used to collect dataset consumer feedback and experiences and to cite references associated with a dataset, the DUV allows APIs to be written that support collaboration across the Web by publishing consumer opinions and experiences in a structured form, and that provide a means for data consumers and producers to advertise and search for published open dataset usage.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the third iteration of the vocabulary, developed following extensive consultation within and outside the working group, which now regards it as nearing completion. Comments and feedback are sought before the next iteration, which is likely to be the final version for the foreseeable future.

This document was published by the Data on the Web Best Practices Working Group as a Working Draft. If you wish to make comments regarding this document, please send them to public-dwbp-comments@w3.org (subscribe, archives). All comments are welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

1. Introduction

This section is non-normative.

This vocabulary is meant to fill a niche by helping to standardize the way usage of datasets published on the Web is conveyed and shared. At this time there is no clear standard way to describe dataset usage on the Web. Without a means to systematically describe dataset usage, techniques for searching and conveying usage are application specific, and discovery and collaboration across the Web are more difficult. This vocabulary also recommends and requires data publishers to provide a mechanism for receiving data usage information from data consumers in the form of feedback, citation and data correction.

2. Namespaces

The namespace for DUV is http://www.w3.org/ns/duv#. However, it should be noted that DUV makes extensive use of terms from other vocabularies. DUV itself defines a minimal set of classes and properties of its own. A full set of namespaces and prefixes used in this document is shown in the table below.

Prefix Namespace
biro http://purl.org/spar/biro/
cito http://purl.org/spar/cito/
cnt http://www.w3.org/2011/content#
dcat http://www.w3.org/ns/dcat#
dct http://purl.org/dc/terms/
dctype http://purl.org/dc/dcmitype/
disco http://rdf-vocabulary.ddialliance.org/discovery#
dqv http://www.w3.org/ns/dqv#
duv http://www.w3.org/ns/duv#
fabio http://purl.org/spar/fabio/
foaf http://xmlns.com/foaf/0.1/
frbr http://purl.org/vocab/frbr/core#
oa http://www.w3.org/ns/oa#
pav http://purl.org/pav/
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
skos http://www.w3.org/2004/02/skos/core#
vann http://purl.org/vocab/vann/
xsd http://www.w3.org/2001/XMLSchema#
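
For readers writing Turtle, the table above can be transcribed as prefix declarations. This is a convenience sketch only. The PAV namespace is written here with a trailing slash so that terms such as pav:version expand to usable IRIs; that spelling is an assumption of this sketch.

```turtle
# Prefix declarations transcribed from the namespace table.
@prefix biro:   <http://purl.org/spar/biro/> .
@prefix cito:   <http://purl.org/spar/cito/> .
@prefix cnt:    <http://www.w3.org/2011/content#> .
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix dctype: <http://purl.org/dc/dcmitype/> .
@prefix disco:  <http://rdf-vocabulary.ddialliance.org/discovery#> .
@prefix dqv:    <http://www.w3.org/ns/dqv#> .
@prefix duv:    <http://www.w3.org/ns/duv#> .
@prefix fabio:  <http://purl.org/spar/fabio/> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .
@prefix frbr:   <http://purl.org/vocab/frbr/core#> .
@prefix oa:     <http://www.w3.org/ns/oa#> .
@prefix pav:    <http://purl.org/pav/> .   # trailing slash assumed
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:   <http://www.w3.org/2004/02/skos/core#> .
@prefix vann:   <http://purl.org/vocab/vann/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
```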

3. Audience

The DUV is intended for data producers and publishers interested in tracking, sharing, and persisting consumer dataset usage. It is also intended for collaborators who require an exchange medium to advertise and interactively convey dataset usage.

4. Scope

The scope of the DUV is defined by the data usage requirements for datasets in the Data on the Web Best Practices (DWBP) Use Case document [DWBP-UCR]. These requirements include citing data on the Web, tracking the usage of data, sharing feedback, and rating data. They were derived from fourteen real-world use case examples provided in the use case document.

5. Relationship to other Vocabularies

The DUV is a “glue” vocabulary reusing and extending existing vocabulary classes and properties to support citation, feedback, and usage. This section provides our rationale and approach for vocabulary selection and reuse.

Core to the dataset usage vocabulary is the “dataset”. The DUV uses the Data Catalog Vocabulary's dcat:Dataset class and all properties associated with the class [VOCAB-DCAT]. From a data usage perspective the DUV can be considered an extension of dcat:Dataset.

The Web Annotation Vocabulary [OA] is used to describe duv:UserFeedback as a subclass inheriting the behavior of oa:Annotation. A crucial part of the Web Annotation Model is the set of “motivations” that describe the role of a particular annotation. Each duv:UserFeedback must have at least one oa:motivatedBy property with a relationship to an instance of oa:Motivation. A subset of the Motivation instances is important for describing feedback to data publishers and blogs between dataset consumers. In addition to supporting duv:UserFeedback, the Web Annotation vocabulary provides a generic way of annotating any Web resource, so it is recommended that it be used to annotate the dcat:Dataset for uses beyond the scope of the DUV.
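
As a minimal sketch of the motivation requirement described above, the following Turtle shows one piece of feedback that carries an oa:motivatedBy property. The ex: namespace and every resource name under it are hypothetical, and oa:questioning is chosen only as one plausible motivation instance for a consumer question.

```turtle
@prefix cnt:  <http://www.w3.org/2011/content#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix duv:  <http://www.w3.org/ns/duv#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:timetable-001 a dcat:Dataset .

# Feedback is an annotation, so it carries a motivation, a target and a body.
ex:question1 a duv:UserFeedback ;
    oa:motivatedBy oa:questioning ;
    oa:hasTarget ex:timetable-001 ;
    oa:hasBody ex:question1Content .

ex:question1Content a cnt:ContentAsText ;
    cnt:chars "Does the timetable cover public holidays?" .
```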

The Semantic Publishing and Referencing [SPAR] Ontologies provide a suite of vocabularies used to relate entities to reference citations and bibliographic records, and to describe the publication process along with other related activities. The DUV relies directly upon the Citation Typing Ontology [CITO], the FRBR-aligned Bibliographic Ontology [FaBIO], and Dublin Core [DC-TERMS] vocabularies to describe citations and references between datasets and cited sources. In addition to these ontologies, the research community has provided basic criteria for citing data on the Web [MSUDataCite] [EmoryUCite]. These resources helped scope the DUV citation model to the minimal requirements for electronic data publication. Finally, the data citation principles currently being adopted [FORCE11-Citation] are also being considered to ensure the DUV is consistent with guidelines developed by other data citation communities.

6. Vocabulary Overview

This section is non-normative.

This section depicts the vocabulary as a conceptual model. Shaded boxes are used to identify each class. Labeled open arrows identify example properties between the classes. Unlabeled shaded arrows are used to show inheritance with the parent class identified by the arrow head.

The class diagram for DUV

6.1 Citation Model

The citation model was motivated by the UCR requirement R-Citable: "It should be possible to cite data on the Web." The citation model is largely based on classes, properties, and recommended approaches taken from the SPAR Ontologies. The remainder of the model is composed of terms from the Open Annotation vocabulary, Dublin Core, and FOAF, and of newly introduced DUV properties.

The citation model seeks to meet the needs of:

  1. Data publishers who need to provide basic bibliographic reference criteria for dcat:Dataset and dcat:Distributions (See Example 1). To do this, DUV extensions are added to the DCAT model. Classes include: dcat:Dataset, dcat:Distribution, and foaf:Agent. Properties include: dct:title, pav:version, dct:issued, dct:created, dct:identifier, dct:creator, dct:publisher, disco:fundedBy.
  2. Data publishers wanting to annotate dcat:Datasets/dcat:Distributions with bibliographic references that provide additional insights to data consumers (See Example 2). Classes include: dcat:Dataset, dcat:Distribution, biro:BibliographicReference, fabio:Expression. Properties include: frbr:part, frbr:partOf, biro:references.
  3. Data consumers who want to understand the rationale for why a dcat:Dataset or dcat:Distribution is being cited (See Example 3). Classes include: fabio:Expression, cito:CitationAct, oa:Annotation, oa:Motivation, cnt:ContentAsText. Properties include: cito:hasCitingEntity, cito:hasCitedEntity, cnt:chars, oa:hasTarget, oa:hasBody.

Note that while the DUV reuses a limited set of [SPAR] classes and properties to support basic [VOCAB-DCAT] Dataset and Distribution citation requirements, the [SPAR] ontologies offer a richer set of classes and properties for more advanced representations. For example, the [FaBIO] fabio:Work class has many subclasses (e.g. fabio:Policy) that help characterize referenced material.
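
The third need in the list above, explaining why a dataset is cited, can be sketched by annotating the citation act itself. This is an illustration under assumptions rather than a prescribed pattern: the ex: namespace and all resource names under it are hypothetical, and oa:describing is used only as one plausible motivation.

```turtle
@prefix cito:  <http://purl.org/spar/cito/> .
@prefix cnt:   <http://www.w3.org/2011/content#> .
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix fabio: <http://purl.org/spar/fabio/> .
@prefix oa:    <http://www.w3.org/ns/oa#> .
@prefix ex:    <http://example.org/> .   # hypothetical namespace

ex:timetable-001 a dcat:Dataset .
ex:schedule-report a fabio:Expression .   # hypothetical citing document

# The act of citing: the report cites the dataset.
ex:report-cites-dataset a cito:CitationAct ;
    cito:hasCitingEntity ex:schedule-report ;
    cito:hasCitedEntity ex:timetable-001 .

# The rationale is an annotation whose target is the citation act.
ex:citation-rationale a oa:Annotation ;
    oa:motivatedBy oa:describing ;
    oa:hasTarget ex:report-cites-dataset ;
    oa:hasBody ex:citation-rationale-text .

ex:citation-rationale-text a cnt:ContentAsText ;
    cnt:chars "The report uses the timetable as evidence for the proposed schedule change." .
```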

6.2 Usage Model

The usage model was motivated by the UCR requirement R-TrackDataUsage: "It should be possible to track the usage of data." Sharing and tracking data usage can enhance the utility of datasets by describing how they can be used by a consumer (See Example 5). Usage can be considered as enabling descriptive information provided by the data publisher to help the consumer community make use of datasets and distributions. Based on the use cases, data usage can provide guidance about how to use the dataset and about tools (See Example 6) that can be used with the dataset. The use cases also stipulate the need to track usage metrics.

The following classes constitute the Usage Model: dcat:Dataset, dcat:Distribution, duv:Usage, duv:UsageTool, foaf:Agent. Properties include: duv:hasUsage, duv:hasUsageType, duv:performedBy, duv:performs, duv:refersTo, dct:uri, dct:title, dct:created, pav:version, dct:issued, and dct:description.
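
A minimal sketch of how the usage classes and properties listed above might fit together follows. The ex: namespace, the resource names, and the literal values are all hypothetical, and the choice to state both duv:performedBy and duv:performs is for illustration only.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix duv:  <http://www.w3.org/ns/duv#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:timetable-001 a dcat:Dataset ;
    dcat:distribution ex:timetable-001-csv .

# The distribution points at its usage guidance.
ex:timetable-001-csv a dcat:Distribution ;
    duv:hasUsage ex:import-guide .

# The usage description points back at the dataset and at its author.
ex:import-guide a duv:Usage ;
    dct:title "Loading the timetable into a spreadsheet" ;
    dct:description "Open the CSV file with UTF-8 encoding and a comma delimiter." ;
    dct:issued "2015-06-01"^^xsd:date ;
    duv:refersTo ex:timetable-001 ;
    duv:performedBy ex:transport-agency .

ex:transport-agency a foaf:Agent ;
    duv:performs ex:import-guide .
```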

6.3 Feedback Model

The feedback model was motivated by the UCR requirement R-UsageFeedback: "Data consumers should have a way of sharing feedback and rating data" (See Example 4). User feedback is important for addressing data quality concerns about published datasets. Different users may have different experiences with the same dataset, so it is important to capture the context in which the data was used and the profile of the user who used it. R-UsageFeedback should also provide a way for consumers to communicate suggested corrections or advice back to the data publisher.

The following classes constitute the Feedback Model: dcat:Dataset, dcat:Distribution, oa:Annotation, oa:Motivation, duv:UserFeedback, dqv:UserQualityFeedback, duv:RatingFeedback, cnt:ContentAsText. Properties include: duv:hasRating, oa:hasTarget, oa:hasBody, duv:hasFeedback, oa:motivatedBy, cnt:chars.
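
The rating part of the feedback model can be sketched as follows. The two-value rating scale, the ex: namespace, and all resource names are hypothetical; a publisher would define its own discrete scale as SKOS concepts.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix duv:  <http://www.w3.org/ns/duv#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

# A hypothetical discrete rating scale.
ex:rating-scale a skos:ConceptScheme .

ex:useful a skos:Concept ;
    skos:prefLabel "useful"@en ;
    skos:inScheme ex:rating-scale .

ex:not-useful a skos:Concept ;
    skos:prefLabel "not useful"@en ;
    skos:inScheme ex:rating-scale .

# The dataset links to the rating, and the rating targets the dataset.
ex:timetable-001 a dcat:Dataset ;
    duv:hasFeedback ex:rating1 .

ex:rating1 a duv:RatingFeedback ;
    oa:motivatedBy oa:commenting ;
    oa:hasTarget ex:timetable-001 ;
    duv:hasRating ex:useful .
```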

Note
This section will be non-normative and will contain links back to the vocabularies we mention.

7. Vocabulary Specification

7.1 Class: RatingFeedback

RDF Class: duv:RatingFeedback
Definition Predefined criteria used to express a user opinion about a dataset or distribution using a discrete range of values.
rdfs:isDefinedBy http://www.w3.org/ns/duv
Label rating feedback
rdfs:subClassOf duv:UserFeedback

7.2 Class: Usage

RDF Class: duv:Usage
Definition A helpful description of actions that can be performed on a given dataset or distribution.
rdfs:isDefinedBy http://www.w3.org/ns/duv
Label usage

7.3 Class: UsageTool

RDF Class: duv:UsageTool
Definition A synopsis describing the way a tool can use a dataset or distribution.
rdfs:isDefinedBy http://www.w3.org/ns/duv
Label usage tool

7.4 Class: UserFeedback

RDF Class: duv:UserFeedback
Definition User feedback on the dataset. Expresses whether the dataset was useful or not, for example.
Label user feedback
rdfs:subClassOf oa:Annotation

7.5 Properties

Property: chars

RDF Property: cnt:chars
Definition Text content of an annotation body.
vann:usageNote cnt:ContentAsText (subject) cnt:chars (predicate) rdfs:Literal (object)
Label chars

Property: created

RDF Property: dct:created
Definition The creation date associated with the dataset or distribution
vann:usageNote dcat:Dataset (subject) dct:created (predicate) rdfs:Literal (object)
dcat:Distribution (subject) dct:created (predicate) rdfs:Literal (object)
Label created

Property: creator

RDF Property: dct:creator
Definition Author of the cited dataset/distribution
vann:usageNote dcat:Dataset (subject) dct:creator (predicate) foaf:Agent (object)
dcat:Distribution (subject) dct:creator (predicate) foaf:Agent (object)
Label creator

Property: description

RDF Property: dct:description
Definition free-text usage account of the resource.
vann:usageNote duv:Usage (subject) dct:description (predicate) rdfs:Literal (object)
duv:UsageTool (subject) dct:description (predicate) rdfs:Literal (object)
Label description

Property: fundedBy

RDF Property: disco:fundedBy
Definition Funding agent
vann:usageNote dcat:Dataset (subject) disco:fundedBy (predicate) foaf:Agent (object)
dcat:Distribution (subject) disco:fundedBy (predicate) foaf:Agent (object)
Label funded by

Property: hasBody

RDF Property: oa:hasBody
Definition Body of the comment associated with either feedback or associated with citation.
vann:usageNote duv:UserFeedback (subject) oa:hasBody (predicate) cnt:ContentAsText (object)
oa:Annotation (subject) oa:hasBody (predicate) cnt:ContentAsText (object)
Label has body

Property: hasCitedEntity

RDF Property: cito:hasCitedEntity
Definition The dataset, distribution, or citation of interest being cited.
vann:usageNote cito:CitationAct (subject) cito:hasCitedEntity (predicate) dcat:Dataset (object)
cito:CitationAct (subject) cito:hasCitedEntity (predicate) dcat:Distribution (object)
Label has cited entity

Property: hasCitingEntity

RDF Property: cito:hasCitingEntity
Definition Citation that references a dataset or a distribution.
vann:usageNote cito:CitationAct (subject) cito:hasCitingEntity (predicate) fabio:Expression (object)
Label has citing entity

Property: hasFeedback

RDF Property: duv:hasFeedback
Definition User feedback associated with dataset or distribution
vann:usageNote dcat:Dataset (subject) duv:hasFeedback (predicate) duv:UserFeedback (object)
dcat:Distribution (subject) duv:hasFeedback (predicate) duv:UserFeedback (object)
Label has dataset/distribution feedback

Property: hasDistributor

RDF Property: duv:hasDistributor
Definition The distributor is the organization that makes the dataset available for downloading and use.
vann:usageNote dcat:Dataset (subject) duv:hasDistributor (predicate) foaf:Agent (object)
dcat:Distribution (subject) duv:hasDistributor (predicate) foaf:Agent (object)
Label has distributor

Property: hasProducer

RDF Property: duv:hasProducer
Definition The producer is the organization that sponsored the author’s research and/or the organization that made the creation of the dataset possible, such as codifying and digitizing the data.
vann:usageNote dcat:Dataset (subject) duv:hasProducer (predicate) foaf:Agent (object)
dcat:Distribution (subject) duv:hasProducer (predicate) foaf:Agent (object)
Label has producer
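
As an illustrative sketch, the distributor and producer properties defined above could be attached to a dataset as follows. The ex: namespace, the organizations, and their names are hypothetical.

```turtle
@prefix dcat:  <http://www.w3.org/ns/dcat#> .
@prefix dct:   <http://purl.org/dc/terms/> .
@prefix disco: <http://rdf-vocabulary.ddialliance.org/discovery#> .
@prefix duv:   <http://www.w3.org/ns/duv#> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .
@prefix ex:    <http://example.org/> .   # hypothetical namespace

ex:timetable-001 a dcat:Dataset ;
    dct:creator ex:transport-agency ;         # author of the data
    duv:hasProducer ex:city-council ;         # made the creation of the dataset possible
    duv:hasDistributor ex:open-data-portal ;  # makes the dataset available for download
    disco:fundedBy ex:city-council .          # funding agent

# foaf:Organization is a subclass of foaf:Agent.
ex:transport-agency a foaf:Organization ;
    foaf:name "MyCity Transport Agency" .

ex:city-council a foaf:Organization ;
    foaf:name "MyCity Council" .

ex:open-data-portal a foaf:Organization ;
    foaf:name "MyCity Open Data Portal" .
```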

Property: hasRating

RDF Property: duv:hasRating
Definition RatingFeedback has rating opinion
vann:usageNote duv:RatingFeedback (subject) duv:hasRating (predicate) skos:Concept (object)
Label has rating

Property: hasTarget

RDF Property: oa:hasTarget
Definition Dataset or distribution associated with UserFeedback.
vann:usageNote duv:UserFeedback (subject) oa:hasTarget (predicate) dcat:Dataset (object)
duv:UserFeedback (subject) oa:hasTarget (predicate) dcat:Distribution (object)
oa:Annotation (subject) oa:hasTarget (predicate) cito:CitationAct (object)
Label has target

Property: hasUsage

RDF Property: duv:hasUsage
Definition Dataset/distribution usage guidance or instructions.
vann:usageNote dcat:Dataset (subject) duv:hasUsage (predicate) duv:Usage (object)
dcat:Distribution (subject) duv:hasUsage (predicate) duv:Usage (object)
Label has dataset/distribution usage

Property: hasUsageTool

RDF Property: duv:hasUsageTool
Definition Provided guidance about a tool that can be applied to a dataset/distribution.
vann:usageNote duv:Usage (subject) duv:hasUsageTool (predicate) duv:UsageTool (object)
Label has usage tool

Property: identifier

RDF Property: dct:identifier
Definition The identifier of the dataset or distribution.
vann:usageNote dcat:Dataset (subject) dct:identifier (predicate) rdfs:Literal (object)
dcat:Distribution (subject) dct:identifier (predicate) rdfs:Literal (object)
Label identifier

Property: issued

RDF Property: dct:issued
Definition The issue or publication date associated with the dataset or distribution
vann:usageNote dcat:Dataset (subject) dct:issued (predicate) rdfs:Literal (object)
dcat:Distribution (subject) dct:issued (predicate) rdfs:Literal (object)
Label Date of issue or publication date.

Property: motivatedBy

RDF Property: oa:motivatedBy
Definition Reason behind a citation annotation or user feedback.
vann:usageNote duv:UserFeedback (subject) oa:motivatedBy (predicate) oa:Motivation (object)
oa:Annotation (subject) oa:motivatedBy (predicate) oa:Motivation (object)
Label motivated by

Property: performedBy

RDF Property: duv:performedBy
Definition Usage performed by agent.
vann:usageNote duv:Usage (subject) duv:performedBy (predicate) foaf:Agent (object)
Label performed by

Property: performs

RDF Property: duv:performs
Definition Agent performs usage
vann:usageNote foaf:Agent (subject) duv:performs (predicate) duv:Usage (object)
Label performs

Property: publisher

RDF Property: dct:publisher
Definition Publisher of the cited dataset or distribution
vann:usageNote dcat:Dataset (subject) dct:publisher (predicate) foaf:Agent (object)
dcat:Distribution (subject) dct:publisher (predicate) foaf:Agent (object)
Label publisher

Property: refersTo

RDF Property: duv:refersTo
Definition Dataset/distribution associated with Usage.
vann:usageNote duv:Usage (subject) duv:refersTo (predicate) dcat:Dataset (object)
Label refers to dataset

Property: title

RDF Property: dct:title
Definition The title of citation, dataset or distribution
vann:usageNote dcat:Dataset (subject) dct:title (predicate) rdfs:Literal (object)
dcat:Distribution (subject) dct:title (predicate) rdfs:Literal (object)
Label title

Property: version

RDF Property: pav:version
Definition The version or edition number associated with the dataset.
vann:usageNote dcat:Dataset (subject) pav:version (predicate) rdfs:Literal (object)
dcat:Distribution (subject) pav:version (predicate) rdfs:Literal (object)
Label version

8. Examples

This section shows some examples to illustrate the application of the Dataset Usage Vocabulary.

Example 1 - Citation: Basic reference criteria for a dataset.

ex:timetable-001 a dcat:Dataset ;
        dct:title  "Bus timetable of MyCity"^^xsd:string;
        prism:doi "10.3456/4567.21"^^xsd:string ;
        pav:version "series-1.2"^^xsd:string;
        dct:issued "2015-05-05"^^xsd:date;
        dct:creator <http://example.org/transport-agency/contact> .

Example 2 - Citation: An electronic memorandum that references a dataset.

     
ex:timetable-001 a dcat:Dataset ;
        dct:title  "Bus timetable of MyCity"^^xsd:string;
        prism:doi "10.3456/4567.21"^^xsd:string ;
        pav:version "series-1.2"^^xsd:string;
        dct:issued "2015-05-05"^^xsd:date;
        dct:creator <http://example.org/transport-agency/contact>;
.

ex:timetable-memorandum
      a biro:BibliographicReference;
      a fabio:Policy  ;
      dct:bibliographicCitation
      "Costello, E. Mayor (2015). City Timetable Schedule Change Memorandum Effective January 1, 2015. Sept 15, 2014. DOI:10.3456/4567.21"^^xsd:string ;
      biro:references ex:timetable-001 .

Example 3 - Citation:  Citation characterization where memorandum is a policy.

     
ex:timetable-memorandum
      a biro:BibliographicReference;
      a fabio:Policy  ;
      dct:bibliographicCitation
      "Costello, E. Mayor (2015). City Timetable Schedule Change Memorandum Effective January 1, 2015. Sept 15, 2014. DOI:10.3456/4567.21"^^xsd:string ;
      biro:references ex:timetable-001 .


ex:memorandum-cites-dataset
      a cito:CitationAct ;
      cito:hasCitingEntity ex:timetable-memorandum ;
      cito:hasCitedEntity ex:timetable-001 .

      

Example 4 - Feedback:  Providing a mechanism for consumers to comment and ask the dataset publisher questions.

      
ex:timetable-001
      a dcat:Dataset;
      dct:title "Bus timetable of MyCity"^^xsd:string;
      dct:issued "2015-05-05"^^xsd:date;
      dct:publisher ex:transport-agency-mycity ;
      dcat:distribution ex:timetable-001-csv;
      .

ex:timetable-001-csv
      a dcat:Distribution;
      dct:title "CSV distribution MyCity_busTimetable dataset."^^xsd:string ;
      dct:description "CSV distribution of the bus timetable dataset of MyCity."^^xsd:string ;
      dcat:mediaType "text/csv"^^xsd:string;
      .

ex:comment1Content a cnt:ContentAsText ;
      cnt:chars "This timetable is missing route 3"^^xsd:string .

ex:comment1
       a duv:UserFeedback ;
       oa:hasBody ex:comment1Content ;
       oa:hasTarget ex:timetable-001 ;
       dct:creator ex:localresident .

ex:comment2Content a cnt:ContentAsText ;
      cnt:chars "Are tab delimited formats also available?"^^xsd:string .

ex:comment2
       a duv:UserFeedback ;
       oa:hasTarget ex:timetable-001-csv;
       oa:hasBody ex:comment2Content ;
       dct:creator ex:localresident ;
    .

ex:localresident
       a foaf:Person  ;
       foaf:name "Alan Law"^^xsd:string  ;
    .
      

Example 5 - Data Usage:  Dataset publisher provides information about how consumers can use a particular distribution.

      

ex:timetable-001-csv
    a dcat:Distribution;
    dct:title "CSV distribution MyCity_busTimetable dataset."^^xsd:string ;
    dct:description "CSV distribution of the bus timetable dataset of MyCity."^^xsd:string ;
    dcat:mediaType "text/csv"^^xsd:string;
    .

ex:release-notes a duv:Usage;
  dct:title "New timetable format"^^xsd:string;
  dct:created "2014-12-20"^^xsd:date ;
  dct:description "The previous proprietary format has been replaced with a CSV format to take advantage of new off-the-shelf tools"^^xsd:string ;
 .
      

Example 6 - Data Usage:  Dataset publisher provides recommendations about a bus route calculator that can be used with a timetable distribution.

      
ex:timetable-001-csv
      a dcat:Distribution;
      dct:title "CSV distribution MyCity_busTimetable dataset."^^xsd:string ;
      dct:description "CSV distribution of the bus timetable dataset of MyCity."^^xsd:string;
      dcat:mediaType "text/csv"^^xsd:string;
.

ex:route-calculator a duv:UsageTool;
    dct:title "Route Calculator"^^xsd:string;
    dct:created "2004-02-15"^^xsd:date ;
    dct:uri <http://example.org/route-calculator> .

ex:release-notes a duv:Usage;
    dct:title "New timetable format"^^xsd:string;
    dct:created "2014-12-20"^^xsd:date ;
    duv:hasUsageTool ex:route-calculator ;
.

9. Future Work

This section is non-normative.

In the near future, the editors want to internationalize the terms in the DUV as broadly as possible. As a technical note, this DUV activity officially concludes in July 2016 according to the Data on the Web Best Practices (DWBP) Working Group schedule. It is the opinion of the DUV editors that the concepts of usage, feedback, and citation are still evolving. The editors are therefore considering forming a W3C Community Group to focus on: 1) the application of the DUV, and 2) making formal requests to the W3C, based on future external community feedback, to continue making minor vocabulary revisions.

A. References

A.1 Informative references

[CITO]
David Shotton; Silvio Peroni. CiTO, the Citation Typing Ontology. URL: http://purl.org/spar/cito
[DC-TERMS]
Dublin Core Metadata Initiative. Dublin Core Metadata Initiative Terms, version 1.1. 11 October 2010. DCMI Recommendation. URL: http://dublincore.org/documents/2010/10/11/dcmi-terms/.
[DWBP-UCR]
Deirdre Lee; Bernadette Farias Loscio; Phil Archer. W3C. Data on the Web Best Practices Use Cases & Requirements. 24 February 2015. W3C Note. URL: http://www.w3.org/TR/dwbp-ucr/
[EmoryUCite]
Emory University. General Citation Guidelines. URL: http://einstein.library.emory.edu/citations_general.html
[FORCE11-Citation]
FORCE11. Data Citation Principles. URL: https://www.force11.org/group/joint-declaration-data-citation-principles-final
[FaBIO]
David Shotton; Silvio Peroni. FaBiO, the FRBR-aligned Bibliographic Ontology. URL: http://purl.org/spar/fabio
[MSUDataCite]
Michigan State University. How to Cite Data: General Info. URL: http://libguides.lib.msu.edu/citedata
[OA]
Herbert Van de Sompel; Paolo Ciccarese; Robert Sanderson. Open Annotation Data Model. URL: https://www.w3.org/ns/oa#
[SPAR]
Semantic Publishing and Referencing (SPAR) Ontologies. URL: http://www.sparontologies.net
[VOCAB-DCAT]
Fadi Maali; John Erickson. W3C. Data Catalog Vocabulary (DCAT). 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/vocab-dcat/