Data usage notes

From Data on the Web Best Practices

The working group will develop 2 new vocabularies to support the data ecosystem:

Data Usage Description Vocabulary

This will describe the use made of one or more data sets. Where data is used in an application, it will facilitate a description of what the application does and what problem it helps to solve. This will improve discoverability of the application. Where data is used in other contexts, such as in research, it will facilitate provision of information about what data was used and how it was used during the research. This information can link to and be cited within published papers. In these scenarios, and others, use of the Data Usage Description Vocabulary will encourage the continued publication of the data on which the usage depends.

Data Usage Vocabulary Meetings

Data Usage Scenarios

Challenge areas

The Data Usage Vocabulary should focus on two main aspects:

  • Traceability
    • It should be possible to track the use of data.
    • Tracking can be done with APIs, ETL, and other related tools.
    • Information about how a dataset is being used must be provided.
      • How the dataset should or should not be used.
    • Information about who is using the dataset should be provided.
      • Who should or should not use the dataset.
    • Document the use of the data as well as the combination of datasets.
    • The use of a dataset should be documented.
    • The use of multiple datasets, when that is the case, should be explicitly described.
    • If the original data is transformed for use in an application, the transformation should be described, showing the format of data:from and data:target.
    • Transformations of data types or formats should be explicitly described, showing the format of the original data and the format of the target data.
    • Describe the process of taking data from a dataset (data transformation).
  • Feedback
    • Improvements to the data should be described in a machine-readable format
    • Feedback about data (unqualified/qualified opinion)
    • Feedback should be used to improve data quality
    • Feedback should be used to help data selection
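
To make the two aspects above more concrete, here is a minimal sketch of what a single usage record might carry. It is only an illustration, not a proposed vocabulary: the class names, field names, and example URLs are all hypothetical, and Python dataclasses stand in for whatever representation the vocabulary eventually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transformation:
    """A format change applied to the data taken from a dataset (hypothetical fields)."""
    source_format: str  # format of the original data
    target_format: str  # format of the data after the transformation
    description: str


@dataclass
class Feedback:
    """An opinion about a dataset; rating is optional (hypothetical fields)."""
    author: str
    comment: str
    rating: Optional[int] = None


@dataclass
class DataUsage:
    """One use of one or more datasets (hypothetical fields)."""
    user: str            # who is using the data
    purpose: str         # how the data is being used
    datasets: List[str]  # identifiers of the datasets used
    transformations: List[Transformation] = field(default_factory=list)
    feedback: List[Feedback] = field(default_factory=list)

    def combines_datasets(self) -> bool:
        """True when this usage relies on more than one dataset."""
        return len(self.datasets) > 1


# Made-up example: an application that combines two datasets.
usage = DataUsage(
    user="example-app",
    purpose="Show bus stops on a city map",
    datasets=[
        "http://example.org/dataset/bus-stops",
        "http://example.org/dataset/city-map",
    ],
    transformations=[
        Transformation("text/csv", "application/json", "Rows converted to JSON objects"),
    ],
)
usage.feedback.append(Feedback("example-app", "Several stops lack coordinates", rating=2))
```

Keeping the transformation and feedback entries on the usage record, rather than on the dataset, is one possible design choice: it ties each statement to the specific use that produced it.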

Some observations:

  • Data Usage and Provenance are actually highly complementary.
    • Data Usage describes current or future activities and events pertaining to datasets.
    • Data Provenance describes past activities and events: how a dataset originated or was modified.
    • Some have called what we call data usage "predictive provenance".


  1. Relying on Google Maps for some data but adding my own point-of-interest (POI) mapping points to a map. You could rely on Google Maps but maybe not on my POI data.
  2. Taking an index of drugs and a dataset from the FDA (if it were open), then adding my impressions of each drug, to share with an app that also provides data.
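
The first example suggests that a consumer of combined data may want to know which source each item came from, so that it can rely on some sources and not others. A minimal sketch of that idea follows, assuming each record is a plain dictionary; the function names, source labels, and sample points are hypothetical.

```python
from typing import Dict, List, Set

Record = Dict[str, object]


def tag_with_source(records: List[Record], source: str) -> List[Record]:
    """Return copies of the records, each labelled with the dataset it came from."""
    return [{**record, "source": source} for record in records]


def keep_trusted(records: List[Record], trusted: Set[str]) -> List[Record]:
    """Keep only the records whose source the consumer chooses to rely on."""
    return [record for record in records if record["source"] in trusted]


# Hypothetical sample points; names and coordinates are made up.
map_points = tag_with_source(
    [{"name": "Central Station", "lat": 10.0, "lon": 20.0}], "map-provider"
)
my_points = tag_with_source(
    [{"name": "My favourite cafe", "lat": 10.1, "lon": 20.1}], "my-poi"
)

combined = map_points + my_points
reliable = keep_trusted(combined, {"map-provider"})
```

Here `combined` holds both points, while `reliable` keeps only the one from the source the consumer trusts.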


  • Initial proposal (challenges)
    • Industry-reuse
    • Dataset selection
    • Processing
    • Usability

Other aspects (out of scope?):

  • Privacy
    • When using a dataset, respect the privacy rules.
    • When using an entire dataset, you may be able to use only a subset of it, because of restrictions or licenses that apply to it.
  • Revenue

Data Usage Examples in CSVW Use Cases