Data Usage Vocabulary Meetings

May 20, 2014 Kickoff Meeting

Attendees: Bernadette Lóscio, Eric Stephan

  • On May 16, 2014, at the DWBP Group Telecon, Bernadette and Eric agreed to be co-editors.
  • Data Usage Vocabulary documents will be hosted at:
  • Bernadette is already working with her students on aspects of data usage.

During our meeting different aspects of the data usage vocabulary were discussed:

  • Leveraging the existing W3C PROV vocabulary.
  • The vocabulary is the culmination of over a decade of data provenance research.
  • It was developed by an international team of researchers to track data lineage, event history, and human interaction with data and systems.
  • Provenance is concerned only with the past tense; data usage also covers the present tense (what you can do) and the future tense (what is possible).
  • Since the focus is “Data on the Web”, we want to concentrate on using distributed datasets.
  • Datasets are format independent.
  • It doesn’t matter whether it is RDF or XML; it’s just a “collection of data”.
  • Think more in terms of mathematical representations: Graph, Tree, Table
  • Two interesting interviews with Peter Buneman on the role of mathematics and data modeling
  • Possible operations on such structures.
  • Providing usage design patterns
  • Identify usage patterns and show examples, much as design patterns are used in software engineering.
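
To make the format-independent view above a little more concrete, here is a minimal sketch that treats a dataset as a table, applies one possible operation to it, and re-expresses the same data as a graph. The names `Table`, `select`, and `table_to_graph`, and the sample rows, are hypothetical illustrations and not part of any vocabulary draft.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Table:
    """A dataset viewed as a table: named columns plus rows of values."""
    columns: List[str]
    rows: List[Tuple]

    def select(self, column: str, value) -> "Table":
        """One possible operation: keep only rows where `column` equals `value`."""
        idx = self.columns.index(column)
        return Table(self.columns, [r for r in self.rows if r[idx] == value])


def table_to_graph(table: Table, key: str) -> List[Tuple]:
    """Re-express the same data as graph edges: (subject, predicate, object)."""
    k = table.columns.index(key)
    edges = []
    for row in table.rows:
        for i, col in enumerate(table.columns):
            if i != k:
                edges.append((row[k], col, row[i]))
    return edges


# Hypothetical sample data, for illustration only.
stations = Table(["id", "city", "temp_c"],
                 [("s1", "Recife", 29), ("s2", "Richland", 18)])
edges = table_to_graph(stations, key="id")
recife_only = stations.select("city", "Recife")
```

The point of the sketch is only that the serialization (RDF, XML, CSV) does not appear anywhere: the operations are defined on the structure itself.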

June 3, 2014 Meeting

We should consider two types of datasets: a general concept of a dataset, and one for specific structures. How can the abstract model be mapped to specific implementation representations (JSON, RDF, etc.)?
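
One way to picture this mapping question is the sketch below, which keeps a single abstract description and serializes it to JSON and to RDF (Turtle). The functions `to_json` and `to_turtle` and the example identifier are assumptions made for illustration; the use of `dcat:Dataset` and `dct:title` is likewise just one plausible choice, not a decision recorded in these notes.

```python
import json

# Abstract, format-independent description (hypothetical example values).
dataset = {"id": "http://example.org/dataset/1", "title": "Example dataset"}


def to_json(d: dict) -> str:
    """Map the abstract description to a JSON representation."""
    return json.dumps(d, sort_keys=True)


def to_turtle(d: dict) -> str:
    """Map the same description to RDF in Turtle syntax.

    json.dumps is used to quote the title, a simplification that relies on
    JSON and Turtle string escapes being compatible for ordinary text.
    """
    return (
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
        "@prefix dct: <http://purl.org/dc/terms/> .\n"
        f"<{d['id']}> a dcat:Dataset ;\n"
        f"    dct:title {json.dumps(d['title'])} .\n"
    )


print(to_json(dataset))
print(to_turtle(dataset))
```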

Topics for Data Usage:

  • Defining data processing steps (PROV, or new information)
  • Datasets composed of many datasets.
  • Datasets that describe how to discover applications a user can leverage for the dataset.
  • The feedback relationship among data usage, data publishers, and data consumers.
  • Does this overlap with the data quality vocabulary? Is there anything not considered feedback that we should represent?
  • Reproducibility and repeatability are aspects that should already be covered by the topics above.
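
As a rough sketch of the first two topics, the example below records processing steps as PROV-style statements and then traces which datasets a combined dataset was built from. The terms `prov:Activity`, `prov:used`, `prov:wasGeneratedBy`, and `prov:wasDerivedFrom` come from PROV-O; the helper functions `record_step` and `lineage` and the `ex:` identifiers are hypothetical, and nothing here is meant as the group's chosen design.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def record_step(activity: str, inputs: List[str], output: str) -> List[Triple]:
    """Describe one processing step as PROV-style (subject, predicate, object) triples."""
    triples = [(activity, "rdf:type", "prov:Activity")]
    for source in inputs:
        triples.append((activity, "prov:used", source))
        triples.append((output, "prov:wasDerivedFrom", source))
    triples.append((output, "prov:wasGeneratedBy", activity))
    return triples


def lineage(dataset: str, triples: List[Triple]) -> List[str]:
    """Collect every dataset that `dataset` was derived from, directly or indirectly."""
    found: List[str] = []
    frontier = [dataset]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "prov:wasDerivedFrom" and o not in found:
                found.append(o)
                frontier.append(o)
    return found


# Hypothetical two-step pipeline: clean a raw dataset, then merge it with another.
log = (record_step("ex:clean", ["ex:raw"], "ex:cleaned")
       + record_step("ex:merge", ["ex:cleaned", "ex:reference"], "ex:combined"))
sources = lineage("ex:combined", log)
```

A record like this looks backward only; describing what a consumer can or could do with `ex:combined` is the part the usage vocabulary would need to add.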

Path forward for the first week of June 2014:

  • Define what data usage means to us that can be the foundation of our work.
  • What are the intersections between the data quality vocabulary and the best practices document?
  • Familiarize ourselves with use cases from the DWBP and CSV working groups. Find linkages between data usage and these use cases.
  • Follow up with those who might be interested in contributing to the vocabulary.
  • Put these notes on the DWBP data usage notes wiki.

Data Usage Vocabulary Definition