Use Cases Document Outline

From Data on the Web Best Practices

Headings

Abstract

Status of this document

Table of contents

1. Introduction

The aim of this document is to present concrete use cases of publishing and using Data on the Web. The use cases are intended to guide the definition of the set of best practices for Data on the Web envisioned by the Data on the Web Best Practices Working Group.


2. Terminology

(maybe we can reuse the terminology used in other working groups)

  • Data on the Web
  • Publisher
  • Consumer
  • Metadata
  • ...


3. Use cases

3.1 Original use cases

3.2 General use cases

  • Data publication scenario
  • Data usage scenario


4. Challenges (or Data on the Web Dimensions)

...for each challenge (dimension), we provide a brief description based on its description in the Google Drive document. The challenges below are the ones listed in that Google Drive document.

  • Metadata
  • Static/Real-time (or Real-time data access)
  • Privacy/security
  • Archiving
  • Provenance
  • Licenses
  • Granularity
  • Quality
  • Formats
  • Vocabularies
  • APIs (or Data access)
  • URIs (or Identification)
  • Industry-reuse
  • Usability (or Usage)
  • Feedback (instead of Processing)
  • Policy (together with Licenses?)
  • Data selection

I'm not sure if the following challenges are in the scope of the best practices or the vocabularies. What do you think?

  • Tools
  • Skills/Expertise
  • Revenue

5. Requirements

The use cases may give rise to requirements. In the case of the best practices, those requirements may specify which best practices are needed.

  • Metadata
    • RM1: Metadata should be provided
    • reference to supporting use case (“Motivation:”, list)
    • RM2: Metadata should be machine-readable
    • reference to supporting use case (“Motivation:”, list)
    • RM3: The metadata vocabulary (or the values used, if the vocabulary is not standardised) should be well-documented
    • reference to supporting use case (“Motivation:”, list)
  • Vocabularies
    • RV1: Existing reference vocabularies should be reused where possible
    • reference to supporting use case (“Motivation:”, list)
    • RV2: Reference vocabularies should be shared in an open way
    • reference to supporting use case (“Motivation:”, list)
  • APIs (or Data access)
    • RDA1: Collaboration between API providers and API users is necessary.
    • RDA2: APIs should be well-documented.
  • Provenance
    • RP1: Data provenance should be provided.
    • RP2: Standard vocabularies should be used to describe data provenance.
  • SLA
    • RSLA1: SLAs should be provided in a machine-readable format.
    • RSLA2: Standard vocabularies should be used to describe SLAs.
  • Formats
    • RF1: Data should be provided in several formats
    • RF2: Data should be provided in a machine-readable format
    • RF3: Standard data formats should be adopted
  • Quality
    • RQ1: Information about data quality should be provided
    • RQ2: Standard vocabularies should be used to describe the quality of the data
    • RQ3: Data quality should be verified before data release
    • RQ4: Information about data quality should be provided in a machine-readable format.
  • Static/Real-time (or Real-time data access)
    • RRT1: Real-time data should be provided when possible
    • RRT2: Bulk data access should be provided when possible


  • Versioning
    • RV1: The release schedule should be provided in the metadata
    • RV2: Guidance about how to keep different versions of the same data should be provided
  • Licenses
    • RL1: Data licenses should be interoperable
    • RL2: Data licenses should be provided in a machine-readable format
    • RL3: Standard vocabularies should be used to describe licenses
    • RL4: Guidance about license combination should be provided
  • Granularity
    • RG1: Guidance about how to define data granularity should be provided.
  • Data Selection
    • RDS: Guidance about how to select data to be published should be provided.
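To make requirements such as RM2 (machine-readable metadata), RV1 under Vocabularies (reuse of existing vocabularies), RP1/RP2 (provenance described with a standard vocabulary), RF1 (several formats) and RL2 (machine-readable licenses) more concrete, the following minimal sketch describes a single dataset in JSON-LD using terms from W3C DCAT, Dublin Core terms and PROV-O. It is an illustration only, not a profile agreed by the group: the function `describe_dataset`, the example.org URIs and the choice of properties are assumptions made for this example.

```python
import json

# Namespace prefixes for the reused vocabularies (DCAT, Dublin Core terms, PROV-O).
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "prov": "http://www.w3.org/ns/prov#",
}


def describe_dataset(dataset_uri, title, license_uri, source_uri, distributions):
    """Build a JSON-LD description of one dataset using DCAT terms.

    Hypothetical helper. `distributions` is a list of (download_url, media_type)
    pairs, one per format in which the same data is offered.
    """
    return {
        "@context": CONTEXT,
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # Provenance: the resource this dataset was derived from.
        "prov:wasDerivedFrom": {"@id": source_uri},
        # One dcat:Distribution per format, each carrying its license as a URI.
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": url},
                "dcat:mediaType": media_type,
                "dct:license": {"@id": license_uri},
            }
            for url, media_type in distributions
        ],
    }


if __name__ == "__main__":
    # All example.org URIs are hypothetical placeholders.
    meta = describe_dataset(
        "http://example.org/dataset/air-quality",
        "Air quality measurements",
        "https://creativecommons.org/licenses/by/4.0/",
        "http://example.org/sensors/raw",
        [
            ("http://example.org/air-quality.csv", "text/csv"),
            ("http://example.org/air-quality.json", "application/json"),
        ],
    )
    print(json.dumps(meta, indent=2))
```

Attaching the license to each distribution follows DCAT, where `dct:license` is a property of `dcat:Distribution`; quality, SLA or release-schedule information could be added in the same way once vocabularies for them are chosen.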


  • APIs can be too clunky or rich in functionality, which may increase the number of calls needed and the amount of data transferred, reducing performance.
  • Collaboration between API providers and users is necessary to agree on 'useful' calls.
  • Could API key agreements restrict the openness of Open Data?
  • Documentation accompanying APIs can be lacking.
  • What is best practice for publishing streams of real-time data (with or without APIs)?
  • For accessing numerous datasets, scientists will access the archive directly using other protocols such as sftp, rsync or scp, and techniques such as http://www.psc.edu/index.php/hpn-ssh.
  • For accessing individual datasets, a REST GET interface to the archive should be provided.
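The note about providing a REST GET interface for individual datasets can be illustrated with a small HTTP server built only on the Python standard library. This is a hypothetical sketch: the `/datasets` and `/datasets/<id>` URL patterns, the in-memory `ARCHIVE` and the `make_server` helper are assumptions, and a real archive would read files from storage and add authentication and caching. Bulk transfer over sftp, rsync or scp is outside its scope.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical in-memory archive: dataset identifier -> (media type, content).
ARCHIVE = {
    "station-42": ("text/csv", b"time,temperature\n2014-01-01T00:00Z,3.1\n"),
}


class DatasetHandler(BaseHTTPRequestHandler):
    """Answer GET /datasets (list of identifiers) and GET /datasets/<id>."""

    def _send(self, status, media_type, body):
        self.send_response(status)
        self.send_header("Content-Type", media_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        path = urlsplit(self.path).path  # ignore any query string
        if path == "/datasets":
            # Listing of identifiers, so clients can discover what exists.
            listing = json.dumps(sorted(ARCHIVE)).encode("utf-8")
            self._send(200, "application/json", listing)
            return
        prefix = "/datasets/"
        entry = ARCHIVE.get(path[len(prefix):]) if path.startswith(prefix) else None
        if entry is None:
            self.send_error(404, "No such dataset")
            return
        media_type, body = entry
        self._send(200, media_type, body)


def make_server(port=8000):
    """Create (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), DatasetHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

With the script running, requesting http://127.0.0.1:8000/datasets/station-42 from any HTTP client returns the CSV with its media type, while an unknown identifier yields a 404.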

...

  • Candidate best practices

6. Conclusions

References

Acknowledgements