Data on the Web Best Practices

Introduction

Quoting from the Working Group's charter:

The mission of this group, which is part of the Data Activity, is:

  • to develop the open data ecosystem, facilitating better communication between developers and publishers;
  • to provide guidance to publishers that will improve consistency in the way data is managed, thus promoting the re-use of data;
  • to foster trust in the data among developers, whatever technology they choose to use, increasing the potential for genuine innovation.

Abstract

This document aims to establish a set of best practices for publishing data on the Web.

Data can be published on the Web using a variety of technologies and methods. The goal is therefore to examine real use cases and extract from them common ground that can be recommended as good practice. This should contribute to increased publication of data on the Web, which is in turn a prerequisite for innovation built on published data.

The environment for publishing data on the Web is diverse and constantly changing: new ways to collect and distribute data appear every minute. Between the input and the output of data, it is assumed that there is a publishing cycle common to all data types. Given this assumption, the idea is to start from a division of that cycle into stages, similar to the one proposed by [HAUSENBLAS] in the publication "Best Practices for Publishing Linked Data".

That cycle suggests six steps to consider when publishing linked data: (1) data awareness, (2) modeling, (3) publishing, (4) discovery, (5) integration, and (6) use cases. Another reference is the cycle suggested to help stakeholders with their processes of opening and publishing data on the Web in the context of the od4d.org project.

The use of divisions, or steps, to organize best practices for publishing data on the Web aims to resolve overlapping decisions and reduce rework when publishing data on the Web.

Challenges

Metadata

First day discussion

Static/Real-time

To provide data in a timely manner, metadata should declare the following (a minimal illustration of such a record follows the list):

  1. Expected/scheduled frequency of update;
  2. If the dataset is journalled (i.e. no deletions, only append);
  3. If the dataset is timestamped (can request data for a specific time interval);
  4. Actual timestamp of last update.
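
As a non-normative sketch, the snippet below shows how these four properties might be captured in a simple metadata record. The field names (update_frequency, journalled, timestamped, last_updated) are hypothetical and are not drawn from any existing vocabulary.

  from datetime import datetime, timezone

  # Hypothetical metadata record covering the four properties listed above;
  # the field names are illustrative, not taken from any existing vocabulary.
  dataset_metadata = {
      "title": "Example air-quality measurements",
      # 1. Expected/scheduled frequency of update
      "update_frequency": "hourly",
      # 2. Whether the dataset is journalled (append-only, no deletions)
      "journalled": True,
      # 3. Whether records are timestamped, so a time interval can be requested
      "timestamped": True,
      # 4. Actual timestamp of the last update
      "last_updated": datetime.now(timezone.utc).isoformat(),
  }

  if __name__ == "__main__":
      import json
      print(json.dumps(dataset_metadata, indent=2))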

Tools

  1. Data may be provided via various access mechanisms, including (but not limited to) data catalogues, APIs, SPARQL endpoints, REST interfaces, and dereferenceable URIs; best practice is for data publishers to use the available tools to support multiple access mechanisms (see the query sketch after this list);
  2. Registration of data within dataset catalogues (or auto-discovery and indexing/classification by such catalogues based on published metadata) should be supported so that data can be found easily;
  3. The federation concept should be discussed further in relation to the previous proposal.
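
As a concrete illustration of one such access mechanism, the sketch below sends a query to a SPARQL endpoint over HTTP using only the Python standard library, following the SPARQL 1.1 Protocol (an HTTP GET carrying a query parameter and an Accept header for JSON results). The endpoint URL and the query are placeholders; a publisher would substitute their own endpoint and vocabulary.

  import json
  import urllib.parse
  import urllib.request

  # Placeholder; substitute the publisher's actual SPARQL endpoint.
  ENDPOINT = "https://example.org/sparql"

  def run_sparql_query(endpoint, query):
      """Send a SPARQL query via HTTP GET (SPARQL 1.1 Protocol) and
      return the parsed JSON results."""
      url = endpoint + "?" + urllib.parse.urlencode({"query": query})
      request = urllib.request.Request(
          url, headers={"Accept": "application/sparql-results+json"}
      )
      with urllib.request.urlopen(request) as response:
          return json.load(response)

  if __name__ == "__main__":
      # List a few dataset titles exposed by the (hypothetical) endpoint.
      query = """
          PREFIX dct: <http://purl.org/dc/terms/>
          SELECT ?dataset ?title
          WHERE { ?dataset dct:title ?title }
          LIMIT 10
      """
      results = run_sparql_query(ENDPOINT, query)
      for binding in results["results"]["bindings"]:
          print(binding["dataset"]["value"], "-", binding["title"]["value"])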

Privacy/security

Acknowledging that much further discussion of security is needed, metadata could include information about security realms (see OASIS SAML/XACML) that apply to restricted-access data on the Web. Realms indicate which security credentials need to be presented in order to be considered for access to the data.

Note: other technologies such as http://www.w3.org/TR/P3P/ may also be relevant to consider in metadata.
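
Purely as a sketch of how a security realm might surface to a data consumer, the example below requests a restricted dataset without credentials and reads the WWW-Authenticate challenge returned with a 401 response, which names the realm for which credentials must be presented. The dataset URL is a placeholder, and mapping such challenges into catalogue metadata is an assumption rather than an established practice.

  from typing import Optional
  import urllib.error
  import urllib.request

  # Placeholder URL for a restricted-access dataset.
  RESTRICTED_DATASET = "https://example.org/data/restricted.csv"

  def discover_realm(url: str) -> Optional[str]:
      """Request the resource without credentials; if the server answers
      401 Unauthorized, return the WWW-Authenticate challenge, which names
      the realm for which credentials must be presented."""
      try:
          urllib.request.urlopen(url)
          return None  # No challenge: the resource is not access-restricted.
      except urllib.error.HTTPError as error:
          if error.code == 401:
              # e.g. 'Basic realm="statistics-publishers"'
              return error.headers.get("WWW-Authenticate")
          raise

  if __name__ == "__main__":
      print("Authentication challenge:", discover_realm(RESTRICTED_DATASET))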

Skills/Expertise

The interaction among the actors in the ecosystem could help increase the value of the data (e.g. by detecting and reporting errors) and improve the skills of those actors.

Note: by "interaction" we mean a way for different actors to come together and talk about the data, how they are used, and so on. This can be achieved via many channels (hackathons, other events, etc.).

References

[HAUSENBLAS]

Michael Hausenblas; Richard Cyganiak. Linked Data Life Cycles, formerly at http://linked-data-life-cycles.info/.

[OPEN DATA FOR DEVELOPMENT]

Open Data for Development, formerly at od4d.org/en.