Best Practices Discussion Summary

From Government Linked Data (GLD) Working Group Wiki

Organisational

Note: the Editor's Draft of the 'Best Practices for Publishing Linked Data' is maintained on the HG repository.

Contributors

Editors: Bernadette Hyland, Boris Villazón-Terrazas, Ghislain Atemezing

Procurement
  • Mike Pendleton (Environmental Protection Agency, US)
  • George Thomas (Health & Human Services, US)
Vocabulary Selection
  • Ghislain Atemezing (Institut Télécom)
  • Daniel Vila-Suero (UPM)
  • Boris Villazón-Terrazas (UPM)
URI Construction
  • Daniel Vila-Suero (UPM)
  • John Erickson (RPI)
  • Bernadette Hyland (3 Round Stones)
Linked Data Cookbook
  • Bernadette Hyland (3 Round Stones)
  • Boris Villazón-Terrazas
Source Data
  • Biplav Srivastava (IBM)
  • Spyros Kotoulas (IBM/SCTC)
Versioning (needs attention)
  • John Erickson (RPI)
  • Ghislain Atemezing (Institut Télécom)
  • Hadley Beeman (versioning as related to Data "Cube")
Pragmatic Provenance
  • John Erickson (RPI)

Best Practices Design Goals

  1. Must be relevant for government (local, state, federal, international)
  2. Content must be self-maintaining over time
  3. Data published in a W3C RDF serialization (or submitted W3C Standard)


Purpose of Best Practices Recommendation(s)

The GLD WG Charter identifies the following motivations for publishing Recommendation(s) and Working Group Notes.

  1. The overarching objective is to provide best practices and guidance for creating high-quality, reusable Linked Open Data (LOD).

More specifically, best practices are aimed at assisting government departments/agencies/bureaus, and their contractors, vendors and researchers, to publish high quality, consistent data sets using W3C Standards to increase interoperability.

Best practices are intended to be a methodical approach for the creation, publication and dissemination of governmental Linked Data. Best practices from the GLD WG shall include:

  1. Description of the full life cycle of a Government Linked Data project, starting with identification of suitable data sets, procurement, modeling, vocabulary selection, through publication and ongoing maintenance.
  2. Definition of known, proven steps to create and maintain government data sets using Linked Data principles.
  3. Guidance in explaining the value proposition for LOD to stakeholders, managers and executives.
  4. Assistance to the Working Group in later stages of the Standards Process, for example in soliciting feedback and use cases.

Content

Overview

Linked Data approaches address key requirements of open government by providing a family of international standards for the publication, dissemination and reuse of structured data. Further, Linked Data, unlike previous data formatting and publication approaches, provides a simple mechanism for combining data from multiple sources across the Web.

In an era of reduced local, state and federal budgets, there is strong economic motivation to reduce waste and duplication in data management and integration. Linked Open Data is a viable approach to publishing governmental data to the public, but only if it adheres to some basic principles.

From section 2.2 of the GLD Charter.

The Working Group, facilitated by the Best Practices Task Force, will produce Recommendation(s) (or a Working Group Note or website, where noted) for the following:

  1. Procurement.
  2. Vocabulary Selection.
  3. URI Construction.
  4. Versioning.
  5. Stability.
  6. Legacy Data.
  7. Cookbook. (Working Group Note or website rather than Recommendation).

GLD Life cycle

Best Practices for Procurement

Procurement. Specific products and services involved in governments publishing linked data will be defined, suitable for use during government procurement. Just as the Web Content Accessibility Guidelines allow governments to easily specify what they mean when they contract for an accessible Website, these definitions will simplify contracting for data sites and applications.

See full Status & Working Notes for Procurement

Best Practices for Vocabulary Selection

The group will provide advice on how governments should select RDF vocabulary terms (URIs), including advice as to when they should mint their own. This advice will take into account issues of stability, security, and long-term maintenance commitment, as well as other factors that may arise during the group's work.

@@TODO: distinguish between vocab discovery and vocab creation and management.

See full status and working notes on Vocabulary Selection

URI Construction

The group will provide recommendations on how to create good URIs for use in government linked data. Inputs include:

Guidance will be produced not only for minting URIs for governmental entities, such as schools or agencies, but also for vocabularies, concepts, and datasets.

See full status and working notes for URI Construction

Versioning

The group will specify how to publish data which has multiple versions, including variations such as:

  1. data covering different time periods
  2. corrected data about the same time period
  3. the same data published using different vocabularies, formats, and presentation styles
  4. retracting published data

By John Erickson (RPI):

The Digital Library community has faced the problem of versions in digital repositories for more than a decade. One useful summary of thinking in this space can be found at the Version Identification Framework (VIF) Project site. See especially:

  1. Essential Versioning Information
  2. Embedding Versioning Information in an Object
  3. Recommendations for Repository Developers

The Resourcing IDentifier Interoperability for Repositories (RIDIR) project (2007-2008) considered in depth the relationship between identifiers and finding versions of objects. See RIDIR Final Report. In their words, RIDIR set out to investigate how the appropriate use of identifiers for digital objects might aid interoperability between repositories and to build a self-contained software demonstrator that would illustrate the findings. A number of related projects are listed at JISC's RIDIR information page.

In addition, at TWC we have adopted an ad hoc approach to denoting versions of published linked data:

  1. The URI for the "abstract" dataset has no version information, e.g. http://logd.tw.rpi.edu/source/data-gov/dataset/1017
  2. The URI for a particular version appends a version indicator to the abstract URI, e.g. http://logd.tw.rpi.edu/source/data-gov/dataset/1017/version/1st-anniversary
  3. The version indicator (e.g. "1st-anniversary") is arbitrary; a date code may be used. We sometimes use a non-ISO 8601 date (e.g. "12-Jan-2012") to make it clear that the indicator is, in our case, not necessarily machine-produced.
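
The convention above can be sketched in a few lines of Python. This is an illustrative sketch only, not TWC's actual tooling: the function names `version_uri` and `abstract_uri` are hypothetical, and percent-encoding the version indicator is an added assumption to keep arbitrary labels URI-safe.

```python
from urllib.parse import quote


def version_uri(dataset_uri: str, version_label: str) -> str:
    """Append a version indicator to an "abstract" dataset URI.

    Hypothetical helper: the "/version/<label>" layout follows the
    examples in the text; the percent-encoding is an assumption.
    """
    base = dataset_uri.rstrip("/")
    return f"{base}/version/{quote(version_label, safe='')}"


def abstract_uri(uri: str) -> str:
    """Return the abstract dataset URI for a versioned URI.

    A URI without a "/version/" segment is returned unchanged.
    """
    base, sep, _label = uri.rpartition("/version/")
    return base if sep else uri


dataset = "http://logd.tw.rpi.edu/source/data-gov/dataset/1017"
print(version_uri(dataset, "1st-anniversary"))
print(version_uri(dataset, "12-Jan-2012"))
print(abstract_uri(version_uri(dataset, "1st-anniversary")))
```

Keeping the abstract URI free of version information means a consumer can always strip the trailing `/version/...` segment to reach the dataset as a whole.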

Stability

The group will specify how to publish data so that others can rely on it being available in perpetuity, persistently archived if necessary.

See full status and working notes on Stability.

Source Data

The group will produce specific advice concerning how to expose legacy data, that is, data which is being maintained in pre-existing (non-linked-data) systems.

Subject: Roadmap for cities to adopt open data

Biplav: use-case

  • Suppose a city is considering opening up its data. It has certain concerns:
  1. Business and legal level
    1. What are the privacy considerations in publishing data? On one hand, the city would like to respect the privacy of citizens and businesses; on the other, it would like the data to be valuable enough to lead to positive change.
    2. How should the cost of opening up data be paid for? Cities may or may not have legal obligations to open up data. Accordingly, they will look for guidance on how to account for the costs. Further, can they levy a license fee if they are not obligated to open data?
    3. Which data should be opened and when? Should it be done in phases? What data should not be shared?
    4. What policies or laws are needed from the city so that businesses can collaborate on open data while preserving their IP?
  2. Technical level
    1. What should the architecture be for sharing large-scale public data? How do we ensure performance and security?
    2. What visualizations should be supported for different types of data?
    3. Are we following a standard implementation for the reference architecture?
  • It is recommended that the publishing organization prepare a roadmap that addresses these concerns for all stakeholders.

Ghislain: Source Data publication check-list

  • Before publishing your legacy data, go through the following check-list:
  1. Make sure the data is in a standard, non-proprietary format, e.g. CSV, SHP, KML, XML, DBMS, etc.
  2. Provide access to the data (API, web page) at a location that users can refer to consistently.
    1. It is recommended to use the domain of your organization, for trust.
  3. Provide a short description of the data, such as its scope and content.
    1. For tables, the names of the columns should be clear and self-descriptive if possible.
    2. In spreadsheets, avoid having many sheets in the same workbook.
  4. State the license under which the data can be accessed.
  5. Check the frequency of publication and provide that information.
  6. Provide additional documents explaining the key concepts and terms of your data to avoid misinterpretation.
  7. Provide a contact address or email for the role responsible for the data, for any further support.
  8. Be privacy-aware when publishing data. Any published data can potentially be misused later by an unknown party, but that threat is balanced by the benefits of publication. Privacy groups (including W3C Privacy, http://www.w3.org/Privacy/) provide recommendations; follow them. Start by not revealing personally identifiable information without masking. Examples of such data are individual names, national identification numbers, phone numbers, credit card numbers and driver license numbers.
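
As a rough illustration, parts of the check-list above can be turned into an automated pre-publication check. The sketch below is hypothetical throughout: the `SourceDataset` record, its field names, the `OPEN_FORMATS` set and the `PII_COLUMNS` set are assumptions made for this example, not definitions from the Working Group, and a simple column-name match is no substitute for a real privacy review.

```python
from dataclasses import dataclass

# Assumed list of open formats, taken from the file formats named in item 1.
OPEN_FORMATS = {"csv", "shp", "kml", "xml"}

# Assumed column names hinting at personally identifiable information (item 8).
PII_COLUMNS = {"name", "national_id", "phone", "credit_card", "driver_license"}


@dataclass
class SourceDataset:
    """Hypothetical description of a legacy dataset about to be published."""
    data_format: str
    access_url: str
    description: str
    license: str
    update_frequency: str
    contact: str
    column_names: tuple = ()


def checklist_issues(ds: SourceDataset) -> list:
    """Return human-readable check-list problems; an empty list means none found."""
    issues = []
    if ds.data_format.lower() not in OPEN_FORMATS:
        issues.append(f"format '{ds.data_format}' is not in the assumed open-format list")
    for field_name in ("access_url", "description", "license",
                       "update_frequency", "contact"):
        if not getattr(ds, field_name).strip():
            issues.append(f"missing {field_name}")
    for column in ds.column_names:
        if column.lower() in PII_COLUMNS:
            issues.append(f"column '{column}' may hold personally identifiable information")
    return issues


example = SourceDataset(
    data_format="xls",
    access_url="http://example.org/data/schools",
    description="Public schools in the city, one row per school.",
    license="",
    update_frequency="yearly",
    contact="data@example.org",
    column_names=("school", "phone"),
)
for issue in checklist_issues(example):
    print(issue)
```

Items that need human judgement, such as whether column names are self-descriptive or whether the supporting documents explain the key terms, are deliberately left out of the sketch.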

Linked Data Cookbook

The group will produce a collection of advice on smaller, more specific issues, where known solutions exist to problems collected for the Community Directory. This document is to be published as a Working Group Note, or website, rather than a Recommendation. It may, instead, become part of the Community Directory site. See The Cookbook for Open Government Linked Data.

Pragmatic Provenance

Provide best practice recommendations for stakeholders on documenting the provenance of their linked government data and how to interpret that data so that consumers know what they are looking at. (suggested by Hadley Beeman)

See full status and working notes for Pragmatic Provenance work.


Best Practices Wiki Organization

NB: This is the BP Wiki page from which subsection pages have been created to facilitate working notes and status. In due course, the wiki subsection pages will be folded into the W3C ReSpec document system by the draft editors.

Approach for publishing Best Practices, Case Studies and Use Cases

Best Practices may leverage the SWEO Semantic Web Case Studies and Use Cases approach.