Data Catalog Vocabulary

From W3C eGovernment Wiki
Revision as of 15:48, 29 April 2010 by Mlvarez (Talk | contribs)


Data Catalog Vocabulary project

This is a project within the W3C Interest Group on eGovernment.

The project is currently starting up; a first teleconference will take place on Thursday, April 15th.

Context: Government data catalogs

Governments produce large amounts of valuable data as part of daily operations and decision-making. This data can be useful to many citizens and organizations, who are ultimately the ones who paid for producing it. Governments increasingly recognize this and are starting to make this data publicly available through one-stop portals called data catalogs, such as data.gov, data.gov.uk, statcentral.ie and many others. (@@@ link to a list somewhere?)

Goals

The goals of this group are:

  • To propose a unified format for publishing the contents of such data catalogs.
  • To set up a demonstrator based on the contents of one or more existing catalogs.
  • To provide support to initial implementors of the format.

The format should support the querying, federation, consumption, decentralised publishing, and archival of these valuable data assets.

The dcat proposal will be used as a starting point.

Scope

The exchange format will be developed primarily as an RDF Schema vocabulary, although the group might explore other data syntaxes as well.

The main focus of this group is on catalogs of government data, such as data.gov and similar catalogs. Nevertheless, it would be desirable to make the format applicable to data catalogs in other domains as well (for example, catalogs of scientific data on climate change), if this is possible without compromising the main use case.

The focus is on data catalogs, that is, on collections of metadata about datasets. Expressing the actual data inside the datasets as RDF is out of scope.

The focus is on existing data catalogs and how their contents can be expressed in a unified way.

Deliverables

The expected deliverables are:

  • A best-practice guide on the topic of expressing the contents of data catalogs in RDF using vocabularies such as Dublin Core and SKOS
  • An RDF Schema vocabulary that contains additional required terms not yet found in existing vocabularies such as Dublin Core and SKOS

This list is still subject to discussion, and further deliverables could emerge in the course of the group's work.
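As a rough illustration of the first deliverable, the sketch below shows how one catalog entry might be expressed in RDF using Dublin Core and SKOS terms. It is a minimal sketch and not the group's format: the Dublin Core Terms and SKOS namespaces are the published ones, but the `EX` namespace, the `Dataset` class name, the helper functions and all example URIs are assumptions made up for this example. It uses only the Python standard library and emits N-Triples lines.

```python
# Dublin Core Terms, SKOS and rdf:type are published namespaces.
DCT = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical placeholder for "additional required terms"; NOT the dcat namespace.
EX = "http://example.org/catalog-vocab#"


def literal(text):
    """Format a plain string as an N-Triples literal, escaping special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def uri(value):
    """Format an absolute URI as an N-Triples term."""
    return "<" + value + ">"


def dataset_triples(dataset_uri, title, description, theme_uri, theme_label):
    """Return N-Triples lines describing one catalog entry and its theme."""
    triples = [
        (uri(dataset_uri), uri(RDF_TYPE), uri(EX + "Dataset")),
        (uri(dataset_uri), uri(DCT + "title"), literal(title)),
        (uri(dataset_uri), uri(DCT + "description"), literal(description)),
        # The theme is a SKOS concept, linked from the dataset via dct:subject.
        (uri(dataset_uri), uri(DCT + "subject"), uri(theme_uri)),
        (uri(theme_uri), uri(RDF_TYPE), uri(SKOS + "Concept")),
        (uri(theme_uri), uri(SKOS + "prefLabel"), literal(theme_label)),
    ]
    return [" ".join(triple) + " ." for triple in triples]


# Example entry; every URI and value here is invented for illustration.
lines = dataset_triples(
    "http://example.org/dataset/42",
    "Air quality measurements",
    "Hourly readings from monitoring stations.",
    "http://example.org/theme/environment",
    "Environment",
)
print("\n".join(lines))
```

Note that the sketch describes only metadata about the dataset (title, description, theme); the data inside the dataset is not converted, in line with the scope above.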

Participants

This list started from the initial participation survey, and some names have been added as people showed up in meetings. Please feel free to edit your own entry below, or add or remove yourself as appropriate. To get write access to the wiki, you currently must formally join the group. If you need to change your entry and can't get write access, e-mail sandro@w3.org.

  1. Andrew Houghton: This effort has broader implications for the library community in describing and making their catalogs available to the Semantic Web community. My interest is to follow and contribute to this effort and bring a broader perspective to it.
  2. Brand Niemann: US EPA
  3. Cory Casanave: Model Driven Solutions
  4. Craig Norvell: I work at Franz Inc (AllegroGraph) and we are interested in participating in development of this Gov't data.
  5. Dan Brickley
  6. Dan Thomas: Washington DC Gov't
  7. David James: Sunlight Foundation
  8. Ed Summers: US Library of Congress. (I am a software developer working at the Library of Congress. I participated in the W3C working group on SKOS, where my contribution was mainly reviewing documents and a demonstration which now lives at id.loc.gov.)
  9. Erik Wilde: UC Berkeley. (core web architecture background and interest; interested in using lightweight approaches for exposing data so that people can use and reuse them with the simplest possible set of technologies and tools, so that this can be done on the largest possible variety of platforms and by the largest possible set of people.)
  10. Fadi Maali: DERI.
  11. George Thomas: US Dept of Health and Human Services. (interested in using dcat/void on data.gov work with us gov.)
  12. Jon Phipps
  13. Kate Geyer: Working on the Massachusetts Open Data Initiative. We are moving to open standards for our datasets and a linked open data model.
  14. Li Ding: RPI. (we are working on linking government data and building appealing demos to drive the consumption of linked government data at Rensselaer Polytechnic Institute.)
  15. Libby Miller
  16. Luigi Montanez: Sunlight Foundation
  17. Martín Álvarez: CTIC Spain. We maintain a list of the public open data catalogs.
  18. Niklas Lindström: I'm using and contributing to open source libraries for RDF (e.g. RDFLib in Python), and am currently developing the legal information system in Sweden, employing RDF and linked data principles for collecting documents and data produced by about a hundred agencies (using RDF and Atom).
  19. Paul Hermans
  20. Peter Krantz: Developer of the opengov catalog platform: http://code.google.com/p/opengov-catalog/ currently used in opengov.se. This currently supports RDF metadata about datasets primarily based on the DC vocabulary e.g.: http://www.opengov.se/data/71/rdf/
  21. Rich Wolverton: Massachusetts State Gov't
  22. Richard Cyganiak: DERI. Co-founder of the LOD project. Authored the first version of the dcat vocabulary together with Fadi Maali. Also involved in the development of related vocabularies such as voiD and SDMX+RDF.
  23. Rufus Pollock
  24. Sandro Hawke: I'm here for logistical and process support (as W3C staff contact) and to offer what technical/design/implementation help I can (as a coder with lots of RDF experience)
  25. Thomas Bandholtz: need a data catalog for Linked Environment Data
  26. Vassilios Peristeras: Egovernment Cluster leader in DERI. I work with Fadi and Richard on governmental linked data.
  27. William Waites

Meetings

There are weekly teleconferences. Details are announced on the eGov IG mailing lists.

Links and Resources

General reading

Existing initiatives towards standards for data catalogs

Vocabularies for describing datasets and catalogs

  • DERI's dcat is a vocabulary for describing government datasets and data catalogs.
  • CTIC's Dataset Catalog Vocabulary is a simple vocabulary that provides classes for catalogs and datasets.
  • DERI's voiD is a vocabulary for describing datasets available in RDF format (but not for other formats).
  • SCOVO and SDMX-RDF are vocabularies for describing statistical datasets (but not other kinds of datasets).

Data catalogs in RDF

This section lists data catalogs that are already available in RDF.

Data catalogs in non-RDF formats

This section lists some examples of data catalogs that are available in other, non-RDF formats.