Data Catalog Vocabulary
From W3C eGovernment Wiki
Data Catalog Vocabulary project
This is a project within the W3C Interest Group on eGovernment.
The project is currently starting up; the first teleconference will take place on Thursday, April 15th.
Context: Government data catalogs
Governments produce large amounts of valuable data as part of their daily operations and decision-making. This data can be useful to many citizens and organizations, and ultimately it is they who paid for producing it. Governments increasingly recognize this and are starting to make this data publicly available through one-stop portals called data catalogs, such as data.gov, data.gov.uk, statcentral.ie and many others. (@@@ link to a list somewhere?)
The goals of this group are:
- To propose a unified format for publishing the contents of such data catalogs.
- To set up a demonstrator based on the contents of one or more existing catalogs.
- To provide support to initial implementors of the format.
The format should support the querying, federation, consumption, decentralised publishing, and archival of these valuable data assets.
The dcat proposal will be used as a starting point.
The exchange format will be developed primarily as an RDF Schema vocabulary, although the group might explore other data syntaxes as well.
The main focus of this group is on catalogs of government data, such as data.gov and similar catalogs. Nevertheless, it would be desirable to make the format applicable to data catalogs in other domains as well (e.g., catalogs of scientific data on climate change), if this is possible without compromising the main use case.
The focus is on data catalogs, that is, on collections of metadata about datasets. Expressing the actual data inside the datasets as RDF is out of scope.
The focus is on existing data catalogs and how their contents can be expressed in a unified way.
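To make "expressed in a unified way" a little more concrete, here is a minimal Python sketch of the normalisation step: records exported by two existing catalogs, which name the same fields differently, are mapped onto shared Dublin Core and dcat properties. The catalog names and field names are hypothetical, and the dcat namespace is assumed to be the one used by DERI's draft (http://vocab.deri.ie/dcat#). In line with the scope above, only metadata fields are handled; the datasets themselves are never touched.

```python
DCTERMS = "http://purl.org/dc/terms/"
DCAT = "http://vocab.deri.ie/dcat#"  # assumed: namespace of DERI's dcat draft

# Hypothetical field names used by two existing catalogs, mapped to shared properties.
FIELD_MAPS = {
    "catalog_a": {
        "name": DCTERMS + "title",
        "notes": DCTERMS + "description",
        "tags": DCAT + "keyword",
    },
    "catalog_b": {
        "title": DCTERMS + "title",
        "summary": DCTERMS + "description",
        "keywords": DCAT + "keyword",
    },
}


def unify(source: str, record: dict) -> dict[str, list[str]]:
    """Map one catalog record onto {property URI: [values]}."""
    mapping = FIELD_MAPS[source]
    unified: dict[str, list[str]] = {}
    for field, value in record.items():
        prop = mapping.get(field)
        if prop is None:
            continue  # fields without a mapping are dropped in this sketch
        values = value if isinstance(value, list) else [value]
        unified.setdefault(prop, []).extend(values)
    return unified


a = unify("catalog_a", {"name": "Air quality 2009", "tags": ["air", "pollution"]})
b = unify("catalog_b", {"title": "Air quality 2009", "keywords": "air"})
print(a[DCTERMS + "title"] == b[DCTERMS + "title"])  # True: same property, same value
```

Unmapped fields are simply dropped here; a real mapping would more likely keep them under a catalog-specific property or flag them for review.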
The expected deliverables are:
- A best-practice guide on the topic of expressing the contents of data catalogs in RDF using vocabularies such as Dublin Core and SKOS
- An RDF Schema vocabulary that contains additional required terms not yet found in existing vocabularies such as Dublin Core and SKOS
This list is still subject to discussion, and further deliverables could emerge in the course of the group's work.
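As a rough illustration of the first deliverable, the following Python sketch writes a catalog and one of its datasets as N-Triples, using Dublin Core terms for the descriptive metadata and dcat for the catalog structure. It uses only the standard library; in practice an RDF library such as RDFLib would handle serialisation. The example.org URIs are made up, and the dcat namespace and terms used here (Catalog, Dataset, dataset, keyword) are assumed from DERI's draft vocabulary and may change as the group's work proceeds.

```python
from typing import Optional

DCTERMS = "http://purl.org/dc/terms/"
DCAT = "http://vocab.deri.ie/dcat#"  # assumed: namespace of DERI's dcat draft
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(uri: str) -> str:
    """Wrap an absolute URI in angle brackets (no validation in this sketch)."""
    return f"<{uri}>"


def literal(text: str, lang: Optional[str] = None) -> str:
    """Quote a string as an N-Triples literal, optionally language-tagged."""
    # Escape the backslash first so the later escapes are not doubled.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def triple(s: str, p: str, o: str) -> str:
    return f"{s} {p} {o} ."


def describe(catalog: str, dataset: str, title: str, keywords: list[str]) -> list[str]:
    """Describe one dataset and link it from its catalog."""
    lines = [
        triple(iri(catalog), iri(RDF_TYPE), iri(DCAT + "Catalog")),
        triple(iri(catalog), iri(DCAT + "dataset"), iri(dataset)),
        triple(iri(dataset), iri(RDF_TYPE), iri(DCAT + "Dataset")),
        triple(iri(dataset), iri(DCTERMS + "title"), literal(title, "en")),
    ]
    # Keywords are plain, untagged literals here.
    lines += [triple(iri(dataset), iri(DCAT + "keyword"), literal(k)) for k in keywords]
    return lines


nt = describe("http://example.org/catalog",
              "http://example.org/dataset/air-quality",
              'Air quality "2009"', ["air", "pollution"])
print("\n".join(nt))
```

Because each line is a complete triple, output from several catalogs can be concatenated and loaded into one store, which is one simple route to the federation use case mentioned above.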
Participants
This list was seeded from the initial participation survey, and some names have been added as people showed up in meetings. Please feel free to edit your own entry below, or to add or remove yourself as appropriate. To get write access to the wiki, you currently must formally join the group. If you need to change this and can't get write access, e-mail firstname.lastname@example.org.
- Andrew Houghton: This effort has broader implications for the library community in describing their catalogs and making them available to the Semantic Web community. My interest is to follow and contribute to this effort and to bring a broader perspective to it.
- Brand Niemann: US EPA
- Cory Casanave: Model Driven Solutions
- Craig Norvell: I work at Franz Inc (AllegroGraph), and we are interested in participating in the development of this work on government data.
- Dan Brickley
- Dan Thomas: Washington DC Gov't
- David James: Sunlight Foundation
- Ed Summers: US Library of Congress. (I am a software developer working at the Library of Congress. I participated in the W3C working group on SKOS, where my contribution was mainly reviewing documents, plus a demonstration which now lives at id.loc.gov.)
- Erik Wilde: UC Berkeley. (core web architecture background and interest; interested in using lightweight approaches for exposing data so that people can use and reuse them with the simplest possible set of technologies and tools, so that this can be done on the largest possible variety of platforms and by the largest possible set of people.)
- Fadi Maali: DERI.
- George Thomas: US Dept of Health and Human Services. (Interested in using dcat/voiD for the data.gov work with the US government.)
- Jon Phipps
- Kate Geyer: Working on the Massachusetts Open Data Initiative. We are moving to open standards for our datasets and a linked open data model.
- Li Ding: RPI. (we are working on linking government data and building appealing demos to drive the consumption of linked government data at Rensselaer Polytechnic Institute.)
- Libby Miller
- Luigi Montanez: Sunlight Foundation
- Martín Álvarez: CTIC Spain. We maintain a list of the public open data catalogs.
- Niklas Lindström: I'm using and contributing to open source libraries for RDF (e.g. RDFLib in Python), and am currently developing the legal information system in Sweden, employing RDF and linked data principles for collecting documents and data produced by about a hundred agencies (using RDF and Atom).
- Paul Hermans
- Peter Krantz: Developer of the opengov catalog platform (http://code.google.com/p/opengov-catalog/), which is used by opengov.se. It currently supports RDF metadata about datasets, primarily based on the DC vocabulary, e.g.: http://www.opengov.se/data/71/rdf/
- Rich Wolverton: Massachusetts State Gov't
- Richard Cyganiak: DERI. Co-founder of the LOD project. Authored the first version of the dcat vocabulary together with Fadi Maali. Also involved in the development of related vocabularies such as voiD and SDMX+RDF.
- Rufus Pollock
- Sandro Hawke: I'm here for logistical and process support (as W3C staff contact) and to offer what technical/design/implementation help I can (as a coder with lots of RDF experience)
- Thomas Bandholtz: I need a data catalog for Linked Environment Data.
- Vassilios Peristeras: eGovernment Cluster leader at DERI. I work with Fadi and Richard on governmental linked data.
- William Waites
There are weekly teleconferences. Details are announced on the eGov IG mailing lists.
Links and Resources
- Publishing Open Government Data (W3C working draft)
Existing initiatives towards standards for data catalogs
- Sunlight Labs: Government Data Catalog Guidelines — collection of links, list of suggested metadata terms
- Sunlight Labs: Drafting Guidelines for Government Data Catalogs — proposes an API for accessing catalogs (based on ROA with JSON/XML/CSV), list of metadata terms, Atom-based updates
- DERI's dcat is an RDF Schema vocabulary for describing government datasets and data catalogs.
- RPI's data.gov vocabulary proposal — focuses on data.gov, has general recommendations and design principles for vocabularies
Vocabularies for describing datasets and catalogs
- DERI's dcat is a vocabulary for describing government datasets and data catalogs.
- CTIC's Dataset Catalog Vocabulary is a simple vocabulary that provides classes for catalogs and datasets.
- DERI's voiD is a vocabulary for describing datasets available in RDF format (but not for other formats).
- SCOVO and SDMX-RDF are vocabularies for describing statistical datasets (but not other kinds of datasets).
Data catalogs in RDF
This section lists data catalogs that are already available in RDF.
- Peter Krantz's opengov.se and opengov-catalog — uses Dublin Core, FOAF and Atom
- (not really released yet) CKAN — uses dcat, contains data.gov.uk and many other datasets, has SPARQL endpoint
- CTIC's Datos de Asturias — uses voiD, everything (including the actual datasets) is RDF
- CTIC's Public Sector Dataset Catalogs — “meta-catalog”, a list of catalogs; uses the CTIC vocabulary, Dublin Core
- DERI's DERI dcat Demonstrator — contains data.gov, data.australia.gov.au, datasf.org, data.london.gov.uk; uses dcat; has SPARQL endpoint
- RDFa in data.gov.uk — simple Dublin Core descriptions of all datasets
- RDFa in data.australia.gov.au - uses mainly Dublin Core; the Creative Commons RDF vocabulary is used for license information. Also available as RSS 1.0 feeds.
- RPI has the data.gov catalog in RDF
- @@@ Old CKAN mapping (semantic.ckan.net)
- @@@ others?
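Several of the catalogs listed above offer a SPARQL endpoint. The sketch below shows, in Python with only the standard library, how a client might query such an endpoint over the SPARQL Protocol (an HTTP GET with a `query` parameter) and flatten the standard JSON results. The query assumes the catalog uses dcterms:title for dataset titles, and the endpoint URL passed to `run()` is a placeholder; substitute the real endpoint of the catalog in question.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the target catalog describes datasets with dcterms:title.
QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE { ?dataset dcterms:title ?title } LIMIT 10
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def rows(results: dict) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} dicts."""
    variables = results["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}  # unbound vars are skipped
        for binding in results["results"]["bindings"]
    ]


def run(endpoint: str, query: str = QUERY) -> list[dict[str, str]]:
    """Send the query and return one dict per result row (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return rows(json.load(resp))
```

Calling `run()` with an endpoint URL returns one dictionary per result row, keyed by the query's variable names (`dataset`, `title`). Some endpoints prefer POST for long queries; this sketch only covers GET.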
Data catalogs in non-RDF formats
This section lists some examples of data catalogs that are available in other, non-RDF formats.