Re: [okfn-help] Fwd: Universal distributed open government data catalog?

On 3 February 2010 01:24, Jonathan Gray <jonathan.gray@okfn.org> wrote:
[...]
> ---------- Forwarded message ----------
> From: Peter Krantz <peter.krantz@gmail.com>
> Date: Tue, Feb 2, 2010 at 9:24 PM
> Subject: Re: Universal distributed open government data catalog?
> To: Ed Summers <ehs@pobox.com>
> Cc: eGov IG <public-egov-ig@w3.org>

Distributing the catalogue is an interesting and important issue. We've
been interested in this since early on in developing CKAN
(http://knowledgeforge.net/ckan/trac/, http://ckan.net/). Now that we
have several live instances of CKAN running (data.gov.uk, ckan.net
itself, various national ones) we're especially keen to have a good way
to push and pull information between them (and to and from other
systems).

> On Tue, Feb 2, 2010 at 18:59, Ed Summers <ehs@pobox.com> wrote:
>>
>> My personal opinion is that a key ingredient to making this happen is
>> to publish dataset availability and metadata using a syndicated feed
>> (Atom and/or RSS).
>
> I second that suggestion. By using e.g. Atom you also get a way of
> receiving updates about changed datasets in a machine readable way.

Currently all changesets (revisions) on CKAN get published as an Atom feed:

http://www.ckan.net/revision/list?format=atom&days=1

Unfortunately the revision page that it links to is currently HTML, but
it would be very easy to add revision objects to the JSON API
(packages, tags etc. are already in there), which would then provide an
automated way to pull changesets from a CKAN system.
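
As a rough illustration of the pull side, here is a minimal sketch of
reading entries from an Atom feed of revisions using only the Python
standard library. The sample feed, the urn:example ids and the function
names parse_revisions and newer_than are made up for illustration; the
entries a real CKAN feed returns may carry different fields.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up sample in the general shape of an Atom feed; the entries that a
# real CKAN revision feed returns may carry different fields.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example revision feed</title>
  <entry>
    <id>urn:example:revision:1</id>
    <title>Edited package example-a</title>
    <updated>2010-02-02T09:00:00Z</updated>
    <link href="http://example.org/revision/1"/>
  </entry>
  <entry>
    <id>urn:example:revision:2</id>
    <title>Edited package example-b</title>
    <updated>2010-02-03T10:30:00Z</updated>
    <link href="http://example.org/revision/2"/>
  </entry>
</feed>"""


def parse_revisions(feed_xml):
    """Return one dict per Atom entry found in the feed document."""
    root = ET.fromstring(feed_xml)
    revisions = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        revisions.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            "href": link.get("href") if link is not None else None,
        })
    return revisions


def newer_than(revisions, last_seen):
    """Keep revisions updated after last_seen.

    Plain string comparison is enough here only because every timestamp
    in the sample uses the same UTC 'Z' format.
    """
    return [r for r in revisions if r["updated"] > last_seen]


if __name__ == "__main__":
    for rev in newer_than(parse_revisions(SAMPLE_FEED), "2010-02-03T00:00:00Z"):
        print(rev["id"], rev["href"])
```

A consumer would remember the newest "updated" value it has seen and
pass it as last_seen on the next poll.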

> We used Atom as the carrier for RDF data items in a project I was
> involved in (Swedish national legal information system). There are
> many benefits and it is easy to get started as there are many tools to
> create and consume Atom feeds.

Second that :)

By the way, we've just completed v2 of the conversion of all the CKAN
data to RDF, and that can be found here: http://semantic.ckan.net/. We
still need to do some work to integrate this properly into the main
ckan.net site (so that, e.g., requests for RDF representations of CKAN
packages send you to semantic.ckan.net).

> If necessary you could extend Atom entries with information about the
> specific datasets, or you could just use Atom to carry the pointer to
> the rdf for a specific data set (described with whatever vocabularies
> necessary). Or use both approaches simultaneously.

I think there is also a need, at least in the long run, to deal with
changesets (not just the objects themselves), since this would permit
one to do distributed data versioning and editing in an analogous
(though not necessarily similar) way to distributed versioning/editing
for code (git, mercurial etc).
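
To make the changeset idea a bit more concrete, here is a toy sketch
that treats a package's metadata as a flat dict, records a changeset as
the fields that differ, and applies it to another copy only if that copy
still holds the expected old values. This is an illustration under those
assumptions, not how CKAN stores revisions; make_changeset,
apply_changeset and the field names are hypothetical.

```python
_MISSING = object()  # marks "field not present", distinct from None


def make_changeset(old, new):
    """Return {field: (old_value, new_value)} for every field that differs."""
    changes = {}
    for field in set(old) | set(new):
        before = old.get(field, _MISSING)
        after = new.get(field, _MISSING)
        if before != after:
            changes[field] = (before, after)
    return changes


def apply_changeset(record, changeset):
    """Apply a changeset to a copy of record.

    Raises ValueError if the record no longer holds the value the
    changeset was made against (a very crude form of conflict detection).
    """
    result = dict(record)
    for field, (before, after) in changeset.items():
        if result.get(field, _MISSING) != before:
            raise ValueError(f"conflict on field {field!r}")
        if after is _MISSING:
            del result[field]
        else:
            result[field] = after
    return result


if __name__ == "__main__":
    v1 = {"title": "Vehicle licence data", "license": "unknown"}
    v2 = {"title": "Vehicle licence data", "license": "example-open-licence",
          "tags": "transport"}
    changes = make_changeset(v1, v2)
    replica = dict(v1)  # another instance holding the same starting state
    print(apply_changeset(replica, changes) == v2)
```

Shipping the (old, new) pairs rather than the whole object is what lets
the receiving side notice that it has diverged.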

> With regards to open data my experience is that you have some basic
> information you want to capture:
>
> Title (e.g. "Vehicle licence data")
> Summary
> Publisher
> Categories
> License
> Timestamps
> Link to about API page
> Link to actual data set (if available)
> Link to RDF data about dataset (if available)
>
> Most of these are already available in Atom
> (http://www.ietf.org/rfc/rfc4287.txt).
>
> You could add licensing info as well if there is a simple way of capturing that.
>
> Anyone who would like to contribute ideas for this in practice is
> welcome to join the opengov catalog project here:
> http://code.google.com/p/opengov-catalog/

Likewise, info on the CKAN (Comprehensive Knowledge Archive Network!)
code can be found here:

<http://knowledgeforge.net/ckan/trac/>

Peter: I know we're already in contact regarding CKAN and
opengov-catalog stuff. Using an Atom-based format like the one you
outline above might be a good way to start pushing and pulling info
between the two systems?
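
For the sake of discussion, here is one possible mapping of the fields
listed above onto a single Atom entry, built with the Python standard
library. The mapping itself is an assumption rather than anything
agreed: publisher goes into atom:author, the licence into a
rel="license" link (RFC 4946), the data set into a rel="enclosure" link,
and the RDF description into a rel="alternate" link. dataset_entry and
the example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialise Atom as the default namespace


def _sub(parent, tag, text=None, **attrs):
    """Append an Atom-namespaced child element and return it."""
    child = ET.SubElement(parent, f"{{{ATOM}}}{tag}", attrs)
    if text is not None:
        child.text = text
    return child


def dataset_entry(entry_id, title, summary, publisher, categories, updated,
                  license_url=None, data_url=None, rdf_url=None):
    """Return an Atom <entry> describing one data set, as a string."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    _sub(entry, "id", entry_id)
    _sub(entry, "title", title)
    _sub(entry, "summary", summary)
    _sub(entry, "updated", updated)
    author = _sub(entry, "author")  # assumption: publisher as atom:author
    _sub(author, "name", publisher)
    for term in categories:
        _sub(entry, "category", term=term)
    if license_url:
        _sub(entry, "link", rel="license", href=license_url)
    if data_url:
        _sub(entry, "link", rel="enclosure", href=data_url)
    if rdf_url:
        _sub(entry, "link", rel="alternate",
             type="application/rdf+xml", href=rdf_url)
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(dataset_entry(
        entry_id="urn:example:dataset:vehicle-licence",
        title="Vehicle licence data",
        summary="Example summary.",
        publisher="Example Agency",
        categories=["transport"],
        updated="2010-02-03T12:00:00Z",
        license_url="http://example.org/licence",
        data_url="http://example.org/data.csv",
        rdf_url="http://example.org/data.rdf",
    ))
```

Anything that does not fit these elements could go into extension
elements or into the linked RDF, as Peter suggests above.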

Regards,

Rufus Pollock
-- 
Open Knowledge Foundation
Promoting Open Knowledge in a Digital Age
http://www.okfn.org/ - http://blog.okfn.org/

Received on Thursday, 4 February 2010 11:20:37 UTC