W3C

A Week in Washington

I’ve just spent a week in Washington DC going from one meeting to another, all on the general topic of open government data. It’s been a very interesting trip.

The primary reason for going was to take part in a workshop organized by the Millennium Institute, with which we are partners in the (EU-funded) Crossover Project. I’ve blogged about that separately on the project Web site. After that event, I spent a couple of days meeting people concerned with open data in different agencies within the US Government.

A lot of the discussion with people in agencies like the Recovery Board, the FDIC and the EPA was around the issue of identifiers: identifiers for legal entities, contracts, places and facilities etc.

In this regard the development of Orgpedia is critical. Set up by Beth Noveck and Joel Gurin, this is a cross between Open Corporates and Wikipedia. What Open Corporates does is to take data from company registers – sometimes by agreement and data download, other times by screen scraping – and publish it as freely available open data. As of today it has details of over 51 million registered legal entities, each one with a stable, dereferenceable URI that returns data in any of a variety of formats (including RDF using the Registered Organization vocabulary). Work is under way to create links between those registered companies and so build up a picture of how different companies, or parts of the same company, interrelate.

Orgpedia is a linked data project planning to do something similar and I’m deeply relieved to see that cooperation between the two is well established. There are differences in approach, but both are either already crowdsourcing information about companies or planning to, although I think Orgpedia has a slightly broader focus. On the upside, that’s two organizations working towards a common goal, so that’s twice the power behind it. On the downside, it’s two separate URI sets – I hope the cooperation extends to including direct (owl:sameAs) links between the two. It’s good to see W3C Government Linked Data WG co-chair Bernadette Hyland involved with Orgpedia too.
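To make the owl:sameAs idea concrete, here is a minimal Python sketch of what one such link could look like when written out as a single N-Triples statement. The two company URIs are hypothetical placeholders on example.org, not real Open Corporates or Orgpedia identifiers, and the helper name `same_as_triple` is my own invention for illustration; only the owl:sameAs property URI is the real one from the OWL vocabulary.

```python
# The owl:sameAs property URI from the OWL vocabulary.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(uri_a: str, uri_b: str) -> str:
    """Return one N-Triples line stating that two URIs identify the same thing."""
    return f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> ."


# Hypothetical identifiers, used only to show the shape of the link.
open_corporates_uri = "http://example.org/opencorporates/company/12345"
orgpedia_uri = "http://example.org/orgpedia/entity/12345"

print(same_as_triple(open_corporates_uri, orgpedia_uri))
```

A statement like this, published by either project, would let a consumer that starts from one URI set find the matching record in the other.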

I also took the opportunity to catch up with Jeanne Holm, chief evangelist for data.gov and co-chair of W3C’s eGov Interest Group. Her focus at data.gov at the moment is on rebuilding the platform using a combination of Drupal and CKAN known as the Open Government Platform, OGPL. This isn’t just a US thing – it involves people like Somnath Chandra of the Indian government’s Information Technology and Services (TDIL) who also run the W3C India office, as well as others from around the world. Excitingly, data sets published through OGPL will all display their star rating according to the 5 Stars of Open Data scheme.

But what’s the bigger picture? After just 4 days I can’t possibly have a full perspective; all I can do is reflect on what I saw and heard from the people I met and offer a hypothesis, no more. If I’m wrong – please put me right through comments below.

Orgpedia is good. The EPA is using and publishing linked data and of course thousands of data sets are available from data.gov and so on. But I feel there are two things missing.

The first of these is a strengthened legal and policy framework. Despite years of work by the web community and allies in government, the goals of the open government data movement appear to remain elusive. Furthermore, the more important the data set is to the workings of government, the less likely it is to be fully open. Examples of data that isn’t fully open include spending, financial and regulatory data. Sharon Dawes of the Center for Technology in Government at the University at Albany likes to use a cartoon by Ted Goff to explain this. It shows two booths, one offering information at $1, and another offering information you need for $500 (the cartoon is copyright so I can’t include it directly).

A new industry-led organization headed by Hudson Hollister, the Data Transparency Coalition, is trying to offer the kind of leadership that appears to be lacking or hard to find. It is actively involved in revising the DATA Act, which didn’t quite make it through the previous session of Congress. That activity is aimed squarely at changing the law to encourage the standardization and publication of open government data – something akin to Europe’s Public Sector Information Directive (which is currently under revision).

As a European citizen and member of W3C staff it is not for me to make any comment on potential US legislation beyond a general comment that “this looks encouraging.”

That’s one aspect. But there’s another… there really doesn’t seem to be an active community within and between US Government agencies. The contrast with London is stark: there are weekly “tea camp meetings”. These came out of UKGovCamp – an annual event that people working in government IT give up their free time to attend. There are meetups, ad hoc working groups, official cross-departmental working groups and, above all, enthusiastic individuals within government making the case for open data and open standards. Civil servants who want open data to succeed, supported by a bunch of enthusiastic, talented, professional geeks. Who are the American counterparts to Britain’s John Sheridan, Paul Davidson and Andrew Stott, or the Netherlands’ Paul Suijkerbuijk, Sweden’s Peter Krantz, Australia’s Chris Beer, Belgium’s Raf Buyle, Greece’s Thodoris Papadopoulos and so on? Mike Pendleton and Dave Smith at the EPA, yes; Jeanne Holm clearly, yes… but these folk need support from across the Beltway.

As I came away from Washington last night, it seemed to me that there is a need to create a self-sustaining open data ecosystem that delivers on the promise of open data. Hudson Hollister’s work to encourage the successful passing of the revised DATA Act is part of that, but the rest is up to we the people.