Open Data Standards

Data on the Web Best Practices WG co-chair Steve Adler writes:

Yesterday, Open Data reached a new milestone with the publication of the W3C’s first public working draft of its Data on the Web Best Practices. Open Data is spreading across the globe and transforming the way data is collected, published, and used. But all of this is happening without well-documented standards, leading to data published with inconsistent metadata, lacking official documentation of approval processes, corroboration of sources, and with conflicting terms of use. Often Open Data is hard to compare to other sources, even across administrative departments located in the same building. Open Data from more than one source has to be aggregated, normalized, cleansed, checked for quality, verified for authenticity, and validated for terms of use, at huge expense before it can be analyzed.

Continue reading Steve Adler’s blog post on Open Data Standards

CSV on the Web: new drafts including JSON and RDF conversion

The CSV on the Web Working Group has published four drafts. Alongside updates to the existing Model for Tabular Data and Metadata and Metadata Vocabulary for Tabular Data documents are two new documents that describe mechanisms for generating JSON and RDF from tabular data. This work builds on the earlier specifications, which describe higher-level metadata for tabular data such as CSV. We also anticipate the creation of a W3C Community Group for exploring more advanced mappings that exploit text-oriented templating systems such as Mustache, or W3C’s R2RML. The Working Group welcomes all feedback on its drafts, and in particular solicits review of the new specifications for generating JSON and RDF from tabular data.
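To give a feel for what generating JSON from tabular data involves, here is a minimal sketch in Python. It is not the conversion procedure defined in the Working Group's drafts: it simply assumes the first row holds column names and emits one JSON object per data row. The function name `csv_to_json_rows` and the sample data are made up for illustration.

```python
import csv
import io
import json


def csv_to_json_rows(csv_text):
    """Return one dict per data row, keyed by the header row (assumed to be the first line)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Made-up sample data, for illustration only.
sample = "name,visits\nalpha,3\nbeta,7\n"

rows = csv_to_json_rows(sample)
print(json.dumps(rows, indent=2))
```

Note that every value comes out as a string here. Deciding that `visits` is really a number is exactly the kind of interpretation that needs metadata beyond the CSV file itself.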

Spatial Data on the Web WG launched

It was 10 months ago today, 6th March 2014, that the Linking Geospatial Data workshop in London came to an end with Bart De Lathouwer of the OGC and I standing side by side announcing that our two organizations would work together to come up with some common standards. This was in response to the clear conclusions of the workshop that the two standards bodies needed to come together if the Web and GIS communities were to benefit from each other’s expertise, methods and data.

That was an easy thing for Bart and me to say but it proved rather more difficult to pull off. The two standards bodies are of the same age and have roughly the same number of members and more or less analogous missions – but we serve different communities and there are minor differences in the way we work. Thankfully the commitment to royalty free standards is mutual or we’d never have achieved anything. Both OGC and W3C are driven by our members and it has been tricky to ensure that membership privileges of both organizations have not been weakened by this collaboration. Still, we’re done: as of today, work begins on standards that will be published by both OGC and W3C. This quote from today’s joint press release sums up the ambition:

Spatial data is integral to many of our human endeavors and so there is a high value in making it easier to integrate that data into Web based datasets and services. For example, one can use a GIS system to find “the nearest restaurant” but today it is difficult to associate that restaurant with reviewer comments available on the Web in a scalable way. Likewise, concepts used widely on the Web such as “the United Kingdom” do not match the geographic concepts defined in a GIS system, meaning Web developers are missing out on valuable information available in GIS systems. Bridging GIS systems and the Web will create a network effect that enriches both worlds.

These are exactly the kind of issues being faced in the EU-funded SmartOpenData project that was behind the workshop originally.

I would personally like to record my thanks to OGC’s Denise McKenzie (@SpatialRed) for all her work in making this happen. It’s been a real privilege. Now we hand over to CSIRO’s Kerry Taylor and Google’s Ed Parsons as co-chairs of both the W3C and OGC instances of the Working Group to drive the work forward. Ingo Simonis (OGC) and I will act as Team Contacts but, as with all working groups, it’s the chairs, the editors, the participants and the community that do the real work.

Encouraging Commercial Use of Open Data

I went to Paris this week to give a talk at SemWeb.Pro, an event that, like SemTechBiz in Silicon Valley or SEMANTiCS in Germany/Austria, is firmly established in the annual calendar. These are events where businesses and other ‘real world’ users of Semantic Web and Linked Data technologies come together as distinct from events like ISWC/ESWC where the focus is more on academic research. Both types of event are essential for the health of data on the Web in my view.

Business use of open data, that is: freely available, openly licensed data, remains relatively low key. In Paris this week we heard about the BNF’s use of Linked Data which, as Emmanuelle Bermes told me, is driven largely by the gains in internal efficiency rather than providing open data for others. Sure, you can have the data if you want but how else would BNF curate and maintain rich data on an author like Jules Verne so easily without Linked Data?

Can we encourage the commercial use of open data? Former European Commissioner Neelie Kroes famously said that data is the new oil. There are claims that public sector information/open data is worth many billions of Euros to industry and that developers are itching to get their hands on the data, unleashing a tidal wave of creativity.

Hmmm …

We’ll be testing that at an event in Lisbon next month. There will be almost no presentations but a great many conversations held in unconference style looking at issues like licensing, running hackathons, start up funding, making data multilingual and more. The workshop is the latest being run under the Share-PSI 2.0 network, co-funded by the European Commission. It is perhaps no surprise therefore that one of the few plenary presentations will be from the Deputy Head of Unit at DG CONNECT’s Data Value Chain, Beatrice Covassi.

Check out the agenda and, if you can, please join us in Lisbon in December. Naturally, it’s free, but do please register.

The Importance of Use Cases Documents

The Data on the Web Best Practices WG is among those who will be meeting at this year’s TPAC in Santa Clara. As well as a chance for working group members to meet and make good progress, it’s a great opportunity for attendees to drop in to other working group meetings. Most working groups use the occasion to gather new perspectives on their work that can be really helpful in ensuring that the emerging standards meet the needs of the widest community.

In that context, Use Cases and Requirements documents are crucial. This is where working groups collect the evidence that informs their work. In a later discussion about whether a feature should or should not be included in a specification, the cry “where’s the use case for that?” is always the show stopper. Prove it’s needed and we’ll work on it. If it’s just a pet idea you have, we probably won’t.

The Data on the Web Best Practices Working Group has an incredibly broad charter. It says that the WG’s mission is:

  1. to develop the open data ecosystem, facilitating better communication between developers and publishers;
  2. to provide guidance to publishers that will improve consistency in the way data is managed, thus promoting the re-use of data;
  3. to foster trust in the data among developers, whatever technology they choose to use, increasing the potential for genuine innovation.

What the heck does that actually mean?

To find out we have gathered more than 20 use cases and derived requirements from them. The recently updated version of the UCR document was published this week so that when we gather at TPAC we can ask a simple question:

have we covered your use case?

If we have forgotten something – and it’s more than possible that we have – please tell us. You can do this by commenting on the document or, better still, sending in your own use case to (subscribe, archives).

Data Shapes Working Group Launched

It’s taken a while but we’ve finally been able to launch the RDF Data Shapes Working Group. As the charter for the new WG says, the mission is to produce a language for defining structural constraints on RDF graphs. In the same way that SPARQL made it possible to query RDF data, the product of the RDF Data Shapes WG will enable the definition of graph topologies for interface specification, code development, and data verification. In simpler terms, it will provide for RDF what XML Schema does for XML: a way to define cardinalities, lists of allowed values for properties and so on.
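As a rough illustration of what “cardinalities” and “lists of allowed values” mean in practice, here is a small Python sketch that checks one resource against a set of constraints. It is not the language the Working Group will produce, which does not exist yet. The triples, the `ex:` names, the `check_shape` function and the `min`/`max`/`allowed` keys are all hypothetical, and plain tuples stand in for a real RDF library.

```python
from collections import defaultdict


def check_shape(triples, subject, shape):
    """Return a list of problems found for `subject`; an empty list means it conforms."""
    # Collect the values of each property for the subject under test.
    values = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            values[p].append(o)

    problems = []
    for prop, rule in shape.items():
        found = values.get(prop, [])
        # Cardinality constraints (hypothetical "min" and "max" keys).
        if len(found) < rule.get("min", 0):
            problems.append(f"{prop}: too few values ({len(found)})")
        if "max" in rule and len(found) > rule["max"]:
            problems.append(f"{prop}: too many values ({len(found)})")
        # Allowed-value constraint (hypothetical "allowed" key).
        allowed = rule.get("allowed")
        if allowed is not None:
            for value in found:
                if value not in allowed:
                    problems.append(f"{prop}: value {value!r} is not allowed")
    return problems


# Hypothetical shape: exactly one title, exactly one status drawn from a fixed list.
shape = {
    "ex:title": {"min": 1, "max": 1},
    "ex:status": {"min": 1, "max": 1, "allowed": {"open", "closed"}},
}

# Made-up data: issue1 conforms, issue2 has no title and an unknown status.
triples = [
    ("ex:issue1", "ex:title", "Broken link"),
    ("ex:issue1", "ex:status", "open"),
    ("ex:issue2", "ex:status", "pending"),
]

print(check_shape(triples, "ex:issue1", shape))
print(check_shape(triples, "ex:issue2", shape))
```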

Can’t you do that already?

Of course you can.

You can do it with OWL, SPIN, Resource Shapes, Shape Expressions and any number of other ways, but the workshop held a year ago suggested that this landscape was less than satisfactory. Each of the technologies in that incomplete list has its adherents and implementation experience to draw on, but what is the best way forward? Does the technology to address 80% of use cases need to be as sophisticated as the technology to address all 100%?

As the charter makes clear, there are many different areas where we see this work as being important. Data ingestion is the obvious one (if I’m going to ingest and make sense of your RDF data then it must conform to a topology I define), but we also see it as being important for the generation of user interfaces that can guide the creation of data, such as metadata about resources being uploaded to a portal. Tantalizingly, knowing the structure of the graph in detail has the potential to lead to significant improvements in the performance of SPARQL engines.

The new WG will begin by developing a detailed Use Cases and Requirements document. More than anything, it is that work that will inform the future direction of the working group. If you’re a W3C Member, please join the working group. If not, please subscribe to the RDF Data Shapes public mailing list.

CSV on the Web: Metadata Vocabulary for Tabular Data and other updates

The CSV on the Web Working Group has published a First Public Working Draft of a Metadata Vocabulary for Tabular Data. This is accompanied by an update to the Model for Tabular Data and Metadata on the Web document, alongside the group’s recently updated Use Cases and Requirements document.

Validation, conversion, display and search of tabular data on the web requires additional metadata that describes how the data should be interpreted. The “Metadata vocabulary” document defines a vocabulary for metadata that annotates tabular data, at the cell, table or collection level, while the “Model” document describes a basic data model for such tabular data.
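The sketch below shows, under simplifying assumptions, why such metadata matters for validation: a CSV file alone cannot say that a column is meant to hold numbers. The `column_types` dictionary, the datatype names and the `validate` function are hypothetical stand-ins and do not follow the syntax of the Metadata Vocabulary draft.

```python
import csv
import io

# Hypothetical datatype names mapped to Python parsers.
PARSERS = {"string": str, "integer": int, "number": float}


def validate(csv_text, column_types):
    """Return (row_number, column, value) for every cell that does not match its declared type."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, datatype in column_types.items():
            value = row.get(column)
            if value is None:
                # The column is missing from this row altogether.
                errors.append((row_number, column, value))
                continue
            try:
                PARSERS[datatype](value)
            except ValueError:
                errors.append((row_number, column, value))
    return errors


# Made-up data and metadata, for illustration only.
sample = "station,reading\nA,1.5\nB,n/a\n"
column_types = {"station": "string", "reading": "number"}

errors = validate(sample, column_types)
print(errors)
```

Keeping the type declarations outside the CSV file means the data itself stays untouched, and the same description can serve validation, conversion and display alike.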

A large percentage of the data published on the Web is tabular data, commonly published as comma separated values (CSV) files. The Working Group welcomes comments on these documents and on their motivating use cases. The next phase of this work will involve exploring mappings from CSV into other popular representations. See the Working Group home page for more details or to get involved.

Linking Geospatial Data on the Web

It was a year ago that Alex Coley and I first started discussing the idea of a workshop around geospatial data and how it can link with other data sources on the Web. Alex is the person at the UK’s Department for Environment, Food and Rural Affairs (DEFRA) who is behind things like the Bathing Water Data explorer (developed by Epimorphics’ Stuart Williams) and the recent release of flood-related data. It didn’t take us long to bring in John Goodwin from Ordnance Survey, Ed Parsons from Google and the Open Geospatial Consortium’s Bart De Lathouwer and Athina Trakas. That was the team that, on behalf of the Smart Open Data project, I worked with to organize the Linking Geospatial Data workshop that took place in London in early March.

For various reasons it took until now to write and publish the report from the event (mostly my fault, mea culpa) but a lot has been going on in the background, only some of which is evident from the report, which focuses just on the workshop itself.

The workshop was really about two worlds: the geospatial information system world, effectively represented by OGC, and the Web world, represented by W3C. Both organizations operate in similar ways, have similar aims, and have more than 20 members in common. But we also both have 20 years of history and ‘ways of doing things.’ That has created a gap that we really want to fill in – not a huge one – but a gap nonetheless.

I hope the report gives a good flavor of the event – we were honored with contributors from places as distant as the Woods Hole Oceanographic Institution on the US East Coast, Natural Resources Canada, the National Institute of Advanced Industrial Science and Technology in Japan and the Australian government plus, of course, many European experts.

End result? I’m delighted to say that W3C and OGC are in advanced and very positive discussions towards an MoU that will allow us to operate a joint working group to tackle a number of issues that came up during the workshop. At the time of writing the charter for that joint WG is very much in its draft state but we’re keen to gather opinions, especially, of course, from:

  • OGC and W3C members who plan to join the working group;
  • developers expecting to implement the WG’s recommendations;
  • the closely related communities around the Geolocation Working Group and Web application developers who will want to access sources of richer data;
  • members of the wider community able to review and comment on the work as it evolves.

If you have comments on the charter, please send them to [subscribe] [archive].

Better yet, if you’re going to the INSPIRE conference in Aalborg next week, please join us for the session reviewing the workshop and the charter on Tuesday 17th at 14:00.

Data on the Web Best Practices UCR Published

The Data on the Web Best Practices WG is faced with a substantial challenge in assessing the scope of its work, which could be vast. What problems should it prioritize and what level of advice is most appropriate for it to develop in order to fulfill the mission of fostering a vibrant and sustainable data ecosystem on the Web? A significant amount of work has gone into collecting use cases from which requirements can be derived for all the WG’s planned deliverables. The Use Case & Requirements document, a first draft of which is published today, is expected to evolve significantly in future but already it provides a strong indication of the direction the WG is taking. Further use cases and comments are very welcome.

Congratulations and thanks in particular to the editors, Deirdre Lee and Bernadette Farias Lóscio, both first time W3C document editors, on getting this document out of the door.

Uses of Open Data Within Government for Innovation and Efficiency

You are warmly invited to participate in the first of a series of workshops being organized during this year and next by the Share-PSI 2.0 Thematic Network. Partners from 25 countries are working on issues surrounding the implementation of the European Commission's revised PSI Directive and this will feed into the Data on the Web Best Practices Working Group.

We're beginning with "Uses of Open Data Within Government for Innovation and Efficiency" – i.e. we're looking for cases where opening data has made it easier for government departments (local or national) to do their job better. What worked? What didn't work? What lessons can you share with others? What would most help you benefit from other people's work?

The workshop is taking place as part of the 5th Samos Summit on ICT-Enabled Governance, which means participants can look forward to spending time on a beautiful island in the Aegean Sea, formerly the home of Pythagoras.

Entry is by position paper, which should be a short description of what you'd like to talk about rather than a full academic paper.

Deadline for submissions: 13 April
Notification of acceptance: 1 May
Workshop: 30 June – 1 July

Full details at

Join us!

Phil Archer, W3C Data Activity Lead
On behalf of the Share-PSI partners

Share-PSI 2.0 is co-funded by the European Commission under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme.