End of Year Bonanza!

Three of our data-centric Working Groups have rounded off their year and published new documents today.

First of all, congratulations are due to the CSV on the Web Working Group whose work has reached Recommendation status. That means they have successfully defined and proved technologies for describing tabular data, and for converting that data into either JSON or RDF. Thanks are due in particular to the co-chairs, Jeni Tennison and Dan Brickley, and to the WG stalwarts Jeremy Tandy, Gregg Kellogg and my colleague Ivan Herman.

The Data on the Web Best Practices Working Group has been making significant progress in the latter half of 2015 leading today to the publication of a substantially updated version of its primary document and one of its two vocabularies, the Data Quality Vocabulary. The former codifies the approach data publishers should take to encourage the maximum reuse of their work while the latter provides a framework in which assertions can be made about a dataset’s quality and appropriateness for given tasks. The next iteration of DWBP’s other vocabulary, the Dataset Usage Vocabulary, is expected very early in the new year.

Finally, the Spatial Data on the Web WG has updated its (extensive) use case document. This will underpin three Recommendations and a best practice document – there is a lot of ground to cover in this WG that sees W3C collaborating directly with our sister Standards Development Organization, the Open Geospatial Consortium. That working group’s best practices document is close to being ready for formal publication (by both W3C and OGC) as a First Public Working Draft.



As many people who work in the field will know, the 2007 INSPIRE Directive tasks European Union Member States with harmonizing their spatial and environmental data. The relevant department of the European Commission, the JRC, has led the definition of a complex data model that is broken down into various themes. Naturally enough, the data is modeled in UML and the implementations are based largely on OGC standards that make use of XML/GML etc. However, a number of projects are experimenting with using the model in Linked Data environments. These include GeoKnow, MELODIES and SmartOpenData (SmOD) in which W3C’s European host, ERCIM, is a partner. This project has been instrumental in establishing the Spatial Data on the Web Working Group that is now racing towards the first formal publication of its use cases and requirements document (like most W3C WGs, the document is being developed in full public view on GitHub).

Like GeoKnow, when SmOD first started to consider using INSPIRE data in RDF we felt duty bound to try and represent the whole of the detailed data model. However, as the project enters its final phase, much of this has been rejected in favor of a simpler approach that is more in line with ‘Linked Data thinking’ and no longer attempts to recreate the full scope of INSPIRE in RDF. There are two principal motivations for this:

  1. Experience: when creating Linked Data for use in a range of pilot projects within SmOD, a slavish following of INSPIRE proved burdensome and unhelpful. The aim of taking a different approach (Linked Data) must be to gain some benefit from that approach not available from the original (XML/GML), recognizing that the original will offer features not available in the derived work.
  2. The publication of the Study on RDF and PIDs for INSPIRE by Diederik Tirry and Danny Vandenbroucke under ISA Action 1.17: A Reusable INSPIRE Reference Platform (ARE3NA). This report summarized work by three experts: Clemens Portele, Linda van den Brink and Stuart Williams. The summary proved extremely useful to the current project partners. All documents from that work are available and remain marked as ‘for review’ although a conversation with JRC staff suggests that no further work is foreseen on these documents.

One call to action from the ARE3NA work was that the INSPIRE Registry be extended to include SKOS concept schemes in addition to the formats already offered. This has been done and allows SmOD to use the registry’s persistent URIs as identifiers for many of the concepts that are important in the current work.

It is this combination of factors that is behind the final model being at once simpler and much more comprehensive than the initial one in its coverage of the INSPIRE themes. For example, the three classes originally associated specifically with representing Geographical Names have disappeared altogether to be replaced by rdfs:label!
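
To illustrate the kind of simplification involved, here is a minimal sketch in Python that writes out Turtle for a place whose names are given as plain, language-tagged rdfs:label values rather than as instances of dedicated name classes. The resource URI and the names are hypothetical, chosen purely for illustration; they are not taken from the SmOD vocabularies.

```python
# Sketch: a place's names expressed as language-tagged rdfs:label values
# rather than as instances of dedicated geographical-name classes.
# The subject URI and labels below are hypothetical examples.

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def labels_to_turtle(subject_uri, labels):
    """Return Turtle giving subject_uri one rdfs:label per (text, lang) pair."""
    lines = [f"@prefix rdfs: <{RDFS}> .", ""]
    objects = [f'"{escape_literal(text)}"@{lang}' for text, lang in labels]
    lines.append(f"<{subject_uri}> rdfs:label " + ",\n    ".join(objects) + " .")
    return "\n".join(lines)


turtle = labels_to_turtle(
    "http://example.org/place/lisbon",
    [("Lisbon", "en"), ("Lisboa", "pt")],
)
print(turtle)
```

Each additional name in another language is just one more label on the same resource, which is roughly why separate classes for names become unnecessary.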

The work is fully available via the most stable namespace available to the project, namely http://www.w3.org/2015/03/inspire/.

Only the INSPIRE themes relevant to SmOD have been modeled in RDF as part of this work and, even within those themes, only the classes and properties needed in the project have been defined. Therefore, the ‘SmOD vocabularies’ should be seen only as a beginning.

Can they be added to?

Certainly. Indeed, that’s the express hope.


Ideally, the JRC itself will publish RDF vocabularies that mirror the INSPIRE model. In that eventuality, the ones on w3.org should almost certainly be deprecated. However, until that happens, the best vehicle W3C has for gathering people together with a common interest is the Community Group system. This is open to W3C Members and non-members alike and a Community Group could act as the forum for discussion of INSPIRE in RDF with the ability to add new terms, clarify existing ones, add new translations and, if needed, deprecate old terms.

I already have some expressions of interest in this but would like to gather more before proposing the CG be formed. If this interests you, please get in touch.

Before closing, I want to thank colleagues in the SmartOpenData project, notably Tatiana Tarasova and Jindřich Mynarz, for their help, advice and expertise.

CSV on the Web: Seeking comments and implementations

The CSV on the Web Working Group has just published a new set of Working Drafts, which the group considers feature complete and implementable.

The group are keen to get comments on these specifications, either as issues on the Group’s GitHub repository or by posting to public-csv-wg-comments@w3.org.

The CSV on the Web Working Group would also like to invite people to start implementing these specifications and to donate their test cases into the group’s test suite. Building this test suite, as well as responding to comments, will be the group’s focus over the next couple of months.

Linked Data Platform WG Open Meeting

A special open meeting of the W3C Linked Data Platform (LDP) Working Group will be held to discuss potential future work for the group. The deliverable from the meeting will be a report that the LDP WG will take into consideration as it plans its way forward.

LDP offers an alternative vision to data lockdown, providing a clean separation between software and data, so access to the data is simple and always available. If you run a business, using LDP means your vital data isn’t locked out of your reach anymore. Instead, every LDP data server can be accessed using a standard RESTful API, and every LDP-based application can be integrated. If you develop software, LDP gives you a chance to focus on delivering value while respecting your customer’s overall needs. If you are an end user, LDP software promises to give you choice and freedom in the new online world.

So how will this vision become reality? LDP 1.0 has recently become a W3C Recommendation, but there’s still a lot of work to do. Come join the conversation about where we are and what happens next, on April 21st in San Francisco.

See the event wiki page for details.

A writable Web based on LDP

Last week marked the culmination of almost three years of hard work coming out of the Linked Data Platform WG, resulting in the publication of the Linked Data Platform 1.0 as a W3C Recommendation. For those of you not yet familiar with LDP, this specification defines a set of rules for HTTP operations on Web resources, some based on RDF, to provide an architecture for read-write Linked Data on the Web. The most important feature of LDP is that it provides us with a standard way of RESTfully writing resources (documents) on the Web [examples], without having to rely on conventions (APIs) based around POST and PUT.
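
As a rough illustration of what writing a resource looks like, the sketch below prepares, but does not send, the kind of HTTP request a client might use to ask an LDP container to create a new RDF document. The container URL, the slug and the Turtle body are all hypothetical; the Slug and Link headers follow the usage described in LDP 1.0, and the code uses only Python's standard library.

```python
# Sketch: preparing (not sending) an HTTP request that asks an LDP
# container to create a new RDF resource. The container URL, slug and
# Turtle body are hypothetical examples.
import urllib.request

LDP_RESOURCE = "http://www.w3.org/ns/ldp#Resource"


def build_create_request(container_url, turtle_body, slug=None):
    """Build a POST request for creating a resource in an LDP container."""
    headers = {
        "Content-Type": "text/turtle",
        # Declares the interaction model requested for the new resource.
        "Link": f'<{LDP_RESOURCE}>; rel="type"',
    }
    if slug is not None:
        # Slug is only a hint; the server chooses the final URI.
        headers["Slug"] = slug
    return urllib.request.Request(
        container_url,
        data=turtle_body.encode("utf-8"),
        headers=headers,
        method="POST",
    )


body = '<> <http://purl.org/dc/terms/title> "My first note" .'
request = build_create_request("http://example.org/notes/", body, slug="note1")
print(request.get_method(), request.full_url)
for name, value in request.header_items():
    print(f"{name}: {value}")
```

Passing the request to urllib.request.urlopen would actually send it; a conforming server is expected to answer with 201 Created and a Location header naming the new resource. The point is that the same request shape should work against any LDP 1.0 server, rather than being tied to one application's API.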

In practice, LDP should allow developers to take full advantage of the decentralized nature of the Web. Web apps now have a way to read and write data to any server that has implemented LDP 1.0. This technology has the potential to radically transform the way we are used to viewing Web application development, by decoupling the app (user interface) from the data it produces/consumes. We hope it will usher in a wave of innovation in terms of UI and app quality, enabling developers to easily “fork” apps and seamlessly add new features, since the data model is not directly impacted by the fork.

Because this is quite a radical change from the so-called “silo” apps we are used to, we now face a lot of challenges, such as paging large resources, optimizing write operations by patching resources and, especially, handling decentralized personal identities and access control. The LDP working group has plans to address these challenges in the coming year. Please consider joining the group if you are doing relevant work in those directions.

Open Data Standards

Data on the Web Best Practices WG co-chair Steve Adler writes:

Yesterday, Open Data reached a new milestone with the publication of the W3C’s first public working draft of its Data on the Web Best Practices. Open Data is spreading across the globe and transforming the way data is collected, published, and used. But all of this is happening without well-documented standards, leading to data published with inconsistent metadata, lacking official documentation of approval processes, corroboration of sources, and with conflicting terms of use. Often Open Data is hard to compare to other sources, even across administrative departments located in the same building. Open Data from more than one source has to be aggregated, normalized, cleansed, checked for quality, verified for authenticity, and validated for terms of use, at huge expense before it can be analyzed.

Continue reading Steve Adler’s blog post on Open Data Standards

CSV on the Web: new drafts including JSON and RDF conversion

The CSV on the Web Working Group has published four drafts. Alongside updates to the existing Model for Tabular Data and Metadata and Metadata Vocabulary for Tabular Data documents are two new documents, which describe mechanisms for generating JSON and RDF from tabular data. This work builds on the earlier specifications which describe higher level metadata for tabular data such as CSV. We also anticipate the creation of a W3C Community Group for exploring more advanced mappings that exploit text-oriented templating systems such as Mustache, or W3C’s R2RML. The Working Group welcomes all feedback on its drafts, and in particular solicits review of the new specifications for generating JSON and RDF from tabular data.
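
The general idea behind the JSON half of this can be sketched in a few lines of Python. This is not the algorithm defined in the drafts: the column description format, the datatype names and the sample data are hypothetical and far simpler than the Working Group's metadata vocabulary. It only shows how a description of the columns lets a converter produce typed JSON values instead of bare strings.

```python
# Sketch: turning CSV text into JSON objects, guided by a small
# column description. The metadata layout and sample data are
# hypothetical and much simpler than the Working Group's drafts.
import csv
import io
import json

CSV_TEXT = "tree,height_m\noak,12\nelm,9\n"

# Hypothetical column metadata: name -> datatype.
COLUMNS = {"tree": "string", "height_m": "integer"}

CASTS = {"string": str, "integer": int, "number": float}


def rows_to_objects(csv_text, columns):
    """Yield one dict per CSV row, casting cells by declared datatype."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield {
            name: CASTS[columns.get(name, "string")](value)
            for name, value in row.items()
        }


objects = list(rows_to_objects(CSV_TEXT, COLUMNS))
print(json.dumps(objects, indent=2))
```

Columns that the description does not mention are left as strings, so the description can be added to gradually.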

Spatial Data on the Web WG launched

It was 10 months ago today, 6th March 2014, that the Linking Geospatial Data workshop in London came to an end with Bart De Lathouwer of the OGC and me standing side by side announcing that our two organizations would work together to come up with some common standards. This was in response to the clear conclusions of the workshop that the two standards bodies needed to come together if the Web and GIS communities were to benefit from each other’s expertise, methods and data.

That was an easy thing for Bart and me to say but it proved rather more difficult to pull off. The two standards bodies are of the same age and have roughly the same number of members and more or less analogous missions – but we serve different communities and there are minor differences in the way we work. Thankfully the commitment to royalty free standards is mutual or we’d never have achieved anything. Both OGC and W3C are driven by our members and it has been tricky to ensure that membership privileges of both organizations have not been weakened by this collaboration. Still, we’re done: as of today, work begins on standards that will be published by both OGC and W3C. This quote from today’s joint press release sums up the ambition:

Spatial data is integral to many of our human endeavors and so there is a high value in making it easier to integrate that data into Web based datasets and services. For example, one can use a GIS system to find “the nearest restaurant” but today it is difficult to associate that restaurant with reviewer comments available on the Web in a scalable way. Likewise, concepts used widely on the Web such as “the United Kingdom” do not match the geographic concepts defined in a GIS system, meaning Web developers are missing out on valuable information available in GIS systems. Bridging GIS systems and the Web will create a network effect that enriches both worlds.

These are exactly the kinds of issues being faced in the EU-funded SmartOpenData project that was behind the workshop originally.

I would personally like to record my thanks to OGC’s Denise McKenzie (@SpatialRed) for all her work in making this happen. It’s been a real privilege. Now we hand over to CSIRO’s Kerry Taylor and Google’s Ed Parsons as co-chairs of both the W3C and OGC instances of the Working Group to drive the work forward. Ingo Simonis (OGC) and I will act as Team Contacts but, as with all working groups, it’s the chairs, the editors, the participants and the community that do the real work.

Encouraging Commercial Use of Open Data

I went to Paris this week to give a talk at SemWeb.Pro, an event that, like SemTechBiz in Silicon Valley or SEMANTiCS in Germany/Austria, is firmly established in the annual calendar. These are events where businesses and other ‘real world’ users of Semantic Web and Linked Data technologies come together as distinct from events like ISWC/ESWC where the focus is more on academic research. Both types of event are essential for the health of data on the Web in my view.

Business use of open data, that is: freely available, openly licensed data, remains relatively low key. In Paris this week we heard about the BNF’s use of Linked Data which, as Emmanuelle Bermes told me, is driven largely by the gains in internal efficiency rather than providing open data for others. Sure, you can have the data if you want but how else would BNF curate and maintain rich data on an author like Jules Verne so easily without Linked Data?

Can we encourage the commercial use of open data? Former European Commissioner Neelie Kroes famously said that data is the new oil. There are claims that public sector information/open data is worth many billions of Euros to industry and that developers are itching to get their hands on the data, unleashing a tidal wave of creativity.

Hmmm …

We’ll be testing that at an event in Lisbon next month. There will be almost no presentations but a great many conversations held in unconference style looking at issues like licensing, running hackathons, start up funding, making data multilingual and more. The workshop is the latest being run under the Share-PSI 2.0 network, co-funded by the European Commission. It is perhaps no surprise therefore that one of the few plenary presentations will be from the Deputy Head of Unit at DG CONNECT’s Data Value Chain, Beatrice Covassi.

Check out the agenda and, if you can, please join us in Lisbon in December. Naturally, it’s free, but do please register.

The Importance of Use Cases Documents

The Data on the Web Best Practices WG is among those who will be meeting at this year’s TPAC in Santa Clara. As well as a chance for working group members to meet and make good progress, it’s a great opportunity for attendees to drop in to other working group meetings. Most working groups use the occasion to gather new perspectives on their work that can be really helpful in ensuring that the emerging standards meet the needs of the widest community.

In that context, Use Cases and Requirements documents are crucial. This is where a working group collects the evidence that informs its work. In a later discussion about whether a feature should or should not be included in a specification, the cry “where’s the use case for that?” is always the show stopper. Prove it’s needed and we’ll work on it. If it’s just a pet idea you have, we probably won’t.

The Data on the Web Best Practices Working Group has an incredibly broad charter. It says that the WG’s mission is:

  1. to develop the open data ecosystem, facilitating better communication between developers and publishers;
  2. to provide guidance to publishers that will improve consistency in the way data is managed, thus promoting the re-use of data;
  3. to foster trust in the data among developers, whatever technology they choose to use, increasing the potential for genuine innovation.

What the heck does that actually mean?

To find out we have gathered more than 20 use cases and derived requirements from them. The recently updated version of the UCR document was published this week so that when we gather at TPAC we can ask a simple question:

have we covered your use case?

If we have forgotten something – and it’s more than possible that we have – please tell us. You can do this by commenting on the document or, better still, sending in your own use case to public-dwbp-comments@w3.org.