Uses of Open Data Within Government for Innovation and Efficiency

You are warmly invited to participate in the first of a series of workshops being organized during this year and next by the Share-PSI 2.0 Thematic Network. Partners from 25 countries are working on issues surrounding the implementation of the European Commission's revised PSI Directive and this will feed into the Data on the Web Best Practices Working Group.

We're beginning with "Uses of Open Data Within Government for Innovation and Efficiency" – i.e. we're looking for cases where opening data has made it easier for government departments (local or national) to do their job better. What worked? What didn't work? What lessons can you share with others? What would most help you benefit from other people's work?

The workshop is taking place as part of the 5th Samos Summit on ICT-Enabled Governance, which means participants can look forward to spending time on a beautiful island in the Aegean Sea, formerly the home of Pythagoras.

Entry is by position paper. This should not be a full academic paper but rather a short description of what you'd like to talk about.

Deadline for submissions: 13 April
Notification of acceptance: 1 May
Workshop: 30 June – 1 July

Full details at http://www.w3.org/2013/share-psi/workshop/samos/.

Join us!

Phil Archer, W3C Data Activity Lead
On behalf of the Share-PSI partners

Share-PSI 2.0 is co-funded by the European Commission under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme.

RDF 1.1 has been published as Recommendation

The RDF Working Group has today published a set of eight Resource Description Framework (RDF) Recommendations:

  • “RDF 1.1 Concepts and Abstract Syntax” defines an abstract syntax (a data model) which serves to link all RDF-based languages and specifications. The abstract syntax has two key data structures: RDF graphs are sets of subject-predicate-object triples, where the elements may be IRIs, blank nodes, or datatyped literals. They are used to express descriptions of resources. RDF datasets are used to organize collections of RDF graphs, and comprise a default graph and zero or more named graphs.
  • “RDF 1.1 Semantics” describes a precise semantics for the Resource Description Framework 1.1 and RDF Schema, and defines a number of distinct entailment regimes and corresponding patterns of entailment.
  • “RDF Schema 1.1” provides a data-modelling vocabulary for RDF data. RDF Schema is an extension of the basic RDF vocabulary.
  • “RDF 1.1 Turtle” defines a textual syntax for RDF called Turtle that allows an RDF graph to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. Turtle provides levels of compatibility with the N-Triples format as well as the triple pattern syntax of the SPARQL W3C Recommendation.
  • “RDF 1.1 TriG RDF Dataset Language” defines a textual syntax for RDF called TriG that allows an RDF dataset to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. TriG is an extension of the Turtle format.
  • “RDF 1.1 N-Triples” is a line-based, plain text format for encoding an RDF graph.
  • “RDF 1.1 N-Quads” is a line-based, plain text format for encoding an RDF dataset.
  • “RDF 1.1 XML Syntax” defines an XML syntax for RDF called RDF/XML in terms of Namespaces in XML, the XML Information Set and XML Base.
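
To make the data model described in the list above a little more concrete, here is a minimal Python sketch. It is not taken from any of the Recommendations: the `Dataset` class, the `to_nquads` helper and the example.org IRIs are all hypothetical, and the sketch assumes every term is an IRI (no blank nodes or literals). It shows a graph as a set of triples, a dataset as a default graph plus named graphs, and a line-based serialization in the style of N-Quads.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# A triple is (subject, predicate, object). Assumption: all three are IRIs,
# held as plain strings; blank nodes and literals are left out of this sketch.
Triple = Tuple[str, str, str]


@dataclass
class Dataset:
    """An RDF dataset: one default graph plus zero or more named graphs."""
    default_graph: Set[Triple] = field(default_factory=set)
    named_graphs: Dict[str, Set[Triple]] = field(default_factory=dict)

    def add(self, triple: Triple, graph_name: Optional[str] = None) -> None:
        # With no graph name the triple goes into the default graph.
        if graph_name is None:
            self.default_graph.add(triple)
        else:
            self.named_graphs.setdefault(graph_name, set()).add(triple)


def to_nquads(dataset: Dataset) -> List[str]:
    """Write one statement per line; named-graph statements carry a 4th term."""
    lines = [f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(dataset.default_graph)]
    for name, graph in sorted(dataset.named_graphs.items()):
        lines += [f"<{s}> <{p}> <{o}> <{name}> ." for s, p, o in sorted(graph)]
    return lines


# Hypothetical IRIs, used only for illustration.
ds = Dataset()
t = ("http://example.org/alice", "http://example.org/knows", "http://example.org/bob")
ds.add(t)
ds.add(t)  # a graph is a set, so adding the same triple twice has no effect
ds.add(t, graph_name="http://example.org/graph1")

for line in to_nquads(ds):
    print(line)
```

Because each statement sits on its own line, output like this can be split, concatenated and streamed with ordinary text tools, which is the main attraction of the line-based formats.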

Furthermore, the Working Group has also published four Working Group Notes:

  • “RDF 1.1 Primer” provides a tutorial level introduction to RDF 1.1.
  • “What’s New in RDF 1.1” provides a summary of the changes between the two versions of RDF. The RDF 1.1 Concepts, Semantics, Schema, and XML Syntax documents supersede the RDF family of Recommendations as published in 2004.
  • “RDF 1.1: On Semantics of RDF Datasets” presents some issues to be addressed when defining a formal semantics for datasets, as they have been discussed in the RDF 1.1 Working Group.
  • “RDF 1.1 Test Cases” lists the test suites and implementation reports for RDF 1.1 Semantics as well as the various serialization formats.

More Languages for More Vocabularies

Last month I encouraged the provision of multi-lingual labels for vocabularies hosted at W3C. Tokyo librarian Shuji Kamitsuna has been doing terrific work recently and has translated the specification documents for DCAT (English, Japanese) and ORG (English, Japanese), and is now well into completing his work on the Data Cube Vocabulary. After Shuji had completed his work on the specifications, I wanted to update the schemas to include the Japanese labels too, but doing this threw up some issues.

First up was DCAT. The vocabulary is formally specified in the Recommendation and for each term there is a table showing the definition and a usage note. Immediately before each table, the term itself is given as a section title and it’s these section titles that are the English language labels in the schema. See the entry for dcat:Catalog for example. When Shuji translated the spec, the labels were therefore translated too. Transferring these to the schema was trivial. But that was the easy part.

The definitions in the spec are copied into the schema as the rdfs:comment for each term – except they’re not 100% aligned. Take the definition of the property dcat:dataset. The spec says “A dataset that is part of the catalog” whereas the schema gives just a little more help when it says “Links a catalog to a dataset that is part of the catalog.” The Arabic, Spanish, Greek and French labels, definitions and usage notes in the DCAT schema were all translated from the schema, the Japanese from the spec.

This raises the question: assuming that there is no difference in semantics, just a difference in the clarity with which the semantics are expressed, how much does it matter that the definitions in the schema and the spec are not 100% aligned?

When Shuji sent us the translation of ORG, a different issue arose. Like DCAT, the specification for ORG has a small table for each term that gives its definition and usage note. Before each table there is a heading but here’s the difference: in the ORG specification, those headings are written as the vocabulary term such as subOrganizationOf. If ORG followed exactly the same style as DCAT, this would have been written ‘sub organization of’ which is the English language label for the term – i.e. as proper words, not terms written in camel case. Actually it’s even more confusing as the actual label in the schema for ORG says “subOrganization of” – a sort of half way house. Again, does this matter?

Finally, Shuji’s work threw up an issue around the use of upper and lower case letters in vocabularies. The well-established convention is that RDF class names begin with upper case letters and properties with lower case letters, and that both use camel case. Further, where an object property is used for an n-ary relationship between classes, the property is often named in exactly the same way as the class that is the range. For example, in ORG we have org:role that has range org:Role.

You see the problem for Japanese? It is one of many languages that do not have the concept of upper and lower case letters.

I raised this issue in the Web Schemas Task Force and was relieved that there was consensus that for the purpose of translation, it was safe to advise Shuji that the label for the property org:role could legitimately be ‘has role.’
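
One way to spot terms like org:role and org:Role before a translator trips over them is to look for names that differ only in letter case. The following Python sketch does that. The `case_collisions` function is hypothetical and is not part of any W3C tooling; the term list is a small hand-picked sample, not the full ORG vocabulary.

```python
from collections import defaultdict


def case_collisions(local_names):
    """Group term names that become identical once letter case is ignored."""
    groups = defaultdict(list)
    for name in local_names:
        groups[name.casefold()].append(name)
    # Only groups with more than one member need distinct translated labels.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# A hand-picked sample of ORG local names, for illustration only.
terms = ["Role", "role", "Organization", "subOrganizationOf"]
print(case_collisions(terms))  # [['Role', 'role']]
```

Each pair reported this way is a candidate for a distinct label, such as ‘has role’ for the property, in any language whose script has no letter case.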

In this and other work I’ve done over the years it’s clear to me that if you really want to check that what you’ve written is consistent and unambiguous – see how it comes out of a translation process. On this occasion I think we’ve got some pointers for future work to tighten these things up.

Final Publications from GLD

In the short time since the beginning of the year, the Government Linked Data Working Group has successfully published its final documents. The Best Practices for Publishing Linked Data Note was published last week providing advice and insights into how linked data publishing differs from other formats; and this week has seen three vocabularies published as Recommendations. Each of these will enhance data interoperability, especially, but not exclusively, in government data. Each one specifies an RDF vocabulary (a set of properties and classes) for conveying a particular kind of information:

  • The Data Catalog (DCAT) Vocabulary is used to provide information about available data sources. When data sources are described using DCAT, it becomes much easier to create high-quality integrated and customized catalogs including entries from many different providers. Many national data portals are already using DCAT.
  • The Data Cube Vocabulary brings the cube model underlying SDMX (Statistical Data and Metadata eXchange, a popular ISO standard) to Linked Data. This vocabulary enables statistical and other regular data, such as measurements, to be published and then integrated and analyzed with RDF-based tools.
  • The Organization Ontology provides a powerful and flexible vocabulary for expressing the official relationships and roles within an organization. This allows for interoperation of personnel tools and will support emerging socially-aware software.

Many members of the GLD deserve specific thanks, in particular Dave Reynolds for his work on Data Cube and ORG, Fadi Maali for his work on DCAT, Richard Cyganiak for his work on all those, Boris Villazón-Terrazas and Ghislain Atemezing for their work on the LD-BP document, and Hadley Beeman who has ensured that the WG kept up the pace to the end, all under the expert guidance of Sandro Hawke. There were many other members of the WG who remained active right to the end and without whom the work could not have been completed and who also deserve sincere thanks. I’d like to end by expressing my particular thanks to Bernadette Hyland who has chaired the Government Linked Data working group since its initial charter, giving up huge amounts of time to the group. I believe Bernadette will be recording her own thoughts here imminently.

JSON-LD Has Been Published as a W3C Recommendation

The RDF Working Group has published two Recommendations today:

  • JSON-LD 1.0. JSON is a useful data serialization and messaging format. This specification defines JSON-LD, a JSON-based format to serialize Linked Data. The syntax is designed to easily integrate into deployed systems that already use JSON, and provides a smooth upgrade path from JSON to JSON-LD. It is primarily intended to be a way to use Linked Data in Web-based programming environments, to build interoperable Web services, and to store Linked Data in JSON-based storage engines.
  • JSON-LD 1.0 Processing Algorithms and API. This specification defines a set of algorithms for programmatic transformations of JSON-LD documents. Restructuring data according to the defined transformations often dramatically simplifies its usage. Furthermore, this document proposes an Application Programming Interface (API) for developers implementing the specified algorithms.
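
The ‘smooth upgrade path’ mentioned above can be illustrated with a short Python sketch. It uses only the standard `json` module rather than a JSON-LD processor, and the record, the choice of schema.org terms and the example.org identifier are all assumptions made for illustration. The point is that an existing JSON record keeps its keys unchanged, while an added `@context` maps those keys to IRIs.

```python
import json

# An ordinary JSON record, as an existing service might already emit it.
plain = {
    "name": "Share-PSI workshop",
    "homepage": "http://www.w3.org/2013/share-psi/workshop/samos/",
}

# The @context maps each key to an IRI; "@type": "@id" says the value of
# "homepage" is itself an IRI rather than a plain string.
# Assumption: schema.org terms are just one possible choice of vocabulary.
context = {
    "name": "http://schema.org/name",
    "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
}

# @id names the thing being described (a hypothetical example.org IRI).
doc = {"@context": context, "@id": "http://example.org/events/samos", **plain}

print(json.dumps(doc, indent=2))
```

A consumer that knows nothing about JSON-LD can still read `doc["name"]` exactly as before, while a JSON-LD processor can interpret the same document as Linked Data.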

RDF 1.1 document suite on its way to Recommendation

The RDF Working Group has published the documents of the RDF 1.1 document suite as Proposed (Edited) Recommendation. Together, these documents provide significant updates and extensions of the 2004 RDF specification. For example:

  • Multiple graphs are now part of the RDF data model.
  • Turtle is included in the standard and is aligned with SPARQL as far as possible.
  • TriG is an extension of Turtle and provides a syntax for multiple graphs. Any Turtle document is also a valid TriG document.
  • N-Triples and N-Quads are corresponding line-based exchange formats.
  • JSON-LD provides an exciting new connection between the RDF and JSON worlds.

In “What’s New in RDF 1.1” you can find a detailed description of the new and updated features. The Working Group has also published the first version of a new RDF Primer and a note on the semantics of multiple graphs. Comments very welcome!

Vocabularies at W3C

In my opening post on this blog I hinted that another would follow concerning vocabularies. Here it is.

When the Semantic Web first began, the expectation was that people would create their own vocabularies/schemas as required – it was all part of the open world (free love, do what you feel, dude) Zeitgeist. Over time, however, and with the benefit of a large measure of hindsight, it’s become clear that this is not what’s required.

The success of Linked Open Vocabularies as a central information point about vocabularies is symptomatic of a need, or at least a desire, for an authoritative reference point to aid the encoding and publication of data. This need/desire is expressed even more forcefully in the rapid success and adoption of schema.org. The large and growing set of terms in the schema.org namespace includes many established terms defined elsewhere, such as in vCard, FOAF, Good Relations and rNews. I’m delighted that Dan Brickley has indicated that schema.org will reference what one might call ‘source vocabularies’ in the near future, I hope with assertions like owl:equivalentClass, owl:equivalentProperty etc.

Designed and promoted as a means of helping search engines make sense of unstructured data (i.e. text), schema.org terms are being adopted in other contexts, for example in the ADMS. The Data Activity supports the schema.org effort as an important component and we’re delighted that the partners (Google, Microsoft, Yahoo! and Yandex) develop the vocabulary through the Web Schemas Task Force, part of the W3C Semantic Web Interest Group of which Dan Brickley is chair.

But there’s a lot more to vocabularies at W3C than supporting schema.org.

First of all, we want to promote the use of our Community Group infrastructure as a place to develop and maintain vocabularies. Anyone can propose a Community Group, anyone can join. Moreover, it’s really easy for us to allocate a namespace for your vocabulary, i.e. http://www.w3.org/ns/yourVocab. That gives the outside world a promise of persistence of your terms that you can add to, clarify and, if needs be, deprecate – but not delete.

As an example, one Community Group that has recently become very active in its discussion of a vocabulary is the Locations and Addresses CG which is looking after http://www.w3.org/ns/locn, originally developed by the European Commission’s ISA Programme.

Another aspect of vocabulary development and maintenance I’m very keen to promote at W3C is the provision of multilingual labels and comments. We’ve got some good examples of this to shout about: the Data Catalog Vocabulary, DCAT, has labels in English, French, Spanish, Greek and Arabic. The Organization Ontology has long had labels in both English and French and just last week, I was able to add Italian, thanks to Antonio Maccioni and Giorgia Lodi at the Italian Digital Agency.

If you use a vocabulary hosted by W3C, whether you’re involved in its development or not, and you’re able to offer a translation of the labels, comments and usage notes, please let us know – we’ll add them.

We’re still developing our ideas on how we can best support the development and maintenance of vocabularies at W3C but the direction of travel is clear – we’re very much here to help.

Three Vocabularies Are Proposed Recommendations

The Government Linked Data Working Group has published three Proposed Recommendations.

  • The Data Catalog Vocabulary (DCAT), an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use. By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications easily to consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.
  • The Organization Ontology, which describes a core ontology for organizational structures, aimed at supporting linked data publishing of organizational information across a number of domains. It is designed to allow domain-specific extensions to add classification of organizations and roles, as well as extensions to support neighboring information such as organizational activities.
  • The RDF Data Cube Vocabulary, which provides a means, by using the W3C RDF (Resource Description Framework) standard, to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts.
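
As a rough illustration of what describing a catalog with DCAT involves, the Python sketch below writes a few N-Triples statements by hand. The `describe_catalog` helper and the example.org IRIs are hypothetical, and titles are assumed to be single-line English strings. The DCAT and Dublin Core Terms namespace IRIs are the published ones. A real publisher would more likely use an RDF library than string formatting.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    """A language-tagged literal. Assumption: text has no line breaks."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def describe_catalog(catalog: str, datasets: dict) -> list:
    """Return N-Triples lines; datasets maps a dataset IRI to its English title."""
    lines = [f"{iri(catalog)} {iri(RDF_TYPE)} {iri(DCAT + 'Catalog')} ."]
    for dataset, title in datasets.items():
        # dcat:dataset links a catalog to a dataset that is part of it.
        lines.append(f"{iri(catalog)} {iri(DCAT + 'dataset')} {iri(dataset)} .")
        lines.append(f"{iri(dataset)} {iri(RDF_TYPE)} {iri(DCAT + 'Dataset')} .")
        lines.append(f"{iri(dataset)} {iri(DCT + 'title')} {literal(title, 'en')} .")
    return lines


# Hypothetical catalog and dataset IRIs, for illustration only.
lines = describe_catalog(
    "http://example.org/catalog",
    {"http://example.org/dataset/1": "Bus stops"},
)
print("\n".join(lines))
```

Because every catalog described this way uses the same property and class IRIs, an aggregator can merge the output of many publishers simply by concatenating it.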

Proposed Recommendation is the last but one stage in the standardization process at W3C and calls for review by the members. It’s a signal that work on these standards is complete and that evidence of independent implementation of the vocabularies has been gathered. These particular vocabularies are already in widespread use, particularly by public sector bodies, and form an important part of the open data landscape.

Three RDF First Public Drafts Published

Today the RDF Working Group published three First Public Working Drafts; they are all expected to become W3C Notes:

  • RDF 1.1 Primer, which explains how to use this language for representing information about resources in the World Wide Web.
  • RDF 1.1: On Semantics of RDF Datasets, which presents some issues to be addressed when defining a formal semantics for datasets, as they have been discussed in the RDF Working Group, and specifies several semantics in terms of model theory, each corresponding to a certain design choice for RDF datasets.
  • What’s New in RDF 1.1

Welcome

Welcome to the Data Activity — the new home of the Semantic Web and eGovernment at W3C.

The Web is transforming the way governments interact with their citizens in two distinct ways: through the provision of online services, or access to physical services through online means, and through the release of open data. This latter aspect is at the heart of a huge and growing community right around the world, one that encompasses not just government data but cultural heritage data and scientific research, both for its data and open access to its publications. Open data isn’t an ephemeral fashion; it’s a shift in the way things are done, made possible by the World Wide Web.

The Semantic Web, in particular Linked Data, is an important part of this shift. Its unparalleled ability to publish self-describing data at Web scale, data that carries meaning and intelligence within itself, has clear and distinct advantages. Reference data such as that published by the (UK mapping agency) Ordnance Survey, and the European Environment Agency is complemented by initiatives such as OpenCorporates and Product Open Data. Industries such as health care and life sciences and the financial industry are making extensive use of Linked Data, a lot of which is open.

Over more than a decade, the technologies that underpin the Semantic Web have become mature, in many cases going through a round of recent updates that are finished or close to finishing. There are many tools available already with greater capacity and sophistication being added all the time.

But not all data is open, and not all data is linked. Indeed, the data that can now be found on portals around the world is generally either in geospatial formats or the simplest data format of all: CSV. Comma Separated Variable files (or their near equivalent, Tab Separated Variable files) dominate. They’re easy to produce in a variety of software, from desktop spreadsheets to relational databases, and they’re easily converted into JSON – the data format of choice for most Web application developers.
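
To show how little is involved in the CSV to JSON step, here is a minimal Python sketch using only the standard library. The `csv_to_json` helper and the sample data are hypothetical.

```python
import csv
import io
import json


def csv_to_json(text: str, delimiter: str = ",") -> str:
    """Turn CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    return json.dumps(rows)


# Made-up sample data, for illustration only.
csv_text = "station,reading\nA,12.5\nB,9.0\n"
print(csv_to_json(csv_text))
# [{"station": "A", "reading": "12.5"}, {"station": "B", "reading": "9.0"}]
```

Passing `delimiter="\t"` handles the tab-separated variant. Note that every value comes out as a string: the file itself says nothing about datatypes, units or what each column means, so that information has to be supplied separately.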

The Data Activity recognizes and builds on these different strands:

  • the Semantic Web is a mature technology at the heart of a large and growing user base;
  • governments, industry, researchers and the cultural heritage sector are all making increasing use of the power and flexibility of the Web to deliver services and data;
  • there is a lot of highly valuable data available in a variety of formats, including most notably, CSV.

These will form the focal point of the work at W3C in the short to medium term. We want to make data more interoperable and to make the power and flexibility of the Semantic Web technologies more readily accessible to other formats.

Kicking us off in the new Activity are two new working groups: CSV on the Web, which focuses on creating metadata for tabular data; and Data on the Web Best Practices, which has the ‘simple’ task of fostering a self-sustaining ecosystem for data publishers and consumers. Alongside this, we’ll also be putting more effort into promoting the use of our infrastructure to create and maintain vocabularies in w3.org/ns space in cooperation with the Web Schemas Task Force which is part of the Semantic Web Interest Group – that’s a whole blog post waiting right there. Meanwhile the RDF, Linked Data Platform and Government Linked Data Working Groups are all very close to completing their work and the Health Care and Life Science Interest Group continues to extend the use of the technologies in this exciting field.

My new role as Activity Lead is, of course, to support this work, but also to look for new areas where Web technologies can be applied to data-centric applications and where W3C standardization can help. If you see a gap in our technologies, an opportunity for doing more exciting and impactful work, do please let me know.