RDF 1.1 document suite on its way to Recommendation

The RDF Working Group has published the documents of
the RDF 1.1 document suite as Proposed (Edited) Recommendation.
Together, these documents provide significant updates to,
and extensions of, the 2004 RDF specification. For example:

  • Multiple graphs are now part of the RDF data model.
  • Turtle is included in the standard and is aligned with SPARQL as far as possible.
  • TriG is an extension of Turtle and provides a syntax for multiple graphs. Any Turtle document is also a valid TriG document.
  • N-Triples and N-Quads are corresponding line-based exchange formats.
  • JSON-LD provides an exciting new connection between the RDF and JSON worlds.
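
As an illustration of the line-based formats mentioned above, the following Python sketch writes one statement as an N-Triples line and the same statement, with a graph name added, as an N-Quads line. The example.org IRIs and the helper function names are hypothetical, and the sketch handles only IRIs and simple string literals rather than the full grammar of either format.

```python
def iri(value):
    """Wrap an IRI in angle brackets, as both formats require."""
    return f"<{value}>"


def literal(text):
    """Quote a simple string literal, escaping backslashes and double quotes.

    Newlines and other control characters are not handled in this sketch.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def n_triple(subject, predicate, obj):
    """One statement per line: subject, predicate, object, then ' .'."""
    return f"{subject} {predicate} {obj} ."


def n_quad(subject, predicate, obj, graph):
    """The same three terms, followed by the graph name as a fourth term."""
    return f"{subject} {predicate} {obj} {graph} ."


# Hypothetical example data
s = iri("http://example.org/alice")
p = iri("http://xmlns.com/foaf/0.1/name")
o = literal("Alice")
g = iri("http://example.org/graphs/people")

triple_line = n_triple(s, p, o)
quad_line = n_quad(s, p, o, g)
print(triple_line)
print(quad_line)
```

Because every statement sits on its own line, files in these formats can be split, concatenated and streamed with ordinary text tools.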

In “What’s New in RDF 1.1” you can find a detailed description of
the new and updated features. The Working Group has also published the
first version of a new RDF Primer and a note on the semantics of multiple graphs. Comments are very welcome!

Vocabularies at W3C

In my opening post on this blog I hinted that another would follow concerning vocabularies. Here it is.

When the Semantic Web first began, the expectation was that people would create their own vocabularies/schemas as required – it was all part of the open world (free love, do what you feel, dude) Zeitgeist. Over time, however, and with the benefit of a large measure of hindsight, it’s become clear that this is not what’s required.

The success of Linked Open Vocabularies as a central information point about vocabularies is symptomatic of a need, or at least a desire, for an authoritative reference point to aid the encoding and publication of data. This need/desire is expressed even more forcefully in the rapid success and adoption of schema.org. The large and growing set of terms in the schema.org namespace includes many established terms defined elsewhere, such as in vCard, FOAF, GoodRelations and rNews. I’m delighted that Dan Brickley has indicated that schema.org will reference what one might call ‘source vocabularies’ in the near future. I hope it will do so with assertions like owl:equivalentClass, owl:equivalentProperty etc.

Designed and promoted as a means of helping search engines make sense of unstructured data (i.e. text), schema.org terms are being adopted in other contexts, for example in the ADMS. The Data Activity supports the schema.org effort as an important component and we’re delighted that the partners (Google, Microsoft, Yahoo! and Yandex) develop the vocabulary through the Web Schemas Task Force, part of the W3C Semantic Web Interest Group of which Dan Brickley is chair.

But there’s a lot more to vocabularies at W3C than supporting schema.org.

First of all, we want to promote the use of our Community Group infrastructure as a place to develop and maintain vocabularies. Anyone can propose a Community Group, anyone can join. Moreover, it’s really easy for us to allocate a namespace for your vocabulary, i.e. http://www.w3.org/ns/yourVocab. That gives the outside world a promise of persistence of your terms that you can add to, clarify and, if needs be, deprecate – but not delete.

As an example, one Community Group that has recently become very active in its discussion of a vocabulary is the Locations and Addresses CG which is looking after http://www.w3.org/ns/locn, originally developed by the European Commission’s ISA Programme.

Another aspect of vocabulary development and maintenance I’m very keen to promote at W3C is the provision of multilingual labels and comments. We’ve got some good examples of this to shout about: the Data Catalog Vocabulary, DCAT, has labels in English, French, Spanish, Greek and Arabic. The Organization Ontology has long had labels in both English and French and just last week, I was able to add Italian, thanks to Antonio Maccioni and Giorgia Lodi at the Italian Digital Agency.
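
To make the idea of multilingual labels concrete, here is a minimal Python sketch that emits Turtle statements attaching language-tagged rdfs:label values to a term. The ex:Dataset term, the example.org namespace and the translations are illustrative assumptions, not the labels actually published in DCAT or the Organization Ontology.

```python
def label_statements(term, labels):
    """Return one Turtle statement per language-tagged label.

    `labels` maps a language tag (e.g. "fr") to the label text.
    """
    statements = []
    for lang in sorted(labels):
        # Escape backslashes and double quotes inside the literal
        text = labels[lang].replace("\\", "\\\\").replace('"', '\\"')
        statements.append(f'{term} rdfs:label "{text}"@{lang} .')
    return statements


PREFIXES = [
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix ex: <http://example.org/ns#> .",  # hypothetical namespace
]

# Hypothetical labels for a hypothetical term
labels = {"en": "Dataset", "fr": "Jeu de données", "es": "Conjunto de datos"}

turtle_lines = PREFIXES + label_statements("ex:Dataset", labels)
print("\n".join(turtle_lines))
```

Adding a language is then a matter of adding one more statement per term, which is why contributed translations are easy to fold into an existing vocabulary.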

If you use a vocabulary hosted by W3C, whether you’re involved in its development or not, and you’re able to offer a translation of the labels, comments and usage notes, please let us know – we’ll add them.

We’re still developing our ideas on how we can best support the development and maintenance of vocabularies at W3C but the direction of travel is clear – we’re very much here to help.

Three Vocabularies Are Proposed Recommendations

The Government Linked Data Working Group has published three Proposed Recommendations.

  • The Data Catalog Vocabulary (DCAT), an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use. By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications easily to consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.
  • The Organization Ontology, which describes a core ontology for organizational structures, aimed at supporting linked data publishing of organizational information across a number of domains. It is designed to allow domain-specific extensions to add classification of organizations and roles, as well as extensions to support neighboring information such as organizational activities.
  • The RDF Data Cube Vocabulary, which provides a means, by using the W3C RDF (Resource Description Framework) standard, to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts.
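
As a rough illustration of what a DCAT description can look like, the Python sketch below builds a small catalog entry as a JSON-LD structure and serializes it. The example.org IRIs, the title and the file are hypothetical; the class and property names (dcat:Catalog, dcat:Dataset, dcat:Distribution, dcat:dataset, dcat:distribution, dcat:downloadURL, dcat:mediaType, dct:title) come from the DCAT and Dublin Core Terms namespaces. It is a sketch, not a complete or validated catalog record.

```python
import json

# Prefix mappings for the two vocabularies used below
context = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# A hypothetical dataset with a single CSV distribution
dataset = {
    "@id": "http://example.org/dataset/rainfall-2012",
    "@type": "dcat:Dataset",
    "dct:title": "Rainfall 2012",
    "dcat:distribution": {
        "@id": "http://example.org/dataset/rainfall-2012/csv",
        "@type": "dcat:Distribution",
        "dcat:downloadURL": {"@id": "http://example.org/files/rainfall-2012.csv"},
        "dcat:mediaType": "text/csv",
    },
}

# The catalog that lists the dataset
catalog = {
    "@context": context,
    "@id": "http://example.org/catalog",
    "@type": "dcat:Catalog",
    "dcat:dataset": dataset,
}

catalog_json = json.dumps(catalog, indent=2)
print(catalog_json)
```

An aggregator that understands these terms could read records like this from several catalogs and offer a single search across them.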

Proposed Recommendation is the last but one stage in the standardization process at W3C and calls for review by the members. It’s a signal that work on these standards is complete and that evidence of independent implementation of the vocabularies has been gathered. These particular vocabularies are already in widespread use, particularly by public sector bodies, and form an important part of the open data landscape.

Three RDF First Public Drafts Published

Today the RDF Working Group published three First Public Working Drafts; they are all expected to become W3C Notes:

  • RDF 1.1 Primer, which explains how to use this language for representing information about resources in the World Wide Web.
  • RDF 1.1: On Semantics of RDF Datasets, which presents some issues to be addressed when defining a formal semantics for datasets, as they have been discussed in the RDF Working Group, and specifies several semantics in terms of model theory, each corresponding to a certain design choice for RDF datasets.
  • What’s New in RDF 1.1

Welcome

Welcome to the Data Activity — the new home of the Semantic Web and eGovernment at W3C.

The Web is transforming the way governments interact with their citizens in two distinct ways: through the provision of online services, or access to physical services through online means, and through the release of open data. This latter aspect is at the heart of a huge and growing community right around the world, one that encompasses not just government data but cultural heritage data and scientific research, both for its data and open access to its publications. Open data isn’t an ephemeral fashion; it’s a shift in the way things are done, made possible by the World Wide Web.

The Semantic Web, in particular Linked Data, is an important part of this shift. Its unparalleled ability to publish self-describing data at Web scale, data that carries meaning and intelligence within itself, has clear and distinct advantages. Reference data such as that published by the (UK mapping agency) Ordnance Survey, and the European Environment Agency is complemented by initiatives such as OpenCorporates and Product Open Data. Industries such as health care and life sciences and the financial industry are making extensive use of Linked Data, a lot of which is open.

Over more than a decade, the technologies that underpin the Semantic Web have become mature, in many cases going through a round of recent updates that are finished or close to finishing. There are many tools available already with greater capacity and sophistication being added all the time.

But not all data is open, and not all data is linked. Indeed, the data that can now be found on portals around the world is generally either in geospatial formats or the simplest data format of all: CSV. Comma Separated Values files (or their near equivalent, Tab Separated Values) dominate. They’re easy to produce in a variety of software, from desktop spreadsheets to relational databases, and they’re easily converted into JSON – the data format of choice for most Web application developers.
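
To show how small that conversion step can be, here is a minimal Python sketch using only the standard library. The column names and values are made up for illustration, and the helper name csv_to_json is hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes an object keyed by the header names
    return json.dumps(list(reader), indent=2)


# Hypothetical sample data
sample = "station,year,rainfall_mm\nA,2012,812\nB,2012,640\n"

json_text = csv_to_json(sample)
print(json_text)
```

Note that every value comes out as a string: the CSV file itself carries no information about datatypes, units or what the columns mean.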

The Data Activity recognizes and builds on these different strands:

  • the Semantic Web is a mature technology at the heart of a large and growing user base;
  • governments, industry, researchers and the cultural heritage sector are all making increasing use of the power and flexibility of the Web to deliver services and data;
  • there is a lot of highly valuable data available in a variety of formats, including, most notably, CSV.

These will form the focal point of the work at W3C in the short to medium term. We want to make data more interoperable and to make the power and flexibility of the Semantic Web technologies more readily accessible to other formats.

Kicking us off in the new Activity are two new working groups: CSV on the Web, which focuses on creating metadata for tabular data; and Data on the Web Best Practices, which has the ‘simple’ task of fostering a self-sustaining ecosystem for data publishers and consumers. Alongside this, we’ll also be putting more effort into promoting the use of our infrastructure to create and maintain vocabularies in w3.org/ns space in cooperation with the Web Schemas Task Force which is part of the Semantic Web Interest Group – that’s a whole blog post waiting right there. Meanwhile the RDF, Linked Data Platform and Government Linked Data Working Groups are all very close to completing their work and the Health Care and Life Science Interest Group continues to extend the use of the technologies in this exciting field.

My new role as Activity Lead is to support this work of course but also to look for new areas where Web technologies can be applied to data-centric applications and where W3C standardization can help. If you see a gap in our technologies, an opportunity for doing more exciting and impactful work, do please let me know.