W3C Data Activity: Building the Web of Data

More and more Web applications provide a means of accessing data. From simple visualizations to sophisticated interactive tools, there is a growing reliance on the availability of data, which can be “big” or “small”, of diverse origin, and in different formats. It is usually published without prior coordination with other publishers, let alone with precise modeling or common vocabularies. The Data Activity recognizes this diversity and works to overcome it in order to facilitate potentially Web-scale data integration and processing. It does this by providing standard data exchange formats, models, tools, and guidance.

The overall vision of the Data Activity is that people and organizations should be able to share data as far as possible using their existing tools and working practices, but in a way that enables others to derive and add value, and to use it in ways that suit them. Achieving that requires a focus not just on the interoperability of data but also on the interoperability of communities.

Questions? Contact Phil Archer <phila@w3.org>, W3C Data Activity Lead.

Context & Vision

The Data Activity merges and builds upon the eGovernment and Semantic Web Activities. The eGovernment Activity comprised an interest group that offered members a series of talks from well-placed speakers in governments around the world, including from countries that are often under-represented at W3C, such as Jordan and Uganda. Its primary topics were open data and the use of social media for citizen engagement.

The Semantic Web Activity was launched in 2001 to lead the use of the Web as an exchange medium for data as well as documents. That overall aim, along with a series of associated activities by W3C and others, has been highly successful, although not necessarily in the way originally envisioned. For example, the vision was that organizations and individuals would publish data in much the same way that they were already publishing Web pages. Enormous volumes of data are available on the Web today, but this data is typically published through portals that act on behalf of multiple agencies, not on the Web sites operated by those agencies themselves. Data publication is seen as a specialist activity, not as something anyone can do, and it is therefore more centralized than expected.

The Activity will make data publication less of a specialist activity and ensure that the excellent work done by portals does not lead to de facto data silos.

There is a benign current of centralization in vocabularies. The success of Linked Open Vocabularies as a central information point about vocabularies is symptomatic of a need, or at least a desire, for an authoritative reference point to aid the encoding and publication of data. This need or desire is expressed even more forcefully in the rapid success and adoption of schema.org. The large and growing set of terms in the schema.org namespace includes (and references) many established terms defined elsewhere, such as in vCard, FOAF, GoodRelations and rNews. Although designed and promoted as a means of helping search engines make sense of unstructured data (i.e., text), schema.org terms are being adopted in other contexts, for example in the ADMS vocabulary originally developed by the European Commission.

The Data Activity will continue to support this work and to promote W3C's existing open approach to the coordination, recognition and persistent hosting of vocabularies, which the user community sees as critical companions to Web standards such as XML, RDF and HTML.

The use of the Web as a platform for delivering data has been driven by policy as much as by technology, the G8 Open Data Charter being a prime example. Other examples include President Obama’s Executive Order and the European Union’s revised PSI Directive. These policies apply equally to government information, scientific research, and cultural heritage, which creates a further source of diversity in workflows, people and the technologies they use.

The W3C Data Activity will support technologists tasked with responding to this political pressure. It will do so in a way that works for those individuals and at the same time delivers maximum return on the political and financial investments made, minimizing the risk that data produced in one community remains only usable by other members of that same community.

Although the needs and views of application developers are, of course, of critical importance, the Data Activity is designed to support the needs of the public and private sector organizations working to publish and integrate data across the Web. W3C has traditionally worked on Semantic Web technologies and has promoted the publication of data through the (Enterprise) Data and (5 star) Linked Open Data approaches. The primary value of Linked Data, of RDF and related technologies, is that these technologies have the Web at their “core,” providing a unique means of integrating data at Web scale. Such integration may happen online but often happens within industry (offline or in the cloud). YarcData gave some examples of this in their interview with Ian Jacobs in which Shoaib Mufti explained how Semantic Web technologies can help to process Big Data and derive insights from it that might otherwise remain hidden.

However, not all applications need the power of Semantic Web technologies to achieve data integration; in many cases applications work with one or two specific datasets that can be accessed and managed individually. Datasets of significant size are published on the Web in different formats, and converting this data to RDF, or accessing it specifically as RDF, is not always necessary. The Data Activity will contribute to the larger data ecosystem to ensure interoperability and ease of application development.

New W3C Documents

See Activity News for details and older news
