W3C Data Activity: Building the Web of Data

More and more Web applications provide a means of accessing data. From simple visualizations to sophisticated interactive tools, there is a growing reliance on the availability of data which can be “big” or “small”, of diverse origin, and in different formats; it is usually published without prior coordination with other publishers — let alone with precise modeling or common vocabularies. The Data Activity recognizes and works to overcome this diversity to facilitate potentially Web-scale data integration and processing. It does this by providing standard data exchange formats, models, tools, and guidance.

The overall vision of the Data Activity is that people and organizations should be able to share data, as far as possible, using their existing tools and working practices, but in a way that enables others to derive and add value and to use the data in ways that suit them. Achieving that requires a focus not just on the interoperability of data but also on the interoperability of communities.

W3C gratefully acknowledges support from the European Commission for participation in a number of projects, e.g. Create-IoT, Big Data Europe (Linked Data), Boost 4.0 (Big data in Industry 4.0) and SPECIAL (Linked Data for data privacy management).


Data and data services are increasingly strategically important for businesses. W3C is seeking to address the challenges for the digital enterprise with the Workshop on Graph Data on 4-6 March 2019 in Berlin. We're seeking to bring together experts in SQL/RDBMS, Property Graphs, RDF/Linked Data, and AI/ML to bridge the different communities to create a fresh view of the challenges ahead and the standards that will be needed to overcome them. See the blog post The Digital Enterprise - W3C Graph Data Workshop for more details.

W3C is pleased to announce the First Public Working Draft for the Data Catalog Vocabulary (DCAT) – revised edition. DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This revised version of DCAT was developed by the Dataset Exchange Working Group in response to a new set of Use Cases and Requirements based upon extensive experience with the original DCAT specification and related work on DCAT application profiles.
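To make the idea of a catalog description concrete, here is a minimal sketch of a DCAT catalog serialized as JSON-LD, built with only the Python standard library. The `dcat:` and `dct:` terms are real DCAT and Dublin Core terms; the helper names (`make_dataset`, `make_catalog`) and all `example.org` identifiers are hypothetical and chosen purely for illustration. It is not taken from the Working Draft itself.

```python
import json

# Namespace IRIs for DCAT and Dublin Core Terms.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"


def make_dataset(dataset_id, title, download_url, media_type_iri):
    """Describe one dataset with a single downloadable distribution."""
    return {
        "@id": dataset_id,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                # Assumption: the media type is given as an IANA media type IRI.
                "dcat:mediaType": {"@id": media_type_iri},
            }
        ],
    }


def make_catalog(catalog_id, title, datasets):
    """Wrap dataset descriptions in a dcat:Catalog with a JSON-LD context."""
    return {
        "@context": {"dcat": DCAT, "dct": DCT},
        "@id": catalog_id,
        "@type": "dcat:Catalog",
        "dct:title": title,
        "dcat:dataset": datasets,
    }


# Hypothetical identifiers, for illustration only.
dataset = make_dataset(
    "https://example.org/dataset/air-quality",
    "Air quality measurements",
    "https://example.org/files/air-quality.csv",
    "https://www.iana.org/assignments/media-types/text/csv",
)
catalog = make_catalog(
    "https://example.org/catalog", "Example city data catalog", [dataset]
)
catalog_json = json.dumps(catalog, indent=2)
print(catalog_json)
```

Because two catalogs that use the same terms can be read by the same consumer code, a harvester only needs to understand `dcat:dataset` and `dcat:distribution` to discover the downloadable files in either.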

Dave Raggett gave a plenary presentation on the Web of Things at the opening session of the FIWARE Summit. He later met with FIWARE Foundation staff to discuss potential opportunities for collaboration between W3C and FIWARE, in particular aligning the W3C Web of Things object model and API with the FIWARE Orion context broker. FIWARE is a leading open source IoT platform; its Orion context broker is based upon ETSI's NGSI-LD, a REST API that uses JSON-LD for querying, updating, and notification of changes to the context, including IoT devices.

W3C held a Workshop on Privacy and Linked Data in Vienna on 17-18 April 2018. The presentations and meeting minutes will be available from the Workshop page.

As a starting point for making W3C a more effective, more welcoming and sustainable venue for communities seeking to develop Web data standards and exploit them to create value added services, we are pleased to announce a W3C study on Web data standardization that has been produced with support from the Open Data Institute and Innovate UK.

W3C took part in the January 2018 kick off meeting for the Boost 4.0 European project on big data in smart manufacturing (Industry 4.0). Our role focuses on standardisation, data governance and certification.

Questions? Contact Dave Raggett <dsr@w3.org>, W3C Data Activity Lead.

Context & Vision

The Data Activity merges and builds upon the eGovernment and Semantic Web Activities. The eGovernment Activity comprised an interest group that offered members a series of interesting talks from well-placed speakers in governments around the world, including from countries that are often under-represented at W3C, such as Jordan and Uganda. Its primary topics were the use of social media for citizen engagement and open data.

The Semantic Web Activity was launched in 2001 to lead the use of the Web as an exchange medium for data as well as documents. That overall aim, along with a series of associated activities by W3C and others, has been highly successful, although not necessarily in the way originally envisioned. For example, the vision was that organizations and individuals would publish data in much the same way that they were already publishing Web pages. Enormous volumes of data are available on the Web today, but the data is typically published through portals that act on behalf of multiple agencies, not on the Web sites operated by those agencies themselves. Data publication is seen as a specialist activity, not as something anyone can do, and therefore it is more centralized than expected.

The Activity will make data publication less of a specialist activity and ensure that the excellent work done by portals does not lead to de facto data silos.

There is a benign current of centralization in vocabularies. The success of Linked Open Vocabularies as a central information point about vocabularies is symptomatic of a need, or at least a desire, for an authoritative reference point to aid the encoding and publication of data. This need or desire is expressed even more forcefully in the rapid success and adoption of schema.org. The large and growing set of terms in the schema.org namespace includes (and references) many established terms defined elsewhere, such as in vCard, FOAF, GoodRelations and rNews. Designed and promoted as a means of helping search engines make sense of unstructured data (i.e. text), schema.org terms are being adopted in other contexts, for example in the ADMS vocabulary originally developed by the European Commission.

The Data Activity will continue to support this work as well as promoting W3C's existing open approach to the coordination, recognition and persistent hosting of vocabularies, which the user community sees as critical companions to Web standards such as XML, RDF and HTML.

The use of the Web as a platform for delivering data has been driven by policy as much as by technology, with the G8 Open Data Charter being a prime example. Other examples include President Obama’s Executive Order and the European Union’s revised PSI Directive. These policies apply equally to government information, scientific research, and cultural heritage, which creates a further source of diversity in workflows, people, and the technologies they use.

The W3C Data Activity will support technologists tasked with responding to this political pressure. It will do so in a way that works for those individuals and at the same time delivers maximum return on the political and financial investments made, minimizing the risk that data produced in one community remains only usable by other members of that same community.

Although the needs and views of application developers are, of course, of critical importance, the Data Activity is designed to support the needs of the public and private sector organizations working to publish and integrate data across the Web. W3C has traditionally worked on Semantic Web technologies and has promoted the publication of data through the (Enterprise) Data and (5 star) Linked Open Data approaches. The primary value of Linked Data, of RDF and related technologies, is that these technologies have the Web at their “core,” providing a unique means of integrating data at Web scale. Such integration may happen online but often happens within industry (offline or in the cloud). YarcData gave some examples of this in their interview with Ian Jacobs in which Shoaib Mufti explained how Semantic Web technologies can help to process Big Data and derive insights from it that might otherwise remain hidden.

However, not all applications need the power of Semantic Web technologies to achieve data integration; in many cases applications work with one or two specific datasets that can be accessed and managed individually. Datasets of significant size are published on the Web in different formats and the conversion or access to this data specifically as RDF is not always necessary. The Data Activity will contribute to the larger data ecosystem to ensure interoperability and ease of application development.
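As an illustration of this lighter-weight kind of integration, the following sketch joins two small datasets published in different formats (CSV and JSON) on a shared key, without converting either to RDF. The data, the field names (`station_id`, `name`, `pm10`) and the helper `join_on_key` are all hypothetical; a real application would fetch the files from their publishers and would need to handle mismatched or missing keys more carefully.

```python
import csv
import io
import json

# Hypothetical inline data standing in for two independently published files.
STATIONS_CSV = """station_id,name
S1,North Bridge
S2,Harbour
"""

READINGS_JSON = """[
  {"station_id": "S1", "pm10": 21.5},
  {"station_id": "S2", "pm10": 34.0},
  {"station_id": "S3", "pm10": 12.0}
]"""


def join_on_key(csv_text, json_text, key):
    """Merge JSON records with CSV rows that share the same key value."""
    rows = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    merged = []
    for record in json.loads(json_text):
        match = rows.get(record[key])
        if match is not None:
            # Records with no matching CSV row are skipped (inner join).
            merged.append({**match, **record})
    return merged


merged = join_on_key(STATIONS_CSV, READINGS_JSON, "station_id")
for item in merged:
    print(item)
```

This works only because both publishers happen to use the same identifier for a station. When that kind of agreement cannot be assumed, shared vocabularies and Web-scale identifiers become more valuable.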