The Digital Enterprise – W3C Graph Data Workshop


Data and data services are increasingly strategically important for businesses. This is reflected in initiatives such as the EU’s Digitising European Industry Initiative, and in claims by McKinsey that by 2025, digitization is expected to contribute $2 trillion to US GDP. Meanwhile, China plans to boost its trillion-dollar digital economy to drive job creation in sectors such as big data and artificial intelligence. On 4-6 March 2019, in Berlin, W3C will seek to bridge different communities to create a fresh view of the challenges ahead and the standards that will be needed to overcome them.

The drive to realise the benefits of digitization means addressing the challenge of managing many heterogeneous data sources distributed across the enterprise. Whilst businesses have relied on relational databases for many years, SQL and RDBMS are cumbersome when it comes to rapidly evolving requirements. As a result, we have seen the rise of NoSQL databases that address the need for flexible handling of unstructured data. The need to create links across data is fuelling rapid growth in graph database solutions. Unfortunately, there is a lack of portability across these solutions.

The W3C workshop will bring together experts in relational databases, property graphs, RDF/Linked Data, big data, and artificial intelligence and machine learning with a view to forging a shared vision for future needs for graph data, and alignment on graph data query languages. We will discuss what’s needed for positioning RDF as an interchange format between different graph database solutions, making RDF easier to use by the vast majority of developers, and opportunities for blending symbolic and statistical approaches for tackling the challenges of real-world data that is incomplete, uncertain, inconsistent and includes errors.

If you are interested in being part of the discussion and helping to shape the future of data on the Web, you are urged to submit a position statement in response to the call for participation, preferably before the seasonal break this month, and no later than the hard deadline of Friday, 11 January 2019.

W3C/ERCIM at Boost 4.0 kick off meeting

W3C/ERCIM is one of fifty organizations participating in the Boost 4.0 European project on big data in Industry 4.0, which kicked off with an initial face-to-face meeting at the Automotive Intelligence Center in Bilbao on 30-31 January 2018. Boost 4.0 will demonstrate the benefits of big data in Industry 4.0 through pilots by major European manufacturers. W3C’s role focuses on standardisation, data governance and certification, with a central role for rich metadata as the basis for semantic interoperability across diverse sources of data. This follows on from W3C’s involvement in the Big Data Europe project.

W3C study on Web data standardization

The Web has had a huge impact on how we exchange and access information. The Web of data is growing rapidly, and interoperability depends upon the availability of open standards, whether intended for interchange within small communities, or for use on a global scale. W3C is pleased to release a W3C study on practices and tooling for Web data standardization, and gratefully acknowledges support from the Open Data Institute and Innovate UK.

A lengthy questionnaire was used to solicit input from a wide range of stakeholders. The feedback will be used as a starting point for making W3C a more effective, more welcoming and sustainable venue for communities seeking to develop Web data standards and exploit them to create value added services.

W3C Workshop on Linked Data and Privacy

W3C is inviting position papers for a workshop on data controls and linked data vocabularies to be held in Vienna, Austria on 7-8 March 2018. This is motivated by the challenges of addressing privacy across an ecosystem of services involving personal data. This is especially relevant to the growth of services based upon the Internet of Things, with a rapidly increasing number of sensors that can track our behavior wherever we are. Moreover, legislators are seeking to give consumers greater control over their personal data, e.g. the right to be forgotten. Workshop participants will discuss opportunities for using Linked Data as part of a technical framework for conforming to privacy legislation, e.g. the European Union’s GDPR, as personal data flows across an ecosystem of services.

Dataset Exchange WG publishes use cases and requirements

The Dataset Exchange Working Group (DXWG) is pleased to announce the publication of the First Public Working Draft of the Dataset Exchange Use Cases and Requirements.

The working group will produce a second version of the Data Catalog (DCAT) Vocabulary, guidance for the creation of application profiles, and content negotiation based on those profiles. The Use Cases and Requirements cover all three deliverables.

This document is the outcome of a collaborative effort by the Working Group. We want to hear your comments on the document as it will guide the group in the three work areas. Please send any comments to the comments list by January 20, 2018.

All feedback is welcome and will receive a response from the group. We look forward to hearing from you!

End of Year Bonanza!

Three of our data-centric Working Groups have rounded off their year and published new documents today.

First of all, congratulations are due to the CSV on the Web Working Group whose work has reached Recommendation status. That means they have successfully defined and proved technologies for describing tabular data, and for converting that data into either JSON or RDF. Thanks are due in particular to the co-chairs, Jeni Tennison and Dan Brickley, and to the WG stalwarts Jeremy Tandy, Gregg Kellogg and my colleague Ivan Herman.
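To give a flavour of the idea, here is a minimal sketch in Python of how a metadata description can drive the conversion of tabular data into JSON. It is only an illustration and not the conversion algorithm defined in the Recommendations: the metadata dictionary is loosely modelled on a CSVW table description, and the file name, column names, sample values and the rows_to_json helper are all assumptions made up for this example.

```python
import csv
import io
import json

# Hypothetical metadata, loosely modelled on a CSVW table description.
# The file name and column definitions are illustrative assumptions.
METADATA = {
    "url": "places.csv",
    "tableSchema": {
        "columns": [
            {"name": "place", "titles": "Place", "datatype": "string"},
            {"name": "visitors", "titles": "Visitors", "datatype": "integer"},
        ]
    },
}

# Made-up sample data standing in for the CSV file itself.
CSV_TEXT = "Place,Visitors\nExampleville,100\nSampleton,250\n"

# Very small subset of datatypes, mapped to Python constructors.
CASTS = {"string": str, "integer": int, "number": float}


def rows_to_json(csv_text, metadata):
    """Convert CSV rows to a list of dicts, typed according to the metadata."""
    columns = metadata["tableSchema"]["columns"]
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row; property names come from the metadata
    records = []
    for row in reader:
        record = {}
        for column, cell in zip(columns, row):
            cast = CASTS.get(column.get("datatype", "string"), str)
            record[column["name"]] = cast(cell)
        records.append(record)
    return records


records = rows_to_json(CSV_TEXT, METADATA)
print(json.dumps(records, indent=2))
```

The point of the sketch is that the CSV file stays untouched: names and datatypes live in a separate description, so the same data can be interpreted consistently by different tools.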

The Data on the Web Best Practices Working Group has been making significant progress in the latter half of 2015, leading today to the publication of a substantially updated version of its primary document and one of its two vocabularies, the Data Quality Vocabulary. The former codifies the approach data publishers should take to encourage the maximum reuse of their work, while the latter provides a framework in which assertions can be made about a dataset’s quality and appropriateness for given tasks. The next iteration of DWBP’s other vocabulary, the Dataset Usage Vocabulary, is expected very early in the new year.

Finally, the Spatial Data on the Web WG has updated its (extensive) use case document. This will underpin three Recommendations and a best practice document. There is a lot of ground to cover in this WG, which sees W3C collaborating directly with our sister Standards Development Organization, the Open Geospatial Consortium. That working group’s best practices document is close to being ready for formal publication (by both W3C and OGC) as a First Public Working Draft.



As many people who work in the field will know, the 2007 INSPIRE Directive tasks European Union Member States with harmonizing their spatial and environmental data. The relevant department of the European Commission, the JRC, has led the definition of a complex data model that is broken down into various themes. Naturally enough, the data is modeled in UML and the implementations are based largely on OGC standards that make use of XML/GML etc. However, a number of projects are experimenting with using the model in Linked Data environments. These include GeoKnow, MELODIES and SmartOpenData (SmOD) in which W3C’s European host, ERCIM, is a partner. This project has been instrumental in establishing the Spatial Data on the Web Working Group that is now racing towards the first formal publication of its use cases and requirements document (like most W3C WGs, the document is being developed in full public view on GitHub).

Like GeoKnow, when SmOD first started to consider using INSPIRE data in RDF, we felt duty bound to try to represent the whole of the detailed data model. However, as the project enters its final phase, much of this has been rejected in favor of a simpler approach that is more in line with ‘Linked Data thinking’ and no longer attempts to recreate the full scope of INSPIRE in RDF. There are two principal motivations for this:

  1. Experience: when creating Linked Data for use in a range of pilot projects within SmOD, a slavish following of INSPIRE proved burdensome and unhelpful. The aim of taking a different approach (Linked Data) must be to gain some benefit from that approach not available from the original (XML/GML), recognizing that the original will offer features not available in the derived work.
  2. The publication of the Study on RDF and PIDs for INSPIRE by Diederik Tirry and Danny Vandenbroucke under ISA Action 1.17: A Reusable INSPIRE Reference Platform (ARE3NA). This report summarized work by three experts: Clemens Portele, Linda van den Brink and Stuart Williams. The summary proved extremely useful to the current project partners. All documents from that work are available and remain marked as ‘for review’ although a conversation with JRC staff suggests that no further work is foreseen on these documents.

One call to action from the ARE3NA work was that the INSPIRE Registry be extended to include SKOS concept schemes in addition to the formats already offered. This has been done and allows SmOD to use the registry’s persistent URIs as identifiers for many of the concepts that are important in the current work.

It is this combination of factors that is behind the final model being at once simpler and much more comprehensive than the initial one in its coverage of the INSPIRE themes. For example, the three classes originally associated specifically with representing Geographical Names have disappeared altogether to be replaced by rdfs:label!
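As a rough illustration of that simplification, the sketch below emits one rdfs:label statement per language for a place, which is roughly all that is left once the dedicated Geographical Names classes are gone. It is not taken from the SmOD vocabularies: the subject URI, the sample names and the label_triples helper are assumptions invented for this example, and the output is plain N-Triples built by hand rather than with an RDF library.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def label_triples(subject_uri, names):
    """Return N-Triples lines giving one rdfs:label per language tag."""
    lines = []
    for lang, name in sorted(names.items()):
        # Escape backslashes and double quotes inside the literal.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_uri}> <{RDFS_LABEL}> "{escaped}"@{lang} .')
    return lines


# Hypothetical place URI; the names are sample multilingual labels.
triples = label_triples(
    "http://example.org/place/1",
    {"en": "Vienna", "de": "Wien"},
)
for line in triples:
    print(line)
```

Each name becomes a single language-tagged literal on the place itself, with no intermediate naming resource to create or query.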

The work is fully available via the most stable namespace available to the project, namely

Only the INSPIRE themes relevant to SmOD have been modeled in RDF as part of this work and, even within those themes, only the classes and properties needed in the project have been defined. Therefore, the ‘SmOD vocabularies’ should be seen only as a beginning.

Can they be added to?

Certainly. Indeed, that’s the express hope.


Ideally, the JRC itself will publish RDF vocabularies that mirror the INSPIRE model. In that eventuality, the ones on should almost certainly be deprecated. However, until that happens, the best vehicle W3C has for gathering people together with a common interest is the Community Group system. This is open to W3C Members and non-members alike and a Community Group could act as the forum for discussion of INSPIRE in RDF with the ability to add new terms, clarify existing ones, add new translations and, if needed, deprecate old terms.

I already have some expressions of interest in this but would like to gather more before proposing the CG be formed. If this interests you, please get in touch.

Before closing, I want to thank colleagues in the SmartOpenData project, notably Tatiana Tarasova and Jindřich Mynarz, for their help, advice and expertise.

CSV on the Web: Seeking comments and implementations

The CSV on the Web Working Group has just published a new set of Working Drafts, which the group considers feature complete and implementable. The drafts are:

The group are keen to get comments on these specifications, either as issues on the Group’s GitHub repository or by posting to

The CSV on the Web Working Group would also like to invite people to start implementing these specifications and to donate their test cases into the group’s test suite. Building this test suite, as well as responding to comments, will be the group’s focus over the next couple of months.

Linked Data Platform WG Open Meeting

The W3C Linked Data Platform (LDP) Working Group is holding a special open meeting to discuss potential future work for the group. The deliverable from the workshop will be a report that the LDP WG will take into consideration as it plans its way forward.

LDP offers an alternative vision to data lockdown, providing a clean separation between software and data, so access to the data is simple and always available. If you run a business, using LDP means your vital data isn’t locked out of your reach anymore. Instead, every LDP data server can be accessed using a standard RESTful API, and every LDP-based application can be integrated. If you develop software, LDP gives you a chance to focus on delivering value while respecting your customer’s overall needs. If you are an end user, LDP software promises to give you choice and freedom in the new online world.

So how will this vision become reality? LDP 1.0 has recently become a W3C Recommendation, but there’s still a lot of work to do. Come join the conversation about where we are and what happens next, on April 21st in San Francisco.

See the event wiki page for details.

A writable Web based on LDP

Last week marked the culmination of almost three years of hard work coming out of the Linked Data Platform WG, resulting in the publication of the Linked Data Platform 1.0 as a W3C Recommendation. For those of you not yet familiar with LDP, this specification defines a set of rules for HTTP operations on Web resources, some based on RDF, to provide an architecture for read-write Linked Data on the Web. The most important feature of LDP is that it provides us with a standard way of RESTfully writing resources (documents) on the Web [examples], without having to rely on application-specific conventions (APIs) based around POST and PUT.
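For readers who prefer code, here is a minimal sketch of what creating a resource might look like from a client's point of view: a POST of a Turtle document to an LDP container, with a Slug header hinting at the name of the new resource and a Link header stating the requested interaction model. The container URL, the Turtle body and the build_create_request helper are assumptions made up for this illustration. The request is only constructed, not sent, since no real server is assumed here.

```python
import urllib.request

# Interaction model URI defined in the LDP namespace.
LDP_RESOURCE = "http://www.w3.org/ns/ldp#Resource"


def build_create_request(container_url, turtle_body, slug):
    """Build (but do not send) a POST that asks a container to create a resource."""
    request = urllib.request.Request(
        container_url,
        data=turtle_body.encode("utf-8"),
        method="POST",
    )
    request.add_header("Content-Type", "text/turtle")
    # Slug is a hint for the new resource's name; the server may ignore it.
    request.add_header("Slug", slug)
    # Ask for the new resource to be treated as a plain LDP resource.
    request.add_header("Link", f'<{LDP_RESOURCE}>; rel="type"')
    return request


# Hypothetical container URL and document content.
BODY = '<> <http://purl.org/dc/terms/title> "A first note" .\n'
request = build_create_request("http://example.org/container/", BODY, "note1")
print(request.get_method(), request.full_url)
```

Against a conforming server, passing this request to urllib.request.urlopen would be expected to yield a 201 Created response whose Location header gives the URL of the new resource. Any LDP server should accept the same request, which is what lets an app write to a server it was not built for.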

In practice, LDP should allow developers to take full advantage of the decentralized nature of the Web. Web apps now have a way to read and write data to any server that has implemented LDP 1.0. This technology has the potential to radically transform the way we are used to viewing Web application development, by decoupling the app (user interface) from the data it produces/consumes. We hope it will usher in a wave of innovation in terms of UI and app quality, enabling developers to easily “fork” apps and seamlessly add new features, since the data model is not directly impacted by the fork.

Because this is quite a radical change from the so-called “silo” apps we are used to, we now face a number of challenges, such as paging large resources, optimizing write operations by patching resources, and, especially, decentralized personal identities and access control. The LDP working group has plans to address these challenges in the coming year. Please consider joining the group if you are doing relevant work in those directions.