W3C Workshop on Data and Services Integration, October 20-21, 2011, Bedford, MA, USA


Executive Summary

Web Services have been around since 2000, with the first publication of the SOAP W3C Member Submission. Over the years, they have evolved to address additional use cases, led by implementers who needed them. Some design choices and abstractions, however, were not a perfect match for some users, which led to the creation of a parallel stack based on different choices.

Now that the Web Services stack is mature (seven W3C Recommendations were recently published) and more and more abstractions around JSON are starting to appear (JSON-LD, JSON schema, JSON links, JSON namespaces...), we are approaching a situation where we have at least two parallel stacks with enough differences to make integration difficult. One goal of this workshop was to figure out which integration needs would benefit from standardization, or where discussion via Community or Business Groups could gather a critical mass. Another important goal was to identify and clarify topics that would serve as input to the Linked Enterprise Data Patterns Workshop.

The participants concluded that solutions to data integration issues are no longer confined to a single domain or to a grand unification system, but can result from better integration of tools that help cross stacks. The other main topic of discussion was how to architect RESTful services in the enterprise, a topic that was further discussed during the Linked Enterprise Data Patterns Workshop (Data-driven Applications on the Web), which took place December 6th and 7th 2011 in Cambridge, MA.

Workshop Overview

The workshop attracted 20 papers (see Agenda), many of which were from people with real-world experience in building systems. Services, they argued, were hard to discover and hard to compose. The data they worked on had different representations for the same concept, as different producers and consumers have different expectations about the level of detail needed to do their job. Moreover, the structure of the data and the details of the services kept changing because of service improvements, the need to extend the data format for new uses, or simply because the format itself changed in moving to a different technology stack. This last issue was discussed at the end of the workshop under the topic of «Versioning», which is a generalization of the variability of data and services.

In the services world, RESTful design is gaining traction, replacing the older SOAP-based architecture. One paper said «SOAP crumbled under its own weight». But RESTful services, whether based on JSON or XML, do not have a description language (WADL is not widely accepted), and while some means of describing message structures are emerging, there are no sanctioned mechanisms for requesting message-level security, message integrity, etc.

Another development is the surge of JSON, often as a replacement for XML, despite JSON not being a hypertext format. The need for tooling in the JSON stack comparable to what exists for XML was highlighted by presentations about a query language for JSON (JSONiq, an XQuery equivalent in the JSON world), conversions from XML to JSON, schema languages and linking facilities. This led to a general discussion, with some alarm and snide comments, about how JSON started as XML-lite and may become as complex as XML once related technologies are added: pointers, namespaces, and schemas. Needs and tools are driving this evolution the way they did for XML, but starting from a data angle instead of a document angle.
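To make the "pointers" item concrete, here is a minimal sketch of resolving a pointer against a parsed JSON document. It assumes the slash-separated syntax and the `~0`/`~1` escapes that were later standardized as JSON Pointer (RFC 6901); the function name `resolve_pointer` and the sample document are illustrative only and do not come from any workshop paper.

```python
import json


def resolve_pointer(document, pointer):
    """Return the value that `pointer` designates inside `document`."""
    if pointer == "":
        return document  # the empty pointer designates the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty pointer must start with '/'")
    current = document
    for raw_token in pointer[1:].split("/"):
        # Unescape '~1' before '~0' so that '~01' becomes '~1', not '/'.
        token = raw_token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array members are addressed by index
        else:
            current = current[token]  # object members are addressed by name
    return current


# Hypothetical sample document, for illustration only.
SAMPLE = json.loads('{"services": [{"name": "orders", "a/b": 1}], "m~n": 2}')
```

With this sample, `resolve_pointer(SAMPLE, "/services/0/name")` walks into the array and then into the object. Even this small facility needs escaping rules and error handling, which gives a sense of how the JSON stack accumulates complexity as it gains XML-like features.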

Service discovery is still an issue, and there was extensive discussion on the use of semantic annotations to help discovery and matching. We already have SAWSDL for this, but better tooling would help it reach wider deployment. One tool presented was a browser plugin that helps annotate biotech services (see the Wright State University paper).

Linked Data topics were also present during the workshop, with papers and discussions about mapping relational data to RDF, the use of rules in RIF syntax to build reactive systems instead of regular workflows and to ease adaptation to a variety of sources, the use of existing relational data tools to query SPARQL endpoints uniformly, etc. After refining the different use cases and potential needs, authors were encouraged to submit proposals to the Linked Enterprise Data Patterns Workshop.
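As a rough illustration of what mapping relational data to RDF involves, the sketch below turns each row of a table into triples: one subject per row, one predicate per column. It is only loosely modeled on the general idea of a direct mapping and is not an implementation of any RDB2RDF specification. The base IRI, the `employee` table and the function names are assumptions made for the example.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base IRI
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def escape_literal(value):
    """Escape backslashes and double quotes for a plain string literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def rows_to_triples(conn, table, key_column):
    """Yield (subject, predicate, object) triples for every row of `table`."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{key_column}={record[key_column]}>"
        # One triple states which table (class) the row belongs to.
        yield (subject, RDF_TYPE, f"<{BASE}{table}>")
        for column, value in record.items():
            if value is None:
                continue  # NULL cells produce no triple
            yield (subject, f"<{BASE}{table}#{column}>", f'"{escape_literal(value)}"')


def demo():
    """Build a one-row in-memory table and return its triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
    conn.execute("INSERT INTO employee VALUES (1, 'Ada', NULL)")
    return list(rows_to_triples(conn, "employee", "id"))
```

Calling `demo()` yields a type triple plus one triple per non-NULL column. A real mapping would also have to handle datatypes, foreign keys and IRI encoding, which this sketch deliberately leaves out.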

Outcome of the Workshop

During the Workshop, two main directions emerged from the presentations and subsequent conversations:

First, handling integration issues within a single domain is now a thing of the past. We need to make information go across stacks, up and down the abstraction layers, between service implementations and different levels of service descriptions. The need now is for tools that facilitate cross-border reuse of data and map data between formats.

We now have RDB2RDF, but being able to query SPARQL endpoints using SQL-based tools was on the list of things that would help integration. The same applies to querying RDF graphs from the JSON world (JSON-LD is a contender), or the opposite (JSONiq is an example). These are examples where work has already started to find real solutions to the highlighted issues.

The second direction is the rise of RESTful services. REST, based on the success of the Web, is often difficult to grasp as it is an architectural style rather than a set of libraries or data formats to reuse.

Thus, we need to explain REST and promote the use of REST for building enterprise solutions. Some of what is needed is straightforward; some of it is more difficult. For example, how does REST deal with collections, and how does it deal with large quantities of data? Documenting REST patterns and best practices in these and other situations would be very useful. More importantly, we need to do some work on how to specify integrity, confidentiality, etc. with REST messages (the role that WS-Policy plays with Web Services) and on a language for declarative descriptions of services. The Linked Enterprise Data Patterns Workshop mentioned above will explore these ideas further. The current plan is to create a group to work on Linked Data Patterns, specifically REST-based patterns on RDF and other formats.
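One commonly used answer to the collection and large-data questions is to serve a collection in pages, each carrying links to its neighbors, so that clients follow links rather than construct URLs. The sketch below illustrates that pattern only; it is not something the workshop prescribed. The `offset`/`limit` parameters, the `links` structure, the example URL and all function names are assumptions, and the server is simulated in-process.

```python
from urllib.parse import parse_qs, urlparse

BASE_URL = "http://example.org/orders"  # hypothetical collection resource
ORDERS = [f"order-{n}" for n in range(1, 6)]  # stand-in for a large data set


def collection_page(items, base_url, offset=0, limit=2):
    """Return one page of a collection plus links to neighboring pages."""
    links = {"self": f"{base_url}?offset={offset}&limit={limit}"}
    if offset + limit < len(items):
        links["next"] = f"{base_url}?offset={offset + limit}&limit={limit}"
    if offset > 0:
        links["prev"] = f"{base_url}?offset={max(offset - limit, 0)}&limit={limit}"
    return {"items": items[offset:offset + limit], "total": len(items), "links": links}


def fake_fetch(url):
    """Simulate a GET on the collection by parsing the query string."""
    query = parse_qs(urlparse(url).query)
    return collection_page(ORDERS, BASE_URL, int(query["offset"][0]), int(query["limit"][0]))


def walk_collection(fetch, first_url):
    """Yield every item by following 'next' links until none is left."""
    url = first_url
    while url is not None:
        page = fetch(url)
        yield from page["items"]
        url = page["links"].get("next")
```

A client would call `walk_collection(fake_fetch, BASE_URL + "?offset=0&limit=2")` and never need to know the total size in advance. Which paging parameters and link relations to use is exactly the kind of choice that documented patterns and best practices could settle.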