W3C

Interview: IBM on the Linked Data Platform

Arnaud Le Hors

Shortly after W3C announced the launch of the Linked Data Platform Working Group, I spoke with Arnaud Le Hors about IBM’s interest in linked data and its decision to co-chair the Working Group.

IJ: Why did IBM get involved in organizing the Linked Enterprise Data Patterns Workshop in December 2011 and now the Linked Data Platform Working Group?

ALH: IBM has been involved in Semantic Web activities from the beginning, but primarily from a research perspective. Until recently we had no products using the technology. Now IBM Rational, which develops a set of tools for application and product lifecycle management (requirements, bugs, etc.), uses it, and other parts of IBM are actively exploring it for complementary purposes. Customers typically use tools from more than one vendor and need those tools to work together; integration is a problem IBM addresses.

Over the years Rational has seen and tried a variety of approaches, but each has had limitations. For instance, if your tools interact through proprietary, language-specific APIs, you end up with an explosion in the number of APIs, plus versioning issues. The API approach doesn’t scale. Another approach has been for all tools to converse with a central database. But this usually depends on a schema that is acceptable to all parties, which is difficult to achieve and tends to prove fragile as needs change.
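
To make the scaling argument concrete: if every pair of tools needs its own adapter, n tools need n(n-1)/2 integrations, a number that grows quadratically, whereas a shared, standard protocol needs roughly one adapter per tool. The sketch below is only that arithmetic, assuming one bidirectional adapter per pair of tools; it is not a model of any IBM product.

```python
def point_to_point(n_tools: int) -> int:
    """Adapters needed if every pair of tools integrates directly (one per pair)."""
    return n_tools * (n_tools - 1) // 2


def shared_protocol(n_tools: int) -> int:
    """Adapters needed if every tool speaks one common protocol (one per tool)."""
    return n_tools


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(f"{n:>2} tools: {point_to_point(n):>3} pairwise integrations "
              f"vs {shared_protocol(n):>2} protocol adapters")
```

With three tools the two approaches cost the same (3 each); with twenty tools, pairwise integration needs 190 adapters against 20.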

Though these approaches have their supporters, IBM was not satisfied. We realized that many of the characteristics we sought were already present in the Web and in Linked Data. We have moved from a “tool-centric” approach to a “data-centric” approach based on Web standards. In this decentralized and scalable model, every piece of information that we have (e.g., a bug report) is addressable with a URI and can be accessed with HTTP.
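
As a minimal sketch of "addressable with a URI and accessible with HTTP", the following uses only Python's standard library to serve a hypothetical bug report at a URI path and retrieve it with an ordinary GET that asks for Turtle. The path `/bugs/42`, the in-memory `RESOURCES` store and the title are illustrative assumptions, not part of any IBM tool.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical resources, keyed by URI path; each is a Turtle document.
# "<>" denotes the document's own URI.
RESOURCES = {
    "/bugs/42": (
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
        '<> dcterms:title "Crash when saving a file" .\n'
    ),
}


class LinkedDataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = RESOURCES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/turtle")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch(url: str):
    """GET a resource, asking for Turtle; return (content type, body)."""
    req = urllib.request.Request(url, headers={"Accept": "text/turtle"})
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Content-Type"], resp.read().decode("utf-8")


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), LinkedDataHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    ctype, body = fetch(f"http://127.0.0.1:{port}/bugs/42")
    print(ctype)
    print(body)
    server.shutdown()
```

Any tool that can issue an HTTP request can read the bug report this way; it needs no vendor-specific client library.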

IJ: Please describe a scenario.

ALH: Suppose someone files a bug report on a piece of software. The bug report is available at a URI. Another person then registers a change request, and an engineer is assigned to fix the code. The change request includes the URI of the bug report. The engineer changes the code and registers the URI of the change request in the source code management system. This chain of links gives fine-grained accountability. We store the information as linked data, which enables the communication among the tools.
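
The chain in this scenario can be sketched as a handful of triples in which each tool's record points at another tool's URI. The URIs and the `ex:` predicates below are hypothetical; the point is that following links from the code change leads back, step by step, to the original bug report.

```python
# Hypothetical URIs for the records held by three different tools.
BUG = "http://bugs.example.org/bugs/42"
CHANGE = "http://changes.example.org/requests/7"
COMMIT = "http://scm.example.org/commits/a1b2c3"

# Each link is a triple whose object is another tool's URI.
TRIPLES = [
    (CHANGE, "ex:fixes", BUG),  # the change request cites the bug report
    (CHANGE, "ex:assignedTo", "http://people.example.org/engineer"),
    (COMMIT, "ex:implements", CHANGE),  # the code change cites the change request
]

# Predicates that count as traceability links (an illustrative choice).
LINKS = {"ex:fixes", "ex:implements"}


def trace(start, triples, link_predicates):
    """Follow link predicates from `start`; return the list of URIs visited."""
    chain = [start]
    current = start
    while True:
        nxt = next((o for s, p, o in triples
                    if s == current and p in link_predicates), None)
        if nxt is None or nxt in chain:  # end of the chain, or a cycle
            return chain
        chain.append(nxt)
        current = nxt


if __name__ == "__main__":
    for uri in trace(COMMIT, TRIPLES, LINKS):
        print(uri)
```

Starting from the commit, the trace visits the commit, then the change request, then the bug report.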

So we are using linked data for application integration. I think this is fairly uncommon today. Most people are using linked data to create virtual data stores. A lot of linked data is read-only; our approach is inherently read-write.

IJ: Have you encountered any issues with updates when writing?

ALH: Yes. HTTP PUT is not enough. We need a PATCH method. HTTP has a PATCH extension (RFC 5789), but it leaves the format of the patch document open, and I know that the SPARQL Update community has a particular view of how PATCH should work. W3C needs to work on how PATCH would work in the context of RDF.
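
Why PUT alone is awkward: PUT replaces the whole representation, so a client that changes one field must send everything back and, without conditional requests, can overwrite changes other clients made in the meantime. The sketch below models a resource as a set of triples and contrasts PUT with a hypothetical delete/insert patch. It is not the format of SPARQL Update or of any standard RDF patch language.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def put(_current: Set[Triple], new_state: Set[Triple]) -> Set[Triple]:
    """PUT semantics: the client sends the complete new state, replacing the old one."""
    return set(new_state)


def patch(current: Set[Triple], delete: Set[Triple], insert: Set[Triple]) -> Set[Triple]:
    """Hypothetical RDF patch: remove some triples, add others, leave the rest untouched."""
    return (current - delete) | insert


bug = {
    ("<#bug42>", "ex:status", '"open"'),
    ("<#bug42>", "ex:title", '"Crash when saving a file"'),
    ("<#bug42>", "ex:reportedBy", "<#alice>"),
}

# Closing the bug: with a patch, only the changed triple is sent.
closed = patch(bug,
               delete={("<#bug42>", "ex:status", '"open"')},
               insert={("<#bug42>", "ex:status", '"closed"')})
```

Reaching the same state with `put` would require the client to send all three triples, including the two it never meant to touch.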

PATCH is a good example of a topic that motivated us to hold the Workshop in December. In adopting linked data in products, we have found that there may be multiple ways to achieve the same end, and no clear guidance or consensus on which to use where. So we are seeking guidance on how to use various technologies, and we are looking to W3C to organize discussions with industry on linked data good practices. Some examples of constructs where we would like to see industry converge include containers, lists, and pagination (requesting a piece of a representation). The Workshop was about these sorts of linked data “patterns”, and we are looking forward to formally addressing these issues in the new Linked Data Platform Working Group.
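
To illustrate the pagination pattern: a container with many members can be served in pages, each page carrying a link to the next one. The `?page=N` URI scheme and the single "next" link below are assumptions chosen for illustration, not the convention any specification settled on; agreeing on such details is exactly the kind of convergence described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    uri: str
    members: List[str]
    next_uri: Optional[str]  # None on the last page


def get_page(container_uri: str, members: List[str], page_size: int, index: int) -> Page:
    """Return one page of a container's member URIs.

    The '?page=N' URIs and the 'next' link are hypothetical conventions.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = index * page_size
    chunk = members[start:start + page_size]
    has_more = start + page_size < len(members)
    next_uri = f"{container_uri}?page={index + 1}" if has_more else None
    return Page(f"{container_uri}?page={index}", chunk, next_uri)


if __name__ == "__main__":
    container = "http://bugs.example.org/bugs/"
    bugs = [f"{container}{n}" for n in range(1, 6)]
    index = 0
    while True:  # a client walks the pages until there is no "next" link
        page = get_page(container, bugs, page_size=2, index=index)
        print(page.uri, page.members)
        if page.next_uri is None:
            break
        index += 1
```

With five members and a page size of two, the client receives pages of two, two and one member, and the last page has no "next" link.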

IJ: Are there other projects using linked data within IBM? Or other parts of the Semantic Web stack?

ALH: Yes, for example IBM Watson. Watson uses a triple store, but also ontologies and inference. Watson downloads data from the Web (e.g., from DBpedia), which is curated and then added to the triple store. Watson reasons over the data, using Semantic Web technology in a major way. We also have products from Tivoli (for help desk tickets) and Information Management (DB2) using linked data.

IJ: What do you see as barriers to RDF adoption?

ALH: A big one has been the RDF/XML syntax. I know the history of the XML syntax, but I think it obscures the otherwise simple RDF model. Turtle or some other simple syntax would help. Turtle is not yet a standard; we would like it to become one.
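
Part of Turtle's appeal is that it stays close to the triple model: each statement reads as subject, predicate, object, and `;` repeats the subject, while RDF/XML wraps the same data in `rdf:RDF` and `rdf:Description` elements. The tiny serializer below is a sketch that assumes terms are already written in Turtle form (prefixed names, `<IRIs>`, quoted literals) and performs no escaping; a real application would use an RDF library.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def to_turtle(prefixes: Dict[str, str], triples: List[Triple]) -> str:
    """Serialize triples as Turtle, grouping predicates that share a subject with ';'.

    Assumes every term is already valid Turtle syntax; nothing is escaped.
    """
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in prefixes.items()]
    lines.append("")
    by_subject: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        lines.append(f"{s} {body} .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle(
        {"dcterms": "http://purl.org/dc/terms/", "ex": "http://example.org/ns#"},
        [
            ("<http://bugs.example.org/bugs/42>", "dcterms:title", '"Crash when saving a file"'),
            ("<http://bugs.example.org/bugs/42>", "ex:status", '"open"'),
        ],
    ))
```

The output is two prefix declarations followed by a single statement about the bug report, with one line per property.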

IJ: Is IBM creating tools to manipulate linked data?

ALH: IBM just released DB2 version 10, which supports RDF with a SPARQL engine on top of DB2. This was driven by demand from our products.

IJ: What else should W3C be doing in the linked data space?

ALH: Although the full Semantic Web stack is general and elegant, there may be too much going on, which raises obstacles for many engineers. Practical guidance (how to solve classes of problems, for instance) would help a lot. The linked open data movement has been very helpful for advocacy.

We think the industry will benefit from a clear definition of a linked data platform (or profile). Tim Berners-Lee listed four principles in his writing on linked data, but that view has no formal standing. With the Linked Data Platform Working Group we will enumerate the necessary standards and provide guidance on conventions and good practices. We need big players to join the Working Group and agree on what it means to have a linked data platform. This will simplify deployment (by reducing the number of options), increase interoperability, and move the industry forward.

IJ: What do you consider core to a linked data platform?

ALH: IBM submitted a specification in March to try to answer that question. This is just a start, though. We want to engage industry on the question.

IJ: Thank you for your time, Arnaud.