Interview: Oracle on Data on the Web – Part 2 with Xavier Lopez

This is part 2 of a 2-part interview with Oracle about data on the Web. In part 1, the focus is on the consumption of data by applications, such as those that enterprises provide to their employees. In part 2, the focus is on back-end data management.

For this part of the interview I spoke with Xavier Lopez, Director of Spatial and Semantic Technologies at Oracle.

IJ: Oracle is known for relational databases. But today I’d like to talk about Oracle support for the Semantic Web graph model. Tell me about support for the graph model on the back end.

Xavier: As the W3C made progress with the Semantic Web specifications (RDF and SPARQL), we saw more customers in Life Science, Health Science, Public Safety, and Publishing looking to adopt triple stores or RDF graph stores. Oracle responded by releasing an RDF graph feature as part of the Oracle Spatial and Graph option. Our RDF feature was focused on providing customers with a highly scalable, secure, high-performance data management solution.

Xavier: Customers like these are looking for a relationship-centric, or linked data, navigation model to support a variety of data integration, analytics, and discovery solutions. The Semantic Web approach is designed for exactly this. However, since the data volumes for these social and entity graphs can be quite large and the queries complex, it is essential to build these solutions on a high-performance, highly scalable software infrastructure.

Xavier: Our objective is to make RDF graphs a mainstream capability in the IT infrastructure. Most IT environments use databases, and we want to make sure they can benefit from the RDF data model. The advantage we offer is integration with the rest of our services. We also provide adapters for Jena and Sesame, the tools used by about 90% of Semantic Web developers, and those adapters are optimized to work with Oracle. Beyond that, we have:

  • An in-database OWL inferencing engine.
  • SPARQL query support through the Jena adapter.
  • A feature unique to Oracle that lets people query the graph through SQL by embedding SPARQL queries. You can get at any data in the Oracle environment through SQL, which adds expressivity and lets you access more heterogeneous data.
  • A plug-in for the Cytoscape visualization tool.
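Embedding SPARQL in SQL still comes down to matching triple patterns, some of whose terms are variables, against the stored graph. The Python sketch below illustrates that core step (basic graph pattern matching) over a small in-memory list of triples. It is a generic, hypothetical illustration, not Oracle's implementation; the function names and the `ex:` data are made up for the example.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def is_var(term: str) -> bool:
    # SPARQL-style variables start with "?"
    return term.startswith("?")

def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended

def match_bgp(patterns: List[Triple], graph: List[Triple]) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (a nested-loop join)."""
    def solve(i: int, binding: Binding) -> Iterator[Binding]:
        if i == len(patterns):
            yield binding
            return
        for triple in graph:
            extended = match_pattern(patterns[i], triple, binding)
            if extended is not None:
                yield from solve(i + 1, extended)
    return solve(0, {})

graph = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:boston"),
]

# Roughly: SELECT ?person ?city WHERE { ?person ex:worksFor ?org . ?org ex:locatedIn ?city }
results = [(b["?person"], b["?city"]) for b in match_bgp(
    [("?person", "ex:worksFor", "?org"), ("?org", "ex:locatedIn", "?city")], graph)]
print(results)  # [('ex:alice', 'ex:boston'), ('ex:bob', 'ex:boston')]
```

In Oracle's product, SPARQL is embedded in SQL through the `SEM_MATCH` table function; consult the Oracle RDF Semantic Graph documentation for its actual parameters rather than inferring them from this sketch.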

Xavier: Customers also expect to use the standard manageability utilities and services that Oracle Database traditionally offers (partitioning, parallelism, compression, high availability, etc.). Hence our approach is to ensure customers can leverage Oracle Database to build a wide range of graph-based applications.

IJ: You mentioned some of the industries looking at these solutions. For these customers, what are examples of problems that the graph model can address?

Xavier: There is a wide variety of uses for Semantic Web technology. However, in the enterprise space, three application areas stand out:

  1. Semantic Metadata Layer: A standard, graph-oriented, unified content metadata layer for federated resources (databases, files, big data, online services). This layer can be enhanced with rules to validate semantic and structural consistency across disparate federated resources.
  2. Text Mining and Entity Analytics: Here, we are primarily dealing with unstructured text. After running content through entity extraction engines and placing the resulting labeled text in the database as RDF, it is possible to use SPARQL query patterns to find related content and relations by navigating connected entities. It is also possible to apply reasoning rules to the RDF entities to discover implicit data and relationships that were not previously evident.
  3. Social Media Management and Analysis: The data model underlying most social media sites is a graph that represents the relationships and properties of entities: people, products, events, locations. This graph structure is ideal for supporting ever-evolving schemas without impacting performance.

Xavier: For each of these broad categories, RDF and OWL are ideal as a canonical data model for integration, navigation, and pattern query across diverse sources of data.
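To make the entity-navigation idea in item 2 concrete, here is a minimal Python sketch that assumes an entity-extraction step has already stored `(document, ex:mentions, entity)` triples; the predicate and all identifiers are hypothetical. It finds documents related to a given one by walking through the entities they share.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical output of an entity-extraction step, stored as triples.
triples: List[Tuple[str, str, str]] = [
    ("doc:1", "ex:mentions", "ent:AcmeCorp"),
    ("doc:1", "ex:mentions", "ent:JaneDoe"),
    ("doc:2", "ex:mentions", "ent:JaneDoe"),
    ("doc:3", "ex:mentions", "ent:AcmeCorp"),
    ("doc:4", "ex:mentions", "ent:Widgets"),
]

def related_documents(doc: str, triples, predicate: str = "ex:mentions") -> List[str]:
    """Return documents that share at least one extracted entity with `doc`."""
    entities_by_doc = defaultdict(set)
    docs_by_entity = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            entities_by_doc[s].add(o)   # doc -> entities it mentions
            docs_by_entity[o].add(s)    # entity -> docs that mention it
    related = set()
    for entity in entities_by_doc[doc]:
        related |= docs_by_entity[entity]  # hop doc -> entity -> other docs
    related.discard(doc)
    return sorted(related)

print(related_documents("doc:1", triples))  # ['doc:2', 'doc:3']
```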

IJ: Tell me more about, for example, the semantic metadata approach.

Xavier: This is an IT issue that never goes away: people want to use data from different sources (whether relational, graph, or anything else), and applications using that data want it to look like a single database. To achieve that, you represent the schemas of the different databases and link them through common terms (typically with some mapping).

Xavier: This is where we see the highest value for this technology. First, it lets you add new data sources by extending your graph representation. When you extend your graph, you begin to get network effects; for example, you can reuse definitions. Second, this approach does not require underlying applications to align with any one schema, so people don’t have to change their own models. This is important for managing mergers and acquisitions or data syndication, for example. Third, this approach lets you adapt easily as people’s schemas change over time.
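The mapping approach can be sketched as follows, assuming two hypothetical sources (`crm` and `erp`) whose field names differ. Mapping triples tie each source field to a shared vocabulary term, and a unified view is produced without changing either source's schema. The `map:sameAs` predicate and the `vocab:` terms are illustrative names only.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical rows from two independently designed sources.
crm_rows = [{"cust_name": "Acme Corp", "cust_city": "Boston"}]
erp_rows = [{"CLIENT": "Globex", "LOCATION": "Denver", "INTERNAL_FLAG": "Y"}]

# Mapping triples link each source field to a shared vocabulary term.
mappings: List[Tuple[str, str, str]] = [
    ("crm:cust_name", "map:sameAs", "vocab:customerName"),
    ("crm:cust_city", "map:sameAs", "vocab:city"),
    ("erp:CLIENT", "map:sameAs", "vocab:customerName"),
    ("erp:LOCATION", "map:sameAs", "vocab:city"),
]

def unified_view(sources: Dict[str, List[Dict[str, str]]],
                 mappings: List[Tuple[str, str, str]]) -> Iterator[Dict[str, str]]:
    """Present rows from all sources in the shared vocabulary.

    Fields without a mapping (e.g. INTERNAL_FLAG) are left out, not guessed."""
    to_shared = {s: o for s, p, o in mappings if p == "map:sameAs"}
    for prefix, rows in sources.items():
        for row in rows:
            yield {to_shared[f"{prefix}:{col}"]: value
                   for col, value in row.items()
                   if f"{prefix}:{col}" in to_shared}

records = list(unified_view({"crm": crm_rows, "erp": erp_rows}, mappings))
print(records)
```

Adding a third source means appending mapping triples; the existing sources and the view function stay untouched, which mirrors the first two advantages Xavier lists.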

IJ: Tell me more about performance, which you mentioned a moment ago.

Xavier: We had a graph database before RDF, and used it to represent all the roads in Western Europe. But the graph size was limited by memory, and for some applications in pharma or government, customers asked for persistent graph stores. So we developed a second graph model for massive graphs of hundreds of billions of triples. These RDF stores are bigger than most typical database stores. We bring all the traditional benefits of an Oracle database to graphs.

Xavier: However, using “vanilla” Oracle Database tables for RDF would be inefficient: traditional relational models are not ideal for graphs, which is why graph databases emerged. We let people represent information as a graph, but we optimize it in a traditional relational database, taking advantage of partitioning, parallelism, and query loading, for example. Thus, we offer for graphs the same performance-enhancing and management features that people expect for relational databases.
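As a rough illustration of why graphs stored in plain relational tables need optimization, the sketch below keeps triples in a single generic table using Python's built-in `sqlite3` module and answers a two-hop pattern with a self-join. The table layout and index names are assumptions for the example; Oracle's actual RDF storage, partitioning, and indexing are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One generic table holds every edge of the graph (hypothetical layout).
conn.execute("CREATE TABLE triples (s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL)")
# Indexes on two common access paths; real stores typically keep more permutations.
conn.execute("CREATE INDEX idx_spo ON triples (s, p, o)")
conn.execute("CREATE INDEX idx_pos ON triples (p, o, s)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:boston"),
])

# Pattern { ?person ex:worksFor ?org . ?org ex:locatedIn ?city }
# becomes a self-join of the triples table with itself.
rows = conn.execute("""
    SELECT t1.s AS person, t2.o AS city
    FROM triples t1
    JOIN triples t2 ON t1.o = t2.s
    WHERE t1.p = 'ex:worksFor' AND t2.p = 'ex:locatedIn'
    ORDER BY person
""").fetchall()
print(rows)  # [('ex:alice', 'ex:boston'), ('ex:bob', 'ex:boston')]
```

Each additional hop in a graph query adds another self-join over the same, potentially enormous, table, which is where features such as partitioning, parallelism, and specialized indexes start to matter.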

Xavier: RDF graphs pose another performance challenge when you do inferencing, which is common in drug discovery and intelligence applications. Inferencing is a powerful feature: you apply rules to triples and generate more triples, potentially a lot more (in some cases up to 2/3 more data). At that scale, in-memory solutions fall apart. We also do some pre-inferencing to improve performance.
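Rule-based inferencing of this kind can be sketched as forward chaining to a fixpoint: rules are applied repeatedly until no new triples appear, and the result is materialized before query time (one possible reading of “pre-inferencing”). The sketch uses only two RDFS entailment rules, `rdfs9` (type inheritance through `rdfs:subClassOf`) and `rdfs11` (transitivity of `rdfs:subClassOf`), with hypothetical drug-class data. It is not Oracle's OWL engine, and its naive loop would not scale to the data sizes discussed.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS rules repeatedly until no new triples appear (fixpoint)."""
    closure = set(triples)
    while True:
        new: Set[Triple] = set()
        sub = [(s, o) for s, p, o in closure if p == "rdfs:subClassOf"]
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for x, p, cls in closure:
            if p == "rdf:type":
                for a, b in sub:
                    if a == cls:
                        new.add((x, "rdf:type", b))
        new -= closure
        if not new:
            return closure
        closure |= new

base = {
    ("ex:Aspirin", "rdf:type", "ex:NSAID"),
    ("ex:NSAID", "rdfs:subClassOf", "ex:AntiInflammatory"),
    ("ex:AntiInflammatory", "rdfs:subClassOf", "ex:Drug"),
}
closure = infer(base)
for t in sorted(closure - base):
    print(t)
```

Three asserted triples yield three inferred ones here; on real ontologies and instance data, the inferred set can grow much faster, which is the storage and performance pressure Xavier describes.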

IJ: Which customers look for these capabilities?

Xavier: We see more and more adoption by large customers. They have heard that graph solutions work and whatever they are currently doing does not. There’s a lot of literature showing this is possible, even if the approach has not yet gone mainstream. People see similar companies solving similar issues with the graph approach.

Xavier: There is, however, a learning curve. Adoption has been slow, but in the past year I have seen a lot more acceptance, and more availability of people and tools. Overall, there has been considerable industry development of RDF graph triple stores in the last decade, and in many ways this layer of the technology stack is fairly mature. Challenges remain in software tooling that would help mainstream IT and Web developers build solutions without having to learn the intricacies of RDF and SPARQL. A related challenge is the availability of skilled people who are familiar with Semantic Web concepts and can build enterprise solutions using a combination of commercial and open source technologies.

IJ: What would encourage adoption by customers that are not as large?

Xavier: People want to interact with JSON objects, for example. They need tools and high-level APIs they can work with. We’re not there yet but we are working with partners on this sort of project since it is clear that that’s what the customers want.
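As a small illustration of the developer-friendly view Xavier describes, the sketch below groups triples by subject into JSON-ready dictionaries. The `@id` key borrows the JSON-LD convention, but this is not a JSON-LD processor, and the data is hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple

def triples_to_json_objects(triples: List[Tuple[str, str, str]]) -> List[Dict]:
    """Group triples by subject; repeated predicates become JSON arrays."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)
    objects = []
    for subject, props in grouped.items():
        obj = {"@id": subject}
        for p, values in props.items():
            # Single values stay scalar; multi-valued properties become lists.
            obj[p] = values[0] if len(values) == 1 else values
        objects.append(obj)
    return objects

triples = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", "Bob"),
]
objs = triples_to_json_objects(triples)
print(json.dumps(objs, indent=2))
```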

IJ: What about connecting with data not in RDF?

Xavier: Oracle experts played an active role in the W3C RDB2RDF standard, which exposes relational data to the Semantic Web. We did so because we recognized the importance of making existing relational data sources available to SPARQL-based applications. In short, it helps achieve the “mainstreaming” of the Semantic Web.
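The RDB2RDF work produced two W3C Recommendations: “A Direct Mapping of Relational Data to RDF” and R2RML. The Python sketch below loosely follows the Direct Mapping idea, turning rows of a relational table into triples and foreign keys into links between rows. It is simplified (single-column keys, untyped literals, no IRI percent-encoding, and foreign-key columns emitted only as links), the base IRI and tables are hypothetical, and it should not be read as a conforming implementation.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

BASE = "http://example.com/base/"  # hypothetical base IRI

def direct_map(conn: sqlite3.Connection, table: str, pk: str,
               foreign_keys: Optional[Dict[str, Tuple[str, str]]] = None
               ) -> List[Tuple[str, str, str]]:
    """Simplified Direct-Mapping-style export of one table to triples.

    foreign_keys maps a column to (referenced_table, referenced_key)."""
    foreign_keys = foreign_keys or {}
    cur = conn.execute(f"SELECT * FROM {table}")  # table name must be trusted
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{pk}={record[pk]}>"
        triples.append((subject, "rdf:type", f"<{BASE}{table}>"))
        for col, value in record.items():
            if value is None:
                continue  # NULL values produce no triple
            if col in foreign_keys:
                ref_table, ref_key = foreign_keys[col]
                # Reference triple pointing at the referenced row's IRI
                triples.append((subject, f"<{BASE}{table}#ref-{col}>",
                                f"<{BASE}{ref_table}/{ref_key}={value}>"))
            else:
                # Plain (untyped) literal for ordinary columns
                triples.append((subject, f"<{BASE}{table}#{col}>", f'"{value}"'))
    return triples

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT,
                      dept_id INTEGER REFERENCES dept(id));
    INSERT INTO dept VALUES (10, 'Research');
    INSERT INTO emp VALUES (1, 'Ada', 10);
""")
emp_triples = direct_map(conn, "emp", "id", {"dept_id": ("dept", "id")})
for t in emp_triples:
    print(t)
```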

IJ: And CSV?

Xavier: Tools for converting data to and from RDF are important, and they are available. Fortunately, some of the more widely used semantic frameworks, such as Jena, can perform these file transformation operations. However, there is still a need to embed such RDF transformation utilities into mainstream ETL and data integration tools. This will go along with treating RDF as a native type when converting to and from files, databases, and Big Data sources.
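For the CSV case, a minimal sketch using only Python's standard `csv` module might look like this: each row becomes a subject IRI built from an ID column, and every other non-empty cell becomes a literal-valued triple serialized as N-Triples. The base IRI and column names are hypothetical, and the sketch assumes header names are safe to use inside IRIs.

```python
import csv
import io

def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))

def csv_to_ntriples(text: str, base: str, id_column: str) -> str:
    """Convert CSV text to N-Triples, one subject per row (hypothetical scheme)."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{base}{row[id_column]}>"
        for col, val in row.items():
            if col == id_column or val == "":
                continue  # the ID becomes the subject; empty cells are skipped
            lines.append(f'{subject} <{base}{col}> "{escape_literal(val)}" .')
    return "\n".join(lines)

data = 'id,name,city\n1,Acme,Boston\n2,"Globex, Inc.",Denver\n'
print(csv_to_ntriples(data, "http://example.com/", "id"))
```

The quoted field `"Globex, Inc."` is handled by the `csv` module, so the embedded comma stays inside a single literal.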

IJ: Xavier, thank you for taking the time to chat!