Towards a universal data representation and exchange format for graphs

By George Anadiotis, Founder - Linked Data Orchestration / Contributor - ZDNet

Abstract

Looking at the graph database world today, the split between RDF and labeled property graphs (LPG) is clear. The split has many causes and has produced many differences, but there are best practices that the two sides can exchange.

One of them, currently a distinctive feature of the RDF world, is standardization. RDF itself can be seen as, first and foremost, a data exchange format. Although many serializations exist, from RDF/XML to JSON-LD, simply having a format that can be used to serialize and exchange data has helped the RDF ecosystem tremendously.

This proposal is about moving forward with a progressive strategy for the standardization of a universal data representation and exchange format for graphs. The starting point would be a JSON format for LPG, with the end goal of reaching a universal data representation and exchange format for graphs - both RDF and LPG.

 

Author's Background

I’ve been working with graph databases since 2005, when I implemented my first RDF graph database prototype. This work has included award-winning R&D, startups, enterprises, and consulting for the (then) top graph database vendor on distributed queries in 2008. I have also been active as an analyst, consultant and entrepreneur since 2012.

I have deep, hands-on knowledge of and experience with graph databases, both RDF and LPG. I’ve been monitoring graph databases for a long time: I have published the Year of the Graph newsletter since 2018, and the research report of the same name also appeared in 2018. I used my expertise to devise a methodology to evaluate and compare graph databases.

Since 2016, I have also been a ZDNet contributor for the Big on Data column. I have helped bring graph databases to the mainstream, and called 2018 The Year of the Graph, 8 months before Gartner included Knowledge Graphs in its Hype Cycle.

 

Topic Elaboration

In the RDF world, both query language and data format standards seem to have been adequately addressed. Unfortunately, the same does not hold for property graphs. Before, or in parallel with, the query language standardization efforts, it is worth addressing the data format aspect.

An obvious area for improvement is interoperability and standardization of a property graph representation format. There are a few efforts in this area, and it would be worthwhile to survey and consolidate them.

Currently, CSV seems to be the most commonly used format for importing data into, and exporting data from, LPG graph databases. This is not ideal: CSV has no native notion of nodes, edges, or typed properties, so it fails to capture the intricacies of graphs, and the same file can be interpreted differently by different solutions.
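
To illustrate the point, here is a minimal sketch in Python of what a CSV round trip loses. The two CSV layouts, the column names, and the sample data are all invented for this example and do not correspond to any particular vendor's import format.

```python
import csv
import io

# Hypothetical node export: one row per node (columns invented for illustration).
NODES_CSV = """id,label,name,born
1,Person,Alice,1980
2,Person,Bob,
"""

# Hypothetical edge export: the graph structure has to live in a second file,
# linked to the first only by a naming convention for the id columns.
EDGES_CSV = """source,target,type,since
1,2,KNOWS,2015
"""


def read_rows(text):
    """Parse CSV text into a list of dicts; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


nodes = read_rows(NODES_CSV)
edges = read_rows(EDGES_CSV)

# The year is now the string "1980"; whether it becomes an integer again
# is up to each importing tool.
print(repr(nodes[0]["born"]))

# Bob's missing birth year is an empty string; CSV cannot say whether that
# means "no such property" or "property with an empty value".
print(repr(nodes[1]["born"]))
```

Each importing tool has to decide for itself how to recover types, how to treat empty cells, and which columns identify the endpoints of an edge, which is where the differing interpretations come from.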

Although a number of formats for expressing LPGs exist, none of them seems to have universal support. The most prominent ones are GraphML, JSON Graph Format (JGF), and JSON Graph. GraphML is XML-based, and was created by the Graph Drawing community. JGF and JSON Graph are JSON-based, and were created by ADS and Netflix, respectively.

Given the success that JSON-LD has seen, the proposal would be to gear this effort towards JSON as well. JSON is used by more developers than XML, as it is considered easier to work with. Part of the reason is the fact that it can be used directly by modern front-end frameworks.

The starting point would be to study JGF and JSON Graph, find common ground, and try to reconcile any differences to come up with a commonly acceptable format. If a number of LPG vendors can agree to support such a format, and it can be promoted and adopted as a W3C standard, it would greatly improve interoperability among graph solutions.
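
To make the discussion concrete, the following Python sketch shows the kind of JSON document such a format might describe. The field names ("nodes", "edges", "labels", "properties", "source", "target", "type") are assumptions made for this example; they are not taken from JGF or JSON Graph, and a reconciled format could well choose differently.

```python
import json

# A hypothetical LPG document: two labeled nodes and one typed edge,
# each carrying its own key/value properties.
graph = {
    "nodes": [
        {"id": "1", "labels": ["Person"],
         "properties": {"name": "Alice", "born": 1980}},
        {"id": "2", "labels": ["Person"],
         "properties": {"name": "Bob"}},
    ],
    "edges": [
        {"id": "e1", "source": "1", "target": "2", "type": "KNOWS",
         "properties": {"since": 2015}},
    ],
}


def dangling_edges(g):
    """Return the ids of edges whose source or target is not a known node."""
    node_ids = {n["id"] for n in g["nodes"]}
    return [e["id"] for e in g["edges"]
            if e["source"] not in node_ids or e["target"] not in node_ids]


# Round trip through the JSON text form, as an export followed by an import.
serialized = json.dumps(graph, sort_keys=True)
restored = json.loads(serialized)

print(restored == graph)          # structure and value types survive
print(dangling_edges(restored))   # edges can be checked against nodes
```

Unlike the CSV case, numbers stay numbers, an absent property is simply absent, and nodes and edges travel in one document, so a consumer can check referential integrity without any out-of-band convention.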

Furthermore, this could help expose the majority of developers who currently use JSON to basic graph concepts, and facilitate the use of graph data in applications. The fact that both JGF and JSON Graph have been developed by commercial entities to deal with real-world problems attests to this.

The above should be a worthy goal in and of itself. If that can be achieved, the next step could be to adopt elements of JSON-LD, anchoring JSON elements to RDF concepts. Although that is a stage 2 goal, and the exact way to get there needs to be investigated, the fact that JSON-LD is valid JSON should make this a viable option.
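
As a rough illustration of what that anchoring could look like, the Python sketch below turns one hypothetical LPG node into a JSON-LD style document using the standard JSON-LD keywords @context, @vocab, @id and @type. The node layout, the function name to_jsonld, and the example.org IRIs are assumptions made for this example. The sketch covers nodes only; how to represent properties on edges is exactly the kind of question that still needs to be investigated.

```python
import json

# Hypothetical LPG node, using invented field names.
node = {"id": "1", "labels": ["Person"], "properties": {"name": "Alice"}}


def to_jsonld(n, base="http://example.org/node/",
              vocab="http://example.org/vocab#"):
    """Build a JSON-LD style dict for a single node.

    Assumption: property keys and labels are resolved against one default
    vocabulary via @vocab, and node ids are appended to a base IRI.
    Property keys starting with "@" would clash with JSON-LD keywords and
    are not handled here.
    """
    doc = {
        "@context": {"@vocab": vocab},
        "@id": base + n["id"],
        "@type": list(n["labels"]),
    }
    doc.update(n["properties"])
    return doc


doc = to_jsonld(node)

# The result is still plain JSON, so existing JSON tooling keeps working.
print(json.dumps(doc, indent=2))
```

A consumer that ignores the "@" keys sees an ordinary JSON object with a "name" field, while a JSON-LD aware consumer can interpret the same document as RDF.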

The end result could be a universal data exchange format for graphs. This would enable the seamless exchange of LPG graphs among systems, while the addition of JSON-LD elements could give them semantics based on existing work in the RDF world.