CGReport

From Property Graphs Model and API Community Group
Proposed Report from the Property Graphs CG

The Property Graphs CG grew out of the W3C Social Business Workshop, where Ashok Malhotra (Oracle) argued that the Property Graphs data model was gaining a great deal of traction because of the large amount of data generated by social media giants such as Facebook, Twitter and LinkedIn that could be modeled as Property Graphs. In fact, an industry was developing around mining this data for a variety of business purposes.

The Property Graphs CG started in September 2013 with the goal of making a recommendation to the W3C about whether work should be started on the standardization of a Property Graphs data model and API.

The CG has 23 participants, including IBM and Oracle, representatives from the vendor companies Franz, GraphLab, Orient Technologies, Sparsity, VELTI, Objectivity and OpenLink, and a number of individual contributors. In the interest of full disclosure, however, it does not have representatives from Neo or Tinkerpop/Aurelius, which are major players in this field. Nor does it have representatives from social media companies such as Facebook, Twitter, LinkedIn and Google, which generate the data and would be major consumers of it.

After six months of discussion and deliberation, the CG decided to recommend that the W3C start a WG to standardize a data model for Property Graphs. The WG would also discuss and decide whether future work is warranted on a Property Graphs API, either REST-based or declarative. The scope and the deliverables of the WG are described below. A more complete WG charter is available at: WG Charter

A Data Model for Property Graphs

The goal of the WG is to standardize a data model for Property Graphs. The main deliverable of the WG will be a recommendation that describes a data model for Property Graphs in detail.

The Property Graphs model consists of nodes (vertices) and edges (links), both of which can have properties. Several details remain to be decided, such as whether nodes should be typed, which datatypes properties can have, and whether to support multi-valued properties. It has also been suggested that the data model include collection types such as sets and bags.
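To make the open questions concrete, the following is a minimal sketch of one possible shape for such a model, not a proposal from the CG. The class and field names (Node, Edge, PropertyGraph, label) are hypothetical, and the choices to give nodes an optional type label and to allow list-valued properties are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Assumption: a property value is a scalar or a list of scalars
# (one way to model multi-valued properties).
Scalar = Union[str, int, float, bool]
PropertyValue = Union[Scalar, List[Scalar]]


@dataclass
class Node:
    id: str
    label: str = ""  # optional type; whether nodes are typed is undecided
    properties: Dict[str, PropertyValue] = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    source: str  # id of the node the edge starts from
    target: str  # id of the node the edge points to
    label: str = ""
    properties: Dict[str, PropertyValue] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: Dict[str, Edge] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        # Reject edges whose endpoints are not already in the graph.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("edge endpoints must be existing nodes")
        self.edges[edge.id] = edge

    def out_edges(self, node_id: str) -> List[Edge]:
        return [e for e in self.edges.values() if e.source == node_id]


graph = PropertyGraph()
graph.add_node(Node("n1", "Person", {"name": "Alice",
                                     "emails": ["a@example.org", "alice@example.org"]}))
graph.add_node(Node("n2", "Person", {"name": "Bob"}))
graph.add_edge(Edge("e1", "n1", "n2", "knows", {"since": 2013}))
```

Note that the edge carries its own properties (here, since), which is the feature that distinguishes this model from a plain node-and-edge graph.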

Property Graphs will often be used to model data for a particular domain. Thus, it is important for the data model to be able to use attributes from that domain. This requires, at a minimum, the ability to use URIs that designate attribute names from that domain. Schema.org is an example of a source of such attribute names.

It would be useful to be able to identify nodes, graphs and subgraphs by URIs to allow graphs and subgraphs to be shared. Use of URIs in the data model would also help integrate Property Graph data with other data on the Web.
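As an illustration of URI-designated attribute names and URI-identified nodes, here is a small sketch. The prefix table, the helper names (is_absolute_uri, expand_name, expand_properties) and the example.org node identifier are all hypothetical; only Schema.org is mentioned in this report, and the prefix notation is an assumption borrowed from common Web practice.

```python
from urllib.parse import urlparse

# Hypothetical prefix table mapping short prefixes to vocabulary URIs.
PREFIXES = {"schema": "http://schema.org/"}


def is_absolute_uri(value: str) -> bool:
    """Treat a value as an absolute URI if it has a scheme and an authority."""
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc)


def expand_name(name: str, prefixes=PREFIXES) -> str:
    """Expand a prefixed attribute name such as 'schema:name' to a full URI."""
    if is_absolute_uri(name):
        return name
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    raise ValueError("attribute name is not a URI or known prefix: " + name)


def expand_properties(properties: dict, prefixes=PREFIXES) -> dict:
    """Return a copy of the properties with every key expanded to a URI."""
    return {expand_name(key, prefixes): value for key, value in properties.items()}


# A node identified by a URI, with attribute names drawn from Schema.org.
node = {
    "id": "http://example.org/people/alice",
    "properties": expand_properties({"schema:name": "Alice",
                                     "schema:jobTitle": "Engineer"}),
}
```

Because both the node identifier and the attribute names are URIs, two independently produced graphs that mention the same node or the same attribute can be recognized as doing so.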

Serialization Format

To test the specification and facilitate the exchange of models, the WG should agree on a simple exchange format, perhaps similar to N-Triples for RDF. Clearly, there will be several serialization formats for Property Graphs optimized for different use cases. Such optimized formats are outside the scope of the suggested WG and will continue to be developed independently.
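The following sketch shows what a simple line-oriented exchange format might look like. The format itself is invented here purely for illustration and is not something the CG agreed on: each line is one record, N for a node or E for an edge, with tab-separated fields and the properties encoded as JSON. It assumes identifiers and labels contain no tab or newline characters.

```python
import json


def serialize(nodes: dict, edges: dict) -> str:
    """Write one record per line: 'N' lines for nodes, 'E' lines for edges."""
    lines = []
    for node_id, props in nodes.items():
        lines.append("\t".join(["N", node_id, json.dumps(props, sort_keys=True)]))
    for edge_id, (source, label, target, props) in edges.items():
        lines.append("\t".join(["E", edge_id, source, label, target,
                                json.dumps(props, sort_keys=True)]))
    return "\n".join(lines) + "\n"


def parse(text: str):
    """Read the format back into (nodes, edges) dictionaries."""
    nodes, edges = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if fields[0] == "N" and len(fields) == 3:
            _, node_id, props = fields
            nodes[node_id] = json.loads(props)
        elif fields[0] == "E" and len(fields) == 6:
            _, edge_id, source, label, target, props = fields
            edges[edge_id] = (source, label, target, json.loads(props))
        else:
            raise ValueError("unrecognized record: " + line)
    return nodes, edges


nodes = {"n1": {"name": "Alice"}, "n2": {"name": "Bob"}}
edges = {"e1": ("n1", "knows", "n2", {"since": 2013})}
text = serialize(nodes, edges)
restored_nodes, restored_edges = parse(text)
```

As with N-Triples, a one-record-per-line layout makes files easy to split, concatenate and compare, which suits test suites better than a format optimized for size or speed.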

Other Considerations

RDF Compatibility

Some of the people in the CG said that they would be more likely to implement the Property Graphs model if it were based on RDF or compatible with RDF. We are aware that W3C is pulling back from RDF, but this opinion seemed important to bring forward.

It would be great if the RDF infrastructure could be extended and used with Property Graphs. This would widen the reach of Property Graphs and attract RDF adherents, but the amount of effort required is unclear and it would take some time to create a credible plan.

Property Graphs and Big Data

There have been rumblings that W3C is interested in starting work on Big Data and is contemplating a workshop on Data on the Web. As mentioned above, much of what is considered Big Data comes from social media and conforms to a Property Graph model. Thus, as we start thinking about processing Big Data, we should note that we will be processing Property Graph data, and it will be important to understand how to process this data over a distributed cloud infrastructure. An interesting question is whether some variants of the Property Graph data model will be easier to process than others in this environment.