W3C

W3C Workshop on Web Standardization for Graph Data

Creating Bridges: RDF, Property Graph and SQL

Monday 4th March to Wednesday 6th March 2019, Berlin, Germany (venue)

Preliminary Workshop Agenda

The call for participation for the W3C Graph Data workshop on 4-6 March 2019 is now closed. We’ve had 116 responses to the call for participation and nearly 50 position statements sent to the email archive for the program committee. We’ve had a lot of great input and are looking forward to a rewarding workshop. The following agenda is provisional and we’re seeking your feedback and suggestions for people to lead sessions. We want to focus on discussions relevant to standardisation and plan to keep the number of presentations to a minimum.

The workshop starts on Monday early afternoon on March 4th. On Tuesday, we will have three meeting rooms and parallel sessions. On Wednesday morning, we come back together to review what was learned the previous day and to identify questions and recommendations for inclusion in the workshop report. The workshop then ends at lunchtime on Wednesday March 6th.

Monday
  09:30  Chairs/PC Synch
  12:30  OPEN
  13:00  Intro & Keynote
  14:00  Venues & Vectors
  14:45  Break
  15:00  Coexistence or Competition
  16:00  Lightning Talks
  17:45  Preview of next day
  18:00  END
  19:00  Social Dinner (to 21:30)

Tuesday (three parallel streams: Interoperation | Problems & Opportunities | Standards Evolution)
  09:00  Graph Data Interchange | Specifying a Standard | SQL and GQL
  10:30  Break
  11:00  Graph models and schema | Composition, patterns and tractability | Easier RDF and next steps
  12:30  Lunch
  13:30  Triumphs and tribulations | Queries and computation | Rules and reasoning
  15:00  Break
  15:30  Graph query interoperation | Temporal, spatial and streaming | Outreach and education
  17:00  Preview of next day
  17:15  Posters
  18:15  END

Wednesday
  09:00  Introduction & Reports from Tuesday's sessions
  10:30  Break
  11:00  Extending, Incubating, Initiating
  12:30  CLOSE

Session Descriptions

Monday 09:30 - 10:30 Chairs/PC Synch

A meeting of the Workshop Chairs and Program Committee to sync up before the start of the workshop.

Monday 13:00 - 14:00 Opening and Keynote

Monday 14:00 - 14:45 Vectors and Venues

Moderator: Dave Raggett, panelists: Jan Michels, Keith Hare, Alan Bird

The SQL, property graph, and RDF/Linked Data and Semantic Web communities are coming together at this workshop. The workshop is about standards: how do de jure and de facto standards venues operate and co-operate? How can informal communities and open source projects contribute alongside official specifications like International Standards and Recommendations?

ISO and W3C experts explain the history, the policies and the possible approaches for ongoing and future work.

Monday 15:00 - 16:00 Coexistence or Competition

Graph data standards like SPARQL and OWL are well-established in the RDF world. Existing declarative languages like openCypher and PGQL are evolving towards de jure standards in the Property Graph arena. SQL extensions for graph are in formation. Other industrial and research languages like GraphQL, GSQL, Gremlin and G-CORE are all in the mix.

Models and schema, data interchange, querying and computation all pose issues of rationalization and interoperation.

How many standards do we need? And how should they relate to each other?

Monday 16:00 - 17:45 Lightning Talks

We had over thirty position statements submitted for the workshop. There are many fascinating projects and angles of view represented among our 100 attendees.

Each lightning talk gives you 3 minutes to present your work or your point of view, with 2 minutes to answer questions. There will be 19 talks in all, with one short 10 minute break.

Be ready to put yourself down on the list, on the day. Lots will be drawn, slots will be allocated, the order will be random, and there will be no time extensions!

You are not obliged to use slides, but if possible, please use this PowerPoint template and email your slides to the Chairs by Friday 1st March 2019 so that we can collate presentations into a single deck to avoid wasting precious time when switching between talks. Whether you use slides or not, you are strongly encouraged to email us with a one page summary that we will make publicly available after the workshop.

Monday 17:45 - 18:00 Preview of Tuesday

A wrap-up for Monday's session, and a preview for Tuesday, where we will be split across three rooms. You are welcome to move between rooms to attend different sessions. The format for each session will be up to the session leaders, but, and this is an important but, the session leaders are responsible for preparing a written summary and a three-minute verbal report for Wednesday morning. They can of course delegate this to another person if appropriate. Good quality minutes are a plus! You should think about which sessions you plan to attend on Tuesday, as there won't be a plenary gathering that day!

Monday 19:00 - 21:30 Social Dinner

Neo4j is kindly hosting a social dinner on Monday evening at the Hotel nhow Berlin (same venue as the workshop). This will no doubt be preceded by informal gatherings at the bar!

Stream 1: Tuesday 09:00 - 10:30 Graph Data Interchange

Turtle, N-Triples, N-Quads, JSON-LD for RDF datasets. GraphSON, GraphML, Gryo and now GraphBinary from Apache TinkerPop. Ways of serializing graph data are an important arena for standardization, particularly if we are going to make it easier to exchange datasets from different graph models.

And it's not just for graph data import and export: the spread of compositional graph querying sharpens the need to think about client-server protocols like Bolt, and how to move graph-typed data, from vertices and edges to paths and graphs, across network boundaries. Shape rules (e.g. SHACL and ShEx) are relevant here too, as is the whole issue of graph typing, constraint and validation, to which a separate interoperation session is dedicated.

When mapping between different graph data frameworks, we will need to address differences in identifiers, e.g. URLs, URNs and identifiers that are local to a given database. Some identifiers are intended to be public whilst others are internal and only accessible via path queries from public nodes, e.g. RDF's blank nodes. Different communities may have different requirements and perspectives. This means that we will need the means to map data between vocabularies with similar but different semantics. This suggests the need for discussion on context dependent data mapping solutions that take into account differences in identifiers and semantics.

Stream 2: Tuesday 09:00 - 10:30 Specifying a Standard

Moderator: Leonid Libkin

Natural language prose is not the only way of defining a standard. Reference implementations and conformance suites can play their role, as can mathematical formalisms like denotational semantics.

Official standards consortia are a proven way of writing industry standards for information technology. IETF RFCs have long shown how very very lightweight consensus groups can provide a different model. The massive success of open source projects like those working under the aegis of the Apache Software Foundation provides another model. And sometimes bespoke consortia arise, like the GraphQL Foundation, or semi-formal communities like openCypher.

Which artefacts? Who writes them? Who governs their evolution? Which ones are normative? How do we make the sum exceed the parts? Evergreen standards - what they are and when they are suitable.

Stream 3: Tuesday 09:00 - 10:30 SQL and GQL

Moderator: Keith Hare

Since 2017 work has been going on to extend SQL with read-only property graph extensions based on the pattern-matching paradigm of Cypher and PGQL. SIGMOD 2018 saw the publication of the future-looking G-CORE paper on fresh directions in PG querying, matched by implementation of compositional queries and graph views in Cypher for Apache Spark. Since spring 2018 the property graph world has been coalescing around the idea of a single GQL language, drawing on all of these precedents, open to other inputs, and closely coordinated with key aspects of SQL and its ecosystem.

Designers and contributors to SQL, Cypher, GSQL and PGQL will describe, discuss and doubtless differ on plans for the new international standard GQL for property graph querying.

Stream 1: Tuesday 11:00 - 12:30 Graph Models and Schemas

Moderator: Juan Sequeda

The RDF world has always paid great attention to expressing rules about the content, and inferrable meaning, of data. The property graph world has come from a different direction, with a strong emphasis on data integration, heterogeneous typing and speed of prototyping, relying on sample data to form the model. The property graph community has expressed huge interest in addressing this shortfall in reaction to the GQL initiative. This is reflected in current work in the SQL Property Graph Query workstream on graph typing and model expression. At the same time projects like SHACL and ShEx have pushed forward thinking in this space. New challenges arise when considering how applications might mix, federate, or map triple-based graphs and labelled property graphs. How do graph models relate? How can we describe, validate, constrain and process the metadata associated with or implied by a graph? How does this relate to the problem of data interchange and query interoperation? This session can hopefully educate, build bridges and stimulate much future work.

Stream 2: Tuesday 11:00 - 12:30 Composition, Patterns and Tractability

The SIGMOD 2018 G-CORE paper described new directions in graph query composition, advanced path pattern-based queries and paths as first-class elements in a property graph. Implementation work using Apache Spark as a baseline, particularly in the highly active Cypher for Apache Spark project, has brought compositional queries and graph views to life.

Research work from the last few years, for example GXPath, has sought to increase understanding of the power and scope of path languages for graphs. The SQL work on property graph querying has drawn on prior work in Oracle's PGQL language with respect to path pattern "macros" or "views", and on SQL Row Pattern Recognition, and also seeks to provide more powerful and concise path queries. SPARQL extensions have been suggested for path queries that go beyond reachability and allow path element testing.

Within the openCypher project, and in SQL/PGQ, the issues raised by G-CORE, PGQL and GXPath with respect to graph query tractability raise important questions for language designers: should an industry-standard language forbid the formulation of queries that may not terminate, may return huge result sets, or may consume excessive resources and time, or should implementations and applications use heuristic approaches in these circumstances?

Stream 3: Tuesday 11:00 - 12:30 Easier RDF and Next Steps

Moderator: David Booth

What do we need to do to attract the middle 33% of developers? How can we bridge the gap with Property Graphs and what does this mean for RDF serialisation languages? For instance, allowing unnamed collections of triples as the subject or object of other triples. Can we position RDF as an interchange framework across different graph database solutions? What is the experience with context sensitive mapping rules and how to address different kinds of identifiers? Is it time to update the RDF core after two decades of experience and can we do this in a backwards compatible way? For instance, dropping restrictions on what is allowed for subjects and predicates of triples and providing alternative and richer ways to annotate data types. Is it time to update SPARQL? Note that there is a separate session on rules and reasoning.

Stream 1: Tuesday 13:30 - 14:00 Triumphs and Tribulations

30+ years for SQL, two decades of RDF, around three years of open governance for TinkerPop and Cypher. Lots of experience in what worked well, and what didn't, both technically and socially, for the authors and the users. What lessons should be learnt for the future?

Stream 2: Tuesday 13:30 - 14:00 Queries and Computation

Many statistical analyses can exploit well-known network or graph algorithms to establish connectedness, clustering and comparative metrics. Such algorithms are frequently iterative, and may benefit from focussing on sub-graphs. These needs emphasize the importance of graph schema, transformations/projections, and the ability to marry computation with data operations. Increasing emphasis on graph networks in the machine learning space underlines the significance of this relatively unexplored synthesis between querying and computation.

Stream 3: Tuesday 13:30 - 14:00 Rules and Reasoning

Moderators: Harold Boley, Dörthe Arndt

Rule languages offer a high level alternative to low level graph APIs. What can we learn from existing rule languages, including RDF shapes? How can we make rules easier for the next 33% of developers? This includes ideas for graphical visualisation and editing tools. How can rules support different forms of reasoning, e.g. deductive, inductive, abductive, causal, counterfactual, temporal and spatial reasoning? How can rules be combined with graph algorithms on the instance and schema (vocabulary/ontological) levels? How can rules be used with reinforcement learning, simulated annealing and other machine learning techniques (beyond induction and abduction)? How can we collect use cases that offer clear business benefits that decision makers can easily understand?

Stream 1: Tuesday 15:30 - 17:00 Graph Query Interoperation

Query languages tend to be tightly bound to a data model, and to create their own stylistic universe and mental model. But they also share many characteristics. The rise of the labelled property graph model has increased interest in finding solutions for attributed and labelled graphs in the RDF context. At the same time, users of property graph databases are often interested in datasets owned and managed by RDF-centric systems. RDF* and SPARQL* are examples of approaches designed to bridge or map the models.

Work on mapping relational data to RDF, or to a property graph model, effectively allowing cross-model views to be defined, is another approach that may have relevance for interoperation of different declarative languages like SQL, the planned GQL or SPARQL. At the same time, there are languages that operate imperatively or have semi-procedural characteristics. Gremlin is an example: its traversal API allows fast, bespoke explorations of a graph, and makes iterative operations easy. Projects like sparql-gremlin or Cypher for Gremlin open the road to interoperation across the imperative/declarative boundary.

At the other end of the spectrum, GraphQL aims to be "super declarative" and is designed to be less expressive than full-scale declarative languages, but to allow applications to rise above data model and query language differences. PostgreSQL-based GraphQL servers, and the Cypher-based GRANDStack illustrate this plurality. GraphQL has also attracted research interest, which relates to thinking on the equivalence or non-equivalence of the relational, RDF and LPG models.

The Semantic Web emphasises deductive logic and sound reasoning. This allows for logic based queries that assume the deductions implicit in the given ontology. This can be contrasted with approaches based upon constraint propagation across graphs. In addition, there are many other forms of reasoning that are justified by rational belief based upon prior knowledge and past experience, and which are usable when data is incomplete, uncertain, inconsistent and includes errors. Inductive reasoning can be used to learn models from a sequence of examples. The next example may force a revision to what made sense based upon earlier examples. Likewise, abduction allows you to devise rational explanations based upon observations, and to use these explanations to make testable predictions. This is applicable to diagnosing faults and their implications. Causal reasoning can be used to propose plans for how to reach a given goal state. What are the implications for graph query languages for these additional forms of reasoning?

Stream 2: Tuesday 15:30 - 17:00 Temporal, Spatial and Streaming

A plethora of research and application-level projects relating to temporal, spatial and streaming applications of graphs, building on existing capabilities and standards, indicate that these aspects, and often their intersections, are likely to be prominent in the future of graph data management. The need to relate to existing standards, especially with respect to geo-spatial coordinate or graph based data, and the fact that streaming is not yet standardized for SQL, indicate some of the challenges.

Stream 3: Tuesday 15:30 - 17:00 Outreach and Education

Successful adoption of standards is often dependent on good quality tutorials, examples, reference materials, online demos and tooling. What is needed to support sustainable community-driven outreach and education? What can we learn from existing community efforts, e.g. "MDN Web Docs", which describes itself as a resource for developers, maintained by the community of developers and technical writers?

Preview for Wednesday

A look forward to the final day and encouragement for everyone to think about potential recommendations for next steps, and which associated future work items you would expect to get involved with.

Posters and Demos

Tuesday evening concludes with an opportunity to share ideas using posters or demos. Please let the Chairs know in advance (by Friday 1st March) if you would like to present a poster or demo.

Wednesday 09:00 - 10:30 Introduction and Reports from Tuesday's sessions

Each session moderator will present a three-minute summary, followed by three minutes of questions.

Wednesday 11:00 - 12:30 Extending, Incubating, Initiating

This is the final session, and we will focus on summarising and drawing up recommendations for the workshop report. How much interest is there for participating in incubation and standardisation for different areas? What's the best way to ensure liaison and coordination across different standardisation organisations? What can be done to establish sustainable communities around education and outreach?