W3C

W3C Workshop on Web Standardization for Graph Data

Creating Bridges: RDF, Property Graph and SQL

Monday 4th March to Wednesday 6th March 2019, Berlin, Germany (venue)

Workshop Agenda

Remote access via Zoom meeting with thanks to Uber - this is limited to the main meeting room (Music Hall 4). Presenters in that room should connect to share their screen so that remote attendees can see the slides. Please don't connect to audio. If you do, you must mute your audio to avoid howling feedback! Please use the microphones for questions and comments, and don't forget to wait until you've been passed the microphone before you start talking.

The call for participation for the W3C Graph Data workshop on 4-6 March 2019 is now closed. We’ve had 116 responses to the call for participation and nearly 50 position statements sent to the email archive for the program committee. We’ve had a lot of great input and are looking forward to a rewarding workshop. We want to focus on discussions relevant to standardisation and plan to keep the number of presentations to a minimum.

The workshop starts on Monday early afternoon on March 4th. On Tuesday, we will have three meeting rooms and parallel sessions. On Wednesday morning, we come back together to review what was learned the previous day and to identify questions and recommendations for inclusion in the workshop report. The workshop then ends at lunchtime on Wednesday March 6th.

We plan to use a Google document for collaborative minute taking. Here is the document for Monday and Tuesday. Links will be added for the minutes for each of the Tuesday sessions, so remember to revisit and refresh this page.

For WiFi connect to the access point NH-hotel-group, then login with user "NH" and password "wifi".

The workshop will take place on the first floor using the rooms Music Hall 4, Soul and Jazz 1 + 2. See the following floor plan.

first floor room plan

The room assignments are given later with the session descriptions.

Monday Tuesday Wednesday
      Interoperation Problems & Opportunities Standards Evolution    
  Chairs/PC Synch 09:00 Graph Data Interchange Easier RDF and next steps SQL and GQL 09:00 Introduction & Reports from Tuesday's sessions
09:30 09:30 09:30
10:00 10:00 10:00
    10:30 Break 10:30 Break
    11:00 Graph query interoperation Composition, patterns and tractability Triumphs and tribulations 11:00 Extending, Incubating, Initiating
    11:30 11:30
    12:00 12:00
  OPEN 12:30 Lunch 12:30 CLOSE
13:00 Intro & Keynote 13:00    
13:30 13:30 Specifying a Standard Queries and computation Rules and reasoning    
14:00 Venues & Vectors 14:00    
14:45 Break 14:30    
15:00 Coexistence or Competition 15:00 Break    
15:30 15:30 Graph models and schema Temporal, spatial and streaming Outreach and education    
16:00 Lightning Talks 16:00    
16:30 16:30    
17:00 17:00 Preview of next day    
17:45 Preview of next day 17:15 Posters    
18:00 END 18:15 END    
19:00 Social Dinner        
21:30        

Session Descriptions

Monday 09:30 - 10:30 Chairs/PC Synch

A meeting of the Workshop Chairs and Program Committee to sync up before the start of the workshop.

Monday 13:00 - 14:00 Opening and Keynote (Music Hall 4)

We will use a google document for minute taking, and everyone is welcome to monitor and help with the editing of the minutes.

Monday 14:00 - 14:45 Vectors and Venues (Music Hall 4)

Moderator: Dave Raggett. Panelists: Jan Michels, Keith Hare, Alan Bird

Relational information systems and their associated Query Language(s) are today one of the main defacto Standards in the semantic web community, being SQL and RDF the most popular ones as examples. The SQL, property graph, and RDF/Linked Data and Semantic Web communities are coming together at this workshop. The workshop is about standards: how do de jure and de facto standards venues operate and co-operate? How can informal communities and open source projects contribute alongside official specifications like International Standards and Recommendations.

ISO and W3C experts explain the history, the policies and the possible approaches for ongoing and future work.

Monday 15:00 - 16:00 Coexistence or Competition (Music Hall 4)

Moderator: Dave Raggett. Panelists: Olaf Hartig, Alastair Green (slides), Peter Eisentraut

Graph data and their corresponding Query Language standards like SPARQL and OWL are well-established in the RDF world. Existing declarative languages like openCypher and PGQL are evolving towards de jure standards in the Property Graph arena. Graphs are becoming the de facto standard in structured data domains and SQL extensions for graph data are under consideration. Other industrial and research languages like GraphQL, GSQL, Gremlin and G-CORE are all in the mix, following this global trend for data interoperation.

Data modelling and schema representation, data interchange, querying and computation all pose issues of optimisation, rationalisation and interoperation.

How many standards do we need? And how should they relate to each other?

Monday 16:00 - 17:45 Lightning Talks (Music Hall 4)

Moderators: Dave Raggett, Martin Serrano

This session present the global activities around the modeling activity, semantic aspects of the meta data, data annotation techniques and evolution of semantic web technologies to convey multiple approaches and exciting progress in the way from simple data modeling to data graph collection and representation. This Session represent the interest of all the semantic web community, and we had over thirty position statements submitted for the workshop. There are many fascinating projects and points of view represented among our 100 attendees.

Each lightning talk gives you 3 minutes to present your work or your point of view, with 2 minutes to answer questions. There will be 19 talks in total, with one short 10 minute break.

Be ready to put yourself down on the list, on the day. Lots will be drawn, slots will be allocated, the order will be random, and there will be no time extensions!

You are not obliged to use slides, but if possible, please use this PowerPoint template and email your slides to the Chairs by Friday 1st March 2019 so that we can collate presentations into a single deck to avoid wasting precious time when switching between talks. Whether you use slides or not, you are strongly encouraged to email us with a one page summary that we will make publicly available after the workshop.

Monday 17:45 - 18:00 Preview of Tuesday (Music Hall 4)

A wrap up for Monday's session, and a preview for Tuesday, where we will be split across three rooms. You are welcome to move between rooms to attend different sessions. The format for each session will be up to the session leaders, but and this is an important but, the session leaders are responsible for preparing a written summary and three minute verbal report for Wednesday morning. They can of course delegate this to another person if appropriate. Good quality minutes are a plus! You should think about which sessions you plan to attend on Tuesday as there won't be a plenary gathering that day!

Monday 19:00 - 21:30 Social Dinner (Hotel Restaurant)

Neo4J is kindly hosting a social dinner on Monday evening at the hotel restaurant in the Hotel nhow Berlin (same venue as the workshop). This will no doubt be preceded by informal gatherings at the bar!

Tuesday 09:00 - 10:30 Graph Data Interchange (Music Hall 4)

Moderators: George Anadiotis, Dominik Tomaszuk

There are many formats for exchanging graph data, e.g. Turtle, N-Triples, N-Quads, JSON-LD for RDF datasets. GraphSON, GraphML, Gryo and now GraphBinary from Apache Tinkerpop. Ways of serializing graph data are an important arena for standardization, particularly if we are going to make it easier to exchange datasets from different graph models.

And it's not just for graph data import and import: the spread of compositional graph querying sharpens the need to think about client-server protocols like Bolt, and how to move graph-typed data, from vertices and edges to paths and graphs across network boundaries. Shape rules (e.g. SHACL and ShEx) are relevant here to, as is the whole issue of graph typing, constraint and validation: to which a separate interoperation session is dedicated.

When mapping between different graph data frameworks, we will need to address differences in identifiers, e.g. URLs, URNs and identifiers that are local to a given database. Some identifiers are intended to be public whilst others are internal and only accessible via path queries from public nodes, e.g. RDF's blank nodes. Different communities may have different requirements and perspectives. This means that we will need the means to map data between vocabularies with similar but different semantics. This suggests the need for discussion on context dependent data mapping solutions that take into account differences in identifiers and semantics.

Tuesday 09:00 - 10:30 Easier RDF and Next Steps (Soul)

Moderator: David Booth

The value of RDF for graph data has been well proven in many applications over the 20+ years since it was first created. However, RDF is perceived as difficult to use limiting its adoption. We're seeking to make it easier for a much wider audience, considering what is needed in respect to standards, tools and guidance, etc.

What do we need to do to attract the middle 33% of developers? How can we bridge the gap with Property Graphs and what does this mean for RDF serialisation languages? For instance, allowing unnamed collections of triples as the subject or object of other triples. Can we position RDF as an interchange framework across different graph database solutions? What is the experience with context sensitive mapping rules and how to address different kinds of identifiers? Is it time to update the RDF core after two decades of experience and can we do this in a backwards compatible way? For instance, dropping restrictions on what is allowed for subjects and predicates of triples and providing alternative and richer ways to annotate data types. Is it time to update SPARQL? Note that there is a separate session on rules and reasoning.

Tuesday 09:00 - 10:30 SQL and GQL (Jazz 1+2)

Moderator: Keith Hare (slides)

Since 2017 work has been proceeding on extending SQL with read-only property graph extensions based on the pattern-matching paradigm of Cypher and PGQL. SIGMOD 2018 saw the publication of the future-looking G-CORE paper on fresh directions in PG querying, matched by implementation of compositional queries and graph views in Cypher for Apache Spark. Since Spring 2018 the property graph world has been coalescing around the idea of a single GQL language, drawing on all of these precedents, open to other inputs, and closely coordinated with key aspects of SQL and its ecosystem.

In this session, designers and contributors to SQL, Cypher, GSQL and PGQL will describe, discuss and doubtless differ on plans for the new international standard GQL for property graph querying.

Tuesday 11:00 - 12:30 Graph Query Interoperation (Music Hall 4)

Moderators: Olaf Hartig, Hannes Voigt

This session focuses on how to address the challenges involved in interchange across different information management systems. Query languages tend to be tightly bound to a data model, and to create their own stylistic universe and mental model. But they also share many characteristics. The rise of the labelled property graph model has increased interest in finding solutions for attributed and labelled graphs in the RDF context. At the same time, users of property graph databases are often interested in datasets owned and managed by RDF-centric systems. RDF* and SPARQL* are an example of approaches designed to bridge or map the models.

Work on mapping relational data to RDF, or to a property graph model, effectively allowing cross-model views to be defined is another approach that may have relevance for interoperation of different declarative languages like SQL, the planned GQL or SPARQL. At the same time, there are languages that operate imperatively or have semi-procedural characteristics. Gremlin is an example: its traversal API allows fast, bespoke explorations of a graph, and makes iterative operations easy. Projects like sparql-gremlin or Cypher for Gremlin open the road to interoperation across the imperative/declarative boundary.

At the other end of the spectrum, GraphQL aims to be "super declarative" and is designed to be less expressive than full-scale declarative languages, but to allow applications to rise above data model and query language differences. PostgreSQL-based GraphSQL servers, and the Cypher-based GRANDStack illustrate this plurality. GraphQL has also attracted research interest, which relates to thinking on the equivalence or non-equivalence of the relational, RDF and LPG models.

Tuesday 11:00 - 12:30 Composition, Patterns and Tractability (Soul)

Moderator: Peter Boncz

Research work from the last few years, for example GXPath, has sought to increase understanding of the power and scope of path languages for graphs. The SQL work on property graph querying has drawn on prior work in Oracle's PGQL language with respect to path pattern "macros" or "views", and on SQL Row Pattern Recognition, also seeks to provide more powerful and concise path queries. SPARQL extensions have been suggested for path queries that go beyond reachability and allow path element testing.

This session looks at recent advances on graph queries and their featured composability and tractability, for example, the SIGMOD 2018 G-CORE paper described new directions in graph query composition, advanced path pattern-based queries and paths as first-class elements in a property graph. Implementation work using Apache Spark as a baseline, particularly in the highly active Cypher for Apache Spark project, have brought compositional queries and graph views to life.

Research work from the last few years, for example GXPath, has sought to increase understanding of the power and scope of path languages for graphs. The SQL work on property graph querying has drawn on prior work in Oracle's PGQL language with respect to path pattern "macros" or "views", and on SQL Row Pattern Recognition, also seeks to provide more powerful and concise path queries. SPARQL extensions have been suggested for path queries that go beyond reachability and allow path element testing.

Within the openCypher project, and in SQL/PGQ, the issues raised by G-CORE, PGQL and GXPath with respect to graph query tractability raise important questions for language designers: Should industry standard language forbid queries to be formulated that may not terminate, may return huge result sets, or may consume excessive resources and time, or should implementations and applications use heuristic approaches in these circumstances.

Tuesday 11:00 - 12:30 Triumphs and Tribulations (Jazz 1+2)

Moderator: Keith Hare

30+ years for SQL, two decades of RDF, around three years of open governance for Tinkerpop and Cypher. Lots of experience in what worked well, and what didn't, both technically and socially, for the authors and the users. What lessons should be learnt for the future?

Tuesday 13:30 - 15:00 Specifying a Standard (Soul)

Moderator: Leonid Libkin

This session will look at different aspects for developing and specifying standards. Natural language prose is not the only way for defining a standard. Reference implementations and conformance suites can play their role, as can mathematical formalisms like denotational semantics.

Official standards consortia are a proven way of writing industry standards for information technology. IETF RFCs have long shown how very very lightweight consensus groups can provide a different model. The massive success of open source projects, like those working under the aegis of the Apache Software Foundation, provides another model. And sometimes bespoke consortia arise, like the GraphQL Foundation, or semi-formal communities like openCypher.

Which artefacts? Who writes them? Who governs their evolution? Which ones are normative? How do we make the sum exceed the parts? Evergreen standards - what they are and when they are suitable.

Tuesday 13:30 - 15:00 Queries and Computation (Jazz 1+2)

Moderator: Victor Lee

This session focuses on the inherent computational resources that are necessary to make the query system(s) run smoothly and how to optimise the query process in order to obtain best query results. Many statistical analysis techniques can exploit well-known network or graph algorithms to establish connectedness, clustering and comparative metrics. Such algorithms are frequently iterative, and may benefit from focussing on sub-graphs. These needs emphasize the importance of graph schema, transformations/projections, and the ability to marry computation with data operations. Increasing emphasis on graph networks in the machine learning space underlines the significance of this relatively unexplored synthesis between querying and computation.

Tuesday 13:30 - 15:00 Rules and Reasoning (Music Hall 4)

Moderators: Harold Boley, Dörthe Arndt

Rule languages offer a high level alternative to low level graph APIs. What can we learn from existing rule languages, including RDF shapes? How can we make rules easier for the middle 33% of developers? This includes ideas for graphical visualisation and editing tools.

How can rules support different forms of reasoning, e.g. deductive, inductive, abductive, causal, counterfactual, temporal and spatial reasoning? This would take us beyond the limits of sound deductive reasoning and into the realm of rational beliefs that are based upon prior knowledge and past experience, and which enable reasoning when data is incomplete, uncertain, inconsistent and includes errors. What can we learn from work in Cognitive Psychology?

How can rules be combined with graph algorithms on the instance and schema (vocabulary/ontological) levels, for instance techniques based upon constraint propagation? How can rules be used with reinforcement learning, simulated annealing and other machine learning techniques (beyond induction and abduction)? This points to the need for ways to combine symbolic reasoning with computational statistics. How can we collect use cases that offer clear business benefits that decision makers can easily understand?

Tuesday 15:30 - 17:00 Graph Models and Schemas (Music Hall 4)

Moderator: Juan Sequeda

The RDF world has always paid great attention to expressing rules about the content, and inferrable meaning, of data. The property graph world has come from a different direction, with a strong emphasis on data integration, heterogeneous typing and speed of prototyping, relying on sample data to form the model. The property graph community has expressed huge interest in addressing this shortfall in reaction to the GQL initiative. This is reflected in current work in the SQL Property Graph Query workstream on graph typing and model expression. At the same time projects like SHACL and ShEx have pushed forward thinking in this space. New challenges arise when considering how applications might mix, federate, or map triple-based graphs and labelled property graphs. How do graph models relate? How can we describe, validate, constrain and process the metadata associated with or implied by a graph? How does this relate to the problem of data interchange and query interoperation. This session can hopefully educate, build bridges and stimulate much future work.

Tuesday 15:30 - 17:00 Temporal, Spatial and Streaming (Soul)

Moderators: Martin Serrano, Ingo Simonis

A plethora of research and application-level projects relating to temporal, spatial and streaming applications of data and graphs, building on existing capabilities and standards, indicate that these aspects, and often their intersections, are likely to be prominent in the future of graph data management. The need to relate to existing standards, especially with respect to geo-spatial coordinate or graph based data, and the fact that streaming is not yet standardized for SQL, indicate some of the challenges.

The increasing number of spatio-temporal data, particularly now within the emergence of the web of things, where various vocabularies representing geospatial data are required demands to pay special attentio to the fact that there is no agreement in which format will be used as standards describing those spatio temporal characteristics by means of Linked Data, There are different examples for practical approaches, e.g. BasicGeo, GeoSPARQL or GeoJSON.

Additionally, the coordinate reference systems are often not specified in the dataset, which complicates data visualization and information extraction. These and other issues will be discussed in this session. Come prepare to provide your insights and experiences from those projects and share your views on future directions for standard.

Tuesday 15:30 - 17:00 Outreach and Education (Jazz 1+2)

Moderators: Peter Winstanley, James Masters

Successful adoption of standards is often dependent on good quality tutorials, examples, reference materials, online demos and tooling. What is needed to support sustainable community driven outreach and education? What can we learn from existing community efforts, e.g. "MDN Web Docs" which describes itself as a resource for developers, maintained by the community of developers and technical writers.

Preview for Wednesday (Music Hall 4)

A look forward to the final day and encouragement for everyone to think about potential recommendations for next steps, and which associated future work items you would expect to get involved with.

Posters and Demo's (Music Hall 4)

Tuesday evening concludes with an opportunity to share ideas using posters or demo's. Please let the Chairs know in advance (by Friday 1st March) if you would like to present a poster or demo.

Wednesday 09:00 - 10:30 Introduction and Reports from Tuesday's sessions (Music Hall 4)

Each session moderator will present a three minute summary with three minutes of questions.

Wednesday 11:00 - 12:30 Extending, Incubating, Initiating (Music Hall 4)

This is the final session, and we will focus on summarising and drawing up recommendations for the workshop report. How much interest is there for participating in incubation and standardisation for different areas? What's the best way to ensure liaison and coordination across different standardisation organisations? What can be done to establish sustainable communities around education and outreach?