W3C

W3C Workshop on Web Standardization for Graph Data

Creating Bridges: RDF, Property Graph and SQL

Monday 4th March to Wednesday 6th March 2019, Berlin, Germany

Workshop Minutes

Agenda: https://www.w3.org/Data/events/data-ws-2019/schedule.html

Minutes for Monday 4 March 2019

This is for collaborative minute taking. Help us to create good quality minutes.

Twitter feed: https://twitter.com/hashtag/W3CGraphWorkshop 

Live audio/video via Zoom. Recordings will be available for 90 days.

Monday, 4 March

Keynote by Brad Bebee, Amazon Neptune

http://www.w3.org/Data/events/data-ws-2019/assets/slides/BradBebee.pdf 

"It's just a graph"

How customers want to use graphs.

In the past: resource description, metadata management applications; a machine-processable, understandable way to process information. Knowledge representation, NLP, RDF, the Semantic Web Scientific American article, the OWL standard.

Worked on big data for a while, and distributed query

BigData rebranded as Blazegraph. Adopted by Wikidata.

A number of us went to Amazon, and in 2018 released Neptune.

When to use a graph database? If your app needs to process a lot of connected data (data from different sources, relationships, different schemas, information produced at different rates). Lots and lots of different graph use cases. Cancer genomics problem, security issues, etc.

Use of graphs not just for reimagining old applications. But for new functionality.

Can I use a relational DB to process this connected data? Yes, you can. But there are drawbacks: SQL joins are complex to develop, and the kinds of workloads that relational DBs are optimised for are very different from those required for connected data.

Graphs have the flexibility to let you integrate data ‘like crazy’.

Two camps: property graphs and RDF/SPARQL

Neptune supports both of these. Which one to use? The one that makes more sense to you. Developers coming from relational or document-oriented DBs find that transitioning to a property graph model is more natural to them.

How big is the graph market?

Q - what benefits do other communities see in working together (coming from an RDF community)

A - lot of standards that could help property graph applications; interoperability for property graph apps, move fluently from property graphs to linked data

Q - we are using Neptune and have worked with clients using neo4j. Within the Neptune stack, do people need to make a commitment to property graphs?

A - we don’t provide data interoperability between graph models; we want to. It is much easier to show property graphs from an RDF view than the other way around.

Q - is it important and useful for RDF to evolve so that it makes it easier to model property graphs within RDF? Two approaches to the same problem.

A - from an RDF perspective, there are lots of things we can make better. Lots of features people want to add on top of SPARQL 1.1 (e.g. analytics). It’s hard to say in terms of interoperability. An RDF-led initiative? Or an opportunity to do something together?

Q - extending RDF or SPARQL or something in the property graph world to make statements in an analytical way. E.g. find all the connected components of a graph.

Panel Discussions

Vectors and Venues

Moderator: Dave Raggett. Panelists: Jan Michels, Keith Hare, Alan Bird

Alan Bird described the structure and processes for developing standards within W3C. Keith Hare described the structure and processes for developing standards within ISO/IEC JTC1, with an emphasis on ISO/IEC JTC1 SC32 WG3 (Database Languages), the committee that has developed the SQL standard and is working on a property graph standard. Jan Michels described the structure and processes within ANSI INCITS DM32.2, the US national body that corresponds with SC32 WG3.

[Lots missed]

Alan Bird: Today should be day 0.5 of putting together a standard for data querying, where graphs (or relational data, or documents / time-series) are just an incidental implementation detail.

Andy Seaborne: In Apache everything is pushed down to the contributors.  But in Apache they are individuals -- people driven by doing.  Difficult to see how some of the engagements can be set up, but worthwhile.

James ___: I expected an outcome would be something like R2RML.  I want info in any relational DB with semantic consistency across the DBs of different types.

List of Graph “Things”

RDF World

PG World

Others

RDF → PG:

PG → RDF and RDF -> PG

Coexistence or Competition

Panelists: Olaf Hartig, Alastair Green, Peter Eisentraut

Background:

Olaf’s presentation

Peter Eisentraut’s presentation

Alastair Green’s presentation

Q&A:

Lightning presentations

Querying RDF: SPARQL 1.2/2.0 and imperative query languages, Adrian Gschwend

Mobile Money in Africa, Alex Miłowski, Orange Silicon Valley

Need for VIEWs in graph systems, Barry Zane, Cambridge Semantics

Vehicle graph data, Daniel Alvarez, BMW Research, New Technologies, Innovations

https://ssn2018.github.io/submissions/SSN2018_paper_4_submitted.pdf 

The Sentient Web: IoT + graphs + AI/ML, Dave Raggett, W3C/ERCIM, Create-IoT & Boost 4.0

Cypher and Gremlin Interoperation, Dmitry Novikov, Neueda Technologies (summary)

Neo4J for Enterprise RDF Data Management, Ghislain Atemezing, MONDECA

Cyber-Physical Graphs vs RDF graphs, Gilles Privat, Orange Labs, Grenoble, France

JSON-LD 1.1 Update, Gregg Kellogg – Spec Ops

An Executable Semantics as a Tool and Artifact of Language Standardization, Filip Murlak, University of Warsaw, Jan Posiadała, Nodes and Edges and Paweł Susicki, Nodes and Edges (summary)

NGSI-LD: JSON-LD based representation and Open APIs for Property Graphs, José Manuel Cantera, FIWARE Foundation (summary)

Implications of Solid, Kjetil Kjernsmo, Inrupt Inc.

Cypher for Apache Spark, Max Kießling, neo4j

Information Architecture, Natasa Varytimou, Refinitiv

Graph Data on the Web extend the pivot, don’t reinvent the wheel, Olivier Corby et al., Inria – Wimmics team

SQL extensions for Property Graphs (PGs), Oskar van Rest, Jan Michels – Oracle (summary)

Do we need 3-valued logic to deal with NULLS? Leonid Libkin

Path Queries in Stardog, Pavel Klinov, VP R&D, Stardog Union, and Evren Sirin, Chief Scientist, Stardog Union

Compiled GraphQL as a Database Query Language, Predrag Gruevski, Kensho

A Product View on Graph Data: PoolParty Semantic Suite , Robert David CTO, Semantic Web Company

Bridges between GraphQL and RDF, Ruben Taelman, IDLab, Ghent University — imec

Schema validation and evolution for PGs, Russ Harmer, CNRS. Eugenia Oshurko, ENSL, and Angela Bonifati, Peter Furniss, Alastair Green and Hannes Voigt, Neo4J,

Standardized local property graph models across browsers, Theodoros Michalareas, wappier.com

Graph the Language Graphs! Thomas Frisendal

Property Graphs need a Schema Juan Sequeda, Capsenta, on behalf of the Property Graph Schema Working Group

Tuesday, 5 March

This day was split across three rooms, with twelve sessions in all.

Graph Data Interchange

Moderators: George Anadiotis, Dominik Tomaszuk

Minutes

George Anadiotis. https://my.pcloud.com/publink/show?code=XZOjYq7ZuHjtvPmooepzOtl8J3Br5yM7lcXX 

Working in data since ’92; with graph DBs since 2005.

Graph data interchange = minimum viable product for standardisation

Key decision points: Formats → Protocol → Semantics

Graph Data Interchange Syntax

What would that format be? JSON based or not?

Based on a twitter poll, 80% prefer JSON-LD. https://twitter.com/aaranged/status/1090237302217924608 

Graph Data Interchange Protocol: GraphSON; JSON Graph (used by Netflix); JSON Graph Format (JGF), mostly used by a company, ADS; GraphML, used by visualization tools

Graph Data Interchange Semantics: JSON-LD/ Schema.org

Connecting vocabularies (i.e. schema.org) to the serialization/syntax of the graph data

Q/Comment: There was a lot of discussion of XML vs JSON. Both syntaxes can represent the same data.

Does Neo4j have a serialization format?

Not really. There is a format to serialize between the server and visualization front end but that is not general purpose.

Bolt is a data record serialization but not a graph serialization. There will be a graph serialization format for Bolt in the future.

There is already data in existing formats, so the focus is on importing those (e.g. CSV).

YARS-PG (more at poster session)

We should enable different communities to work with the tools that work best for them.

A bit of history: RDF and XML started at the same time, and many people saw RDF/XML as not very good XML. The community moved on from using RDF/XML.

AxelP: Why not start from the RDF syntaxes such as Turtle and JSON-LD. Or N3.

Barry: Nothing against a JSON representation, but it sounds like there is no widely adopted method of representing graphs in JSON, whereas with Turtle there is.

At Cambridge Semantics, we don’t see data in RDF/XML, but we do see ontologies in it.

GreggK: Just to be clear, JSON-LD is a w3c recommendation to represent RDF Graphs. It’s important to be syntax independent and it seems that RDF* is syntax dependent.

MonikaS: What does the developer community like? It seems that developers prefer JSON.

The JSON community is bigger than the Turtle community.

AxelP: there is an advantage of using Turtle. Additionally, Turtle is also a basis of SPARQL.

Be careful about allowing different ways to do the same thing in the syntax. In XML you can put values in tags or in attributes, and that gets complicated.

Olaf Hartig: RDF* and SPARQL*

Slides: http://olafhartig.de/slides/W3CWorkshop2019RDFStarAndSPARQLStar.pdf 

It’s not just about serialization. We need to understand how to exchange data between different systems (RDF vs PG)

RDF at a triple level does not support metadata about statements.

Kubrik influencedBy Welles → X

X significance 0.8

Solutions to this are:

Proposal: Nested Triples [http://olafhartig.de/files/Hartig_ISWC2017_RDFStarPosterPaper.pdf]

<<Kubrik influencedBy Welles>> significance 0.8
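For comparison, the same statement expressed with standard RDF reification takes four extra triples before the metadata can be attached (a sketch; the example prefix is assumed):

```turtle
@prefix :    <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Standard RDF reification of "Kubrik influencedBy Welles"
:stmt1 a rdf:Statement ;
       rdf:subject   :Kubrik ;
       rdf:predicate :influencedBy ;
       rdf:object    :Welles ;
       :significance 0.8 .
```

The nested-triple syntax above collapses this boilerplate into a single statement.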

Two Perspectives:

1.        Syntactic sugar on top of standard RDF/SPARQL

2.        A logical model in its own right, with the possibility of a dedicated physical schema

Use SPARQL* to 1) query data in RDF reification 2) Query Property Graphs by translation to Gremlin/Cypher
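A SPARQL* query over such data might look like the following (a sketch based on the nested-triple syntax above; the prefix and threshold are illustrative):

```sparql
PREFIX : <http://example.org/>

SELECT ?influencer ?sig WHERE {
  # The embedded triple pattern matches the statement itself,
  # so metadata about it can be queried directly
  <<:Kubrik :influencedBy ?influencer>> :significance ?sig .
  FILTER (?sig > 0.5)
}
```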

Contributions 1) mapping between RDF* and RDF and 2) definition of RDF* as its own data model and SPARQL* as a formal extension of SPARQL

Additionally defined mappings for RDF* to PG and PG to RDF*, 3) full support in the Blazegraph triple store
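The flavour of the RDF*-to-RDF mapping can be sketched in a few lines of plain Python (a hypothetical helper, not Blazegraph's actual implementation; tuples stand in for triples, and a nested tuple in subject position stands in for an embedded triple):

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def unstar(triples):
    """Map RDF*-style triples (where a subject may itself be an
    (s, p, o) tuple) to plain RDF via standard reification."""
    out = []
    counter = 0
    for s, p, o in triples:
        if isinstance(s, tuple):  # nested triple: reify it
            counter += 1
            stmt = f"_:stmt{counter}"
            ns, np_, no = s
            out += [
                (stmt, RDF + "type", RDF + "Statement"),
                (stmt, RDF + "subject", ns),
                (stmt, RDF + "predicate", np_),
                (stmt, RDF + "object", no),
                (stmt, p, o),  # the metadata triple itself
            ]
        else:
            out.append((s, p, o))
    return out

# <<Kubrik influencedBy Welles>> significance 0.8
data = [(("Kubrik", "influencedBy", "Welles"), "significance", 0.8)]
plain = unstar(data)
```

The point of the abstraction is that a different reification scheme (or a property graph store) could be swapped in underneath without changing the RDF*-level data.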

Axel: what is the real advantage of RDF*/SPARQL* over singleton property reification

Olaf: RDF*/SPARQL* abstract over the concrete reification used to store the data

Axel:… IIUC, you mean, there could be roundtripable mappings to different reifications underneath, or to graph stores supporting PGs directly?

Olaf: yes.

Barry: This is also in Cambridge Semantics “this plugs a big hole in RDF”

Brad: The TinkerPop community would say that GraphSON 3 is the de facto syntax.

Olaf: Right! Our RDFstarPGConnectionTools contain a converter from RDF* (Turtle* files) to a Tinkerpop Graph object.

Barry: RDF* is not just syntax sugar, at ingest time it is very valuable.

JSON-LD for Property Graphs Jose Manuel Cantera

Use case is in Smart Cities

Gregg Kellogg

JSON-LD introduced the notion that blank nodes can be the identifier of the named graph.
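For example, a JSON-LD document can name a graph with a blank node identifier and still attach properties to that graph node (a minimal sketch; the vocabulary terms are illustrative):

```json
{
  "@context": {"@vocab": "http://example.org/"},
  "@id": "_:g1",
  "@graph": [
    {"@id": "_:a", "name": "Alice"}
  ],
  "source": "imported-2019-03-04"
}
```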

Toward interoperable use of RDF and property graphs - Hirokazu Chiba

Mapping RDF graphs to property graphs.

G2GML currently supports RDF → PG
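In the same spirit (not G2GML's actual syntax or semantics), a naive direct RDF-to-property-graph mapping can be sketched in plain Python: IRIs become nodes, triples with literal objects become node properties, and the rest become edges:

```python
def rdf_to_pg(triples):
    """Naive RDF -> property graph mapping (an illustrative sketch,
    not G2GML): literal-valued triples become node properties,
    IRI-valued triples become edges."""
    nodes, edges = {}, []
    for s, p, o in triples:
        nodes.setdefault(s, {})
        if isinstance(o, (int, float)) or not o.startswith("http"):
            nodes[s][p] = o          # literal object -> node property
        else:
            nodes.setdefault(o, {})  # ensure target node exists
            edges.append((s, p, o))  # IRI object -> edge
    return nodes, edges

triples = [
    ("http://ex.org/alice", "name", "Alice"),
    ("http://ex.org/alice", "http://ex.org/knows", "http://ex.org/bob"),
]
nodes, edges = rdf_to_pg(triples)
```

A real mapping language needs much more (typed literals, blank nodes, choosing which triples become edge properties), which is exactly the kind of question the SPARQL-CONSTRUCT-like approach addresses.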

Barry: Is this a general mapping tool?

Hirokazu: It’s similar to SPARQL CONSTRUCT

Discussion

Hands raised (on Zoom) that we should spend time on JSON-LD for PG and on a mapping language.

Should we see how JSON-LD can support PG? Everybody raised their hand

Do we need a mapping language that does RDF<->PG? ~50% raised hand for yes, ~50% raised hand for “I don’t know/don’t care”.

Brad Bebee: Apache Tinkerpop community under-represented here. GraphSON is a de facto standard, it should be taken into account.

Should RDF*/SPARQL* be sent as a W3C member submission? Everybody raised their hand

AxelP: and it should include a mapping to JSON-LD

Easier RDF and Next Steps

Moderator: David Booth - Slides

Collaborative minutes, please join in!

David Booth: Yosemite Project

Guiding slides for discussion:

Difficulties in using RDF?

Background: a ground-up effort to make healthcare data interoperable (the Yosemite Project). Different syntaxes are allowed, but the information content stays the same thanks to a shared semantic mapping through RDF.

Claim that RDF is too hard to use. It can be used successfully, but only by highly skilled teams.

https://github.com/w3c/EasierRDF has a collection of issues that were collected earlier.

Summary:

  1. Lack of standard n-ary relations
  2. Moribundity of tools
  3. Lack of SPARQL-friendly lists
  4. Blank nodes
  5. Lack of beginner-friendly tutorials
  6. RDF is too low level

Lack of a standard way to represent n-ary relations - just patterns. https://www.w3.org/TR/swbp-n-aryRelations/

Example with “has_ceo” between company and employee, with a start date. One common pattern to represent this in RDF involves a blank node.
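The blank-node pattern from the W3C note looks roughly like this in Turtle (names are illustrative):

```turtle
@prefix :    <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# The n-ary "has_ceo since 2015" relation needs an intermediate node
:AcmeCorp :has_ceo [
    :ceo        :JaneSmith ;
    :start_date "2015-06-01"^^xsd:date
] .
```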

Question: please expand on why you want to tag n-ary relations?

Answer: So that tools can know that it is an n-ary relation.

Question: The example is more like a qualified relationship.  Another example is for a sensor: where it is located, what it measures, what its accuracy is, when it was last calibrated, etc.

Answer: Yes, it is a restricted kind of n-ary relation.  It was chosen so that later I can easily illustrate how property graphs can be represented in RDF.

Some meta discussion on what we are doing in this session - we should avoid spending too much time on low-level details.

David takes us through a list of challenges (see slides), e.g. issues with blank nodes, lack of tutorials, the sense that RDF is too low level (assembly language vs high level language)

Question about how to determine what the middle 33% actually think, given that they aren’t using RDF in a meaningful way other than sprinkling RDF into web pages for better search results.

Adrian: our experience is that beginners need lots of tutorial material, including JavaScript libraries and tools.

Maybe we don’t have good visualisation tools because we don’t have a standard way to represent n-ary terms etc.

Ivan: Maybe graphs are the problem: many people “think in trees”

David: I don’t quite agree with that, but they don't think in RDF graphs or property graphs.  They think in higher level graphs.  We need to work up to a higher level RDF that captures more directly the graphs that people visualize.

DanBri: agree with Ivan that property graphs are roughly similar in difficulty to RDF.  Adds that unfriendly reactions on mailing lists put beginners off

Maybe IRIs are the problem instead of blank nodes.  IRIs are a pain for new users whereas blank nodes are much easier for them.

Richard Cyganiak:

Dave Raggett: we need a sustainable model for a strong community around developing tutorials, javascript demos etc. and perhaps the Mozilla Developer Network (MDN) provides a good precedent.

I also agree that enterprise support is key for RDF, and that we need to step up to defining the mapping to property graphs and a role for an interchange framework across heterogeneous data silos using different database technologies and standards.

David: RDF is at the assembly language level and we need to define a high level framework that is better suited to average developers.

Adrian walked out

Quick overview of 5 ways of representing n-ary relations in RDF, with their pros and cons.

Ivan: these are mostly hacks, the question is whether we need to revisit the RDF core. Can we have local IRIs? That would make it so much easier to get started. (Jerven’s thought: can we also have more formal automatic skolemization of bnodes as a default?)

Richard: if we open RDF core, then RDF* is a promising thing to look at.

Discusses the challenge of sustainable funding for educational materials. Maybe we can approach companies?

Property graphs are successful because they differentiate links and properties, and use the closed world model. If we had reification with a standard “hash” name (even just as a W3C Note), reification would be sufficient for property graph capabilities. It would then only need syntactic sugar in SPARQL and the serializations.

IRIs can be hidden (JSON-LD @context) as they are only relevant when it comes to external links. We need a higher level framework and representation that is easy to use

David: we still aren’t clear what the higher level RDF syntax should look like

DanBri: I want to pick up on packaging. Docker goes a long way to seamless packaging, making it easy to install applications without having to deal with the dependencies. If only there was an easy way to add the data.

A higher level framework should simplify handling of identifiers; JSON-LD is a good example.

Dave: we should incubate a high level framework and tooling above RDF, including visual editors for graph data and rules, look at sustainable models for tutorials etc., and ensuring that RDF can act as an interoperability framework for integration across different property graph and SQL data stores etc.

David does a straw poll:

Andy: we should ask the users for what they find hard in practice and we need to engage with enterprise users in particular.

Need for work on richer mapping standards across heterogeneous silos.

Lift the turtle syntax to make it easier to express property graphs and then make it easier to SPARQL (Robert Buchmann?)

We haven’t talked about semantics.  This proved a painful area for RDF1.1.

Lack of courses on RDF.  This is the tutorials space.

David: Some vendors offer courses.

If I had a bunch of money to improve RDF uptake I would hire a designer.  More designers! Nicer layouts for the tools.

Need to start writing standards for nicer Turtle and SPARQL (Richard Cyganiak); we know the issues.

Ivan: RDF has lots of work on inference that virtually nobody uses - we need to work out why.

Perhaps this is because this isn’t what people actually need?

We should work on RDF*, SPARQL* etc. this would address much of what people are looking for.

Need for work on SPARQL 1.2

We’re in competition with Property Graphs and need to make RDF better fitted to succeed[a].

We differ in respect to whether we need the semantics in the Semantic Web. Some people are comfortable with defining inference in terms of the application of rules rather than relying on a logic prover. Standard vocabularies (ontologies) make it easier to re-use shared semantics.

Inference is very important but needs to be easier to specify selectively what inference to use and where to use it.

[a]the only place where PGs outcompete RDF is marketing

https://twitter.com/jindrichmynarz/status/775633426149965824

SQL and GQL

Moderator: Keith Hare

Summary:

The SQL and GQL session had four parts:

  1. Keith Hare, Convenor of the ISO/IEC SQL Standards committee, presented a quick history of the SQL Database Language standards, and talked about how the new SQL part, SQL/PGQ -- Property Graph Queries, and the potential new Graph Query Language (GQL) standard fit together with the existing SQL standard.
  2. Victor Lee, TigerGraph, talked about how TigerGraph achieves high performance using defined graph schemas. TigerGraph also supports Accumulators to gather summary data during graph traversals.
  3. Oskar van Rest, Oracle, described the SQL/PGQ syntax to create property graph “views” on top of existing tables. He also provided an introduction to the Oracle PGQL property graph language.
  4. Stefan Plantikow, Neo4j, provided an overview of the capabilities currently planned for the proposed GQL standard. One key goal is composability – the ability to return the result of a graph query as a graph that can be referenced in other graph queries.

Minute takers: Alex Miłowski, Predrag Gruevski

SQL Standardization

(a summarization of the slides - see presentation for reference)

Keith Hare

“The SQL Standard has been the borg!” - 30 years of history of expanding SQL

SQL technical reports describe parts of the standard:

Next:

What’s a property graph?

SQL/PGQ = the intersection of SQL and GQL

SQL/PGQ and GQL work -

GQL project potential structure

Various inputs to SQL/PGQ to GQL

TigerGraph (Victor Lee)

“Property Graph Language for High Performance”

Origins of GSQL: attempting to design a property graph database of the future

Why schema-based?

Proposed GQL graph model: schema-first options

Proposal for GQL (see slides):

Accumulators:

Oskar van Rest - Oracle

Property Graph DDL:

More complex DDL:

Query:

PGQL  = Property Graph Query Language
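A flavour of the SQL/PGQ direction described here (a sketch based on public drafts and talks, not the final standard syntax; table and column names are illustrative):

```sql
-- Define a property graph "view" over existing relational tables
CREATE PROPERTY GRAPH company_graph
  VERTEX TABLES (
    persons   KEY (id) LABEL Person  PROPERTIES (name),
    companies KEY (id) LABEL Company PROPERTIES (name)
  )
  EDGE TABLES (
    works_at KEY (id)
      SOURCE KEY (person_id)  REFERENCES persons (id)
      DESTINATION KEY (company_id) REFERENCES companies (id)
      LABEL worksAt
  );

-- Query it in SQL via GRAPH_TABLE with a pattern-matching MATCH clause
SELECT t.pname, t.cname
FROM GRAPH_TABLE (company_graph
       MATCH (p IS Person)-[IS worksAt]->(c IS Company)
       COLUMNS (p.name AS pname, c.name AS cname)) t;
```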

Stefan Plantikow - Neo4j

Cypher - MATCH/RETURN structure - conceptually simple for users

From Cypher/PGQL/etc. to GQL:

Alignment w/ basic data types from SQL

Query composition:

Enable graph construction via query composition

Queries can be defined by name with functional semantics - used in later queries like a function call.

Graph types:

Type system:

GQL Scope and Features - document from Neo4J (need link)

Among questions:

  Specifying cardinality constraints in a graph schema - how many edges of a type in/out of a node; how many nodes of a type

Composability -- if the output from graph query one is a graph, that output can be used as input for graph query two.

Graph Query Interoperation

Moderators: Olaf Hartig, Hannes Voigt

Initial Statements

Discussion

Composition, Patterns and Tractability

Moderator: Peter Boncz

CONSTRUCT (c)<-[:worksAt]-(n)
MATCH (c:Company) ON company_graph,
      (n:Person) ON social_graph
WHERE c.name = n.employer
UNION social_graph

CONSTRUCT social_graph, // another way to express a union
  (n)-[:worksAt]->(x:Company {name=n.employer})
MATCH (n:Person) ON social_graph

In order to get only one Company vertex per distinct employer name, grouping is used:

CONSTRUCT social_graph
  (n)-[y:worksAt]->(x GROUP e :Company {name=e})
MATCH (n:Person {employer=e}) ON social_graph

PATH wKnows = (x)-[e:knows]->(y)
     WHERE NOT 'Google' IN y.employer
     COST 1 / (1 + e.nr_messages)

Triumphs and Tribulations

Moderator: Keith Hare

Summary:

Keith Hare presented the processes used by ISO/IEC JTC1 SC32 WG3 (Database Languages) for the SQL standards along with the benefits and challenges of those processes.

Peter Eisentraut described the processes used by the Postgres open source database project.

Ivan Herman described the W3C processes.

While the processes have some bureaucratic differences, they have many similarities. Perhaps the biggest similarity is that the end result is determined by the people who show up to do the work.

Minutes:

Keith Hare : presented ISO/IEC JTC1 SC32 WG3, SQL editing process (slides)

Pro:

Detailed specifications

Technical Reports

Standards Documents are built in XML

Process is fairly open

Entry costs depend on national body

Real cost is time to effectively participate

Consistent participation

Collegial participation

Ability to adapt to changing bureaucratic environment

Standards are defined by the people who do the work

Often tied to database implementations

        WG3 good at fitting into/around ISO bureaucracy

Con:

spec is complex

        Documents cost money (can be worked around to some degree; technical reports are free)

        No external validation testing

        Getting new people involved - property graph work is helping

        Document format arguments with ISO central secretariat

        Work only proceeds where people do the work

4 parts of SQL that won’t be revised - effectively obsolete

Peter Eisentraut - Postgres development (as OpenSource example)

        Long-term OS cooperation

        Process evolved as the need for some things was realised

        People generally get on

        Code-review done but not enough people doing it

        Regular release cycle nowadays (longer ago less organised)

        Over time, more process structure has helped

        Similarities to some of the ISO process

        Introductory documentation available

        

The ISO WG3 process is perceived as a difficulty - who is a delegate representing? Joining a national body may be easy, but if all the real discussion happens in the US committee…

Problem with bug reporting into ISO process.

Ivan Herman W3C

        Membership by company, not by country

        Member company can send as many people to as many groups as they want

        But vote by member

        Development process fairly similar to ISO

        Working groups organise as they like, but weekly 1 hour call + some f-t-f common

        Can invite other experts

        All material is now on GitHub; anyone at all can comment via GitHub commenting

        WG has to respond to comments

        All documents, draft to final are public

        Patent policy - member companies have to commit they will not enforce a patent that is required for implementation of a recommendation. Not an issue in recent years.

        Groups produce a first public draft, then updated, eventually to candidate recommendation (CR)

        Then review - primarily testing - requires every feature to be proven implementable by at least two independent implementations (issue with only 3 continuing browser rendering engines)

        W3C does not check the validity as such

        Testing phase can be long - 6-12 months

        Members voting points:

 charter of a working group (requires 5% of members to vote - approx 20 members)

Proposed recommendation - yes/no - potential for the Director (Tim Berners-Lee) to overrule

Role of editor : depends on WG, but editor expected to follow WG decisions; Other WGs give editor more authority

Q: Is there a high signal-to-noise ratio on open comments? A: Not too much noise.

Every WG has at least one member of W3C team - so very flat from editor - wg chair - w3c team - w3c Director

All documents have to be checked for security, privacy, internationalization, accessibility concerns. (RDF lacks right-to-left marker for Hebrew etc)

Email discussions archived and accessible. (unlike WG3 - archive exists but not accessible)

Specifying a Standard

Moderator: Leonid Libkin

Leonid Libkin:

Jan Posiadala, “Nodes and Edges”


BEGIN of notes redacted by Paolo Guagliardo

Initial thoughts for discussion put forward by the moderator (Leonid Libkin):

Contribution by Jan Posiadala: Executable semantics as a tool and artefact to support standardization

Points raised during the discussion:

Conclusions

There is general consensus that a formal specification should be devised in the context of the new GQL standard (the vast majority of people in the room agree with this and no one is directly against it). It remains to be seen who will be responsible for developing and maintaining this formal specification, whether it will be an integral part of the standard, or how it will otherwise influence the standardization process.

END of notes redacted by Paolo Guagliardo

Queries and Computation

Moderator: Victor Lee

Intro

Victor had slides to help guide the session.

Thinking about queries and the data on which they are running. How to tie into machine learning.

What is a query

  1. Favor the human or the computer
  2. To restrict or not restrict inefficient operations
  3. Influence of schema: when do you want a schema (dynamic/predefined)?
  4. Digging into query and workload types

Computers are smart, but hardware is dumb.

One line of code can translate into O(n), O(n^2), etc. CPU cycles etc.

Overhead: disk access, OS management, network access

High level queries are just the tip of the iceberg

Translation into a sequence of query-language primitives (e.g. loops or subqueries) -> IR -> machine language -> hardware

Incurs hidden costs e.g. transactions or memory models. VMs or data layer access

Memory is getting cheaper, but is still much more expensive than disk. Latency is a big problem.

Cost of complex algorithms: can they be fast for large N at all?

Amdahl's law hits.

What is a "query"?

Audience responses:

Topic 1: Favour human or the computer.

Topic 2: Computational Limits

Some computational tasks are expensive in terms of time or memory.

Some such tasks are built into the language; some are queries that the user writes.

What should we do about excessively expensive operations?

How can you estimate the cost of an operation/query?

Some algorithms give progressively better results the longer you run them, so you can abort them when they reach some limit and possibly still have obtained a good approximate result:

Have parachutes in place

Topic 3: Schema-less vs. schema-first

Topic 4: Query and Workload Types

Rules and Reasoning

Dave: we may want to use mindmaps as a way to sketch the landscape for different kinds of rules and reasoning. This is a convenient way to devise informal taxonomies for concepts.

[Online comment by Harold: An earlier mindmap-like arrangement of rule systems was done in RIFRAF (also see follow-up pages): http://www.w3.org/2005/rules/wg/wiki/Rulesystem_Arrangement_Framework.html]

Moderators: Harold Boley, Dörthe Arndt

Introductory slides

Questions to discuss (proposal): https://docs.google.com/presentation/d/1LkyXSE_86JNGoJRJx8HfK0PU6Xv_HUDEOLBnFh_dmgY/edit?usp=sharing

Proposal: Notation3 logic to reason over RDF/LPG graphs https://docs.google.com/presentation/d/1I-gS3lEsmuUlEcy3EK010NVH-0NAT3-TD4yPtTR3nH4/edit?usp=sharing

Harold: Database views are a special kind of rules (cf. Datalog).

Rules can describe one-step derivations: iform ==> oform, replacing iform with oform;
or (equivalently), oform <== iform, obtaining oform from iform.

Reasoning can chain rules for multi-step derivations, forward (adding data), backward (querying data) or forward/backward (bidirectional).   Ontologies complement rules and prune conflict space.
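A one-step derivation of this shape can be written in N3, for instance (the vocabulary is illustrative):

```n3
@prefix : <http://example.org/> .

# iform ==> oform: every worksAt edge derives an employs edge
{ ?person :worksAt ?company . } => { ?company :employs ?person . } .
```

Forward chaining would add the derived `:employs` triples to the data; backward chaining would use the rule only when answering a query about `:employs`.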

Languages for graph rules and reasoning augment languages for graph DBs and relational rules; N3, LIFE, F-logic, RIF, PSOA RuleML are examples.

Ontology languages can be defined by rules, e.g., OWL2RL in RIF and SPIN rules.

Beyond deductive reasoning (from relations to graphs), there are quantitative/probabilistic extensions and qualitative extensions (inductive, abductive, relevance, defeasible and argumentation).

Comment: We have a few billion statements, can interchange rules, but not execute on big data.  Usability profiles are important.  Shape understanding of PG people.  

Harold: Big data rules?  See Datalog+/- -- it is scalable.

Evren: Practicality is important.  OWL, RIF, etc. may be too much to use.  Worked on reasoning for 15 years.  Need semantics people understand.  We support a SPARQL-based syntax.   Q: What about N3?  Evren: Maybe.

Comment: Want to be able to have waves of rules and use the results of one rule in another.  Need something based on SPARQL CONSTRUCT, but able to consume the results of another rule.  For derivation rules, we need a standardized way to describe complex rules that is more than a direct derivation from one piece of data to another.

David Booth: I want rules to be: 1. convenient and concise to write and read; and 2. easy to specify where and when I want to apply them.  I do not want to apply all rules to all data!  For example, I want to apply rules to only a particular named graph, or collection of named graphs.  And then I want to use those results in another named graph, for example, and compose results.

Gregg: N3 rules have a difficult mapping to RDF datasets. We should consider basing reasoning on SPARQL query/construct feeding back to the dataset.

Axel Polleres: We had statistical rules and ontology rules that we computed, and combining them was tricky.  How should they be handled?

Comment: Might want to manipulate graphs having nothing to do with reasoning.  If we can associate KG with cyber physical graphs...  (missed)

Riccardo Tommasini: I work in two domains.  Rules allow me to work at my preferred domain level.  Complex event processing.  Two domains, both with a concept of rules, and want to mix them.  The metadata is timestamps.

Ghislain Atemezing: Rules are important.  Our customers want to see rules in action.  They want to be able to easily read and write them, and easily understand data that was inferred and how.  Q: formal proof and provenance?  A: Yes.   And sometimes the complexity of the OWL profiles, end users don't want to know that part.

___: Rules control process, and check consistency.  Q: validation?  A: Yes.

Harold: Rules found inconsistency in Aircraft Characteristics information: an airplane model should not be in both the Light and Medium weight classes (http://ceur-ws.org/Vol-2204/paper9.pdf).

David Raggett: Rules that operate on graph, and rules that operate by miracle and update the data.  Also machine learning rules, from reinforcement learning.  Different kinds of rules.

Andy Seaborne: Rules have good mind share.  If you say "inference" they switch off.  They want control and understanding of what the rules are.  They're programmers!  They start with an idealized expectation of rules.  Good potential.  Lots of desire for rules to be more used.  Better to start low level, e.g., computing a value from the data.

Axel: What would be the starting point?  SPARQL-based without datatype inference.

David: A third thing I want: the full power of a programming language.

Richard Cyganiak: SHACL rules package a SPARQL CONSTRUCT query. It’s a good starting point that already is a W3C Working Group Note. https://www.w3.org/TR/shacl-af/#rules
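For illustration, a minimal SHACL-AF rule of the kind described here, packaging a CONSTRUCT query (vocabulary invented; prefix declarations via sh:prefixes omitted for brevity):

```turtle
ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:rule [
        a sh:SPARQLRule ;
        # $this is bound to each target node of the shape
        sh:construct """
            CONSTRUCT { $this ex:grandparentOf ?z }
            WHERE     { $this ex:parentOf ?y . ?y ex:parentOf ?z }
        """ ;
    ] .
```

An engine implementing SHACL-AF runs the CONSTRUCT once per target node and adds the constructed triples to the data graph.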

Ivan Herman: If people come to RDF, they will learn Turtle.  From Turtle to SPARQL is relatively easy, because it is based on the same syntax.  CONSTRUCT means that they stay within the same syntax.  Problem with N3: too late.  It should have been done years ago, before SPARQL.  If we add N3 we force the user to learn another syntax.  If we cover 70%-80% of use cases with rules then that would be a good start.  Q: Is N3 really hard to learn?  A: Yes, N3 is different from SPARQL/Turtle.  Yet another obstacle.

Harold: A tiny addition to SPARQL CONSTRUCT could allow results to be chained.

Adrian: I see the temptation in having programming language in rules, but there are limits to what you should do.  If your data is so bad that you need to clean it up, then you should use a programming language.

Evrin: Caution against the full power of a programming language.  Once you go to procedural semantics, you get problems, like priorities of rules.  Simple additions like filters or binds are important to SPARQL.  They make a huge difference in practice.

___: Lots of discussion of RDF and SPARQL.  Where do PGs fit in this?  Need to think of the PG world also.

Gregg: Might want to define an abstract data model for both RDF and PGs.  Perhaps that would be a framework for these rule engines.  Not necessarily SPARQL.  

Doerthe: Would an extra abstract layer lose people?

Dave Raggett: There is scope for procedural and declarative.  I want procedural rules operating on PGs.

Ivan: Has there been dev of rules languages in PGs?   Alistair: No.  :)

___: In DB communities, there are papers on reasoning over graphs and dependencies on graphs.

Doerthe: Where do you want to use rules?

David Booth: Data alignment.  Transform from one data model to another.

___: Check consistency rules for cars.

Adrian: Data pipeline, not have to materialize stuff that is in the schema.  The less I have to write the easier it is to maintain.

Evrin: Data alignment, beyond just subclassing.  Also encoding business language in your app, like fraud detection.  Typically combined with statistical reasoning.  Q: How to combine them?  A: Relational mapped to RDF, statistical reasoning uses the logical reasoning output.

Riccardo Tommasini: We do the same thing with complex event processing.  Correlation of events is also a concept.  In most approaches rules rely on a symbolic level that is different from SPARQL.  Don't care much how the data looks, but what it means.

___: Data alignment.  You can extend these cases as much as you want.  You get this with multiple DBs.  Also: a company is based in Seattle, and you know that Seattle is in Washington.  The "located in" relation is transitive.

Peter W: Rules like models of buildings, and whether they comply with regulations.  Modeling regulations.  Looked at the Scottish legal corpus, putting into LegalRuleML.  Building control process.

Doerthe: Base rules on existing languages or new language?

___: Now I am trying to get other people to use our rules.    Did anyone try to study how long it takes to solve a problem with a programming language versus rules?

Monika Solanski: We had to make decisions based on certain criteria.  Users wanted to do if-then-else all the time, and the benefit was that people in the business could read the rules.

Adrian: We manipulate something very close to triples.  We  use a DSL on top.  It compiles to a more generic rules language.  We do code completion also on the DSL.  

Dave Raggett: Single format, but visualizable in a convenient rendering.   Common underlying model, but not specific to one syntax.  Consider a visual rules language.

[Online comment by Harold: One visual data/ontology/rules language: http://wiki.ruleml.org/index.php/Grailog]

___: When you have a set of rules, (missed)

___: Could borrow from programming languages, testing tools to check whether all cases have been covered.

___: Argumentation tools might be helpful.

Doerthe: Most use cases: aligning data.  Rule languages must be easy.  What are the take-aways?

___: SHACL rules.  Very interested in moving it to a core standard.  Q: Does it have defined semantics?  A: SHACL operates on a graph, and SHACL rules operate on a graph.  Q: Are rules applied once, or recursively?  A: Recursively, and you don't know what will happen.  More work is needed to nail down what should happen, and that could be done as future work.

Andy: Syntax matters.  People engage more in the concrete than the theory.

Dave Raggett: What about adding mindmaps?

Harold: Would be nice to add them.  This would be a new application.

Harold (shows the one-way return heuristic as a rule example involving CONNECTIONs between Airports: http://wiki.ruleml.org/index.php/Graph-Relational_Data,_Ontologies,_and_Rules#GraRelDOR.2C_AI.2C_and_IT): Rule syntax bridging RDF/Turtle’s and OpenCypher’s data syntaxes (e.g., with a parameterized property: https://dzone.com/articles/rdf-triple-stores-vs-labeled-property-graphs-whats).

Doerthe: Syntax understood?

Gregg: If we extend what we discussed this morning, such as the <<>> reification operator discussed by Olaf.  

Harold: The CONNECTION property is a constructor function with two parameters, distanceKm and costUSD, provided as (link-annotation-like) properties of that property. This constructor-function application is used as a property-parameterized property attached to the departure Airport.

[Online comment by Harold: More flexible (especially for rules) than, e.g., use of (reification/bNode-like) ‘symbolic’ property names (e.g., CONNECTIONdistanceKm4100costUSD300 etc., for every distance/cost combination).]
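Expressed in the RDF*/Turtle* notation mentioned this morning (airport and property names invented for illustration), such an annotated CONNECTION could look like:

```turtle
@prefix ex: <http://example.org/> .

# the base edge
ex:SEA ex:CONNECTION ex:HEL .

# annotations on the edge itself, via the quoted-triple (<< >>) operator
<< ex:SEA ex:CONNECTION ex:HEL >> ex:distanceKm 4100 ;
                                  ex:costUSD   300 .
```

An RDF* engine can match or return the annotated edge directly, which is what makes this more rule-friendly than a symbolic property name like CONNECTIONdistanceKm4100costUSD300.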

Ivan: If we have a good syntax, trying to reuse syntax both for query and rules makes a lot of sense.

David Booth: I am conflicted about syntax.  I like the idea of using familiar SPARQL syntax to write rules, and I've written lots that way.  But I also find it overly verbose, and I am drawn also toward N3.   Bottom line is that we have to try them out to find out which one is easier.

Doerthe: Why are SPARQL FILTERS different from WHERE clause?  A: Unknown.  (Andy is not here to answer.)

Richard: Maybe to distinguish functions from prefixed names?

Question from Zoom: Laurent Lefort> Better Rules Better Outcomes is a good example of “legal informatics” use case where users (a mix of legislative drafters and developers) are trying to capture and reuse rules on a large scale see http://www.rules.nz/variables using an approach called Open Fisca. I have similar needs in my organisation and am looking at approaches like the QB-Equations work. My question is has someone defined a URI scheme for rules, especially these kinds of rules defining how variables are derived from other variables?

Similar examples possibly found for GDPR, also Linked Building Data.

Harold: I don't know.

Graph Models and Schemas

Moderator: Juan Sequeda

Olaf’s slides: http://olafhartig.de/slides/W3CWorkshop2019GraphQLSDL4PGs.pdf 

PGSWG (property graph schema working group)

Links to working documents of PGSWG:

GraphQL as a schema definition language (Olaf Hartig)

Questions

Discussion

From ZOOM

Laurent Lefort:

Agree with Olaf on not having incoming edges: there is a socio-technical benefit in being rigorous in defining directions in the graph in a multi-party setup.

Josh Shinavier

I agree as well. OK to treat edges as interchangeable with properties in this respect.

Laurent Lefort

Mine (positive) would be modularity and support to multi-party data integration - esp. finding the right balance in managing cyclic dependencies (between sub-parts of the schema supplying atomic definitions when developed by different parties) - that's the main blocker when you scale up - found OWL DL profiles to be a tad too restrictive but not by much. See FIBO, ISO 20022, OGC HMMG.

Laurent Lefort

Mine (Negative) is not having fully addressed the tool interoperability issues associated to the incremental development of the standard stack components (need to track/manage cases when new standards add something handy which has no equivalent in a previously defined standard).

Temporal, Spatial and Streaming

Moderators: Martin Serrano, Ingo Simonis

Summary

Christophe: wondering whether GeoSPARQL as the foundation of a data processing platform should be revised.  GeoSPARQL's functions are binary (true/false), whereas domain experts are used to dealing with ternary functions (with error margins/thresholds) that support some fuzziness (e.g., something with 0.0…6 mm² of overlap is still considered touching).  Why?  To increase uptake by subject-matter experts, and so that platforms/standards don't inherit these limitations.

Riccardo Tommasini:

        FraPPE: Ontology to represent spatio-temporal data in a streaming environment

        RSP-QL: reference model for RDF Stream Processing Query Languages

        Use of Named Graph for RDF Stream representation

         Vocabulary & Catalog of Linked Streams

Dave: the need to support queries at different levels of abstraction e.g. coordinates, shapes, adjacency graph, and semantic relationships - in respect to both spatial and temporal queries.

Use-Cases:

Smart city: CDC data + social data to understand how a crowd is acting

News stream modeling and processing

Dave: we should consider existing work on temporal reasoning, e.g. the application of event calculus and situation calculus in respect to truth maintenance, i.e. the ability to reconstruct the state of a system at a given time, or during a given time interval.

This is relevant to reasoning about faults, and identifying plausible explanations for the observations, and the likely consequences of particular explanations - this is an example of abductive reasoning.  Inductive reasoning is relevant to constructing models that account for a sequence of examples whether these are in time or in space. People are good at spotting repeating patterns.  A related use case involves recognising that the current situation is similar to a past one, and using that to make predictions about how to handle the current situation.

Niels: A lot of thinking/research on granularity and time has been done in the BI/data warehouse world in the last 20+ years.

Also we are discussing topics on different abstraction levels… I think W3C/OGC should focus on core capabilities/vocabularies and validate whether the different higher-level use cases can be satisfied with those.

Daniel: From the IoT perspective (e.g., time-series sensor data), streaming is also essential. Moreover, when several sensors are part of a device (i.e., a Sensor Network), which is moving in time, the spatial component adds complexity too. It would also be feasible to stream Observation entities (i.e., sensor measurements) such as the ones described by the Semantic Sensor Network Ontology. And to attach geospatial and temporal data to the Observations.

Going/Moving Forward Actions:

* Spatial Data on the Web Best Practices Working group is producing good results for spatial moving forward.

* There is a need for further development of GeoSPARQL addressing, among other things, the uncertainties described earlier, as well as serialization.

* RDF Stream Processing Community Group

  (re-activate by promoting participation)

* RSP-QL now has a complete reference implementation

* Better coordination with other groups, e.g., JSON-LD

Outreach and Education

Moderators: Peter Winstanley, James Masters

https://docs.google.com/presentation/d/1PSzSSAAVrBjgYrO8rDtovRv330glCS6xQQPRFeE5YeA/edit?usp=sharing 

Attendees: (Apologies if I missed or misspelled anyone’s names)

Adrian Gschwend (adrian.gschwend @ zazuko . com)

David Booth

Richard Cyganiak from TopQuadrant

Robert Buchmann

Natasa

Steve Sarsfield from Cambridge Semantics (steve.sarsfield@cambridgesemantics.com)

Tomas

Peter Winstanley (peter.winstanley@gov.scot)

James Masters (james.masters@credit-suisse.com)

Mo McRoberts

Problem Space:

Successful adoption of standards

Promote sustainable communities

Maintain resources for developers

Evangelist community development and technical writing

Need to create a sustainable community to maintain longer strands of communication over geography

Semantic Web Education and Outreach Interest group is closed - why?

How do we create market conditions for standards?

MDN - a model for our collaboration? Possibly good for developers, but not for CEOs

People are asking: Should data scientists be licensed? What kind of quality control can be enforced?

Do we need to mimic these efforts?

We are operating in a very western European / American paradigm. The rest of the world needs educational resources through groups like UNESCO for global outreach

Do we need training certification programs? What would they look like / include?

Develop cookbooks for designing and delivering reusable models?

We have numerous small / vendor driven efforts to provide training - how do we converge on a shared resource of training materials?

What about a HackerRank for RDF / graph solutions development?

Recommendations / Suggestions from the audience:

The world of graph is very broad, with many use cases; we aren’t good at identifying what tools are fit to different use-cases.

David: New users coming into the space know what they want to do. Need signposts based on use-case. Need 3-5 common use-cases to direct people into the right technologies.

Richard: Want to reach the different user communities. Developers like RDF less than information architects. That is a pattern that is common. We (in this room) are all in the information architect camp, but we have to hand our work over to developers and that is where it gets complicated. Do we get them to like RDF or get them to “swallow the pill?”

Peter: Do we get them to “swallow the pill” by doing most of the work for them? Give them pre-existing URIs instead of having them invent them?

That has often been the promise, but in reality it falls down. Things have multiple identifiers with different lifecycles and that is just the reality.

Need a “learn Python”-style app for RDF and SPARQL

RDF has really good low-level documentation; we suck at the other end of the spectrum.

Adrian: We need to get away from branded training material, towards shared material that we can point people to.

Peter: How are we going to organize that? If we get the market right, things will go. If we get it wrong, we are dead.

Peter: We need a wide partnership of corporate and community resources to build the tutorial and educational materials.

David: Need a central starting point website for all things RDF. Every other commonly used technology has a well maintained website where you get started. RDF doesn’t have that. How do we fund such a thing and sustain it?

?: Terminology is a mess

Richard: The successful examples where there is a single entry point are usually around a product or library, something where you start using it. We need an obvious entry point.

There are parallels with XML in a way; no one goes looking for XML on its own; they do it in the context of an application that needs to interact with it.

Adrian: I do a lot of SPARQL training; I really like the DuCharme Learning Sparql book; also Hendler and Allemang’s book. How would we write a book of practical use-case driven examples?

Adrian: I started documenting our JavaScript library and realise I need to rethink it to be more use-case driven.

Robert: I have the perspective of why students don’t like it.

  1. Visibility: They think it’s a myth. If there isn’t a tutorial on Coursera it doesn’t exist.

Peter: We absolutely need these resources for students; where can they get a docker image with the tools and data, for example? It wouldn’t take very much to pull stuff together, but who is going to host it? How do we keep it current?

Natasa: Why don’t we give users the ability to describe their use-cases in our portal and have consultants give their advice? Input for design. People are struggling to find RDF consultants.

Hosting isn’t as much a problem as having the people to maintain it. It feels like a coalition of vendors and consultants could collaborate on this.

Robert: Avoid RDF/XML for newbies.

David: OWL community is the biggest offender of RDF/XML propagation. Also: Hosting isn’t the issue, it’s curation.

Chip: Need hackathon problems using RDF

Adrian: We published a lot of Swiss data to use for hackathon; last August our hackathon using RDF / Linked Data went off really well. They were very surprised how far they could get using RDF instead of data in CSV.

Want to show you can rapidly do useful things with RDF in Jupyter Notebooks.

Peter: YasGUI.org is really nice for interfacing with RDF data in development environments.

Steve: Also Zeppelin

Robert: Students are more impressed when they see visual stuff.

We can get mileage out of embracing the correlation between RDF and object-oriented programming models.

Peter: A useful tool is www.visualdataweb.org/relfinder to help capture people’s imagination of what is possible.

Adrian: Also ontopedia.org (or was it ontodia?)

Peter is curating the outcomes and recommendations in a Google Slides document.

Niels: wasn’t able to attend the session, but do you know this site: https://data.labs.pdok.nl/stories/

Preview for Wednesday

Dave Raggett summarised the proposed agenda for Wednesday morning.

Posters and Demos

to be completed

Wednesday, 6 March

Summaries of Tuesday sessions

The chairs will reach out to session moderators for any slides presentations and session summaries, to add these into the minutes.

We listened to summaries from each of the breakout sessions on Tuesday.

De facto interchange format: role of JSON-LD, and need for more work on the relation to LPGs; plenty of interest in RDF*/SPARQL*. Apache TinkerPop has a serialization format for graphs.  Also consensus around use of RDF* for handling property graphs.  JSON-LD is popular; the 1.1 effort is ongoing.  Discussion around reification.

Focus on the needs of the middle 33% of developers, but how to accurately assess their needs?  Can we learn from the experience with microdata and the development of RDFa as a reaction?  See summary on slide 54.

Views of property graphs, PG queries on top of SQL queries, graph query language that does CRUD.

Reports on graphQL implementations, federation.  Reports on projects that map Cypher or SPARQL to Gremlin, intermediate representation with the possibility of bidirectional mapping.  Work on interop is important, and we need to put effort into it.  Mapping data models, query preservation.  To do more on query interop we need an understanding of how the data models map to each other.  As long as the PG model is not standardized, a canonical PG model could help as an intermediate.  Nesting of data models.  Direct?  Customizable?  A standard could cover both.  Could also take a similar approach for query: use an abstract query model.  Discussion around how serialization could look.  Quick poll: majority favor a PG model and standardized mappings to RDF.

Reminded of query language work -- regular queries, GXPath, restricted REs -- they influenced early work on query.  Pattern syntax and Gcore are converging in discussion on how graph query should go.  Composability of queries.  Gcore is an academic exercise in taking this further.  On composability: many graph query languages are graph-in-table-out.  That means that they are not composable.  Need to return a graph to make them composable.  Gcore introduced the CONSTRUCT clause to return a graph, with multiple types of nodes and edges.  Need to include grouping.  Paths are also important.  Path analytics and discovering them.  How do paths and graphs relate?  Gcore takes that to the extreme of saying that a path is part of the data model and can be returned.  A PG can have a path in it.  SPARQL has paths but cannot return a path.  Tractability: graph queries can have high complexity, e.g., all possible paths.  Other areas: simple paths, or ensuring that paths do not have cycles.  These also increase complexity.  Proposal to exclude everything that is not tractable, and discussion about that.  People pointed out that even in SQL, people can write queries that do not finish.  Discussion of more advanced path queries, and demo of Cypher with Apache Spark, returning query language features.  Q: Gcore excludes complex queries?  A: Gcore does not allow all paths to be returned.

Small session.  Peter and Ivan described standardization processes.  Bureaucratic differences but also similarities.  The end result is determined by those who show up to do the work.  As an ISO geek, this helped me understand how other standards processes work.  Q: How do people show up in ISO?  A: Join your national body.  Many of them have mirror committees, e.g., for the SQL standard.  Q: Could that work for TinkerPop, e.g. Gremlin?  A: Not in this session, but worth pursuing.  (How to involve open source projects in ISO work.)

How to write the new standard?  SQL has the standard.  Can we follow their lead?  Yes, but issues: they used natural language spec (ambiguities).  Attempts now to formally define semantics of SQL.  Cypher has natural language description, but no standard yet.  But formal semantics was written.  Should we have parallel development of natural language and formal semantics?  Executable semantics?   Demo of one in prolog, and another demo.  Outcomes of discussion: we should do formal semantics of GQL (unanimous), but not to replace the natural language standard.  Issues: many artefacts.  Who maintains them?  Is formal semantics normative?  If not, then what?  How to coordinate?  Universities will not send someone to attend standards meetings.  Q: Was testing discussed?  A: Yes.  There are test suites for Cypher, but need reference implementation.  Q: I have been reading calls for research projects.  What would help academics participate in the standards process?  A: Stay in the EU.  :)   But yes, there is a gap.   Comment: In Sem Web work we had too many academics, perhaps we can aim for a balance.

Overlap with standards sessions.  How much work can get done in one CPU cycle?  Add, subtract, move one thing, etc.  But one line of high level code could be thousands of lines of machine code.  Favor human or computer?   If you favor human, where does it make sense to consider the workload of the computer?   Better imperative or declarative?  Conclusion: Lean toward declarative, but mix.  Even those terms are ambiguous.  Declarative shields the user from what is going on; imperative exposes them.  How standards?  NL spec?  Test suites.  No conclusions about what makes it easy, except test suites.  Every real system has computational limits.  Some things in the language are complex, and sometimes the user query creates complexity.  Should we prohibit some things?  Nobody said we should prohibit, but assess the complexity and perhaps warn users at compile or runtime.  Schemaless or schema first?  No such thing as schemaless.  Data always conforms to some kind of schema.  Nice to have systems support progressive solidification of schema.  System could perform better if schema is known, and humans benefit cognitively.  Queries and workload types, no standard for ACID in RDF world.  That's a requirement for PG commercial systems (transactional consistency).  How much is reasoning used?  Supported but not a goal.  Machine learning: feature vector exported, run outside (graph convolution or embedding), then results brought back in.  Not feasible to run ML directly in the graph.

There is general interest in rules.  Rules should be easy to write and read.  Based on existing syntaxes, perhaps SPARQL or N3.  The starting point should be easy rules (no datatype reasoning), then extend.  Should we combine rules with the full power of a programming language, such as JavaScript?  Do we need procedural rules?  Should rules over RDF and PGs be over a common abstract layer?  Use cases for rules: aligning different vocabularies; data validation; compliance with laws, etc.  Advantages of rules: reasoning can be explained (proof trees, provenance); knowledge can be stated more concisely.  No rules language for PGs.  How to combine statistical reasoning with rules?  Future work: make the existing rules landscape more understandable; define a standard for rules on graphs.  Comment: Different applicability of rules languages.  Reasoning doesn't work.  Q: Where do you see rules living in relation to their data?  In the data?  As schema?  What benefits either way?  A: Personal opinion: living in the DB.  Comment: Users like rules, because they understand them.  Q: Was RIF not discussed?  A: Mentioned.  Worth a postmortem on RIF?

Presentations from the PG group.  Tomorrow we'll continue working on schemas.  Academic survey: start small, start from a foundation, start with flexibility in mind.  How to use graphQL as a schema language for PGs.  Top desired feature?  Keep it simple; enable future extensibility; allow permissive vs restrictive, but keep it simple; allow OWA and CWA.  Historically PGs were a reaction to the complexity of RDF, so keep it simple.  The XML Schema group did not keep it simple.  Summary: Keep it simple!

Domain-specific session.  Historically we have waited for the underlying standard before adding spatial, e.g., SPARQL then GeoSPARQL.  But need spatial in the data itself.  Lots of cases where things move around, so support for geospatial is important.  Temporal dynamics: graphs change.  Shouldn't we allow comparison of graphs, yesterday to today?  Modeling of spatial-temporal: mathematicians still dominate right now.  Need to become more user friendly.  Might only need a cell index, or proximity.  The RDF stream processing group is reactivated.  OGC and W3C collaboration for all standards needed.  Q: The existing OGC standard sucks.  Should be completely redone: not pragmatic and hard to understand.  Q: Distinction made between schema and state?  A: No, distinguish between self-changing things and graph changes.  Both should be supported.

11 people discussed.  Lots has been done in the past, but withered.  Also need to look at who is interested in this, to develop our market.  MDM is one example.  In data science, people consider regulation.  Need to recognize that most of us are from Western countries.  The training-certificate industry has not picked up on graph DB training.  Look at that?  Other areas have created cookbooks, for example for gene cloning.  Are RDF vendor trainings working?  Fragmented.  Other efforts have been abandoned.  Reached several suggestions: 1. Get a list of common RDF use cases (3-5) to indicate the directions available; these give routes from concrete starting points.  2. Interactive learning is needed, including gamifying learning.  3. Low-level RDF docs are good, but we need higher-level docs to the same standard.  4. Need to move branded training efforts into shared training resources.  Also, early learning materials used RDF/XML and that should be jettisoned.  5. Need a central starting website that is well maintained, like react.js or Neo4j.  Needs funding from vendors/consultants/others.  Needs curated materials.  6. Students appreciate Coursera etc. courses (ZOOM comment: there are good Semantic Web courses available now, e.g. the ones by HPI or INRIA).  If they don't see the topic there, they don't think it is important.  7. Career/recruiting resources -- RDF jobs are not very visible in job postings.  Q: Graph analysts?  A: Yes.  Q: Funding?  What models are available for funding a hub of content?  A: Not sure.  Need long-term sponsorship, possibly from gov bodies.  Maybe link with others facilitating the use of tools or consulting services.  Comment: Several vendors were present and agreed that we should contribute common materials.

Coffee break and hallway discussion on next steps

Extending, Incubating, Initiating

The final session.  Focus on summarising and drawing up recommendations for the workshop report. How much interest is there for participating in incubation and standardisation for different areas? What's the best way to ensure liaison and coordination across different standardisation organisations? What can be done to establish sustainable communities around education and outreach?

See Google slides whiteboard

Juan Sequeda:

  1. Propose: RDF* & SPARQL* should be a W3C submission (Olaf Hartig)
     1. Follow up with a community group looking at consequences
     2. Follow up with a WG to standardize
     3. WU Vienna and Fraunhofer expressed support for a member submission
  2. Propose: Define an abstract PG model (Juan Sequeda?) Where? ISO?
     1. The existing “Schema group” is currently informal. Should this go under W3C somehow?  This is the GQL community.  Could put this under LDBC. Josh S. will participate… represent TinkerPop
  3. Propose: Extension of JSON-LD to support PGs (Gregg Kellogg)
  4. Propose: Mapping Relational to PG (may be outside W3C) (Juan Sequeda)
     1. This work is in flight in ISO.

Alastair:

  1. Propose: Community Group for Gcore
     1. Already in ISO; why have one in W3C also?  To help W3C members not have to navigate ISO.  It is so much easier than ISO, apparently.  CGs don’t require membership.

Ricardo:

  1. Propose: RDF Streaming
     1. A reference implementation has been worked on for RDF and SPARQL streaming
     2. Dormant for the last 1-2 years
     3. Ricardo volunteers to revive the group.

Ivan Herman: We must be careful to not have too many things going on in parallel that could change the RDF standard.

Thomas Lortsch:

  1. RDF metamodeling has issues, e.g. named graphs are not formally defined.

David Booth: Consider which of these proposals could be done with the existing RDF 1.1 vs. an RDF 2.0

Alan Bird:

  1. Proposes: We create a W3C Business Group to go through these issues, as a way to organize our attack.
     1. (Cost: 200 USD as an individual, 2000 USD as a business; check the tables for your location).
     2. No real commitment to do something with CGs.  BGs can do this better.  BGs set the context for all the technical work that needs to take place.
     3. A charter is written.  That dictates the type of folks who need to be in it.
     4. Has a W3C member on the team to help through the process.
     5. The BG would help decide what is worth doing.
     6. The BG could be a way to keep this week’s community together and moving forward.
     7. Note: W3Cx is an education resource; curated and kept current. Need participation to build content.

General Consensus: The PG model needs to come before RDF/PG interop.  But the PG model should be informed by RDF so the sides of the “bridge” will connect properly when the time comes.  Would a BG help with this?  A BG could define a task force to liaise between the RDF and PG efforts.

The common business goal is to increase the graph market overall.  So a BG may be a good place to ensure that stays at the top of the priority list.

Victor:

The sooner we can get a formal PG data model and RDF*, and bridge them, the better for the graph community and businesses using graphs.

Covering 80% of use cases sooner is better for the community than covering 100% but taking two years longer.

Alastair:

Focus in the next year on building the piers of the bridge: a formal model on the PG side, properties on the RDF side (e.g. RDF*).

So next year, we can work on the interoperation.

Proposal: a glossary, very lightweight, so the communities can talk to each other.

OUTCOME:

A W3C BG will be created to coordinate W3C technical work and liaise with ISO folks, etc. on the PG/SQL sides.

LDBC will decide by July 2019 on how to coordinate

List of Attendees

The following may include a few people who were unable to turn up in person at late notice.