W3C

– DRAFT –
SPARQL-TF

27 June 2025

Attendees

Present
AndyS, gtw, james, olaf, pfps, TallTed
Regrets
-
Chair
AndyS
Scribe
gtw

Meeting minutes

Scribe?

Recap discussion on subqueries

AndyS: last week discussed subqueries. how they relate to usage patterns. underlying question: what is the nature of variables inside subqueries that are not projected from the subqueries?
… related to parameterized queries and the intuition that all occurrences of a variable would change. cf. a more algebraic viewpoint where anything inside a subquery which is not projected can be renamed with no change to the query outcome.
… in terms of values insertion, it's whether you do or do not do the renaming step.
… if you perform renaming, you get the more algebraic viewpoint. otherwise you get the parameterized query behavior.
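
As an illustration of the renaming step described above (not part of the discussion itself), the following minimal Python sketch gives a fresh name to every variable that a subquery does not project. The dictionary-and-tuple query representation, the function names `rename_hidden` and `substitute`, and the `_1`, `_2` suffix scheme are assumptions made for this example; they are not taken from the SPARQL specification or from any engine.

```python
import itertools

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def substitute(item, mapping):
    """Apply a variable renaming to a pattern tuple or a nested subquery."""
    if isinstance(item, dict):
        return {"select": [mapping.get(v, v) for v in item["select"]],
                "where": [substitute(i, mapping) for i in item["where"]]}
    return tuple(mapping.get(t, t) if is_var(t) else t for t in item)

def rename_hidden(query, counter=None):
    """Return a copy of `query` where each non-projected variable gets a fresh name."""
    if counter is None:
        counter = itertools.count(1)
    # Handle nested subqueries first so their hidden variables are already fresh.
    body = [rename_hidden(i, counter) if isinstance(i, dict) else i
            for i in query["where"]]
    # Variables visible here: those in patterns plus those projected by subqueries.
    visible = set()
    for item in body:
        terms = item["select"] if isinstance(item, dict) else item
        visible.update(t for t in terms if is_var(t))
    hidden = sorted(visible - set(query["select"]))
    mapping = {v: f"{v}_{next(counter)}" for v in hidden}
    return {"select": list(query["select"]),
            "where": [substitute(i, mapping) for i in body]}

# Hypothetical query: the inner ?w is hidden by SELECT ?z, so it is a
# different variable from the outer ?w and receives a different fresh name.
query = {
    "select": ["?k"],
    "where": [
        ("?k", "ex:p", "?w"),
        {"select": ["?z"], "where": [("?z", "ex:q", "?w")]},
    ],
}
renamed = rename_hidden(query)
print(renamed)
```

After renaming, inserting an outer value for `?w` or `?z` can no longer reach any variable inside the pattern, which is the algebraic viewpoint; skipping the step leaves the names intact and gives the parameterized-query behavior.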

pfps: other option is just doing shallow values injection.
… options: no values injection (QLever); values injection everywhere (without renaming); values injection everywhere (with renaming); values injection only at the top.

AndyS: You have to decide shallow vs. not anyway.
… QLever doesn't do substitution at all?

pfps: yes. I put comments in the issue.
… afaict, it does MINUS, essentially.
… that's the easy thing to do.

AndyS: I would say it's not "easy" if it doesn't do what user expects.

pfps: QLever implemented EXISTS in response to a complaint from me. It was done very quickly.

james: I looked at queries because I wanted to satisfy my curiosity about what people use EXISTS for.
… there's a file in the repo called exist statistics(?) and the numbers are very low.
… you can also look into individual directories and there's a txt file in each which shows the number of unique queries.
… literally every single query that has been run on respective hosts for at least 5 years.

<pfps> where is the file?

james: they are hash coded, so unique. of that, I used JavaScript RDF tools to parse, hash-code, and reserialize every EXISTS form.
… output isn't perfect. there are places where the output is not SPARQL (JSON means re-serialization wasn't possible).
… surprised that I didn't find any attempt to do a sub-select.
… mostly two triples. three triples. occasionally 4-triple BGPs. but very few.

<AndyS> email -- "EXISTS variants" -- 24/06/2025, X:02

james: my concerns about complexity are in some sense not well founded.

<pfps> https://github.com/datagraph/SPARQL-exists/tree/main/test-tools

james: if you look at the csv file in the repo, even the ones that are 4-statement BGPs, if you run into large cases, those won't behave nicely with an iterative process.
… that's my concern.
… this is produced by a JavaScript script. all it needs is a directory of queries. others can run it.

pfps: the serializer doesn't work all the time?

james: yes, that's a problem with the serializer.

pfps: I found a fair number of "SELECT", but those are all serializer issues.

james: I'd have to look more closely to know why they failed.

<pfps> EXISTS {"queryType":"SELECT","type":"query","variables":[{"termType":"Variable","value":"v0"}],"where":[{"expression":{"termType":"NamedNode","value":"http://example.org/x0"},"type":"bind","variable":{"termType":"Variable","value":"v0"}},{"expression":{"args":[{"termType":"Variable","value":"v1"},{"termType":"NamedNode","value":"http://example.org/x1"}],"operator":"=","type":"operation"},"type":"filter"}]}

pfps: I found this [pasted example]

james: I did not collect the hash code of this example. Would need to look to find context.

james: if that is a sub-select, that might be the only one (or two).
… 6 select forms in there. I can look at them and find the actual context.

pfps: also could look for embedded groups.
… where you have a left { and right }.

<AndyS> e.g. {} UNION {}

pfps: that's another place where normal execution of SPARQL introduces a new variable context.

james: I'd expect to see a join in that case.
… it's preliminary, but shows information.

pfps: groups are introduced in some places, operations. MINUS introduces a new group.

AndyS: yes, but your characterization of introducing new variables misses the fact that it's because of bottom-up execution.
… UNION would introduce. BIND has a join (relative to the things on the left).
… VALUES also.

james: other thing not attempted is disconnected variables.
… not good enough at the tooling to do that.

AndyS: most important thing is the observation of how rare EXISTS is at all.
… you don't support remote SHACL on your stores?

james: no support for SHACL.

AndyS: some implementations can run client side, but run large number of calls.
… possible because all of SHACL 1.0 has a definition in SPARQL.
… not true with 1.2.
… somewhere in SHACL there is something which is effectively EXISTS.
… generally very fine-grained.

olaf: in the CSV files, there's a column called complexity?

james: that is looking at how many operators were in the parsed version. a single join is 1. second join 2. other operators similarly.
… in the EXISTS pattern.
… no sense of what's being fed into the EXISTS pattern.
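
To make the description of the complexity column concrete, here is a small Python sketch that counts operator nodes in a parsed pattern, so that one join scores 1 and a second join scores 2. It is only an illustration: the tuple-based tree, the node names, and the `complexity` function are assumptions for this example, and the JavaScript script in the repository may count differently.

```python
def complexity(node):
    """Count operator nodes in a parsed pattern; BGP leaves count as 0."""
    op, *children = node
    if op == "bgp":
        return 0
    # Only tuples are sub-patterns; strings such as filter expressions are skipped.
    return 1 + sum(complexity(c) for c in children if isinstance(c, tuple))

# Hypothetical parsed EXISTS patterns.
one_join = ("join", ("bgp", "?s ex:p ?o"), ("bgp", "?o ex:q ?v"))
two_joins = ("join", one_join, ("bgp", "?v ex:r ?w"))
with_filter = ("filter", "BOUND(?w)", two_joins)

print(complexity(one_join), complexity(two_joins), complexity(with_filter))
```

As noted in the discussion, a count like this only looks inside the EXISTS pattern and says nothing about the solutions being fed into it.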

AndyS: interesting one of them has BOUND in it.

james: I could make a column referring to which query it came from with context.
… with respect to numbers, the txt files show places with 0.5M queries and 100 EXISTS forms among them. prevalence is very low.

AndyS: QLever doesn't support EXISTS. Other engines?

pfps: you can look at the issue (156).
… the last comment has 4 different queries and what happens with them in Jena, Blazegraph, and QLever.
… [summarizing details in issue]

<olaf> w3c/sparql-query#156 (comment)

<gb> Issue 156 Addressing SPARQL EXISTS errata (by afs) [ErratumRaised]

<AndyS> w3c/sparql-query#156 (comment)

AndyS: can't read too much into QLever results if it's really doing a MINUS.

pfps: hannah had said MINUS is the right thing to do. Not sure there's much support for that.

AndyS: that would be more change for users.
… couldn't consider that an erratum at that point.

pfps: hoped to get Virtuoso, too. but the endpoint has been down.
… maybe it has been moved.
… created queries to not depend on the data. should be able to run them on any data and get same results.

TallTed: I think that's true. Send me an email if that continues to be true.

TallTed: there are a bunch of public endpoints.

pfps: can run it on any endpoint.
… will try it on Virtuoso again.

<TallTed> this one is likely to be up -- https://dbpedia.org/sparql

pfps: QLever endpoint has a link to the old Virtuoso Wikidata endpoint.

<pfps> Yes, the QLever endpoint thinks Virtuoso Wikidata is at https://wikidata.demo.openlinksw.com/sparql

AndyS: summarizing: top-level or everywhere for variable injection.

olaf: can you explain using queries pfps has in the issue?

AndyS: the effect of top-level injection is that you only encounter the injected value after you've done some evaluation.
… evaluation may be a filter. so any variables mentioned in the filter and e.g. on a branch of a union could be unbound at that point.
… if you put it at the beginning of ALL BGPs, then it is available to the filter.

olaf: you're talking now about substitution?

AndyS: any mechanism.
… or values insertion.
… mechanism for doing the correlation.

<pfps> https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/Alternative_endpoints also has the old endpoint for Virtuoso

AndyS: if we move away from that, then we're not changing the filter expression, just making sure the value is available during evaluation.
… in the algebra, every group pattern starts with an empty BGP. one way of looking at it is to start with a VALUES instead of an empty BGP.

olaf: distinction between shallow and deep is (looking at last example in issue), before the deepest select, in shallow there would be a VALUES and only there?

<AndyS> EXISTS { SELECT ==> EXISTS { HERE join { SELECT ==> }}

pfps: in shallow one, you get VALUES right at the beginning. that VALUES does not penetrate into sub-select or the embedded group.
… in deep one, you get VALUES at beginning of every group.
… in second and third, you get values injection in two places. at top and inside the braces.

pfps: in shallow one, only after the starting left brace of the EXISTS

AndyS: in algebra terms, into every leaf of the tree.
… techniques hyper uses should be applicable. but they are particularly for decorrelating queries.
… should be possible to do it, but I haven't.
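
The difference between the shallow and deep placements can be sketched as a rewrite of a toy algebra tree. This is an illustration under stated assumptions rather than the specification's definition: the tuple encoding, the `UNIT` marker for the empty BGP, and the functions `inject_shallow` and `inject_deep` are all hypothetical names introduced here.

```python
UNIT = ("unit",)  # stands for the empty BGP that starts a group in the algebra

def inject_shallow(tree, table):
    """Join the outer bindings once, at the top of the EXISTS pattern."""
    return ("join", table, tree)

def inject_deep(tree, table):
    """Make the outer bindings available at every leaf of the tree."""
    if tree == UNIT:
        return table                   # VALUES in place of the empty BGP
    if tree[0] == "bgp":
        return ("join", table, tree)   # VALUES joined in front of a BGP
    # Rebuild operator nodes, recursing only into sub-patterns (tuples).
    return tuple(inject_deep(c, table) if isinstance(c, tuple) else c
                 for c in tree)

# Hypothetical EXISTS pattern: a projection over a union whose second
# branch is a filter on an otherwise empty group.
tree = ("project", ["?k"],
        ("union",
         ("bgp", "?k ex:p ?o"),
         ("filter", "?z = 7", UNIT)))
table = ("values", {"?z": 7})

shallow = inject_shallow(tree, table)
deep = inject_deep(tree, table)
print(shallow)
print(deep)
```

In the shallow result the filter still sits over an empty group, so `?z` is unbound when it is evaluated; in the deep result the table appears directly under the filter.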

james: how does that apply to the fourth query?

<pfps> Ted: I sent you an email asking for the new Virtuoso endpoint.

<james> PREFIX ex: <https://example.com/>

<james> SELECT ?x ?y ?z WHERE {

<james> VALUES (?x ?y ?z) { ( ex:a ex:b 7 ) }

<james> FILTER EXISTS { SELECT ?k WHERE { SELECT ?z WHERE { FILTER ( ?z = 7 ) } } }

<james> }

<AndyS> EXISTS { ^^^ join SELECT ?k WHERE { ^^^ join SELECT ?z WHERE { ^^^ FILTER ( ?z = 7 ) } } }

AndyS: do it at beginning of every BGP

james: even though the SELECT ?k prevents the ?z from being injected?

pfps: inner ?z changes to ?z' (or something). still do injection, but it doesn't do anything since innermost filter has been changed to use ?z'

AndyS: all the ?z would change.
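
The options can be compared on the fourth query with a toy bottom-up evaluator. The sketch below is hypothetical throughout: the list-and-tuple pattern encoding, the mode names `"none"`, `"shallow"` and `"deep"`, and the functions `eval_group` and `exists` are assumptions for illustration, the renamed pattern is written out by hand, and the model covers only subselects and equality filters. It reflects neither the specification text nor the behavior of any particular engine.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def join(left, right):
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def eval_group(group, outer, mode, top=True):
    """Evaluate a group bottom-up; `mode` is "none", "shallow" or "deep"."""
    inject_here = mode == "deep" or (mode == "shallow" and top)
    # A group starts from the outer bindings when injecting, else from the
    # single empty solution of the empty BGP.
    solutions = [dict(outer)] if inject_here else [{}]
    filters = []
    for element in group:
        if element[0] == "select":
            _, projected, inner = element
            rows = eval_group(inner, outer, mode, top=False)
            rows = [{v: r[v] for v in projected if v in r} for r in rows]
            solutions = join(solutions, rows)
        else:
            filters.append(element)  # ("filter_eq", variable, value)
    for _, var, value in filters:
        # An unbound variable makes the comparison an error, dropping the solution.
        solutions = [s for s in solutions if var in s and s[var] == value]
    return solutions

def exists(group, outer, mode):
    return len(eval_group(group, outer, mode)) > 0

outer = {"?x": "ex:a", "?y": "ex:b", "?z": 7}

# EXISTS { SELECT ?k WHERE { SELECT ?z WHERE { FILTER(?z = 7) } } }
original = [("select", ["?k"], [("select", ["?z"], [("filter_eq", "?z", 7)])])]
# The same pattern after renaming ?z, which SELECT ?k does not project.
renamed = [("select", ["?k"], [("select", ["?z'"], [("filter_eq", "?z'", 7)])])]

for label, pattern, mode in [
    ("no injection", original, "none"),
    ("shallow", original, "shallow"),
    ("deep, no renaming", original, "deep"),
    ("deep, with renaming", renamed, "deep"),
]:
    print(f"{label}: {exists(pattern, outer, mode)}")
```

In this toy model only the deep variant without renaming makes EXISTS true. With renaming the injection still happens at every group, but the innermost filter now refers to `?z'`, so the injected `?z` has no effect.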

pfps: I could shorten that example.

pfps: status of getting somebody from QLever to these meetings?

AndyS: have previously sent invites.
… we have their input. detailed comment on the issue.

TBD

AndyS: we have top-level vs. everywhere, and whether to rename or not. any other key areas for discussion?

pfps: haven't seen anything to indicate anything else.

james: wasn't there an issue about process being bottom-up or top-down?

pfps: that's covered by the one AndyS just mentioned.
… it's on the issues list.

AndyS: "any execution is top down" is confusing wrt. rest of the spec.
… top-down comes from the first last call of SPARQL 1.0.

AndyS: will work on agenda and maybe strawpolls for next time.

Minutes manually created (not a transcript), formatted by scribe.perl version 244 (Thu Feb 27 01:23:09 2025 UTC).
