Meeting minutes
Scribe?
Recap discussion on subqueries
AndyS: last week discussed subqueries. how they relate to usage patterns. underlying question: what is the nature of variables inside subqueries that are not projected from the subqueries?
… related to parameterized queries and intuition that all occurrences of variables would change. cf. a more algebraic viewpoint where anything inside a subquery which is not projected can be renamed, and no change to query outcome.
… in terms of values insertion, it's whether or not you do the renaming step.
… if you perform renaming, you get the more algebraic viewpoint. otherwise you get the parameterized query behavior.
pfps: other option is just doing shallow values injection.
… options: no values injection (QLever); values injection everywhere (without renaming); values injection (with renaming); values injection only at the top.
AndyS: You have to decide shallow vs. not anyway.
… QLever doesn't do substitution at all?
pfps: yes. I put comments in the issue.
… afaict, it does MINUS, essentially.
… that's the easy thing to do.
AndyS: I would say it's not "easy" if it doesn't do what user expects.
pfps: QLever implemented EXISTS in response to a complaint from me. Done very quickly.
james: I look at queries and wanted to satisfy curiosity on what people use EXISTS for.
… there's a file in the repo called exist statistics(?) and the numbers are very low.
… you can also look into individual directories and there's a txt file in each which shows the number of unique queries.
… literally every single query that has been run on respective hosts for at least 5 years.
<pfps> where is the file?
james: they are hash coded, so unique. of that, I used javascript rdf tools to parse, hash-code, and reserialize every exists form.
… output isn't perfect. some places are not SPARQL (JSON there means re-serialization wasn't possible)
… surprised that didn't find any attempt to do a sub-select.
… mostly two triples. three triples. occasionally 4-triple BGPs. but very few.
<AndyS> email -- "EXISTS variants" -- 24/06/2025, X:02
james: my concerns about complexity are in some sense not well founded.
<pfps> https://
james: if you look at the csv file in the repo, even the ones that are 4-statement BGPs, if you run into large cases, those won't behave nicely with an iterative process.
… that's my concern.
… this is produced by a javascript script. all it cares about is finding a directory of queries. others can run it.
pfps: the serializer doesn't work all the time?
james: yes, that's a problem with the serializer.
pfps: I found a fair number of "SELECT", but those are all serializer issues.
james: I'd have to look more closely to know why they failed.
<pfps> EXISTS {"queryType":"SELECT","type":"query","variables":[{"termType":"Variable","value":"v0"}],"where":[{"expression":{"termType":"NamedNode","value":"http://
pfps: I found this [pasted example]
james: I did not collect the hash code of this example. Would need to look to find context.
james: if that is a sub-select, that might be the only one (or two).
… 6 select forms in there. I can look at them and find the actual context.
pfps: also could look for embedded groups.
… where you have a left { and right }.
<AndyS> e.g. {} UNION {}
pfps: that's another place where normal execution of SPARQL introduces a new variable context.
james: I'd expect to see a join in that case.
… it's preliminary, but shows information.
pfps: groups are introduced in some places, operations. MINUS introduces a new group.
AndyS: yes, but your characterization of introducing new variables misses the fact that it's because of bottom-up execution.
… UNION would introduce. BIND has a join (relative to the things on the left).
… VALUES also.
james: other thing not attempted is disconnected variables.
… not good enough at the tooling to do that.
AndyS: most important thing is the observation of how rare EXISTS is at all.
… you don't support remote SHACL on your stores?
james: no support for SHACL.
AndyS: some implementations can run client side, but run large number of calls.
… possible because all of SHACL 1.0 has a definition in SPARQL.
… not true with 1.2.
… somewhere in SHACL there is something which is effectively EXISTS.
… generally very fine-grained.
olaf: in the CSV files, there's a column called complexity?
james: that is looking at how many operators were in the parsed version. a single join is 1. second join 2. other operators similarly.
… in the EXISTS pattern.
… no sense of what's being fed into the EXISTS pattern.
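The complexity measure james describes (one count per operator in the parsed EXISTS pattern) can be illustrated with a minimal sketch. The nested-tuple tree and the operator names below are assumptions made for illustration only; the actual script is JavaScript and its internal representation is not shown in these minutes.

```python
# Hypothetical toy algebra tree: (operator, child, child, ...).
# A plain string stands for a leaf BGP and contributes no operator.
def complexity(node):
    """Count operator nodes in the toy tree; leaves count as 0."""
    if isinstance(node, str):
        return 0
    _op, *children = node
    return 1 + sum(complexity(child) for child in children)

one_join = ("join", "bgp", "bgp")                    # a single join
two_joins = ("join", ("join", "bgp", "bgp"), "bgp")  # a second join
filtered = ("filter", ("join", "bgp", "bgp"))        # other operators count the same way

print(complexity(one_join), complexity(two_joins), complexity(filtered))
```

As in the discussion, this only looks inside the EXISTS pattern and says nothing about what is fed into it.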
AndyS: interesting one of them has BOUND in it.
james: I could make a column referring to which query it came from with context.
… with respect to numbers, the txt files show places with 0.5M queries. 100 EXISTS forms among them. prevalence is very low.
AndyS: Qlever doesn't support exists. Other engines?
pfps: you can look at the issue (156).
… the last comment has 4 different queries and what happens with them in Jena, Blazegraph, and QLever.
… [summarizing details in issue]
<olaf> w3c/
<gb> Issue 156 Addressing SPARQL EXISTS errata (by afs) [ErratumRaised]
<AndyS> w3c/
AndyS: can't read too much into QLever results if it's really doing a MINUS.
pfps: hannah had said MINUS is the right thing to do. Not sure there's much support for that.
AndyS: that would be more change for users.
… couldn't consider that an erratum at that point.
pfps: hoped to get Virtuoso, too. but endpoint has been down.
… maybe it has been moved.
… created queries to not depend on the data. should be able to run them on any data and get same results.
TallTed: I think that's true. Send me an email if that continues to be true.
TallTed: there are a bunch of public endpoints.
pfps: can run it on any endpoint.
… will try it on Virtuoso again.
<TallTed> this one is likely to be up -- https://
pfps: QLever endpoint has a link to the old Virtuoso Wikidata endpoint.
<pfps> Yes, the QLever endpoint thinks Virtuoso Wikidata is at https://
AndyS: summarizing: top-level or everywhere for variable injection.
olaf: can you explain using queries pfps has in the issue?
AndyS: effect means that you only encounter injected value after you've done some evaluation.
… evaluation may be a filter. so any variables mentioned in the filter and e.g. on one branch of a union could be unbound at that point.
… if you put it at beginning of ALL BGPs, then it is available to the filter.
olaf: you're talking now about substitution?
AndyS: any mechanism.
… or values insertion.
… mechanism for doing the correlation.
<pfps> https://
AndyS: if we move away from that, then we're not changing the filter expression, just making sure the value is available during evaluation.
… in the algebra, every group pattern starts with an empty BGP. one way of looking at it is to start with a VALUES instead of an empty BGP.
olaf: distinction between shallow and deep is (looking at last example in issue), before the deepest select, in shallow there would be a VALUES and only there?
<AndyS> EXISTS { SELECT ==> EXISTS { HERE join { SELECT ==> }}
pfps: in shallow one, you get VALUES right at the beginning. that VALUES does not penetrate into sub-select or the embedded group.
… in deep one, you get VALUES at beginning of every group.
… in second and third, you get values injection in two places. at top and inside the braces.
pfps: in shallow one, only after the starting left brace of the EXISTS
AndyS: in algebra terms, into every leaf of the tree.
… techniques HyPer uses should be applicable. but they are particularly for decorrelating queries.
… should be possible to do it, but I haven't.
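The shallow vs. deep distinction described above can be sketched on a toy syntax tree. This is a minimal illustration, not the spec's algorithm or any engine's implementation: the list/tuple encoding, the `inject` helper, and the example pattern are all hypothetical.

```python
# Hypothetical toy tree, not a real SPARQL parser:
#   a group is a list; its items are strings (triple patterns, filters),
#   nested groups (lists), or ("select", [projected vars], body) sub-selects.
def inject(group, values, deep):
    """Return a new group with `values` placed at its start.

    deep=False: only at the top of the EXISTS pattern (shallow).
    deep=True:  also at the start of every nested group and sub-select body.
    """
    out = [values]
    for item in group:
        if deep and isinstance(item, list):
            out.append(inject(item, values, deep))
        elif deep and isinstance(item, tuple):
            _, projected, body = item
            out.append(("select", projected, inject(body, values, deep)))
        else:
            out.append(item)
    return out

VALUES = "VALUES ?z { 7 }"  # stands in for the current solution being injected
pattern = ["?s ?p ?o", ["?o ?q ?z"], ("select", ["?z"], ["FILTER(?z = 7)"])]

shallow = inject(pattern, VALUES, deep=False)    # one VALUES, right after the opening brace
everywhere = inject(pattern, VALUES, deep=True)  # VALUES at top, in the group, in the sub-select
```

In the shallow result the VALUES does not penetrate into the embedded group or the sub-select; in the deep result it appears at the beginning of every group.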
james: how does that apply to the fourth query?
<pfps> Ted: I sent you an email asking for the new Virtuoso endpoint.
<james> PREFIX ex: <https://
<james> SELECT ?x ?y ?z WHERE {
<james> VALUES (?x ?y ?z) { ( ex:a ex:b 7 ) }
<james> FILTER EXISTS { SELECT ?k WHERE { SELECT ?z WHERE { FILTER ( ?z = 7 ) } } }
<james> }
<AndyS> EXISTS { ^^^ join SELECT ?k WHERE { ^^^ join SELECT ?z WHERE { ^^^ FILTER ( ?z = 7 ) } } }
AndyS: do it at beginning of every BGP
james: even though the SELECT ?k prevents the ?z from being injected?
pfps: inner ?z changes to ?z' (or something). still do injection, but it doesn't do anything since innermost filter has been changed to use ?z'
AndyS: all the ?z would change.
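The renaming step discussed here can be sketched on a toy encoding of the fourth query's EXISTS pattern. This is only an illustration under assumptions: the tuple encoding, the helper names, and the `_r1` fresh-name suffix are hypothetical, and the regular expression is a stand-in for real parsing.

```python
import re

VAR = re.compile(r"\?\w+")  # toy variable matcher, not a SPARQL tokenizer

def variables(item):
    """Collect variables mentioned anywhere in a toy pattern item."""
    if isinstance(item, str):
        return set(VAR.findall(item))
    if isinstance(item, tuple):  # ("select", projected, body)
        _, projected, body = item
        return set(projected) | variables(body)
    return set().union(*(variables(i) for i in item))

def substitute(item, mapping):
    """Apply a renaming to every occurrence, including inner projections."""
    if isinstance(item, str):
        return VAR.sub(lambda m: mapping.get(m.group(), m.group()), item)
    if isinstance(item, tuple):
        _, projected, body = item
        return ("select", [mapping.get(v, v) for v in projected],
                substitute(body, mapping))
    return [substitute(i, mapping) for i in item]

def rename_hidden(item, counter=None):
    """Rename variables that a sub-select does not project, innermost first."""
    counter = counter if counter is not None else [0]
    if isinstance(item, str):
        return item
    if isinstance(item, list):
        return [rename_hidden(i, counter) for i in item]
    _, projected, body = item
    body = rename_hidden(body, counter)
    mapping = {}
    for v in sorted(variables(body) - set(projected)):
        counter[0] += 1
        mapping[v] = f"{v}_r{counter[0]}"  # hypothetical fresh name
    return ("select", projected, substitute(body, mapping))

# EXISTS { SELECT ?k WHERE { SELECT ?z WHERE { FILTER ( ?z = 7 ) } } }
exists = [("select", ["?k"], [("select", ["?z"], ["FILTER(?z = 7)"])])]
renamed = rename_hidden(exists)
```

Because SELECT ?k does not project ?z, every ?z below it is renamed, so an injected binding for ?z no longer matches anything inside the sub-select.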
pfps: I could shorten that example.
pfps: status of getting somebody from QLever to these meetings?
AndyS: have previously sent invites.
… we have their input. detailed comment on the issue.
TBD
AndyS: we got top- vs. everywhere. and to rename or not. any other key areas for discussion?
pfps: haven't seen anything to indicate anything else.
james: wasn't there an issue about process being bottom-up or top-down?
pfps: that's covered by the one AndyS just mentioned.
… it's on the issues list.
AndyS: "any execution is top down" is confusing wrt. rest of the spec.
… top-down comes from the first last call of SPARQL 1.0.
AndyS: will work on agenda and maybe strawpolls for next time.