RCH WG – 12 October 2022

Meeting minutes

phila: Any new attendees?

Today: comparing the algorithms, deciding editors.

<phila> https://github.com/w3c/rch-rdc/issues/6

phila: How do we compare the two algorithms?
… described as being similar
… as a group we need to decide how to proceed
… various ideas - need to be open and fair.

<Zakim> manu, you wanted to suggest one approach to compare algorithms.

phila: if the graph is simple then it is a simple algorithm, as bnode structures increase it gets harder.

manu: Aiden presented his algorithm. Comparing algorithms A and B could work, as the stages are similar.
… run both in parallel
… guided tour of the process

gkellogg: if we are able to pick some examples to run through, that would be useful
… complex category of graphs - set of problematic use cases - performance testing

<Zakim> dlongley, you wanted to ask about criteria for making choices if we knew what the differences were to understand what differences to look for

dlongley: if we figure out the differences - need to keep in mind criteria.
… how formal do we need to get in understanding the differences
… worried about the amount of work

manu: add an input:
… formal analysis

<manu> Technical Report on the Universal RDF Dataset Normalization Algorithm: https://lists.w3.org/Archives/Public/public-credentials/2021Apr/att-0032/Mirabolic_Graph_Iso_Report_2020_10_19.pdf

manu: we might consider bringing in that group
… introduction to the formal analysis

AndyS: You mentioned - might not be all graphs that were covered. I think we ought to target all graphs as you don't know what you'll encounter in the real world

<Zakim> manu, you wanted to comment on the "all graphs" thing

dlongley: solve for all graphs (within resource limits)
… by default solve all "normal graphs"
… special flag for all graphs

<Zakim> manu, you wanted to comment on the "all graphs" thing -- concerns around "as big as the web"

AndyS: I'd push back a little on that as it means deciding what is and is not normal

<dlongley> "don't (try to) canonicalize the Web"

manu: at a higher level ... potential formal objections on charter ... e.g. very very large graphs
… working on documents that are bounded
… general algorithm ... state caveats e.g. not unbounded graphs
… "poison graphs" as an attack vector.
… we can eat up a lot of time on this.
… scoping of graph needed

phila: we have to limit the scope
… we could create an algorithm for all graphs, but that is not the requirement

Explainer doc

phila: UCR says what we are trying to solve
… (editors needed)

<manu> +1 to explainer document to set the boundary of what we're trying to do.

<gkellogg> SHACL doesn't do datasets, only graphs.

phila: is there a check we can do on the graph as a preprocessing step?

AndyS: If you take a FOAF graph built up from bnodes - can become complex in a small file
… I'd rather an approach that recognizes that sometimes you can't execute, rather than defining upfront what you can't compute

<Zakim> dlongley, you wanted to say i think you'd have to formally prove a preprocessing step would protect you if there will be no false safe constraints in the processing algorithm

dlongley: a preprocessing step would need proving

<Zakim> manu, you wanted to speak about "multiple phased solutions" not THE algorithm.

manu: we are not generating one algorithm. There exists today some impls in the field.
… we might look at whether it is good enough
… then consider next version
… not all or nothing

AndyS: What are the limitations? Assumptions?

<Zakim> dlongley, you wanted to say we also know that RDF-star is coming -- and we'll need another algorithm for that

dlongley: current limitations/assumptions for URDNA2015 - any bounded dataset
… bail out at cost points.

AndyS: I'm happy with bailing out. But you can go further and say it doesn't handle all graphs. I'm happy with all graphs, with a bail out if it takes too much computing

AndyS: Defining a shape beforehand is not something we should do
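
The "bail out if it takes too much computing" idea discussed above can be illustrated with a minimal sketch. This is not URDNA2015 or either algorithm under comparison; the names `WorkBudget`, `WorkBudgetExceeded`, and `count_orderings` are hypothetical, and the permutation loop is only a stand-in for an expensive search over blank node labelings. The point is that any input is accepted, and processing stops only when a configured cost limit is reached.

```python
from itertools import permutations


class WorkBudgetExceeded(Exception):
    """Raised when processing uses more steps than the configured limit."""


class WorkBudget:
    """Hypothetical step counter that aborts work past a fixed limit."""

    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.used = 0

    def spend(self, steps=1):
        # Charge the budget and bail out once the limit is exceeded.
        self.used += steps
        if self.used > self.max_steps:
            raise WorkBudgetExceeded(f"exceeded {self.max_steps} steps")


def count_orderings(items, budget):
    """Stand-in for an expensive search: visit every ordering of items,
    charging one budget step per ordering."""
    count = 0
    for _ in permutations(items):
        budget.spend()
        count += 1
    return count
```

With this shape, a small input such as three items completes within a budget of 10 steps, while five items (120 orderings) raises `WorkBudgetExceeded`. No input shape is excluded up front; the caller only chooses how much work it is willing to pay for.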

<manu> +1 to what AndyS is saying -- sounds like we're agreeing :)

phila: Others?

Kazue: thinking external criteria hard to decide

phila: and it is political

yamdan: also important to be clear about processing.
… A difference of the two algorithms is scope - dataset vs graph.

<manu> +1 to yamadan's points.

dlongley: criteria important. Formally defining the differences is itself difficult.

<Zakim> gkellogg, you wanted to suggest identifying specific categories of graphs in our hypothetical dataset that are known to create computational problems.

phila: please think of two criteria

gkellogg: want a collection of cases beyond test cases e.g. known expensive.

<dlongley> 1. ease of implementation, 2. existing incubation / use in the marketplace, 3. time / resource complexity in solving common datasets, 4. time / resource complexity in solving complex (or poison?) datasets

dlongley: not an ordered list

<manu> 5. Existence of formal proofs for the algorithms

<manu> 6. Demonstration of review of formal proofs for the algorithms

phila: ease of implementation - yes.
… incubation - yes
… resource complexity - yes
… formal proofs - yes

AndyS: Ease of implementation and complexity of algorithm can be in opposition

<dlongley> yes, there is a tension between ease of implementation and time complexity (sometimes)

<manu> +1 to create an issue to track this.

<dlongley> 7. reusing existing primitives that are available on various platforms

<Kazue> coverage of target RDF?

dlongley: reuse primitives e.g. hashing algorithms.
… existing RDF serialization.

Kazue: cover real life examples

phila: need to note that only unusual graphs trigger the failsafes.

<Zakim> manu, you wanted to note "hashed data" as the output... for BBS.

manu: BBS signatures do a statement-by-statement signature

<dlongley> 8. allow signatures on individual statements and components of statements

manu: criteria: has to support selective disclosure. Hashing alternatives.
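
The difference between hashing a whole canonical document and hashing statement by statement can be sketched as follows. This is an illustration only, not BBS and not the group's hashing specification: the function names `hash_document` and `hash_statements` are hypothetical, SHA-256 is an assumed choice of digest, and the input is assumed to be a list of already-canonicalized N-Quads lines.

```python
import hashlib


def hash_document(nquads_lines):
    """One digest over the whole canonical serialization."""
    data = "".join(nquads_lines).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def hash_statements(nquads_lines):
    """One digest per statement, so each statement can be handled
    independently of the others."""
    return [
        hashlib.sha256(line.encode("utf-8")).hexdigest()
        for line in nquads_lines
    ]


# Assumed example input: two canonical N-Quads statements.
lines = [
    '<http://example.org/a> <http://example.org/p> _:c14n0 .\n',
    '_:c14n0 <http://example.org/q> "v" .\n',
]
per_statement = hash_statements(lines)
whole = hash_document(lines)
```

A single document digest changes if any statement is withheld, whereas a per-statement digest depends only on that one statement. That independence is what a statement-by-statement scheme needs in order to disclose a subset of statements.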

<Zakim> AndyS, you wanted to give criteria

<yamdan> +1 to BBS-friendly hash

AndyS: Dataset, not graph, no shape excluded, cover RDF-star

<gkellogg> +1 to AndyS

AndyS: Translates as do stuff with the longest life

<manu> I was with AndyS all the way up to "cover RDF-star" :)

<gkellogg> Also, Generalized RDF (bnode predicates, literal subjects)

+1 to gkellogg.

dlongley: RDF-star. Do existing use cases.

phila: rdf-star is a nice to have but should not fail because of rdf-star

URDNA2015 FPWD

phila: URDNA2015 as FPWD.
… likes explanatory examples.

Editors

phila: need to do a test suite and an explainer.

<Zakim> gkellogg, you wanted to volunteer to edit one or both of the documents and help with the test suites.

gkellogg: have been active in CG
… hat in the ring

For the C14N spec: ...
… For the C14N spec ...

<manu> Thank you, Gregg for Editor-ing the canonicalization spec! :)

dlongley: can contribute as backup editor

phila: would anyone like to be an editor or contribute in some way?
… hash doc
… "RDH"

<Zakim> manu, you wanted to note they might be the same doc?

manu: might be the same doc.
… hashing is C14N input, hash it. -- one page?

<manu> Woo! Thanks Tobias for volunteering to be an Editor!

Tobias_: happy to help edit esp hashing

<dlongley> +1

<manu> (for the second part)

<pchampin> +1

<Zakim> gkellogg, you wanted to discuss testing implications.

gkellogg: testing may be easier as 2 docs

<pchampin> a contrario, the C14N itself may be a complex document. That could justify keeping the hashing part out.

yamdan: interested in hashing part

<dlongley> +1 to Phil

phila: end meeting

<manu> Note: Ahmad Alobaid volunteered to be a first-time Editor in this group.

<pchampin> > rrsagent, draft minutes
