RCH WG – 07 December 2022

Meeting minutes

markus_sabadello: Introductions? Is there anyone here for the first time?

markus_sabadello: I don't think so, I think everyone has been here before.

markus_sabadello: Agenda review... I don't think we have a complicated agenda, just going over issues. Maybe talk about a potential face to face meeting. Before that, I just want to ask if anyone in the group has any news or updates.

markus_sabadello: Any other agenda items?

manu: Thanks Markus. Before Thanksgiving, we had a meeting in the VCWG.

manu: I think we've hit a bit of a snag. I think it will probably resolve itself with time but this group should know about it. We proposed bringing in the Ed25519Signature2020 crypto suite and the proposal failed. It failed for a couple of reasons, I take the blame for some of that, we didn't give a good background on what the cryptosuite was about. There were -1s from Microsoft and +0s from many other participants.

manu: This presents somewhat of a problem because we were going to have multiple cryptosuites pulled into the WG, and for a variety of reasons that's a problem now. One of the reasons for the failure was that some people in this group weren't in that group. The cryptosuite that failed is going into production in thousands of retail locations next year.

manu: It's been incubating over years with at least 5 implementers. The fact that something like that would be rejected as an editor's draft is somewhat shocking.

manu: I've been talking to implementers and getting more data for the VCWG. I presumed the group would have known that but they didn't.

manu: So we're getting some desire written down that this become a standard.

manu: I wanted this group to be aware and things aren't going as smoothly as we had hoped in the VCWG and we might need support from this group.

manu: For those cryptosuites.

markus_sabadello: This cryptosuite is based on the data integrity stuff and therefore relying on the dataset canonicalization and hashing. Right?

manu: Yes, correct.

markus_sabadello: Thanks.

ivan: For my understanding, from this working group's point of view, this doesn't affect the work done here.

manu: Two answers. At a high level, potentially no. But if we see more data integrity cryptosuites rejected from the VCWG, there may be a question on whether the work here has any relevance. If there's only one cryptosuite that's going through VCWG and that gets into trouble we could have issues -- we could end up with none. Things got pretty hostile before the Thanksgiving meeting ... what I thought was very unlikely, which was no data integrity cryptosuites going through the group, is now possible. That affects this group.

manu: I don't think that's where it's going, but hopefully the group will open up to letting more data integrity cryptosuites go through; we only have one today.

<pchampin> +1 to what Ivan says

ivan: Yes and no. I don't think we need to spend a lot of time on this, we also need to be careful to present this WG as not being dependent on VC. What we have here is useful and important outside of the VCWG. We shouldn't give a one-sided image.

<manu> Agree with Ivan

ivan: We can be hit with that later. The problem in the VCWG is real and we have to know about it.

ivan: When you sort of hint that without that there's no value or reason to the work in this WG, I absolutely informally and formally object to that statement. Just to be clear.

ivan: We have to realize we're working for a larger community than VC.

manu: I agree with you, it just opens up the ability for others to say that.

<manu> +1 to Ivan

ivan: I agree with you, just separate the concerns.

markus_sabadello: Manu, this is important information. But you mentioned one cryptosuite is in VCWG, is it the Ed25519Signature2020 one?

manu: No, that one got rejected. The one that's in is JsonWebSignature2020. There was a presumption that those in the charter would make it, but they are not making it.

markus_sabadello: Ok, thanks for the update.

markus_sabadello: I think you transferred all the open issues from the CCG repo for the original input document to this WG's repo.

manu: No, I closed them. We could choose to transfer them.

manu: If we were going to transfer them we should have transferred the whole repo. Some of the issues have no relevance anymore. I didn't carefully go through the issues one by one. I just closed the issues and archived the repo.

<markus_sabadello> Archived CCG repo: https://github.com/w3c-ccg/rdf-dataset-canonicalization/

manu: We can always undo that, we can unarchive it, reopen the issues but it seemed like it was more important to prevent new issues getting opened there.

markus_sabadello: Ok, the CCG repo is archived now, if anyone wants to go over those and check if anything is still relevant or still out there that would be good.

dlehn1: The issues are no longer visible on that repo because the issues were shut off. Is it possible to make them still visible? You can't even see what they were, the old links are broken.

manu: We can undo everything I did if I did something wrong -- I thought you could still see issues.

manu: I will look.

markus_sabadello: It looks a bit strange, it doesn't even have a tab for issues, never seen that before.

<ivan> +1 to redirection

manu: Yeah, that's not there, it's not there anymore. It may be that if you have no issues open when you archive, it removes the tab. The other issue for the group is that I didn't update a redirect. The spec deployed at the location is the final group specification, but I'd imagine we'd want to redirect to the W3C WG spec.

manu: If no one objects, I will unarchive the repo, I will set up the redirect, and try to make it so the issues show up again and then rearchive.

manu: Any objections to that approach?

gkellogg: Maybe an issue template would do a redirect as well.

manu: Well, if it's archived you can't create new issues.

gkellogg: I guess it depends why that's off.

TallTed: There is an issue enable / disable thing and that might have been accidentally hit. Typically that doesn't conceal issues you just can't open them.

ivan: Yeah, that's surprising. Manu will find it.

markus_sabadello: Next topic, Phil and I have been talking about potentially having a F2F meeting, no concrete proposals or dates. Just to remind the group that there's a possibility to do that. The WG can meet in person, it would probably be useful if there's some deep hard topics that really benefit from focus time where the group can work together.

markus_sabadello: If there's some complex open issue about the algorithm or something like that. Like I said, I don't have a concrete proposal or date so far.

<manu> I fixed Issues in CCG RDC -- https://github.com/w3c-ccg/rdf-dataset-canonicalization/issues (and re-archived for now).

<manu> I had clicked off the "issues" feature in General Settings... it's re-enabled, you can see closed issues now.

markus_sabadello: Just want to bring up the possibilities for the group to think about -- maybe start thinking about what time or event the F2F meeting could be aligned or co-located with. Just a reminder that this is a possibility, so if anyone has thoughts or suggestions on how and when that should be done, feel free to share them with the rest of us.

markus_sabadello: Manu has already fixed the issues in the archive repository.

manu: I tend to push back against F2F meeting especially with groups that have been working together for a long time. If we are going to have a F2F meeting it might make sense to put it in the same place as the VCWG F2F, I know Brent is still searching for a date for that in Q1.

manu: To hold a F2F meeting around the same time.

markus_sabadello: If there's really some topic that we feel that an intense / more focused meeting is needed.

markus_sabadello: If all the issues are relatively straightforward then it's probably not necessary for everyone to be physically in the same place.

markus_sabadello: Let's just consider it as an option.

+1 to no F2F unless we feel it's really necessary

<markus_sabadello> https://github.com/w3c/rdf-canon/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc

(because of hard issues we couldn't otherwise solve)

markus_sabadello: Looking at issues now.

markus_sabadello: This is sorted by most recently updated, we'll see if this is the best order to go over these or not.

markus_sabadello: Editors, gkellogg do you feel there are any issues that would benefit most from a quick discussion on the call?

markus_sabadello: Or should we just go down the list?

gkellogg: I don't think there's anything there that really needs controversy or needs being resolved.

<gkellogg> https://github.com/w3c/rdf-canon/issues/48

gkellogg: There's an issue that just came up the other day, #48, on functional implementation.

<markus_sabadello> https://github.com/w3c/rdf-canon/issues/48

gkellogg: It's a recent implementation that was done in Elixir.

gkellogg: Which is purely functional, I think.

gkellogg: One observation he had was that the code involved in the issue wasn't exercised by the test suite. He found that there were potential paths in the N-Degree hash quads algorithm that weren't exercised in the test suite.

gkellogg: That at least points to some missing coverage, I think.

markus_sabadello: And you added a label for that, right?

markus_sabadello: Other than that, it's just in terms of the specification, it's an editorial thing?

gkellogg: Yeah, I don't think that any change ... it's just a question of what's normative or what's not.

gkellogg: It doesn't change the results, but it changes some steps, it might be substantive, but that doesn't really matter at this point. It would only make sense to do this if we felt it was advantageous to describe the algorithm in a functional manner which we are not bound to do at all.

markus_sabadello: So the potential action is to add an optional functional description of the algorithm in addition to what's already here.

gkellogg: I suggested that that might be described in a group note or member submission. I don't know how important that is, it's just an interesting thing.

markus_sabadello: Anything else to highlight?

ivan: I really have respect for functional programming, but we have to realize that most of the world doesn't use it. I don't want to open a religious discussion here, but the fact is that this is not so widely used. From our point of view, I think our work should be clear and understandable in the more traditional languages, including even Rust.

ivan: I think that's really where our leaders are.

ivan: That's what should govern us. I am not sure exactly what he would change, but if the changes he would propose would lead to a more awkward way of doing it in a non-functional way then we shouldn't do it.

gkellogg: I can summarize the method if you're interested.

ivan: Let's take that aside.

gkellogg: I think it's in the issue.

markus_sabadello: I think it could be a potential note.

ivan: That's perfectly fine.

<Zakim> TallTed, you wanted to suggest least-recently-updated sort to maintain churn of old issues, rather than always focusing on issues that are already getting attention https://github.com/w3c/rdf-canon/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-asc

TallTed: I found in a number of other groups that sorting issues by least recently updated is helpful in keeping the oldest stuff churning instead of focusing on the hottest recent thing.

<manu> +1 to TallTed "least recently updated" approach.

TallTed: Obviously focus can be drawn elsewhere but the sorting "least recently" helps.

markus_sabadello: I agree with Ted.

markus_sabadello: We'll improve that going forward. For now, let's stick with this and change it next time.

markus_sabadello: Any other issues we want to highlight or explicitly talk about?

gkellogg: The only other thing I think is the timeline for choosing another algorithm is starting to recede further in the past, in my opinion.

markus_sabadello: There's an issue for that #6.

gkellogg: And #10.

<markus_sabadello> https://github.com/w3c/rdf-canon/issues/6 and https://github.com/w3c/rdf-canon/issues/10

markus_sabadello: This has been a question since the start, since the beginning of the group.

markus_sabadello: You're saying that it will become increasingly more difficult to switch to another algorithm. We make a choice to start working with one, which was the output document from the CCG. We said we'd maintain the option to change the algorithm if the WG feels that's necessary or advantageous. You're saying that will get increasingly difficult.

gkellogg: We've invested so much in documenting and fixing the issues. We could start over but, for my part, that's starting to look like a daunting thing to do without any compelling reasons.

gkellogg: So is there any analysis of shortcomings of this algorithm compared to Aidan's, for example? Is there something it doesn't do properly, or some large order-of-magnitude difference in its computation? I don't know of any analysis. I think Ivan is the only one that has implemented both algorithms.

<Zakim> manu, you wanted to ask about a "missing issue"?

manu: I can go after Ivan, mine is another issue.

ivan: Indeed I did implement Aidan's stuff, but it was several years ago.

ivan: As far as the ease of implementation, I think the two are more or less on the same level. I spent about the same time on Aidan's back then as I spent on this one.

ivan: I didn't make any measurement or so, so I can't really comment on the aspects of efficiencies or these things. In a way I cannot really add anything with this background.

ivan: If it had been 10 times easier this way or with Aidan's, that would be a data point, but that's not the point. I can't really say anything.

ivan: I do not know -- but I may be wrong, I do not know of any implementations, really widely used of Aidan's algorithm. Apart from his thing. I don't think anyone used mine. I haven't seen traces of that.

ivan: Which is obviously not the case for the one we're working on now. That's clearly an important data point.

ivan: Frankly, I'm not saying that tossing a coin is the best approach, but I really don't know. I'm looking at #10 on the various points in the issues.

ivan: You know, it's the same thing.

ivan: Sorry, one thing -- but it's not a big deal -- formally, Aidan's paper was on graphs and not on datasets.

ivan: When I implemented it, I added datasets more or less, it was obvious to do that. But what I did was never reviewed formally by anyone.

ivan: So that may be a small difference between the two.

markus_sabadello: Thanks, that's valuable to hear they are similar.

markus_sabadello: That's a reminder for everyone who is qualified and cares about this topic to maybe review them again.

markus_sabadello: Anyone in the group can always make a proposal on which algorithm should be used.

markus_sabadello: Or how it can be improved.

markus_sabadello: Thanks also again for Gregg for the heads up that if the WG wants to change that, it should be done soon.

markus_sabadello: Otherwise the default mode of operation will be to continue with the draft.

<Zakim> manu, you wanted to note HMAC/BBS requirement on algorithm and if its tracked.

manu: I agree broadly with what Ivan noted. I had a totally different question. We're looking at issues. I don't see one that I feel is pretty important.

manu: With the BBS selective disclosure / unlinkable scheme, there's this new feature that we were hoping to add, but one that's important for unlinkability.

manu: It has to do with HMACing or salting the blank node IDs.

dlongley: I'll drop the link in for that.

dlongley: We do have an issue for it

https://github.com/w3c/rdf-canon/issues/4

dlongley: I believe we might have linked to a second one from there.

https://github.com/w3c/rdf-canon/issues/8

dlongley: We have two issues to track that issue.

ivan: It was a very long time when I looked at this issue and I don't remember.

ivan: Dave, can you give a short summary of how, if at all, this would influence the algorithm we have today?

ivan: My question was about what happens once we have the canonicalization done -- you do something more complex, but the canonicalization itself wasn't affected.

dlongley: That is the path that I would recommend people take. An alternative path would be to switch out the hashing algorithm used in c14n for one that uses an HMAC -- it creates more linkability/correlation than doing it the other way.

dlongley: Something that we want, if we specify one hash function, is to allow future specs to swap out the hash function -- sha256 probably doesn't have to change anytime soon, but we might want to say what the hash function is, parameterize it, and say the version we're releasing is using sha256 -- it might also allow someone to use HMAC in the algorithm, but that seems to be the wrong design. You should get canonical labels and change them instead of putting HMAC into the algorithm; that has better privacy characteristics.
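To illustrate the path dlongley recommends (canonicalize first, then change the resulting labels), here is a minimal Python sketch. It is not taken from any specification: the tuple representation of quads, the function name `relabel_blank_nodes`, the `_:u` prefix, and the 32-character truncation are all assumptions made for illustration, and using HMAC-SHA-256 over the canonical label is just one possible way to "change" the labels.

```python
import hashlib
import hmac

def relabel_blank_nodes(quads, key):
    """Replace canonical blank node labels with keyed (HMAC-derived) labels.

    Assumption: `quads` is the output of a canonicalization step, given as
    tuples of term strings, e.g. ("_:c14n0", "<http://example.org/p>", '"v"').
    The canonicalization algorithm itself is untouched.
    """
    def relabel(term):
        if not term.startswith("_:"):
            return term  # IRIs and literals pass through unchanged
        mac = hmac.new(key, term[2:].encode("utf-8"), hashlib.sha256)
        return "_:u" + mac.hexdigest()[:32]  # prefix and length are arbitrary

    # Re-sort, since the new labels change the ordering of the quads.
    return sorted(tuple(relabel(t) for t in quad) for quad in quads)
```

With a different key per presentation, the same dataset yields different blank node labels each time, while the same key keeps the relabeling repeatable.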

ivan: My implementation allows the user to choose which hash algorithm to use, that should be easy to do and is worthwhile, but if that's the only change, that's simple.

ivan: It so happens that when I did the implementation I made the hash parameterized, but if that's the only change on the algorithm that's really simple.
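As a rough idea of what a parameterized hash could look like in an implementation, here is a small Python sketch. The function name `hash_nquads` and its signature are hypothetical; it is not Ivan's implementation and nothing here is prescribed by the draft. It only shows the hash algorithm being passed in as a parameter with sha256 as the default.

```python
import hashlib

def hash_nquads(nquads, algorithm="sha256"):
    """Hash a sequence of serialized N-Quads lines with a selectable algorithm.

    Assumption: `nquads` is an iterable of already canonicalized N-Quads
    strings; `algorithm` is any name accepted by hashlib.new().
    """
    h = hashlib.new(algorithm)
    for line in nquads:
        h.update(line.encode("utf-8"))
    return h.hexdigest()
```

A caller that never passes `algorithm` gets sha256, so swapping the hash later is a one-argument change rather than a change to the algorithm steps.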

<ivan> +1 to markus_sabadello

markus_sabadello: This highlights how it's really important to design in a clear way what the inputs and outputs of the algorithm are. And what steps of the algorithm are -- we had this discussion about canonizing and hashing whether they were two specs or not.

markus_sabadello: We have one extreme where one input is the dataset and the output is a hash. That's a thing we probably don't want.

+1 to layers

yes, important for layering -- remember, we might want to hash each nquad separately and have a list at the end.

markus_sabadello: For example, what you just discussed can be enabled using HMAC. The other topic that comes to mind when we talk about the outputs is the work that Dan and Kazue have been doing in their paper. It talks about reordering the quads in a different way or inserting random quads and things like that.

markus_sabadello: I think these topics are related to constructing the specification with the right amount of flexibility and extensibility without making it too complicated.

<Zakim> manu, you wanted to speak to "reordering quads, random quads, atomic quads"

manu: +1 to that, Markus. One of the things that has come up in the last couple of weeks in the VCWG, as well as on the CCG call yesterday, is a requirement to do selective disclosure but potentially not unlinkability.

manu: There was a push towards using BBS which would give us both selective disclosure and unlinkability but one downside is that BBS is not a NIST standard. And that often drives what people use.

manu: Even if we define BBS in W3C and IETF, etc. -- it may still not be a NIST-approved algorithm for 5-10 years.

manu: That means that we may be put in a position where we have to define a selective disclosure scheme that is based on NIST crypto. I think Stephen Curran is working on AnonCreds in Data Integrity as one expression of that. I know there seems to be a desire to do selective disclosure in data integrity.

manu: That would mean that we here would need to specify hashing every single quad individually, so salting and hashing each quad separately and then revealing a subset of those.
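A minimal Python sketch of the idea manu describes, salting and hashing each quad separately and then revealing a subset, might look as follows. This is an illustration under assumptions, not a scheme discussed in detail on the call: the function names, the 16-byte random salt, and the use of plain SHA-256 over salt plus quad are all hypothetical choices, and a real scheme would also sign the list of digests.

```python
import hashlib
import hmac
import secrets

def commit_quads(quads):
    """Return (salts, digests): one random salt and one digest per quad.

    Assumption: `quads` is a list of canonical N-Quads strings.
    """
    salts = [secrets.token_bytes(16) for _ in quads]
    digests = [
        hashlib.sha256(salt + quad.encode("utf-8")).digest()
        for salt, quad in zip(salts, quads)
    ]
    return salts, digests

def verify_disclosed(disclosed, digests):
    """Check revealed (index, salt, quad) triples against the digest list."""
    return all(
        hmac.compare_digest(
            hashlib.sha256(salt + quad.encode("utf-8")).digest(), digests[index]
        )
        for index, salt, quad in disclosed
    )
```

The holder reveals only the chosen quads with their salts; the remaining quads stay hidden behind their digests, and the salt prevents guessing low-entropy quads by brute force.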

manu: I think there's a real use case there that seems to be getting more and more support that is based on more and more requirements that are coming up.

manu: Hashing all the quads together, hashing them individually, relabeling them, injecting quads, shuffling them, all use cases.

kazue: I'm not familiar with NIST standardization, what are the procedures for getting BBS standardized?

manu: Unfortunately, it's incredibly tedious and time intensive.

manu: NIST operates somewhat separately from IETF. Just because IETF standardizes an algorithm doesn't mean NIST will approve it. The timeline is... once standardized at IETF, NIST will consider it. This is the US governmental standards body.

manu: We see other countries also using NIST standards, Europe, NZ, Canada, Japan.

manu: Those pick them up every now and then.

manu: And unfortunately they are incredibly short staffed and it can take years. Ed25519 got standardized like 2014-2015 in IETF and only last year or so they (NIST) put out a draft form. Maybe 2024-2025 they'll be done, but that's 8-10 years.

manu: I'd expect that we'd see the same kind of delay with BBS at NIST.

manu: That of course is making many of the European implementers nervous because they need a selective disclosure mechanism that uses NIST crypto.

kazue: I'm with an ISO group and they are concerned about privacy; perhaps standardizing there could make BBS work?

manu: NIST tends to do what they want. I think all of us in this group just want BBS to be accepted if it's stable, but NIST is very conservative. They wait on purpose to see if it breaks in the private market before accepting it as a NIST standard. Maybe accepting it at ISO would make them move more quickly, hard to tell. Hard to even talk with them.

manu: It's something we should be aware of.

manu: If there's something this group could do to enable selective disclosure for NIST curves that would be good, especially for layering things.

markus_sabadello: I said I would write to Aidan about the current state of the algorithms.

markus_sabadello: Please contribute to issues.

markus_sabadello: We will continue to think about structure and layering of the spec, we will reorder the issues differently on the agenda.

ivan: Next call would be on the 21st, is it planned to hold it?

ivan: Or take a winter stop?

Digital Bazaar will be off the week of the 21st

(none of us will be able to make that call)

markus_sabadello: I think VCWG is not holding calls. I will talk to Phil. My sense is that we should follow the VCWG example.

markus_sabadello: Will talk to Phil.
