RCH WG weekly – 26 April 2023

Meeting minutes

phila: First thing to record: based on comments after the last meeting, we need to increase meeting frequency.

phila: Proposal from the chairs is that we continue to use the slots not used by VCWG. That group alternates between this slot and a slot later in the day.

phila: Proposal is that we do the same. This means we meet again next week.

phila: Then we flip back and forth, like the VCWG does.

phila: This probably makes it difficult for yamdan, but we need to increase progress.

phila: Any objections about this?

<TallTed> +1

yamdan: I understand additional meetings are required, but as you said, it is harder for me and Kazue to join in that additional time slot. I will try to join as often as possible, but our attendance at the additional meetings will probably be low. I hope you understand.

phila: We understand and are grateful that you join meetings as often as you do

ivan: Meetings for me are not "inhuman", but I may also have difficulty joining, just as I do with VCWG.

ivan: Not objecting, I understand the reasons.

seabass: It seems like a good idea to not have resolutions in the later slot.

<Zakim> seabass, you wanted to suggest that we don't have resolutions on the later slot for yamdan

seabass: This is from experience in other WGs, where important members are not present in certain slots. We should defer resolutions.

manu: +1 for current times as proposed, but I'm also concerned that if yamdan and ivan can't make it, those are important voices in the WG.

manu: My suggestion is we start at the proposed times and see how it goes, but we should also start a Doodle to maybe find times that are better for Europe and Japan.

ivan: Any resolution which is taken at any meeting or F2F is considered a draft resolution for one week, and only after a week it becomes official, unless someone from the WG objects.

phila: Makes sense, and it would fit what we are discussing here.

phila: Does anyone have anything to say about IIW that is relevant to RCH?

<Zakim> manu, you wanted to comment on updates

markus_sabadello: At IIW there were sessions about credential formats, DID methods, OIDC protocols, but not directly related to canonicalization.

manu: In the VCWG Special Topic call, yesterday, there was a bit of discussion about removing Data Integrity from the spec. People were making arguments to push Data Integrity to the background.

manu: There is a concrete PR to remove DI from the base VC JSON-LD context.

manu: We were able to say that's a bad idea; the reason why it's in the context is to make developers' lives easier. People should weigh in. It would be good to review and comment on these issues.

manu: Mike Prorock has removed himself from the Data Integrity spec as an editor. Chairs are trying to get the editors to start committing things.

manu: We need another editor for the Data Integrity specification. It would be great if there was a volunteer from this group.

seabass: I'd be happy to help out there.

manu: Thank you Sebastian, that's wonderful. Do you have an associated company?

manu: We'd need to talk to chairs and maybe add you as invited expert.

<Zakim> seabass, you wanted to volunteer myself

ivan: The editor must be a member of the WG.

ivan: Maybe your org can join W3C, otherwise maybe you can become invited expert.

ivan: You have to contact the two chairs of the VCWG and tell them that this is what you would like to do.

ivan: You're already an IE in this group.

ivan: At the end of the day, the chairs decide.

phila: It's in the interest of some people here that you do that.

manu: We are not limited to 2 editors, others in the group may also help.

Issues

Issue 4

<phila> w3c/rdf-canon#4

<pchampin> manu, seabass, I might know some people interested in editing data integrity... will contact them and make introductions if needed

phila: This was raised at the TPAC kick-off meeting.

phila: We have copied content regarding N-Quads serialization to our spec.

<phila> markus_sabadello: It has been partially addressed by dealing with the n-quads issue

markus_sabadello: We also added a section around serialization. Still some issues around other outputs. At the moment, the algo says it returns the normalized dataset
… maybe something else as the output?

<Zakim> dlongley, you wanted to say yesish, BUT

<dlongley> index map output issue: w3c/rdf-canon#89

dlongley: I think we could close this issue, since we have other issues for other outputs (e.g. issue 89)

<pchampin> generally: +1 to close, the rest has been said :)

dlongley: If you pass in a map, you could get an optional output with new positions

yamdan: I agree with dlongley and markus_sabadello. I also think we can close issue 4.

yamdan: I haven't checked issue 89 yet, will check it this or next week.

yamdan: I think issue 4 can be closed, since we already have Serialization section added.

phila: It feels, given what people have said, and since we have other issues, we can close this one.

Proposal: close issue 4, noting that some elements are in other issues

<manu> +1

<pchampin> +1

<dlongley> +1

<yamdan> +1

ivan: I'm looking at the spec, and it does not answer exactly the issue question. I think we should have a formal resolution what the answer is.

ivan: That answer should appear as part of the specification.

The spec says "The canonicalization algorithm converts an input dataset into a normalized dataset."

seabass: It seems to me it's not a good idea to have hashes as part of the output of the algorithm.

seabass: One of the issues with this is that hash algorithms age, they become less usable over time.

seabass: I would agree that the output should be strictly serialized N-Quads, something that is unambiguous.

ivan: I agree with what you say.

<dlongley> this is the present output: https://www.w3.org/TR/rdf-canon/#dfn-normalized-dataset

ivan: Does it mean that the output is serialized canonical N-Quads? Or is the output a normalized dataset?

seabass: I suggest it should be serialized N-Quads. So you could hash with SHA256, but that algorithm is up to the user.
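
A minimal sketch of the separation seabass describes, assuming the canonicalization step has already produced a canonical N-Quads document. The function name `hash_canonical_nquads` and the example quad are hypothetical and do not come from the spec; the point is only that the digest algorithm is a parameter chosen by the caller rather than something fixed in the canonicalization output.

```python
import hashlib


def hash_canonical_nquads(nquads_document: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of an already-canonical N-Quads document.

    The algorithm name is supplied by the caller (hypothetical helper,
    not part of any specification).
    """
    digest = hashlib.new(algorithm)
    digest.update(nquads_document.encode("utf-8"))
    return digest.hexdigest()


# Hypothetical canonical document containing a single quad.
canonical = '_:c14n0 <http://example.org/p> "v" .\n'

sha256_hex = hash_canonical_nquads(canonical)            # default choice
sha384_hex = hash_canonical_nquads(canonical, "sha384")  # caller picks another
```

Because the hash is applied after serialization, a hash algorithm that ages can be swapped by the caller without touching the canonical N-Quads themselves.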

dlongley: The spec today says that the output is a normalized dataset, for which we have a (partial) definition

dlongley: When you produce a concrete serialization of the normalized dataset, the N-Quads are in sorted order.

dlongley: The reason why this is the output is that it enables two use cases: The individual N-Quads, and one giant block of N-Quads.
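
The two use cases dlongley mentions can be sketched roughly as follows. This is an illustration only: it assumes each quad is already an N-Quads line with stable blank node labels, and the helper name `serialize_normalized` and the sample quads are hypothetical. Python compares strings by Unicode code point, which is the ordering assumed here.

```python
def serialize_normalized(quads):
    """Return (individual sorted N-Quads, one concatenated block).

    `quads` is an iterable of N-Quad lines, each ending in " .\n", whose
    blank node labels are assumed to be already canonical.
    """
    # A dataset is a set of quads, so duplicates are dropped before sorting.
    lines = sorted(set(quads))  # str comparison is by Unicode code point
    return lines, "".join(lines)


# Hypothetical input, deliberately out of order.
quads = [
    '_:c14n1 <http://example.org/p> _:c14n0 .\n',
    '_:c14n0 <http://example.org/p> "b" .\n',
    '_:c14n0 <http://example.org/p> "a" .\n',
]

individual, block = serialize_normalized(quads)
```

`individual` serves callers that want to work with each N-Quad separately, while `block` is the single document that would be hashed or signed as a whole.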

dlongley: That's what the spec says today. If we want to change it, we can continue discussing it in issue 4.

phila: I understand the output is the normalized dataset. That is the output of the algorithm. Do we disagree that this is the output of canonicalization?

<Zakim> manu, you wanted to agree with phila

manu: I agree with phila. Why can't we just say that? Take an abstract dataset and serialize it to N-Quads.

<dlongley> notably, it's a "normalized dataset" (the blank node labels are stable)

<manu> yes ^

<ivan> https://www.w3.org/TR/rdf11-concepts/#dfn-blank-node

ivan: The reason why we have a problem: if I look at this link from rdf11-concepts, it doesn't say that a blank node has a visible name/identifier. That's a problem with the current document.

ivan: If you follow all the links from the normalized dataset, you don't have a reference to the name. That is why I believe we should say the output is the canonical N-Quads. What is standardized is returning the canonical N-Quads.

dlongley: We might say the output of the algorithm is the normalized dataset, and then we add a piece that says how you transform it to canonicalized N-Quads.

<ivan> what I said: the rdf spec does not explicitly return the "name" of a bnode

dlongley: The output of canonicalization could be canonical N-Quads. We may also want to return the array.

<manu> yes, +1 to what dlongley just said. A less well said version of that was what I was going to say. :)

pchampin: I agree with what Ivan said. We have an abstract dataset, and a set of blank node identifiers. Maybe we need to formally define the mappings.

pchampin: +1 to what dlongley just said

<Zakim> seabass, you wanted to advocate for canonical n-quads as a normative requirement

seabass: I wanted to advocate that from a practical perspective canonical N-Quads really make it possible to use this for integrity. You want to know if a dataset is the same as another one, or verify it against a digital signature.

seabass: If you only have an abstract dataset, you still need to do the sorting.

seabass: For practical use cases, it makes sense to have this as part of the specification.

<ivan> sorting is not necessarily part of a normalized nquad representation, is it?

<pchampin> I find the current definition of normalized dataset satisfactory: it is an abstract RDF dataset *and* a set of blank node identifiers. The relation between the two might be described in more details, but generally I find it clear enough.

seabass: I would like to ask other members, is there a strong reason for preserving the identifiers of a blank node? I feel replacing them with numbers would be appropriate, considering integrity use cases.

dlongley: I think we partially agreed to that.

<pchampin> the set of blank nodes identifiers is *part* of the normalized dataset

dlongley: I don't know where we need to say that canonical N-Quads need to be sorted

dlongley: My comment to seabass is that some algorithms may replace identifiers, prior to sorting.

dlongley: I think we must say somewhere, that N-Quads must be sorted if we treat them as a single block

seabass: dlongley can you put in a link for that in the chat?

dlongley: I don't have a link for you, but the reason for replacing identifiers is for privacy reasons. The identifiers may leak information.

dlongley: You need them to be stabilized first.
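
As a rough illustration of the ordering dlongley describes (stabilize the identifiers first, replace them, then sort), the sketch below swaps stable labels for opaque ones. Everything specific here is an assumption made for the example and does not come from the minutes or the spec: the `_:c14n` label pattern, the use of a keyed HMAC to derive replacement labels, the `_:u` prefix, and the helper name `relabel`.

```python
import hashlib
import hmac
import re

# Assumed shape of the stable labels. This naive pattern would also match
# the same text inside a string literal, which a real implementation
# would have to avoid.
BNODE = re.compile(r"_:c14n\d+")


def relabel(stable_nquads, key: bytes):
    """Replace stable blank node labels with keyed opaque ones, then re-sort."""

    def opaque(match):
        mac = hmac.new(key, match.group(0).encode("utf-8"), hashlib.sha256)
        return "_:u" + mac.hexdigest()[:16]

    # The replacement changes the sort order, so sorting happens afterwards.
    return sorted(BNODE.sub(opaque, line) for line in stable_nquads)


stable = [
    '_:c14n0 <http://example.org/p> _:c14n1 .\n',
    '_:c14n1 <http://example.org/p> "x" .\n',
]
relabeled = relabel(stable, b"demo-key")
```

The replacement is only deterministic because the input labels were stable to begin with; applied to arbitrary input labels, two equivalent datasets would generally end up with different output.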

manu: I have created a new label "ready for PR". We should write specific changes we want to see in the specification.

manu: An editor needs to translate what we discussed today into concrete text.

<ivan> +1 to Manu, this is the right way to do this

<Zakim> seabass, you wanted to myself to draft such a PR this week

manu: I think it might be helpful to list all things the spec needs to say in issue 4, on this call, then we mark issue 4 as "ready for PR".

yamdan: +1 to this proposal

<dlongley> 1. canonical N-Quads must be (unicode-code-point by each N-Quad) sorted

w3c/rdf-canon#90

markus_sabadello: I hear people talking about sorting of N-Quads, but wasn't that in PR 90 that did that?

markus_sabadello: That mentions sorting and concatenating...

https://w3c.github.io/rdf-canon/spec/#serialization

<Zakim> manu, you wanted to ask the group what else to write.

pchampin: I think we should be able to close issue 4. There is a definition of "normalized dataset", which is the output. That definition may have to be improved, but that could be done in more focused issues.

manu: Making an attempt to write up items that need to be addressed in a PR.

manu: Normalized dataset is represented as an array of arrays.

ivan: No, that is implementation dependent.

ivan: The serialization can be sorted, not the normalized dataset.

seabass: The RDF Canonicalization spec has the normative requirement that you have to use N-Quads serialization.

dlongley: Does anyone see problems with implementations that need to change the normalized dataset before it is serialized?

dlongley: When you get the normalized dataset, another spec may want to change the blank node identifiers, before serialization.

ivan: Implementations should give access to the normalized dataset before serialization.

<pchampin> note that the current definition of normalized dataset mandates the blank node identifiers to be the ones generated by the algorithm; but I don't think it's essential to the definition... could be fixed later

<Zakim> seabass, you wanted to propose a solution to dlongley's point

seabass: To reference dlongley's point: if you number blank node identifiers, you get a consistent representation, even if you use different data store software.

<pchampin> +1

yamdan: The canonical N-Quads should be used for serialization, that's important to mention.

phila: Next time we may be able to close issue 4. This issue is fundamental to what we are doing. This will be real progress.

phila: We have many issues marked "proposed close"

phila: We have decided that we want to meet at TPAC, for 1 day, is that true?

pchampin: I don't remember, the thing is we must inform TPAC organizers by 8th May. There is a web form open for that.

pchampin: Roughly how many people will be attending, and what time we want to meet.

phila: We should have a meeting that doesn't clash with VCWG.

phila: Thanks everyone for the discussion, we made progress, will meet again next Wednesday.

phila: Any last comments?

phila: Thanks to the scribe, thanks all for being here.

RRSAgent: draft minutes
