bi-weekly RCH WG meeting – 01 March 2023

Meeting minutes

phila: No one new today, can skip intros.

phila: Interested to hear about the VCWG F2F in Miami, and any other news that people might have

Outcome of 2023 Miami Verifiable Credentials WG Meeting: https://lists.w3.org/Archives/Public/public-credentials/2023Feb/0125.html

manu: I was there. I wrote up a summary of what happened there and I can put that in the minutes.

manu: The things that are relevant to this group...

manu: Are that the VCWG is going into feature freeze at the end of this month, March.

manu: CR is expected during the summer probably towards the end of it. That means that the DI specs and the cryptosuites will go into CR during the summer and it would be nice to have the URDNA spec close to that as well. I think DI can go into CR without URDNA being in CR as well, but with the understanding that they are close together.

manu: Feature freeze end of March, Candidate Recommendation during the summer. The other thing that happened is that the VCWG abandoned the JWS 2020 cryptosuite.

manu: There are now three cryptosuites that depend on the work this group is doing: the EdDSA, BBS, and ECDSA cryptosuites will use URDNA.

manu: The other decision made is that at least two of those three cryptosuites will allow JCS (JSON canonicalization) to be used as an alternative to RDF canonicalization.

manu: It's a simple swap out in the algorithms, you call JCS instead of URDNA if you want to use a cryptosuite with JSON canonicalization.

phila: Thanks, that's a helpful update.

manu: There will be an alternative to URDNA which is JCS for some cryptosuites.
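
As an illustration of the swap-out manu describes, here is a minimal sketch in which the canonicalization step is a pluggable function. The names `jcs_like` and `hash_document` are hypothetical, SHA-256 is an assumed hash, and `jcs_like` only approximates JCS (RFC 8785 additionally pins number serialization and sorts keys by UTF-16 code units); none of this is taken from a cryptosuite specification.

```python
import hashlib
import json


def jcs_like(document):
    """Approximate JSON canonicalization: sorted keys, no extra whitespace.

    Assumption: this is NOT a full RFC 8785 implementation.
    """
    return json.dumps(document, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)


def hash_document(document, canonicalize):
    """Hash a document after running the supplied canonicalization function.

    A cryptosuite using RDF canonicalization would pass a different
    (hypothetical) function here that returns canonical N-Quads instead.
    """
    canonical = canonicalize(document)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this shape, key order in the input no longer affects the hash, which is the property either canonicalization approach is meant to provide.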

phila: So the feature freeze in 30 days time is interesting, that group would like us to more or less be in the same place by that time?

phila: Gregg, Dan, how close do you think to that we are?

gkellogg: One of the items today is to discuss what the results are -- we have a canonicalization algorithm, we just don't represent it... we have some dependencies that are not as far along, NQuads Canonicalization -- I believe/gather from what Ivan said, we can be one rev behind so as long as RDF* gets to a WD, then we're ok to be in CR. Even though a FPWD could happen immediately, there are some things that have held it up... issues around scope wrt. the RDF Star charter that need to be resolved in order to get some of the changes we need in there.

phila: we were hoping to resolve to go to CR at TPAC, from what we're hearing, that's later than people want. To get there, we need to get some other things sorted out, like the c14n -- we need to discuss if we're doing one spec or two... we think it's one, but we haven't said so.

manu: I want to make it clear that the backup plan is not a replacement for URDNA. That is just an alternative ... the back up plan would be a worst case disaster and we can't do a number of use cases without URDNA, like BBS. We can't meet some selective disclosure requirements, we can't meet some privacy requirements, etc. without the URDNA spec, it's not optional.

phila: We do want to make sure we're done on time. Let's try to list the things we have to get done, sort it out. Some of these things are obvious roadblocks... nquad canonicalization and output.

phila: In terms of process, we need to get other HRs underway... if we are going to try to get to CR at same time as VCWG, then we need to make sure we're in good shape for that.

phila: I know we can't answer the question now, hopefully by end of call we can do it.

seabass: Is there something I can do to help with this? I can increase my participation a little.

phila: Wonderful, thank you very much! Yes.

phila: Gregg, you know what needs to be done more than anyone, can he help in some way?

gkellogg: Yes, there are Editors Notes and issue markers throughout the spec, we can use help there.

gkellogg: The other thing that came to mind is, text direction, which is another issue in front of RDF Star -- probably doesn't affect us, we'd get it along w/ NQuads, we might need some text in there about it -- i18n consideration that we inherited from RDF.

gkellogg: There are some history bits, description of the work that preceded this... both Aiden's work, JSON-LD CG, VCWG, a number of areas in the spec that could use some work. In terms of technical content, there are open issues, things we still have to address, RDF Star is difficult to consider because it's still being hashed out.

gkellogg: We need to discuss reification, they need some consideration.

phila: Sebastian, please take a look at the spec, if you think you can address issues/things, please raise PRs.

seabass: I'll look at the spec and see where I can help.

gkellogg: I'm sure there are areas where we don't have adequate coverage, new ways to test things is a useful area.

seabass: To someone that's used RDF Datasets -- if you have c14n for a triple, then what is unique about a quad that makes it different?

gkellogg: N-Triples canonicalization doesn't cover a quad, it feels like an obvious extension. There are some weaknesses in N-Triples canonicalization, it was not done rigorously, there is more to be done there. There are multiple ways in which a given code point can be represented; there should be only one way.

gkellogg: Do plain literals have a datatype of xsd:string, should that be spelled out or not? If we did that, it would break most of our existing tests.

gkellogg: Are we escaping the code points we need to?

-1 to breaking existing tests and implementations :)

gkellogg: The graph position could be IRI or blanknode, canonicalization consideration follows from c14n of blank nodes and IRIs.

N-quads c14n

phila: Sounds to me that it brings us to topic on the list.

phila: Let's start with why we do need to canonicalize and what the issues are?

<gkellogg> w3c/rdf-n-quads#16

gkellogg: Are we escaping enough? Some code points need to be escaped using backslashes... form feed doesn't need to be escaped, but other things do. You also can't use UCHAR \uXXXX -- so, null character isn't allowed to be escaped. Whole range 0x00-0x7f should be escaped.

<gkellogg> w3c/rdf-n-quads#17

gkellogg: There is a PR that is hitting a roadblock, that PR gets most bits in... that PR basically questions whether the Editors are permitted to work on this, it hasn't been decided by WG... text direction, we can't work on until that group decides that it's in scope and agrees on a recommendation.

gkellogg: The N-Quads canonicalization doesn't affect other things in the group, but what is in scope to be worked on? Sent private email out to chairs, we need communication between this group and RDF Star chairs to indicate our needs for it. Particularly, get issues resolved and into a WD so we can cite something. We don't have something to cite.

phila: So the request is for Chairs of both groups to get together.

gkellogg: Some of the other open issues will go over, RDF quoted triples, Aiden's recommendation was that those should be put into a form that is compatible w/ normal RDF Triples so they can be reified.

gkellogg: RDF subject, predicate, object doesn't require changes to algorithms, that group needs to do that. That group is in big ontological discussions, are quoted triples opaque or transparent... they are rehashing a lot of stuff that CG went through.

phila: What's going through my head is -- there is a danger that each one is expecting other to do this, I'm all for having a call.

gkellogg: Ora and Adrian are the Chairs of the other WG.

phila: Ok, Gregg, can you set that up since you know everyone?

gkellogg: I'm on leave for two weeks starting tomorrow. Good luck!

<Zakim> manu, you wanted to ask about fallback positions.

manu: The only trepidation I have is ... how long do those other discussions go on. I would like this group to have a backup plan so if decisions aren't made fast enough we can move forward. For example, as Dave Longley noted, breaking the existing URDNA2015 implementations, we should not be doing that. These things are in production, we don't have to rubber stamp, of course, but this stuff has shipped. We need to be aware of that and not convey that those production deployments are all broken and wrong; that would be bad messaging.

phila: Yes, of course.

manu: For what's being suggested, let's try and make stuff work with what's already out there, like the quoted stuff, feels like the right direction.

manu: I wonder if this group needs to be prepared to do things like, but saying -- if you don't ... where we say things like "escaping things is undefined", that would be a terrible position to be in, or us saying the 0x00-0x7f range must be back-slash escaped. We say "For this algorithm, this is how it works".

manu: For this algorithm we're working on, when it canonicalizes, here are the dark corners, these are the decisions just for this algorithm, it doesn't have to be the decision that the N-Quads canonicalization spec does.

manu: We all want this clean, but we also have timelines that won't be gentle to us.

manu: I would like us, in each of these conversations, state what the fallback position is, be very uncomfortable with it, but do our best to make it better.

manu: The other approach is waiting for decisions to be made that could drag on for months.

phila: I agree, the best option is the people defining N-Quads define this stuff. Us saying "We're doing this but it may not be what they do" isn't ideal, but we might have to do it for timelines.

ivan: If we rely on an existing library, then I'm not willing to re-implement some stuff only for purposes of this implementation, many people might find themselves in that situation.

ivan: Gregg, are there any of these issues that affect the algorithm itself? If not, we can move onto CR... we just need to make sure our spec is consistent and algorithm is correct, but we rely on outside forces... rest of it is in email I sent.

<Zakim> gkellogg, you wanted to say that the RDF-star WG needs to hear this, but we should consider if what we have now on escaping literals and representing datatypes is adequate.

phila: We haven't decided if there is a specific output.

gkellogg: We're not particularly... whatever happens with N-Quads canonicalization, does not affect our own spec text. It does potentially affect the results. The particular hash that would be created.

gkellogg: I don't think we will need to go to Manu's fallback position, we already sort of have a fallback position.

gkellogg: Falling back wouldn't be a good idea. The thing we could have pointed to, the RDF 1.1 canonicalization has ambiguities.

gkellogg: It's not something we can live with. That there are two legitimate ways to canonicalize the same term, and that needs to be specified. We can deal with that in our spec if we need to.

gkellogg: I don't believe that getting the text we want into an RDF 1.2 spec is an issue, we just need communication and the RDF-star group needs to understand the importance. They just want to know what this group needs.

gkellogg: The people working on this are PA and Andy Seaborne.

gkellogg: Same set of people.

gkellogg: We can have the literal or xsd string and we just need to specify which way. My thought was that if the data type was xsd string, it should NOT be used, because that keeps us as close to the existing behavior as possible.

gkellogg: If you were starting from scratch you might say it should be used, but that's just a decision point and input from this group for preserving compatibility.
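
To make the xsd:string decision point concrete, here is a minimal sketch of a literal serializer that omits the datatype when it is xsd:string, which is the behavior gkellogg suggests for preserving compatibility. The function name `serialize_literal` is hypothetical, and only backslash and double-quote escaping is shown here to keep the example short.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def serialize_literal(lexical, datatype=XSD_STRING, language=None):
    """Serialize an RDF literal in an N-Quads-like form.

    Assumption: a literal typed xsd:string is written as a plain
    literal, with no explicit ^^<...#string> suffix.
    """
    quoted = '"' + lexical.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if language:
        return f"{quoted}@{language}"
    if datatype == XSD_STRING:
        return quoted  # datatype left implicit
    return f"{quoted}^^<{datatype}>"
```

The alternative choice (always spelling out the datatype) would change the serialized bytes, and therefore any hash computed over them, for every plain literal.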

gkellogg: There is a vulnerability if we allow code points that might confuse or obfuscate text ... if they go unescaped and we want to escape those.

gkellogg: So a range for that and then nothing else should be escaped.

gkellogg: I proposed those changes and some others and I found one test that fails because of that. If we were more exhaustive at testing all the various code points -- but if we're not testing it then it's probably the case that there is not code in the wild that is using control characters for example.

gkellogg: I think it's important that this get resolved one way or the other, this group should say precisely what it would like to see and it's what the RDF-star group will end up doing with sufficient prodding.

phila: Is there an open issue?

gkellogg: Yes, I'll put in links.

gkellogg: With respect to RDF-star, I'm afraid that I have to say that RDF-star is something we will explicitly not do or probably, better, would be a note that this is subject to change in future versions, an "at risk" type thing.

phila: I think my mind is on the second one, and any group has to end, we're aware of other groups but the situation hasn't changed and we can't sit around waiting we have to work with what we have.

ivan: All this affects our own process. My belief is that we should separate those tests that are concentrating on these unicode and related issues -- and they are not tests that should be part of the CR exit criteria of this working group. They test features that are not affecting the algorithm itself. It's good to have them for later.

ivan: But we can move to CR without this issue being solved simply because CR's goal is not to test all possible inputs, it's for testing if the specification is correct, consistent, interoperable, etc.

ivan: We can move onto CR without these things and we should separate and make clear that those tests are not required.

+1 to ivan

phila: I was looking at the PR -- there's a lot of discussion there.

ivan: Yes, but it doesn't affect the algorithm we're supposed to standardize.

phila: So we don't have to wait, we can go to CR without them. But forgive me if this is wrong -- one of our open issues is "What is our output?"

ivan: That's true.

phila: I think they are related.

ivan: I don't think so.

phila: If the output is N-Quads, it matters.

ivan: I disagree, they will be serialized however the N-Quads people say.

ivan: It does not affect how our spec talks.

ivan: That's the only thing that we are really responsible for in this group.

ivan: Therefore we can move to CR and even to REC.

phila: I understand that, what's going through my head -- we get what we're given by the other group, so we're not responsible for that, that makes sense. We want to be able to take the output of the algorithm ... one open issue is what the output will be, it's likely to be N-Quads.

phila: The next step is to hash it -- and that will be affected by whether control characters are escaped.

ivan: That's true, but our own algorithm itself won't change.

ivan: The output will change, but the spec won't change.

ivan: The specification we write is consistent and complete.

ivan: We define the output by referring to the REC that the other group must produce -- I was wrong about that, we can go to CR, but we can't go to REC.

phila: Ok.

phila: In my head, the end result of this group, no matter what you put in, this is exactly what you put out ... part of the exactitude of what comes out is defined elsewhere -- and we kind of do our job and hand it over to someone else.

phila: What the actual characters in the byte stream actually are. If I'm worrying about things I don't have to worry about I defer to your advice.

gkellogg: I think the specifics of encoding corner cases could result in a different code point stream.

gkellogg: What that actually is ... is outside of our scope. That comes from RDF.

gkellogg: The actual hash value that might be produced is dependent upon that, but the way in which you process that to get a hash value is not.

gkellogg: I think we can proceed with that with an appropriate caveat. The other things being -- the way we handle quoted triples and the way we handle text direction. Those are also areas that will trail or be "at risk".

gkellogg: I think the text direction might be a problem for the internationalization folks.

gkellogg: There's a call tomorrow between i18n folks and RDF star.

gkellogg: What we need to do is to determine that the normalized data set is serialized via N-Quads canonical form and is in code point order.

gkellogg: And one of the outputs of that would be a hash of that concatenated set of quads and potentially, if there is a selective disclosure requirement, hashes of each quad in some form, that's work we can do.
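
The steps gkellogg lists (serialize to canonical N-Quads, order by code point, hash the concatenation, and optionally hash each quad) can be sketched as follows. This is only an illustration: the function name `hash_canonical_nquads` is hypothetical, SHA-256 is an assumed hash function, and the input is assumed to be already-canonical N-Quads lines, each ending in a newline.

```python
import hashlib


def hash_canonical_nquads(nquads):
    """Return (dataset_hash, per_quad_hashes) for a list of N-Quads lines.

    Python compares strings by code point, so sorted() gives the
    code point order mentioned in the discussion.
    """
    ordered = sorted(nquads)
    joined = "".join(ordered)
    dataset_hash = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    # Per-quad hashes, as might be wanted for selective disclosure.
    per_quad = [hashlib.sha256(q.encode("utf-8")).hexdigest() for q in ordered]
    return dataset_hash, per_quad
```

Because the lines are sorted before hashing, the result does not depend on the order in which the quads were supplied. It does depend on the exact characters of each line, which is why the escaping questions discussed earlier affect the hash value but not the procedure.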

<Zakim> seabass, you wanted to comment on the topic of escaping

gkellogg: I think that's the most important stuff to have to happen within this group right now. I don't think it needs to be in a separate spec, it can be done in separate sections in the same spec, we need a direction to get started.

seabass: The previous call for escaping characters reminded me of the SPDX community. With the canonicalization for SPDX data.

seabass: We decided that we would use any unicode characters unescaped except quotation marks...and 0-1f. I forget the specifics of that discussion, but it sounds like it would be supportive of the discussion here.

phila: Yeah, I think that's what Gregg is advocating here.

gkellogg: Yeah, if you look at N-Triples canonicalization right now, it disallows escaping those characters, and we feel some change is necessary.

gkellogg: It also disallows the use of ECHAR like `\t` except for some specific characters; that's another area we think should be included. Lastly, the explicit use of xsd:string on a plain literal needs to be resolved one way or another. This group can say which way we'd like it to be done and I'm sure the RDF star group will do it that way.

gkellogg: They are dealing with procedural issues and they can get over it with sufficient encouragement with our chairs.

seabass: The existing RDF canonicalization effort suggests that you escape with a backslash -- how do you escape control characters?

gkellogg: Backslash escaping is called an ECHAR and it's defined for specific chars like newline, double quotes, etc., a small set of characters escaped that way. The other escape is via UCHARs: \u followed by 4 hex digits, or \U followed by 8.

gkellogg: Those are excluded currently and they should instead be required for control characters other than those covered by ECHAR.
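
The escaping rule being proposed (ECHAR for the characters that have one, UCHAR for the remaining control characters, everything else unescaped) can be sketched as below. The exact ECHAR set, the control character range (0x00-0x1F plus 0x7F), and the use of uppercase hex digits are assumptions made for illustration; the group had not settled these details in this meeting. The name `escape_literal` is hypothetical.

```python
# Characters given a short backslash form (ECHAR). Assumed set.
ECHAR = {
    "\b": "\\b",
    "\t": "\\t",
    "\n": "\\n",
    "\f": "\\f",
    "\r": "\\r",
    '"': '\\"',
    "\\": "\\\\",
}


def escape_literal(value):
    """Escape the lexical form of a literal, one code point at a time."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch in ECHAR:
            out.append(ECHAR[ch])
        elif cp <= 0x1F or cp == 0x7F:
            # Remaining control characters use the \uXXXX (UCHAR) form.
            out.append(f"\\u{cp:04X}")
        else:
            # All other code points are emitted unescaped.
            out.append(ch)
    return "".join(out)
```

Since each code point maps to exactly one output form, two implementations following the same table produce identical bytes for the same literal.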

seabass: Percentage encoding isn't on the table at all.

gkellogg: No, percent encoding is not part of encoding code points in RDF literals.

seabass: Right, I see, this is something that the SPDX canonicalization is vaguely compatible at certain levels, maybe you can't use the same algorithm altogether but you could swap out a library / function.

gkellogg: This is the representation that has always conformed with the grammar.

gkellogg: Those grammars allow multiple ways to specify these things, the purpose of canonicalization is to define just one way.

phila: Part of me wants to look at issue 4 in the remaining minutes.

phila: Markus and I need to have a chat with the chairs of the RDF star WG, PA will set that up.

Issue 4 - what is the output of the c14n algorithm

phila: It sounds like there will be a lot of agreement, and not a huge problem to overcome. It sounds like there will be a path forward, no one is objecting to what you're saying Gregg, I'm assuming it won't be problematic.

phila: We're still undecided about what the output of this issue will be.

<phila> w3c/rdf-canon#4

phila: The most recent discussion on that thread ...

phila: Is about WebIDL and whether that's the right way to express this stuff. Has this issue come to a conclusion?

phila: Is the output a bunch of quads?

markus_sabadello: I'm not sure if it's come to a conclusion, but just looking at the document right now, the section 4.4.1, the overview of the algorithm, it says "return the normalized dataset". It doesn't say return a single hashed string value, that would be bad. It says "return the normalized dataset".

markus_sabadello: Does the way the algorithm is written right now satisfy all the use cases, such as the BBS stuff where the individual quads will be hashed? Maybe that's perfectly possible with the way the algorithm is written right now.

markus_sabadello: Maybe someone can comment on that who knows the algorithm and use cases.

gkellogg: Returning a normalized dataset, it's a term describing an abstract thing. It's an RDF dataset, which is also abstract, which is modified so that the blank node identifiers are fixed. Whereas in a regular RDF dataset there are no such identifiers.

gkellogg: The next step would be to serialize that to N-Quads and then perform a hashing algorithm on that.

gkellogg: Then there is potentially the selective disclosure requirements.

phila: If I understand correctly, the output of the algorithm is an abstract dataset that we can make manifest by turning it into a bunch of N-Quads.

phila: Is it contentious to turn that into a bunch of N-Quads? You hash all or each individually?

phila: Any implementation, surely must need that ... I think we just have one output.

dlongley: I think that part of the algorithm returns the abstract normalized dataset; that alone is sufficient for the BBS algorithm.
… Some of those algorithms may even want to modify the bnode labels due to herd-privacy issues.
… Ivan and I discussed ... There could be an array of canonical N-Quads.
… Or a list of N-Quads.

dlongley: They can then be joined and hashed, parsed and labels changed, or just use the abstract dataset.

<Zakim> gkellogg, you wanted to clarify that representation likely involves INFRA, not WebIDL

dlongley: We can say how you convert the dataset into an array/list.
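
One way to picture the conversion dlongley mentions is the sketch below, which turns a dataset held as tuples of already-serialized terms into a sorted list of N-Quads lines, with an optional blank node relabeling step. The function name `dataset_to_nquads`, the tuple layout (subject, predicate, object, graph, with `None` for the default graph), and the `_:c14n0` style labels are assumptions for illustration only.

```python
def dataset_to_nquads(quads, label_map=None):
    """Convert quads (tuples of serialized terms) to a sorted list of lines.

    label_map optionally replaces blank node labels, for example when an
    application wants to change the labels before hashing.
    """
    label_map = label_map or {}
    lines = []
    for quad in quads:
        # Drop a missing graph name, then apply any relabeling.
        terms = [label_map.get(term, term) for term in quad if term is not None]
        lines.append(" ".join(terms) + " .\n")
    return sorted(lines)
```

The resulting list can be joined and hashed as a whole, hashed line by line, or parsed again, which matches the three uses described above.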

ivan: The problem that Andy was raising ... at a very theoretical level -- an abstract dataset doesn't have blank node labels. The concrete implementations have blank node labels -- but if you look at the RDF spec that's not correct. However, if we say, we produce the N-Quads -- and the algorithms can optionally produce the abstract dataset with the right blank node labels.

ivan: And that covers in practice all the applications and it's mathematically precise.

+1 to Ivan

gkellogg: I think that's correct and we had been discussing using Infra.

<ivan> +1 to gregg

+1 to gregg

phila: We will get the meetings setup and maybe get something into the issue on what we discussed.

<seabass> Thank you; see you again in a fortnight :)
