14:02:18 RRSAgent has joined #rch
14:02:22 logging to https://www.w3.org/2023/05/12-rch-irc
14:02:24 Zakim has joined #rch
14:02:27 https://github.com/w3c/rdf-canon/issues/89
14:02:31 present+
14:02:31 scribe+
14:02:32 present+
14:02:34 present+
14:02:37 present+
14:02:38 ivan has joined #rch
14:02:40 yamdan has joined #rch
14:02:54 meeting: RCH Special Topic call on Issue 89
14:03:12 present+
14:03:25 gkellogg: I see Issue 89 as addressing the need to support selective disclosure
14:04:19 dlongley: In a selective disclosure scenario, the two agents won't have any info other than the selected quads and the mapping.
14:04:32 gkellogg: the verifier does not have the original dataset
14:05:13 dlongley: Yes. Imagine each quad is signed individually
14:05:42 gkellogg: I take a subset that might be required with those original canonical labels and I can calculate the hash for that subset
14:06:25 dlongley: All you need to send is the subset that you're sending and the mapping for the subset
14:06:39 gkellogg: So the subset I'm sending them has the original labels
14:06:54 dlongley: Imagine that the thing you send them has unstable blank node labels
14:07:11 ... but you send the mapping so they can generate the relevant subset labels
14:07:34 gkellogg: Blank nodes have no IDs except in the context of a concrete serialization
14:08:07 ... Any process that creates a subset doesn't allow you to correlate the quads back to the original, because of the labels ... there may be implementations that do, but formally there's no way to do that
14:08:13 present+
14:08:39 ... we're trying to create a way, after doing an operation such as a JSON-LD Frame or a SPARQL query that gives you a subset back, to associate each quad with one in the original
14:09:07 dlongley: Internally within the c14n algorithm, we have some steps where we talk about going through the quads in the input dataset and turning blank nodes into something
14:09:18 ...
so if those BN IDs don't actually exist...
14:09:29 gkellogg: IMO that's the wrong model. There are no IDs in the input
14:09:42 ... only if we serialize those n-quads
14:09:47 ... and that results in an ID
14:09:58 dlongley: But we're talking about the input before we get to n-quads
14:10:37 ... something I put in Issue 89... one of the other options we have is to make the input ordered quads and say that the algorithm doesn't modify the order
14:11:00 ... it requires a bit more work on the outside. But if the order is not modified in the output they can do any external modifications they need to
14:11:33 ... but I worry that when we describe this in another spec, or at review, I don't want someone to say it doesn't work. It does work, but I worry that the formal description isn't quite right
14:12:16 dlongley: regarding making sure BN labels are stable... whatever mechanism is used should be open to innovation. e.g. BNs are skolemized, then a framing operation is used
14:12:51 ... then the blank nodes are stable, then we go back to RDF and run the c14n algorithm. We know the de-skolemized mapping. So we know where they map to
14:13:37 gkellogg: I'm still hung up on this. I think skolemization is the key. Then you frame, there are no BNs. Then you deskolemize and expect that you can pass that deskolemized set and hope you can work with those labels
14:13:58 ... but that means parsing the deskolemized, but there's no way to know that the ... was available
14:14:39 dlongley: For implementations that don't have stable IDs, the deskolemizing produces blank nodes and the abstract dataset and the mapping of what the original labels were.
14:14:58 ... [Couldn't keep up with all the words, sorry]
14:15:08 https://github.com/w3c/rdf-canon/issues/89#issuecomment-1542918364
14:15:10 gkellogg: That is more or less what I have in my comment
14:15:32 q+
14:15:34 gkellogg: Once we're in the algo, we have a dataset; we can't talk about specific blank nodes
14:16:18 ...
it goes over each quad and maps blank nodes in their quad... we can talk about the label... if we include a deskolemization... each blank node can be correlated
14:16:21 ack ivan
14:16:36 q+ to say can we accomplish the deskolemization somehow by doing it outside of the canonicalization algorithm though?
14:17:06 ivan: I am trying to consider... one step back. The abstract RDF model talks about BNodes but doesn't talk about BN IDs
14:17:16 ... every implementation I have seen has internal BN IDs
14:17:51 ... would it be simpler for the algo if we define a minor extension of the RDF data model saying that each BN has a BN ID and we go from there, no need to skolemize back and forth
14:18:15 ... the extension can be used to map back. Any practical implementation will work because the extension makes no change
14:18:22 ... but our description is much simpler
14:18:41 gkellogg: The defn of a normalized dataset is one that has stable IDs for its BNs
14:18:53 ivan: Which extends the defn a little
14:19:51 gkellogg: Intellectually, to me, using skolem IDs in there works. Any parser will now be able to take in an RDF dataset described with skolem IDs and create IRI nodes that are those IDs
14:20:14 ... Then if skolem IDs are turned back into blank nodes they can be turned into IDs
14:20:35 gkellogg: As long as we maintain the mapping we can operate on that
14:20:38 ack dlongley
14:20:38 dlongley, you wanted to say can we accomplish the deskolemization somehow by doing it outside of the canonicalization algorithm though?
14:21:12 dlongley: I like what Ivan is saying and I wonder if there's a way to put these two things together and make it simpler to implement without skolemization at the c14n layer
14:21:48 ... I wonder if we can do what Ivan is saying and say that one way you can do this is skolemization/deskolemization, if you need it
14:22:01 dlongley: This would make very little change to what we have today
14:22:22 ...
if your implementation doesn't support this already, you can use skolemization
14:22:44 gkellogg: The algo is described for an RDF dataset, not for some document that serializes a dataset
14:23:20 ... even if it were... let's say the input is some serialization of an RDF dataset, we'd then have to describe how you parse that serialization to construct the dataset
14:23:26 q+
14:23:27 ... and retain the labels of the blank nodes
14:23:44 ... nodes in lists still would be blank
14:24:09 q+
14:24:17 ack dlongley
14:24:27 dlongley: Is there a way at the beginning of the algorithm to say you could take as input a serialized dataset, but you should have the abstract dataset
14:24:32 ack dlongley
14:24:52 dlongley: I worry that we're putting in a lot of spec text that some may think doesn't achieve a lot
14:25:02 ack iv
14:25:52 ivan: When I did my implementation, the parsing and underlying environment I was using was creating a different BN label. But then I realised that there's an option to re-use whatever is in the serialized input
14:25:59 q+
14:26:03 +1 to ivan
14:26:13 ... and I think that will be the general case. It's simpler, don't throw it out (paraphrase)
14:26:18 ack g
14:26:57 q+
14:27:06 gkellogg: Let's say you have a subject with two values in a list. There are BNs with each of those. Formally, every time I parse I get different blank nodes. I have no labels and no order
14:27:18 ... the order can be arbitrary and not repeatable
14:27:22 ack dlongley
14:28:04 dlongley: That might be true for some syntaxes. May be some wiggle room... I'm looking for shortcuts to make things easier, but text that allows people who don't have that to still accomplish the same thing
14:28:14 ...
externally, you can build your own skolemization process
14:28:17 q+
14:28:30 ack gkellogg
14:29:08 +1 to gkellogg
14:29:09 gkellogg: If we said that a serialized input is an n-quads doc, not an arbitrary serialization, we can describe a class of n-quads parser that records the IDs for all the blank nodes
14:29:33 gkellogg: That doesn't seem like a stretch. Doing it for an arbitrary format seems heavyweight
14:30:17 dlongley: If we can say that there's an additional thing that you can pass in here... What's the minimum amount of language we have to put in here to keep it simple but support complexity if people want it
14:30:35 gkellogg: As long as you're maintaining the fidelity of those labels
14:30:38 q+
14:31:19 +1 to gregg, but that's solvable there with skolemization (externally)
14:31:22 gkellogg: I take a canonical doc... I run it through a SPARQL CONSTRUCT to create my subset, which is a graph that can be serialized arbitrarily. I don't think we can say that the process must maintain the labels in the input
14:31:43 ... Another way you might accomplish that is to go through the skolemization
14:31:44 +1 to gregg
14:31:57 ... That's external, but you can do it in a way that preserves the labels.
14:32:03 ack ivan
14:32:31 q+
14:32:39 ivan: We modify the algo description a little to say we start with n-quads and we end with n-quads. That means you also need to provide the exact mapping as an optional mapping?
14:33:11 q+
14:33:14 dlongley: Yes. We still want to output what we're outputting today (the abstract mapping) and the canonical mapping. If we do that, then people can put the two things together to achieve their goals
14:33:39 q=
14:33:42 q-
14:34:23 gkellogg: Take the normalized dataset and the map for the blank nodes. I thought the issuer problem might be important but it sounds as if it might not be
14:34:29 q- gkellogg
14:34:53 q+
14:35:34 dlongley: That will solve the use case for me in the environment I'm in, so I'm all for that.
... I'm worried that if we don't have that additional abstract mapping for blank node IDs it might create problems in other environments
14:35:39 ack dlongley
14:35:47 q+
14:36:10 dlongley: I don't want to deny people the ability to work in the abstract [paraphrase]
14:36:13 ack ivan
14:36:34 ivan: I think the answer to your remark - that's why we have a wide review when we're ready. If no one comes up, well...
14:37:12 gkellogg: One use case for c14n is an alternative to an isomorphism check. Rather than taking two graphs, you might c14n each and then compare them and see if they're the same. Might get a diff
14:37:26 ... c14n doesn't quite get there but it's better than anything else we have.
14:37:42 q+
14:38:03 q+ to clarify that i don't think we should require the input to be n-quads, it's just an optional input that would guarantee stable identifiers for certain use cases
14:38:22 +1 to what gregg just said ... we should accept either input
14:38:30 gkellogg: Imposing n-quads as input... it sounds like we might want to consider that the input can be either n-quads or a dataset, but if it's a dataset then all BNs are given arbitrary IDs
14:38:39 ack ivan
14:39:18 ivan: I have sympathy for allowing both formats. I have the impression that for the isomorphism use case, in practice there are no issues, because it has internal BN IDs
14:39:26 ... What the BN IDs are is uninteresting
14:39:36 ivan: We're not creating a difficulty
14:40:04 ack dlongley
14:40:04 dlongley, you wanted to clarify that i don't think we should require the input to be n-quads, it's just an optional input that would guarantee stable identifiers for certain use
14:40:07 ... cases
14:40:31 +1
14:40:34 dlongley: I want to agree with allowing either possible input. I don't want to say that blank nodes will be forcibly changed if you input an abstract dataset
14:41:10 gkellogg: We can say that in practice many implementations do maintain the IDs used in the input.
14:41:21 ... Only create BN IDs for those that don't have them.
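[Editorial illustration, not WG output: a minimal sketch of the external skolemize/de-skolemize round trip dlongley describes, over a toy quad representation. `Quad` tuples, `skolemize`, `deskolemize`, and the Skolem IRI prefix are all illustrative names, not part of rdf-canon.]

```python
# A minimal sketch, assuming quads are tuples of term strings and blank
# nodes are written "_:label". Not the spec's data model.
SKOLEM_PREFIX = "urn:example:skolem:"  # hypothetical prefix

def skolemize(quads):
    """Replace blank node labels (e.g. '_:b0') with stable Skolem IRIs,
    returning the rewritten quads plus the IRI-to-label mapping."""
    mapping = {}
    out = []
    for quad in quads:
        new_terms = []
        for term in quad:
            if term.startswith("_:"):
                iri = SKOLEM_PREFIX + term[2:]
                mapping[iri] = term
                new_terms.append(iri)
            else:
                new_terms.append(term)
        out.append(tuple(new_terms))
    return out, mapping

def deskolemize(quads, mapping):
    """Turn Skolem IRIs back into blank nodes with the original labels."""
    return [tuple(mapping.get(t, t) for t in quad) for quad in quads]

dataset = [("_:b0", "ex:knows", "_:b1", "ex:g")]
skolemized, mapping = skolemize(dataset)
# A framing or subsetting operation can run over `skolemized` without
# losing label stability, because Skolem IRIs survive re-serialization;
# de-skolemizing then restores the original blank node labels.
restored = deskolemize(skolemized, mapping)
assert restored == dataset
```

The point of the round trip is exactly what was said above: the intermediate form has no blank nodes at all, so any external operation preserves the labels, and the retained mapping lets you correlate the result back to the original dataset.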
14:41:35 dlongley: That allows some implementations to cut some corners
14:42:03 gkellogg: Can I ask you, yamdan - you tried to come up with a comment that explains your understanding. Are we missing anything?
14:42:12 yamdan: I'm struggling to follow the conversation
14:42:33 yamdan: I was wondering whether ... we already assume some stability about BN IDs in the current c14n algo
14:42:55 ... originally I didn't think we had to define an additional input stability
14:43:38 gkellogg: Looking at step 2 of the algo, I think it needs some work...
14:43:47 gkellogg: There is a BN to quads map...
14:43:58 ... that is initialized from the input dataset
14:44:28 gkellogg: There's a point at which we go through the input dataset and we initialize it with BN IDs that abstractly don't exist
14:44:29 q+
14:44:55 ... so we're talking about ensuring that there is a pre-step that has some IDs
14:44:59 ack dlongley
14:45:17 dlongley: That sounds more useful - we're assuming there are already some BNs. If not - go make them
14:45:36 dlongley: I think this fills things in at the backend
14:45:51 ... It means that people can continue either way
14:46:23 q+
14:46:28 gkellogg: attempts to summarise...
14:46:34 q+
14:46:39 ack dlongley
14:47:09 Yes, I think we're saying that we're assuming that your external env will give you some IDs. But the algorithm says we'll keep IDs stable
14:47:25 ... and we'll produce a mapping for you based on what you gave us.
14:47:46 ... dlongley: Framing does this
14:48:17 gkellogg: Is there some place where we want to describe how you might do this?
... Reqs for selective disclosure etc
14:48:26 dlongley: We will describe this in another doc in the VCWG
14:49:49 phila: Asks that the consensus be articulated in https://github.com/w3c/rdf-canon/issues/89
14:49:54 gkellogg: Agrees to do it
14:50:19 dlongley: We can maybe make simple mention of selective disclosure without going into details
14:50:26 ack me
14:51:47 rrsagent, draft minutes
14:51:48 I have made the request to generate https://www.w3.org/2023/05/12-rch-minutes.html ivan
14:52:15 RRSAgent, make logs public
14:52:25 RRSAgent, draft minutes
14:52:26 I have made the request to generate https://www.w3.org/2023/05/12-rch-minutes.html phila
14:53:23 zakim, end meeting
14:53:24 As of this point the attendees have been gkellogg, dlongley, phila, dlehn, ivan, yamdan
14:53:24 RRSAgent, please draft minutes
14:53:25 I have made the request to generate https://www.w3.org/2023/05/12-rch-minutes.html Zakim
14:53:30 I am happy to have been of service, phila; please remember to excuse RRSAgent. Goodbye
14:53:33 Zakim has left #rch
14:53:40 RRSAgent, please excuse us
14:53:40 I see no action items
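[Editorial illustration, not WG output: a toy sketch of the selective-disclosure flow discussed at the top of the call - each canonically-labelled quad is hashed individually, and a verifier checks a selected subset using only those quads and the label mapping. `canonical_form`, `quad_hash`, and the example mapping are all illustrative assumptions, not rdf-canon spec text.]

```python
# A minimal sketch, assuming quads are tuples of N-Quads-style term
# strings and the canonical label mapping (as output by c14n) is given.
import hashlib

def canonical_form(quad, label_map):
    """Serialize a quad with canonical blank node labels applied."""
    terms = [label_map.get(t, t) for t in quad]
    return " ".join(terms) + " ."

def quad_hash(quad, label_map):
    """Hash one quad; stands in for signing each quad individually."""
    return hashlib.sha256(canonical_form(quad, label_map).encode()).hexdigest()

# Issuer side: the canonical issued-identifier map (shown as given
# here) is applied to every quad, and each quad's hash is signed.
issued = {"_:b0": "_:c14n0", "_:b1": "_:c14n1"}
dataset = [
    ("_:b0", "<ex:name>", '"Alice"', "<ex:g>"),
    ("_:b0", "<ex:knows>", "_:b1", "<ex:g>"),
]
signed_hashes = {quad_hash(q, issued) for q in dataset}

# Verifier side: receives only the selected quads plus the mapping for
# the blank nodes in that subset - never the original dataset - and
# recomputes each hash against the signed set.
selected = dataset[:1]
partial_map = {"_:b0": "_:c14n0"}
assert all(quad_hash(q, partial_map) in signed_hashes for q in selected)
```

This is why the call kept returning to stable labels: the verifier's recomputed hashes only match if the subset's blank nodes can be relabelled exactly as the issuer's canonicalization labelled them.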