14:02:18 RRSAgent has joined #rch
14:02:22 logging to https://www.w3.org/2023/05/12-rch-irc
14:02:24 Zakim has joined #rch
14:02:27 https://github.com/w3c/rdf-canon/issues/89
14:02:31 present+
14:02:31 scribe+
14:02:32 present+
14:02:34 present+
14:02:37 present+
14:02:38 ivan has joined #rch
14:02:40 yamdan has joined #rch
14:02:54 meeting: RCH Special Topic call on Issue 89
14:03:12 present+
14:03:25 gkellogg: I see Issue 89 as addressing the need to support selective disclosure
14:04:19 dlongley: In a selective disclosure scenario, the two agents won't have any info other than the selected quads and the mapping.
14:04:32 gkellogg: the verifier does not have the original dataset
14:05:13 dlongley: Yes. Imagine each quad is signed individually
14:05:42 gkellogg: I take a subset that might be required with those original canonical labels and I can calculate the hash for that subset
14:06:25 dlongley: All you need to send is the subset that you're sending and the mapping for the subset
14:06:39 gkellogg: So the subset I'm sending them has the original labels
14:06:54 dlongley: Imagine that the thing you send them has unstable blank node labels
14:07:11 ... but you send the mapping so they can generate the relevant subset labels
14:07:34 gkellogg: Blank nodes have no IDs except in the context of a concrete serialization
14:08:07 ... Any process that creates a subset doesn't allow you to correlate the quads back to the original, because of the labels ... there may be implementations that do, but formally there's no way to do that
14:08:13 present+
14:08:39 ... we're trying to create a way, after doing an operation such as a JSON-LD Frame or a SPARQL query that gives you a subset back, to associate each quad with one in the original
14:09:07 dlongley: Internally within the c14n algorithm, we have some steps where we talk about going through the quads in the input dataset and turning blank nodes into something
14:09:18 ...
so if those BN IDs don't actually exist...
14:09:29 gkellogg: IMO that's the wrong model. There are no IDs in the input
14:09:42 ... only if we serialize those n-quads
14:09:47 ... and that results in an ID
14:09:58 dlongley: But we're talking about the input before we get to n-quads
14:10:37 ... something I put in Issue 89... one of the other options we have is to make the input ordered quads and say that the algorithm doesn't modify the order
14:11:00 ... it requires a bit more work on the outside. But if the order is not modified in the output they can do any external modifications they need to
14:11:33 ... but I worry that when we describe this in another spec, or at review, I don't want someone to say it doesn't work. It does work, but I worry that the formal description isn't quite right
14:12:16 dlongley: regarding making sure BN labels are stable... whatever mechanism is used should be open to innovation. e.g. BNs are skolemized, then a framing operation is used
14:12:51 ... then the blank nodes are stable, then we go back to RDF and run the c14n algorithm. We know the de-skolemized mapping. So we know where they map to
14:13:37 gkellogg: I'm still hung up on this. I think skolemization is the key. Then you frame, there are no BNs. Then you deskolemize and expect that you can pass that deskolemized set and hope you can work with those labels
14:13:58 ... but that means parsing the deskolemized, but there's no way to know that the ... was available
14:14:39 dlongley: For implementations that don't have stable IDs, the deskolemizing produces blank nodes and the abstract dataset and the mapping of what the original labels were.
14:14:58 ... [Couldn't keep up with all the words, sorry]
14:15:08 https://github.com/w3c/rdf-canon/issues/89#issuecomment-1542918364
14:15:10 gkellogg: That is more or less what I have in my comment
14:15:32 q+
14:15:34 gkellogg: Once we're in the algo, we have a dataset; we can't talk about specific blank nodes
14:16:18 ...
it goes over each quad and maps blank nodes in their quad... we can talk about the label... if we include a deskolemization... each blank node can be correlated
14:16:21 ack ivan
14:16:36 q+ to say can we accomplish the deskolemization somehow by doing it outside of the canonicalization algorithm though?
14:17:06 ivan: I am trying to consider... one step back. The abstract RDF model talks about BNodes but doesn't talk about BN IDs
14:17:16 ... every implementation I have seen has internal BN IDs
14:17:51 ... would it be simpler for the algo if we define a minor extension of the RDF data model saying that each BN has a BN ID and we go from there, no need to skolemize back and forth
14:18:15 ... the extension can be used to map back. Any practical implementation will work because the extension makes no change
14:18:22 ... but our description is much simpler
14:18:41 gkellogg: The defn of a normalized dataset is one that has stable IDs for its BNs
14:18:53 ivan: Which extends the defn a little
14:19:51 gkellogg: Intellectually, to me, using skolem IDs in there works. Any parser will now be able to take in an RDF dataset described with skolem IDs and create IRI nodes that are those IDs
14:20:14 ... Then if skolem IDs are turned back into blank nodes they can be turned into IDs
14:20:35 gkellogg: As long as we maintain the mapping we can operate on that
14:20:38 ack dlongley
14:20:38 dlongley, you wanted to say can we accomplish the deskolemization somehow by doing it outside of the canonicalization algorithm though?
14:21:12 dlongley: I like what Ivan is saying and I wonder if there's a way to put these two things together and make it simpler to implement without skolemization at the c14n layer
14:21:48 ... I wonder if we can do what Ivan is saying and say that one way you can do this is skolemization/deskolemization, if you need it
14:22:01 dlongley: This would make very little change to what we have today
14:22:22 ...
if your implementation doesn't support this already, you can use skolemization
14:22:44 gkellogg: The algo is described for an RDF dataset, not for some document that serializes a dataset
14:23:20 ... even if it were... let's say the input is some serialization of an RDF dataset, we'd then have to describe how you parse that serialization to construct the dataset
14:23:26 q+
14:23:27 ... and retain the labels of the blank nodes
14:23:44 ... nodes in lists still would be blank
14:24:09 q+
14:24:17 ack dlongley
14:24:27 dlongley: Is there a way at the beginning of the algorithm to say you could take as input a serialized dataset, but you should have the abstract dataset
14:24:32 ack dlongley
14:24:52 dlongley: I worry that we're putting in a lot of spec text that some may think doesn't achieve a lot
14:25:02 ack iv
14:25:52 ivan: When I did my implementation, the parsing and underlying environment I was using was creating a different BN label. But then I realised that there's an option to re-use whatever is in the serialized input
14:25:59 q+
14:26:03 +1 to ivan
14:26:13 ... and I think that will be the general case. It's simpler, don't throw it out (paraphrase)
14:26:18 ack g
14:26:57 q+
14:27:06 gkellogg: Let's say you have a subject with two values in a list. There are BNs with each of those. Formally, every time I parse I get different blank nodes. I have no labels and no order
14:27:18 ... the order can be arbitrary and not repeatable
14:27:22 ack dlongley
14:28:04 dlongley: That might be true for some syntaxes. May be some wiggle room... I'm looking for shortcuts to make things easier, but text that allows people who don't have that to still accomplish the same thing
14:28:14 ...
externally, you can build your own skolemization process
14:28:17 q+
14:28:30 ack gkellogg
14:29:08 +1 to gkellogg
14:29:09 gkellogg: If we said that a serialized input is an n-quads doc, not an arbitrary serialization, we can describe a class of n-quads parser that records the IDs for all the blank nodes
14:29:33 gkellogg: That doesn't seem like a stretch. Doing it for an arbitrary format seems heavyweight
14:30:17 dlongley: If we can say that there's an additional thing that you can pass in here... What's the minimum amount of language we have to put in here to keep it simple but support complexity if people want it
14:30:35 gkellogg: As long as you're maintaining the fidelity of those labels
14:30:38 q+
14:31:19 +1 to gregg, but that's solvable there with skolemization (externally)
14:31:22 gkellogg: I take a canonical doc... I run it through a SPARQL CONSTRUCT to create my subset, which is a graph that can be serialized arbitrarily. I don't think we can say that the process must maintain the labels in the input
14:31:43 ... Another way you might accomplish that is to go through the skolemization
14:31:44 +1 to gregg
14:31:57 ... That's external, but you can do it in a way that preserves the labels.
14:32:03 ack ivan
14:32:31 q+
14:32:39 ivan: We modify the algo description a little to say we start with n-quads and we end with n-quads. That means you also need to provide the exact mapping as an optional mapping?
14:33:11 q+
14:33:14 dlongley: Yes. We still want to output what we're outputting today (the abstract mapping) and the canonical mapping. If we do that, then people can put the two things together to achieve their goals
14:33:39 q=
14:33:42 q-
14:34:23 gkellogg: Take the normalized dataset and the map for the blank nodes. I thought the issuer problem might be important but it sounds as if it might not be
14:34:29 q- gkellogg
14:34:53 q+
14:35:34 dlongley: That will solve the use case for me in the environment I'm in, so I'm all for that.
... I'm worried that if we don't have that additional abstract mapping for blank node IDs it might create problems in other environments
14:35:39 ack dlongley
14:35:47 q+
14:36:10 dlongley: I don't want to deny people the ability to work in the abstract [paraphrase]
14:36:13 ack ivan
14:36:34 ivan: I think the answer to your remark - that's why we have a wide review when we're ready. If no one comes up, well...
14:37:12 gkellogg: One use case for c14n is an alternative to an isomorphism check. Rather than taking two graphs, you might c14n each and then compare them and see if they're the same. Might get a diff
14:37:26 ... c14n doesn't quite get there but it's better than anything else we have.
14:37:42 q+
14:38:03 q+ to clarify that i don't think we should require the input to be n-quads, it's just an optional input that would guarantee stable identifiers for certain use cases
14:38:22 +1 to what gregg just said ... we should accept either input
14:38:30 gkellogg: Imposing n-quads as input... it sounds like we might want to consider that the input can be either n-quads or a dataset, but if it's a dataset then all BNs are given arbitrary IDs
14:38:39 ack ivan
14:39:18 ivan: I have sympathy for allowing both formats. I have the impression that for the isomorphism use case, in practice there are no issues, because it has internal BN IDs
14:39:26 ... What the BN IDs are is uninteresting
14:39:36 ivan: We're not creating a difficulty
14:40:04 ack dlongley
14:40:04 dlongley, you wanted to clarify that i don't think we should require the input to be n-quads, it's just an optional input that would guarantee stable identifiers for certain use
14:40:07 ... cases
14:40:31 +1
14:40:34 dlongley: I want to agree with allowing either possible input. I don't want to say that blank nodes will be forcibly changed if you input an abstract dataset
14:41:10 gkellogg: We can say that in practice many implementations do maintain the IDs used in the input.
14:41:21 ... Only create BN IDs for those that don't have them.
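[Editorial illustration, not WG output: a minimal sketch of the external skolemize/de-skolemize round trip dlongley describes, over a toy quad representation. `Quad` tuples, `skolemize`, `deskolemize`, and the Skolem IRI prefix are all illustrative names, not part of rdf-canon.]

```python
# A minimal sketch, assuming quads are tuples of term strings and blank
# nodes are written "_:label". Not the spec's data model.
SKOLEM_PREFIX = "urn:example:skolem:"  # hypothetical prefix

def skolemize(quads):
    """Replace blank node labels (e.g. '_:b0') with stable Skolem IRIs,
    returning the rewritten quads plus the IRI-to-label mapping."""
    mapping = {}
    out = []
    for quad in quads:
        new_terms = []
        for term in quad:
            if term.startswith("_:"):
                iri = SKOLEM_PREFIX + term[2:]
                mapping[iri] = term
                new_terms.append(iri)
            else:
                new_terms.append(term)
        out.append(tuple(new_terms))
    return out, mapping

def deskolemize(quads, mapping):
    """Turn Skolem IRIs back into blank nodes with the original labels."""
    return [tuple(mapping.get(t, t) for t in quad) for quad in quads]

dataset = [("_:b0", "ex:knows", "_:b1", "ex:g")]
skolemized, mapping = skolemize(dataset)
# A framing or subsetting operation can run over `skolemized` without
# losing label stability, because Skolem IRIs survive re-serialization;
# de-skolemizing then restores the original blank node labels.
restored = deskolemize(skolemized, mapping)
assert restored == dataset
```

The point of the round trip is exactly what was said above: the intermediate form has no blank nodes at all, so any external operation preserves the labels, and the retained mapping lets you correlate the result back to the original dataset.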
14:41:35 dlongley: That allows some implementations to cut some corners
14:42:03 gkellogg: Can I ask you, yamdan - you tried to come up with a comment that explains your understanding. Are we missing anything?
14:42:12 yamdan: I'm struggling to follow the conversation
14:42:33 yamdan: I was wondering whether ... we already assume some stability about BN IDs in the current c14n algo
14:42:55 ... originally I didn't think we had to define an additional input stability
14:43:38 gkellogg: Looking at step 2 of the algo, I think it needs some work...
14:43:47 gkellogg: There is a BN to quads map...
14:43:58 ... that is initialized from the input dataset
14:44:28 gkellogg: There's a point at which we go through the input dataset and we initialize it with BN IDs that abstractly don't exist
14:44:29 q+
14:44:55 ... so we're talking about ensuring that there is a pre-step that has some IDs
14:44:59 ack dlongley
14:45:17 dlongley: That sounds more useful - we're assuming there are already some BNs. If not - go make them
14:45:36 dlongley: I think this fills things in at the backend
14:45:51 ... It means that people can continue either way
14:46:23 q+
14:46:28 gkellogg: attempts to summarise...
14:46:34 q+
14:46:39 ack dlongley
14:47:09 Yes, I think we're saying that we're assuming that your external env will give you some IDs. But the algorithm says we'll keep IDs stable
14:47:25 ... and we'll produce a mapping for you based on what you gave us.
14:47:46 ... dlongley: Framing does this
14:48:17 gkellogg: Is there some place where we want to describe how you might do this?
... Reqs for selective disclosure etc
14:48:26 dlongley: We will describe this in another doc in the VCWG
14:49:49 phila: Asks that the consensus be articulated in https://github.com/w3c/rdf-canon/issues/89
14:49:54 gkellogg: Agrees to do it
14:50:19 dlongley: We can maybe make simple mention of selective disclosure without going into details
14:50:26 ack me
14:51:47 rrsagent, draft minutes
14:51:48 I have made the request to generate https://www.w3.org/2023/05/12-rch-minutes.html ivan
14:52:15 RRSAgent, make logs public
14:52:25 RRSAgent, draft minutes
14:52:26 I have made the request to generate https://www.w3.org/2023/05/12-rch-minutes.html phila
14:53:23 zakim, end meeting
14:53:24 As of this point the attendees have been gkellogg, dlongley, phila, dlehn, ivan, yamdan
14:53:24 RRSAgent, please draft minutes
14:53:25 I have made the request to generate https://www.w3.org/2023/05/12-rch-minutes.html Zakim
14:53:30 I am happy to have been of service, phila; please remember to excuse RRSAgent. Goodbye
14:53:33 Zakim has left #rch
14:53:40 RRSAgent, please excuse us
14:53:40 I see no action items
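[Editorial illustration, not WG output: a toy sketch of the selective-disclosure flow discussed at the top of the call - each canonically-labelled quad is hashed individually, and a verifier checks a selected subset using only those quads and the label mapping. `canonical_form`, `quad_hash`, and the example mapping are all illustrative assumptions, not rdf-canon spec text.]

```python
# A minimal sketch, assuming quads are tuples of N-Quads-style term
# strings and the canonical label mapping (as output by c14n) is given.
import hashlib

def canonical_form(quad, label_map):
    """Serialize a quad with canonical blank node labels applied."""
    terms = [label_map.get(t, t) for t in quad]
    return " ".join(terms) + " ."

def quad_hash(quad, label_map):
    """Hash one quad; stands in for signing each quad individually."""
    return hashlib.sha256(canonical_form(quad, label_map).encode()).hexdigest()

# Issuer side: the canonical issued-identifier map (shown as given
# here) is applied to every quad, and each quad's hash is signed.
issued = {"_:b0": "_:c14n0", "_:b1": "_:c14n1"}
dataset = [
    ("_:b0", "<ex:name>", '"Alice"', "<ex:g>"),
    ("_:b0", "<ex:knows>", "_:b1", "<ex:g>"),
]
signed_hashes = {quad_hash(q, issued) for q in dataset}

# Verifier side: receives only the selected quads plus the mapping for
# the blank nodes in that subset - never the original dataset - and
# recomputes each hash against the signed set.
selected = dataset[:1]
partial_map = {"_:b0": "_:c14n0"}
assert all(quad_hash(q, partial_map) in signed_hashes for q in selected)
```

This is why the call kept returning to stable labels: the verifier's recomputed hashes only match if the subset's blank nodes can be relabelled exactly as the issuer's canonicalization labelled them.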