DID WG Topic Call on Abstract Data Model and Unregistered Properties — Minutes

Date: 2020-09-29

See also the Agenda and the IRC Log

Attendees

Present: Brent Zundel, Justin Richer, Jonathan Holt, Orie Steele, Markus Sabadello, Adrian Gropper, Drummond Reed, Daniel Buchner, Dave Longley, Tobias Looker, Dmitri Zagidulin, Shigeya Suzuki, Daniel Burnett

Regrets:

Guests:

Chair: brent zundel, Brent Zundel

Scribe(s): Drummond Reed

Content:


Brent Zundel: This is a very important topic. It could hold up CR.
… looking for a willingness to “hold your nose” and see if there is something you can live with
… let’s get started. There was an email sent after the invite for this meeting with a joint proposal, plus there are several PRs on the topic of the abstract data model (ADM) and representations

Manu Sporny: suggests that Markus or Justin start us off
… given that not necessarily everyone has read the ADM proposal from Justin, Markus, and Drummond

-> See the Proposal of Justin, Markus, and Drummond.

Markus Sabadello: will give a quick introduction to the ADM proposal. It goes back to the Amsterdam F2F…
… there we decided we would use an ADM as a way to support multiple representations
… recently we have had a number of issues and PRs around the handling of different properties
… for example the handling of @context in JSON-only documents
… the main purpose of the ADM proposal was to propose a specific process for producing and consuming based the ADM
… because it’s easier to think about and specify how to produce and consume based on the ADM
… we believe it is applicable to current and future representations

Justin Richer: Our goal with this proposal was to have exact clarity about what it means to go into and out of a specification representation
… there was confusion about this
… so the purpose of this document was to clarify what we discussed (and he thought we agreed on) in Amsterdam
… and to put some “real teeth” on the process definition
… the process starts with an ADM and converts it into a concrete representation.
… consumption is the inverse of that
… so the document was intended to define that production and consumption process very clearly
… before we dive into specifics, one key point: if you are not going from a representation into an ADM, you are not doing “consumption”

Justin Richer: this document tries to define the boundary between a representation and the ADM, not other things a client might want to do with an representation

Orie Steele: Thanks to Justin. The last part of that was very clear. It’s helpful to focus on the movement between an ADM and a representation.
… my first question for the group is: “When does production happen and what software does it?”
… I’ve seen several understandings of this
… I am wondering whether DID method code do “production”?
… specifically I am talking about “resolve” and “resolveStream”
… or does other software code, such as resolvers, do “production”?

Dmitri Zagidulin: Eager to hear the response of Justin

Justin Richer: The answer to the question about resolvers is that it depends on which part of the process you are talking about
… first part is resolving to the ADM, second part is resolving to the concrete representation (called “bytestream” in the ADM proposal)
… the resolver can be involved in either process
… so if you are calling a resolver to get the ADM version, you would be getting the consumption phase
… if you are calling for a representation, then someone else is doing production to get you that representation
… if you are going from one to another process in memory, you can just use the ADM

Dmitri Zagidulin: wonders if it would make sense to have these definitions of production and consumption
… because literally “zero implementations” are actually doing this today
… but, aside from the issues he has with the ADM proposal, he’s worried that its getting away from helping developers actually write libraries
… that ties into what Justin said about what folks are actually doing—that they may not be what developers are actually doing

Manu Sporny: has several proposals that could clarify where we have agreement already
… the second point was that Manu wanted to let others on the queue to add something before the proposals

Markus Sabadello: To his mind, production or consumption happens every time a representation is produced or consumed
… among other things, production is called when you call the resolveStream function
… consumption is called whenever you call to obtain a property defined in the ADM

Dave Longley: This discussion is helpful. I have been thinking in terms of applications consuming representations.
… so we need a “Transformation” section of the spec.
… he was concerned that this process needs to be followed for all DID documents
… thinks is it good to provide clarity about what the final result must be, but does not prescribe the only way to get there

Justin Richer: One of the things that worried him about “transformations” is the sense that representations might be transformed from one into the other directly
… but all the spec really needs to say is how any one representation type needs to go into and out of the ADM as a well-defined model
… if we are not defining such an abstract data model, then there is no “DID document model”
… this is the ultimate acid test: that I can interoperate at this layer, and not at the layer of what got sent over the wire
… otherwise we are just creating a bunch of similar-looking silos that do not actually interoperate

Jonathan Holt: This is reminiscent of the time he spent on HL7 genomics. They could not agree on zero-based encoding of a genome.
… so we really need to get a “beneficial occupancy” of a data model that is “implementable”

Manu Sporny: +1 to agree with jonathan_holt – this getting very close to an academic discussion that doesn’t affect implementations.

Jonathan Holt: even the FHIR model is the 80/20 rule. For example, even the Boolean model is extensible. You can add a qualifier to it.
… this spec is a lot simpler. And I understand the need for the formal logic.

Daniel Burnett: +1 that elegance is nice but not necessary, only interop of real implementations

Jonathan Holt: I can see the need for numbers, but not sure what to do about that.
… there has to be a simpler approach

Justin Richer: +1 to not overdoing the data model, but remember that’s what we said we’re making, right?

Dave Longley: I agree with what Jonathan just said. I’ve worked on ADMs for a number of specs. They seek clarity and precision.
… The point of that clarity is to be clear about the semantics of the data in a representation. But if we become prescriptive…
… people will avoid the spec and doing their own thing
… if the spec exposes developers to too many layers or abstractions that they don’t want to get involved with

Manu Sporny: this remind him of HL7 and also of the early RDF and JSON-LD specs
… it was argued that they needed to do much more, and JSON-LD chose to do much less to make it implementable
… so Manu wants to do several proposals

Orie Steele: unclear

Jonathan Holt: Does that mean that performing a transformation before or after “production” or “consumption” is okay?

Orie Steele: +1 to doing key transformations in the ADM

Justin Richer: If you have to move a property into the ADM or out of the ADM, it’s in the production/consumption process. Otherwise it’s outside of it.

Dave Longley: An example is creating a public CWK property that might have a different representation in CBOR vs. JSON-LD vs. JSON. How would this proposal deal with that?

Manu Sporny: this proposal doesn’t deal with that
… the next one deals with that

Justin Richer: Wants to clarify that the ADM proposal that Justin and Markus and Drummond wrote up, it should not affect what Dave asked about
… it would just call for each representation’s production and consumption rules to deal with that data type

Manu Sporny: is anyone else confused about what “production” and “consumption” rules are?

Dmitri Zagidulin: +1 to ask

Manu Sporny: is there confusion?

Manu Sporny: I’m clear on what production/consumption is, I think.

Jonathan Holt: +1

Dave Longley: +1 i don’t know if i said what i thought it was if Justin would sign off on it

Brent Zundel: is the question whether our definitions are aligned?

Jonathan Holt: The question is: “What is the ADM and how do we represent that?” The clearer that it, the easier it that conversion would be.
… so we should look at the CDDL model because it’s more concrete than a full-blown ADM

Justin Richer: fwiw I’m fine if we base on CDDL instead of Infra, but we just need a clear and consistent base that we can also tie our specific types to

Jonathan Holt: what he learned from talking to CBOR experts is that this has been a challenge in other specs as well
… such as where folks want to do JSON but add CWKs

Orie Steele: I would also prefer CDDL at this point

Orie Steele: agree we need to just do 1 though

Manu Sporny: CDDL is syntax only… it’s not abstract

Jonathan Holt: and the advice was to stick with the native format as much as possible

Manu Sporny: it would be a move away from an abstract data model, IIUC

Justin Richer: jonathan_holt: would it be fair to add JWK as a type and define CWK as the encoding in CBOR?

Jonathan Holt: so, even though I’m a big CBOR fan, I’m not actually trying to dive deeper than that

Justin Richer: jonathan_holt: so not “key material” in general but specifically just the JWKs

Jonathan Holt: the goal of the CDDL is a clear way to encode and decode out of specific representations

Proposed resolution: A DID Method MUST NOT add, modify, or remove from the production or consumption rules for a representation. (Manu Sporny)

Orie Steele: +1

Manu Sporny: +1

Justin Richer: -q

Justin Richer: +1

Markus Sabadello: +1

Brent Zundel: +1

Drummond Reed: +1

Tobias Looker: +1

Daniel Buchner: +1

Dmitri Zagidulin: +0 (I strongly suspect everybody here disagrees on what production/consumption means)

Dave Longley: +1 if i think about production/consumption in terms of transformation of representations only, yes, because DID methods have no business messing with that

Adrian Gropper: +1

Resolution #1: A DID Method MUST NOT add, modify, or remove from the production or consumption rules for a representation.

Jonathan Holt: 0, I think we need to start at clearly defining the ADM

Manu Sporny: the next proposal is to effectively say, “A DID method can specify whatever it wants” in pre-production or post-production

Dave Longley: justin_r, that’s becoming clear to me that that was your goal, but only just recently – i think we need some different language, not sure what it should be

Manu Sporny: so it can do what the method feels is required for security, privacy, consistency, etc.

Justin Richer: dlongley: that’s fair, words is hard

Dave Longley: yup

Manu Sporny: so this would enable DID methods to do that kind of processing OUTSIDE of the production or consumption rules

Justin Richer: Just wants the proposal to be specific that these operations are on the ADM for the DID document.

Manu Sporny: that’s what was meant

Justin Richer: A DID method may process the ADM on pre-production or post-production but not in the process.

Tobias Looker: Is it post-production or post-consumption @manu. Sorry your latest proposal clarifies. I was reading the scribed comment from you

Justin Richer: markus_sabadello: yeah I agree that was the implication but I’d rather not imply things when we can write them down

Markus Sabadello: Has a similar question as Justin. Should the proposal say that the DID method can define operations on the ADM – pre-production and post-consumption

Orie Steele: Similar question: do we really need such a proposal? I assume you can only perform valid operations on an ADM.

Justin Richer: +1 to restating the obvious because it’s not always obvious :)

Manu Sporny: wanted to run the proposal based on the reword
… because we haven’t necessarily been explicit about this
… so it is good to have a proposal

Proposed resolution: A DID Method may process the abstract data model in pre-production or post-consumption to enforce DID Method-specific rules and transformations. (Manu Sporny)

Manu Sporny: +1

Dave Longley: +1

Orie Steele: +1

Drummond Reed: +1

Justin Richer: +1

Brent Zundel: +1

Markus Sabadello: +1

Tobias Looker: +1

Daniel Buchner: +1

Jonathan Holt: 0

Dmitri Zagidulin: +1 (though I agree with dlongley that ‘post-consumption’ is somewhat confusing)

Adrian Gropper: +1

Resolution #2: A DID Method may process the abstract data model in pre-production or post-consumption to enforce DID Method-specific rules and transformations.

Justin Richer: dmitriz: I agree with the confusing terminology, but I think bikeshedding later would be better if we all know what’s up for now, but that’s me.

Dmitri Zagidulin: justin_r: sounds good

Manu Sporny: This proposal allows an unregistered property to survive a journey into and out of the ADM.

Orie Steele: selfissued, wish you were here for this.

Justin Richer: I agree with the thrust of the proposal. However this can be confusing because that’s what goes into the ADM.

Orie Steele: agree with justin’s interpretation

Orie Steele: of this

Justin Richer: so it’s susceptible to the proposal we just passed, which allows the DID method to make changes to the ADM

Markus Sabadello: I agree with the intention of the proposal. We want this “permissionless extensibility” in any representation.
… however what we said was, for unregistered properties, there’s no guarantee of being able to be rendered in all representations.
… a property may not be able to be preserved if it cannot be mapped to an abstract data type

Adrian Gropper: It sounds like this might have privacy implications

Manu Sporny: Does not believe it has privacy implications.
… it is a general design goal and folks may propose exceptions
… so if a very specific case comes up, we can make an exception

Justin Richer: We are not just talking about the values of the properties, but also the names.
… this is for example where JSON-LD multiple contexts come in
… so it can do a semantic “key-by-default”

Justin Richer: oh and cbor numbered keys

Justin Richer: (for objects)

Jonathan Holt: That’s also the potential vulnerability
… this issue about “foobar” not being there is not a problem of itself, as that’s good for avoiding loss of information
… so is the proposal that unknown properties are just passed through, and it’s up to the other end to decide
… so it’s really about how we are going to handle extensions

Justin Richer: +1 to this, an implementation can have its own internal mapping for an abstract data type that isn’t registered publicly

Justin Richer: “unknown” and “unregistered” properties were meant to be different but the words are confusing in this context

Proposed resolution: The Working Group supports a general “preserve by default” design goal for the abstract data model. This means that a general rule for the abstract data model is to preserve properties that are not registered or documented anywhere, such as “foo”: “bar”, as long as the property name and property value datatype can be cleanly and losslessly consumed from a representation into the abstract data model and produced from the abstract data model to a representation. The WG may specify exceptions to this general design goal. (Manu Sporny)

Justin Richer: +1 this is all about extensions and interop between extensions

Drummond Reed: +1

Orie Steele: +1

Manu Sporny: +1

Dave Longley: +1

Justin Richer: +1 (given that there are some subtleties to implementing it)

Tobias Looker: +1

Adrian Gropper: +1

Brent Zundel: +1

Jonathan Holt: 0, I still have security concerns

Dmitri Zagidulin: +1

Daniel Buchner: +1

Markus Sabadello: +1 (but depending on the involved representations it may not always be possible to preserve unregistered properties)

Resolution #3: The Working Group supports a general “preserve by default” design goal for the abstract data model. This means that a general rule for the abstract data model is to preserve properties that are not registered or documented anywhere, such as “foo”: “bar”, as long as the property name and property value datatype can be cleanly and losslessly consumed from a representation into the abstract data model and produced from the abstract data model to a representation. The WG may specify exceptions to this general design goal.

Drummond Reed: “What a harmonious group we have today.”

Orie Steele: thanks all!

Brent Zundel: thanks for everyone listening closely and staying on topic.

Drummond Reed: POPCORN!!!


1. Resolutions