DID WG Topic Call on Handling Unknown Properties — Minutes

Date: 2020-09-15

Attendees

Present: Dave Longley, Brent Zundel, Manu Sporny, Justin Richer, Orie Steele, Markus Sabadello, Kyle Den Hartog, Drummond Reed

Regrets:

Guests:

Chair: Brent Zundel

Scribe(s): Manu Sporny, Kyle Den Hartog

Content:

1. Markus presentation of representation/unknown properties

1. Markus presentation of representation/unknown properties

Brent Zundel: Markus is going to start us off with a brief presentation.
… Please let Markus get through the slides.
… Then we can engage in a hearty discussion.

Markus Sabadello: Link to my slides: https://drive.google.com/file/d/1KHw7QfGBbJrr4okIdxDbDRhHJHJJLmzE/

Markus Sabadello: topic of today’s call is unknown properties
… the PR that’s open right now is about both — how representations work, how production/consumption works… tried to draw 3 different representations.
… There might be a CBOR one at some point
… 3 representations that we’ve talked about, and the abstract DID Document/data model.
… one main insight - when we talk about representations, we talk about lossless conversion between different representations… what’s really happening is not that we’re going from one rep to another… we’re always going through the abstract data model.
… if we have a JSON-LD document, and we want to convert that to plain JSON, then we should consume JSON-LD one, then produce plain JSON representation.
… We said at the Amsterdam F2F - in order to do that, we need registries.
… we have all properties in DID Core - function of registries is to allow lossless conversion.
… function of registries is to support consumption/production of different representations.
… On CBOR call, Brent correctly identified, in order to have functioning representation, we need to say what is required in registry in order to support a CBOR representation.
… One point that’s also the case, not all representations necessarily need to use registry in order to support production/consumption.
… in order to produce/consume plain JSON - the registry is not needed.
… if a new property is added, new extension property added, like ethereumAddress - added to data model, in order to produce/consume plain JSON — no additional information is needed in registry.
… JSON Schema used to be in there, that was removed, in order to produce/consume JSON - no additional information is needed. It’s different for JSON-LD, for every property that’s added, there must be a JSON-LD context that’s added as additional information.
… production/consumption of JSON-LD data model… in that case, ethereumAddress - added to registry and in registry, it says ethereumAddress is defined in X Context, specific JSON-LD context, hosted at DIF, that’s how a producer would know that given this abstract data model, how can it be serialized.
… how can JSON-LD representation be produced from abstract DID Document… JSON-LD producer would have to look in registry, use that context, then that’s how you produce/consume.
… This enables what was the original objective, to enable lossless conversion between JSON-LD and JSON.
… One question — unknown properties in JSON-LD and JSON — one subtopic, open PRs, what should happen with @context — one consequence of previous slide is that if there is a JSON-LD representation of DID Document which contains a context that’s not in the registry, then that doesn’t mean it’s an invalid DID Document, means it can’t be losslessly converted to other representations.
… if you wanted to convert a DID Document that’s JSON-LD and produce plain JSON, they’re both the same except for the @context, how would you know plain representation look like?
… Here’s the same example from first few slides - JSON-LD DID Document can be consumed/produced, information in registry ethereumAddress example - correct JSON-LD Context — producer/consumer could losslessly convert between JSON-LD DID Document and can’t convert this one because it’s not in the registry.
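A minimal sketch of the flow described in the presentation, in which conversion always passes through the abstract data model and the registry supplies the JSON-LD context for each property. The registry contents, the placeholder context URL for ethereumAddress, and all function names are assumptions made for illustration; they are not taken from the DID Core text or the actual registries. Leaving @context out of the model follows the view presented here, which was contested later in the call.

```python
import json

# Hypothetical registry for illustration: property name -> JSON-LD context.
CORE_CONTEXT = "https://www.w3.org/ns/did/v1"
ETHEREUM_CONTEXT = "https://example.org/contexts/ethereum-v1"  # placeholder URL
REGISTRY = {"id": CORE_CONTEXT, "ethereumAddress": ETHEREUM_CONTEXT}


def consume_json_ld(text):
    """Consume a JSON-LD representation into (abstract model, contexts)."""
    doc = json.loads(text)
    contexts = doc.pop("@context", [])
    if not isinstance(contexts, list):
        contexts = [contexts]
    return doc, contexts


def produce_json(model):
    """Produce plain JSON; no registry lookup is needed for this direction."""
    return json.dumps(model)


def produce_json_ld(model):
    """Produce JSON-LD; every property must map to a registered context."""
    contexts = []
    for name in model:
        if name not in REGISTRY:
            raise ValueError(f"property {name!r} is not in the registry")
        if REGISTRY[name] not in contexts:
            contexts.append(REGISTRY[name])
    return json.dumps({"@context": contexts, **model})


def convert_json_ld_to_json(text):
    """Convert via the abstract model, refusing unregistered contexts."""
    model, contexts = consume_json_ld(text)
    unregistered = [c for c in contexts if c not in REGISTRY.values()]
    if unregistered:
        # The document is not invalid; it just cannot be converted losslessly.
        raise ValueError(f"unregistered contexts: {unregistered}")
    return produce_json(model)
```

In this sketch a document whose context is missing from the registry is still consumable as JSON-LD; only the conversion step refuses it.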

Orie Steele: what Markus is saying is true, but only given the current spec… which imo is pretty broken :) my PR addresses this.

Markus Sabadello: on slide 6, example 1 - and example 2 are both JSON-LD DID Documents, but only example 1 is using JSON-LD context for ethereumAddress extension property.
… Only JSON-LD could be losslessly converted to example on the right.
… example 2 is almost the same as example 1 - using context not in registry, doesn’t mean that example 2 is invalid, couldn’t be losslessly converted to other representations.
… slide 7 we see the same two examples - same two JSON-LD documents, difference is only in @context and now there is discussion in open PR on whether JSON-LD document can be parsed by plain JSON processor… what would happen in both cases if we have two JSON-LD DID Documents where only the context is different and we try to parse that with a plain JSON processor.
… the answer is that a plain JSON processor couldn’t tell the difference because it would ignore the @context.

Orie Steele: ignored != not present :)

Markus Sabadello: Orie has pointed this out, that’s currently in the JSON consumption rules, if you’re processing plain JSON then context should be ignored.
… according to JSON consumption rules, they would be interpreted in the same way.
… that’s one of the problems - illustrate what could happen in that case - if we start with example 2, which is JSON-LD DID Document, has JSON-LD context which is not in registry and we process this according to plain JSON consumption rules and then we losslessly convert plain JSON to JSON-LD document using registry and using correct context, then we end up with example 1 - potentially a flow if DID Documents are produced and consumed in a way where we’re not clearly distinguishing between the representations but are documenting via plain consumption rules, we end up with inconsistent DID Documents.
… that results in all sorts of interop issues.
… two points are 1) on one hand we’re not converting between one representation and another, we’re always going through abstract data model 2) it could be problematic if we blur the lines between two representation formats. Those were the two points I wanted to make.
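The inconsistency described above can be sketched as follows: a JSON-LD document with an unregistered context is consumed under the plain JSON rule that ignores @context, then re-produced as JSON-LD from the registry. All context URLs except the DID core one, the registry contents, and the function names are hypothetical and exist only for this illustration.

```python
import json

CORE_CONTEXT = "https://www.w3.org/ns/did/v1"
REGISTERED_CONTEXT = "https://example.org/registered/ethereum-v1"    # hypothetical
UNREGISTERED_CONTEXT = "https://example.com/other/ethereum-v1"       # hypothetical
REGISTRY = {"id": CORE_CONTEXT, "ethereumAddress": REGISTERED_CONTEXT}


def consume_plain_json(text):
    """Plain JSON consumption where @context is ignored (not kept in the model)."""
    doc = json.loads(text)
    return {k: v for k, v in doc.items() if k != "@context"}


def produce_json_ld(model):
    """Produce JSON-LD by looking up each property's context in the registry."""
    contexts = []
    for name in model:
        if REGISTRY[name] not in contexts:
            contexts.append(REGISTRY[name])
    return {"@context": contexts, **model}


example_1 = {
    "@context": [CORE_CONTEXT, REGISTERED_CONTEXT],
    "id": "did:example:123",
    "ethereumAddress": "0xabc",
}
example_2 = {
    "@context": [CORE_CONTEXT, UNREGISTERED_CONTEXT],
    "id": "did:example:123",
    "ethereumAddress": "0xabc",
}

# Example 2 goes in through the plain JSON rules, then comes back out as JSON-LD.
round_tripped = produce_json_ld(consume_plain_json(json.dumps(example_2)))
print("matches example 1:", round_tripped == example_1)
print("matches example 2:", round_tripped == example_2)
```

The output document carries the registered context rather than the one it started with, which is the kind of silent change the presentation warns about.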

Brent Zundel: if you’ve said what you wanted to, let’s go to the queue.

Justin Richer: Thanks for putting that together Markus, because this notion of “you’re always converting by going through abstract data model” is what has been lost in conversation. The goal of representations is to allow the abstract data model to be a layer of interop. Production and consumption are talked about as operations on the abstract data model.
… Is @context a property of the abstract data model? Do values in @context remain as properties in the abstract data model as defined by DID Document? If they do not (which I believe is intended), then they shouldn’t be carried through to representations where they’re not meaningful. @context is a processing directive that is only relevant to JSON-LD, which knows how to process it, and therefore should be disallowed in any other representation where it doesn’t make sense. We need to separate processing rules from properties of the DID Document itself.

Orie Steele: I agree with a lot of what Justin just said, main point — do you consider @context to be a part of DID Document? I do — schema, or IPLD identifiers, or any other members to be members of the DID Document, JSON is fundamental type system we’re working with here.
… We do have INFRA and that is how we describe things, but things I’m most concerned about — immutability - most concerned about — people mutate documents to start to hash different values… who is changing these things, at what point are they changing, do they change public key bytes, that concerns me, would like to avoid as much as possible.
… One of the obvious places where we have alignment is that there is a layer at which you want to restrict JSON properties to not allow any JSON representations, or JSON Schema, or something you want to do for DID Methods — that’s what is being called base layer JSON representation and that’s what I don’t agree with. We have great alignment on everything else.
… If you were mutating that, by removing it, you are taking responsibility for changing DPKI document … we don’t want to put people into the habit of removing things from PKI documents.

Justin Richer: Yes, if it’s not clear — I completely disagree with @context being in core data model… arbitrary JSON values - namely because it has implied security properties that Orie was talking about. Orie and I have same motivations, but have come to opposing viewpoints. Our hearts are in right place, looking at it very differently.
… Do not see value in “it’s just JSON, so whatever goes in there goes”; that is not the intent of the JSON consumption/production rules. They were never meant to be a base layer that JSON-LD extends; they were intended to define in a vacuum how you would represent a DID Document, which is an abstract data model, not a JSON structure. Switch to language of production/consumption and not just serialization.
… I do share concerns wrt. mutability/hashing, but if you’re asking for something in different formats, you should expect them to look different: if I ask for something in JPEG and in SVG, even if they look the same, I wouldn’t expect the underlying files to be the same; I’d expect semantic equivalence.
… that’s where proposed value of abstract data model comes in… should be able to compare things to each other to understand if something has been changed in meaningful ways.. important to drop processing directives… @context does not show up as a property in that model even though it came through valid JSON-LD processor, I wouldn’t want it in there.
… If your producer is smart enough to keep that list around if it reproduces the document on the way out, that’s data for representation of that document, not a property of the document itself.
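A small sketch of the distinction between comparing serializations and comparing the abstract data model. Sorted-key compact JSON is used here only as a stand-in for a real canonicalization scheme, and the choice to leave @context out of the model reflects the position argued in this turn, not a settled rule; function names and the sample DID are made up for the example.

```python
import hashlib
import json


def to_model(text):
    """Consume a representation into a model, leaving @context out of it."""
    doc = json.loads(text)
    doc.pop("@context", None)
    return doc


def model_digest(model):
    """Hash a stand-in canonical form (sorted keys, compact separators)."""
    canonical = json.dumps(model, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def raw_digest(text):
    """Hash the serialized bytes exactly as received."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


json_ld_text = json.dumps(
    {"@context": "https://www.w3.org/ns/did/v1", "id": "did:example:123"}
)
plain_text = json.dumps({"id": "did:example:123"})

# The byte-level hashes differ because the serializations differ.
raw_equal = raw_digest(json_ld_text) == raw_digest(plain_text)
# The model-level hashes agree because the properties are the same.
semantic_equal = model_digest(to_model(json_ld_text)) == model_digest(to_model(plain_text))
print(raw_equal, semantic_equal)
```

Anyone who hashes the serialized bytes instead, as in the immutability concern raised earlier, would see these two representations as different documents.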

Manu Sporny: there are good points being made all around. I want to go back to Markus’ examples. I think it’s wrong in a subtle way.
… If we think about going from JSON to JSON-LD and then back to JSON we can’t do that
… meaning going JSON-LD to JSON back to JSON-LD we can’t do that
… we should be conscious that by dropping we’re effectively taking some roundtrip use cases out

Orie Steele: what Manu is saying is exactly the point I am making…. it’s a security issue, and it’s not necessary for application/did+json… I would be fine with it for application/did+proprietary+json

Manu Sporny: when looking at Markus’ examples we’re talking about being able to detect that two documents are not the same thing
… Point 1: I think we’ve created something that can no longer round trip between formats. I think that’s a bad idea.
… context is more than a processing directive. It literally sets the context of the document.
… we didn’t have this problem in the beginning, we do now, and some consider it a problem and some don’t

Orie Steele: … imo the original language was possibly worse than what we have now though…. telling JSON folks to do anything with @context was always the mistake.

Drummond Reed: I’d not thought about it the way Justin brought it up from the standpoint of an abstract data model
… we said the requirement is to be able to losslessly convert between formats
… it’s clear to me a did doc in one format by itself may not contain the information to convert to another format
… I had never thought about it in the way Justin brought it out - @context processing signal — requirement is losslessly convert from one format to another, inherently clear to me that a DID Document in one format by itself may not contain the information necessary to convert into another format, seems like, I can’t figure out how it’s going to happen — JSON-LD requires a context statement, how do you generate from a JSON document?

Brent Zundel: the registry would provide that information

Drummond Reed: Is there an assumption here, 100% information needs to be in DID Document — get in and out — representation has to explain how they go in and out, can’t make assumption that all information is in DID Document.
… maybe there’s an assumption here that to do 100% lossless conversion there needs to be a way to go from JSON -> JSON-LD. Maybe that means the methods need to define it

Brent Zundel: In order to move forward - we need some proposals - best to look at spec text of Orie’s PR… if we have another concrete proposal - gentle reminder, jump on queue, say something briefly, let other people say things — be brief.
… secondly let’s keep the points brief to get as many voices in

Markus Sabadello: I think I agree with Justin that @context is not a property of the data model, it’s a processing instruction; I don’t know what to call it, something that’s specific to JSON-LD - not a property of abstract data model, when we talk about conversion between different formats, consume one format, produce another, must work if there is JSON-LD and YAML representation and neither has @context, what we need is production consumption for both, and maybe depending on representation, need additional information in registry for JSON-LD.
… for CBOR, we probably — @context is something that’s specific to certain representations.
… a lot of discussion is about @context in the JSON production/consumption rules… what should they say about @context.

Drummond Reed: Yes, Markus just said what I was trying to say, which to go in and out of certain representations, additional information is needed that must be specified by the production and consumption rules of the respective representations.

Orie Steele: I think we’re making good progress — JSON-LD and JSON Schema and XML — those are self describing data structures, you read a part of the data structure and you understand — there is a family of those, they’re interesting, but starting with them is almost a mistake, they are always more complex than vanilla JSON.
… One of the cool properties that JSON has is properties are preserved.

Daniel Burnett: Just a reminder that “lossless interoperability” was for registered properties, not for starting with a representation (e.g. JSON) without registered properties that will magically generate a functional JSON-LD context

Brent Zundel: +1 to burn

Orie Steele: I can create a DID Document, create a representation, you can still see a property — $schema — that’s the feature of JSON and what makes it powerful… it doesn’t do anything to data structure… if you’re using serialize — there is no loss, that is truly lossless.
… Maybe we can get out of this by naming a JSON representation that doesn’t contain processing directives, then we can have a name for it, have consensus, and have a base level vanilla JSON parser behavior in off the shelf tooling today.

Justin Richer: This idea of round tripping and losing things is a red herring — if I am a JSON-LD and plain JSON box and I read in JSON-LD with Contexts that I understand that exist in registry and produce plain JSON, I’m only going to produce properties in registry. If I take that exact same JSON with only registered properties, when I’m loading them in, map them to parts of the registry, I’ll apply appropriate contexts — example of Markus where it doesn’t work, not supposed to work like that.
… If you start w/ JSON and go through JSON-LD then you have to add @context parameter — this is where Orie and I come down on different interpretations. If someone asks me for LD — if I only have registered properties, I know which properties I need to output. Carrying that through plain JSON, imported back out, things are going to go weird if we allow these sorts of directive things. They’re telling you what to do with the document beyond just a parser.

Orie Steele: it’s generally the case that not deleting things… is safer than deleting them :) that’s why JSON supports arbitrary properties without a schema requirement :)

Justin Richer: Where do we get this list of directives from? All producers and consumers that define a format, that have specific properties in serialization that are meaningful, need to register those.

Drummond Reed: I like this idea of “processing directive properties”.

Justin Richer: If JSON Schema version exists, registered directive property - that’s how we keep things clear and apart from each other, how we prove that what we’re doing is abstract data model, not just for passing around JSON-LD representations.

Dave Longley: it seems to me that “deleting processing directives” isn’t the right way to think about this, it’s more like … when producing JSON from the abstract data model, you don’t include those processing directives because they aren’t appropriate for that format

Manu Sporny: I know Orie has done some tracking of what implementations are doing today. It would be interesting to see what others are doing today and why

Justin Richer: dlongley: exactly. it’s not deleting them, it’s not putting them into the model in the first place so they don’t get passed back out

Kyle Den Hartog: I don’t take a strong opinion on this — from implementation perspective, for us, with Typescript, we check that the @context exists and exists in an Enum, but the way we pass everything around is not very JSON-LD-like, we’ve written a resolver itself and put it in document loader, so best way to do it — we treat our DID Documents like strongly typed JSON objects and check that @context exists… puts us in a weird scenario.
… if we want to switch to JSON-LD or switch, makes it easy to go from JSON to JSON-LD.

Dave Longley: kdenhartog, i don’t think that’s a “weird scenario” — that’s how many people use JSON-LD … as “strongly-typed JSON objects”

Justin Richer: hashing on serializations vs. the semantic model is part of the problem

Orie Steele: For everything I produce, for every DID Method, I will produce JSON that has @context, so the hash for JSON and JSON-LD is the same. There is a fundamental disagreement on whether removing processing directives is a good thing. I’m of the opinion that removing things is not good, and that is why JSON works well: it doesn’t remove things.
… Can we get processing directives and the JSON format defined?

Kyle Den Hartog: I consider it weird only because I’ve spent a fair amount of time seeing how DigitalBazaar processes JSON-LD and leans on the JSON-LD libs, but agree it’s the natural behavior for most first learning about JSON-LD

Manu Sporny: Kyle, I wanted to point out that the way that you’re approaching JSON-LD is normal
… the way we do it is to process it as JSON until the point when we need to convert to JSON-LD
… we expect that eventually in our tool chain JSON-LD will be needed and it’s valuable to not drop the context so that we can use the context when it hits that part of processing

Orie Steele: The problem here is that there is no “Just JSON” representation currently.

Justin Richer: Orie: yes there is — that’s what’s in the document today.

Justin Richer: Orie: your proposal is to turn that into something else

Orie Steele: no, what is in the document today is “JSON” without a specific processing directive.

Orie Steele: we call that… not “Just JSON” :)

Orie Steele: it would be “Just JSON” if it didn’t prohibit a specific processing directive :)

Drummond Reed: Listening to this and the options, I now have a deeper appreciation of the issue. That JSON-LD is also JSON is a red herring; it feels like the criterion we need to get to is iron-clad rules for producing and consuming.
… If each representation defines that for what they need going in and going out, we don’t have to go back and forth.
… It might not look like the most straightforward thing, but should work for everything.

Justin Richer: huge +1 to drummond’s point

Markus Sabadello: there is one sentence in JSON production section and one sentence in JSON consumption which are at the heart of this — production section is: @context must not be used since it’s used for JSON-LD.

Markus Sabadello: the other sentence, in JSON consumption: @context must be ignored.
… that’s actually a contradiction.

Drummond Reed: We need to resolve that contradiction.

Justin Richer: First off, Orie’s proposal is much like what we had for creating resolve vs. resolve stream — I see Orie’s proposal as adding something to the document that is this undefined “it’s JSON and whatever you want is in there is in there” — that was never the intent of the JSON section that’s in there right now. The consumption section for the JSON consumer actually needs to be strengthened to an error.
… we didn’t define what ignored meant, do some other bit of magic, that’s why I raised issue 205 in the first place, we have to have an idea of what to do for several classes of property.
… I now believe that that needs to be strengthened to returning an error because @context is reserved.
… I also want to point out that Manu and Kyle’s use cases, even though you are not doing JSON-LD at all steps - you rely on it being JSON-LD when you need it.

Orie Steele: throwing an error would make more sense… given the rest of your stance… but your stance still doesn’t make sense as a vanilla JSON developer…. as a JSON developer I want JSON with no dropped properties…

Justin Richer: but you need it as JSON-LD for some of the code paths that need it.

Kyle Den Hartog: We never process the DID Document as JSON-LD… that’s in most scenarios, the only time where it comes into considerations, we do use vc.js and we put our resolver in there, we need JSON-LD in there.
… Am I supposed to return JSON Mimetype always or JSON-LD sometimes, I just want to know how to implement… treat it as JSON but respect all JSON-LD aspects of it.

Justin Richer: kdenhartog: I would interpret that as you producing compliant JSON-LD

Justin Richer: whether or not you do that using JSON-LD processing libraries doesn’t matter

Brent Zundel: What are our next steps?

Markus Sabadello: Put two questions in IRC… can we have some feedback/poll on that?

Justin Richer: kdenhartog: think of it this way, you can produce JSON with string concatenation and never use a json library, and it’s still json, right?

Proposed resolution: Remove this sentence in JSON Production: “The member name @context MUST NOT be used as this property is reserved for JSON-LD producers.” (Markus Sabadello)

Orie Steele: if we propose for removal

Orie Steele: its the same as my pr

Justin Richer: -1

Manu Sporny: +1

Orie Steele: +1

Markus Sabadello: -1

Drummond Reed: -1

Kyle Den Hartog: +1

Dave Longley: 0

Brent Zundel: we don’t have consensus either way

Orie Steele: throw an error

Proposed resolution: Keep this sentence in JSON Consumption: “The value of the @context object member MUST be ignored as this is reserved for JSON-LD consumers.” (Markus Sabadello)

Orie Steele: +1 (and same for $schema and every other processing directive that might ever exist)

Justin Richer: -1 to keep it (It should throw an error instead)

Kyle Den Hartog: +1

Manu Sporny: +1 (don’t throw an error, just ignore that it’s there and keep going)

Drummond Reed: -1 - error

Markus Sabadello: -1 throw an error

Dave Longley: -1 i would prefer JSON consumers check the value for specific values in the registry and we get unification across both representations

Dmitri Zagidulin: +1

Kyle Den Hartog: Maybe we can say “it can be ignored but it doesn’t throw an error”, maybe that’s the second one.
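The three behaviors for a plain JSON consumer that came up in the poll (ignore @context, reject it with an error, or check its value against the registry) can be sketched as below. The policy names, the function name, and the single registered context are assumptions for illustration. The IGNORE branch drops the member from the returned model, which is only one possible reading, since the call notes that "ignored" was never defined.

```python
import json
from enum import Enum


class ContextPolicy(Enum):
    IGNORE = "ignore"                  # keep going, do not interpret @context
    ERROR = "error"                    # treat @context as reserved and reject
    CHECK_REGISTRY = "check-registry"  # accept only registered context values


# Hypothetical registry contents; a tuple so non-string entries compare safely.
REGISTERED_CONTEXTS = ("https://www.w3.org/ns/did/v1",)


def consume_json(text, policy):
    """Consume a plain JSON DID Document under one of the discussed policies."""
    doc = json.loads(text)
    if "@context" not in doc:
        return doc
    if policy is ContextPolicy.ERROR:
        raise ValueError("@context is reserved for JSON-LD")
    if policy is ContextPolicy.CHECK_REGISTRY:
        value = doc["@context"]
        contexts = value if isinstance(value, list) else [value]
        unknown = [c for c in contexts if c not in REGISTERED_CONTEXTS]
        if unknown:
            raise ValueError(f"unregistered contexts: {unknown}")
    # One reading of "ignored": the member does not enter the model.
    return {k: v for k, v in doc.items() if k != "@context"}
```

A consumer that wanted to keep the context around for later JSON-LD production could return it alongside the model instead of discarding it.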

Brent Zundel: good conversation, points were made, we need to have concrete recommended spec text on text that the group can come to consensus on. Thanks for coming, thanks to scribes, thanks to squirrel, appreciate work you guys do.

Drummond Reed: Thanks Manu for scribing. You’re incredible that way.