JSON-LD Working Group Telco — Minutes

Date: 2019-03-22

See also the Agenda and the IRC Log

Attendees

Present: Rob Sanderson, Pierre-Antoine Champin, Benjamin Young, Gregg Kellogg, David I. Lehn, Simon Steyskal, Dave Longley, David Newbury, Jeff Mixter, Tim Cole

Regrets: Ruben Taelman

Guests:

Chair: Rob Sanderson

Scribe(s): Dave Longley, Rob Sanderson, Benjamin Young

Content:


Ivan Herman: Ruben is working on a streaming JSON-LD parser.

Rob Sanderson: Next week we will still have split DST and the week after that we’ll get back together again on DST.

Rob Sanderson: Any other announcements or reminders?

Dave Longley: None

Proposed resolution: Approve the minutes of the previous meeting: https://www.w3.org/2018/json-ld-wg/Meetings/Minutes/2019/2019-03-15-json-ld (Rob Sanderson)

Rob Sanderson: +1

Dave Longley: +1

Benjamin Young: +1

Gregg Kellogg: +1

Simon Steyskal: +1

David I. Lehn: +1

Pierre-Antoine Champin: +1

David Newbury: +1

Ivan Herman: +1

Resolution #1: Approve the minutes of the previous meeting: https://www.w3.org/2018/json-ld-wg/Meetings/Minutes/2019/2019-03-15-json-ld

1. Issues

1.1. Remove @omitdefault

Rob Sanderson: link: https://github.com/w3c/json-ld-framing/issues/41

Gregg Kellogg: also: https://github.com/w3c/json-ld-framing/issues/46

Rob Sanderson: The first issue we discussed briefly last call before we realized we needed to talk a bit more about it.

Gregg Kellogg: Another issue is to provide a better example for its use.
… Of @omitDefault.
… The issue is suggesting that @omitDefault is not particularly useful when you can just omit when using default. I think Dave had alluded to some examples where it was useful and that’s why there’s an issue open to highlight where it’s actually useful.
… Otherwise we could remove it from the spec.
… The only way to make a default is to use @default.

Dave Longley: There’s a few ways to have a default. There’s an interaction with explicit. A default of null, if the property is specified but not present. Then output has a default value of null
… sometimes you don’t want it with a value of null, but instead absent.
… Also, if you had specified a default, and some other properties to have flags on them.
… a case would be when you say ‘foo’ and you want to embed:always for foo.
… [example]

Gregg Kellogg: Ok, I think you’ve made a case for why it needs to be there. Having implemented it I still don’t understand the need for it. As long as you take on the responsibility of better documenting the use of @omitDefault.
… If you can provide an example so we can adequately spec it that would be sufficient.

Action #1: provide example in issue frame/41 of @omitdefault need (Dave Longley)

1.2. @embed @first

Rob Sanderson: link: https://github.com/w3c/json-ld-framing/issues/43

Gregg Kellogg: We touched on this one. Originally, @embed was true or false, if it was true then one of the uses within a node would contain the embedded value vs. just a reference. That was expanded because it was useful to be able to say “we want all of them” because it would make it much simpler so clients don’t have to follow pointers. Also embed @link was added for navigating an internal representation.
… We updated it to be the keyword value. And @last was a little arbitrary and the last one expressed in the frame would have the embedded value and @first would be a little more efficient as you don’t have to remove other embeds.
… First and last implies ordering and all of the ordering is optional and if you don’t have order then first and last are non-sensical, so Dave said to use @once but then testing is hard.
… This proposal, given that ordering is an option, it makes sense that we also makes sense to be able to say @first vs. @last and it’s a separate issue for how these behave or if we need something different if we are not ordering.

Pierre-Antoine Champin: Just an idea on the terminology, @first and @last are misleading … perhaps @earliest to @latest or something like that. To make the terms less confusing.

David Newbury: Going through the issue, I’m trying to understand the optionalness of the sorting. It feels like what we’re trying to make sure we can do is compare two JS objects as strings.

Gregg Kellogg: That sort of gets into canonicalization – I believe the tests say we should do a comparison of objects so that ordering isn’t important for comparing but also compare the resulting RDF graphs for isomorphism.
… But it does make it important to know which of the properties get included in the node or a given subject, in that case, it doesn’t change the RDF signature. They all will be references to another object. It’s really in the JSON thing.
… Whether you compare unordered or ordered, it still matters where the embedding takes place.
… The only way to satisfy these cases is so that you run the algorithm so it does process the properties in order. Otherwise you can’t do a structural comparison of the result.

David Newbury: I can say that I have sometimes reordered the properties … to match documentation. It shouldn’t do that if you use a different keyword. If you do anything to change the order of the JS it doesn’t change the meaning of the graph.

Gregg Kellogg: It is sort of a corner case where you have the same thing referred to multiple times.

David Newbury: And it comes up all the time in our data structures. Every time we want to talk about a first name we have an included object that defines first namedness.
… We do use repeated signatures throughout the graph.

Rob Sanderson: I can dig up an example. The ordering of the name is arbitrary, the order of the … firstnamedness and middlenamedness you don’t want to mix that up.

Rob Sanderson: So we do post processing to make sure it’s in the right order to look sane.

Ivan Herman: In general, I am really bothered by this. As an author of JSON-LD … if I really want to control how the framing works… I have to be aware of somewhat arbitrary ordering in the system that is against my way of ordering my properties in my object.
… For me, the whole thing is flawed.

Rob Sanderson: My understanding is very similar to David Newberry’s … if the ordering is important and that will determine where things will be embedded…

Rob Sanderson: Then the structure will potentially be very very different depending on the names of the keys. It would be very weird to have to put numbers at the beginning of every key to make sure they end up in the right place.

Ivan Herman: I have to do this elsewhere (iTunes) and I hate it.

Rob Sanderson: If the intent is that the structure should be comparable – then we don’t need @first … it’s just a flag that says “it goes here” such that it will be comparable.

Rob Sanderson: Then I don’t think that alphanumeric sorting will help, you don’t want @type appearing at the end or in the middle.

Pierre-Antoine Champin: I’m not sure I’m following ivan’s arguments
… about the expectations of the authors
… to me JSON objects are unordered
… as annoying as that is, I don’t expect the order I’ve written to be preserved
… the thing I queued up originally was to ask about @min and @max
… to be the smallest or highest key that one can find
… that would be reproducible
… and avoid key ordering

Dave Longley: so just looking at all this, I would remove all of the ordering
… and anything out of any algorithm that would attempt any ordering at all
… and highlights that the ordering of constituents back then was incorrect
… we mostly did this ordering for the test suites

Ivan Herman: +1 to dlongley

Dave Longley: original name was @once
… but what we bumped up again was how do we test this?
… because you can get really complicated output
… you can’t test from the RDF, because that isn’t the point of framing
… what I recommend we do, is in the test suite
… we order the keys in a certain way
… let’s somehow get the ordering into the test suite
… and out of the algorithms

Rob Sanderson: +1 to dlongley - ordering in the test suite, not in the spec

Dave Longley: so that we can still test correct ordering in the test suite

Jeff Mixter: +1 to dlongley

Dave Longley: and removing it out of the algorithm will also speed things up
… leaving it where it is will just confuse users anyhow
… so removing it is a win for them also

Gregg Kellogg: ordering in the test suite helps
… it’s been updated to not do order based testing of objects
… so I think the issue is about where to do the embedding

Dave Longley: -1 !

Gregg Kellogg: so we should either do no embedding or embed everywhere
… anytime you get into having to compare length of strings, you’re doing ordering
… ordering kills performance
… the issue of having @first is predicated on having @last
… so what I think we’re debating now is should there be any reason to have these at all
… so maybe we only normatively require @none and @always

Gregg Kellogg: https://w3c.github.io/json-ld-api/tests/#json-ld-object-comparison

Gregg Kellogg: there is a section in the test suite README about how we do object comparison

Ivan Herman: I wanted to answer pchampin, but I think we’ve moved past that
… but +1 to dlongley about removing this

Dave Longley: there’s no way we can just limit this to @none and @always because there are lots of reasons to just embed one
… there are lots of use case where data comes in from somewhere, and you need to pluck out the proof or whatever single thing
… and then you need to do some processing on those
… and if you inject new triples, that will ruin the data for that processing
… so you don’t care where the data occurs, just that it does occur

David Newbury: I want to second that
… I’d like to create shallow set of look-up types
… I don’t care where it is
… but it would be really obnoxious to have to dedupe it, etc
… but where it shows up in the structure is not something I’d ever trust

Gregg Kellogg: then what makes since would be to change @last to @once and then figure out how to test it
… so when we’re testing we can’t do object comparison
… and again this sort of goes against the whole purpose of framing…but we would need to flatten the structure to do the comparison

Rob Sanderson: I think we’re ready for a proposal

Proposed resolution: Change @last (back) to @once, don’t add @first and then figure out testing separately (Rob Sanderson)

Dave Longley: +1

Rob Sanderson: +1

Gregg Kellogg: +1

Ivan Herman: +1

David Newbury: +1

Benjamin Young: +1

David I. Lehn: +1

Tim Cole: +1

Pierre-Antoine Champin: +1

Simon Steyskal: +1

Jeff Mixter: +1

Resolution #2: Change @last (back) to @once, don’t add @first and then figure out testing separately

Simon Steyskal: The default for the object embed flag is @first (in addition to the explicit inclusion flag being false), and now?

Dave Longley: default should be @once now because it doesn’t add any triples to the graph

1.3. JSON datatype

Rob Sanderson: link: https://github.com/w3c/json-ld-syntax/issues/4

Rob Sanderson: PR: https://github.com/w3c/json-ld-api/pull/72

Rob Sanderson: we also have discussed the JSON datatype on github
… Gregg, you’ve been the most involved (as always)
… could you summarize?

Gregg Kellogg: the issue comes down to representation
… if you are going to describe both the lexical and value space
… somewhat like HTML
… the lexical space cannon be guaranteed
… the JSON literal quality is lost when its turned into a native representation
… you loose the original key ordering, key escaping, and lexical numerical representations
… so it seems we will need to canonicalize
… which has been referenced in the issue
… it’s sadly not as close to done as I’d hoped
… and we can’t count on it being final in time
… so, do we care if two implementations use the same canonicalization
… so we have done some things about do we use Integer or Doubles for numbers
… so when you’d turn the JSON literal into RDF (in the toRDF space), we do need to say something about that at least
… and the elimination of whitespace
… and the ordering of keys
… I think that can be done
… there’s a lot of detail in that, but we should be able to reference ECMAScript for this
… or we could do it ourselves

Rob Sanderson: last time we talked about the canonicalization issue
… we also talked about HTML being not easily canonicalizable

Gregg Kellogg: HTML is a little different
… they will preserve order, and whitespace
… so you do have the opportunity return to that result

Ivan Herman: well, attribute order and things are not covered
… this would be a problem if you were to attempt to sign an HTML document

Gregg Kellogg: if we weren’t in an era when signatures weren’t as important as they are now, then maybe we wouldn’t need to care about this so much

Rob Sanderson: so, is there a JSON-LD document that could include a JSON “native” data type that also needs to be signed
… so if the only use case is to import GeoJSON
… do we need to worry

Ivan Herman: I have spent time on this issue with others
… aside from the canonicalization problem
… if we do make a native JSON type, we will have to put it into some namespace–rdf: or jsonld:

Rob Sanderson: +1 to RDF namespace

Ivan Herman: if we do that, we’ll have to write the SWIG mailing list, to announce the new datatype, etc.
… we can do this as part of our document
… the other problem is
… I did put a reference in the issue for the rules we have to follow when we point to something normatively
… my first reading is that unfortunately, this JSON canonicalization specification cannot be referred to normatively
… the second problem is bringing our own canonicalization into our document
… if we do that, I can safely say the Director would say no to that
… so, we can’t just take an IETF spec and put it into a W3C spec
… all of these are admin problems
… But I am still not convinced that we need the canonicalization as a normative part of our spec
… we could say that someone else may do this and reference forthcoming work
… but when the issue is that we have a JSON portion we want to store in RDF
… we can state that the only expectation is that [the same processor will produce the same output]
… none of the arguments that I heard is that canonicalization needs to be normative

Pierre-Antoine Champin: http://tinyurl.com/y2gmzxf8

Pierre-Antoine Champin: I was wondering about this example
… there’s an Integer in the non-canonical form
… would that be canonicalized or not?

Gregg Kellogg: yes, that would be canonicalized
… I don’t know any processors that would properly serialize that with a leading zero
… if you’re going to the internal representation
… it is the number 42
… some might do 42.0
… or 42E+0
… that would be fine, but I don’t think most JSON serializers would do that

Pierre-Antoine Champin: for the moment, we know how to sign this thing

Dave Longley: I think this falls into the same category as HTML
… it’s a string in the JSON; it’s not native HTML
… or a native number in the example’s case
… if we’re storing stuff in a string, then store it as a string
… but people want a native JSON object in their JSON

Pierre-Antoine Champin: but if you remove the leading 0 you don’t get the same signature
… so I’m assuming that the signature is dealing with the order or absence of order in the object when signed
… so if the object was a native JSON object, then it would already benefit
… and regardless we already have this problem with other string-expressed literals

Rob Sanderson: if you instead make it value 42.0
… since no one really serializes as 042
… whatever you change here will change the signature
… even though it will canonicalize as something different

Dave Longley: I disagree

Rob Sanderson: what do you disagree with?

Ivan Herman: I think in these examples, the current JSON-LD specification doesn’t say anything about what you put in strings
… we don’t suggest any sort of mini-canonicalization for things like this
… having built-in canonicalization for the native JSON representation
… would be a departure from what we’ve done previously

Dave Longley: my response to all that is that we have very consistent rules about moving non-string data into strings
… so we do have those sorts of specifications
… from a native JSON value into a string
… this same thing would exist for native JSON objects
… for things that come in via a string, those will stay as whatever that string is
… so strings have no issue
… so if you take pchampin’s example, and change it to a real number: 42

Gregg Kellogg: 42, 42.0, 42.0E0, 4.2E+1 are all the same number

Dave Longley: and if you put that in the playground, check the nquads tab, you’ll find the same number

Ivan Herman: yep I acknowledge that

Rob Sanderson: maybe then it’s the playground which is at fault
… I put in several examples, and the signature changes for all of these different 42’s as an integer

Dave Longley: you’re looking at the RSA signature, so you’ll see it change constantly
… because that injects random data
… what you need to look at is the N-Quads or normalized tabs
… the data there stays the same

Gregg Kellogg: this is in the data round tripping section

Gregg Kellogg: so, imo, if we create a datatype for JSON
… before there is a canonicalization for it
… then we’re in danger of doing things too early
… ultimately we need to deal with a canonicalized JSON

Pierre-Antoine Champin: +1

Gregg Kellogg: so the best thing we can do right now is nothing
… and defer this until there is a canonicalized form
… otherwise whitespace, object ordering, etc are all variable
… and the literals really won’t be worth doing any lexical representation is important
… better not to do anything until a canonicalization spec exists

Ivan Herman: my take would be milder
… the GeoJSON example doesn’t care about canonicalization

Rob Sanderson: +1 to ivan

Ivan Herman: with the canonicalization things differed
… and state that this feature is not recommended
… so we differ it, and if/when the canonicalization becomes standard or whatever, then we at that point suggest that that spec gets used

Rob Sanderson: it would be better to have a JSON datatype and state that later we’ll do canonicalization

Dave Longley: let’s provide rules for how to produce the JSON string that match the draft – but that you can do something else and be very clear it’s preferred that everyone do the same thing

Rob Sanderson: so we should start with JSON datatypes, and just suggest that you can’t sign these

Jeff Mixter: +1 to ivan and azaroth

Gregg Kellogg: if we don’t do canonicalization now, we don’t seem to be prevented from doing it later
… if we end up as a living spec, then we could do it that way
… and we could also suggest that for testing purposes it is always canonicalized

Rob Sanderson: a warning or a note?

Proposed resolution: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized (Rob Sanderson)

Rob Sanderson: I’d suggest a warning

Gregg Kellogg: +1

Jeff Mixter: +1

Ivan Herman: +1

Rob Sanderson: +1

Simon Steyskal: +1

Pierre-Antoine Champin: +1

Tim Cole: +1

Dave Longley: +0

Benjamin Young: +0 still have concerns about eager misuse

David I. Lehn: +0.5

Jeff Mixter: I echo bigbluehat concerns but I also have very valid reasons to add JSON to RDF data.

Dave Longley: +1 to everything Benjamin is saying … but that we should really also have JSON literals … but they should also all be converted to the same strings in processors :)

David Newbury: +1

Resolution #3: Move forwards with a JSON native data type, with a warning that it cannot be canonicalized

Dave Longley: JSON literals can be an escape hatch but ONLY an escape hatch.

2. AOB

Ivan Herman: https://github.com/w3c/json-ld-wg/issues?q=is%3Aissue+is%3Aopen+label%3Aaction

Ivan Herman: this is a new feature, I reconstructed a long backlog; people should close the issues


3. Resolutions

4. Action Items