JSON-LD Working Group Telco — Minutes

Date: 2019-10-11

See also the Agenda and the IRC Log

Attendees

Present: Gregg Kellogg, Ivan Herman, Dave Longley, Benjamin Young, Tim Cole

Regrets: Ruben Taelman

Guests:

Chair: Rob Sanderson

Scribe(s): Dave Longley, Rob Sanderson, Benjamin Young

Content:


1. Scribe Selection

2. Approve minutes of previous call:

Proposed resolution: Approve minutes of previous call (Rob Sanderson)

Gregg Kellogg: +1

Rob Sanderson: +0

Dave Longley: +1

Benjamin Young: +1

Ivan Herman: +1

Resolution #1: Approve minutes of previous call

Pierre-Antoine Champin: +1

3. Announcements and reminders

Rob Sanderson: I wasn’t on the call last week, any follow ups on the discussion on implementations?

Gregg Kellogg: No discussion last week on implementations.

Rob Sanderson: Anything else?

Ivan Herman: I think the chairs already have the mail, but on the 27th Europe goes to winter time and the US does not yet, so we should expect the 2 weeks of biannual big mess.

Dave Longley: November 3rd US

Rob Sanderson: Anything else?

4. Issues

4.1. Language-maps and @direction

Rob Sanderson: https://github.com/w3c/json-ld-syntax/issues/280

Rob Sanderson: With our introduction of @direction, we can’t then use the direction in a language map, because you can’t put the @value object as the value of the language map. We could change language maps to allow that, but it’s very, very late in the day and it doesn’t seem like there’s a big need. No one is clamoring for this particular functionality.
… Any further clarification?

Ivan Herman: No.

Gregg Kellogg: It’s pretty clear.

Benjamin Young: Just a quick note that I’ve never seen direction in a key – this language map format is semi common, even outside of JSON-LD, but never seen it.
… +1 to defer.

Gregg Kellogg: What you might imagine would be a combination of language and direction that would be similar to what we do with the i18n datatype, where we separate language and direction with an underscore. I can imagine some future version that might do that. That might be more useful than allowing values that were value objects.
… There’s no demonstrated need to do this, so it’s reasonable for us to hold off until a need is demonstrated.

Proposed resolution: Defer use of direction for text in language maps to a future version (Rob Sanderson)

Dave Longley: +1

Ivan Herman: +1

Rob Sanderson: +1

Pierre-Antoine Champin: +1

Gregg Kellogg: +1

Benjamin Young: +1

Resolution #2: Defer use of direction for text in language maps to a future version
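
For illustration only, the combined language-and-direction key Gregg describes could be sketched as below. This is a minimal sketch under assumptions: the underscore-separated key form (for example "ar_rtl") is hypothetical and not part of JSON-LD, and expand_language_map is an illustrative name, not an API of any processor.

```python
def expand_language_map(language_map):
    """Expand a language map into a list of value objects.

    Hypothetical convention (not in any JSON-LD spec): a key may carry a
    direction after an underscore, e.g. "ar_rtl", mirroring how the i18n
    datatype separates language and direction.
    """
    values = []
    for key, text in language_map.items():
        # partition() leaves direction empty when the key has no underscore.
        language, _, direction = key.partition("_")
        value = {"@value": text, "@language": language}
        if direction:
            value["@direction"] = direction
        values.append(value)
    return values


expanded = expand_language_map({"en": "Hello", "ar_rtl": "مرحبا"})
```

Keys without an underscore behave like today's language maps, which is why such a form might be added later without changing existing documents.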

4.2. Relationship with RDF/JS dataset

Rob Sanderson: https://github.com/w3c/json-ld-api/issues/166

Rob Sanderson: Let’s do this one first because I hope it will be equally quick then we’ll do that.

Ivan Herman: It’s a little bit like the previous thing. It’s probably way too late to touch this, and it’s a bit unfortunate that this is the case. I’m playing with something that’s using the RDF output of the jsonld parser in JavaScript. I realized there is another API by the CG out there, but I have to convert what I get from json-ld to that, and it’s unfortunate, but it’s just too late.
… I think this is something we should put into the same bin as postponing … and there may be backwards compatibility issues and it’s not that simple to handle it.

Gregg Kellogg: I think perhaps we should lobby RDF/JS to include some additional interfaces to do it. We basically invented interfaces for datasets to support the needs of the algorithm to iterate over graphs and triples within that graph, the RDF/JS only allows iterating over quads.
… If there was a way to add something to create a subset of a dataset that just represented just the quads for a graph then we could use that and it would be great.
… Then we could align with these things, but given our need to separate the triples from the graph and that’s true for pretty much everything else, TriG for instance, it seems like it would be really useful for that interface to provide for that.
… If they had that we could use it, otherwise we just have to defer.

Dave Longley: +1 to gregg

Rob Sanderson: Given it’s a CG spec we can’t normatively refer to it.
… I’m also in favor of deferring.

Ivan Herman: And keeping it open.

Proposed resolution: Defer RDFJS integration to a future version, and work with the CG to ensure consistency (Rob Sanderson)

Gregg Kellogg: +1

Ivan Herman: +1

Rob Sanderson: +1

Dave Longley: +1

Pierre-Antoine Champin: +1

Benjamin Young: +1

Resolution #3: Defer RDFJS integration to a future version, and work with the CG to ensure consistency

Dave Longley: As an author of the JS implementation we’ve tried to get alignment as much as possible up to this point.
… With RDF/JS’s APIs.
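
The per-graph view Gregg asks for above can be sketched over a flat list of quads. This is a minimal sketch, not the RDF/JS or JSON-LD API interface: quads are assumed to be plain (subject, predicate, object, graph) tuples, the default graph is assumed to be represented by None, and triples_by_graph is an illustrative name.

```python
from collections import defaultdict


def triples_by_graph(quads):
    """Group (subject, predicate, object, graph) tuples by graph name.

    Assumption of this sketch: the default graph is represented by None.
    """
    graphs = defaultdict(list)
    for subject, predicate, obj, graph in quads:
        graphs[graph].append((subject, predicate, obj))
    return dict(graphs)


quads = [
    ("ex:a", "ex:p", "ex:b", None),
    ("ex:c", "ex:p", "ex:d", "ex:g1"),
    ("ex:e", "ex:p", "ex:f", "ex:g1"),
]
by_graph = triples_by_graph(quads)
```

An algorithm that iterates over graphs and then over the triples in each graph can work on the grouped result, whereas a quads-only interface requires this grouping step first.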

4.3. Language tag normalization

Gregg Kellogg: https://github.com/w3c/json-ld-api/issues/167

Gregg Kellogg: This is a slightly different issue, but they are quite related. One is that JSON-LD requires that language tags be normalized to lowercase. RDF Concepts says that implementations MAY do that. My implementation does that, and it is important for SPARQL querying; otherwise you have to implement that at query time, which is impractical for most of us.
… That’s the genesis of that – the fact that JSON-LD makes it a requirement is a little odd but it makes it easier to test the results.
… Dave might have a recollection of that.
… My thought is that we pull back on MUST and go to MAY, and add a provision in the testing README that says that, for testing purposes, implementations need to consider that language tags may vary in case so they can properly report.

Ivan Herman: The problem I see with the current setup, if I do roundtripping, if I do one of the various algorithms and I take the outcoming one then it will all be normalized to lowercase. This is not what i18n people would expect (e.g., they want zh-Hans)

Gregg Kellogg: It’s not just RDF roundtripping, it happens in expansion.

Ivan Herman: Yes, and the problem is that the habit in the i18n community is, for whatever reason, to say “en-UK” in capital letters. There are all kinds of additional rules that users may expect to see and that they use when they create JSON-LD, and it will go away when it roundtrips, and this may be a surprise.

Rob Sanderson: One thing to consider – while the proposal is technically worthwhile, we should consider backwards compatibility with 1.0, given that we’re changing a requirement to reduce it down to a MAY.
… Any processor that’s relying on that requirement may stop functioning.

Gregg Kellogg: That should properly exist in the normalization algorithm. It can sign any RDF, so if you’re coming from any other format, there is no guarantee coming from other syntaxes.

Dave Longley: Was just going to say it does need to be normalized for signing. Can put it to a different layer, but we can’t prohibit it
… tried to make it clear that normalization can happen and does need to with a concrete syntax
… when you’re serializing you need to output lowercase so it’s the same binary stream of bytes to sign

Ivan Herman: only when you do a signature
… but if you’re just producing turtle there’s no requirement

Gregg Kellogg: The output of the normalization routine is quads with special for bnodes

Ivan Herman: Question arose is that even if we agree to do it in an upper layer, the question to answer is there software that will break?
… other implementations of the signature for example

Gregg Kellogg: Not part of JSON-LD. If you were to normalize from another serialization it would still exist

Dave Longley: You are right. Not sure if we’re changing from MUST or MUST NOT … that would also have an impact
… but as gregg is saying, the libraries should be normalizing

Gregg Kellogg: Recommendation is to follow RDF concepts and go with a MAY

Rob Sanderson: Any other thoughts?
… General agreement that the canonize libraries should be outputting lowercase anyway and we can remove this hard restriction.

Gregg Kellogg: It seems like we should resolve to do it. It probably means creating a lot of changes to the test suite.

Proposed resolution: Change requirement to lowercase language tags to be a MAY from MUST (Rob Sanderson)

Ivan Herman: +1

Tim Cole: +1

Rob Sanderson: +1

Pierre-Antoine Champin: +1

Dave Longley: +1

Benjamin Young: +1

Gregg Kellogg: +1

Resolution #4: Change requirement to lowercase language tags to be a MAY from MUST

Rob Sanderson: There was a suggestion to do SHOULD…

Gregg Kellogg: No, I don’t think we should do that, this aligns with other RDF serializations.
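
The split discussed above (keep the author's casing in ordinary output, lowercase only where byte-stable output is needed) might look like the sketch below. This is a minimal sketch under assumptions: same_language and canonical_language are illustrative names, not functions of any JSON-LD or canonicalization library.

```python
def same_language(tag_a, tag_b):
    """Compare two language tags without regard to case."""
    return tag_a.lower() == tag_b.lower()


def canonical_language(tag):
    """Lowercase a tag for byte-stable output, e.g. before signing."""
    return tag.lower()


# Ordinary processing keeps the casing the author wrote.
authored = "zh-Hans"
# Only the canonical serialization used for signing is lowercased.
for_signing = canonical_language(authored)
```

With the requirement relaxed to MAY, tests that compare processor output would need a case-insensitive comparison such as same_language rather than exact string equality.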

4.4. Language tag values and validation

Rob Sanderson: https://github.com/w3c/json-ld-api/issues/167 (still)

Gregg Kellogg: The other thing – whether language tag values need to be validated.
… Currently the specs say different things. The definition says the language tag MUST be well formed according to BCP47. But the algorithm says implementations must not attempt to fix any invalid IRIs or language tags. I think that’s been interpreted to not validate language tags.

Ivan Herman: Why is it there?

Gregg Kellogg: I can see why you shouldn’t fix URIs, but not language tags.

Dave Longley: I don’t recall either.

Gregg Kellogg: JSON-LD Processors MUST NOT attempt to correct malformed IRIs or language tags; however, they MAY issue validation warnings. IRIs are not modified other than conversion between relative and absolute IRIs.

Gregg Kellogg: Clearly, to be consistent, to say they must be well formed then the algorithms should be updated and a version of Ruben’s PR should be updated.

Rob Sanderson: I agree we’re inconsistent. For the correcting, is the implementation that if there was say, an errant space, “en “ that you shouldn’t try to guess that it was just supposed to be “en”?

Gregg Kellogg: That’s what the current text says. I believe the new text would say that that form of a language tag would be invalid and that processing for that step … either the algorithm would be aborted at that point or that particular language tag would be ignored.
… I think for it to be an error would be that the promise would be rejected.

Ivan Herman: I think my feeling is that – the promise is rejected but the whole JSON-LD processing throws up its arms or the language tag is ignored?

Gregg Kellogg: If we do that, the only way we have to do that is to issue a warning. Rejecting the promise means we abort the whole step.

Rob Sanderson: The question to me … is this more like a syntax error, in which case we should abort because it’s rubbish.

Gregg Kellogg: For any other RDF serialization it’s an error.

Rob Sanderson: Or is it more like an undefined property name at which point we’d ignore it.

Pierre-Antoine Champin: I think we’re very much in the case where we might break schema.org data. We have no data that people out there are using proper language tags, and if old processors accept them then we might make Dan Brickley very angry, and rightfully so.
… I would lean towards ignoring them as undefined properties.

Gregg Kellogg: +1

Dave Longley: I’m of two minds. If this was a fresh spec without implementations, there’s no question we should abort

Ivan Herman: +1 to dlongley

Benjamin Young: +1 to dlongley

Dave Longley: if we’re concerned about existing systems, we should keep it rather than dropping it

Gregg Kellogg: I think that’s a compelling argument. We should issue a warning but continue processing. The exception would be for the i18n datatype and compound literal, that would be an opportunity to lock that down. What we currently say is that we can’t generate any invalid triples.
… In the case that this was a language tag that didn’t match the REGEX then it would be reasonable that we reject one coming from the i18n URI or the compound literal as a hard failure.
… That being the case, for warning purposes and rejection purposes, there are two options of REGEXes for how to verify that.

Ivan Herman: Yes, Dave’s argument is compelling. The only thing we can say without any problems – is that the processor MUST issue a warning from now on that this is illegal. But we should not stop processing.

Dave Longley: +1 not testable

Gregg Kellogg: We don’t have a way to test for warnings.
… From Pierre-Antoine and Dave both – I don’t know how much data is there from schema.org … we could say if you’re in a hard 1.0 mode you let it pass but reject in 1.1 but that flies in the face of what we’ve been doing.
… We drop triples that are invalid and that’s it, that’s the current behavior.

Pierre-Antoine Champin: If we did specify a way to turn warnings into errors, and that’s often a standard kind of thing, we could force those tests to use this mode and get the error and that would make it testable.

Gregg Kellogg: I think the way we’d do that is to add an option to treat language tag problems as errors.
… There are other places we generate warnings.

Rob Sanderson: The current behavior, per the issue, is that current processors may issue validation warnings but they don’t have to.

Dave Longley: I would only support doing a hard error if we found that existing implementations already did it.

Gregg Kellogg: I think since all expansion tests are toRDF tests – anything that failed to transform would have been caught.

Proposed resolution: Update the API conformance section to say that invalid language tags SHOULD generate a warning (Rob Sanderson)

Ivan Herman: +1

Rob Sanderson: +1

Benjamin Young: +1

Gregg Kellogg: +1

Tim Cole: +1

Pierre-Antoine Champin: +1

Dave Longley: +1

Resolution #5: Update the API conformance section to say that invalid language tags SHOULD generate a warning
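
The behavior resolved above, together with the suggested option to turn warnings into errors for testing, could be sketched as follows. This is a minimal sketch under assumptions: check_language_tag and its strict flag are hypothetical names, and the pattern is a simple stand-in for whichever well-formedness check a processor actually uses.

```python
import re
import warnings

# Simple well-formedness check used only for this sketch.
_LANGTAG = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def check_language_tag(tag, strict=False):
    """Return the tag unchanged; warn, or raise when strict, if malformed.

    The tag is never corrected, matching the rule that processors must
    not attempt to fix malformed language tags.
    """
    if _LANGTAG.fullmatch(tag) is None:
        message = f"malformed language tag: {tag!r}"
        if strict:
            # Hypothetical test mode: treat the warning as an error.
            raise ValueError(message)
        warnings.warn(message)
    return tag
```

In the default mode a tag such as "en " with a trailing space produces a warning and processing continues; with strict=True the same input raises, which is what would make the behavior testable.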

Gregg Kellogg: The question becomes how do we validate language tags, there are some different REGEXes out there.
… RDF might accept something that is not a valid language tag … because its REGEX is looser.
… In the RDF translation it’s not just a warning.

Gregg Kellogg: [144s] LANGTAG ::= "@" [a-zA-Z]+ ( "-" [a-zA-Z0-9]+ )* (from Turtle)

Rob Sanderson: +1

Gregg Kellogg: obs-language-tag = primary-subtag *( "-" subtag )

Gregg Kellogg: primary-subtag = 1*8ALPHA

Gregg Kellogg: subtag = 1*8(ALPHA / DIGIT)

Ivan Herman: Turtle is much looser – the regex from BCP47 is way more complicated, multi line, etc.

Gregg Kellogg: There may be something even more complicated than what’s in BCP47.

Pierre-Antoine Champin: I wanted to point out – Ivan you pointed me at the awful regex and that one is trying to qualify the different parts of the BCP47. It was for capturing the various parts in a language tag, the other one for validation is much simpler.
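
The two grammars quoted above can be compared directly as regular expressions. This is a minimal sketch: the pattern names and the matches helper are illustrative, the leading "@" of the Turtle production is omitted, and neither pattern is the full BCP47 grammar.

```python
import re

# Turtle LANGTAG production, without the leading "@".
TURTLE_LANGTAG = re.compile(r"[a-zA-Z]+(-[a-zA-Z0-9]+)*")

# The ABNF quoted above: every subtag is limited to 1-8 characters.
LENGTH_LIMITED_LANGTAG = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def matches(pattern, tag):
    """True if the whole tag matches the pattern."""
    return pattern.fullmatch(tag) is not None


# "abcdefghi" has nine letters: allowed by Turtle, rejected by the ABNF form.
too_long = "abcdefghi"
turtle_accepts = matches(TURTLE_LANGTAG, too_long)
abnf_accepts = matches(LENGTH_LIMITED_LANGTAG, too_long)
```

A tag with a nine-letter primary subtag is one concrete case where Turtle accepts something the length-limited form rejects.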

Proposed resolution: Use the less strict RDF regex, not BCP47’s, to determine validity (Rob Sanderson)

Rob Sanderson: +1

Pierre-Antoine Champin: +1

Dave Longley: +1

Ivan Herman: +1

Tim Cole: +0

Gregg Kellogg: +0

Gregg Kellogg: Given that, I’d vote for BCP47 … I think the only difference is in specifying it, it’s not a real difference.
… It’s only compatible when other syntaxes have accepted something that’s incompatible with BCP47.

Benjamin Young: +1

Ivan Herman: I think we would get problems if we weakened our check from before. From the i18n people in particular.
… They checked Turtle.
… It might be a bug report for Turtle/on the RDF specs.

Gregg Kellogg: Because they can accept things that aren’t legal BCP47.

Proposed resolution: Use stricter BCP47 syntax and file an errata for RDF to request an update to its definition to match BCP47 (Rob Sanderson)

Ivan Herman: The i18n review was ineffective … it was much weaker. Let alone the issue around direction. This also shows it was weak, I didn’t realize that but that’s the way it is. I think it’s a bug over there.

Rob Sanderson: +1

Dave Longley: +1

Ivan Herman: +1

Gregg Kellogg: +1

Pierre-Antoine Champin: +1

Tim Cole: +1

Benjamin Young: +1

Resolution #6: Use stricter BCP47 syntax and file an errata for RDF to request an update to its definition to match BCP47

4.5. Lazy evaluation of processing mode

Rob Sanderson: At TPAC, we discussed with Dan Brickley and others some of the requirements for enforcing a processor mode and why we have @version and why it’s a number and so forth. Manu was there and the discussion was that there are times that you need to enforce the processing mode and it’s valuable and other times you don’t need it and you can do a lazy upgrade.
… We can determine this via the test suite because the tests are flagged with version/processing mode. It would therefore be better for backwards compatibility to allow 1.1 to upgrade without requiring the other 1.1 things.
… So it would be better that you must use 1.1 processing vs. only doing it when you see version 1.1 declared.

Gregg Kellogg: That is essentially it. The issue is, what happens when a 1.1 processor does differently than a 1.0 processor because it fails to detect those features and the sense of the group was that that consideration is in publisher’s hands by choosing to use 1.1 features and not putting 1.1 in their context they are saying they are ok with that.
… For the usage of their particular data. Whereas we would encourage that they do it, we won’t enforce it.
… The benefit of that is that we can get much more incremental adoption of 1.1 features with minimal downside.

Ivan Herman: The only additional thing we said at TPAC is that, to be careful, we would mark this lazy eval as At Risk – because there was a general feeling that we might discover it’s too complicated for implementation reasons and therefore we drop it and do it the strict way.

Dave Longley: I wasn’t at TPAC, but I’m in favor of trying to get more 1.1 adoption
… I’m in favor of it being marked “at risk” in case it doesn’t work out
… I think this is a good path forward to find out if we can make it happen

Tim Cole: the “at risk” approach sounds right to me

Ivan Herman: This should be put into the document.

Action #1: update docs regarding lazy evaluation of processing mode (Gregg Kellogg)
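
The difference between strict and lazy evaluation discussed above might be sketched as below. This is a minimal sketch under assumptions, not the algorithm in the specs: KEYWORDS_1_1 is only an illustrative subset of keywords introduced in 1.1, the function names and the lazy flag are hypothetical, and the context is assumed to be a plain dictionary.

```python
# Illustrative subset of keywords introduced in JSON-LD 1.1.
KEYWORDS_1_1 = {"@direction", "@import", "@nest", "@propagate", "@protected"}


def uses_1_1_features(context):
    """True if the context or any expanded term definition uses a listed keyword."""
    if not KEYWORDS_1_1.isdisjoint(context):
        return True
    return any(
        isinstance(definition, dict) and not KEYWORDS_1_1.isdisjoint(definition)
        for definition in context.values()
    )


def apply_1_1_processing(context, lazy=True):
    """Decide whether 1.1 behavior applies to this context.

    Strict mode: only when "@version": 1.1 is declared.
    Lazy mode: also when a 1.1 feature is simply encountered.
    """
    if context.get("@version") == 1.1:
        return True
    return lazy and uses_1_1_features(context)
```

Under the lazy reading, a publisher who uses a 1.1 feature without declaring the version still gets 1.1 behavior from a 1.1 processor, and accepts that a 1.0 processor may treat the same data differently.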

Rob Sanderson: Any other business?
… That’s it, thanks to Dave for scribing, we’ll have a call in another week, thanks to Gregg for all the work.

Dave Longley: +1


5. Resolutions

6. Action Items