JSON-LD F2F Meeting, 1st day — Minutes
See also the Agenda and the IRC Log
Present: Ivan Herman, Rob Sanderson, Leonard Rosenthol, Benjamin Young, Simon Steyskal, Gregg Kellogg, Antoine Roulin, Adam Soroka, Dan Brickley, Harold Solbrig, Dave Longley, Christopher Allen
Guests: Luc Audrain, Jean-Yves Rossi, Pierre-Antoine Champin, paul, Ganesh Annan, Dmitri Zagidulin
Chair: Rob Sanderson, Benjamin Young
Scribe(s): Gregg Kellogg, Rob Sanderson, Adam Soroka, Leonard Rosenthol, Benjamin Young, Pierre-Antoine Champin
- 1. Introductions
- 2. general remarks for the meeting
- 3. JSON Syntax
- 4. Joint meeting with WoT WG
- 5. Resolutions
- 6. Action Items
Ivan Herman: Ivan Herman, I’m the W3C Staff Contact. I used to be Semantic Web activity lead, but not now.
Rob Sanderson: Rob Sanderson from the J. Paul Getty Trust. Co-chair with Benjamin. Formerly with annotations.
Benjamin Young: Benjamin Young with Wiley and sons, co-chair.
Gregg Kellogg: I’m Gregg Kellogg, have been on a number of groups, first was RDFa. Worked on JSON-LD 1.0 and RDF 1.1, worked with Ivan on those and on CSV on the Web
… an underutilized format for representing information in CSV. Co-chair on the JSON-LD CG where we pushed in some new features that we’ve inherited. Acting as only editor, hopefully can get someone else to come and help
Pierre-Antoine Champin: Pierre-Antoine Champin, formerly RDF WG, considering joining the group
Jean-Yves Rossi: I discovered the W3C 4 years ago. One of our R&D topics was rules and regulations, and we looked into RDF to navigate those rules. Different state members; we’ve been investing for the last 2 years around this topic.
Antoine Roulin: Antoine Roulin, working on European legislation mapping.
Luc Audrain: chair in the publishing activities, co-chair of the publishing business group in the W3C. Here as an observer, for now.
Leonard Rosenthol: Leonard Rosenthol at Adobe. I also serve as chair on XMP using JSON-LD. Also, Adobe is using JSON-LD.
paul: Paul from XML5 forum in Korea, I’ve been in social web working group. I’m interested in introducing JSON-LD to Korean developers. This is my first time to join a F2F.
Adam Soroka: Adam Soroka from Apache, member of the WG.
Simon Steyskal: Simon Steyskal, research scientist at Siemens Austria and also with Vienna University of Economics and Business together with Axel Polleres; prior to this WG I was part of the SHACL WG, ODRL WG and now also part of the Data Privacy Vocabulary CG
2. general remarks for the meeting
Rob Sanderson: agenda link: https://docs.google.com/document/d/1qTLztv7nqbYuUsZbwhPhOyG5tHTJrTt9tGKWnD5Xa5A/edit#
Rob Sanderson: We’ve tried to keep the sessions to 90 minutes and then have breaks along with everyone else.
… We have 2 joint sessions, one from 15:30 with WoT who have slides.
… The second is from tomorrow from 11:00-12:30 around profiles and content negotiation. There’s a best practice as using the context as a profile in the media type. They have a work item to standardize these things.
… We’ve tried to front-load the complicated stuff into today, while we’re somewhat fresh.
… Starting with the RDF issues that will set the stage for the other issues.
Rob Sanderson: guiding principles: https://www.w3.org/2018/json-ld-wg/WorkMode/guiding_principles
Rob Sanderson: The guiding principles are “stay on target”, production/consumption of JSON-LD. Make it as easy as possible for a wide variety of developers to understand JSON-LD without having to understand RDF.
… Require real use cases with actual data.
… Require two supporters of each use case.
Dan Brickley: Dan Brickley, work for Google run schema.org project. Complain a lot about JSON-LD but love it.
Rob Sanderson: A single WG may bring an issue that has multiple supporters within that WG.
… Consistency is good.
… Usability determined by end-users not developers. The serialized JSON should be as directly usable as possible by users.
Dan Brickley: there are Google engineers who write parsers, I don’t mind making them work. Publishers need to have it easy, because they use cookbooks.
Rob Sanderson: there don’t need to be 100 parsers, but 100s of users that make use of it.
… It should also be easy to produce the JSON.
… Provide on-ramps. Barrier to entry should be low.
… Define success, not failure. Leave things open.
… Follow existing standards and best practices.
… New features should be compatible with the RDF data model. Although 1.0 has a model which is a super-set, new features should be round-trippable through RDF.
Adam Soroka: I don’t think people disagree with that.
Rob Sanderson: If we end up in a stalemate, we come back to the principles to try to resolve.
Dan Brickley: I don’t see privacy, can you articulate that.
Adam Soroka: It comes from other groups as well, such as WoT.
Action #1: add privacy and horizontal review to guiding principles (Rob Sanderson)
Dan Brickley: I’m concerned about the remote context.
Rob Sanderson: the direction issue: https://github.com/w3c/json-ld-syntax/issues/11
Ivan Herman: It would be worth discussing language/direction issues as discussed in the Publishing WG. There, we will adopt something more general. It doesn’t affect JSON-LD, although we’ll need to specify something someplace.
3. JSON Syntax
3.1. Consider obsoleting use of blank nodes for properties and “generalized RDF” #37
Rob Sanderson: the first block of issues is around JSON syntax vs JSON-LD.
… obsoleting the use of blank nodes for properties.
Ivan Herman: link: https://github.com/w3c/json-ld-syntax/issues/37
Rob Sanderson: we’ve discussed this and resolved it as accepted
… after which Manu objected, on the grounds that our charter does not permit us to deprecate extant features
Ivan Herman: I’d prefer to actually discuss this with Manu himself
Rob Sanderson: Manu has delegated all the JSON-LD stuff to Dave Longley (who is not present at this time)
Ivan Herman: we can close it again, but Manu will just reopen it again
… I believe that his justification was basically wrong
Dan Brickley: Can this be used for property graph -style usage
… in which properties can have properties?
Ivan Herman: JSON-LD doesn’t have that structure by itself, so you would have to rely on the RDF side, and it wouldn’t be valid there.
Gregg Kellogg: If RDF supports that in the future (and work is ongoing there) and it was understood to be the correct way to do it, you could do it
Dan Brickley: Wouldn’t be good to take something out when in a year it might be very handy?
Leonard Rosenthol: How do you define deprecation? In ISO, we say “Don’t write it, but you can read it”
Ivan Herman: There is, afaik, a formal definition for deprecation at W3C, but we don’t have it at our fingertips.
… we’ll check into that
Gregg Kellogg: One of the issues with deprecation is w.r.t. semantic versioning. 1.0 vs. 1.1 vs. 2.0. We’re at 1.1, not 2.0.
… we would still, by deprecating, be taking it out eventually, and Manu would probably object to that.
… I’d rather add descriptive notes using “SHOULD NOT” language
Benjamin Young: I work on a graph system, levelgraph.io capable of provenance for each triple, but none of that survives into JSON-LD.
Benjamin Young: https://www.w3.org/Data/events/data-ws-2019/cfp.html
Benjamin Young: we do need to address that, to make it possible for those systems to use JSON-LD
Dan Brickley: earlier literals as subjects discussion, https://www.w3.org/2001/sw/wiki/Literals_as_Subjects
Pierre-Antoine Champin: Recently, some proposals have come out showing research into the “use bnodes to add properties to properties” technique. And there are “spaces” in the SPARQL rec that allow, e.g. literals as subject. This is all generalized RDF, but is that the thing to which you want to round trip?
Dan Brickley: for the record, property graphs have drivers for a lot of databases beyond neo4j
Dan Brickley: http://tinkerpop.apache.org/ lists 24 graph data systems with some kind of property graph interface
Ivan Herman: property graphs are great, but this WG should not have an issue for encoding property graphs in JSON-LD
… that’s a 2.0 thing.
… maybe in a year.
… but discussion now is very premature, not appropriate for this WG
… my problem with bnodes as properties is demonstrated by Manu’s reaction: one is liable to believe that one has solved a problem when the solution is, in fact, wrong.
… Manu is using the preservation of bnode labels from scope to scope as a feature.
… it’s not. It’s a particularity of some implementations.
… he’s using a syntactical quirk as support to solve another problem.
… usage of bnode predicates is only possible within RDF/OWL derivation
… certainly not a technique that anyone should be using in data, we should stick to what is in RDF
… after the upcoming Berlin workshop, perhaps this could be discussed for 2.0, but not now.
Dan Brickley: I am hearing the word “deprecate”, and that is scary for something with the audience of Google.
… we could put versioning into resources we control (e.g. the schema.org context), but we won’t do that if it’s liable to irritate or confuse millions of people
… we don’t use the bnode-property thing ourselves, at Google.
… but schema.org folks have real interest in qualifying graph edges, so if this could help with that…
Ivan Herman: there are other ways to do that
Dan Brickley: that would be fine.
Dan Brickley: we originally objected to the charter for this group, because we felt it was justified in terms of success for which Google bore responsibility. And the Google-promoted schema.org rollout is powered by millions of sites who don’t follow W3C’s latest developments religiously. It would be unfortunate to start deprecating what they do, so quickly.
… we don’t want to injure our users!
Ivan Herman: the CG would normally be the place where these things are discussed
Gregg Kellogg: the CG is not very live now
… I wouldn’t want to cut off property graphs. let’s put it on the side and deal with it at a later date
… specifically after being informed by the upcoming workshop
… bnodes-as-properties was not meant in support of property graphs
… it was more about worries that demanding an IRI for every property would be a bridge too far for many developers.
… I support Ivan’s point that the use Manu suggests is technically wrong
… it’s a bad idea. there is an expectation from users that labels won’t change, but we don’t want to reinforce that
Ivan Herman: the canonicalization algorithm would change those labels anyway
Gregg Kellogg: there are toggles for doing that, but usually people say that canonicalized datasets are text strings, not abstract syntax
Benjamin Young: This issue is for people who write contexts, not content (not instances), this is about a much smaller community than everyone who publishes JSON-LD
… the confusion that can result from bnode labels not being preserved does need to be recognized and dealt with
… we need to make sure that people understand what’s happening and why they need stable identifiers, perhaps in the forthcoming primer
Ivan Herman: we gave examples of this happening in the rec, and we shouldn’t have done that
Rob Sanderson: the original JSON-LD 1.0 use case would be covered by being able to use relative IRIs for @vocab
… we should continue as we have chosen, using the term “obsolete” as HTML does—to mean “you can use this, but conforming processors will throw a warning”
… and we can ask Manu and danbri whether this really is used in the wild.
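[Editorial sketch, not part of the discussion: one way a tool might flag blank nodes used as properties while still accepting the input, in the spirit of the "obsolete, with a warning" treatment described above. The function names, the warning text, and the sample context are hypothetical; this is not taken from any JSON-LD processor. Keyword entries such as @vocab are deliberately skipped to keep the sketch small.]

```python
import warnings


def blank_node_property_terms(context):
    """Return the terms in a context that map to a blank node identifier."""
    found = []
    for term, definition in context.items():
        if term.startswith("@"):
            continue  # keywords (e.g. @vocab) are out of scope for this sketch
        # A term definition is either a string or an object with an "@id".
        iri = definition.get("@id") if isinstance(definition, dict) else definition
        if isinstance(iri, str) and iri.startswith("_:"):
            found.append(term)
    return sorted(found)


def check_context(context):
    """Warn (but do not fail) for every term that maps to a blank node."""
    terms = blank_node_property_terms(context)
    for term in terms:
        warnings.warn(
            f"term {term!r} maps to a blank node identifier; this usage is obsolete",
            UserWarning,
        )
    return terms


# Hypothetical context mixing an IRI-based term with two bnode-based terms.
sample_context = {
    "name": "http://schema.org/name",
    "note": "_:note",
    "tag": {"@id": "_:tag"},
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    terms = check_context(sample_context)

print(terms)        # ['note', 'tag']
print(len(caught))  # 2
```

[The check only warns, so existing data keeps working, which matches the concern raised about not breaking publishers.]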
Dan Brickley: Pretty sure it isn’t.
… but hard to say for sure.
Ivan Herman: Should we close?
Dan Brickley: We could put a “feature at risk” on it.
Gregg Kellogg: That would be good because future people who aren’t represented in this discussion could object
Dan Brickley: Can we adjust the language to not tell people they’re WRONG, but that this has implications for what you can do with your JSON-LD?
Ivan Herman: It is just wrong.
Benjamin Young: It may be wrong from an OWA perspective. “Culturally” wrong.
Gregg Kellogg: We can’t pretend that this was experimental. We didn’t mark it as such.
… there’s a good bit of explanatory text about this in 1.0. We could certainly remove that.
Proposed resolution: Mark bnode as property as a Feature At-Risk for the next WD, with a view towards marking as Obsolete with a warning in 1.1 (Rob Sanderson)
Gregg Kellogg: A suggested URN-based replacement for this would be good.
Rob Sanderson: +1
Benjamin Young: URNs can be as fragile as bnodes, but there is a whole world of identifiers out there
Ivan Herman: +1
Simon Steyskal: +1
Benjamin Young: +1
Ivan Herman: Every resolution is temporary for a week, so we can do this and still have discussion with Manu et al.
Gregg Kellogg: +1
Pierre-Antoine Champin: +1
Adam Soroka: +1
Luc Audrain: [abstain]
Dan Brickley: +0.666_
Resolution #1: Mark bnode as property as a Feature At-Risk for the next WD, with a view towards marking as Obsolete with a warning in 1.1
Pierre-Antoine Champin: I don’t think the JSON-LD spec is the right place to solve the bnode-property question
Rob Sanderson: Since we have danbri for now, let’s move some relevant issues up from the afternoon
3.2. Content addressable contexts #9
Ivan Herman: link: https://github.com/w3c/json-ld-syntax/issues/9
Ivan Herman: related: https://github.com/w3c/json-ld-syntax/issues/20
Rob Sanderson: the issue from my POV is that it would be nice to have a feature that lets a context be “sealed” or “frozen”
… this lets people identify an immutable context
… this could prevent a huge number of people retrieving the context over the wire
Dan Brickley: cf https://www.w3.org/RDF/Group/Schema/openissues.html#c12 “Closed 19980707 - deferred until 1.1; M+S to supply a mandatory extensions mechanism” (“We would like to define a mechanism for ‘sealing’ an RDF Class, so that it becomes illegal to make certain RDF statements involving it. This is loosly analagous to the notion of ‘final’ classes in Java / OO programming. …”)
Rob Sanderson: as well as preventing people from overriding term expansion
… since choosing an expansion is “last in wins”
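[Editorial sketch, not part of the discussion: a deliberately simplified model of the "last in wins" behaviour mentioned above. Real JSON-LD context processing does far more than a dictionary update; the function name and the example.org IRI are placeholders.]

```python
def merge_contexts(*contexts):
    """Merge term definitions left to right; a later definition wins."""
    active = {}
    for ctx in contexts:
        active.update(ctx)
    return active


# Hypothetical term definitions.
base = {"name": "http://schema.org/name", "image": "http://schema.org/image"}
override = {"name": "http://example.org/vocab#name"}

active = merge_contexts(base, override)
print(active["name"])   # http://example.org/vocab#name
print(active["image"])  # http://schema.org/image
```

[The second context silently changes what "name" expands to, which is the override a sealed context would be meant to prevent.]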
Adam Soroka: ivan and gkellogg: Those are two different issues!
Rob Sanderson: https://github.com/w3c/json-ld-syntax/issues/9 is the first part of what I was describing
Benjamin Young: Immutable header https://tools.ietf.org/html/rfc8246
Rob Sanderson: I believe there is no argument about this being a good functionality—the question is just, how do we do this in practice?
… is this an HTTP thing?
Ivan Herman: We are overloading the term “frozen”.
Dan Brickley: Similar metaphors, but not the same.
Benjamin Young: There is an identifier-based approach, and for schema.org the semantics have changed
… over time
Dan Brickley: well, it’s always meant schema.org, but we have resolved it in different ways, because of the history
… we have just kept adding terms, and the context just keeps growing
Gregg Kellogg: You might be able to pare it down with @vocab
Dan Brickley: It’s rarely been the case that we don’t have to touch it for a long period of time
… and there’s constantly things to fix
Benjamin Young: That’s the POV that this identifier will always get you the most up-to-date, “best” version of the resource
… but the way I work, which uses separate context documents for different versions, would behave completely differently
… so one proposal is to bake version into the identifier, and another is to seal the context
Dan Brickley: Not too worried about that part of the problem: It’s not inconceivable that we might maintain a couple of high-value vocabs, but much of schema.org isn’t used now
… and for the sealed context question, another approach– perhaps allow to specify import order and thereby prevent inappropriate overrides
Gregg Kellogg: looking at the mechanism by which a processor can be encouraged to not constantly be fetching context,
… HTTP has Cache-Control, and we suggest using it, but that doesn’t seem to be enough.
… so can we add some of the HTTP semantics right inside the context?
… e.g. “this is valid for this time, do not refetch within that time”
… processing tools could even throw on an inappropriate retrieval
… we’d also like to be able to “preload” contexts (prefill a cache).
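[Editorial sketch, not part of the discussion: a minimal illustration of the two ideas above, a validity period during which a context is not refetched, and preloading a cache. The class, its method names, and the example.org URLs are hypothetical and do not correspond to the document loader interface of any JSON-LD library.]

```python
import time


class CachingContextLoader:
    """Wraps a fetch function with a time-limited cache that can be prefilled."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # callable: url -> context document
        self._clock = clock    # injectable clock, useful for testing
        self._cache = {}       # url -> (document, expires_at)

    def preload(self, url, document, max_age=float("inf")):
        """Prefill the cache so that `url` is never fetched while valid."""
        self._cache[url] = (document, self._clock() + max_age)

    def load(self, url, max_age=3600.0):
        """Return the cached document if still valid, otherwise fetch it."""
        entry = self._cache.get(url)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]
        document = self._fetch(url)
        self._cache[url] = (document, self._clock() + max_age)
        return document


calls = []


def fake_fetch(url):
    """Stand-in for a network fetch; records every URL it is asked for."""
    calls.append(url)
    return {"@context": {"name": "http://schema.org/name"}}


loader = CachingContextLoader(fake_fetch)
loader.preload("https://example.org/ctx/v1", {"@context": {}})
loader.load("https://example.org/ctx/v1")  # served from the preloaded entry
loader.load("https://example.org/ctx/v2")  # fetched once
loader.load("https://example.org/ctx/v2")  # served from the cache
print(len(calls))  # 1
```

[Whether the validity period should come from HTTP headers, from the context itself, or from the processor’s configuration is exactly the open question in this discussion; the sketch takes it as a plain parameter.]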
Dan Brickley: We are reliving the XML Catalog discussions from couple decades back, https://en.wikipedia.org/wiki/XML_catalog
Ivan Herman: I am afraid of conneg and HTTP headers, which are great, BUT many users cannot control them
… relying on that is attractive to a spec, but may not be a good solution
… so gkellogg’s suggestion is uglier than relying on HTTP headers, but way more secure
… the schema.org example shows us that many contexts cannot be sealed.
… the “freezing” (no bad overrides) things—I’m guilty of that myself
… we do that on top of schema.org in publishing
Dan Brickley: We’re fine with that!
Ivan Herman: sealing a popular context could create problems for that
Dan Brickley: Sure, we can do conneg. But it’s frustrating as a vocab provider. For the rest of the site we are moving to pure static hosting.
… we would like our work as vocab providers to end with creating files
Leonard Rosenthol: Other vocabulary providers have a different approach. In our defns and vocabs, we have a policy that says
… once this is stable, any future version must be backwards-compatible
… can’t remove fields, can’t change value types, etc.
… in part because we didn’t want to deal with versioning
… but we also maintain a registry, so at any time anyone can go and get a schema
… so we get some of the benefits (e.g. we don’t have to do unneeded fetches)
… we don’t have to worry about the versioning because of the policy
Dan Brickley: two things where I’ve been raising concerns around @context
… when you’re a search engine, you may not have all the resources in hand at once
… also, @context doesn’t work for IoT, where devices are resource-constrained
… they don’t want to be fetching stuff across the network
… not sure whether “frozen” or “sealed” is the right metaphor
… but the assumption that fetching a @context is trivial isn’t necessarily a good one
… 1.1 should not force many fetches, e.g. because of privacy concerns
Ivan Herman: understood, but how do we avoid the problem?
Dan Brickley: We shouldn’t allow the difficulty of the problem to prevent us from facing it.
Rob Sanderson: We’ll be talking to the IoT folks this afternoon.
Leonard Rosenthol: Why (other than parsing) do you need to fetch contexts?
others: Can’t interpret the data at all w/o the context
Leonard Rosenthol: Doesn’t mean that you have to fetch it, does it?
Dan Brickley: We ran into this with RDF/XML—we forced you to put all the semantics into the instance. Made the syntax much disliked.
Benjamin Young: https://json-ld.org/spec/latest/json-ld-api-best-practices/#cache-context
Dan Brickley: JSON-LD went the other way
Benjamin Young: There is to be a best-practice document, which can discuss these issues.
… the docs may say you can fetch all the time, but the tools understand how bad an idea that is.
Dan Brickley: ajs6f, to be more explicit: it isn’t even the semantics (in terms of the rdfs/owl model), it is very basic graph structure issues. Does “foo”:”bar” expand to “bar” or
Benjamin Young: I manually solve this problem, and it’s sort of like what we would want
… but we haven’t encouraged people to create workflows that are safe in that way
… users don’t necessarily even know that they should know HTTP at the depth needed for control caching
Luc Audrain: in Publishing, we are using JSON-LD to build metadata about books, and push it to webpages. We have, today, standards like ONIX, which are in use all over the world. But we know we have qualifications on our metadata. E.g. we might send a link to a cover image. When the book goes on sale, we might change the image to include the new price. We assume people refetch to get the updated image. It’s a similar problem. It would be nice if we could provide this.
Rob Sanderson: That’s at a different level, the level of instances. They will always require refetching. That’s not quite within our work.
paul: Can someone define the term “sealed”?
… who makes that decision?
Rob Sanderson: the publisher would decide that.
… that’s the feature that prevents overrides.
… for the other feature (content-addressable contexts)…
… to the extent that we want to include HTTP-type features to control/prevent fetching
… What is the extent to which we should go, to make that easy for devs?
Dan Brickley: In RDFS, W3C Working Draft 14 August 1998 https://www.w3.org/TR/1998/WD-rdf-schema-19980814/ we wrote “”"”Since an RDF Schema URI unambiguously identifies a single version of a schema, RDF processors (and Web caches) should be able to safely store copies of RDF schema graphs for an indefinite period. The problems of RDF schema evolution share many characteristics with XML DTD version management and the general problem of Web resource versioning.”
Dan Brickley: “Is is expected that a general approach to these issues will presented in a future version of this document, in co-ordination with other W3C activities. Future versions of this document may also offer RDF specific guidelines: for example, describing how a schema could document its relationship to preceding versions.””””
Ivan Herman: there is an analogy that might inspire us
… it’s like the way that CSS handles fonts
… they have an indirection. I define a font symbol, to which I can attach URIs or filenames
… the processor picks the first that is available.
… so e.g. I could perhaps make a list, with my file of schema.org first, then the network address.
… that’s what CSS does; they have the same problem (in their case huge font files)
… how could we shoehorn this into the syntax?
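[Editorial sketch, not part of the discussion: the CSS-style indirection described above, modelled as an ordered list of sources where the first one that can be loaded wins. No syntax for expressing such a list in JSON-LD is implied; the function names, file URL, and in-memory store are hypothetical.]

```python
def load_first_available(sources, load):
    """Try each source in order; return (source, document) for the first hit."""
    errors = []
    for source in sources:
        try:
            return source, load(source)
        except LookupError as exc:
            errors.append(f"{source}: {exc}")
    raise LookupError("no context source available: " + "; ".join(errors))


# Stand-in for a local filesystem holding one copy of a context.
LOCAL_FILES = {
    "file:///contexts/schema.jsonld": {"@context": {"@vocab": "http://schema.org/"}},
}


def load_local_only(source):
    """Hypothetical loader that can only see local copies (no network)."""
    try:
        return LOCAL_FILES[source]
    except KeyError:
        raise LookupError("not available locally") from None


sources = [
    "file:///missing.jsonld",          # tried first, not present
    "file:///contexts/schema.jsonld",  # local copy, found
    "https://schema.org/",             # network address, never reached here
]
chosen, document = load_first_available(sources, load_local_only)
print(chosen)  # file:///contexts/schema.jsonld
```

[As with font fallbacks, the order expresses the publisher’s preference; the loader decides what is actually reachable.]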
Leonard Rosenthol: You’d have to change the meaning of @context
Gregg Kellogg: The fact that there is a practice in CSS is good.
… piggybacking on an existing pattern in CSS will make arch review easier
Dan Brickley: ajs6f: it may be worth talking to the CSS folks to see how that feature is working out in practice
Rob Sanderson: Seems like something we could put into the API spec.
… if cache management was part of the API
… as opposed to reimplementing another document loader
Ivan Herman: -> https://www.w3.org/TR/css-fonts-3/#font-face-rule Font face rule in CSS
Leonard Rosenthol: The problem w/ putting it into the API is that you’ve got to think of the various devices on which it might run– what about embedding devices, e.g.?
… the API must be flexible enough to let clients control the amount of caching that takes place.
Simon Steyskal: https://drafts.csswg.org/css-font-loading-3/ ?
Gregg Kellogg: The API spec does permit that—we only spec the relevant behavior
Ivan Herman: There are some nice examples from the CSS doc linked above
Dan Brickley: Is anyone here tracking the WebPackaging work?
… seems to address the problem of leaking URLs
… i.e. queries from Google
… it’s also about what happens when you don’t retrieve from the resource’s identifier, but from somewhere else, which is the same problem
… different timescales to cultural heritage, but the same underlying problem!
Ivan Herman: Yes, but IIUC not something to change in the spec
Benjamin Young: the thing that is “hiding” in the WebPackage is that you don’t get the content from the content owner.
… there is no way to discover the webpackage from the live URL
Dan Brickley: There are many ways to get content. How you get content is a different question from JSON-LD– we don’t have to solve it.
… we might look at this in terms of parser APIs. How do parsers report the timeliness of their operations?
… we might want to say that JSON-LD doesn’t mandate the only parser APIs, to create some space in which to expand/explore
… I want the schema.org contexts to be available into the future…
Harold Solbrig: The ShEx spec states that to be valid JSON ShEx, the context must have a specific URI
… that muddles the identity of the operation with the identity of a resource involved therewith
… what do we do with contexts that may never resolve?
Gregg Kellogg: For ShEx on the web, we established a media type. Then we gave rules for that media type.
Rob Sanderson: Seems like there’s a general feeling that there are many possible solutions to this issue, and baking something into the specs might be shortsighted.
Dan Brickley: the more I think about it, the more I like the idea of the poor parser just reporting what it did, weaknesses and all
Ivan Herman: We could almost verbatim copy what CSS did
Adam Soroka: gkellogg and leonardr: but that would break extant parsers
Ivan Herman: No, we could add more syntax for this purpose, not change @context
Dan Brickley: Who uses this
Ivan Herman: Whoever publishes data.
Dan Brickley: That’s tens of millions of sites, thousands of which will be hacked….
Benjamin Young: it’s different from fonts—you have fallbacks in that case, and @context is more central to JSON-LD than fonts are to CSS
Adam Soroka: azaroth et al: COFFEE!
Leonard Rosenthol: azaroth : we should take 15m to continue on the content addressable context to find a resolution/direction
Benjamin Young: it would be good to have a decision around the approach
… cache loading into the instance (eg. font-face), http-level info (in the context doc)
Gregg Kellogg: there is some precedent on http changes
Rob Sanderson: or we can change things at the API level (since we have doc loading there)
… so that library writers bear the cost
Gregg Kellogg: one thing we could do is change the doc loader about how it looks for the context - eg. prescribe the specifics - as well as specific support for “side lookups”
… that would lead to greater consistency
… doc loading is in the specs
Benjamin Young: API spec area about Document Loading https://w3c.github.io/json-ld-api/#remote-document-and-context-retrieval
Rob Sanderson: can we promote (make more obvious) the doc loader in the API doc?
… perhaps make it a top level item?
Gregg Kellogg: this comes back to best practice. Algos described in API.
… our practice has been to put a thin veneer over the APIs but didn’t do that for doc loader. But we could
… there is a non-normative features section that we could also use. (eg. “do it this way”)
… we could describe that sources of contexts may require excessive loading (etc.)
… what does WOT do?
Rob Sanderson: don’t forget privacy/security
Gregg Kellogg: there are things we expect them to do (from a syntax perspective), and they won’t read the API doc
Benjamin Young: the section “remote doc…” could easily be renamed. (and I like everything y’all were saying)
… also want to make sure we highlight the security/privacy stuff (not just about “remote”)
… also a bunch of work in best practices to explain this. (and nothing in syntax)
Adam Soroka: do we want to talk about other situations where we retrieve docs? or is it just context?
Gregg Kellogg: there is also a section on expansion which talks about remote IRIs and documents.
Adam Soroka: so maybe we should not just put this in terms of context
Gregg Kellogg: but you might not want the entire JSON-LD doc reloaded each time either, so cache-control would be useful there too. (does this belong in the body)
Benjamin Young: my data shouldn’t be that smart (about caching)
Rob Sanderson: the data carries ontology and not necessarily processing
… to ivan’s point about fallbacks. Today there is a callback (which shouldn’t need to change) and the list of things to retrieve (which we would)
… unless there is a way for the doc loader to get a list of URLs to work from (in order of precedence)
Ivan Herman: I produce my data and want to publish it, and I have a local context file and will point to it first and then a remote version.
Gregg Kellogg: context-path/reference, it could interact with the doc loader
Ivan Herman: but we need a syntax too
Adam Soroka: do the things go to an existing doc loader or do they find a doc loader?
… the doc loader is smarter
Gregg Kellogg: an array of strings is fairly simple to understand
Adam Soroka: but what if there are other policies that control the order the array is processed
Gregg Kellogg: we still have a year+ to figure out details
Benjamin Young: we need a proposal to work from
Leonard Rosenthol: azaroth : question for ivan
Benjamin Young: in what situations would the data provider want to provide the “context set” rather than the processor?
Ivan Herman: if I set up a WoT in my home, I have to manage them. I might include a local copy of schema.org there to avoid them all having to reach outside the network.
Gregg Kellogg: but that’s an http proxy
Ivan Herman: but it might also fall into caching model
Gregg Kellogg: webpackaging and activity streams could also help to improve caching
Adam Soroka: even a refrigerator might have a filesystem where I know what the URLs on it are, so I can point to those as local cached copies
Rob Sanderson: 15m done.
… there is agreement that we want to promote the doc loader in the API spec into something more visible and point to it from other docs (esp. about privacy/security)
… we have discussed the idea of “context list” (ordered lists of contexts) where the loader can try to find things from these multiple items
… but we need requirements and use case to help understand what to specify and where to put it
Adam Soroka: also be sure to include the horizontal bits too
Rob Sanderson: do we need a resolution?
Ivan Herman: just leave it open
Benjamin Young: can we take an action?
Gregg Kellogg: lets use some issue(s) to track ideas
Action #2: create issues for ContextList and DocumentLoader editorial changes (Rob Sanderson)
3.3. unoverrideable (or “sealed”) contexts
Rob Sanderson: link: https://github.com/w3c/json-ld-syntax/issues/20
Rob Sanderson: back to issue 20
… sealed contexts
… introduced in the CG by dave longley in June
Gregg Kellogg: so the idea is that there is something in the context that says if a given context refers to something else that tries to override a known term
… that there is a way to prevent the overriding (but w/o stopping processing)
… one example, a context with a list of other contexts the others can’t override
… how do we cause overriding to take place when you absolutely need to? context:null?
… scoped contexts should do the same thing
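[Editorial sketch, not part of the discussion: one possible reading of the behaviour described above, in which an attempt to override a sealed term is ignored rather than treated as an error, and a null context resets everything. No keyword or algorithm had been agreed at this point; the function, its `seal` flag, and the example IRIs are hypothetical, and whether a null context should also clear the seals is an assumption made here only for illustration.]

```python
def apply_context(active, sealed, ctx, seal=False):
    """Return the updated (active, sealed) pair after applying one context.

    ctx is None -> clear everything (the reset via a null context).
    seal=True   -> terms defined by ctx cannot be overridden afterwards.
    Attempts to redefine a sealed term are skipped, so processing continues.
    """
    if ctx is None:
        return {}, set()
    active, sealed = dict(active), set(sealed)
    for term, definition in ctx.items():
        if term in sealed:
            continue  # ignore the override instead of raising
        active[term] = definition
        if seal:
            sealed.add(term)
    return active, sealed


# A sealed context defines "issuer"; a later context tries to redefine it.
active, sealed = apply_context(
    {}, set(), {"issuer": "https://example.org/sec#issuer"}, seal=True
)
active, sealed = apply_context(
    active,
    sealed,
    {"issuer": "https://other.example/issuer", "name": "http://schema.org/name"},
)
print(active["issuer"])  # https://example.org/sec#issuer
print(active["name"])    # http://schema.org/name

# A null context wipes both the definitions and the seals in this sketch.
reset_active, reset_sealed = apply_context(active, sealed, None)
print(reset_active, reset_sealed)  # {} set()
```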
Ivan Herman: who would freeze the context?
Gregg Kellogg: author
Adam Soroka: perhaps to avoid caching problems?
Gregg Kellogg: but that’s not what it means
… but they do work together
Ivan Herman: there is something in my forgotten past that is making me unhappy about this
Adam Soroka: the use case that gkellogg mentioned is a good one
Ivan Herman: once I have nulled things, I can do what I want anyway
… for example, when there is something missing in schema.org we either come up with our own term or we use a context to override
… it’s not a very elegant solution, but it works and solves a need.
Gregg Kellogg: but you could just as well solve it by creating a new term but then schema.org isn’t going to recognize it
Benjamin Young: the sealed context goes against compaction
Gregg Kellogg: no it does not.
Benjamin Young: if you look on the playground this isn’t what I see…so maybe there is a difference between spec and implementation?
Gregg Kellogg: the sealed context is relevant in expansion
… by treating the input as always being expanded
… also algos start from expanded form
Rob Sanderson: I want to question the nuclear option
… if that is allowed, then why would that be useful instead of just specifying new context?
Gregg Kellogg: the reason to do it is that you don’t want to inherit things outside your scope
Rob Sanderson: if you have a context that wipes out all the previous ones, then everything from that point is known
Gregg Kellogg: you can also set a single term to null
… as a way to undefine a single term
Adam Soroka: given ivan’s example, where I want to change the value type of a known term…that is quite different from completely overriding a term with completely different semantics.
… we should be sure that all use cases are addressed with the same solution
Gregg Kellogg: in the case of schema.org, we want to make it legitimate to avoid loading contexts
Dan Brickley: and be sure to differentiate search from other cases
Gregg Kellogg: verifiable claims wants sealed, but we’re not sure why
Benjamin Young: VC relies on LD signatures, and sealing ensures that the output graph (which is what is signed) is always the same.
Gregg Kellogg: but if they were defined in the scope of the client
Ivan Herman: which is why nuclear works
… they need a controlled environment and so they want to ensure that any previous state doesn’t impact the sig contexts
Benjamin Young: but their user data goes lower and the sig is the higher tree
… they are the first context not the last
Dan Brickley: w.r.t. signature over this, there are some old experiments around signing schemas (links near https://www.w3.org/2011/rdf-wg/wiki/TF-Graphs-UC/FOAF_Use_Case#Signed_RDF somewhere)
Adam Soroka: is this a real concern?
Adam Soroka: actually thinks this is real, just questions whether sealed contexts can fix it
Gregg Kellogg: yes, if you could somehow (maliciously) change the context earlier on, then all breaks down
Dan Brickley: this could introduce some serious problems with nuclear overriding needed things
Dan Brickley: not feeling it. this doesn’t seem to align with the rest of the web and its use of JSON-LD
Ivan Herman: this seems to be useful from the security/sig concerns
Benjamin Young: but if you expanded everything then you wouldn’t have the problem
Ivan Herman: but can I add signatures to a context? verifying authenticity of a context
Action #3: create context signature issue, related to #20 (Rob Sanderson)
Gregg Kellogg: if we have a signature on a given context, has integrity, etc. - but that’s different from a new context
Adam Soroka: both could be valid or one/both could be malicious
Ivan Herman: the whole thing is a security thing that has to be built on top of JSON-LD
Benjamin Young: the people that wanted it aren’t here, so let’s not touch it right now. If we get it wrong, it will make things worse
… we need some use cases
Leonard Rosenthol: If we’re worried about signing things, we should do it consistently.
… there’s a mechanism in HTML to verify a linked resource - sub resource integrity - can we apply that there as well?
Benjamin Young: https://www.w3.org/TR/SRI/
Benjamin Young: https://w3c-dvcg.github.io/http-signatures/
Dan Brickley: (there’s some relationship between the subresource integrity and webpackaging that I don’t totally understand)
Leonard Rosenthol: also resource integrity might be useful here as well
Benjamin Young: https://github.com/WICG/webpackage
Dan Brickley: what happens when we come to a future TPAC and everyone has their own signatures :(
… integrity and signatures aren’t in the scope of this group
Gregg Kellogg: this seems like something for HTML or Web Platform
Benjamin Young: put some links in the chat for various other options
… LD signatures might be part of DID but they are debating it there
… so let’s kick the can over to them and see what happens.
Ivan Herman: that being said, SRI is so simple that we might consider how we could use it for contexts.
… it certainly doesn’t solve anything, but it would be useful
Benjamin Young: there may be some issues about exactly what you are validating/checking.
… there are some known issues, but we may be able to work around those with some specific info about JSON-LD
Leonard Rosenthol: SRI is just hashing a “binary stream”
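As a rough sketch of what "hashing a binary stream" implies for contexts: an SRI-style integrity value is an algorithm name followed by the base64 digest of the exact bytes served. The helper name `sri_value` and the sample context bytes are assumptions for illustration; nothing here is proposed JSON-LD syntax.

```python
import base64
import hashlib

def sri_value(data: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI-style string: '<algorithm>-<base64 digest of the bytes>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

# Two serializations of the same context, differing only in whitespace.
compact = b'{"@context":{"name":"http://example.org/name"}}'
pretty = b'{\n  "@context": {"name": "http://example.org/name"}\n}'

print(sri_value(compact))
print(sri_value(pretty))
# The values differ, because the hash covers bytes rather than the parsed JSON.
print(sri_value(compact) == sri_value(pretty))
```

This is one reason the byte-hashing approach and a graph-level signature answer different questions about what exactly is being checked.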
Leonard Rosenthol: we need an action item to create an issue around use of SRI
Gregg Kellogg: thinks we discussed this last TPAC
Ivan Herman: how would the syntax work?
Gregg Kellogg: yes, we’d have to work on that
Proposed resolution: Defer #20 for TAG review for cross-specification issues of signing on the web ; confer with security regarding SRI and credentials regarding LDS vs byte hashing (Rob Sanderson)
Leonard Rosenthol: +1 to proposal
Ivan Herman: +1
Rob Sanderson: +1
Benjamin Young: +1
Adam Soroka: +1
Gregg Kellogg: +1
Benjamin Young: https://www.w3.org/TR/2014/WD-SRI-20140318/
Benjamin Young: wanted to point out that RFC as well which is much better than the current SHA expression as used in SRI
… not our problem and lots of solutions
Harold Solbrig: +1
Simon Steyskal: +1
Resolution #2: Defer #20 for TAG review for cross-specification issues of signing on the web ; confer with security regarding SRI and credentials regarding LDS vs byte hashing
3.4. What is ‘base’ for an embedded json-ld?
Rob Sanderson: link: https://github.com/w3c/json-ld-syntax/issues/23
Rob Sanderson: 23 is a long thread…
Leonard Rosenthol: ivan : the reason it is relevant, so far embedded JSON in HTML is not standardized (non-normative section)
Rob Sanderson: one thing to do is to make it normative in HTML. Great!
… but what is the base URL for data in the JSON(-LD) portion
Rob Sanderson: Previous resolution was: https://github.com/w3c/json-ld-syntax/issues/23#issuecomment-424764186
Rob Sanderson: simple solution is to say it’s the enclosing document
… but the clean solution would be related to the base element from the HTML (“base URL of the HTML containing the script element must be used as a base for the JSON-LD snippet”)
… we don’t care how it gets established, just that it is coming from the DOM (which is well defined)
… downside is that a JSON-LD processor would also require an HTML parser
… do we do the simple version, which is not complete in my view
… or do we do the full blown embedded version?
Gregg Kellogg: wasn’t there some conflict in the HTML spec about data blocks?
Dan Brickley: (allusion to https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags )
Ivan Herman: even if there is, it’s up to them to sort it out
… whatever the base value in the DOM is, it tells me the base value of the contained content
… danbri since you are one of the biggest users of the embedded use case
… we’d like your help to help us decide
Dan Brickley: if you can help me find some simple cases, I’d be happy to run this through our parsers
… and see how that goes
… we run ours through a headless browser for a few seconds
… but that doesn’t necessarily mean we’re using the DOM
Leonard Rosenthol: so, there’s using JSON-LD in any embedded context
… and there’s also the specific case of embedding in PDFs
… and the same issue applies
… so I think we need a consistent mechanism
… so we need a way to understand how all base URIs are calculated
… regardless of embedding
… that said, I think ivan’s suggestion that getting the base from the DOM is reasonable
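The difference between the two candidate bases discussed so far (the URL of the enclosing document versus a base declared in the HTML) can be sketched with ordinary URL resolution. The URLs and the helper `resolve_id` are hypothetical, and the sketch assumes the base value has already been extracted from the document.

```python
from typing import Optional
from urllib.parse import urljoin

def resolve_id(relative_id: str, document_url: str,
               base_href: Optional[str] = None) -> str:
    """Resolve a relative identifier against the document URL or a declared base."""
    # A declared base may itself be relative to the document URL.
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, relative_id)

document_url = "https://example.org/articles/2018/post.html"
base_href = "https://cdn.example.net/data/"  # hypothetical value of a base element

# Simple option: the enclosing document is the base.
print(resolve_id("people/alice", document_url))
# DOM option: the declared base wins.
print(resolve_id("people/alice", document_url, base_href))
```

The same relative identifier ends up naming two different resources, which is why the choice affects existing deployments.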
Benjamin Young: danbri I have run a few examples but the base isn’t taken into account by the tools that I have available
Gregg Kellogg: is that true for RDFa?
Benjamin Young: I don’t believe so
… if I start with an HTML page, put in some JSON, and then try to resolve stuff and the base is wrong
… but if we change it automatically, that could be a compatibility issue
… I did some experiments to see how it would work
… it is a change to the past and complicates the present.
Benjamin Young: we also have the email use case with a different base
… MSFT has their own support in outlook for JSON-LD
Dan Brickley: are we aware of other use cases for base in script?
Gregg Kellogg: turtle is defined as allowing to be expressed there
… but it does not state anything about base
Ivan Herman: turtle is as underdefined as JSON-LD is for now
… we are in a sense the only one I know of where this issue arises on the Recommendation track
Leonard Rosenthol: what about CSS?
… that calculates base using the DOM base
Ivan Herman: true, but I have another
… gkellogg help me out
… isn’t it correct that in RDFa the base affects the results?
Gregg Kellogg: yes. or xml:base
… but that is specifically within the HTML context
… and microdata also
Ivan Herman: but that also means that if I look at it from schema.org
… then the RDFa and the JSON-LD can’t inherit the same base
Gregg Kellogg: but it’d be the same situation with RDF/XML, etc.
… and I’d wager that schema.org/search-engines are not in a place to support any of this right now
Gregg Kellogg: I’d like to bring up @lang
… the element inherits a language
… from the surrounding document
… are there cases where that affects the CSS or other inlined non-markup content?
… this is where we would say RDF syntax is clear
… if it has a language, it’s a string
… and gets @lang
… but that is explicitly running within the DOM
Ivan Herman: how these inherit is clearly defined in RDFa
… here it’s not defined and unclear
… but to the user there may be expectations about inheriting from the context
Gregg Kellogg: I guess my feeling is different
… especially if you’re using some sort of HTML editor
… putting lang=”jp” or whatever on the document does affect all the content in it, and would affect RDFa
… but to put a code block in there that is now affected by the language properties of where it’s embedded
… to have that behavior change
… if I process it raw, but now if I put it in an HTML document and process it there
… that would not meet people’s expectations
Benjamin Young: people’s expectations…at Wiley our goal is to create a single merged graph from a single publication
… in none of this work, did anyone express a desire or interest to deal with either base or lang inheritance
… because it is scary!! It is so unclear as to the rules esp. when combining stuff from all over the place
… and no one would be happy!
… in my CMS life, things come from all over the place and get reused in various ways, possibly in multiple languages, and expectations aren’t clear
… the way things work now is happily stupid
Rob Sanderson: I wanted to talk about “least surprise”
… first danbri, I wanted to talk about microdata, rdfa, and json-ld in the same page
Dan Brickley: I wish we did
Rob Sanderson: so let’s assume you do, which of these proposals would you want?
… do you want base/lang/etc mixed across all of them?
… or is it type-local where the JSON-LD document must restate its own values for those?
Dan Brickley: so, I did a quick look around stackoverflow, etc.
… well, they never say URI, but we know they mean it deep in their heart
… so, if we were a bit more careful about specifying some of this it would be clearer for them
… people do have a trend toward wanting to type less to get the same result
… just as we were trying to get the near-twins microdata and RDFa to get along
… then JSON-LD came along
… one of the reasons publishers like JSON-LD is that you can inject different bits from everywhere
… and it’s kinda carnage
… and a really fabulous HTML page written by the indieweb community could be amazing
… and then you could inject some JSON-LD from wherever, however horrible
… but it wouldn’t affect your beautiful HTML
… and that leans it toward being a self-contained block
… if you want to be explicit, put full URLs in
… which you may at least want to do when you’re testing
Ivan Herman: that’s worse than what we propose
… if the block is resolved against the URL of the document or from the base of a containing document
Leonard Rosenthol: for PDF in terms of how things work today
… PDF derives it from those of HTML
… if it’s from a file system, it’s the URI from the file system
… if it’s from the Web, then it’s that URL
… from which it was served
… if you fetched it with a script, then it’s from that context
… and it has its own base tag
… which works just as HTML’s
… embedded JSON-LD acts like a linked resource
… so the base is the base of that thing–we don’t inherit it from the parent document
… not saying that’s right or wrong
… it’s just how it is today
Benjamin Young: There is no precedent for anything in HTML (e.g. a script block) inheriting from the larger doc
… it may be interesting to explore that, but it’s not a JSON-LD thing, it’s a markup thing.
… HTML says, “Anything in a data block is just text content”
Dan Brickley: Bound to mention WebComponents and custom elements here. In the same territory.
… JSON-LD could very well show up inside those guys
… suspicion is that they’re using some kind of inheritance. If we don’t do that here, convergence in time gets harder
Benjamin Young: WebComponents are different, the base is about “where is this running?” not “where is this downloaded?”
… they use the current page, now.
… JS and HTML are really a unit now. It’s a different relationship than between HTML and JSON-LD
Ivan Herman: Perhaps this gets kicked up to TAG?
… this is a Web Architecture issue.
Benjamin Young: Okay to punt!
Gregg Kellogg: That’s what they are for.
Dan Brickley: see https://developers.google.com/web/updates/2015/03/creating-semantic-sites-with-web-components-and-jsonld for some experiments from Google a few years back, exploring some of the schema/jsonld/webcomponents possibilities.
Benjamin Young: https://www.w3.org/TR/html5/semantics-scripting.html#data-block
Ivan Herman: We have to answer this question to make embedded JSON-LD normative
Proposed resolution: Defer and kick up to the TAG for comment as to a view from the web architecture perspective (Rob Sanderson)
Ivan Herman: and the HTML spec doesn’t answer this question
Adam Soroka: +1
Rob Sanderson: +1
Simon Steyskal: +1
Gregg Kellogg: +1
Leonard Rosenthol: +1
Ivan Herman: +1
Harold Solbrig: +1
Benjamin Young: +1!!!
Resolution #3: Defer and kick up to the TAG for comment as to a view from the web architecture perspective
Ivan Herman: if a feature that wasn’t normative becomes normative, then yes, things will change
Benjamin Young: but JSON-LD’s greatest usage right now is the non-normative use!
… and we shouldn’t break what’s running.
… I want this to make more sense, but this is too dramatic a change too quickly.
Action #4: reach out to the TAG re #20 (Rob Sanderson)
Action #5: reach out to the TAG re #20 (Benjamin Young)
Dan Brickley: so https://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents is basically 4 lines of spec plus an example. We got a lot of deployment out of it, at least…
Ivan Herman: I will not object to closing this, but the TAG discussion should be recorded as part of this discussion
Dan Brickley: The problems at Google with this kind of thing are dwarfed by the basic data quality problems
Dan Brickley: parting request: could someone help me get a minimal pair of examples for the HTML base (and language) decision, that I can share with Google colleagues
3.5. Sealed contexts (redux, with people from the Verifiable Claims WG joining)
Rob Sanderson: Link: https://github.com/w3c/json-ld-syntax/issues/20
Dave Longley: we wanted to talk about onboarding allies from the JSON world
… a number of W3C specs have adopted JSON-LD, to get extensibility and interoperability
… however, a number of non-RDF people are only going to use this data as plain JSON
… not apply any JSON-LD algorithm
… Most of those standards require that their context is the first one in the @context list;
… and have the rule that terms from their context should not be overridden.
… Result: plain JSON interpretation will differ from JSON-LD interpretation.
Rob Sanderson: if the JSON-only processor does not use the context at all, where’s the risk?
Dave Longley: E.g. Verifiable Credentials defines a ‘description’ field;
… if someone uses an additional context redefining ‘description’ to mean something else,
… a pure-JSON tool will still interpret that as the VC description field.
… The ‘sealed context’ feature is required to make JSON-LD-based formats robust,
… reducing the gap between JSON and JSON-LD.
… JSON people understand how JSON-LD can help interoperability,
… rather than having anyone defining their own schema,
… but they don’t have an RDF background.
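The gap described in this turn can be made concrete with a small sketch. It assumes a simplified "last definition wins" rule for an array of contexts and uses made-up IRIs; `VC_CONTEXT` is a stand-in, not the actual Verifiable Credentials context, and `iri_for` is a hypothetical helper.

```python
from typing import Optional

# Stand-in for a standard's own context (not the real VC context).
VC_CONTEXT = {"description": "https://example.org/vc#description"}
# Hypothetical third-party context that redefines the same term.
EXTENSION = {"description": "https://example.org/other#description"}

def iri_for(term: str, contexts: list[dict[str, str]]) -> Optional[str]:
    """Look up a term in an array of contexts; later definitions override earlier ones."""
    iri = None
    for ctx in contexts:
        if term in ctx:
            iri = ctx[term]
    return iri

doc = {"@context": [VC_CONTEXT, EXTENSION], "description": "A degree"}

# A plain-JSON consumer never reads @context and assumes the standard's meaning.
plain_json_meaning = VC_CONTEXT["description"]
# A context-aware consumer sees the overriding definition instead.
json_ld_meaning = iri_for("description", doc["@context"])

print(plain_json_meaning)
print(json_ld_meaning)
```

Both consumers read the same key and value, yet they disagree about which property it denotes.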
Ivan Herman: the idea of a ‘sealed context’ is to prevent people from overriding terms in the first context.
… Wouldn’t it break the expectation of someone publishing their own data with their own vocabulary?
Dave Longley: this only applies in the case you are using an array of contexts.
Christopher Allen: In constructing VC examples or DID examples,
… I ran into a problem linked to the underlying graph model, that only Dave or Manu could explain.
… In VC, I want to be able to put some metadata of my own,
… something that’s not standardized or even ready for standardization,
… how do we put it in a way that we are not messing up other things?
Dave Longley: many people don’t get that curly braces in JSON-LD mean that you are talking about a node,
… and attributes are about the links in this graph;
… those people tend to produce data that make no sense for people who understand those principles.
Benjamin Young: I came to JSON-LD from an RDF-aware background
… I want to make things clearer,
… but not at the price of hiding completely the graph model from Web developers.
Adam Soroka: +1 to bigbluehat
Benjamin Young: Anecdote: in some plain-JSON formats, the same attribute ‘name’ does not expect the same type at different levels of the tree. You don’t have that in JSON-LD.
Christopher Allen: We are at a key point where a lot of people are using JSON-LD.
… We don’t want them to decide that they will just do JSON and forget about the LD.
Gregg Kellogg: would the solution be a set of best practices?
Ivan Herman: Question: who is expected to seal a context? The author of the context or the author of the data?
Dave Longley: This is an open question.
… The notion of sealed context is one way to hide the graph model.
… Best practices, and anything that makes people aware that there is something beyond JSON, is a non-starter.
Benjamin Young: for you, sealed contexts only apply for arrays of contexts?
Dave Longley: yes, in all the use cases that I know of.
… If someone publishes a new context which *extends* another one,
… they can decide to override terms of the inherited context.
… As this would be a totally different thing from the POV of pure-JSON.
Ivan Herman: Would it make sense to introduce a JSON datatype?
… I could put a bunch of JSON in the JSON-LD, that would be interpreted as an RDF literal with the JSON datatype.
Benjamin Young: but then you would have problems for signing, because you couldn’t determine the exact value of that literal (re. new lines, spaces…).
Dave Longley: yes, that raises a number of other issues.
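The signing concern with a JSON-valued literal can be sketched as follows: two texts that parse to the same JSON value hash differently unless they are first normalized. The `normalize` function below (sorted keys, no insignificant whitespace) is only an illustrative assumption, not a scheme the group discussed or adopted.

```python
import hashlib
import json

def digest(text: str) -> str:
    """SHA-256 over the UTF-8 bytes of the literal's lexical form."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def normalize(text: str) -> str:
    # Hypothetical normalization: sorted keys, no insignificant whitespace.
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))

compact = '{"a":[1,2],"b":1}'
spaced = '{\n  "b": 1,\n  "a": [1, 2]\n}'

# Same JSON value once parsed.
print(json.loads(compact) == json.loads(spaced))
# Different digests over the raw text.
print(digest(compact) == digest(spaced))
# Matching digests only after agreeing on a normalization.
print(digest(normalize(compact)) == digest(normalize(spaced)))
```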
Benjamin Young: One use case in VC is to be able to express an inner graph, with its own context,
… without compromising the outer graph with the VC metadata.
… In the old days, we would have used namespaces.
… Why not use “vc:” in front of the terms you want to keep uncompromised?
Dave Longley: The prefix wouldn’t survive some JSON-LD applications,
… and would produce something different from the POV of plain-JSON.
Rob Sanderson: Is it the expansion that’s the issue? Or the compaction as well?
… Is it only about the label or the rest of the definition (e.g. “@type”: “@list”)?
Dave Longley: the full term definition has to be a constant.
Ivan Herman: in the first round, let’s stick to sealing contexts, not individual term definitions
… What we are trying to do is to add a “modifier” on the “@context” link.
Adam Soroka: The notion of wrapping is important in this use case.
… Is “sealing” not a special case of prioritization,
… which might have more complex requirements?
Dave Longley: ActivityPub also has requirements for priority.
… Their answer is “if you need priorities, use full-fledged JSON-LD”
Dave Longley: re. VC, we could ask people to include their own @context,
… but that could not work in all cases.
Benjamin Young: pchampin suggested earlier to require that the sealed context be at the end,
… so it will effectively not be overridden with the current rules.
Gregg Kellogg: yes, but putting the extension before the extended context would be odd
Rob Sanderson: if terms are marked sealed, and the extension redefines that term,
… the overridden term will not be sealed, which can be detected by a validator
… This could mitigate the problem.
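A validator along these lines might look like the sketch below. The `"sealed": true` marker inside a term definition is one of the hypothetical syntaxes floated in this discussion, not an existing JSON-LD keyword, and `overridden_sealed_terms` is an illustrative name.

```python
from typing import Any

def overridden_sealed_terms(contexts: list[dict[str, Any]]) -> list[str]:
    """Report terms that an earlier context marked as sealed and a later one redefines."""
    sealed: set[str] = set()
    violations: list[str] = []
    for ctx in contexts:
        for term, definition in ctx.items():
            if term in sealed:
                # A later context touches a term that was sealed earlier.
                violations.append(term)
            if isinstance(definition, dict) and definition.get("sealed") is True:
                sealed.add(term)
    return violations

base = {
    # Hypothetical sealed term definition.
    "description": {"@id": "https://example.org/vc#description", "sealed": True},
    "name": "https://example.org/vc#name",
}
extension = {
    "description": "https://example.org/other#description",
    "name": "https://example.org/other#name",
}

print(overridden_sealed_terms([base, extension]))
```

Only the sealed term is reported; the unsealed `name` can still be overridden under the current rules.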
Ivan Herman: what if we have two sealed contexts in the list?
Dave Longley: then we would also need order priority…
Pierre-Antoine Champin: seems to me we’re discussing two different things with multiple contexts
… which are aware of the sealed context
… and it would therefore be carefully designed to not conflict
… the other scenario is a “wrapping” one
… where the contained statements are expected to not match the wrapper
Benjamin Young: regarding the wrapping scenario, JSON-LD has graph containers.
Dave Longley: VC is using that.
Benjamin Young: could we solve the problem by adding some mechanism at the graph container level?
Dave Longley: that would only work with named graphs. That might solve my problem, but maybe not others’ (activity streams).
Benjamin Young: following pchampin’s idea that there are two different categories, maybe two different solutions are the way to go.
Benjamin Young: https://github.com/w3c/json-ld-syntax/issues/87
Ivan Herman: to wrap up, the options that we considered are
… 1/ modify the semantics of graph containers
… 2/ having a “sealed”: true attribute in the context
… 3/ having a “sealed”: true within term definitions
… 4/ having the equivalent of a “rel=” attribute on the “@context” link
Gregg Kellogg: when you seal, are you sealing the local context or the active context?
Gregg Kellogg: seal and integrity should be treated separately
Benjamin Young: why not just go for putting the sealed context at the end?
Gregg Kellogg: because the following contexts will usually rely on the definitions in the first (sealed) context.
… e.g. after importing the schema.org context, I know I can use the xsd prefix.
Rob Sanderson: ref: https://iiif.io/api/presentation/3.0/#45-linked-data-context-and-extensions
4. Joint meeting with WOT WG
Ivan Herman: No separate minutes taken, should be available from the WOT WG
Rob Sanderson: It seems like we should raise the rdf:HTML incompatibility to a higher power
Rob Sanderson: As ... </span>" should work, IMO
Action #6: raise a meta issue about BIDI and langString / HTML (Rob Sanderson)
5. Resolutions
- Resolution #1: Mark bnode as property as a Feature At-Risk for the next WD, with a view towards marking as Obsolete with a warning in 1.1
- Resolution #2: Defer #20 for TAG review for cross-specification issues of signing on the web ; confer with security regarding SRI and credentials regarding LDS vs byte hashing
- Resolution #3: Defer and kick up to the TAG for comment as to a view from the web architecture perspective
6. Action Items
- Action #1: add privacy and horizontal review to guiding principles (Rob Sanderson)
- Action #2: create issues for ContextList and DocumentLoader editorial changes (Rob Sanderson)
- Action #3: create context signature issue, related to #20 (Rob Sanderson)
- Action #4: reach out to the TAG re #20 (Rob Sanderson)
- Action #5: reach out to the TAG re #20 (Benjamin Young)
- Action #6: raise a meta issue about BIDI and langString / HTML (Rob Sanderson)