JSON-LD F2F Meeting, 2nd day — Minutes

Date: 2018-10-26

See also the Agenda and the IRC Log

Attendees

Present: Adam Soroka, Rob Sanderson, Benjamin Young, Gregg Kellogg, Simon Steyskal, Ivan Herman, Harold Solbrig, Dan Brickley

Regrets:

Guests: Eric Prud’hommeaux, Jean-Yves Rossi, Antoine Roulin, Luc Audrain, Dan Brickley, Hadley Beeman

Chair: Rob Sanderson, Benjamin Young

Scribe(s): Rob Sanderson, Adam Soroka, Benjamin Young

1. Issues

1.1. blank node ids for graphs

link: https://github.com/w3c/json-ld-api/issues/26

Benjamin Young: white board from this morning: white board content

Eric Prud’hommeaux: the issue to talk about api#26 …

Eric Prud’hommeaux: … Gregg wrote it up. The issue is for @container: @graph — this property creates a blank node that ends up being a graph name for the embedded triples.
… so if we look at something without @graph, and we have a tree like […], we could access it like var.input.value
… In triples it turns into a graph b0 foo:input b1 ; b1 foo:value "x"
… We could get that with a graph path similar to the code
… Which is the same as the sparql ?a foo:input ?b ; ?b foo:value ?v
… In my proposal, If you create a graph it still ends up traversable
… but currently we end up with b0 input b1 ; b2:value "x" — it’s disconnected
… The things we normally use to get around don’t work
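
[Illustration added to the minutes] A minimal Python sketch of the traversal problem Eric describes. Triples are plain tuples, the blank node labels follow the minutes, and the helper name `follow` is an assumption made for illustration. The sketch ignores which graph each triple sits in, so it only shows whether a path from the outer node still reaches the value.

```python
# Triples as (subject, predicate, object); labels b0, b1, b2 follow the minutes.
connected = {("b0", "foo:input", "b1"), ("b1", "foo:value", "x")}
disconnected = {("b0", "foo:input", "b1"), ("b2", "foo:value", "x")}


def follow(triples, start, path):
    """Return the set of objects reached from `start` by following `path`."""
    frontier = {start}
    for predicate in path:
        frontier = {o for (s, p, o) in triples if s in frontier and p == predicate}
    return frontier


print(follow(connected, "b0", ["foo:input", "foo:value"]))     # {'x'}
print(follow(disconnected, "b0", ["foo:input", "foo:value"]))  # set()
```

In the disconnected case the path stops at b1, because the triple carrying the value hangs off b2.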

Adam Soroka: We need to make the bnodes the same?

Eric Prud’hommeaux: Yes. We would otherwise need higher level logic to merge them
… but we could just re-use the bnode
… this was used for verifiable claims. The bit that was in the input was only a tiny bit of stuff. So walking around in a large graph looking for a triple was easy as the wrapper was very small, but that isn’t always the case
… you would also otherwise need application level logic, or you can’t query
… to traverse the data, you need to know it’s valid

Adam Soroka: And hence ShEX issues

Eric Prud’hommeaux: Yes, or SPARQL queries.

Eric Prud’hommeaux: The downside is that the blank node is the name of the graph, and a node in the outer graph
… this could be the default behavior. There might be other ways to connect them in the future
… need to get the same level of access in the RDF as in the JSON
… one more thing, the way to connect the nodes with more configuration and more of a pain, one construct is to say that b1 has a focus of b2
… that has a lot of reuse, but would mean writing into the rdf namespace
… e.g. in clinical data. Here’s a pairing of a graph name and a focus node

Eric Prud’hommeaux: Objection is typically that if there’s a property, is it about the graph or about the node

Gregg Kellogg: implies the name of the graph has meaning

Eric Prud’hommeaux: Yes, but it’s a blank node
… but what happens if it has @id? I think the answer there is both a node name and a graph name

Rob Sanderson: What’s the range of foo:input?

Eric Prud’hommeaux: union of named graph and the graph node class
… which doesn’t bother logicians, but does bug engineers

Gregg Kellogg: Not the use of every graph container
… where the graph appears with the value of a statement and doesn’t have their own declarative statement
… should we just use the graph name as the subject in that case
… which would allow for the follow your nose

Eric Prud’hommeaux: everything inside the input, all of those triples have the same subject

Adam Soroka: Trying to think of situations where I would want either way

Gregg Kellogg: You could just declare the subject

Benjamin Young: That maps cleanly to the expectation when used; once typed out I expect it to work like that
… to end up detached would be bad.
… it breaks round tripping.
… there’s times when you use @graph like a packaging format
… here’s a bundle of stuff

Adam Soroka: as Gregg says in that case put in an explicit subject

Benjamin Young: In @container: @graph the implication is that they’re connected
… once you lose that, you’d just stay in JSON

Rob Sanderson: so the proposal is that for @container: @graph, when the subject is not explicitly set, then the default is to reuse the blank node
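
[Illustration added to the minutes] The rule as restated here could be sketched in Python as follows. This is not the algorithm text from the spec; the function name, the `foo:` prefix and the blank node labels are assumptions for illustration. Quads are (subject, predicate, object, graph), with None standing for the default graph.

```python
from itertools import count


def graph_container_quads(inner_node, reuse_graph_name):
    """Quads for {"input": inner_node}, where "input" is a graph container.

    Hypothetical helper: handles one flat inner node with string values only.
    """
    labels = (f"_:b{i}" for i in count())
    outer = next(labels)       # the node that carries foo:input
    graph_name = next(labels)  # implicit name of the contained graph
    if "@id" in inner_node:
        subject = inner_node["@id"]  # explicit subject always wins
    elif reuse_graph_name:
        subject = graph_name         # proposal: share the bnode
    else:
        subject = next(labels)       # status quo: fresh bnode
    quads = [(outer, "foo:input", graph_name, None)]
    for key, value in inner_node.items():
        if key != "@id":
            quads.append((subject, f"foo:{key}", value, graph_name))
    return quads


print(graph_container_quads({"value": "x"}, reuse_graph_name=True))
print(graph_container_quads({"value": "x"}, reuse_graph_name=False))
```

With reuse, the inner subject is _:b1, the same label as the graph name; without it, the inner subject is _:b2 and nothing in the default graph points at it.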

Ivan Herman: I have a more general uncomfortable feeling. We introduce another micro-rule. They all make sense by themselves, but when you pile them up you get a language that’s difficult to understand
… we don’t take a simpler approach

Eric Prud’hommeaux: It seems the argument is more persuasive in the other direction

Ivan Herman: I don’t go into the particular issue, but that we just got as a proposal — if this and that and that, then …
… this is the proposal, it’s not a straightforward thing
… this is what I don’t like
… we pile up lots of these things and end up very complex.
… We should talk about URI resolution. e.g. with vocab and this and that. 90% of the people on the call didn’t follow what Gregg was explaining
… not anything wrong with what Gregg says, the reasoning steps are all okay by themselves, it’s the overall thing that becomes complicated
… the containers in 1.0 were used only for one thing, now we add a lot more
… we continue to do that. Don’t want to get to the technical details for this issue, just at the overall pattern

Eric Prud’hommeaux: I think the argument here is that the status quo is harder to explain, more surprising

Benjamin Young: Current situation is accidental.
… this proposal seems more natural

Adam Soroka: there’s some tension between avoiding surprise and keeping things easy to learn. In this case there is complexity, but it’s less surprising

Ivan Herman: As a zero level question, why do we need the container graph?
… Some community needs something, so we add a new quirk

Benjamin Young: @graph gets everyone’s hackles up. So we got @container: @graph
… from a JSON developer’s perspective, they need tools to get from the tree to the graph structure
… need to not annoy both groups
… this one to me resolves an issue from the RDF side

Benjamin Young: https://w3c.github.io/json-ld-syntax/#ex-85-implicitly-named-graph

Simon Steyskal: Going through the spec, in example 85, for graph containers. I wasn’t sure about the original sample
… it shows two graph objects, but if you look at the statements in the playground, it’s not what the original version has. They don’t match the expanded version in the spec
… the playground already does this reuse

Adam Soroka: People might be reliant on the feature?

Rob Sanderson: what does the spec say now?

Gregg Kellogg: it makes you create a new blank node

Ivan Herman: I repeat what I said in the issue comments :-( From the point of view of a consistent view of how JSON-LD behaves, what is done today is the right thing to do
… the various things that a container contains are pieces of graphs. Inside are different nodes. So reusing the same bnode internally and for the graph, I understand it’s handy, but it does not fit the model for the JSON-LD world
… we could hack it around with a micro-rule, but from a JSON-LD consistency PoV it’s not right
… not a formal objection but I disagree

Adam Soroka: There’s a lot of opportunities — there’s other ways to do it

Eric Prud’hommeaux: If you could parameterize the behavior and let the user decide whether they get this behavior or the other

Ivan Herman: We have the syntax in the example. We can name the bnode explicitly

Gregg Kellogg: what does container: graph mean? You’re putting a box around some of the information so that it’s part of a separate graph.
… but it means that input has a value that is a named graph.

Ivan Herman: That’s what it means

Adam Soroka: You’re right, but eric is not asking to uniformly conflate them

Ivan Herman: then a separate syntax?
… we would have two types of containers, one graph container behaves as it should, and another that does something extra
… it pulls in the name into the internals of the graph

Adam Soroka: It’s a very common idiom

Ivan Herman: I understand. The problem is that we always follow perfectly valid rules, but we need to look at the overall result
… for many people JSON-LD is very scary because it’s so complicated

Eric Prud’hommeaux: But has better adoption

Ivan Herman: we do something very strange — and maybe we need to acknowledge it — we work with people from all corners of JSON usage and try to push them into the linked data world
… so you might lose the LD people as JSON-LD becomes an incredible mess
… you have people working with patterns of usage, but if I come from the LD world and just want to use JSON-LD as a serialization, and I know what I’m doing, then for me the usage is very complicated
… I don’t think in the fixed patterns, I just want to put a graph and get unexpected results

Eric Prud’hommeaux: In this case you have a disconnected graph, and you can do that with the expanded form

Rob Sanderson: [… more similar discussion …]

Rob Sanderson: So the alternate would be to have an explicit link. Would that be automatic, or put into the data?

Gregg Kellogg: I don’t think you can have the named graph in the source and the graph?
… within the named graph you have a triple whose subject is the graph
… can create a statement with a blank node, and the meaning lies with the predicate
… would not want to automatically introduce it

Eric Prud’hommeaux: When you have a graph that has two of those, what does it mean?
… it’s unattractive, and I’ve done it but was an interim measure

Gregg Kellogg: if we keep the status quo, and the name of the graph is not visible, the implications for writing a shape for trying to match things

Eric Prud’hommeaux: There’s a step where you collect things. But when you get to the internal graph you’ve already collected it. You’d need to do cycles of gathering and validating
… without some predictable connection, there isn’t a way to do it without a procedural language

Adam Soroka: You have to assume validity in order to validate it

Eric Prud’hommeaux: Can do various things

Adam Soroka: But it’s application level knowledge

Eric Prud’hommeaux: Yes, you’d have to customize a lot of stuff
… here you do an unbound sparql query
… which is considerably more expensive

Adam Soroka: you still might need to apply application knowledge

Eric Prud’hommeaux: Could find nodes that don’t have inbound links, but you can’t assume that’s always the case with inverse properties etc

Rob Sanderson: (restates problem)

Ivan Herman: It comes from @container: @graph. As a value, I have an object whose keys refer to something specific
… a language refers to the language of the string
… but the graphs are very different beasts. They must have an identifier, or we generate them one
… it’s different from language, so maybe the container model is not fit for that purpose
… we must talk about identifiers and how they’re used elsewhere
… when I use the container for a language, it’s simple
… it’s by the natural language
… container is a way to categorize certain things, and they become keys
… I have a bunch of strings with a category, the language
… I create an object that uses the category as a term

Adam Soroka: Containers as maps

Gregg Kellogg: Not all though

Ivan Herman: Yes, @container: @list is a very different animal

Eric Prud’hommeaux: Not that it’s a graph, just the container
… if you want to build a named graph then have a different construct

Gregg Kellogg: Can have a map of graphs with @id
… an array of keys

Ivan Herman: should take a step back to look at containers and mapping
… is it possible to have a clearer model and separate the two things
… and then come back to it if there’s a more natural way to model it
… if it’s a blank node, then I can assign and reuse
… user has the choice to reuse.

Gregg Kellogg: If you use a graph id map, then they have to name them explicitly
… VC and WoT are in a similar situation, I think

Ivan Herman: Yes, but do we now add another special quirk??

Adam Soroka: Depends how many people are interested in it

Ivan Herman: Then we need a template language
… Propose to leave this alone for a little and look at containers in general

Adam Soroka: And “path” is in here too now (see discussions with WoT), for things people are asking for

Ivan Herman: Could use different term, they’re not containers like list or set

Gregg Kellogg: Could introduce @map

Ivan Herman: and then add in obsolete terms for indexing

Adam Soroka: Seeing patterns, and then clarifying how to get them into the syntax
… hence microrules

Gregg Kellogg: Also about the interpretation of the value space

Rob Sanderson: Exactly equivalent to https://github.com/w3c/json-ld-syntax/issues/77

Gregg Kellogg: raising warnings :( and makes algorithms harder

Rob Sanderson: priority of constituencies puts algorithms very close to the bottom

Ivan Herman: Can have a raise warning or not flag in the API
… algorithms will be slightly more complicated, but only affect 5 or 6 people

Adam Soroka: And we probably know most of them

Eric Prud’hommeaux: Regardless of how you construct the syntax, need to deal with nesting in JSON

Gregg Kellogg: There’s the expectation of connectivity

Eric Prud’hommeaux: Relatively simplistic user, but that’s typical. If it’s more nuanced, I want the default to not produce pathological graphs

Gregg Kellogg: If we created a new @map thing and put graph / id maps in there, so would have a reduced use case for @container, and we’re back to the same issue
… container is a graph, and you’re in an implicitly named graph. Now where are the rest of the things?

Adam Soroka: syntactic mechanism. If containers were minimized, could be nicer. If we could add metadata to containers, we could maybe add the information. But would need very strong notion of containers

Eric Prud’hommeaux: That’s a step in the right direction
… trying to deal with existing sem web … two camps. People who abuse the node to be the graph name. And then there’s people who keep them separate.
… but there’s a mechanism to connect them
… trick is normally HTTP fragments. Use the # and then HTTP connects them
… those two camps are not going to come together
… at least half the people are going to be miffed
… so putting in controls will help

Ivan Herman: more inclined to look at something more complicated, but long term more powerful, and accept that we need a transformation / template language

Ivan Herman: We see a user community that uses a template as that’s how they think. We try to come up with syntactic quirks so the templates fit in the model
… that’s where we get in trouble. If we had some transformation language, it could help.
… not sure it’s realistic, and not familiar with framing details
… can that be added to framing model? Not a rec, so don’t have backward compatibility restriction
… if we do something there, that would mean a cleaner separation
… if this is taken up by a frame and uses the same bnode. Can express it in JSON-LD. It’s all doable already.

Eric Prud’hommeaux: have about 100 hours thinking on this in ShEX. Both dealing with a case where there’s an algorithmic mapping between a graph node and a node in the graph
… need to get from one to the other
… expressivity we discovered we needed was at a minimum to chop off or add a hash based identifier
… for the range 14 folks
… ability to say it’s the same
… and then as you work down into the people who have pipeline techniques, you end up with regexps
… that lets you use node identifiers that are relative to the base
… two nodes that are different but related
… regexps look at the graph labels
… to deal with existing data
… question is how much you want RDF data to drive this.

Adam Soroka: And the other extreme is JSON devs who are told they have to do something. Some things don’t make any sense at one or the other end of the spectrum

Eric’s examples: EricP's examples

Proposed resolution: add a feature at risk that the implicitly identified graphs will share the bnode with the unidentified member of the graph, on the grounds that the user community most in need of this would expect it, and the community that would be horrified by it better understands the solution of explicit naming (Rob Sanderson)

Rob Sanderson: +1

Ivan Herman: +0.0000001

Simon Steyskal: +1

Gregg Kellogg: +1

Harold Solbrig: +1

Resolution #1: add a feature at risk that the implicitly identified graphs will share the bnode with the unidentified member of the graph, on the grounds that the user community most in need of this would expect it, and the community that would be horrified by it better understands the solution of explicit naming

Rob Sanderson: Assuming +1s from Adam and Benjamin

Adam Soroka: +1

2. Disambiguate uses of @type #77

Simon Steyskal: link: https://github.com/w3c/json-ld-syntax/issues/77

Ivan Herman: can @container be used with @type to separate string, dates, etc?
… if so that is again an argument to deconflate all the meanings of @container

Gregg Kellogg: might make sense but isn’t very useful
… the datatype for all variations would be rdf:HTML, so you wouldn’t be able to also work by languages

Ivan Herman: right. but even gkellogg had to think about whether you can use @type with @container, which tells us that we need to simplify
… a container-mapping for @datatype: it’s not clear whether it would be useful

Rob Sanderson: if rdf:Literal is the datatype…
… if you had some HTML, some strings, maybe you can sort it out then

Ivan Herman: so mapping via datatype is questionable

Gregg Kellogg: I understand the proposal to be a new keyword @datatype to be used in contexts to signal datatypes, as @type does today
… letting us mark that usage of @type as obsolete

Ivan Herman: even today, you could use it via aliasing.

Gregg Kellogg: compaction wouldn’t do what you want if you did that

Ivan Herman: we have an example from Publishing WG where the alias would be useful, and round tripping not necessary

Proposed resolution: add @datatype for use with value nodes that would be a synonym only in those situations for @type (Rob Sanderson)

Rob Sanderson: +1

Ivan Herman: +1

Adam Soroka: +1

Harold Solbrig: +0

Simon Steyskal: +1

Gregg Kellogg: +0.5

Adam Soroka: Assuming bigbluehat is +1

Resolution #2: add @datatype for use with value nodes that would be a synonym only in those situations for @type

3. allow relative IRIs for @vocab

Simon Steyskal: link: https://github.com/w3c/json-ld-syntax/issues/72

Gregg Kellogg: in JSON-LD, vocab-relative and document-relative IRIs are resolved differently. we’ve already looked at this problem
… and offered the ability to set @vocab=""
… which allows vocab to be resolved against the document base
… motivated at least in part because in other RDF formats, that distinction doesn’t exist
… so there was a parity issue against other serializations
… this issue goes further, and lets @vocab get set to any relative URI, which would then be evaluated against the document base
… the proposal includes that if a @vocab is already set and a new relative @vocab comes along, one simply string-appends the new one to the old one

Rob Sanderson: in the case of a base that came from HTTP with a # on the end, that would get lost

Gregg Kellogg: this also addresses the problem that Manu raised in the context of blank-node-properties.

Rob Sanderson: if you set vocab to ../# and you had example.org/ns then you get example.org/ns../#

Ivan Herman: as an editorial matter we must make very clear that this is string concatenation, not IRI concatenation

Rob Sanderson: are there good rules for determining relative vs. absolute IRIs?

Ivan Herman: Look for the scheme

Ivan Herman: I am almost sure that the URI spec defines that very clearly

Rob Sanderson: but this could be a security problem if a malicious actor sets a CURIE prefix of “http” to some malicious address

Ivan Herman: also the same thing with base

Gregg Kellogg: We can’t really know

Rob Sanderson: we can just advise people of the security concerns

Action #1: Rob Sanderson to make a security consideration issue re relative IRI concatenation

Ivan Herman: do we check for “acceptable” scheme?
… what about Javascript URIs (bookmarklets)

Rob Sanderson: "@vocab": "javascript:"

Action #2: Rob Sanderson to create security consideration re javascript URIs and relative IRIs

Gregg Kellogg: we don’t now check for defined schemes

Benjamin Young: it’s the responsibility of the document loader to worry about this
… it could just choose not to resolve troubling URIs

Ivan Herman: so you put together a URI from the JSON-LD (could happen in many ways).
… at that level, do we add a security check?

Benjamin Young: that’s the job of the person using the URI

Rob Sanderson: Or rather "@vocab": "javascript:document.alert('hi!');"

Benjamin Young: this isn’t a job for the syntax
… e.g. data: can hide anything
… data: URIs

Gregg Kellogg: this could be used for maliciousness, but it’s on the users of the URIs to be careful

Rob Sanderson: we don’t do path expansion, we’re doing string concat here. so we won’t catch a lot of stuff

Benjamin Young: but the advantage of string concat is that it supports non-pathy URIs

Gregg Kellogg: @vocab is used only for properties

several: and generally people don’t dereference properties, and nothing in our algorithms says that they should

Gregg Kellogg: we can modify the API to return only URIs of some form.

Benjamin Young: we can say “we never use these URIs, so there’s no concern w/i JSON-LD, but if users choose to use them, the usual concerns about URIs from the wild apply”
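
[Illustration added to the minutes] A minimal sketch of the consumer-side check implied here, assuming the application (not JSON-LD itself) decides which schemes it is willing to dereference. The allowlist and the function name are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: this particular application only ever fetches http(s) IRIs.
ALLOWED_SCHEMES = {"http", "https"}


def safe_to_dereference(iri):
    """Return True only when the IRI's scheme is on the application's allowlist."""
    return urlsplit(iri).scheme.lower() in ALLOWED_SCHEMES


# String concatenation of a hostile @vocab and a term yields a javascript: IRI.
hostile = "javascript:" + "name"
print(safe_to_dereference(hostile))                       # False
print(safe_to_dereference("http://example.org/ns/name"))  # True
```

The check sits with whoever uses the expanded IRI, which matches the position that the syntax never dereferences property IRIs.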

Gregg Kellogg: we might consider softening the current restrictions in 3.6.3
… to use IRI expansion and not string concatenation

Rob Sanderson: "@vocab": "http://example.org/ns/" and then "@vocab": "/"

Rob Sanderson: currently you get "http://example.org/ns//" which is unexpected for relative IRIs

Rob Sanderson: And the expectation would be "http://example.org/"
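
[Illustration added to the minutes] The difference between the two behaviours in Rob's example can be reproduced in Python. `urljoin` from the standard library stands in for IRI reference resolution; the helper name `concat_vocab` is an assumption for illustration.

```python
from urllib.parse import urljoin


def concat_vocab(current_vocab, new_vocab):
    """String concatenation: the new relative @vocab is appended as-is."""
    return current_vocab + new_vocab


previous = "http://example.org/ns/"

print(concat_vocab(previous, "/"))  # http://example.org/ns//
print(urljoin(previous, "/"))       # http://example.org/

# Terms are then appended to whichever @vocab was established.
print(concat_vocab(previous, "/") + "name")  # http://example.org/ns//name
```

Concatenation never interprets path segments, which is why it also works for IRIs that have no path structure at all.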

Ivan Herman: are we making the distinction between the two kinds of resolution disappear?

Gregg Kellogg: still the issue of concatenation vs. IRI resolution

Gregg Kellogg: how do I establish @vocab, vs. how do I use it?

Ivan Herman: if we are in @vocab we do string concat, it’s clean
… and users just have to know about that
… let’s not mingle concatenation and IRI resolution

Gregg Kellogg: we’re only interested in resolving @vocab when it is relative, that’s all

Ivan Herman: You’re right, but that’s about implementing the system
… I’m talking about end users
… if we just have string concat for @vocab, that’s clean, I understand that
… even if we do IRI resolution instead over here
… somewhere else. but any problems with doubled slashes, etc., are users’ problems to deal with

Rob Sanderson: if we went all the way to have @vocab itself computed and vocab terms resolved via IRI resolution, it breaks things
… so let’s stick cleanly to string concatenation

Gregg Kellogg: Ok, but these are different use cases.

Ivan Herman: there is an actual regexp to recognize absolute IRIs, so we can rely on that
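
[Illustration added to the minutes] A sketch of the "look for the scheme" test, using only the scheme production from the generic URI syntax (a letter followed by letters, digits, "+", "-" or "."), then a colon. It is not the full regular expression Ivan refers to, and the function name is hypothetical.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def looks_absolute(iri):
    """True when the string starts with something shaped like a URI scheme."""
    return bool(SCHEME.match(iri))


for candidate in ["http://example.org/ns/", "../#", "/", "ns/", "javascript:"]:
    print(candidate, looks_absolute(candidate))
```

A purely syntactic check like this cannot tell a real scheme from a context term that happens to be used as a prefix, which is the concern Rob raised about a prefix named "http".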

Harold Solbrig: doesn’t the CURIE spec speak to this?

Gregg Kellogg: we don’t use CURIEs, either

Harold Solbrig: why are we using something else?

Gregg Kellogg: every RDF serialization uses its own way to discuss short URIs
… (gkellogg then names more than you would think he could off the top of his head)

Rob Sanderson: we can’t necessarily construct all legit IRIs, but most of what we can’t is unusual enough not to be problematic
… if we stick with string concat, we avoid this
… if you construct stupid @vocabs, that’s your problem

Proposed resolution: Allow IRIs to be constructed by string concatenation with multiple @vocabs (Rob Sanderson)

Proposed resolution: Allow IRIs to be constructed by string concatenation with multiple @vocabs, with appropriate security consideration section (Rob Sanderson)

Adam Soroka: +1

Gregg Kellogg: +0.9

Rob Sanderson: +1

Simon Steyskal: +1

Adam Soroka: +1

Harold Solbrig: +1

Benjamin Young: +1

Resolution #3: Allow IRIs to be constructed by string concatenation with multiple @vocabs, with appropriate security consideration section

Ivan Herman: +1

Benjamin Young: the Chaucer quote fwiw https://english.stackexchange.com/questions/139073/meaning-of-if-gold-rust-what-shall-the-iron-do

4. expanding @vocab properties consistently

Rob Sanderson: link: https://github.com/w3c/json-ld-syntax/issues/56

Gregg Kellogg: the concern is that the same term expands differently in different places
… gkellogg then describes the example in the issue

Rob Sanderson: the example has two triples in it:

Benjamin Young: triples from the playground from the first example in #56

Benjamin Young: <http://example1.com/barney> <rdf:value> "the sidekick" .

Benjamin Young: _:b0 <http://example1.com/fred> <http://example1.com/barney> .

Benjamin Young: _:b0 <http://example1.com/fred> <http://example2.com/barney> .

Ivan Herman: I would say that in the second case (barney is a string) it’s the defn of barney that comes in
… nothing to do with that fred is defined by vocab

Gregg Kellogg: no, it does have to do with
… our rules are intentionally different for different kinds of values
… the complaint is that the rules for resolving IRIs vary
… in the example the value space of fred is vocab
… the fact that there is an entry in the vocab space for barney means nothing when we use barney as an ID.

Adam Soroka: that’s the two different spaces

Gregg Kellogg: unless it happens to be used in a CURIE

Ivan Herman: this is confusing

Gregg Kellogg: in 1.0 there is a distinction between the vocabulary (properties, types) and values, which come from the document.
… if you have a term in the vocab with a name that is used as instance data, in each position they are resolved differently.

Benjamin Young: a remix of the example of #56 using 2 contexts in http://tinyurl.com/y7wpjogw

Gregg Kellogg: we’re getting back into deep history of 1.0 here. We felt there are good reasons to make this distinction, but we may have been wrong, of course

Adam Soroka: ivan is now showing examples on the projector, see http://tinyurl.com/yclwe2ax

Gregg Kellogg: when you have a doc, you ascribe meaning by bringing properties and types to it.
… but you would want to resolve relative value URIs differently than vocab terms

Rob Sanderson: but this kind of problematic situation shouldn’t arise

Benjamin Young: sometimes you don’t get a choice

Rob Sanderson: you should be able to avoid it by inverting the order of multiple contexts, if they collide with each other in this way

Ivan Herman: in the first example with barney, http://example1.com/ is used for base and the first appearance of fred

Gregg Kellogg: the fact that barney is a term in the context doesn’t affect value expansion

Ivan Herman: if I change the base, various changes result
… we could pare down the example

Benjamin Young: this could confuse naive devs

Rob Sanderson: you should never get this to begin with
… what would be the use of interpreting barney as the same thing in the example
… you wouldn’t construct that data.

Benjamin Young: [uses his hands to construct a list of annotations]
… one past use case we’ve gotten is when you start with a list of ids, then you reference down into a list of inclusions.
… the use case from which this current issue came is gathering that up into a packaged graph with lots of graphs about the same stuff

Rob Sanderson: http://tinyurl.com/yb3olq5y

Benjamin Young: in JSON-LD we are constantly using many contexts, that raises the likelihood of this

Adam Soroka: more discussion about whether or not this situation can realistically arise

Rob Sanderson: the data in the OP example is meaningless

Adam Soroka: ivan brings more examples to the projector

Ivan Herman: here is a minimal situation to display the problem

Ivan Herman: A simpler version of the example:

{
  "@context": {
    "fred": {
      "@id": "http://a.b.c",
      "@type": "@vocab"
    },
    "barney": {
      "@id": "abc:def"
    }
  },
  "fred": "barney"
}

(see http://tinyurl.com/y78bzssn)
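
[Illustration added to the minutes] A simplified Python sketch of why "barney" expands differently depending on how "fred" is typed. It covers only the two branches discussed here: compact IRIs and a default @vocab mapping are left out, and the function name and document base are assumptions.

```python
from urllib.parse import urljoin

# Term definitions from the example above, reduced to what the sketch needs.
TERMS = {
    "fred": {"@id": "http://a.b.c", "@type": "@vocab"},
    "barney": {"@id": "abc:def"},
}


def expand_string_value(value, coercion, terms, base):
    """Expand a string value of a property coerced to '@vocab' or '@id'."""
    if coercion == "@vocab" and value in terms:
        # Vocabulary-relative: a matching term definition wins.
        return terms[value]["@id"]
    # Document-relative: term definitions are ignored, the base is used.
    return urljoin(base, value)


base = "http://example.com/doc"  # hypothetical document base
print(expand_string_value("barney", "@vocab", TERMS, base))  # abc:def
print(expand_string_value("barney", "@id", TERMS, base))     # http://example.com/barney
```

The same string lands in vocabulary space in the first call and in document space in the second, which is the distinction Gregg describes below.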

Adam Soroka: agreement that ivan’s new example minimally demonstrates the problem

Gregg Kellogg: why does one want to create terms with '@type':"@vocab" to begin with?

Ivan Herman: we should have a very clear example of this and explanation of this in the docs

Gregg Kellogg: I think we have examples

Ivan Herman: with lots of explanation

Gregg Kellogg: we have a document space and a vocab space, and the only time document terms get resolved in vocab space is in compact IRIs
… and when the term used as the property is defined as '@type':"@vocab"

Benjamin Young: this example wasn’t contrived—it came out of extant tooling

Adam Soroka: disagreement about how responsible users are for avoiding these kinds of collisions when using other people’s contexts

Ivan Herman: Patrick (the OP) is a very expert user. What about the less-expert users?
… it’s a return to what I earlier said: what if I am semweb literate, and I come to this, and I can’t make my graph work sanely in JSON-LD

Benjamin Young: even with Patrick’s expertise he hit this problem, and not by trying to develop a curious example
… it was normal work

Gregg Kellogg: this might stem from divergent missions
… surprising that someone trying to turn Turtle into JSON-LD doesn’t find that obvious

Ivan Herman: might need to push this into the primer
… the doc as is tried to make it usable for JSON devs
… we may need to write from the other direction
… for RDF folks coming to JSON

Rob Sanderson: in Patrick’s defense, there is no good example for this

Gregg Kellogg: I can try to write this up
… one concern: we are still getting to “own” JSON-LD
… we all need to get to an adequate mastery level to be able to talk about this
… we could try assigning issues to people who raise them

Proposed resolution: Make #56 editorial to add a "@type":"@vocab" example (Rob Sanderson)

Adam Soroka: +1

Proposed resolution: Make #56 editorial to add a "@type":"@vocab" example and otherwise no change (Rob Sanderson)

Rob Sanderson: +1

Simon Steyskal: +1

Benjamin Young: +1

Gregg Kellogg: +1

Adam Soroka: +1

Harold Solbrig: +1

Resolution #4: Make #56 editorial to add a "@type":"@vocab" example and otherwise no change

5. joint meeting with the Data Exchange WG

Ivan Herman: Minutes of the session in the DXWG Meeting Minutes

6. syntax/#8 - HTTP parameters for specifying context or frame

Benjamin Young: continuing conversation from #dxwg

Rob Sanderson: we can use space separated lists of URIs, so the profile="" parameter can contain both the compaction, etc. defined URIs
… as well as a JSON-LD context or frame URI
… however, these URIs might collectively get quite long

Adam Soroka: if you specify a URI don’t you want it to be compacted?

Rob Sanderson: it could be a frame

Gregg Kellogg: or a bomb

Benjamin Young: or a cat photo

Adam Soroka: do we want to have a default use like it’s assumed it’s a context URI?

Ivan Herman: what would this look like in practice?

Rob Sanderson: I’ll write it up…one second…

Rob Sanderson: ACCEPT: application/ld+json;profile="http://w3.org/ns/anno.jsonld"

Rob Sanderson: this would say, “I want this request compacted according to this context”

Gregg Kellogg: if that JSON-LD URL returns a context

Benjamin Young: don’t dereference things found in profile params
… because proxies

Gregg Kellogg: right, security concerns/needs should be expressed

Benjamin Young: we used the entire string as the “media type” for Web Annotation
… with no expressed intent or idea of dereferencing that context URI

Adam Soroka: I’m not too worried about big headers
… if you’re going to be this particular about it, then that’s a “cost” you’ll face

Rob Sanderson: and the multiple URIs in profile="" is not a problem?

Adam Soroka: no, I think that’s fine

Rob Sanderson: if you don’t care if you’re flattened or compacted, then don’t state it and use the context for whichever you’re returning

Benjamin Young: is there priority among these strings?
… or do they just work as flags?

Rob Sanderson: they’re just flags

Adam Soroka: do we want q="" parameters for this?

Rob Sanderson: I’d say it’s an error condition to have multiple contexts given

Benjamin Young: is the intent to continue–as Web Annotation did–as a single opaque string?
… or are we switching on each piece?
… mostly it’s a question for me about upgrading existing Web Annotation servers (and the like)
… so are they opaque “whole” strings? or are we taking them apart in some way?

Rob Sanderson: so, in the case of compaction at least, we could have a default that maps to compaction
… which is how profile=”” is used in the cases that we know
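A minimal sketch of the "flags plus at most one context or frame URI" reading discussed above. It assumes a quoted profile parameter holding space-separated URIs. The three flag URIs are the profile URIs registered for application/ld+json in JSON-LD 1.0. The function name and the choice to raise on multiple non-flag URIs are illustrative assumptions, not agreed spec text. Nothing here dereferences the URIs.

```python
import re

# Profile URIs registered for application/ld+json in JSON-LD 1.0; treated as flags.
KNOWN_FLAGS = {
    "http://www.w3.org/ns/json-ld#expanded",
    "http://www.w3.org/ns/json-ld#compacted",
    "http://www.w3.org/ns/json-ld#flattened",
}


def parse_profile(accept_entry):
    """Split one Accept entry's profile parameter into flags and one other URI.

    Hypothetical helper: assumes the profile value is double-quoted.
    Returns (set_of_flag_uris, context_or_frame_uri_or_None).
    """
    match = re.search(r'profile="([^"]*)"', accept_entry)
    uris = match.group(1).split() if match else []
    flags = {u for u in uris if u in KNOWN_FLAGS}
    others = [u for u in uris if u not in KNOWN_FLAGS]
    if len(others) > 1:
        # Mirrors the suggestion that multiple contexts are an error condition.
        raise ValueError("more than one context/frame URI in profile")
    return flags, (others[0] if others else None)


flags, context_uri = parse_profile(
    'application/ld+json;profile="http://www.w3.org/ns/json-ld#compacted '
    'http://w3.org/ns/anno.jsonld"'
)
```

Whether a lone non-flag URI defaults to meaning "compact with this context" is exactly the open question in the discussion; the sketch leaves that decision to the caller.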

Gregg Kellogg: are you going to help write these tests?

Rob Sanderson: sure. I’ll help write those tests
… and the paragraph/dependencies

Gregg Kellogg: and at the moment we don’t have complete HTTP header tests…but need those

Rob Sanderson: I know someone (bigbluehat) who’s done this already

Benjamin Young: we’re continuing the Web Annotation validation/testing tooling work at http://annotator.apache.org/

Gregg Kellogg: so, there are a few HTTP tests now
… they were originally in an .htaccess file
… but right now it’s left up to the implementations to act as if the headers were expressed in the responses or not
… the test suite should ideally run off a W3C mirror that implements these .htaccess rules

Benjamin Young: curious which bits you needed in .htaccess

Gregg Kellogg: there’s a remote access manifest in the API

Gregg Kellogg: https://github.com/w3c/json-ld-api/blob/main/tests/remote-doc-manifest.jsonld

Gregg Kellogg: for example https://github.com/w3c/json-ld-api/blob/main/tests/remote-doc-manifest.jsonld#L34
… this one tests an application/jldTest content type
… which in this case should fail
… others here deal with redirects
… also, another for the Link header https://github.com/w3c/json-ld-api/blob/main/tests/remote-doc-manifest.jsonld#L84
… for all of these we’re analyzing the JSON-LD result
… so we probably want similar ones analyzing the HTTP headers themselves

Proposed resolution: Add text to IANA Considerations explicitly allowing the request of a context or frame document, plus security consideration on whitelist, plus tests (Rob Sanderson)

Rob Sanderson: +1

Gregg Kellogg: +1

Adam Soroka: +1

Benjamin Young: +1

Ivan Herman: +1

Resolution #5: Add text to IANA Considerations explicitly allowing the request of a context or frame document, plus security consideration on whitelist, plus tests

7. Discussions with the TAG representative

Rob Sanderson: We want guidance from TAG because this crosses specs, specs that are at various process stages.

Gregg Kellogg: https://github.com/orgs/w3c/projects/4

7.1. What is ‘base’ for an embedded json-ld? (redux)

Gregg Kellogg: https://github.com/w3c/json-ld-syntax/issues/23

Rob Sanderson: the overall req is that for relative URIs, we must have a way to resolve them to absolute URIs.
… for JSON-LD that is provided by itself, the base is just the URI of the doc.
… so resolution is easy.

Rob Sanderson: but you can also embed JSON-LD w/in HTML
… there are various ways to determine the base of an HTML doc
… our issue is when we are resolving rel URIs inside a JSON-LD doc inside an HTML doc
… should we rely on that base or should we rely on the JSON-LD base (as it would otherwise be used)?

Ivan Herman: the base attribute in the DOM is inherited by the <script/> element that contains the JSON-LD
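To make the two candidate answers concrete, here is a small sketch of how one relative identifier resolves under each. The page URL, the base href value, and the relative identifier are all made-up placeholders; only the resolution behaviour of the standard library is being shown.

```python
from urllib.parse import urljoin

# Hypothetical values, for illustration only.
DOCUMENT_URL = "http://example.com/pages/volcano.html"  # where the HTML was fetched
HTML_BASE = "http://cdn.example.org/assets/"            # value of a <base href> in the page
relative_id = "things/zuidwal"                          # relative "@id" inside the script block

# Option 1: ignore the DOM, resolve against the document URL.
against_document = urljoin(DOCUMENT_URL, relative_id)

# Option 2: inherit the HTML base, as RDFa and Microdata do.
against_html_base = urljoin(HTML_BASE, relative_id)

print(against_document)
print(against_html_base)
```

The two results name different resources, which is why a processor that only pulls out the script block and one that runs in a DOM would disagree unless the spec picks one.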

Hadley Beeman: use case?

Ivan Herman: every schema.org doc (in JSON-LD) is embedded. Billions of pages contain this.

Dan Brickley: Case 1: https://gist.github.com/danbri/0168ebcd731493ee88c20407e6cdb8bf raw: https://gist.githubusercontent.com/danbri/0168ebcd731493ee88c20407e6cdb8bf/raw/bff3ae0bed3fac72972bf1ccccd4137517111bc4/gistfile1.txt HTML includes a div setting a base. vs Case 2: https://gist.github.com/danbri/b04d835ebf227f8994001b304ac59acf https://gist.githubusercontent.com/danbri/b04d835ebf227f8994001b304ac59acf/raw/3cbea88cd905665cd158b5988b1a8839362e4f57/gistfile1.txt

Dan Brickley: I forgot to relative-ize some URLs

Hadley Beeman: what are you going to do with the absolute URI?

Rob Sanderson: the point is to get to RDF, which works on absolute URI

Dan Brickley: We just want to make links!

Benjamin Young: the dom tree inherits that base tag, and the base tag also supplies the base URI for the HTML doc
… the prevailing encouragement is to ignore the DOM

Rob Sanderson: the counterexamples are Microdata or RDFa. RDFa uses all the base URIs.
… they use the DOM, both of them

Gregg Kellogg: it would be nice if all these things agreed

Benjamin Young: we already have explicit base for JSON-LD

Hadley Beeman: how much is JSON-LD content vs. machine-parseable behavior?
… it’s not easy

Benjamin Young: Is the JSON-LD integrated in the doc, or is the doc just a carrier?

Gregg Kellogg: you might have multiple script blocks.

Dan Brickley: I redid the testcases: https://gist.github.com/danbri/0168ebcd731493ee88c20407e6cdb8bf vs https://gist.github.com/danbri/b04d835ebf227f8994001b304ac59acf (see raw links for … raw links)

Hadley Beeman: it’s content vs. structured data
… danbri and I have been discussing this

Hadley Beeman: my instinct is to prefer that data that is machine-readable before human-readable shouldn’t be tied up with the DOM
… but I’d like to discuss with my TAG colleagues to avoid breaking other things

Dan Brickley: 3 points about this

Dan Brickley: 1. Google at least, likely other search engines, extract JSON-LD after running headless browsers.

Dan Brickley: 2. Here’s a headless browser called from python (firefox), which ignores <base href> and just pulls out <script>. It would be good for this not to get too much more complex.

Dan Brickley: 3. The big appeal of JSON-LD was its standalone nature.

Benjamin Young: this might be a good thing for RDF 2.0, but it would not be in a <script/> tag
… but a data block right now doesn’t inherit from the DOM

Hadley Beeman: if you come from the part of the Web where most stuff is in the DOM, then this would be natural for JSON-LD
… the desire to integrate in the future makes sense,
… but not doing that suddenly now

Dan Brickley: 1. we don’t want to get involved with browsery hell as a search engine.
… but what we do now is fragile
… but it’s easy for publishers
… we say that it works, but we’re not pushing it as the right thing to do
… 2. JSON-LD took off because it was self-contained
… unlike other structured data formats
… that self-contained character is the clincher here
… only do relative URIs if you know what you’re doing

Gregg Kellogg: this bleeds into other specs that might use <script> blocks
… how should rel URIs be interpreted there?
… I suspect that they should be resolved against the document URI

Benjamin Young: same deal if you have a CMS that is sticking metadata into pages—you should absolute-ize the URIs
… which we can do already in JSON-LD

Hadley Beeman: Create an issue on this please https://github.com/w3ctag/design-reviews

Gregg Kellogg: if we have input, we should give that to the TAG

Benjamin Young: could result in changes to the datablock spec
… or telling specs to advertise what they do here
… ultimately, if you don’t absolutize the URIs, you are entangling your data with the package

Benjamin Young: https://www.w3.org/TR/html5/semantics-scripting.html#data-block

Hadley Beeman: you used the word “package”

Benjamin Young: let me say “conveyance” instead

Dan Brickley: ..ooOO( if json ld is parsing from a.html and there’s an iframe of b.html, … do we have anything to say?)

Dan Brickley: what is the story with multiple script blocks?

Benjamin Young: we have other issues, but without this we can’t get to the others

7.2. Content addressable contexts

link: https://github.com/w3c/json-ld-syntax/issues/9

Rob Sanderson: equally or more cross-working-group to the last one
… VC et al have security concerns around contexts that must be loaded by a server before the server can understand the JSON-LD

Benjamin Young: you don’t have to fetch this every time you need it, you will cache contexts, etc.
… which comes around to how contexts (as web resources) change with time
… naming on the Web—easy?
… each of the communities using the tech can resolve these things. we need the TAG to advise on the nuances of things
… like content hashes, etc.

Rob Sanderson: you can solve it at 3 diff layers: HTTP, where you can use Last-Modified, ETags, etc.
… in the doc itself, which can declare versioning
… or in the URI, e.g. http://example.com/v2

Dan Brickley: but do you need a context at all?

Gregg Kellogg: you can put it into the instance doc itself, but it still exists.

Benjamin Young: {"@context": {"@vocab": "http://example.com/"}, "name": "made up vocab"}

Gregg Kellogg: The notion is that JSON-LD provides the context within which to interpret your JSON
… strings sometimes are dates sometimes IRIs, sometimes something else
… you need to both distinguish between these things but also allow for idiomatic JSON

Benjamin Young: which creates _:doc-id <http://example.com/name> "made up vocab" (in triples—where “doc-id” is completely random)
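The mapping from the one-line document to the triple can be sketched with a deliberately tiny toy. This is not a JSON-LD processor: it handles only a top-level @vocab and plain string values, and the function name and the fixed blank node label are assumptions made for illustration.

```python
import json


def toy_triples(doc_text, subject="_:b0"):
    """Toy illustration only: expand string-valued keys using a top-level @vocab.

    A real processor also handles term definitions, types, nesting, and more.
    The blank node label is arbitrary, as noted in the discussion.
    """
    doc = json.loads(doc_text)
    vocab = doc.get("@context", {}).get("@vocab")
    triples = []
    for key, value in doc.items():
        if key.startswith("@") or vocab is None:
            continue  # keywords are skipped; without a vocab nothing can be expanded
        if isinstance(value, str):
            triples.append(f"{subject} <{vocab}{key}> {json.dumps(value)} .")
    return triples


example = '{"@context": {"@vocab": "http://example.com/"}, "name": "made up vocab"}'
print(toy_triples(example))
```

With no @context at all, the toy emits nothing, because it has no vocabulary to expand keys against; what a conforming processor does with such a document is not covered by this sketch.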

Gregg Kellogg: the context explains some of the things that API docs might explain

Dan Brickley: I had thought that the context could be derived in some other way

Benjamin Young: you can.

Adam Soroka: (from an HTTP header)

Dan Brickley: is this wrong then?

Dan Brickley: { "@vocab": "https://schema.org/", "@type": "Volcano", "sameAs": "https://www.wikidata.org/wiki/Q2586153", "name": "Zuidwal volcano", "description": "The Zuidwal volcano is an extinct volcano in the Netherlands at more than 2 km (6,600 ft) below ground ..."}

Benjamin Young: latest syntax spec https://w3c.github.io/json-ld-syntax/

Rob Sanderson: the TAG issue for guidance is: this caching question is much broader than JSON-LD.
… many systems do this kind of “apply one resource to another to clarify”
… e.g a CSS stylesheet—if it changes over time the rendered HTML will look very different
… in a knowledge graph context, that kind of change is much more dangerous
… guidance for mitigating those concerns in the spec rather than leaving it up to broader discussion about change over time on the web

Benjamin Young: Link header looks like: Link: <context.jsonld>; rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"

Hadley Beeman: so remote change can affect local things—how is this different from old concerns about linking?

Dan Brickley: the WoT folks chose a syntax that requires this additional context resource—that’s what causes this to be different
… as long as you need the context doc to interpret and act on the data, you have opened this security hole

Hadley Beeman: so what about caching?

Adam Soroka: WoT operates on much longer timescales — years in many cases.

Dan Brickley: one thing we care about is being a planet-scale search engine, the other is about lightbulbs in your home

Benjamin Young: you can’t change what happens in the lightbulb without changing the context used for data there
… you can update when you update the code
… the URI of the context really is a URI (an identifier), not a location, but it ends up being used as one

Hadley Beeman: I still hear this either being “we have to solve the problems of building the IoT” or “the network knows something about me because of what I looked at”
… which are both larger problems

Dan Brickley: no, it’s different — because now the lightbulb is not a user agent, but it is broadcasting info about you

Benjamin Young: our API spec says you can implement document loading any way you want, yay!
… but that doesn’t really solve the problem
… we have this elsewhere, e.g. clickjacking
… the answer is Single Domain policies, which is utterly inflexible and unwebby
… we want to have ids that can be safely resolved
… we would like a webby trust model
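One way to read "implement document loading any way you want" is a loader that never touches the network and only serves contexts shipped with the code. The sketch below assumes that reading. The class name, method name, and example URL are hypothetical and do not correspond to any particular library's loader interface.

```python
class PinnedContextLoader:
    """Hypothetical loader: serves only contexts bundled ahead of time."""

    def __init__(self, pinned):
        # Map of context URL -> already-parsed context document.
        self._pinned = dict(pinned)

    def load(self, url):
        try:
            return self._pinned[url]
        except KeyError:
            # Refuse instead of fetching, so nothing is broadcast to remote servers.
            raise PermissionError(f"context not on allow-list: {url}") from None


loader = PinnedContextLoader(
    {"http://example.com/context/v2": {"@context": {"@vocab": "http://example.com/"}}}
)
context_doc = loader.load("http://example.com/context/v2")
```

This trades freshness for predictability: the device only sees a new context when its bundle is updated, which matches the "update when you update the code" remark but does not give the webby trust model asked for.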

Hadley Beeman: if we do that, we say this is no longer a user agent (the light bulb)
… you are making an explicit decision that has privacy ramifications; you need to be aware of that

Dan Brickley: (mitigation sketches being: parser reports, and web packaging for out-of-band context bundles with integrity checks, e.g. via homehubs etc)

Dan Brickley: I am unaware of any context that is useful and has gone unchanged for more than a few weeks
… we’re stuck between refreshing and getting the best new stuff
… and broadcasting your interests on the web
… but if not, you get out of date
… one thing would be for parsers to report what they’ve done

Rob Sanderson: (and we carry on)

Rob Sanderson: (everyone disagrees about the weeks lifecycle, e.g. activitystreams, annotations, ldp, etc)

Dan Brickley: there are a lot of use cases for web packaging, this might connect with that
… your home env might have gotten this through some kind of bundled thing at lower frequencies.

Benjamin Young: you could have an API for updating the lightbulb
… but that can’t work from a pull side, because you can’t guarantee that the identifier will resolve the same way on the open web over time
… bringing us to blockchains, hashing, etc.
… there are lots of potential solutions on the list
… web packaging is one potential solution

Rob Sanderson: do you understand the issue well enough for us to send a ticket to TAG?

Hadley Beeman: what’s the header for this?

Benjamin Young: maybe “integrity”

Hadley Beeman: please write this up with use cases
… because otherwise we’re staying hypothetical

Benjamin Young: there is no requirement to do remote context dereferencing
… but people will

Dan Brickley: there are the specs, then there are ways you really operate
… with JSON-LD, if you do it wrong, people flame you on the mailing list
… social pressure has forced us to become a component in a larger system—not cool!

Dan Brickley: we’d like people to get what they are expecting when they get contexts

Benjamin Young: many communities have this same problem
… with many different solutions
… that’s what makes it a TAG deal
… another big JSON-LD deployment is Mastodon (uses Activity Streams)

Adam Soroka: .. they haven’t updated their context for a long time

Benjamin Young: but it still works
… which is what makes this a Publishing WG problem—publishing that is broken because a system went down somewhere isn’t publishing as publishers understand it

8. document loading related proposals

Ivan Herman: I think it’s worthwhile to explore an “integrity” expression similar to SRI
… I don’t know how exactly it would be used at the API level
… I also have the expectation that other attributes with similar intent will come along
… but I feel that sealing is definitely not one of those
… and as an aside, our preference is to say that sealing is done within a context file for that context
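As a rough illustration of the SRI-like "integrity" idea, the sketch below computes and checks a digest string in the `<algorithm>-<base64 digest>` shape that Subresource Integrity uses. How such a value would be attached to a context reference or exposed at the API level is undecided in the discussion; the function names and sample bytes are assumptions.

```python
import base64
import hashlib
import hmac


def integrity_value(data, algorithm="sha256"):
    """Return an SRI-style string: '<algorithm>-<base64 of the raw digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_integrity(data, expected):
    """Check fetched bytes against an expected SRI-style value."""
    algorithm = expected.split("-", 1)[0]
    return hmac.compare_digest(integrity_value(data, algorithm), expected)


# Hypothetical context bytes, pinned at authoring time.
context_bytes = b'{"@context": {"@vocab": "http://example.com/"}}'
pinned = integrity_value(context_bytes)

ok = matches_integrity(context_bytes, pinned)
tampered = matches_integrity(context_bytes + b" ", pinned)
```

The check is over the exact bytes retrieved, so any change to the context document, including whitespace, would fail verification.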

Adam Soroka: I believe it’s only been context file authors who have asked us for sealing anyhow, is that correct?

Ivan Herman: yes. only context file authors
… and perhaps we should add to that issue that we could limit sealing to only a context file
… did we also talk about data documents?

Gregg Kellogg: I believe we talked about both
… we could do that for a context or a term definition

Benjamin Young: I still feel like this will result in unexpected behavior

Adam Soroka: syntax based security is false security

Gregg Kellogg: I’d just remove the sealed keyword from someone else’s context file if for some reason I couldn’t override it

Ivan Herman: so “sealing” does indeed give one a false sense of security

Gregg Kellogg: in Ruby there are private and protected methods
… but there’s also a send method that allows you to use either regardless

Ivan Herman: so. we have not made a decision on that
… but I am happy we have this on the minutes


9. Resolutions

10. Action Items