Re: Official response to RDF-ISSUE-132: JSON-LD/RDF Alignment from David Booth on 2013-06-08 (public-rdf-comments@w3.org from June 2013)

From: David Booth <david@dbooth.org>
Date: Sat, 08 Jun 2013 14:27:38 -0400
To: Markus Lanthaler <markus.lanthaler@gmx.net>
CC: 'public-rdf-comments' <public-rdf-comments@w3.org>
Message-ID: <51B3779A.7030503@dbooth.org>

Hi Markus,

On 06/08/2013 08:28 AM, Markus Lanthaler wrote:
> On Friday, June 07, 2013 1:55 AM, David Booth wrote:
>> On 05/21/2013 02:19 PM, Manu Sporny wrote:
>>> Hopefully it is clear that the decision to leave "based on RDF" out of
>>> the Linked Data definition was thoroughly and carefully considered. In
>>> the end, the group decided not to tie RDF and Linked Data together
>>> because it would be conflating a data publishing concept (Linked Data)
>>> with an abstract data model (RDF).
>>>
>>> In the end, the group decided against tightly coupling Linked Data and
>>> RDF because:
>>>
>>> 1. It would conflate two different concepts.
>>
>> It is extremely misleading to suggest that tightly coupling Linked Data
>> and RDF "conflates" two different concepts, when the fact is that Linked
>> Data -- in the established sense of the term -- is *based* on RDF.
>
> IMHO, RDF != Linked Data.

Correct.  Linked Data is a subset of RDF: all Linked Data is RDF, but 
not all RDF is Linked Data.

> Nothing in RDF requires IRIs to be dereferenceable
> - but of course you can use RDF to express Linked Data if you somehow
> communicate out-of-band that those *identifiers* in there are also locators.

Correct.  That's the difference between RDF and Linked Data: Linked Data 
requires that URIs be dereferenceable; RDF does not.
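
A minimal sketch of that distinction, in Python.  It assumes triples are 
plain tuples of strings, and it only checks that names are HTTP(S) IRIs; 
it does not actually fetch anything, so it is a rough proxy for 
"dereferenceable", not a test of it.  The function names and example 
IRIs are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urlsplit

# A triple is modeled as a (subject, predicate, object) tuple of strings.
# This representation is an assumption made for illustration only.
Triple = Tuple[str, str, str]


def is_http_iri(term: str) -> bool:
    """True if the term looks like an HTTP(S) IRI that could be looked up."""
    parts = urlsplit(term)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def names_are_lookupable(triples: List[Triple]) -> bool:
    """Check that every subject and predicate is an HTTP(S) IRI."""
    return all(is_http_iri(s) and is_http_iri(p) for s, p, _ in triples)


# Valid RDF whose names could be looked up on the Web.
linked = [("http://example.org/alice",
           "http://xmlns.com/foaf/0.1/knows",
           "http://example.org/bob")]

# Equally valid RDF, but the subject is a URN that cannot be dereferenced.
plain_rdf = [("urn:uuid:1234",
              "http://xmlns.com/foaf/0.1/name",
              "Alice")]
```

Both lists are RDF; only the first passes names_are_lookupable(), which 
is the sense in which Linked Data is a subset of RDF.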

>> It is clear from reading the JSON-LD group's discussion log
>> http://json-ld.org/minutes/2011-07-04/#topic-3
>> that the group wanted to avoid reference to RDF, and hence -- exceeding
>> its authority -- the group invented a new definition for "Linked Data"
>> to suit this purpose.  Some individuals even appear to have convinced
>> themselves that this new definition is the *real* definition of the
>> term!  It is not.
>
> I think we are talking about the text in the (non-normative) introduction.

Correct.

> Let me quote it:
>
>     Linked Data is a technique for creating a network of inter-connected
>     data across different documents and Web sites. In general, Linked Data
>     has four properties:
>       1) it uses IRIs to name things;
>       2) it uses HTTP IRIs for those names;
>       3) the name IRIs, when dereferenced, provide more information about
>          the thing; and
>       4) the data expresses links to data on other Web sites.
>     These properties allow data published on the
>     Web to work much like Web pages do today. One can start at one piece
>     of Linked Data, and follow the links to other pieces of data that are
>     hosted on different sites across the Web.
>
> Here are TBL's (current, 2009) Linked Data principles:
>
>    1) Use URIs as names for things
>    2) Use HTTP URIs so that people can look up those names.
>    3) When someone looks up a URI, provide useful information,
>       using the standards (RDF*, SPARQL)
>    4) Include links to other URIs. so that they can discover more things
>
> So I think all we are arguing about here is the "(RDF*, SPARQL)" in (3),
> right?

Yes.  We are arguing about the attempt to re-define the notion of 
Linked Data to not be based on RDF.

> Now let's look at the original 2006 version of the Linked Data
> principles as Kingsley proposed:
>
>    1) Use URIs as names for things
>    2) Use HTTP URIs so that people can look up those names.
>    3) When someone looks up a URI, provide useful information.
>    4) Include links to other URIs. so that they can discover more things.
>
> http://web.archive.org/web/20061201121454/http://www.w3.org/DesignIssues/LinkedData.html

Uh . . . did you read that entire document?  Please do.  Because unless 
one were intentionally exercising selective understanding, I do not see 
how anyone could honestly misread it so badly as to not realize that it 
is specifically talking about RDF and the Semantic Web, making reference 
to the way the HTML-based web works, and showing that the same 
principles of linking and dereferencing are needed for RDF and the 
Semantic Web.  The very first paragraph says:
[[
The Semantic Web isn't just about putting data on the web. It is about 
making links, so that a person or machine can explore the web of data. 
With linked data, when you have some of it, you can find other, related, 
data.
]]

And the second paragraph explicitly says: "for data they links  between 
arbitrary things described by RDF".  I don't know how he could have said 
it more clearly.

Claiming that that document in any way supports the notion that Linked 
Data is not based on RDF would be disingenuous to the point of being 
fraudulent.

>
> Surprisingly exactly that "(RDF*, SPARQL)" remark was missing when the term
> was coined.

Not surprising at all.  That document was clearly a rough draft.  It is 
not surprising that TimBL added more clarification as he edited it.

> We can continue forever to argue about whether it is needed or
> not. We can also argue whether it is possible to "provide useful
> information" by using an abstract data model, i.e., RDF. When you
> dereference a URI, you'll get back a representation which is in a concrete
> syntax. So, it would be more correct to say
>
>    3) When someone looks up a URI, provide useful information,
>       using a standard format which can be interpreted as RDF
>
> Would that add any value given that you can interpret (convert) every format
> to RDF? I doubt so. This group (myself included) is convinced that doing so
> would scare off a large portion of the target group, i.e., average web
> developers.

That is your motive, and I have some sympathy for it.  But it does not 
justify the attempt to re-define the meaning of such an important and 
well-established term.

>
>
>> The term "Linked Data" has a well-established meaning within the semantic
>> web community.  The JSON-LD group would be *misleading* the public by
>> stating or implying that Linked Data is not necessarily based on RDF.
>
> RDF is an abstract data model whereas Linked Data is a concept. Everything
> can be expressed in RDF. In that paragraph we are describing the concept for
> people not familiar with it. Clearly, the "semantic web community" is not
> the intended target group of that paragraph. Not with the best will in the
> world can I see how this is "misleading the public".

Claiming or implying that Linked Data is not based on RDF would be 
misleading the public.  It is factually untrue.

>
>
>> If certain members of the JSON-LD group wish to re-architect Linked Data
>> and the Semantic Web to be based on JSON instead of RDF, they are free
>> to make that *proposal* on their own time, but that is *not* how Linked
>> Data and the Semantic Web are currently architected, and that is not
>> what the RDF working group was chartered to do.  The working group was
>> chartered to "Define and standardize a JSON Syntax for RDF . . . an RDF
>> serialization":
>> http://www.w3.org/2011/01/rdf-wg-charter
>
> We are definitely not trying to re-architect Linked Data. What we are trying
> to do is to bring it to the masses. I think we agree that the semantic web
> community you are talking about has a miserable track record for doing so.
> Mentioning RDF in the first paragraph of the spec would certainly not help
> us in that regard. Unfortunately, a lot of people simply stop listening when
> they hear the three magic letters R D F.

I do not believe that mentioning that Linked Data is "based on RDF" will 
have a significant impact on the adoption of JSON-LD.

JSON-LD needs to stand on its own merits.  Tutorials can perfectly well 
talk about how to use JSON-LD without ever discussing bnodes, RDF/XML, 
or the semantics of interpretations.

>
> We just try to explain them the underlying principles in simple terms to get
> them interested and motivated enough to read the rest. The end of the spec
> makes JSON-LD's relationship to RDF crystal clear (IMO at least) and
> contains a whole lot of examples for people from the semantic web community
> already familiar with e.g. Turtle or RDFa. Those people don't need to read
> the introduction, they know the basics already.

I think it's good that it starts by explaining the underlying principles 
in simple terms.  I do not believe that that objective will be harmed by 
the three words "based on RDF".

>
>
>> Why does the definition of "Linked Data" matter so much?  Messaging
>> matters!  It can have a huge real-life impact.  (Colossal recent example
>> in politics: The messaging that President Bush used to justify starting
>> the Iraq war, which has ended up costing trillions of dollars and over
>> 100,000 civilians killed!)
>
> I just ignore this remark.
>
>
>> The coining of the term "Linked Data" by TimBL was the single most
>> important advance in messaging in the entire history of the Semantic
>> Web.  One of the biggest problems the Semantic Web had was the term
>> "Semantic Web" itself, because: (a) it is intimidating and confusing;
>> and (b) it is misleading, because people wrongly associate it with the
>> semantics of natural language processing.  It has been difficult over
>> the years to get the messaging simple and clear -- and the ugliness of
>> RDF/XML certainly didn't help -- and the term "Linked Data" helps
>> substantially.
>
> Exactly, it is a marketing term. Dan wrote an excellent piece on that so I
> won't rehash it here:
>
> http://lists.w3.org/Archives/Public/www-archive/2012Oct/0119.html

Yes, that's a very good post.  To quote: 'RDF effectively got rebranded 
again; this time as "Linked Data".'

>
> The truth is that people strongly associate RDF with RDF/XML. In fact, it is
> difficult to have conversations without conflating RDF the data model and
> its serialization formats.

It would be nice if JSON-LD could help dispel that misconception.  But 
to do so, people need to know that JSON-LD *is* a serialization of RDF.

>
>
>> If the JSON-LD spec were to adopt a definition of "Linked Data" that
>> differs in such a critical way from the established meaning of this
>> term, it would be misleading the public and would create confusion in
>> the community.
>
> Sorry, but I just can't see how it is doing that.

Okay, to be very clear:

1. Telling the public that Linked Data is not based on RDF would be 
misleading the public, because Linked Data -- in the original and 
well-established sense of the term -- very clearly *is* based on RDF.

2. That would create confusion in the community because people would 
then have differing notions of what is Linked Data.  Thus, when people 
talk about Linked Data, there would be confusion about what is meant. 
If the meaning is not kept clear, people may start claiming that 
arbitrary JSON is "Linked Data", or that spreadsheets that contain some 
URLs are "Linked Data", or that database tables that have foreign keys 
are "Linked Data", or HTML pages of census data with links to other 
pages are "Linked Data".

>
>
>> To be clear, the current resolution of this point is NOT satisfactory.
>>
>> A simple and neutral way to resolve this problem would be to just quote
>> TimBL's original definition of the term.  This is what other documents
>> have done, and would not require endless wordsmithing debates.  I
>> suggest doing that and linking to TimBL's original Linked Data document.
>> (Credit: thanks to Arnaud Le Hors for making this suggestion while we
>> were talking at SemTech.)
>
> I suppose by "TimBL's original definition" you don't really mean the
> original 2006 version, right?

I was referring to his completed version of that definition.  But as I 
pointed out above, the intent was also clear in the first draft.

>
>
>>> 2. It is the group's experience that Web developers have an aversion to
>>> RDF as a complex technology due to RDF/XML and other technologies that
>>> do not represent the current RDF world. It doesn't matter if these
>>> aversions are based on reality - the aversion exists, so we try to
>>> downplay RDF as much as possible in the JSON-LD spec.
>>
>> I agree with the goal of keeping it simple for Web developers, but I
>> think the downplaying has gone to the point of hiding it, and that is
>> harmful.  If developers' view of RDF is going to change, they need to
>> know that it *is* RDF that they are using when they use JSON-LD. If
>
> And you think that developers won't understand that from the last paragraph
> in the introduction
>
>     Developers that require any of the facilities listed above or
>     need to serialize an RDF graph or dataset [RDF11-CONCEPTS] in a
>     JSON-based syntax will find JSON-LD of interest.
>
> or any of the sections specifically discussing the relationship of JSON-LD
> and RDF?

No.  Would developers who "need to serialize an RDF graph in an XYZ-based 
syntax find XYZ-LD of interest"?  I guess, but that doesn't say that 
XYZ-LD is RDF.  If they knew that the "-LD" stood for "Linked Data" 
and that Linked Data is based on RDF, then they might guess.  But I think 
it should be stated clearly.

>
>
>> they see how easy it is to use JSON-LD, it will stand on its own merits,
>> even if it does say "RDF inside".  To my mind, the goal should not be to
>> *hide* the fact that JSON-LD is RDF, but to make JSON-LD 100%
>> usable by those who do not wish to learn anything *else* about RDF --
>> i.e., anything beyond what they learn in the JSON-LD spec.
>
> That's exactly what we try to do. By "hiding RDF" we try to increase the
> chances that they "see how easy it is to use JSON-LD" instead of stopping to
> read after the first paragraph because of an aversion to RDF.

If JSON-LD takes off it will be because of the benefit it provides -- 
not because of the inclusion or omission of the words "based on RDF".

>
>
>>> 3. There is no technical problem that is solved by referencing RDF in
>>> the definition of Linked Data.
>>
>> No, but as explained above, it is a very important messaging issue.
>>
>>> 4. If we were to add RDF to the definition of Linked Data, there would
>>> just be another set of objections to the inclusion of RDF in the
>>> definition of Linked Data.
>>
>> Then those objections should be addressed head-on anyway, because the
>> term "Linked Data" has an important and well-established meaning in the
>> community, and that includes the fact that Linked Data is RDF. Otherwise
>> those who wish to divorce Linked Data from RDF will be misleading the
>> public when they talk about "Linked Data" and mean something else, or
>> they talk about "conflating" Linked Data with RDF, when in fact Linked
>> Data *is* RDF.
>
> Linked Data != RDF. RDF without a single dereferenceable IRI is still valid
> RDF but it certainly isn't Linked Data by any means.

Correct.  Linked Data is RDF, but RDF is not necessarily Linked Data.

>
>
>>>> 2. Define a *normative* bi-directional mapping of a JSON profile to
>>>> and from the RDF abstract syntax, so that the JSON profile *is* a
>>>> serialization of RDF, and is fully grounded in the RDF data model and
>>>> semantics.
>>>
>>> We already do this here:
>>>
>>> http://www.w3.org/TR/json-ld/#transformation-from-json-ld-to-rdf
>>
>> No, it doesn't.  That section explicitly says: "This section is
>> non-normative".
>
> JSON-LD consists of two specs, the syntax spec and the algorithms and API
> spec. The normative transformation to RDF can be found here:
>
> http://www.w3.org/TR/json-ld-api/#convert-to-rdf-algorithm

It needs to be much clearer that the bi-directional mapping to/from the 
RDF abstract syntax is normative.  As I said in my last email:
[[
 > In discussing this at SemTech with Gregg Kellogg -- BTW, thanks for
 > getting together Gregg! -- he told me that it is the working group's
 > intent that JSON-LD be a *normative* serialization of RDF.  So if that's
 > the case, it sounds like this may be an editorial issue: the document
 > needs to be much clearer about how the relationship is normative.  At
 > present, it is not at all clear that it is normative.  Maybe what's
 > needed would be something as simple as "This section is non-normative.
 > However, JSON-LD is a normative serialization of RDF: the normative
 > relationship between the JSON-LD syntax and the RDF model is defined in
 > section @@@@."
]]

>
>
>>> There have been arguments in the past to specify an additional subset of
>>> JSON-LD that is a direct mapping to the RDF Abstract Syntax, but no one
>>> has provided a compelling technical reason to do so.
>>>
>>> Additionally, creating two profiles of JSON-LD could have worse
>>> consequences than the ones you outline in your e-mails. For example,
>>> some implementers may only implement the subset and not the full version
>>> of JSON-LD, which would create a really bad interoperability problem.
>>
>> The issue here was about alignment: JSON-LD saying that URIs "SHOULD"
>> (RFC2119) be dereferenceable, while RDF makes no such requirement.
>
> Exactly, and still you argue that Linked Data === RDF.

No, I have never argued that they are synonymous.  Linked Data is RDF, 
but RDF is not necessarily Linked Data.

>
>> However, in discussing this at SemTech with Gregg Kellogg and Arnaud,
>> Arnaud suggested that instead of defining a profile of JSON-LD that
>> drops the "SHOULD", it would be better to encourage RDF to *include*
>> such a "SHOULD".  I think that's a great idea.
>
> I tried that already, see: https://www.w3.org/2011/rdf-wg/track/issues/103

Good.  We're on the same side on that.  :)

>
>
>> To be clear, I withdraw my suggestion that a separate profile of JSON-LD
>> be defined.
>
> Great.
>
>
>>> The extra features in JSON-LD, such as blank nodes as graph names, are a
>>> requirement for the Web Payments work as well as the RDF digital
>>> signatures work. So, we can't remove them without causing damage to
>>> those initiatives.
>>>
>>> If an author wants to use a version of JSON-LD that is fully grounded in
>>> the RDF data model, they should not use the JSON-LD features listed in
>>> those bullet points, or they should convert their non-RDF data to
>>> something that RDF can understand (more on this below).
>>
>> It sounds like my suggestion to use skolemized URIs to avoid that
>> problem was not understood, so I'll try to clarify.  The point is to
>> ensure that JSON-LD is fully grounded in the RDF model, so that JSON-LD
>> truly *is* an RDF serialization.  To achieve that in cases where a naive
>> mapping from the JSON-LD syntax to the RDF model would produce a blank
>> node in a position that RDF does not allow, I was suggesting that the
>> JSON-LD spec *normatively* state that skolemized URIs MUST be used in
>> the RDF model in those places.  I'll explain more below about how those
>> skolem URIs are chosen.
>>
>>>
>>>> 3. Use skolemized URIs in the normative mapping to prevent mapping
>>>> JSON syntax to illegal RDF.
>>>
>>> This is already stated as an option in a normative section:
>>>
>>> http://www.w3.org/TR/json-ld/#relationship-to-rdf
>>>
>>> We do not make this mandatory because there are several other legitimate
>>> ways to convert blank nodes to something that RDF can interpret. For
>>> example: 1) normalizing and getting a hash identifier for the subgraph
>>> attached to the blank node property or blank graph, 2) creating a
>>> counter-based solution for blank node naming, 3) minting a new global
>>> IRI for the blank node, 4) transforming to a data model that allows
>>> blank node properties and blank graphs, etc. There is no single correct
>>> approach.
>>
>> That doesn't matter, as my next comment below will explain.
>>
>>>
>>> Additionally, skolemization will not work unless all systems exchanging
>>> the skolem IRIs do so in a standard way, and there is currently no
>>> standard way of skolemizing.
>>
>> *Some* standardization is needed, but it does not need to specify all
>> details about the skolem URI.  All that's really important is that: (a)
>> a skolem URI somehow be created; and (b) such skolem URIs can be
>> reliably *recognized* as skolem URIs.  Beyond that, it doesn't matter if
>> some implementations use counters, some use hashing techniques and
>> some use other techniques.
>
> Another important aspect that's missing in your list above is that skolem
> IRIs have to be unique and that's exactly what makes it so difficult to
> create them in a distributed system.

First, that's the implementer's problem.  Second, as long as the specs 
indicate that URI squatting is not okay, and the specs are followed, I 
don't see why it should be difficult.  It is easy enough to implement a 
counter.
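
A minimal sketch of the counter approach, in Python.  The /.well-known/genid/ 
path is the convention described in RDF 1.1 Concepts; the class name, 
the function name and the example.com authority are hypothetical.  The 
counter only guarantees uniqueness within one minter, under an authority 
that the implementer controls.

```python
import itertools
from urllib.parse import urlsplit

# Path prefix from the .well-known "genid" convention in RDF 1.1 Concepts.
GENID_PATH = "/.well-known/genid/"


class SkolemMinter:
    """Counter-based skolem IRI minter (illustrative only)."""

    def __init__(self, authority: str) -> None:
        # `authority` is a base such as "http://example.com" that the
        # implementer is assumed to control (no URI squatting).
        self.base = authority.rstrip("/") + GENID_PATH
        self._counter = itertools.count(1)

    def mint(self) -> str:
        """Return a fresh skolem IRI, unique within this minter."""
        return f"{self.base}{next(self._counter)}"


def is_skolem_iri(iri: str) -> bool:
    """Recognize a skolem IRI by its well-known path prefix."""
    return urlsplit(iri).path.startswith(GENID_PATH)


minter = SkolemMinter("http://example.com")
first = minter.mint()
second = minter.mint()
```

A hash-based minter could replace mint() without affecting 
is_skolem_iri(), since recognition depends only on the path prefix.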

>
>
>> The RDF 1.1 spec does specify how skolem URIs can be created so that
>> they can be reliably recognized -- by use of the .well-known
>> convention -- so the JSON-LD spec could reference this technique in
>> specifying how blank nodes are avoided in places where the RDF model
>> does not allow them.
>
> The general problem I have with this approach is that a skolem IRI just
> allows you to work around a limitation in a serialization format or a
> concrete implementation. In RDF, the data model, it is still a blank node
> (that's why it is important to be able to reliably recognize them). And
> again we conflate the data model and the serialization formats.

No, a skolem IRI emphatically is *not* a blank node in the RDF model. 
That's the whole point of using a skolem IRI instead of a blank node.

The reason it is important (in the JSON-LD case) to be able to 
recognize them as skolem IRIs is so that those skolem IRIs can be 
removed when RDF is serialized (back) to JSON-LD.
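
A minimal sketch of that reverse step, in Python.  It assumes terms are 
plain strings, that skolem IRIs follow the /.well-known/genid/ 
convention, and that the part after the prefix can be reused as a blank 
node label.  The function names are hypothetical and not part of any 
JSON-LD API.

```python
from urllib.parse import urlsplit

# Path prefix from the .well-known "genid" convention in RDF 1.1 Concepts.
GENID_PATH = "/.well-known/genid/"


def unskolemize(term: str) -> str:
    """Map a skolem IRI back to a blank node label; leave other terms alone."""
    path = urlsplit(term).path
    if path.startswith(GENID_PATH):
        # Reuse the suffix after the prefix as the blank node label.
        return "_:" + path[len(GENID_PATH):]
    return term


def unskolemize_triple(triple):
    """Apply unskolemize() to every term of a (s, p, o) tuple."""
    return tuple(unskolemize(term) for term in triple)


restored = unskolemize_triple((
    "http://example.com/alice",
    "http://example.com/.well-known/genid/b0",
    "http://example.com/bob",
))
```

Ordinary IRIs pass through unchanged, so only terms that were minted as 
skolem IRIs turn back into blank nodes.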

>
>
>> To be clear, the current resolution of this point is NOT satisfactory.
>> Please further consider the suggestion of requiring skolem URIs in those
>> circumstances.
>
> Are skolem IRIs blank nodes or not according to you? If so, how does it help
> to require them?

A skolem IRI is an IRI, and an IRI is not a blank node, so a skolem IRI 
is not a blank node either.  The idea is to use skolem IRIs *instead* of 
blank nodes, in positions where blank nodes are not allowed in the RDF 
model.  This helps because otherwise certain JSON-LD documents would not 
be valid RDF.   If JSON-LD really is a serialization of RDF, then it 
needs to be valid RDF.
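
A minimal sketch of that substitution, in Python, using the predicate 
position as the example of a place where RDF does not allow a blank 
node.  It assumes triples are tuples of strings, that blank nodes are 
written "_:label", and that the label can be reused in the skolem IRI. 
The function name and the example.com base are hypothetical.

```python
# Hypothetical skolem base under an authority the implementer controls.
GENID_BASE = "http://example.com/.well-known/genid/"


def skolemize_illegal_bnodes(triples):
    """Replace blank nodes that occur as predicates with skolem IRIs.

    A blank node that occurs as a predicate anywhere is replaced in every
    position, so that it still denotes one and the same node throughout.
    """
    # Labels of blank nodes used as predicates, which RDF does not allow.
    offending = {p for _, p, _ in triples if p.startswith("_:")}

    def fix(term):
        return GENID_BASE + term[2:] if term in offending else term

    return [(fix(s), fix(p), fix(o)) for s, p, o in triples]


generalized = [
    ("_:b0", "_:b1", "_:b0"),                      # blank node as predicate
    ("_:b1", "http://example.org/label", "_:b2"),  # same node as subject
]
valid_rdf = skolemize_illegal_bnodes(generalized)
```

Blank nodes that only appear as subjects or objects (_:b0 and _:b2 
here) are left alone, since RDF already allows them there.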

>
>
>>>> 4. Make editorial changes to avoid implying that JSON-LD is not RDF.
>>>>    For example, change "Convert to RDF" to "Convert to Turtle" or
>>>> perhaps "Convert to RDF Abstract Syntax".
>>>
>>> The group agrees with changing the title of the section to "Convert to
>>> RDF Abstract Syntax".
>>
>> Thank you.  But there are several other places also where the wording
>> implies that JSON-LD is not RDF.  Appendix C is rife with them. I
>> started to list them, but immediately ran into the problem that this
>> section -- particularly the part before C.1 -- needs to be rewritten
>> once JSON-LD is actually a normative serialization of RDF, and is fully
>> grounded in the RDF model.
>
> JSON-LD is not RDF. Neither is Turtle. Both are serialization formats with a
> mapping to RDF, an abstract data model.
>
>
>> The whole discussion of the JSON-LD data model as distinct from the RDF
>> data model also suggests that JSON-LD is not RDF.  It is also confusing
>> to define a JSON-LD data model in addition to a JSON-LD document's RDF
>> data model.  This confusion will be eliminated by making JSON-LD a
>> normative serialization of RDF, fully grounded in the RDF model.
>
> We added the Data Model section since the RDF WG asked us to do so. I don't
> see compelling reasons to revisit that decision.

I think the confusion that it adds, of having two 
separate-but-very-similar data models, is a compelling reason to revisit 
that decision.

Thanks,
David
Received on Saturday, 8 June 2013 18:28:11 UTC