Re: Fw: RDF/JSON from Sandro Hawke on 2013-04-27 (public-rdf-wg@w3.org from April 2013)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 26 Apr 2013 23:17:34 -0400
To: Arnaud Le Hors <lehors@us.ibm.com>
CC: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>, Martin Nally <martin.nally@gmail.com>
Message-ID: <517B434E.4010206@w3.org>
On 04/25/2013 12:11 PM, Arnaud Le Hors wrote:
> My colleague Martin Nally sent a response to Markus but because he's 
> not subscribed to the list his message was put on hold for moderation. 
> I thought this would only take a day or two but it's not happened so 
> I'm forwarding his message to avoid any further delay. I suggest you 
> copy him in any response though.
> Thanks.


Sorry about this - only WG members can post to the WG list; everyone 
else is supposed to post to public-rdf-comments.  That's a bit of a pain, 
but (ironically) it's supposed to help us make sure we're not forgetting 
to reply to any comments.   The main failing is that the reply sent to 
non-members says something else; I've put in a request to change that.

On the substance of the comment -- I find it rather compelling and am 
not sure how to proceed.   The bit about "Databases like to query on 
predicates" was particularly interesting....   An odd contrast to direct 
access from JS.

Arnaud/Martin, how would you ideally like to proceed on this?   Just 
make RDF/JSON a note?   Or something else?    Should the two be aligned 
in those "details" where they differ?  On the point about documentation, 
would it help if there were a simple, mostly self-contained section of 
the JSON-LD spec you could point to?

      -- Sandro

> --
> Arnaud  Le Hors - Software Standards Architect - IBM Software Group
>
> ----- Forwarded by Arnaud Le Hors/Cupertino/IBM on 04/25/2013 09:01 AM -----
>
> From: Martin Nally <martin.nally@gmail.com>
> To: Arnaud Le Hors <lehors@us.ibm.com>
> Date: 04/17/2013 12:21 PM
> Subject: Fwd: RDF/JSON
> ------------------------------------------------------------------------
>
>
> Markus,
>
> Thanks for your careful response to my code samples trying to explain 
> why RDF/JSON has been working better for us than JSON-LD. Let me first 
> give some background. The applications we have been writing have 
> logic-tier servers written in python (and some ruby) with 
> user-interfaces written in javascript running in the browser on PCs 
> and mobile devices. The JSON we are discussing is the JSON that flows 
> between the python/ruby and the javascript. Both sides are RDF-aware, 
> and many of our resources have multiple subjects. The JSON forms the 
> primary API of the servers, although you can also ask for the same 
> data in turtle or rdfa (we use the rdfa format for some use-cases in 
> our applications, but we are not currently using turtle). We do not 
> have a very long or broad experience - we are a prototyping team, not 
> (yet) a product team, and we have built up no more than a few 
> thousand lines of code in these applications.
>
> Our JSON was originally in JSON-LD format, in the sense that the JSON 
> we produced and consumed was valid JSON-LD, but we only produced and 
> consumed a very restricted subset of JSON-LD. For example, we did not 
> support contexts. What we used from JSON-LD was the basic organization 
> that can perhaps be summarized like this: [{'@id': S, P: O}]. I 
> hope that is clear - it is an array of "dictionaries" where one 
> dictionary entry is '@id' for the subject, and the other dictionary 
> entries provide the predicates and objects. The O is itself an array 
> of dictionaries, with the only valid keys in the dictionaries being 
> '@id', '@value' and '@type'.
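A minimal Python sketch of that restricted subset (Python because the servers described here are written in it). The example.org URLs and the helper names `find_node` and `objects` are hypothetical and not taken from the original code. With this shape, looking up a subject means scanning the array for the matching '@id', which is why code over it tends to accumulate small helper methods.

```python
# Hypothetical namespace; none of these URLs come from the original message.
EX = "http://example.org/"

# The restricted JSON-LD subset: [{'@id': S, P: O}], where each O is a list
# of dicts whose only keys are '@id', '@value' and '@type'.
jsonld_subset = [
    {
        "@id": EX + "bug/1",
        EX + "title": [{"@value": "Crash on save"}],
        EX + "assignee": [{"@id": EX + "person/alice"}],
    },
    {
        "@id": EX + "person/alice",
        EX + "name": [{"@value": "Alice"}],
    },
]


def find_node(doc, subject):
    """Return the node whose '@id' is subject, or None (linear scan)."""
    for node in doc:
        if node.get("@id") == subject:
            return node
    return None


def objects(doc, subject, predicate):
    """Return the list of object dicts for (subject, predicate), or []."""
    node = find_node(doc, subject)
    return node.get(predicate, []) if node is not None else []


print(objects(jsonld_subset, EX + "bug/1", EX + "title"))
# prints [{'@value': 'Crash on save'}]
```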
>
> We spent a few months building code this way, and it worked OK. We 
> also stored this format in MongoDB and queried on it successfully. I 
> cannot remember exactly what triggered the decision, but about 2 
> months ago, I decided to convert over to RDF/JSON format. Since 
> RDF/JSON has almost no options - in contrast with JSON-LD, which has 
> many - you don't need me to explain what we did much further, but for 
> completeness I will say simply that we moved to this data 
> organization: {S: { P: O}}. The O in RDF/JSON is very similar to the 
> JSON-LD version of O we used before - it differs only in the details.
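A minimal Python sketch of such a port, assuming the restricted subset described above: node-level keys other than '@id' are predicate URLs, and every '@value' is already a string. It maps each JSON-LD object dict onto the RDF/JSON object form, which uses 'type' ('uri', 'bnode' or 'literal'), 'value', and optionally 'datatype'. The function names are hypothetical.

```python
def object_to_rdfjson(o):
    """Convert one object dict from the JSON-LD subset to RDF/JSON form."""
    if "@id" in o:
        ident = o["@id"]
        # Blank node identifiers start with '_:' in both formats.
        kind = "bnode" if ident.startswith("_:") else "uri"
        return {"type": kind, "value": ident}
    converted = {"type": "literal", "value": o["@value"]}
    if "@type" in o:
        converted["datatype"] = o["@type"]
    return converted


def to_rdfjson(doc):
    """Turn [{'@id': S, P: O}] into {S: {P: O}}, merging repeated subjects."""
    graph = {}
    for node in doc:
        predicates = graph.setdefault(node["@id"], {})
        for key, objs in node.items():
            if key == "@id":
                continue
            predicates.setdefault(key, []).extend(
                object_to_rdfjson(o) for o in objs
            )
    return graph


EX = "http://example.org/"  # hypothetical namespace
doc = [
    {
        "@id": EX + "bug/1",
        EX + "title": [{"@value": "Crash on save"}],
        EX + "assignee": [{"@id": EX + "person/alice"}],
    },
]
graph = to_rdfjson(doc)

# Plain dictionary access (a[b]) replaces the subject-lookup helper.
print(graph[EX + "bug/1"][EX + "assignee"][0]["value"])
# prints http://example.org/person/alice
```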
>
> My experience with the port is that the complexity of our code came 
> down substantially. I would guess the code that manipulates RDF is now 
> in general maybe only two-thirds or three-quarters of what it had 
> previously been. Even where the complexity is similar, we now have the 
> advantage of using simpler language constructs (primarily dictionary 
> access like a[b]), where before we used a helper method. Not all our 
> helper methods are gone - for example we still have a helper method 
> whose meaning is "for a given predicate and object, return all the 
> subjects". It is possible that the simplification we experienced was 
> particular to our data. I think you are correct in saying that if our 
> representations were all single-subject, then we would not have found 
> RDF/JSON to be better - the JSON-LD data organization would have been 
> as good or better. Other than the fact that our resources often have 
> multiple subjects, I doubt there is anything very special in what we 
> are doing. I had hoped that my code samples might help explain why our 
> code got simpler, but I can see now that probably won't work.
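The helper that survived the port ("for a given predicate and object, return all the subjects") might look something like this over {S: {P: O}}. This is a guess at its shape, not the original code, and the names and data are hypothetical. It compares objects by exact dict equality, so 'lang' and 'datatype' must match as well.

```python
def subjects_for(graph, predicate, obj):
    """Return every subject S with obj among graph[S][predicate].

    graph is in RDF/JSON form {S: {P: [O, ...]}}; obj is an RDF/JSON object
    dict. Matching uses dict equality, so 'type', 'value' and any 'lang' or
    'datatype' keys must all agree.
    """
    return [
        subject
        for subject, predicates in graph.items()
        if obj in predicates.get(predicate, [])
    ]


EX = "http://example.org/"  # hypothetical namespace
alice = {"type": "uri", "value": EX + "person/alice"}
graph = {
    EX + "bug/1": {EX + "assignee": [alice]},
    EX + "bug/2": {EX + "assignee": [alice]},
    EX + "bug/3": {EX + "assignee": [{"type": "uri", "value": EX + "person/bob"}]},
}

print(subjects_for(graph, EX + "assignee", alice))
# prints ['http://example.org/bug/1', 'http://example.org/bug/2']
```

Forward lookups are single dictionary accesses, but this reverse direction has to scan every subject because the structure is keyed by subject, which is presumably why it remained a helper.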
>
> Interestingly, RDF/JSON turned out not to be helpful at all in JSON 
> databases. Databases like to query on predicates, and JSON-LD's 
> approach of making the subject be the value of the '@id' predicate has 
> worked much better for us, so our database format is still a very 
> restricted but valid JSON-LD format. (We have also used triples stores 
> - that is another conversation.) We have gone back and forth on 
> whether we prefer JSON-LD's format for the 'O' part or RDF/JSON's 
> version. We have seen advantages to both. When we put our JSON into 
> Lucene, JSON-LD's version of the O format was more convenient, because 
> it allowed us to easily tell Lucene to handle URLs differently from 
> regular strings. When we put the data into MongoDB, we did not need 
> this, and RDF/JSON's version of the O part looked better because it 
> did not require query writers to remember to use '@id' for URL-valued 
> properties and '@value' for everything else - you just always use 
> 'value'. This is particularly helpful if you don't know what type you 
> are querying on. Right now, we are using both 'O' formats, one for 
> MongoDB and the other for Lucene.
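A minimal Python sketch of the query-writer difference between the two 'O' formats. Records keep the subject as '@id' (the database format described above), and `matching_subjects` is a hypothetical stand-in for a database query, not MongoDB or Lucene API usage; its `key` argument plays the role of the field the query names inside each object. With the JSON-LD 'O', the key depends on whether the value is a URL; with the RDF/JSON 'O', it is always 'value'.

```python
def matching_subjects(records, predicate, value, key):
    """Return the '@id' of each record having an object o under predicate
    with o[key] == value (key mimics the field a query would name)."""
    return [
        record["@id"]
        for record in records
        if any(o.get(key) == value for o in record.get(predicate, []))
    ]


EX = "http://example.org/"  # hypothetical namespace
alice = EX + "person/alice"

# Same data, subject stored as '@id', with the two 'O' formats.
jsonld_o = [
    {
        "@id": EX + "bug/1",
        EX + "title": [{"@value": "Crash on save"}],
        EX + "assignee": [{"@id": alice}],
    },
]
rdfjson_o = [
    {
        "@id": EX + "bug/1",
        EX + "title": [{"type": "literal", "value": "Crash on save"}],
        EX + "assignee": [{"type": "uri", "value": alice}],
    },
]

# JSON-LD 'O': the writer must pick '@id' for URLs, '@value' otherwise.
print(matching_subjects(jsonld_o, EX + "assignee", alice, "@id"))     # ['http://example.org/bug/1']
print(matching_subjects(jsonld_o, EX + "assignee", alice, "@value"))  # [] (wrong key for a URL)

# RDF/JSON 'O': always 'value', whatever the type.
print(matching_subjects(rdfjson_o, EX + "assignee", alice, "value"))            # ['http://example.org/bug/1']
print(matching_subjects(rdfjson_o, EX + "title", "Crash on save", "value"))     # ['http://example.org/bug/1']
```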
>
> In addition to the code simplification we saw when we converted to 
> RDF/JSON for the API, we also saw a reduction in specification 
> complexity that may ultimately be more important. Remember that our 
> JSON is our API and so we need to document it. One approach is to 
> document it ourselves without reference to any external specification 
> whatever. In that case, documenting [{'@id': S, P: O}] is only 
> slightly more complex than documenting {S: { P: O}}. However, we would 
> really rather not document this ourselves - we would rather point to a 
> specification. For {S: { P: O}} we can easily point to the RDF/JSON 
> spec - the whole spec almost fits on a page, and since RDF/JSON really 
> has no options, there is little to say about our usage patterns of it. 
> By contrast, JSON-LD is a complex specification with many options. If 
> we were to reference the JSON-LD spec in our API documentation, not 
> only would we be referencing a relatively large and complex spec, we 
> would then have to add yet more information to document the very 
> restricted subset we actually use. We get no value from the parts of 
> JSON-LD we do not use and we have no interest in allowing clients to 
> give us arbitrary JSON-LD, or in producing it, either now or in the 
> future. Referencing the JSON-LD spec in our API documentation is not 
> an attractive option.
>
> I hope this helps explain our usage of RDF/JSON and JSON-LD and our 
> experiences with them.
>
> Best wishes, Martin
Received on Saturday, 27 April 2013 03:17:42 UTC