From RDF Working Group Wiki
Revision as of 23:25, 10 March 2011 by Nrixham



JSON RDF Task Force

The JSON RDF Task Force is primarily responsible for creating a JSON serialization of RDF.


Materials from RDF Next Step Workshop

  • Allows web authors (JavaScript, HTML5, ... developers) to more easily use RDF data with existing tools and techniques
  • Multiple JSON formats and implementations (some interoperable) already exist showing interest in this work
  • Current JSON formats are not aligned; they take different approaches, balancing JSON-user friendliness against familiarity for existing RDF users.
  • Needs some R&D and alignment.
  • Risk that the result would be some standard that would not be adopted if it was not 'web author' friendly.


  • JSON Serialization of RDF

Questions to Contemplate

  1. What are the use cases for the JSON serialization?
  2. Are we to create a lightweight JSON based RDF interchange format optimized for machines and speed, or an easy to work with JSON view of RDF optimized for humans (developers)?
  3. Is it necessary for developers to know RDF in order to use the simplest form of the RDF-in-JSON serialization?
  4. Should we attempt to support more than just RDF? Key-value pairs as well? Literals as subjects?
  5. Must RDF in JSON be 100% compatible with the JSON spec? Or must it only be able to be read by a JavaScript library and thus be JSON-like-but-not-compatible (and can thus deviate from the standard JSON spec)?
  6. Must all major RDF concepts be expressible via the RDF in JSON syntax?
  7. Should we go more for human-readability, or terse/compact/machine-friendly formats? What is the correct balance?
  8. Should there be a migration story for the JSON that is already used heavily on the Web? For example, in REST-based services?
  9. Should processing be a single-pass or multi-pass process? Should we support SAX-like streaming?
  10. Should there be support for disjoint graphs?
  11. Should we consider how the structure may be digitally signed?
  12. How should normalization occur?
  13. Should graph literals be supported?
  14. Should named graphs be supported?
  15. Should automatic typing be supported?
  16. Should type coercion be supported?
  17. Should there be an API defined in order to easily map RDF-in-JSON to/from language-native formats?

RDF in JSON Use Cases

Migrating to Semantic Web Services

Frank runs a website that provides Web Services via a REST-based API that supports JSON. He would like developers using his system to be able to easily post and get RDF data RESTfully via his Web Services. He wants to make sure that the data that is exchanged looks very much like the JSON data that is passed to and from the Web Services that he already provides. He wants to make sure that developers can utilize the current JSON-based tools and workflows, perhaps with a tiny library for the serializations he uses, but still ensure that he can add semantics to that data in a way that is easy to explain to his Web Service customers.

Generalized storage of Semantic Data in Web Services

ACME Corp is operating a website with a JSON API. They want to give users the ability to store arbitrary additional data alongside certain objects managed via the API. For example, when a user account is created via the API, the client app should be able to also submit a digital signature or “My upcoming trips” data. The client app would be able to use that data on subsequent requests. To avoid accidental clashes between fields used by different client apps, ACME Corp wants to use RDF as the data model. Nevertheless, they want to keep the impact on the existing JSON API and existing clients to a minimum.

Developing a Javascript application that interacts with a graph store

Herbert is developing a Javascript application that interacts with an RDF store. He wants to be able to easily PUT, POST and GET RDF data RESTfully using the SPARQL RDF Dataset HTTP Protocol. Since he is working in Javascript, he wants to be able to send data to a graph store using JSON to represent the RDF data.

Expose a service that internally uses RDF in a JSON-friendly way

Stacy operates several Web Services. She designed the data that is sent and received by her Web Services in a way that maps very easily to RDF. She wants to be able to take the data that she is already publishing and transform it into RDF for internal use. She wants to be able to do this without impacting the developers that are currently using her system.

She also wants to be able to give the developers that care about RDF a data model that maps to RDF well. She would like to support both regular JSON developers and semantic web JSON developers at the same time via her JSON-based Web Services API.

Digital Signatures on Graphs

Graeme would like to publish assets for sale on his website via a JSON-based Web Services API. He would like this data to be cached on third party sites without the pricing information being changed or forged. He accomplishes this by digitally signing the graph of information that he publishes such that search engines and other caching mechanisms can relay the information without needing to directly access his site. By cryptographically signing the graph, he is also ensuring that information about the asset, including pricing information, cannot be changed or forged to different values.

Universal Payment Standard for the Web

The PaySwarm Web platform is an open web standard that enables Web browsers and Web devices to perform Universal Web Payment. The nascent standard is using a form of RDF in JSON extensively in order to support distributed listing of assets, description of licenses and digital contracts, and digital signatures on graphs of RDF information. Information is published via HTML+RDFa and then used in JSON-form when transmitted to and from PaySwarm-aware Web Services.

Treating JSON data as RDF for use in Data Spaces

In data integration scenarios it can be useful to “crawl” a JSON API, and reflect the crawled data in an RDF expression that can then be stored in a SPARQL store and further refined/mapped/linked with other RDF. A main challenge in “crawling” JSON APIs in a generic way is the question of how to find/construct new URIs to GET from the first JSON response, as this often requires API-specific knowledge, such as which fields contain URIs and which templates should be used to construct URIs from field values. Ideally, such URIs would be captured in the RDF representation so that it preserves the “link structure” of the original JSON. In this use case, producing “idiomatic” RDF that uses proper vocabularies etc. is perhaps not realistic; the structure of the produced RDF would closely reflect the JSON object structure, and vocabulary terms would be local to the API.

Pulling In Data From An External Linked Data Service

Joe has had a Justin Bieber fan site hosted on GeoCities forever. After GeoCities got shut down, he first considered migrating his content onto a Facebook fan site, however, having heard of the Semantic Web's recent success stories, decided to run his own site, get some server space, install WordPress, install a WordPress RDFa plugin, and be happy. From reading an article on W3Schools, he knows that if he writes...

<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="">
  Justin Bieber's birthday is on
  <span property="foaf:birthday" content="1994-03-01">March 1, 1994</span>
</div>

...he makes a statement about Justin Bieber. At some point he decides to create a Justin Bieber images widget for his site with content from that cool new Semantic Web image site where he can retrieve semantically annotated images using their HTTP API like so:


This API returns data in application/rdf-however-we-call-it+json, so he can simply JSON.parse the result, and use it like so:

var results = JSON.parse(responseText);
results.images.forEach(function(img) {
  // build an HTML element for each image
});
He loves this API, because it is easy to work with in jQuery, plus the returned JSON code is easy to understand by just looking at it.

Access CONSTRUCT/DESCRIBE query results from JSON apps

SPARQL provides a JSON format for SELECT and ASK query results: currently a W3C WG Note.

A JSON serialization of RDF would make the results of CONSTRUCT (and DESCRIBE) queries accessible to JSON-consuming clients.

Traditional JSON API over an RDF store

Pablo is developing a new web service. He has recently started to explore RDF, and would like to build the new service with an RDF store as the backend. However, all his other services have JSON APIs. He is concerned about alienating his customers by offering only RDF and SPARQL interfaces. He would like a solution that allows him to expose a JSON-based API on top of the RDF store. It should be similar to his other APIs and feel familiar to his users. On the other hand, he also wants to expose the full RDF data model to allow those with the right tooling to make maximum use of his data.

View as RDF

Eli has an existing popular HTTP API that returns JSON and has many users. He has recently learned of the benefits of the semantic web and linked data, and would like to provide a way for users to see his data as RDF, without replacing the technology stack he has invested heavily in or breaking backwards compatibility in his API. Eli can only make minor tweaks, such as adding HTTP-based IDs to the objects the API returns, and providing a map that relates object property names to well-known RDF properties.

RDF in JSON Design Requirements

There should be two serialization formats

There should be a machine-friendly serialization format and there should be a human-friendly serialization format.

  • -1 Manu Sporny, given the limited time for this working group, I think we should focus on the human-friendly serialization format. RDF already has a number of machine-friendly serialization formats.
  • +1 Andy. A simple "s", "p", "o" format is not the same amount of work as a human-friendly form. See SPARQL JSON result format
  • 0 Lee. I'd worry about the WG's available time and resources.
  • +1 Nathan if possible.
  • -0 Matteo Brunati not enough time maybe
  • -0 Chris Matheus not a priority
  • -1 Thomas Steiner Can we avoid this? One format to rule them all.

A primary goal SHOULD be to build a human-friendly version of the serialization for JSON developers

The serialization should be optimized for humans first, machines second. The ability for machines to quickly parse the file is secondary to the ability for developers to be able to use the serialization with JavaScript. A focus should be placed on making the serialization fit into JavaScript frameworks easily, even at the cost of JSON-LD processor implementation complexity.

  • +1 Manu Sporny
  • -1 Lee. Given the existing work in the RDFa group on an API, I'd rather see a simple, machine-friendly format that implementations can then make available via an API. I'm not convinced that a standard human-friendly JSON format is a big win.
  • -0 Andy Different use cases lead to different design tradeoffs. (e.g. LDA is a tree: ideal for it, bad for other uses.)
  • +1 Nathan but only if the product can be considered simple JSON objects (k/v objects with a subject set) and the caveat is recognized that by not requiring an RDF toolkit or understanding of properties, inference etc, the data isn't really RDF... it's RDF-able - else -1, waste of time.
  • +1 Matteo Brunati +1 Nathan observations
  • +1 Chris Matheus extremely helpful for users new to RDF
  • +1 Thomas Steiner Yes, please! Make it easy for developers to write RDF in JSON.

A primary goal SHOULD be to build a machine-optimized version of the serialization

The serialization should be optimized for machines first, humans second. The ability to use the serialization in JavaScript is secondary to the ability for machines to quickly parse the file. A focus should be placed on making implementations very easy to write.

The serialization SHOULD be able to transform most JSON in use today into RDF

There should be a flexible mechanism, such as a "context", that is capable of mapping from JSON key-value pairs to RDF triples. This mechanism could be specified either in-band or out-of-band from the serialization. This feature could allow much of the existing JSON in the wild to be mapped into RDF.

  • +1 Manu Sporny
  • -1 Lee. Seems out-of-scope; do existing RDF-in-JSON solutions already have such mechanisms?
  • -1 Andy The original data was not written to be used in this way.
  • +1 Nathan Assuming we're still talking two serializations, then this would be very valuable, for twitter to be able to say here's our data, view it as simple objects or rdf graphs; although I'm unsure we can get there without a common vision across the water.
  • -1 Matteo Brunati +1 to Andy, it's not in the original usage of the data
  • -1 Chris Matheus nice to have but should not consume this team's resources
  • 0 Thomas Steiner Time permitting, not a top priority IMHO.
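
As a rough sketch of how such a context mechanism might work (the `context` object, its keys, and the out-of-band `subject` below are illustrative, not a proposed syntax):

```javascript
// Hypothetical sketch: mapping plain JSON keys to RDF predicates via an
// out-of-band "context" object. Names here are illustrative, not normative.
var context = {
  name: "http://xmlns.com/foaf/0.1/name",
  homepage: "http://xmlns.com/foaf/0.1/homepage"
};

// Ordinary JSON as found "in the wild", plus a subject URI supplied out of band.
var subject = "http://example.org/people/frank";
var json = { name: "Frank", homepage: "http://example.org/" };

// Expand each key-value pair into a triple, skipping unmapped keys.
var triples = Object.keys(json)
  .filter(function(key) { return key in context; })
  .map(function(key) { return { s: subject, p: context[key], o: json[key] }; });
```

Keys without a context entry are simply ignored here; whether they should instead be dropped, preserved as-is, or mapped by a default rule is exactly the design question under vote.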

Developers do not need to be familiar at all with RDF to start using the serialization

Understanding the semantic web and the concepts of RDF (triples, graphs, etc.) should not be required in order to use the format. That means that the format may have a very simple, stripped down version for beginners and a more advanced set of features for semantic web enthusiasts.

  • +1 Manu Sporny
  • +1 Nathan only if two serializations, and as per previous comments.
  • -1 Richard Cyganiak I think I disagree. If you don't want to expose developers to RDF at all, then why not just use vanilla JSON? Also I don't understand how the beginner/advanced thing should work. A server will have to generate the one or the other, so it's not like client-side developers get to choose which version they want to be exposed to.
  • -1 Matteo Brunati I think a minimal semweb context is necessary: thinking on SIMILE Exhibit framework. It's not simple to use without a prior knowledge of the model.
  • 0 Chris Matheus some very basic knowledge may be important but deep knowledge should not be required
  • -1 Thomas Steiner People should at /least/ have an understanding of triples, that's enough for most use cases.

The serialization MAY include features not in RDF

There are certain features, such as generic key-value pairs in JSON that do not map well to RDF. They would map well if RDF had a concept of plain literals in the subject or predicate position. The serialization could include these concepts but may specify that the values may not be serialized to all RDF serialization formats (such as RDF/XML, TURTLE or RDFa).

  • +1 Manu Sporny
  • -1 Andy creates an incompatible sub-community of applications.
  • +1 Nathan useful for allowing "junk" data like debugging info and session tokens, again only if two serializations.
  • -1 Richard Cyganiak as per Andy. Generic key-value pairs can be translated to <> <#key> "value" or somesuch.
  • -1 Matteo Brunati as for Andy. making a default rule to the generic key-value stuff
  • -1 Chris Matheus shouldn't spend time on this
  • -1 Thomas Steiner Strong no. Stay compatible with RDF by all means.

The serialization MUST be 100% compatible with the JSON spec

Additional features such as comments or short-hand notation to support datatypes could be supported in the serialization if we extended the JSON format. This would mean that the serialization would be incompatible with vanilla JSON readers and writers. While this may make serialization nicer, we should not make any additions/modifications to the JSON format to ensure maximum compatibility with pre-existing processors.

It is a requirement that all RDF concepts MUST be expressible in the serialization

There are concepts like RDF datatypes and g-snaps/graph literals that could be omitted from the serialization in order to reduce learning and implementation complexity.

  • -1 Manu Sporny, Good design is a balancing act - we should only include what will help the most number of people.
  • +1 Lee. I'd hesitate to say "all", but in general, a JSON RDF serialization would not be useful to us unless it was as much a 1st-class serialization of the RDF model as turtle, RDF/XML, etc.
  • +1 Andy for the machine-friendly form to work with non-JSON apps and systems.
  • -1 Andy for the human-friendly form but the features dropped will vary from usage to usage.
  • +1 Nathan for machine (rdf in json)
  • -1 Nathan for human (rdf-able json objects)
  • -1 Chris Matheus not for this round
  • +0.8 Matteo Brunati probably yes, but maybe not this time — too much complexity?
  • -1 Thomas Steiner Easy things should be easy and hard things should be possible. Keep the entry barrier low (inferred types), but allow the experts to do crazy things.

There should be a migration story for going from existing JSON in the wild to this new format

The serialization task force should ensure that there is a subset of the serialization that is useful to beginners that use pure JSON, then show how developers could sprinkle a little RDF into their JSON, then show how developers can fully migrate to the new serialization format. The transition to the serialization format will probably take multiple years. The transition should be as smooth and organic as possible. We should also understand that many may not need to transition to RDF - JSON may work just fine for their application. We should not assume that people will go straight from regular JSON to the new serialization format.

Memory usage and CPU usage while processing SHOULD be a primary consideration

Memory and CPU usage for processing JSON is low. We should ensure that processing the serialization format is only slightly more complex than processing regular JSON.

  • +0 Manu Sporny, we want to be cognizant of resource usage but I don't think this should be a primary driver for design decisions for the language.
  • -1 Lee. Seems like an implementation detail to me.
  • -1 Andy (NB: JSON structures are read entirely into memory before the application gets to see them.)
  • +0.5 Nathan there is a balance between memory and processing to be struck, ntriples = more byte, turtle = more processing, same considerations for JSON.
  • -1 Chris Matheus
  • -1 Thomas Steiner IMHO if you need the ultimate performance, use, e.g., N-Triples, readability should have a higher priority, personally speaking.

The design target is small snippets of RDF Data

"small" might be less than 1 million triples, not 10.

  • +1 Andy
  • 0 Nathan two different considerations for machine or human, I'd say under 10k for human, over and beyond for machine
  • +1 Thomas Steiner For huge dumps use, e.g., N-Triples IMHO.

Design target: graphs or resources

A human friendly JSON format can be designed more towards graphs (multiple subjects) or more targeted on just describing one resource (subject). This is not to exclude one possibility over the other - this is to decide the focus.

  • graphs Andy
  • machine: graphs, human: resource Nathan
  • graphs Manu Sporny, but I don't think we'll need to choose between the two if we're smart about it. For instance, JSON-LD allows expressing graphs just as easily as expressing resources.
  • graphs Chris Matheus
  • resources Thomas Steiner

The serialization MUST support disjoint/unconnected graphs

All current RDF serialization formats allow you to express two graphs that are not necessarily connected to one another. The new serialization format should allow the same mechanism. This is also important because normalization is difficult to achieve in a general way without also supporting disjoint graphs in the serialization. JSON-LD disjoint graphs example.

  • +1 Manu Sporny
  • +1 Andy One graph with two+ disjoint components per serialization
  • +0 Andy Multiple graphs per serialization. No more than follow work in other TFs.
  • +1 Nathan as per andy's comments
  • +1 Chris Matheus
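
An illustrative sketch of how a document might carry two unconnected components (the `@subject` key below is hypothetical, not a proposed syntax): a top-level array avoids forcing everything into one tree rooted at a single subject.

```javascript
// Illustrative only: a top-level array lets one document carry two
// components that share no nodes (a disjoint graph).
var doc = [
  { "@subject": "http://example.org/a", "http://example.org/p": "x" },
  { "@subject": "http://example.org/b", "http://example.org/q": "y" }
];

// Collect the nodes of each component and confirm they are disjoint.
var first = new Set([doc[0]["@subject"]]);
var second = new Set([doc[1]["@subject"]]);
var disjoint = Array.from(first).every(function(n) { return !second.has(n); });
```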

The serialization MUST provide a normalization algorithm

Normalization, also known as canonicalization, is typically used when determining whether two sub-graphs that are expressed in different ways are identical. It is also very useful when hashing sub-graphs for checksumming or digital signature purposes. JSON-LD normalization example.

  • +1 Manu Sporny, I think we need normalization because we need to have a good digital signatures story
  •  ? Andy. Unclear - are we signing the graph or the serialization? Is a Turtle-signed graph the same graph? Would it include IRI normalization?
  • +0 Nathan
  • +0 Chris Matheus highly desirable if there's time
  • +1 Thomas Steiner Time permitting

The serialization SHOULD enable digital signatures

Digital Signatures have a number of useful purposes. When combined with g-snaps/graph literals they provide a very easy way of establishing cryptographically verifiable provenance. These features are used heavily in electronic commerce. JSON-LD digital signature example.

The serialization SHOULD support advanced graph concepts

The serialization format should support advanced graph concepts such as g-box, g-snap and g-text such that you can make statements about snapshots of graphs. Annotating graphs with metadata such as graph retrieval time, digital signatures on the contents of the graph, and other metadata associated with graphs are an important feature for higher-level concepts like provenance. Sandro's explanation of advanced graph concepts.

  • +1 Manu Sporny
  • -1 Richard Cyganiak Has security implications for RDF crawlers; requires larger API surface; SPARQL only returns single graphs anyways; use case is unclear
  • -1 Andy Not unless the format is following standard work done in other TFs.
  • +0.5 Nathan follow other TFs
  • 0 Matteo Brunati too problematic probably, +1 Richard notes
  • -0 Chris Matheus not this round unless the Graph TF results happen quickly and their incorporation is straightforward

The serialization MUST support automatic typing

Being able to transform a JSON document into a native object is one of the key benefits of using JSON over other serialization formats. Automatic typing of numbers and boolean values into language-native datatypes removes an extra step that developers must otherwise perform. For example, one could easily transform a serialized number that is an xsd:integer into a language-native integer. JSON-LD automatic typing example.
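
A small illustration of the baseline behavior this requirement builds on: plain `JSON.parse` already delivers language-native numbers and booleans, and automatic typing would extend the same convenience to RDF typed literals.

```javascript
// JSON.parse maps JSON numbers and booleans to language-native types.
var parsed = JSON.parse('{"age": 26, "verified": true}');

// The values arrive as a native number and boolean - no xsd:integer /
// xsd:boolean unwrapping step is needed in application code.
var ageIsNumber = typeof parsed.age === "number";
var verifiedIsBoolean = typeof parsed.verified === "boolean";
```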

The serialization SHOULD support type coercion

While not immediately obvious, type coercion allows one to map regular JSON into RDF in a way that may add datatype decorators to object literals. In other words, it provides for a way to get Typed Literals from regular JSON data. JSON-LD type coercion example.

  • +1 Manu Sporny
  • +1 Nathan for human one, -1 for machine one
  • +1 Thomas Steiner Yes, as this is what people (IMHO) expect, and it keeps the entry barrier low. Still allow for overriding.
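
A minimal sketch of the idea, assuming a hypothetical out-of-band rule table that maps keys to datatypes (the shapes below are illustrative, not a proposed syntax):

```javascript
// Hypothetical coercion rules: keys listed here get a datatype decorator
// attached when the plain JSON is lifted into RDF.
var coerce = { birthday: "http://www.w3.org/2001/XMLSchema#date" };

var json = { birthday: "1994-03-01" };

// Lift each value into a typed-literal object when a coercion rule exists;
// leave all other values untouched.
var lifted = {};
Object.keys(json).forEach(function(key) {
  lifted[key] = key in coerce
    ? { value: json[key], datatype: coerce[key] }
    : json[key];
});
```

The plain JSON stays exactly as developers wrote it; the typed literals only appear when the data is viewed as RDF.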

The serialization SHOULD rely on microsyntaxes instead of nested structures

There are two common approaches to expressing RDF in JSON. One of them is to use nested structures to express language and type information for literals. The other approach is to use shallow structures with microsyntaxes mirroring TURTLE to express language and type information for literals.
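
The two styles can be contrasted on a language-tagged literal (both shapes below are illustrative, not a proposed syntax):

```javascript
// Nested-structure style: explicit fields for value and language tag.
var nested = { value: "Guten Tag", language: "de" };

// Microsyntax style, mirroring TURTLE: the tag rides inside the string.
var micro = "Guten Tag@de";

// A microsyntax needs a small parser to recover the pieces...
var m = micro.match(/^(.*)@([a-zA-Z-]+)$/);
var recovered = { value: m[1], language: m[2] };
```

Note the trade-off: the microsyntax is shallower and terser, but a literal that itself contains "@" would need escaping, which is exactly the ambiguity the nested style avoids.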

The serialization SHOULD provide an API

An API would allow developers to transform incoming documents into a format that is easier for them to work with. In other words, it would allow them to drop all type information if it wasn't useful to them, or remove any micro-syntaxes that would get in the way of basic usage of the data. Keep in mind that even JSON has an API: JSON.parse(). JSON-LD API example.

(?? Reword as: The serialization SHOULD assume working with a JavaScript RDF API (Andy))

  • +1 Manu Sporny
  • -1 Nathan the machine one will have the RDF API, the human one is pointless if it needs an API.
  • +1 Matteo Brunati as Andy said, working with an API (are other WGs working on that or not?)
  • -1 Chris Matheus not this round
  • -1 Thomas Steiner The JSON is the API, we just need to make it easy enough.
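
A sketch of the kind of convenience such an API could offer: a hypothetical `simplify` function (name and shapes are illustrative) that strips typed-literal wrappers so application code sees plain values.

```javascript
// Hypothetical helper: drop datatype decorators from a document so basic
// consumers can treat it as ordinary JSON.
function simplify(obj) {
  var out = {};
  Object.keys(obj).forEach(function(key) {
    var val = obj[key];
    out[key] = (val !== null && typeof val === "object" && "value" in val)
      ? val.value   // unwrap {value, datatype} into the bare value
      : val;        // leave plain values untouched
  });
  return out;
}

var doc = {
  name: "Graeme",
  birthday: { value: "1994-03-01", datatype: "http://www.w3.org/2001/XMLSchema#date" }
};
var plain = simplify(doc);
```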

There SHOULD be one and only one way to serialize a given triple

The more different ways there are to express the same triple or graph, the harder it gets to use the host language's native toolbox (that is, pure JS expressions) to process data. At some point, using the host language becomes impossible without using a parser library layered on top of the host language, negating the benefit of basing the language on JSON in the first place. (Note, this is about using different JSON structures to express the same triple; not about different triples expressing the same statement in RDF Semantics, like "foo" vs "foo"^^xsd:string).