TF-JSON

From RDF Working Group Wiki
Revision as of 11:55, 9 March 2011 by Mbrunati

JSON RDF Task Force

The JSON RDF Task Force is primarily responsible for creating a JSON serialization of RDF.

Inputs

Materials from the RDF Next Steps Workshop

Pros
  • Allows web authors (JavaScript, HTML5, ... developers) to use RDF data more easily with existing tools and techniques
  • Multiple JSON formats and implementations (some interoperable) already exist, showing interest in this work
Cons
  • Current JSON formats are not aligned; they take different approaches, making the format either JSON-user friendly or familiar to existing RDF users.
  • Needs some R&D and alignment.
  • Risk that the result would be a standard that is not adopted if it is not 'web author' friendly.

Deliverables

  • JSON Serialization of RDF

Questions to Contemplate

  1. What are the use cases for the JSON serialization?
  2. Are we to create a lightweight JSON based RDF interchange format optimized for machines and speed, or an easy to work with JSON view of RDF optimized for humans (developers)?
  3. Is it necessary for developers to know RDF in order to use the simplest form of the RDF-in-JSON serialization?
  4. Should we attempt to support more than just RDF? Key-value pairs as well? Literals as subjects?
  5. Must RDF in JSON be 100% compatible with the JSON spec? Or need it only be readable by a JavaScript library, and thus be JSON-like but not compatible (deviating from the standard JSON spec)?
  6. Must all major RDF concepts be expressible via the RDF in JSON syntax?
  7. Should we go more for human-readability, or terse/compact/machine-friendly formats? What is the correct balance?
  8. Should there be a migration story for the JSON that is already used heavily on the Web? For example, in REST-based services?
  9. Should processing be a single-pass or multi-pass process? Should we support SAX-like streaming?
  10. Should there be support for disjoint graphs?
  11. Should we consider how the structure may be digitally signed?
  12. How should normalization occur?
  13. Should graph literals be supported?
  14. Should named graphs be supported?
  15. Should automatic typing be supported?
  16. Should type coercion be supported?
  17. Should there be an API defined in order to easily map RDF-in-JSON to/from language-native formats?

RDF in JSON Use Cases

RDF REST Web Services

Frank wants to be able to easily post and get RDF data RESTfully via Web Services. He wants to make sure that the data that is exchanged looks very much like the JSON data that is passed to and from popular services like Twitter's API. He wants to utilize the current JSON-based tools and workflows that he uses for all of his other data on the Web, but add semantics to that data in a way that is easy to explain to his fellow developers.

Developing a JavaScript application that interacts with a graph store

Herbert is developing a JavaScript application that interacts with an RDF store. He wants to be able to easily PUT, POST and GET RDF data RESTfully using the SPARQL RDF Dataset HTTP Protocol. Since he is working in JavaScript, he wants to be able to send data to a graph store using JSON to represent the RDF data.

Expose a service that internally uses RDF in a JSON-friendly way

Stacy operates several Web Services. She designed the data that is sent and received by her Web Services in a way that maps very easily to RDF. She wants to be able to take the data that she is already publishing and transform it into RDF for internal use. She wants to be able to do this without impacting the developers that are currently using her system.

She also wants to be able to give the developers that care about RDF a data model that maps to RDF well. She would like to support both regular JSON developers and semantic web JSON developers at the same time via her JSON-based Web Services API.

Digital Signatures on Graphs

Graeme would like to publish assets for sale on his website via a JSON-based Web Services API. He would like this data to be cached on third-party sites without the pricing information being changed or forged. He accomplishes this by digitally signing the graph of information that he publishes, so that search engines and other caching mechanisms can relay the information without needing to access his site directly. Because the graph is cryptographically signed, any change to the information about the asset, including its price, can be detected by anyone who verifies the signature.

Universal Payment Standard for the Web

The PaySwarm Web platform is an open web standard that enables Web browsers and Web devices to perform Universal Web Payment. The nascent standard uses a form of RDF in JSON extensively to support distributed listing of assets, description of licenses and digital contracts, and digital signatures on graphs of RDF information. Information is published via HTML+RDFa and then used in JSON form when transmitted to and from PaySwarm-aware Web Services.

RDF in JSON Design Requirements

There should be two serialization formats

There should be a machine-friendly serialization format and there should be a human-friendly serialization format.

  • -1 Manu Sporny, given the limited time for this working group, I think we should focus on the human-friendly serialization format. RDF already has a number of machine-friendly serialization formats.
  • +1 Andy. A simple "s", "p", "o" format is not the same amount of work as a human-friendly form. See SPARQL JSON result format
  • 0 Lee. I'd worry about the WG's available time and resources.
  • +1 Nathan if possible.
  • -0 Matteo Brunati not enough time maybe
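
To make the machine-friendly versus human-friendly distinction concrete, here is a minimal TypeScript sketch that converts between a flat "s", "p", "o" triple list and a subject-keyed object view. Both shapes, and the names `toResourceView` and `toFlatTriples`, are hypothetical illustrations rather than proposed formats; object values are treated as opaque strings.

```typescript
// Hypothetical machine-friendly shape: one object per triple.
interface FlatTriple {
  s: string;
  p: string;
  o: string;
}

// Hypothetical human-friendly shape: subject -> predicate -> list of objects.
type ResourceView = Record<string, Record<string, string[]>>;

function toResourceView(triples: FlatTriple[]): ResourceView {
  const view: ResourceView = {};
  for (const t of triples) {
    if (!view[t.s]) view[t.s] = {};
    if (!view[t.s][t.p]) view[t.s][t.p] = [];
    view[t.s][t.p].push(t.o);
  }
  return view;
}

function toFlatTriples(view: ResourceView): FlatTriple[] {
  const triples: FlatTriple[] = [];
  for (const s of Object.keys(view)) {
    for (const p of Object.keys(view[s])) {
      for (const o of view[s][p]) triples.push({ s, p, o });
    }
  }
  return triples;
}

// Usage: two triples about the same subject collapse into one object.
const flat: FlatTriple[] = [
  { s: "http://example.com/alice", p: "http://xmlns.com/foaf/0.1/name", o: "Alice" },
  { s: "http://example.com/alice", p: "http://xmlns.com/foaf/0.1/knows", o: "http://example.com/bob" },
];
const view = toResourceView(flat);
```

The conversion is mechanical in both directions, which is part of why the amount of work differs: the flat form is trivial to specify, while the object form raises questions (arrays vs single values, literals vs IRIs) that this sketch sidesteps.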

A primary goal SHOULD be to build a human-friendly version of the serialization for JSON developers

The serialization should be optimized for humans first, machines second. The ability for machines to quickly parse the file is secondary to the ability for developers to be able to use the serialization with JavaScript. A focus should be placed on making the serialization fit into JavaScript frameworks easily, even at the cost of JSON-LD processor implementation complexity.

  • +1 Manu Sporny
  • -1 Lee. Given the existing work in the RDFa group on an API, I'd rather see a simple, machine-friendly format that implementations can then make available via an API. I'm not convinced that a standard human-friendly JSON format is a big win.
  • -0 Andy Different use cases lead to different design tradeoffs (e.g. LDA is a tree; ideal for them, bad for other uses).
  • +1 Nathan but only if the product can be considered simple JSON objects (k/v objects with a subject set) and the caveat is recognized that by not requiring an RDF toolkit or understanding of properties, inference etc, the data isn't really RDF... it's RDF-able - else -1, waste of time.
  • +1 Matteo Brunati +1 Nathan observations

A primary goal SHOULD be to build a machine-optimized version of the serialization

The serialization should be optimized for machines first, humans second. The ability to use the serialization in JavaScript is secondary to the ability for machines to quickly parse the file. A focus should be placed on making implementations very easy to write.

The serialization SHOULD be able to transform most JSON in use today into RDF

There should be a flexible mechanism, such as a "context", that is capable of mapping from JSON key-value pairs to RDF triples. This mechanism could be specified either in-band or out-of-band from the serialization. Having this feature could map much of the existing JSON in the wild into RDF.

  • +1 Manu Sporny
  • -1 Lee. Seems out-of-scope; do existing RDF-in-JSON solutions already have such mechanisms?
  • -1 Andy The original data was not written to be used in this way.
  • +1 Nathan Assuming we're still talking two serializations, then this would be very valuable, for Twitter to be able to say here's our data, view it as simple objects or RDF graphs; although I'm unsure we can get there without a common vision across the water.
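
A minimal sketch of the "context" idea, assuming a context is nothing more than an in-band or out-of-band lookup table from JSON keys to predicate IRIs. The `Context` type and `jsonToTriples` function are hypothetical; only top-level keys are handled, and values are stringified without distinguishing literals from IRIs or adding datatypes.

```typescript
// Hypothetical context: JSON key -> predicate IRI.
type Context = Record<string, string>;

interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Map the top-level key-value pairs of a plain JSON object to triples.
// Keys that the context does not mention are skipped rather than guessed at.
function jsonToTriples(subject: string, data: Record<string, unknown>, context: Context): Triple[] {
  const triples: Triple[] = [];
  for (const key of Object.keys(data)) {
    if (!Object.prototype.hasOwnProperty.call(context, key)) continue;
    const predicate = context[key];
    const value = data[key];
    // A JSON array becomes one triple per element.
    const values: unknown[] = Array.isArray(value) ? value : [value];
    for (const v of values) {
      triples.push({ subject, predicate, object: String(v) });
    }
  }
  return triples;
}

// Usage: a Twitter-style status object; the status and user IRIs are made up.
const tweet = { text: "Hello, world", user: "http://example.com/frank", retweet_count: 3 };
const context: Context = {
  text: "http://purl.org/dc/terms/description",
  user: "http://xmlns.com/foaf/0.1/maker",
};
const triples = jsonToTriples("http://example.com/status/1", tweet, context);
```

The existing JSON is left untouched; `retweet_count` simply produces no triple because the context does not map it.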

Developers do not need to be familiar at all with RDF to start using the serialization

Understanding the semantic web and the concepts of RDF (triples, graphs, etc.) should not be required in order to use the format. That means that the format may have a very simple, stripped down version for beginners and a more advanced set of features for semantic web enthusiasts.

  • +1 Manu Sporny
  • +1 Nathan only if two serializations, and as per previous comments.
  • -1 Richard Cyganiak I think I disagree. If you don't want to expose developers to RDF at all, then why not just use vanilla JSON? Also I don't understand how the beginner/advanced thing should work. A server will have to generate the one or the other, so it's not like client-side developers get to choose which version they want to be exposed to.
  • -1 Matteo Brunati I think a minimal semweb context is necessary: thinking on SIMILE Exhibit framework. It's not simple to use without a prior knowledge of the model.

The serialization MAY include features not in RDF

There are certain features, such as generic key-value pairs in JSON, that do not map well to RDF. They would map well if RDF allowed plain literals in the subject or predicate position. The serialization could include these concepts but specify that such values may not be serializable to every RDF serialization format (such as RDF/XML, TURTLE or RDFa).

  • +1 Manu Sporny
  • -1 Andy creates an incompatible sub-community of applications.
  • +1 Nathan useful for allowing "junk" data like debugging info and session tokens, again only if two serializations.
  • -1 Richard Cyganiak as per Andy. Generic key-value pairs can be translated to <> <#key> "value" or somesuch.
  • -1 Matteo Brunati as per Andy; make a default rule for the generic key-value stuff.
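
Richard's suggested default rule can be sketched as follows, assuming the document's own IRI is known (and has no fragment), serves as the subject (the `<>` above), and each key becomes a fragment IRI relative to it (`<#key>`). The `defaultRule` function is hypothetical and emits N-Triples-style lines with plain literals.

```typescript
// Escape a string for use inside an N-Triples-style double-quoted literal.
function escapeLiteral(s: string): string {
  return s
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Hypothetical default rule: every key-value pair of the document becomes
//   <document> <document#key> "value" .
function defaultRule(documentIri: string, data: Record<string, string | number | boolean>): string[] {
  return Object.keys(data).map((key) => {
    const predicate = `${documentIri}#${encodeURIComponent(key)}`;
    return `<${documentIri}> <${predicate}> "${escapeLiteral(String(data[key]))}" .`;
  });
}

// Usage: "junk" data such as a session token still yields well-formed triples.
const lines = defaultRule("http://example.com/doc", { sessionToken: "abc123", debug: true });
```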

The serialization MUST be 100% compatible with the JSON spec

Additional features, such as comments or short-hand notation for datatypes, could be supported in the serialization if we extended the JSON format. However, that would make the serialization incompatible with vanilla JSON readers and writers. While such extensions might make the serialization nicer to read and write, we should not make any additions or modifications to the JSON format, to ensure maximum compatibility with pre-existing processors.
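
The compatibility argument is easy to check: a standard JSON parser rejects comments and Turtle-style shorthand outright. A minimal TypeScript sketch using the built-in `JSON.parse`; the `value`/`datatype` member names in the last example are hypothetical, chosen only to show that datatype information can still be carried in plain JSON.

```typescript
// Returns true if `text` can be read by a standard JSON parser.
function isVanillaJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const withComment = '{"age": 42 /* years */}';
const withShorthand = '{"age": 42^^xsd:integer}';
const plainJson = '{"age": {"value": "42", "datatype": "xsd:integer"}}';
```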

It is a requirement that all RDF concepts MUST be expressible in the serialization

There are concepts like RDF datatypes and g-snaps/graph literals that could be omitted from the serialization in order to reduce learning and implementation complexity.

  • -1 Manu Sporny, Good design is a balancing act - we should only include what will help the most number of people.
  • +1 Lee. I'd hesitate to say "all", but in general, a JSON RDF serialization would not be useful to us unless it was as much a 1st-class serialization of the RDF model as turtle, RDF/XML, etc.
  • +1 Andy for the machine-friendly form to work with non-JSON apps and systems.
  • -1 Andy for the human-friendly form but the features dropped will vary from usage to usage.
  • +1 Nathan for machine (rdf in json)
  • -1 Nathan for human (rdf-able json objects)

There should be a migration story for going from existing JSON in the wild to this new format

The serialization task force should ensure that there is a subset of the serialization that is useful to beginners who use pure JSON, then show how developers could sprinkle a little RDF into their JSON, then show how developers can fully migrate to the new serialization format. The transition to the serialization format will probably take multiple years, and it should be as smooth and organic as possible. We should also understand that many developers may not need to transition to RDF at all; JSON may work just fine for their application. We should not assume that people will go straight from regular JSON to the new serialization format.

  • +1 Manu Sporny
  • +1 Nathan for human rdf-able json object serialization, if we can get there.

Memory usage and CPU usage while processing SHOULD be a primary consideration

Memory and CPU usage for processing JSON is low. We should ensure that processing the serialization format is only slightly more complex than processing regular JSON.

  • +0 Manu Sporny, we want to be cognizant of resource usage but I don't think this should be a primary driver for design decisions for the language.
  • -1 Lee. Seems like an implementation detail to me.
  • -1 Andy (NB: JSON structures are read entirely into memory before the application gets to see them.)
  • +0.5 Nathan there is a balance between memory and processing to be struck: N-Triples = more bytes, Turtle = more processing; the same considerations apply to JSON.

The design target is small snippets of RDF Data

"Small" might be less than 1 million triples, not 10.

  • +1 Andy
  • 0 Nathan two different considerations for machine or human, I'd say under 10k for human, over and beyond for machine

Design target: graphs or resources

A human friendly JSON format can be designed more towards graphs (multiple subjects) or more targeted on just describing one resource (subject). This is not to exclude one possibility over the other - this is to decide the focus.

  • graphs Andy
  • machine: graphs, human: resource Nathan
  • graphs Manu Sporny, but I don't think we'll need to choose between the two if we're smart about it. For instance, JSON-LD allows expressing graphs just as easily as expressing resources.

The serialization MUST support disjoint/unconnected graphs

All current RDF serialization formats allow you to express two graphs that are not necessarily connected to one another. The new serialization format should allow the same mechanism. This is also important because normalization is difficult to achieve in a general way without also supporting disjoint graphs in the serialization. JSON-LD disjoint graphs example.

  • +1 Manu Sporny
  • +1 Andy One graph with two+ disjoint components per serialization
  • +0 Andy Multiple graphs per serialization. No more than follow work in other TFs.
  • +1 Nathan as per andy's comments
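
To illustrate what "disjoint" means in practice, the following sketch splits a set of triples into its connected components using union-find. The `Triple` shape is an assumption; literal objects are flagged so that two subjects that happen to share a literal value are not treated as connected.

```typescript
interface Triple {
  subject: string;
  predicate: string;
  object: string;
  objectIsLiteral?: boolean; // literals do not link subjects together
}

// Split a graph into its connected components (union-find over node labels).
function connectedComponents(triples: Triple[]): Triple[][] {
  const parent = new Map<string, string>();

  const find = (node: string): string => {
    if (!parent.has(node)) parent.set(node, node);
    let root = node;
    while (parent.get(root) !== root) root = parent.get(root) as string;
    // Path compression: point every node on the path directly at the root.
    let current = node;
    while (current !== root) {
      const next = parent.get(current) as string;
      parent.set(current, root);
      current = next;
    }
    return root;
  };

  const union = (a: string, b: string): void => {
    const ra = find(a);
    const rb = find(b);
    if (ra !== rb) parent.set(ra, rb);
  };

  for (const t of triples) {
    if (!t.objectIsLiteral) union(t.subject, t.object);
  }

  // Bucket triples by the root of their subject.
  const groups = new Map<string, Triple[]>();
  for (const t of triples) {
    const root = find(t.subject);
    const bucket = groups.get(root);
    if (bucket) bucket.push(t);
    else groups.set(root, [t]);
  }
  return Array.from(groups.values());
}

// Usage: a-knows-b plus b's name form one component; c-knows-d forms another.
const graph: Triple[] = [
  { subject: "http://example.com/a", predicate: "http://xmlns.com/foaf/0.1/knows", object: "http://example.com/b" },
  { subject: "http://example.com/b", predicate: "http://xmlns.com/foaf/0.1/name", object: "Bob", objectIsLiteral: true },
  { subject: "http://example.com/c", predicate: "http://xmlns.com/foaf/0.1/knows", object: "http://example.com/d" },
];
const parts = connectedComponents(graph);
```

A serialization that only nests objects under a single root cannot express `graph` without either duplicating or dropping one of the components, which is the practical reason for this requirement.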

The serialization MUST provide a normalization algorithm

Normalization, also known as canonicalization, is typically used when determining whether two sub-graphs that are expressed in different ways are identical. It is also very useful when hashing sub-graphs for checksumming or digital signature purposes. JSON-LD normalization example.

  • +1 Manu Sporny, I think we need normalization because we need to have a good digital signatures story
  •  ? Andy. Unclear - are we signing the graph or the serialization? Is a Turtle-signed graph the same graph? Would it include IRI normalization?
  • +0 Nathan
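
A deliberately naive normalization sketch, assuming a graph with no blank nodes and only plain string literals: serialize each triple as one N-Triples-style line, drop duplicates, and sort. Real normalization is much harder precisely because blank nodes must be relabeled deterministically, which this sketch does not attempt. All names are hypothetical.

```typescript
interface Triple {
  subject: string; // IRI
  predicate: string; // IRI
  object: string; // IRI, or a plain literal when objectIsLiteral is set
  objectIsLiteral?: boolean;
}

// Escape a string for use inside a double-quoted literal.
function escapeLiteral(s: string): string {
  return s
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

function toLine(t: Triple): string {
  const object = t.objectIsLiteral ? `"${escapeLiteral(t.object)}"` : `<${t.object}>`;
  return `<${t.subject}> <${t.predicate}> ${object} .`;
}

// One line per triple, duplicates removed, sorted by UTF-16 code unit order
// (the default ordering of Array.prototype.sort for strings).
function normalize(triples: Triple[]): string {
  const lines = Array.from(new Set(triples.map(toLine))).sort();
  return lines.map((line) => line + "\n").join("");
}

// Usage: the same graph written in a different order, with a duplicate.
const t1: Triple = { subject: "http://example.com/a", predicate: "http://xmlns.com/foaf/0.1/name", object: "Alice", objectIsLiteral: true };
const t2: Triple = { subject: "http://example.com/a", predicate: "http://xmlns.com/foaf/0.1/knows", object: "http://example.com/b" };
const first = normalize([t1, t2]);
const second = normalize([t2, t1, t2]);
```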

The serialization SHOULD enable digital signatures

Digital Signatures have a number of useful purposes. When combined with g-snaps/graph literals they provide a very easy way of establishing cryptographically verifiable provenance. These features are used heavily in electronic commerce. JSON-LD digital signature example.
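
The key point for signatures is that what gets signed is the normalized form of the graph rather than the raw JSON text, so that two serializations of the same graph, once normalized, check against one signature. A minimal Node.js sketch using Ed25519 from the built-in `crypto` module; the canonical string is assumed to come from a normalization step, and the asset IRI and price predicate are hypothetical.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "crypto";

// Sign the canonical (normalized) form of a graph, not the raw JSON bytes.
function signGraph(canonical: string, privateKey: KeyObject): Buffer {
  // Ed25519 has no separate digest step, so the algorithm argument is null.
  return sign(null, Buffer.from(canonical, "utf8"), privateKey);
}

function verifyGraph(canonical: string, signature: Buffer, publicKey: KeyObject): boolean {
  return verify(null, Buffer.from(canonical, "utf8"), publicKey, signature);
}

// Usage: a hypothetical price statement, assumed to be already normalized.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const canonical = '<http://example.com/asset/1> <http://example.com/vocab#price> "9.99" .\n';
const signature = signGraph(canonical, privateKey);
```

A cache that changes the price would produce a different canonical string, and `verifyGraph` would then return `false`.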

The serialization SHOULD support advanced graph concepts

The serialization format should support advanced graph concepts such as g-box, g-snap and g-text, so that you can make statements about snapshots of graphs. Annotating graphs with metadata, such as the graph's retrieval time or a digital signature on its contents, is an important feature for higher-level concepts like provenance. Sandro's explanation of advanced graph concepts.

  • +1 Manu Sporny
  • -1 Richard Cyganiak Has security implications for RDF crawlers; requires larger API surface; SPARQL only returns single graphs anyways; use case is unclear
  • -1 Andy Not unless the format is following standard work done in other TFs.
  • +0.5 Nathan follow other TFs
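
One simple way to make statements about a graph snapshot is to give it a name and attach metadata to that name. The sketch below assumes a quad structure in which a `null` graph means the default graph; the `retrievedAt` predicate IRI is hypothetical, and the sketch does not model the full g-box/g-snap/g-text distinction.

```typescript
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string | null; // null = default graph
}

// Hypothetical predicate for the time a graph snapshot was retrieved.
const RETRIEVED_AT = "http://example.com/vocab#retrievedAt";

// Put the triples in a named graph and describe that graph in the default graph.
function snapshot(
  graphName: string,
  triples: { subject: string; predicate: string; object: string }[],
  retrievedAt: Date,
): Quad[] {
  const quads: Quad[] = triples.map((t) => ({
    subject: t.subject,
    predicate: t.predicate,
    object: t.object,
    graph: graphName,
  }));
  quads.push({ subject: graphName, predicate: RETRIEVED_AT, object: retrievedAt.toISOString(), graph: null });
  return quads;
}

// Usage: one price statement, snapshotted with its retrieval time.
const quads = snapshot(
  "http://example.com/graphs/prices-1",
  [{ subject: "http://example.com/asset/1", predicate: "http://example.com/vocab#price", object: "9.99" }],
  new Date(Date.UTC(2011, 2, 9)),
);
```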

The serialization MUST support automatic typing

Being able to transform a JSON document into a native object is one of the key benefits of using JSON over other serialization formats. Automatic typing of numbers and boolean values into language-native datatypes removes a step that developers would otherwise have to perform themselves. For example, a serialized number that is an xsd:integer could be transformed directly into a language-native integer. JSON-LD automatic typing example.
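
A minimal sketch of automatic typing in both directions. It assumes that JSON booleans map to xsd:boolean, safe integers to xsd:integer, and all other numbers to xsd:double; these mappings are illustrative assumptions, not a settled design. Note that plain JSON cannot distinguish `5` from `5.0` once parsed.

```typescript
const XSD = "http://www.w3.org/2001/XMLSchema#";

interface TypedLiteral {
  value: string; // lexical form
  datatype: string; // datatype IRI
}

// Native JSON value -> typed literal (assumed mapping).
function autoType(value: number | boolean): TypedLiteral {
  if (typeof value === "boolean") return { value: String(value), datatype: XSD + "boolean" };
  if (Number.isSafeInteger(value)) return { value: String(value), datatype: XSD + "integer" };
  return { value: String(value), datatype: XSD + "double" };
}

// Typed literal -> native value; unrecognized datatypes are left as strings.
function toNative(literal: TypedLiteral): string | number | boolean {
  switch (literal.datatype) {
    case XSD + "boolean":
      return literal.value === "true" || literal.value === "1";
    case XSD + "integer":
    case XSD + "double":
      return Number(literal.value);
    default:
      return literal.value;
  }
}
```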

The serialization SHOULD support type coercion

While not immediately obvious, type coercion allows one to map regular JSON into RDF in a way that may add datatype decorators to object literals. In other words, it provides for a way to get Typed Literals from regular JSON data. JSON-LD type coercion example.
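
Type coercion can be sketched as a second lookup table that attaches a datatype to the string values of selected keys, so that an ordinary JSON date string becomes a typed literal. The `coerce` function and the rule table are hypothetical, and no validation of the lexical form against the datatype is performed.

```typescript
const XSD = "http://www.w3.org/2001/XMLSchema#";

// Hypothetical coercion rules: JSON key -> datatype IRI.
type CoercionRules = Record<string, string>;

interface Literal {
  value: string;
  datatype?: string;
}

// Attach a datatype to each string value whose key has a coercion rule.
function coerce(data: Record<string, string>, rules: CoercionRules): Record<string, Literal> {
  const out: Record<string, Literal> = {};
  for (const key of Object.keys(data)) {
    out[key] = Object.prototype.hasOwnProperty.call(rules, key)
      ? { value: data[key], datatype: rules[key] }
      : { value: data[key] };
  }
  return out;
}

// Usage: only "created" is coerced; "title" stays a plain literal.
const post = { title: "TF-JSON", created: "2011-03-09T11:55:00Z" };
const typed = coerce(post, { created: XSD + "dateTime" });
```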

The serialization SHOULD rely on microsyntaxes instead of nested structures

There are two common approaches to expressing RDF in JSON. One of them is to use nested structures to express language and type information for literals. The other approach is to use shallow structures with microsyntaxes mirroring TURTLE to express language and type information for literals.
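
To compare the two approaches, the sketch below parses a Turtle-like microsyntax (`"chat"@fr` or `"5"^^<...>`) into the equivalent nested structure. `parseMicrosyntax` is a hypothetical helper that only recognizes the `\"` and `\\` escapes; strings that do not match the microsyntax are returned as plain values.

```typescript
// Nested form: language and datatype carried in separate JSON members.
interface NestedLiteral {
  value: string;
  language?: string;
  datatype?: string;
}

// Turtle-like microsyntax: "text", "text"@lang or "text"^^<datatype-IRI>.
const MICROSYNTAX = /^"((?:[^"\\]|\\["\\])*)"(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*)|\^\^<([^>]*)>)?$/;

function parseMicrosyntax(s: string): NestedLiteral {
  const m = MICROSYNTAX.exec(s);
  if (m === null) return { value: s }; // not microsyntax: keep the string as-is
  const literal: NestedLiteral = { value: m[1].replace(/\\(["\\])/g, "$1") };
  if (m[2] !== undefined) literal.language = m[2];
  if (m[3] !== undefined) literal.datatype = m[3];
  return literal;
}
```

The tradeoff is visible here: the shallow form keeps the JSON compact, but every consumer that needs the language or datatype has to run a string parser like this one.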

The serialization SHOULD provide an API

An API would allow developers to transform incoming documents into a format that is easier for them to work with. For example, it would allow them to drop all type information if it isn't useful to them, or remove any microsyntaxes that would get in the way of basic use of the data. Keep in mind that even JSON has an API: JSON.parse(). JSON-LD API example.

(?? Reword as: The serialization SHOULD assume working with a JavaScript RDF API (Andy))

  • +1 Manu Sporny
  • -1 Nathan the machine one will have the RDF API; the human one is pointless if it needs an API.
  • +1 Matteo Brunati as Andy said, working with an API (are other WGs working on that?)
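
As a sketch of the kind of convenience such an API might offer, the hypothetical `simplify` function below flattens literal objects to their plain string values, discarding datatype and language. The `Literal` and `Resource` shapes are assumptions, not a proposed format.

```typescript
interface Literal {
  value: string;
  datatype?: string;
  language?: string;
}

type Value = string | Literal;
type Resource = Record<string, Value | Value[]>;
type PlainResource = Record<string, string | string[]>;

// Drop datatype and language information, keeping only each literal's lexical value.
function simplify(resource: Resource): PlainResource {
  const plain = (v: Value): string => (typeof v === "string" ? v : v.value);
  const out: PlainResource = {};
  for (const key of Object.keys(resource)) {
    const v = resource[key];
    out[key] = Array.isArray(v) ? v.map(plain) : plain(v);
  }
  return out;
}

// Usage: typed and language-tagged values become ordinary strings.
const person: Resource = {
  name: { value: "Alice", language: "en" },
  age: { value: "42", datatype: "http://www.w3.org/2001/XMLSchema#integer" },
  nick: ["ali", { value: "al", language: "en" }],
};
const simple = simplify(person);
```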

There SHOULD be one and only one way to serialize a given triple

The more different ways there are to express the same triple or graph, the harder it gets to use the host language's native toolbox (that is, pure JS expressions) to process data. At some point, using the host language becomes impossible without using a parser library layered on top of the host language, negating the benefit of basing the language on JSON in the first place. (Note, this is about using different JSON structures to express the same triple; not about different triples expressing the same statement in RDF Semantics, like "foo" vs "foo"^^xsd:string).

  • +1 Richard Cyganiak This is the lesson to be learnt from RDF/XML.
  • +0 Manu Sporny, while I agree in principle I don't know how we'd enforce this in practice - that is, what's the difference between "foo" and "foo"^^xsd:string in JSON? Would you serialize the plain literal "foo" and the Typed Literal "foo"^^xsd:string in the same way in JSON? If the answer is yes, isn't the translation lossy?
    • This one is inherent to the way the RDF model is defined. There's nothing that can be done about it in the syntax. The concern here was about using different JSON structures to express the same triple. I clarified the description.
  • +1 Matteo Brunati as for Richard, RDF/XML is the lesson
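
The cost described above can be seen in a small sketch: three hypothetical JSON shapes for the same triple force every consumer to write a normalizing helper such as `getValues`, whereas a single canonical shape would allow plain property access.

```typescript
const FOAF_NAME = "http://xmlns.com/foaf/0.1/name";

// Three hypothetical JSON shapes that could all encode the same triple:
//   <http://example.com/alice> foaf:name "Alice" .
const shapes: Record<string, unknown>[] = [
  { [FOAF_NAME]: "Alice" },
  { [FOAF_NAME]: ["Alice"] },
  { [FOAF_NAME]: { value: "Alice" } },
];

// With one canonical shape, a consumer could read the value with plain
// property access. With several, every consumer needs a helper like this.
function getValues(obj: Record<string, unknown>, prop: string): string[] {
  const raw = obj[prop];
  const items: unknown[] = Array.isArray(raw) ? raw : raw === undefined ? [] : [raw];
  return items.map((item) =>
    typeof item === "object" && item !== null && "value" in item
      ? String((item as { value: unknown }).value)
      : String(item),
  );
}
```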

Participants