JSON-Design-Requirements

From RDF Working Group Wiki
Revision as of 10:23, 5 April 2011 by Msporny


RDF in JSON Design Requirements

There should be two serialization formats

There should be a machine-friendly serialization format and there should be a human-friendly serialization format.

  • -1 Manu Sporny, given the limited time for this working group, I think we should focus on the human-friendly serialization format. RDF already has a number of machine-friendly serialization formats.
  • +1 Andy. A simple "s", "p", "o" format is not the same amount of work as a human-friendly form. See SPARQL JSON result format
  • 0 Lee. I'd worry about the WG's available time and resources.
  • +1 Nathan if possible.
  • -0 Matteo Brunati not enough time maybe
  • -0 Chris Matheus not a priority
  • -1 Thomas Steiner Can we avoid this? One format to rule them all.
  • 0 Nicholas Humfrey Ideally one would be a subset of the other.
  • -0 Gavin Carothers There does not need to be a disconnect between human-friendly and machine-friendly for readability

A primary goal SHOULD be to build a human-friendly version of the serialization for JSON developers

The serialization should be optimized for humans first, machines second. The ability for machines to quickly parse the file is secondary to the ability for developers to be able to use the serialization with JavaScript. A focus should be placed on making the serialization fit into JavaScript frameworks easily, even at the cost of JSON-LD processor implementation complexity.

  • +1 Manu Sporny
  • -1 Lee. Given the existing work in the RDFa group on an API, I'd rather see a simple, machine-friendly format that implementations can then make available via an API. I'm not convinced that a standard human-friendly JSON format is a big win.
  • -0 Andy Different use cases lead to different design tradeoffs. (e.g. LDA is a tree; ideal for them, bad for other uses.)
  • +1 Nathan but only if the product can be considered simple JSON objects (k/v objects with a subject set) and the caveat is recognized that by not requiring an RDF toolkit or understanding of properties, inference etc, the data isn't really RDF... it's RDF-able - else -1, waste of time.
  • +1 Matteo Brunati +1 Nathan observations
  • +1 Chris Matheus extremely helpful for users new to RDF
  • +1 Thomas Steiner Yes, please! Make it easy for developers to write RDF in JSON.
  • -1 Nicholas Humfrey I am more interested in machines. Turtle is good for Humans.
  • -1 Gavin Carothers Large JSON objects are already VERY human unfriendly.

A primary goal SHOULD be to build a machine-optimized version of the serialization

The serialization should be optimized for machines first, humans second. The ability to use the serialization in JavaScript is secondary to the ability for machines to quickly parse the file. A focus should be placed on making implementations very easy to write.

The serialization SHOULD be able to transform most JSON in use today into RDF

There should be a flexible mechanism, such as a "context", that is capable of mapping JSON key-value pairs to RDF triples. This mechanism could be specified either in-band or out-of-band from the serialization. Having this feature would allow much of the existing JSON in the wild to be mapped into RDF.

  • +1 Manu Sporny
  • -1 Lee. Seems out-of-scope; do existing RDF-in-JSON solutions already have such mechanisms?
  • -1 Andy The original data was not written to be used in this way.
  • +1 Nathan Assuming we're still talking two serializations, then this would be very valuable, for twitter to be able to say here's our data, view it as simple objects or rdf graphs; although I'm unsure we can get there without a common vision across the water.
  • -1 Matteo Brunati +1 to Andy, it's not in the original usage of the data
  • -1 Chris Matheus nice to have but should not consume this team's resources
  • 0 Thomas Steiner Time permitting, not a top priority IMHO.
  • -1 Nicholas Humfrey too complex - and not worth standardising
  • -1 Gavin Carothers JSON-RDF should be a syntax, not something like GRDDL.
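The "context" idea above could work roughly as follows; this is a minimal sketch, and the `@subject` key, the FOAF IRIs, and the `toTriples` helper are all illustrative, not a proposed syntax:

```javascript
// Sketch: mapping plain JSON key/value pairs to (s, p, o) triples via
// a hypothetical out-of-band "context" (all names are illustrative).
const context = {
  name: "http://xmlns.com/foaf/0.1/name",
  homepage: "http://xmlns.com/foaf/0.1/homepage"
};

// Plain JSON as found "in the wild", e.g. from a Web API.
const doc = {
  "@subject": "http://example.com/people/bob",
  name: "Bob",
  homepage: "http://example.com/bob/"
};

// Emit one triple for every key the context knows how to map.
function toTriples(doc, context) {
  const s = doc["@subject"];
  return Object.keys(doc)
    .filter(k => k in context)
    .map(k => [s, context[k], doc[k]]);
}

console.log(toTriples(doc, context));
// two triples: (s, foaf:name, "Bob") and (s, foaf:homepage, "http://example.com/bob/")
```

Because the context is separate from the data, a publisher like Twitter could (in principle) keep serving its existing JSON and add the mapping alongside it.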

Developers do not need to be familiar at all with RDF to start using the serialization

Understanding the semantic web and the concepts of RDF (triples, graphs, etc.) should not be required in order to use the format. That means that the format may have a very simple, stripped down version for beginners and a more advanced set of features for semantic web enthusiasts.

  • +1 Manu Sporny
  • +1 Nathan only if two serializations, and as per previous comments.
  • -1 Richard Cyganiak I think I disagree. If you don't want to expose developers to RDF at all, then why not just use vanilla JSON? Also I don't understand how the beginner/advanced thing should work. A server will have to generate the one or the other, so it's not like client-side developers get to choose which version they want to be exposed to.
  • -1 Matteo Brunati I think a minimal semweb context is necessary: thinking of the SIMILE Exhibit framework. It's not simple to use without prior knowledge of the model.
  • 0 Chris Matheus some very basic knowledge may be important but deep knowledge should not be required
  • -1 Thomas Steiner People should at /least/ have an understanding of triples, that's enough for most use cases.
  • 0 Nicholas Humfrey don't hide the triples
  • -1 Gavin Carothers RDF-JSON should act as an introduction to RDF. Not as something you can use without knowing about RDF.

The serialization MAY include features not in RDF

There are certain features, such as generic key-value pairs in JSON, that do not map well to RDF. They would map well if RDF had a concept of plain literals in the subject or predicate position. The serialization could include these concepts but may specify that the values cannot be serialized to all RDF serialization formats (such as RDF/XML, Turtle or RDFa).

  • +1 Manu Sporny
  • -1 Andy creates an incompatible sub-community of applications.
  • +1 Nathan useful for allowing "junk" data like debugging info and session tokens, again only if two serializations.
  • -1 Richard Cyganiak as per Andy. Generic key-value pairs can be translated to <> <#key> "value" or somesuch.
  • -1 Matteo Brunati as per Andy; make a default rule for the generic key-value stuff
  • -1 Chris Matheus shouldn't spend time on this
  • -1 Thomas Steiner Strong no. Stay compatible with RDF by all means.
  • -1 Nicholas Humfrey
  • -1 Gavin Carothers A single JSON object should be able to contain JSON-RDF as a value. Which addresses Nathan's need.

The serialization MUST be 100% compatible with the JSON spec

Additional features, such as comments or short-hand notation to support datatypes, could be supported if we extended the JSON format. This would mean that the serialization would be incompatible with vanilla JSON readers and writers. While this might make the serialization nicer, we should not make any additions or modifications to the JSON format, to ensure maximum compatibility with pre-existing processors.
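The cost of extending the grammar is easy to demonstrate: stock JSON parsers reject even a harmless comment, so any extension immediately cuts off existing tooling.

```javascript
// Extensions like comments break stock JSON parsers - the reason to
// stay strictly within the JSON spec.
let ok = true;
try {
  JSON.parse('{ /* a comment */ "name": "Bob" }');
} catch (e) {
  ok = false; // vanilla JSON.parse rejects the extended syntax
}
console.log(ok); // false
```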

It is a requirement that all RDF concepts MUST be expressible in the serialization

There are concepts like RDF datatypes and g-snaps/graph literals that could be omitted from the serialization in order to reduce learning and implementation complexity.

  • -1 Manu Sporny, Good design is a balancing act - we should only include what will help the most number of people.
  • +1 Lee. I'd hesitate to say "all", but in general, a JSON RDF serialization would not be useful to us unless it was as much a 1st-class serialization of the RDF model as turtle, RDF/XML, etc.
  • +1 Andy for the machine-friendly form to work with non-JSON apps and systems.
  • -1 Andy for the human-friendly form but the features dropped will vary from usage to usage.
  • +1 Nathan for machine (rdf in json)
  • -1 Nathan for human (rdf-able json objects)
  • -1 Chris Matheus not for this round
  • +0.8 Matteo Brunati probably yes, but maybe not this time; too much complexity?
  • -1 Thomas Steiner Easy things should be easy and hard things should be possible. Keep the entry barrier low (inferred types), but allow the experts to do crazy things.
  • +1 Nicholas Humfrey Definitely for the machine version of the spec.
  • +1 Gavin Carothers Two serializations of the same abstract RDF should be the same.

There should be a migration story for going from existing JSON in the wild to this new format

The serialization task force should ensure that there is a subset of the serialization that is useful to beginners who use pure JSON, then show how developers could sprinkle a little RDF into their JSON, then show how developers can fully migrate to the new serialization format. The transition to the serialization format will probably take multiple years. The transition should be as smooth and organic as possible. We should also understand that many may not need to transition to RDF - JSON may work just fine for their application. We should not assume that people will go straight from regular JSON to the new serialization format.

Memory usage and CPU usage while processing SHOULD be a primary consideration

Memory and CPU usage for processing JSON are low. We should ensure that processing the serialization format is only slightly more complex than processing regular JSON.

  • +0 Manu Sporny, we want to be cognizant of resource usage but I don't think this should be a primary driver for design decisions for the language.
  • -1 Lee. Seems like an implementation detail to me.
  • -1 Andy (NB: JSON structures are read entirely into memory before the application gets to see them.)
  • +0.5 Nathan there is a balance between memory and processing to be struck, ntriples = more bytes, turtle = more processing, same considerations for JSON.
  • -1 Chris Matheus
  • -1 Thomas Steiner IMHO if you need the ultimate performance, use, e.g., N-Triples, readability should have a higher priority, personally speaking.
  • +1 Nicholas Humfrey It is important that it should be parsable as a stream (single pass)

The design target is small snippets of RDF Data

"small" might be less than 1 million triples, not 10.

  • +1 Andy
  • 0 Nathan two different considerations for machine or human, I'd say under 10k for human, over and beyond for machine
  • +1 Thomas Steiner For huge dumps use, e.g., N-Triples IMHO.
  • 0 Nicholas Humfrey

Design target: graphs or resources

A human-friendly JSON format can be designed more towards graphs (multiple subjects) or more towards describing just one resource (subject). This is not to exclude one possibility over the other - it is to decide the focus.

The serialization MUST support disjoint/unconnected graphs

All current RDF serialization formats allow you to express two graphs that are not necessarily connected to one another. The new serialization format should allow the same mechanism. This is also important because normalization is difficult to achieve in a general way without also supporting disjoint graphs in the serialization. JSON-LD disjoint graphs example.

  • +1 Manu Sporny
  • +1 Andy One graph with two+ disjoint components per serialization
  • +0 Andy Multiple graphs per serialization. No more than following work in other TFs.
  • +1 Nathan as per andy's comments
  • +1 Chris Matheus
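One way to picture the requirement: a single document carrying two subjects that never reference each other. The array-of-nodes shape and the `@subject` key below are illustrative only, not a finalized syntax:

```javascript
// Sketch: two unconnected subjects carried in one document.
const doc = [
  { "@subject": "http://example.com/people/alice",
    "http://xmlns.com/foaf/0.1/name": "Alice" },
  { "@subject": "http://example.org/places/berlin",
    "http://purl.org/dc/terms/title": "Berlin" }
];

// Neither node references the other, so the two single-triple
// components are disjoint - yet both travel in the same document.
const subjects = doc.map(node => node["@subject"]);
console.log(subjects.length); // 2
```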

The serialization MUST provide a normalization algorithm

Normalization, also known as canonicalization, is typically used when determining whether two sub-graphs that are expressed in different ways are identical. It is also very useful when hashing sub-graphs for checksumming or digital signature purposes. JSON-LD normalization example.

  • +1 Manu Sporny, I think we need normalization because we need to have a good digital signatures story
  • ? Andy. Unclear - are we signing the graph or the serialization? Is a Turtle-signed graph the same graph? Would it include IRI normalization?
  • +0 Nathan
  • +0 Chris Matheus highly desirable if there's time
  • +1 Thomas Steiner Time permitting
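The core idea can be sketched with a key-sorting serializer. This is only a stand-in: real RDF graph canonicalization must also deterministically relabel blank nodes, which is the hard part. The sketch shows just the "one canonical byte stream" property that checksums and signatures need:

```javascript
// Sketch: a deterministic, key-sorted serializer as a stand-in for
// normalization (real canonicalization must also handle blank nodes).
function canonicalize(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value && typeof value === "object") {
    return "{" + Object.keys(value).sort()
      .map(k => JSON.stringify(k) + ":" + canonicalize(value[k]))
      .join(",") + "}";
  }
  return JSON.stringify(value);
}

// Two documents with the same content in a different key order
// normalize to the same string, so they hash identically.
const a = { name: "Bob", age: 42 };
const b = { age: 42, name: "Bob" };
console.log(canonicalize(a) === canonicalize(b)); // true
```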

The serialization SHOULD enable digital signatures

Digital Signatures have a number of useful purposes. When combined with g-snaps/graph literals they provide a very easy way of establishing cryptographically verifiable provenance. These features are used heavily in electronic commerce. JSON-LD digital signature example.

The serialization SHOULD support advanced graph concepts

The serialization format should support advanced graph concepts such as g-box, g-snap and g-text, such that you can make statements about snapshots of graphs. Annotating graphs with metadata (such as graph retrieval time and digital signatures on the contents of the graph) is an important feature for higher-level concepts like provenance. Sandro's explanation of advanced graph concepts.

  • +1 Manu Sporny
  • -1 Richard Cyganiak Has security implications for RDF crawlers; requires larger API surface; SPARQL only returns single graphs anyways; use case is unclear
  • -1 Andy Not unless the format is following standard work done in other TFs.
  • +0.5 Nathan follow other TFs
  • 0 Matteo Brunati too problematic probably, +1 Richard notes
  • -0 Chris Matheus not this round unless the Graph TF results happen quickly and their incorporation is straightforward

The serialization MUST support automatic typing

Being able to transform a JSON document into a native object is one of the key benefits of using JSON over other serialization formats. Automatic typing of numbers and boolean values into language-native datatypes removes an extra step that developers would otherwise have to perform. For example, one could easily transform a serialized number that is an xsd:integer into a language-native integer. JSON-LD automatic typing example.
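This is the behavior JSON parsers already provide for free, and the requirement asks the serialization to preserve it; the xsd:integer mapping mentioned is a sketch of how a processor might interpret the result:

```javascript
// JSON's native types come back as language-native values with no
// extra conversion step - the benefit the requirement refers to.
const obj = JSON.parse('{"age": 42, "active": true, "name": "Bob"}');
console.log(typeof obj.age);    // "number"
console.log(typeof obj.active); // "boolean"

// A processor could treat obj.age directly as an xsd:integer, with no
// "42"^^xsd:integer markup required of the developer (sketch).
```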

The serialization SHOULD support type coercion

While not immediately obvious, type coercion allows one to map regular JSON into RDF in a way that may add datatype decorators to object literals. In other words, it provides for a way to get Typed Literals from regular JSON data. JSON-LD type coercion example.
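A sketch of how coercion might look: a context declares that a key's values carry a datatype, so an ordinary JSON string comes out as a typed literal. The `@coerce`, `@literal` and `@datatype` keys and the `coerce` helper are assumptions for illustration, not a finalized syntax:

```javascript
// Sketch: a hypothetical "@coerce" declaration that attaches a
// datatype to plain string values during conversion (names illustrative).
const context = {
  "@coerce": { "http://www.w3.org/2001/XMLSchema#dateTime": "created" }
};

function coerce(key, value, context) {
  for (const [datatype, k] of Object.entries(context["@coerce"])) {
    if (k === key) return { "@literal": value, "@datatype": datatype };
  }
  return value; // no coercion rule: leave the plain value alone
}

// "2011-04-05T10:23:00Z" is ordinary JSON, but comes out as a typed
// literal thanks to the coercion rule above.
console.log(coerce("created", "2011-04-05T10:23:00Z", context));
```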

The serialization SHOULD rely on microsyntaxes instead of nested structures

There are two common approaches to expressing RDF in JSON. One of them is to use nested structures to express language and type information for literals. The other approach is to use shallow structures with microsyntaxes mirroring TURTLE to express language and type information for literals.

  • +1 Manu Sporny
  • -1 Richard Cyganiak It's ugly as hell and makes the language unusable without an API
  • -1 Nathan
  • -1 Matteo Brunati
  • -1 Chris Matheus
  • -1 Thomas Steiner Microsyntaxes are easy to get wrong. Nested structures occupy more memory and require two object accesses on read/write, however, with default/inferred types, this could be OK.
  • +1 Nicholas Humfrey It depends on how complex the micro-syntax is. I like the idea of making the JSON less verbose (for humans)
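The two styles under discussion can be put side by side; both shapes below are illustrative, not a finalized syntax:

```javascript
// The same literal, "chat" in French, in the two contrasted styles.

// Nested structure: explicit fields for value and language.
const nested = { name: { "@literal": "chat", "@language": "fr" } };

// Microsyntax: a single string mirroring Turtle's "chat"@fr form.
const micro = { name: "chat@fr" };

// The microsyntax is terser but needs string parsing before use; the
// nested form costs an extra object access but is plain data.
console.log(nested.name["@literal"], micro.name.split("@")[1]); // chat fr
```

The tradeoff Thomas and Nicholas raise is visible here: the microsyntax saves bytes and reads, but any literal value containing "@" now needs escaping rules, which is where microsyntaxes "are easy to get wrong".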

The serialization SHOULD provide an API

An API would allow developers to transform incoming documents into a format that is easier for them to work with. In other words, it would allow them to drop all type information if it wasn't useful to them, or remove any microsyntaxes that would get in the way of basic usage of the data. Keep in mind that even JSON has an API: JSON.parse(). JSON-LD API example.

(?? Reword as: The serialization SHOULD assume working with a JavaScript RDF API (Andy))

  • +1 Manu Sporny
  • -1 Nathan the machine one will have the RDF API, the human one is pointless if it needs an API.
  • +1 Matteo Brunati as Andy said, working with an API ( are there other WG are working on that or not? )
  • -1 Chris Matheus not this round
  • -1 Thomas Steiner The JSON is the API, we just need to make it easy enough.
  • 0 Nicholas Humfrey Nice to have - but should not be required
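A sketch of the kind of convenience such an API could offer: a hypothetical `simplify` helper (the name and the `@literal`/`@datatype` shape are assumptions) that strips typed-literal wrappers so the data reads like plain JSON:

```javascript
// Sketch: a hypothetical helper that drops datatype decoration so the
// document can be used as ordinary key/value data.
function simplify(doc) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    out[key] = (value && typeof value === "object" && "@literal" in value)
      ? value["@literal"]   // drop datatype/language decoration
      : value;
  }
  return out;
}

const doc = {
  name: { "@literal": "Bob" },
  age: { "@literal": "42",
         "@datatype": "http://www.w3.org/2001/XMLSchema#integer" }
};
console.log(simplify(doc)); // plain values: name "Bob", age "42"
```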

There SHOULD be one and only one way to serialize a given triple

The more different ways there are to express the same triple or graph, the harder it gets to use the host language's native toolbox (that is, pure JS expressions) to process data. At some point, using the host language becomes impossible without using a parser library layered on top of the host language, negating the benefit of basing the language on JSON in the first place. (Note, this is about using different JSON structures to express the same triple; not about different triples expressing the same statement in RDF Semantics, like "foo" vs "foo"^^xsd:string).
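The cost described above is easy to make concrete. Suppose (purely for illustration) that a value could arrive either plain or wrapped in a `@literal` object; then every consumer needs a guard, which is the beginning of the parser layer the requirement wants to avoid:

```javascript
// Why multiple encodings hurt: if the same triple can arrive in either
// shape below, plain JS property access breaks (shapes illustrative).
const form1 = { name: "Bob" };                  // plain value
const form2 = { name: { "@literal": "Bob" } };  // wrapped value

// With one canonical form this is just `doc.name`; with two, every
// consumer needs a guard - effectively a small parser:
const read = doc =>
  typeof doc.name === "object" ? doc.name["@literal"] : doc.name;

console.log(read(form1) === read(form2)); // true, but only via the guard
```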