JSON Syntax Options

From RDF Working Group Wiki
Revision as of 20:43, 12 March 2011 by Nrixham (Talk | contribs)

This page is being used by the RDF WG to harvest different approaches to enabling the key features of RDF, in JSON.

URI Properties

RDF uses URIs to name things, including properties. A key benefit of this is that it allows different data sources to all use properties defined in open vocabularies, thus enabling shared understanding of data.

JSON on the other hand, is typically used for domain specific / silo based information where properties are simple lexical terms (like "name") and what the property "means" is documented somewhere out of band, for instance in API documentation, or in a JSON-Schema document.

There follows a collection of different approaches we can take which enable the use of URI identified properties in JSON.

Full URIs

{ "http://xmlns.com/foaf/0.1/name": "Bob" }

Benefits:

  • Unambiguous and easy to process.
  • When following your nose around the web, property equivalence can be checked directly against the URI in the serialization.

Drawbacks:

  • Increased bytesize over the wire.
  • Can be verbose to use when using the returned (JSON.parsed) data without an API or tooling.
  • Verbose to author.

Example usage (assuming the returned data has been JSON.parsed):

obj["http://xmlns.com/foaf/0.1/name"]
obj[ foaf('name') ] // when using a tabulator ns style approach in your code
obj[ resolve('foaf:name') ] // when using a function which allows the resolution of CURIEs as found in the RDF API
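
The "ns style" helper mentioned above could look roughly like the following. This is only a sketch: the names ns and foaf are illustrative, and are not taken from the tabulator or the RDF API.

```javascript
// Hypothetical namespace helper: ns(base) returns a function that
// appends a local name to the given base URI.
const ns = (base) => (local) => base + local;
const foaf = ns("http://xmlns.com/foaf/0.1/");

const obj = JSON.parse('{ "http://xmlns.com/foaf/0.1/name": "Bob" }');
const value = obj[foaf("name")]; // same lookup as obj["http://xmlns.com/foaf/0.1/name"]
```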

CURIEs

Note: this example uses JSON-LD syntax for the prefix maps:

{
  "#": { "foaf": "http://xmlns.com/foaf/0.1/" },
  "foaf:name": "Bob"
}

To reconstruct the URI, one must split "foaf:name" on the colon, replace "foaf" with its related mapping in the prefix map ("http://xmlns.com/foaf/0.1/"), and concatenate "name" onto that mapping, giving "http://xmlns.com/foaf/0.1/name".
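
The reconstruction steps can be sketched as follows. The function name expandCurie is hypothetical, and the sketch assumes the prefix map sits under the "#" key as in the example above.

```javascript
// Hypothetical helper: expand a CURIE such as "foaf:name" using a prefix map.
function expandCurie(curie, prefixes) {
  const i = curie.indexOf(":");
  if (i === -1) return curie; // no separator, nothing to expand
  const prefix = curie.slice(0, i);
  const local = curie.slice(i + 1);
  // Unknown prefixes are returned untouched (the key may already be a full URI).
  if (!Object.prototype.hasOwnProperty.call(prefixes, prefix)) return curie;
  return prefixes[prefix] + local;
}

const doc = JSON.parse(
  '{ "#": { "foaf": "http://xmlns.com/foaf/0.1/" }, "foaf:name": "Bob" }'
);
const uri = expandCurie("foaf:name", doc["#"]);
```

Note that a full URI passed to this function is left alone only because "http" is not a declared prefix; a real design would need to say how that clash is avoided.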

Separator Options:

  • : colon (familiar, but can't use . notation in JSON.parsed output)
  • _ underscore (unfamiliar, ambiguous when property also contains an underscore)
  • $ dollar (unfamiliar, but can use . notation in JSON.parsed output)

Benefits:

  • Reduced bytesize over the wire
  • Familiar to traditional RDF users
  • Easier to author

Drawbacks:

  • Requires tooling to normalize CURIEs prior to using the data when following your nose around the web.
  • Requires CURIE resolution to do property comparison (equivalence must be between URIs not CURIEs)
  • Unreliable when following your nose around the web (the same URI could be shortened to "ns0:name" or "f:name")
  • Unfamiliar to traditional JSON users
  • Verbose to use when using the returned (JSON.parsed) data without an API or tooling.

Example usage (assuming the returned data has been JSON.parsed):

obj["foaf:name"]; // but ONLY when you are familiar with the data and NOT when following your nose


TERMs (no colon)

Note: this example uses JSON-LD syntax for the prefix maps:

{
  "#": { "name": "http://xmlns.com/foaf/0.1/name" },
  "name": "Bob"
}

To reconstruct the URI, one must replace "name" with its related value in the map ("http://xmlns.com/foaf/0.1/name").

Benefits:

  • Reduced bytesize over the wire
  • Familiar to traditional JSON users
  • Easy to author
  • Easy to use when using the returned (JSON.parsed) data without an API or tooling.

Drawbacks:

  • Requires tooling to normalize TERMs prior to using the data when following your nose around the web.
  • Requires TERM resolution to do property comparison (equivalence must be between URIs not TERMs)
  • Unreliable when following your nose around the web (the same URI could be shortened to "foo" or "bar")
  • Unfamiliar to traditional RDF users

Example usage (assuming the returned data has been JSON.parsed):

obj.name; // but ONLY when you are familiar with the data and NOT when following your nose

TERMs (with colon allowed)

Note: this example uses JSON-LD syntax for the prefix maps:

{
  "#": { "name": "http://xmlns.com/foaf/0.1/name", "rdfs:label": "http://www.w3.org/2000/01/rdf-schema#label" },
  "name": "Bob",
  "rdfs:label": "Bob"
}

To reconstruct the URI, one must replace the term ("name", "rdfs:label") with its related value in the map ("http://xmlns.com/foaf/0.1/name", "http://www.w3.org/2000/01/rdf-schema#label").
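
As a rough sketch, normalizing a parsed object against its term map might look like this. The function name normalizeTerms is hypothetical; the same code serves the no-colon variant, since the colon is treated as just another character in the term.

```javascript
// Hypothetical helper: rewrite every key of a parsed object to its full URI,
// using the term map stored under "#". Keys without a mapping are kept as-is.
function normalizeTerms(obj) {
  const map = obj["#"] || {};
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    if (key === "#") continue; // the map itself is not data
    const mapped = Object.prototype.hasOwnProperty.call(map, key);
    out[mapped ? map[key] : key] = value;
  }
  return out;
}

const normalized = normalizeTerms({
  "#": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "rdfs:label": "http://www.w3.org/2000/01/rdf-schema#label"
  },
  "name": "Bob",
  "rdfs:label": "Bob"
});
```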

Benefits:

  • Reduced bytesize over the wire
  • Familiar to traditional JSON users
  • Familiar to traditional RDF users
  • Easy to author
  • non-colon names only: Easy to use when using the returned (JSON.parsed) data without an API or tooling.

Drawbacks:

  • Requires tooling to normalize TERMs prior to using the data when following your nose around the web.
  • Requires TERM resolution to do property comparison (equivalence must be between URIs not TERMs)
  • Unreliable when following your nose around the web (the same URI could be shortened to "foo" or "bar")
  • with colon names only: Verbose to use when using the returned (JSON.parsed) data without an API or tooling.

Example usage (assuming the returned data has been JSON.parsed):

obj.name; // non-colon - but ONLY when you are familiar with the data and NOT when following your nose
obj["rdfs:label"]; // with colon - but ONLY when you are familiar with the data and NOT when following your nose


TERMs + Single Vocab

Note: this example uses a made up syntax!

{
  "#vocab": "http://example.org/my-vocab#",
  "name": "Bob"
}

To reconstruct the URI, one must append "name" to the value of #vocab ("http://example.org/my-vocab#")
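
A minimal sketch of that expansion, assuming the made-up "#vocab" key from the example (the function name expandWithVocab is likewise hypothetical):

```javascript
// Hypothetical helper: prefix every property with the single vocabulary URI.
function expandWithVocab(obj) {
  const vocab = obj["#vocab"] || ""; // no vocab declared: keys stay as they are
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    if (key === "#vocab") continue;
    out[vocab + key] = value;
  }
  return out;
}

const expanded = expandWithVocab({
  "#vocab": "http://example.org/my-vocab#",
  "name": "Bob"
});
```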

Note: This may look wonderful, but it comes with the one-vocab caveat: when publishers require terms from multiple vocabularies, they are likely to create "proxy" vocabularies that simply pull together and merge terms from many different vocabularies. There is a processing and understanding cost to that which can't be entered into lightly.

Benefits:

  • Minimal bytesize over the wire
  • Familiar to traditional JSON users
  • Familiar to RDFa users
  • Easy to author
  • Unambiguous and easy to process.
  • Easy to use when using the returned (JSON.parsed) data without an API or tooling.
  • Encourages vocabulary merging and reuse.
  • Potentially far easier to deploy, doesn't require publishers to implement/have a sem web stack.

Drawbacks:

  • Requires understanding of equivalent property statements in custom vocabularies when following your nose around the web.
  • Real world property equivalence is far more complicated.

Example usage (assuming the returned data has been JSON.parsed):

obj.name; // non-colon - but ONLY when you are familiar with the data and NOT when following your nose

Option: External Maps

Three of the above options (CURIEs, TERMs no colon, TERMs with colon) require a prefix or term map to be included in order to turn shortened properties into full URIs.

These maps could instead be factored out and referenced externally. This option comes with its own set of benefits and drawbacks.

Benefits:

  • Minimal data over the wire.
  • When used with either of the TERMs options, allows bootstrapping of existing JSON data in the wild.
  • Encourages vocabulary merging and reuse.
  • Potentially far easier to deploy, doesn't require publishers to implement/have a sem web stack.

Drawbacks:

  • Sometimes requires two GETs when following your nose.
  • External map unavailability removes your ability to see the data as RDF.
  • Some changes to external maps could change the meaning of the data.


Datatypes

RDF includes support for specifying the datatype of literals, commonly referred to as "Typed Literals". This allows any literal to be given a specific datatype, typically one of the xsd: types.

JSON has inbuilt support for a minimal set of datatypes, namely strings, numbers (which covers integers, doubles and decimals), booleans, arrays and objects.

Datatypes that are frequently used in RDF but are not available in JSON include IRI and the various forms of date and time.

Note: Many other JSON related specifications have found a need to define support for IRIs and various forms of date/time, for example Activity Streams JSON.

Note: Objects and Arrays will typically have special meaning/usage for RDF - JSON, so will not be discussed further here.

Limited Expressibility

This approach would constrain the syntax to only being able to express those datatypes already existing in JSON, namely:

  • String (Unicode)
  • Number (Integer, Double, Decimal)
  • Boolean (true, false)
  • Null (does RDF have a concept of null, or a datatype for it?)

Benefits:

  • Requires no special processing of data
  • Familiar to most JSON users
  • Simple

Drawbacks:

  • No way to use other common or custom datatypes

Limited Expressibility + IRIs and Date/Time

This approach would extend the native JSON datatypes to include support for IRI, Date, Time and DateTime:

  • IRI
  • Date
  • DateTime
  • Time
  • String (Unicode)
  • Number (Integer, Double, Decimal)
  • Boolean (true, false)
  • Null (does RDF have a concept of null, or a datatype for it?)

note: the additional types would need to be quoted like strings in order to keep JSON compatibility, e.g. "http://example.org/" rather than the same without quotes.

Benefits:

  • Potentially requires no special processing of data
  • Familiar to most users
  • Simple
  • Enough to cover most common use cases.

Drawbacks:

  • No way to use other common or custom datatypes


Property Range from Vocab

Either of the Limited Expressibility options could be augmented with type hinting from the range of the property being used.

Benefits:

  • Potentially requires no special processing of data (when not following your nose)
  • Familiar to most users
  • Simple
  • Enough to cover most common use cases.
  • Allows expression of common or custom datatypes

Drawbacks:

  • Potentially requires understanding of properties when following your nose & tooling to do so. (nathan: is this a drawback??)

Map the property to a datatype

Either of the Limited Expressibility options could be augmented with type hinting on the property. The hint could be included in the serialization, or in an external map as with the External Maps option for URIs.
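
One way to picture this is a lookup from property to datatype that a consumer applies to otherwise plain values. Everything in this sketch is an assumption made for illustration: the map layout, the property names "created" and "homepage", the "IRI" token, and the function name typedValue.

```javascript
// Hypothetical property-to-datatype map. It could travel with the data
// or be published separately as an external map.
const datatypes = {
  "created": "http://www.w3.org/2001/XMLSchema#dateTime",
  "homepage": "IRI"
};

// Pair a plain JSON value with the datatype hinted for its property, if any.
function typedValue(property, value, map) {
  const hinted = Object.prototype.hasOwnProperty.call(map, property);
  return { value, datatype: hinted ? map[property] : null };
}

const created = typedValue("created", "2011-03-12T20:43:00Z", datatypes);
const name = typedValue("name", "Bob", datatypes); // no hint: datatype stays null
```

Consumers that ignore the map still see ordinary strings and numbers, which is where the "no special processing when not following your nose" benefit comes from.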

Benefits:

  • Potentially requires no special processing of data (when not following your nose)
  • Familiar to most users
  • Simple
  • Enough to cover most common use cases.
  • Allows expression of common or custom datatypes

Drawbacks:

  • Potentially requires understanding of properties when following your nose & new tooling to do so.

Datatypes from JSON schema

As above, what we're doing could be merged with JSON Schema. In fact, we could fully externalize and work with JSON Schema to create a single spec which covers most of the web's JSON needs, and our own RDF needs, but that's perhaps too wild for this group and out of charter.

(nathan likes this idea)

In-String TypedLiterals

This approach involves including both the data and the datatype in a single quoted string, for example "FDE3^^xsd:base64Binary"

note: the exact format of the combined string would be up for discussion, we may want to use full IRIs for datatypes, may explicitly offer a set of predefined tokens mapped to IRIs (e.g. "^int"), may have the datatype prefixed or postfixed - many different approaches
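
For the postfixed "^^" form shown above, the special processing could be as small as the following sketch (parseTypedLiteral is a hypothetical name, and the split rule is an assumption since the format is undecided):

```javascript
// Hypothetical parser for "value^^datatype" strings.
// Splits on the LAST "^^" so the value itself may contain carets.
function parseTypedLiteral(str) {
  const i = str.lastIndexOf("^^");
  if (i === -1) return { value: str, datatype: null }; // plain string
  return { value: str.slice(0, i), datatype: str.slice(i + 2) };
}

const literal = parseTypedLiteral("FDE3^^xsd:base64Binary");
```

Any ordinary string that happens to contain "^^" would be misread by this rule, so a real format would also need an escaping convention.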

Benefits:

  • Can express all common and custom datatypes

Drawbacks:

  • Always requires special processing
  • Unfamiliar to most typical JSON users
  • Verbose
  • What to do when you don't understand a datatype?

Paired Values - value/datatype

Using either the object or array syntax from JSON, we could specify typed literals like so:

{ "property": {
    "_value": "FDE3",
    "_datatype": "xsd:base64Binary"
  }
}

Options:

  • All typed literals like this, including numbers.
  • Only some typed literals like this.

Benefits:

  • Can express all common and custom datatypes

Drawbacks:

  • Always requires special processing
  • Unfamiliar to most typical JSON users
  • Verbose

Paired Values - datatype arcs

Using either the object or array syntax from JSON, we could specify typed literals like so:

{ "property": { "xsd:base64Binary": "FDE3" } }

note: see JSN3 for more examples
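
A sketch of reading such a datatype arc back into value/datatype pairs follows. It assumes that "repetition" means several values of the same datatype can share one key as an array; that reading, and the name readDatatypeArc, are assumptions rather than anything defined by JSN3.

```javascript
// Hypothetical reader: turn { datatype: value-or-array } into a flat list
// of { value, datatype } pairs. Non-object values are treated as untyped.
function readDatatypeArc(node) {
  if (node === null || typeof node !== "object" || Array.isArray(node)) {
    return [{ value: node, datatype: null }];
  }
  return Object.entries(node).flatMap(([datatype, v]) =>
    (Array.isArray(v) ? v : [v]).map((value) => ({ value, datatype }))
  );
}

const single = readDatatypeArc({ "xsd:base64Binary": "FDE3" });
const repeated = readDatatypeArc({ "xsd:base64Binary": ["FDE3", "A0B1"] });
```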

Options:

  • All typed literals like this, including numbers.
  • Only some typed literals like this.

Benefits:

  • Can express all common and custom datatypes
  • Smaller bytesize on the wire (allows repetition)

Drawbacks:

  • Always requires special processing
  • Unfamiliar to most typical JSON users


Languages

RDF currently includes support for specifying the language of strings (for example English or Dutch) on Plain Literals. How the language is expressed is often serialization specific, with RDFa delegating to the lang/xml:lang attributes, and Turtle taking the "Bob"@en approach.

JSON currently has no support for specifying the language of strings.

No Language

It's an option. JSON natively supports Unicode, so strings like "花澄" are perfectly acceptable, and JSON is used effectively throughout the web without requiring a language tag. Furthermore, text often consists of multiple different languages, in which case it is not clear which language tag to use. For example:

彭博社:2987名人大代表中70名最富的人资产总值为4931亿人民币,约751亿美元!The richest 70 of the 2,987 members have a combined wealth of 493.1 billion yuan ($75.1 billion)

source

Property Specifies Language

This option would involve language specific properties being created in vocabs, for example "rdfs:label-en" and "rdfs:label-ja".

Not saying much about this one as it's a huge change to RDF and quite possibly entirely impractical from almost every angle. But, it is an option.

Property Modifiers

This option involves adding a language hint to the property, as serialization sugar only, for example:

{ "label@en": "London" }
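
Undoing this serialization sugar means splitting each key before the property can be resolved. A minimal sketch, with the hypothetical name splitLanguageKey:

```javascript
// Hypothetical helper: split "label@en" into property and language hint.
// Keys that start with "@" or contain no "@" are returned unchanged.
function splitLanguageKey(key) {
  const i = key.lastIndexOf("@");
  if (i <= 0) return { property: key, language: null };
  return { property: key.slice(0, i), language: key.slice(i + 1) };
}

const parts = splitLanguageKey("label@en");
```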

Benefits:

  • Can express languages
  • Potentially lighter to process than "in-string language"
  • Potentially smaller bytesize on the wire than both of the paired values options (and allows repetition)

Drawbacks:

  • Always requires special processing to use the data
  • Unfamiliar to most typical JSON users
  • Can be verbose when working with data in many languages (requires a min of one property value pair per language)


In-String Language

This approach involves including both the data and the language in a single quoted string, for example "花澄@ja"

note: the exact format of the combined string would be up for discussion, we may want to use IRIs for languages, may explicitly offer a set of predefined tokens (e.g. "@en"), may have the language prefixed ("ja@花澄") or postfixed ("花澄@ja") - many different approaches
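
The "tracking back over parsed data" cost can be seen in a sketch for the postfixed form. Both function names are hypothetical, and only top-level string values are handled.

```javascript
// Hypothetical parser for the postfixed "value@lang" form.
function parseLangString(str) {
  const i = str.lastIndexOf("@");
  if (i === -1) return { value: str, language: null };
  return { value: str.slice(0, i), language: str.slice(i + 1) };
}

// Walk an already JSON.parsed object and lift every string value.
function liftLanguages(obj) {
  const out = {};
  for (const [key, v] of Object.entries(obj)) {
    out[key] = typeof v === "string" ? parseLangString(v) : v;
  }
  return out;
}

const lifted = liftLanguages({ "label": "花澄@ja", "age": 44 });
```

With this rule any string containing "@", such as an email address, would be misread as carrying a language, so an escaping convention would be needed.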

Benefits:

  • Can express languages

Drawbacks:

  • Always requires special processing (including tracking back over parsed data)
  • Unfamiliar to most typical JSON users

Paired Values - value/language

Using either the object or array syntax from JSON, we could specify plain literals with languages as follows:

{ "property": {
    "_value": "花澄",
    "_language": "ja"
  }
}

Benefits:

  • Can express languages
  • Lighter to process than "in-string language"

Drawbacks:

  • Always requires special processing to use the data
  • Unfamiliar to most typical JSON users
  • Verbose

Paired Values - language arcs

Using either the object or array syntax from JSON, we could specify plain literals with languages as follows:

{ "property": {"@en": "London"} }

note: see JSN3 for more examples
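
With language arcs, selecting a value for a given language is a plain key lookup on the parsed data. A small sketch (pickLanguage is a hypothetical name, and the multi-language object is an extension of the example above):

```javascript
// Hypothetical helper: fetch the value for one language from a language arc.
function pickLanguage(node, lang) {
  const key = "@" + lang;
  return Object.prototype.hasOwnProperty.call(node, key) ? node[key] : undefined;
}

const data = { "property": { "@en": "London", "@it": "Londra" } };
const italian = pickLanguage(data.property, "it");
```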

Benefits:

  • Can express languages
  • Lighter to process than "in-string language"
  • Smaller bytesize on the wire than the other paired values option (allows repetition)

Drawbacks:

  • Always requires special processing to use the data
  • Unfamiliar to most typical JSON users

Externalized Languages / Language Mapped to Property

Can't think of a decent, reliable way to do this? Maybe somebody can.


Syntax Structure

RDF is very flexible syntax-wise, because it is a graph based data model (nodes and edges), and can be expressed in any number of ways, from a set of triples, through to key/value objects with a subject assigned.

JSON is typically used to express simple key/value objects, plain old data objects.

RDF in JSON can therefore be assembled as key/value objects with a subject assigned, or in an n-triples like manner (a big list of triples) or anywhere in-between, as with turtle.

note: the benefits and disadvantages run much deeper than the simple ones mentioned in this section, as each option in this document has its own set of trade-offs. However, some primary ones which are specific to the general syntax option are listed here.

Triples

This option involves specifying the serialization to be a simple set of s,p,o triples, an example may be:

[
  {"s": "IRI", "p": "IRI", "o": {"_value": "object", "_language": "en"} },
  {"s": "_:b1", "p": "IRI", "o": {"_value": "fdb3", "_datatype": "http://www.w3.org/2001/XMLSchema#base64Binary"} }
]

Benefits:

  • Unambiguous
  • Simple for machines to process (and to spec!)
  • Could be a good machine2machine interchange format for triples.

Drawbacks:

  • Requires RDF Tooling to use for most practical purposes
  • Extremely unfriendly for typical JSON Developers (and anybody working with the JSON.parsed data directly)
  • Extremely verbose + large bytesize over the wire

Distinguishing Features:

  • approach constrains to a triple based syntax, typically lends to "one way to do each thing".

Iterative Reduction

This approach involves iteratively reducing the Triples approach until something more "turtle-like" (for lack of a better word) is created. Simple beginning steps may involve:

Allowing multiple object values:

[
  {"s": "IRI", "p": "IRI", "o": [{"_value": "London", "_language": "en"},{"_value": "Londra", "_language": "it"},{"_value": "Lontoo", "_language": "fi"}] }
]

Turning each Property-Object chain into a key/value object:

{
 "subject-IRI": {
   "http://www.w3.org/2000/01/rdf-schema#label": [{"_value": "London", "_language": "en"},{"_value": "Londra", "_language": "it"},{"_value": "Lontoo", "_language": "fi"}],
   "http://www.w3.org/2002/07/owl#sameAs": [ "http://data.nytimes.com/14085781296239331901", "http://sws.geonames.org/2643743/" ]
 }
}

and so forth, perhaps adopting some of the various options outlined in this document in the process.
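
The two reduction steps shown above can be sketched as a single grouping pass over a triple list. The function name groupTriples is hypothetical, and the "s", "p", "o" keys follow the Triples example.

```javascript
// Hypothetical reducer: group a flat list of {s, p, o} triples into
// { subject: { property: [objects...] } }.
function groupTriples(triples) {
  const out = Object.create(null); // avoid clashes with Object.prototype keys
  for (const { s, p, o } of triples) {
    if (!(s in out)) out[s] = Object.create(null);
    if (!(p in out[s])) out[s][p] = [];
    // "o" may already be an array of objects (first reduction step).
    out[s][p].push(...(Array.isArray(o) ? o : [o]));
  }
  return out;
}

const label = "http://www.w3.org/2000/01/rdf-schema#label";
const grouped = groupTriples([
  { s: "subject-IRI", p: label, o: { _value: "London", _language: "en" } },
  { s: "subject-IRI", p: label, o: { _value: "Londra", _language: "it" } }
]);
```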

Benefits:

  • Reduced bytesize over the wire

Drawbacks (depending how close to "Objects" you get):

  • Requires RDF Tooling to use for most practical purposes
  • Unfriendly for typical JSON Developers (and anybody working with the JSON.parsed data directly)
  • It's not triples, and it's not objects

Distinguishing Features:

  • approach is unconstrained and every option is viable, including multiple option combinations (multiple ways to state a property for example).

note: there may be more benefits, feel free to add, the original author of this document (nathan) can't see any though, to him this is just unfriendly turtle.

Objects

This approach starts with typical plain old simple objects as found in most JSON in the wild, then focusses on keeping it as close to the JSON data that's in the wild as possible, and allowing data from specific sources to be consumed without the use of RDF tooling. This lends more to a mapping based approach.

Example starting point:

{
 "id": 1237642,
 "name": "Bob",
 "age": 44
}

A typical approach would be to start with a simple key/value object, then layer on subjects and additional datatypes (like IRI and dates):

{
 "@id": "http://example.org/users/1237642#",
 "name": "Bob",
 "age": 44
}

Then handle URI properties (see the many options under URI Properties above for more):

{
 "@vocab": "http://example.org/schema/user#",
 "@id": "http://example.org/users/1237642#",
 "name": "Bob",
 "age": 44
}
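
Viewing such an object as RDF then takes a small mapping step. The sketch below assumes "@id" names the subject and "@vocab" is prepended to every other key, as in the example; objectToTriples is a hypothetical name and the output reuses the "s", "p", "o" shape from the Triples option.

```javascript
// Hypothetical mapper: turn a plain object with "@id" and "@vocab"
// into a list of {s, p, o} triples.
function objectToTriples(obj) {
  const vocab = obj["@vocab"] || "";
  const subject = obj["@id"];
  const triples = [];
  for (const [key, value] of Object.entries(obj)) {
    if (key.startsWith("@")) continue; // "@id" and "@vocab" are not properties
    triples.push({ s: subject, p: vocab + key, o: value });
  }
  return triples;
}

const triples = objectToTriples({
  "@vocab": "http://example.org/schema/user#",
  "@id": "http://example.org/users/1237642#",
  "name": "Bob",
  "age": 44
});
```

A developer who never calls the mapper can still read obj.name and obj.age directly from the JSON.parsed data.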

Benefits:

  • Simple for developers to work with when using JSON.parsed data
  • Minimal bytesize over the wire
  • Familiar to most users
  • Easy to publish without requiring a full RDF tooling or tech stack changes
  • Potentially allows bootstrapping of many web 2.0 data sources.

Drawbacks:

  • Requires RDF Tooling to use when following your nose around the web
  • Takes more processing when working with the data like RDF than the Triples approach

Distinguishing Features:

  • approach is constrained such that the end result stays as close as possible to existing typical JSON usage, that is, to simple objects.

Summary

There are many different variations possible, especially when taking the "Iterative Reduction" approach.

Two points to consider from nathan:

  • Every follow-your-nose use case requires tooling and processing, so this point can be discounted in most of the drawback sections. The only variables are "how much processing?", "how big? (bytesize)" and "can this be easily used as simple JSON.parsed data when not following your nose?".
  • It helps to have a use case, requirements and constraints when creating things. Both the "triples" and "objects" approaches have clear requirements and end goals; the "iterative reduction" option, on the other hand...