From RDF Working Group Wiki
Revision as of 18:59, 12 March 2011

JSON Syntax Options

This page is being used by the RDF WG to harvest different approaches to enabling the key features of RDF, in JSON.

URI Properties

RDF uses URIs to name things, including properties. A key benefit of this is that it allows different data sources to all use properties defined in open vocabularies, thus enabling shared understanding of data.

JSON, on the other hand, is typically used for domain-specific / silo-based information where properties are simple lexical terms (like "name") and what the property "means" is documented somewhere out of band, for instance in API documentation or in a JSON Schema document.

There follows a collection of different approaches we can take which enable the use of URI identified properties in JSON.

Full URIs

{ "http://xmlns.com/foaf/0.1/name": "Bob" }

Benefits:

  • Unambiguous and easy to process.
  • When following your nose around the web, properties can be compared directly using the URI in the serialization.

Drawbacks:

  • Increased bytesize over the wire.
  • Verbose to use with the returned (JSON.parsed) data without an API or tooling.
  • Verbose to author.

Example usage (assuming the returned data has been JSON.parsed):

obj["http://xmlns.com/foaf/0.1/name"]
obj[ foaf('name') ] // when using a tabulator ns style approach in your code
obj[ resolve('foaf:name') ] // when using a function which allows the resolution of CURIEs as found in the RDF API
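The `foaf('name')` and `resolve('foaf:name')` helpers referred to above are not defined on this page. A minimal sketch of what they might look like, assuming both simply build URI strings; `namespace`, `makeResolver` and the prefix map are hypothetical names for illustration (real RDF libraries may return node objects rather than plain strings):

```javascript
// Hypothetical tabulator-style helper: namespace(base) returns a function that
// appends a local name to the namespace URI (plain strings in this sketch).
function namespace(base) {
  return (localName) => base + localName;
}

// Hypothetical helper: builds a CURIE resolver from a prefix map.
// Throws if the CURIE has no colon or uses an unmapped prefix.
function makeResolver(prefixes) {
  return (curie) => {
    const i = curie.indexOf(":");
    const prefix = curie.slice(0, i);
    if (i < 0 || !Object.prototype.hasOwnProperty.call(prefixes, prefix)) {
      throw new Error("Cannot resolve CURIE: " + curie);
    }
    return prefixes[prefix] + curie.slice(i + 1);
  };
}

const foaf = namespace("http://xmlns.com/foaf/0.1/");
const resolve = makeResolver({ foaf: "http://xmlns.com/foaf/0.1/" });
const obj = JSON.parse('{ "http://xmlns.com/foaf/0.1/name": "Bob" }');

console.log(obj["http://xmlns.com/foaf/0.1/name"]); // Bob
console.log(obj[foaf("name")]);                     // Bob
console.log(obj[resolve("foaf:name")]);             // Bob
```

All three lookups hit the same key because the data itself carries the full URI; the helpers only save typing on the consumer side.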

CURIEs

Note: this example uses JSON-LD syntax for the prefix maps:

{
  "#": { "foaf": "http://xmlns.com/foaf/0.1/" },
  "foaf:name": "Bob"
}

To reconstruct the URI, one must split "foaf:name" on the colon, replace the prefix "foaf" with its mapping in the prefix map ("http://xmlns.com/foaf/0.1/"), and append the local part "name", giving "http://xmlns.com/foaf/0.1/name".
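The reconstruction steps above can be sketched as follows, assuming the JSON-LD style "#" prefix map from the example; `expandCurie` is a hypothetical name, and this sketch returns `null` rather than guessing when the prefix is unknown:

```javascript
// Expand a CURIE such as "foaf:name" using the document's "#" prefix map.
// Splits on the FIRST colon; returns null if there is no colon or the prefix is unmapped.
function expandCurie(curie, prefixMap) {
  const i = curie.indexOf(":");
  if (i < 0) return null;
  const prefix = curie.slice(0, i);
  if (!Object.prototype.hasOwnProperty.call(prefixMap, prefix)) return null;
  return prefixMap[prefix] + curie.slice(i + 1);
}

const doc = JSON.parse(`{
  "#": { "foaf": "http://xmlns.com/foaf/0.1/" },
  "foaf:name": "Bob"
}`);

console.log(expandCurie("foaf:name", doc["#"])); // http://xmlns.com/foaf/0.1/name
```

Another publisher writing "f:name" with "f" mapped to the same namespace expands to the identical URI, which is why property comparison has to happen after expansion rather than on the CURIE strings.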

Separator Options:

  •  : colon (familiar, but can't use . notation in JSON.parsed output)
  • _ underscore (unfamiliar, ambiguous when property also contains an underscore)
  • $ dollar (unfamiliar, but can use . notation in JSON.parsed output)

Benefits:

  • Reduced bytesize over the wire
  • Familiar to traditional RDF users
  • Easier to author

Drawbacks:

  • Requires tooling to normalize CURIEs prior to using the data when following your nose around the web.
  • Requires CURIE resolution to do property comparison (equivalence must be between URIs not CURIEs)
  • Unreliable when following your nose around the web (the same URI could be shortened to "ns0:name" or "f:name")
  • Unfamiliar to traditional JSON users
  • Verbose to use when using the returned (JSON.parsed) data without an API or tooling.

Example usage (assuming the returned data has been JSON.parsed):

obj["foaf:name"]; // but ONLY when you are familiar with the data and NOT when following your nose


TERMs (no colon)

Note: this example uses JSON-LD syntax for the prefix maps:

{
  "#": { "name": "http://xmlns.com/foaf/0.1/name" },
  "name": "Bob"
}

To reconstruct the URI, one must replace "name" with its related value in the map, "http://xmlns.com/foaf/0.1/name".
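The term lookup is a plain map access; a minimal sketch, assuming the "#" term map from the example (`expandTerm` is a hypothetical name):

```javascript
// Look up a TERM in the "#" term map; returns null when the term is not mapped.
function expandTerm(term, termMap) {
  return Object.prototype.hasOwnProperty.call(termMap, term) ? termMap[term] : null;
}

const doc = JSON.parse(`{
  "#": { "name": "http://xmlns.com/foaf/0.1/name" },
  "name": "Bob"
}`);

console.log(expandTerm("name", doc["#"])); // http://xmlns.com/foaf/0.1/name
```

A different source could map "foo" to the very same URI, so two documents can only be compared reliably on the expanded values, not on the terms.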

Benefits:

  • Reduced bytesize over the wire
  • Familiar to traditional JSON users
  • Easy to author
  • Easy to use when using the returned (JSON.parsed) data without an API or tooling.

Drawbacks:

  • Requires tooling to normalize TERMs prior to using the data when following your nose around the web.
  • Requires TERM resolution to do property comparison (equivalence must be between URIs not TERMs)
  • Unreliable when following your nose around the web (the same URI could be shortened to "foo" or "bar")
  • Unfamiliar to traditional RDF users

Example usage (assuming the returned data has been JSON.parsed):

obj.name; // but ONLY when you are familiar with the data and NOT when following your nose

TERMs (with colon allowed)

Note: this example uses JSON-LD syntax for the prefix maps:

{
  "#": { "name": "http://xmlns.com/foaf/0.1/name", "rdfs:label": "http://www.w3.org/2000/01/rdf-schema#label" },
  "name": "Bob",
  "rdfs:label": "Bob"
}

To reconstruct the URI, one must replace the term ("name", "rdfs:label") with its related value in the map ("http://xmlns.com/foaf/0.1/name", "http://www.w3.org/2000/01/rdf-schema#label").
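The normalization tooling mentioned in the drawbacks could be as small as a key rewrite. A sketch, assuming that the "#" key itself is dropped and that keys with no mapping are kept unchanged (both are choices of this sketch, not something the page specifies); `normalizeTerms` is a hypothetical name:

```javascript
// Rewrite every property of a parsed object from its term to the full URI in the
// "#" map. Colon terms such as "rdfs:label" are looked up as whole strings, with no
// prefix splitting. The "#" key is dropped; unmapped keys are kept as-is.
function normalizeTerms(doc) {
  const termMap = doc["#"] || {};
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "#") continue;
    const uri = Object.prototype.hasOwnProperty.call(termMap, key) ? termMap[key] : key;
    out[uri] = value;
  }
  return out;
}

const doc = JSON.parse(`{
  "#": { "name": "http://xmlns.com/foaf/0.1/name", "rdfs:label": "http://www.w3.org/2000/01/rdf-schema#label" },
  "name": "Bob",
  "rdfs:label": "Bob"
}`);

const normalized = normalizeTerms(doc);
console.log(normalized["http://www.w3.org/2000/01/rdf-schema#label"]); // Bob
```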

Benefits:

  • Reduced bytesize over the wire
  • Familiar to traditional JSON users
  • Familiar to traditional RDF users
  • Easy to author
  • non-colon names only: Easy to use when using the returned (JSON.parsed) data without an API or tooling.

Drawbacks:

  • Requires tooling to normalize TERMs prior to using the data when following your nose around the web.
  • Requires TERM resolution to do property comparison (equivalence must be between URIs not TERMs)
  • Unreliable when following your nose around the web (the same URI could be shortened to "foo" or "bar")
  • with colon names only: Verbose to use when using the returned (JSON.parsed) data without an API or tooling.

Example usage (assuming the returned data has been JSON.parsed):

obj.name; // non-colon - but ONLY when you are familiar with the data and NOT when following your nose
obj["rdfs:label"]; // with colon - but ONLY when you are familiar with the data and NOT when following your nose


TERMs + Single Vocab

Note: this example uses a made up syntax!

{
  "#vocab": "http://example.org/my-vocab#",
  "name": "Bob"
}

To reconstruct the URI, one must append "name" to the value of #vocab ("http://example.org/my-vocab#")
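Since "#vocab" is made-up syntax, the following is only a sketch of the concatenation rule described above; `expandWithVocab` is a hypothetical name, and it assumes every non-"#vocab" key is a term in that single vocabulary:

```javascript
// Expand every property by appending it to the single "#vocab" base URI.
// Throws if the document carries no "#vocab" string.
function expandWithVocab(doc) {
  const vocab = doc["#vocab"];
  if (typeof vocab !== "string") throw new Error("missing #vocab");
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    if (key === "#vocab") continue;
    out[vocab + key] = value;
  }
  return out;
}

const doc = JSON.parse('{ "#vocab": "http://example.org/my-vocab#", "name": "Bob" }');
const expanded = expandWithVocab(doc);
console.log(expanded["http://example.org/my-vocab#name"]); // Bob
```

Note the sketch has no way to say "this one term comes from FOAF", which is exactly the one-vocab caveat.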

Note: This may look wonderful, but it comes with a one-vocabulary caveat: when publishers require terms from multiple vocabularies, they are likely to create "proxy" vocabularies that simply pull together and merge terms from different vocabularies. That carries a processing and understanding cost which shouldn't be taken on lightly.

Benefits:

  • Minimal bytesize over the wire
  • Familiar to traditional JSON users
  • Familiar to RDFa users
  • Easy to author
  • Unambiguous and easy to process.
  • Easy to use when using the returned (JSON.parsed) data without an API or tooling.
  • Encourages vocabulary merging and reuse.
  • Potentially far easier to deploy, doesn't require publishers to implement/have a sem web stack.

Drawbacks:

  • Requires understanding of equivalent property statements in custom vocabularies when following your nose around the web.
  • Real world property equivalence is far more complicated.

Example usage (assuming the returned data has been JSON.parsed):

obj.name; // non-colon - but ONLY when you are familiar with the data and NOT when following your nose

Option: External Maps

Three of the above options (CURIEs, TERMs without colon, TERMs with colon) all require a prefix or term map to be included in order to turn shortened properties into full URIs.

These maps could instead be factored out and referenced externally; this option comes with its own set of benefits and drawbacks.

Benefits:

  • Minimal data over the wire.
  • When used with either of the TERMs options, allows bootstrapping of existing JSON data in the wild.
  • Encourages vocabulary merging and reuse.
  • Potentially far easier to deploy, doesn't require publishers to implement/have a sem web stack.

Drawbacks:

  • Sometimes requires two GETs when following your nose.
  • External map unavailability removes your ability to see the data as RDF.
  • Some changes to external maps could change the meaning of the data.


Datatypes

RDF includes support for specifying the datatype of literals, commonly referred to as "Typed Literals"; this allows any literal to be given a specific datatype, typically one of the xsd: types.

JSON has built-in support for a minimal set of datatypes, namely strings, numbers (which cover integers, doubles and decimals), booleans, null, arrays and objects.

Datatypes that are frequently used in RDF but absent from JSON include IRIs and the various forms of date and time.

Note: Many other JSON related specifications have found a need to define support for IRIs and various forms of date/time, for example Activity Streams JSON.

Note: Objects and Arrays will typically have special meaning/usage in an RDF-in-JSON syntax, so they will not be discussed further here.

Limited Expressibility

This approach would constrain the syntax to only being able to express those datatypes already existing in JSON, namely:

* String (Unicode)
* Number (Integer, Double, Decimal)
* Boolean (true, false)
* Null (? does RDF have a concept of null, or datatype for it ?)

Benefits:

  • Requires no special processing of data
  • Familiar to most JSON users
  • Simple

Drawbacks:

  • No way to use other common or custom datatypes

Limited Expressibility + IRIs and Date/Time

This approach would extend the native JSON datatypes to include support for IRI, Date, Time and DateTime:

* IRI
* Date
* DateTime
* Time
* String (Unicode)
* Number (Integer, Double, Decimal)
* Boolean (true, false)
* Null (? does RDF have a concept of null, or datatype for it ?)

note: the additional types would need to be quoted like strings in order to keep JSON compatibility, e.g. "http://example.org/" rather than the same without quotes.

Benefits:

  • Potentially requires no special processing of data
  • Familiar to most users
  • Simple
  • Enough to cover most common use cases.

Drawbacks:

  • No way to use other common or custom datatypes


Property Range from Vocab

Either of the Limited Expressibility options could be augmented with type hinting from the range of the property being used.

Benefits:

  • Potentially requires no special processing of data (when not following your nose)
  • Familiar to most users
  • Simple
  • Enough to cover most common use cases.
  • Allows expression of common or custom datatypes

Drawbacks:

  • Potentially requires understanding of properties when following your nose & tooling to do so. (nathan: is this a drawback??)

Map the property to a datatype

Either of the Limited Expressibility options could be augmented with type hinting on the property; the hints could be included in the serialization, or in an external map as with the External Maps option for URIs.
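A sketch of how a consumer might apply such a property-to-datatype map. The map shape, the example.org property URIs and `toTypedLiteral` are all hypothetical; properties without an entry keep their native JSON type, signalled here by `datatype: null`:

```javascript
// Hypothetical property -> datatype map (could be carried in the serialization
// or in an external map). The property URIs are made up for illustration.
const datatypeMap = {
  "http://example.org/my-vocab#birthday": "http://www.w3.org/2001/XMLSchema#date",
};

// Attach the mapped datatype to a plain JSON value; unmapped properties keep
// their native JSON value and get datatype null.
function toTypedLiteral(property, value, map) {
  if (Object.prototype.hasOwnProperty.call(map, property)) {
    return { value: String(value), datatype: map[property] };
  }
  return { value, datatype: null };
}

const obj = JSON.parse(`{
  "http://example.org/my-vocab#birthday": "1970-01-01",
  "http://example.org/my-vocab#name": "Bob"
}`);

for (const [property, value] of Object.entries(obj)) {
  console.log(property, toTypedLiteral(property, value, datatypeMap));
}
```

The data itself stays ordinary JSON; only a consumer that wants RDF needs to consult the map.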

Benefits:

  • Potentially requires no special processing of data (when not following your nose)
  • Familiar to most users
  • Simple
  • Enough to cover most common use cases.
  • Allows expression of common or custom datatypes

Drawbacks:

  • Potentially requires understanding of properties when following your nose & new tooling to do so.

Datatypes from JSON schema

As above, what we're doing could be merged with JSON Schema; in fact we could fully externalize and work with JSON Schema to create a single spec which covers most of the web's JSON needs and our own RDF needs, but that's perhaps too wild for this group and out of charter.

(nathan likes this idea)

In-String TypedLiterals

This approach involves including both the data and the datatype in a single quoted string, for example "FDE3^^xsd:base64Binary"

note: the exact format of the combined string would be up for discussion, we may want to use full IRIs for datatypes, may explicitly offer a set of predefined tokens mapped to IRIs (e.g. "^int"), may have the datatype prefixed or postfixed - many different approaches
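One possible parse, assuming the postfixed "value^^datatype" form from the example; `parseInStringTyped` is a hypothetical name, and splitting on the last "^^" is a choice of this sketch so that values may themselves contain "^^":

```javascript
// Split an in-string typed literal of the assumed form "value^^datatype" on the
// LAST "^^". Strings without "^^" are treated as plain strings (datatype null).
function parseInStringTyped(s) {
  const i = s.lastIndexOf("^^");
  if (i < 0) return { value: s, datatype: null };
  return { value: s.slice(0, i), datatype: s.slice(i + 2) };
}

console.log(parseInStringTyped("FDE3^^xsd:base64Binary"));
console.log(parseInStringTyped("Bob"));
```

Every string in the document has to go through this step before it can be used, and a plain string that happens to contain "^^" would be misread, which is the "always requires special processing" drawback in concrete form.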

Benefits:

  • Can express all common and custom datatypes

Drawbacks:

  • Always requires special processing
  • Unfamiliar to most typical JSON users
  • Verbose
  • What to do when you don't understand a datatype?

Paired Values - value/datatype

Using either the object or array syntax from JSON, we could specify typed literals like so:

{ "property": {
    "_value": "FDE3",
    "_datatype": "xsd:base64Binary"
  }
}
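A reader for this form, sketched under the assumption that any object with a "_value" member is a typed literal and anything else is a native JSON value; `readTypedValue` is a hypothetical name:

```javascript
// Read a property value that is either a native JSON value or a
// { "_value": ..., "_datatype": ... } pair (member names from the example above).
function readTypedValue(v) {
  if (v !== null && typeof v === "object" && !Array.isArray(v) && "_value" in v) {
    return { value: v._value, datatype: v._datatype === undefined ? null : v._datatype };
  }
  return { value: v, datatype: null };
}

const obj = JSON.parse('{ "property": { "_value": "FDE3", "_datatype": "xsd:base64Binary" } }');
console.log(readTypedValue(obj.property));
console.log(readTypedValue(42));
```

Because native values pass straight through, the same reader covers both options below (all typed literals paired, or only some).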

Options:

  • All typed literals like this, including numbers.
  • Only some typed literals like this.

Benefits:

  • Can express all common and custom datatypes

Drawbacks:

  • Always requires special processing
  • Unfamiliar to most typical JSON users
  • Verbose

Paired Values - datatype arcs

Using either the object or array syntax from JSON, we could specify typed literals like so:

{ "property": { "xsd:base64Binary": "FDE3" } }

note: see JSN3 for more examples
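A sketch of reading datatype arcs, assuming that every key of the inner object is a datatype and that an array value means several literals sharing that datatype (one reading of "allows repetition"); `readDatatypeArcs` is a hypothetical name:

```javascript
// Expand a datatype-arc object such as { "xsd:base64Binary": "FDE3" } into a list
// of typed literals. Assumptions: every key is a datatype; an array value means
// several literals with that datatype.
function readDatatypeArcs(arcs) {
  const literals = [];
  for (const [datatype, v] of Object.entries(arcs)) {
    for (const value of Array.isArray(v) ? v : [v]) {
      literals.push({ value, datatype });
    }
  }
  return literals;
}

const obj = JSON.parse('{ "property": { "xsd:base64Binary": "FDE3" } }');
console.log(readDatatypeArcs(obj.property));
console.log(readDatatypeArcs({ "xsd:int": [1, 2] }));
```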

Options:

  • All typed literals like this, including numbers.
  • Only some typed literals like this.

Benefits:

  • Can express all common and custom datatypes
  • Smaller bytesize on the wire (allows repetition)

Drawbacks:

  • Always requires special processing
  • Unfamiliar to most typical JSON users


Languages

RDF currently includes support for specifying the language of strings (for example English or Dutch) in Plain Literals. Support is often serialization specific, with RDFa delegating to the lang/xml:lang attributes, and Turtle taking the "Bob"@en approach.

JSON currently has no support for specifying the language of strings.

No Language

It's an option... JSON natively supports Unicode, so strings like "花澄" are perfectly acceptable, and JSON is used effectively throughout the web without language tags. Furthermore, text often mixes several languages, so it is not clear which language tag to use. For example:

彭博社:2987名人大代表中70名最富的人资产总值为4931亿人民币,约751亿美元!The richest 70 of the 2,987 members have a combined wealth of 493.1 billion yuan ($75.1 billion)

source

Property Specifies Language

This option would involve language specific properties being created in vocabs, for example "rdfs:label-en" and "rdfs:label-ja".

Not saying much about this one as it's a huge change to RDF and quite possibly entirely impractical from almost every angle. But, it is an option.

Property Modifiers

This option involves adding a language hint to the property, as serialization sugar only, for example:

{ "label@en": "London" }
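Consuming property modifiers means inspecting every key. A sketch, assuming the language tag follows the last "@" in the key and that keys without "@" carry no language; `splitPropertyModifier` and `readModifiedProperties` are hypothetical names:

```javascript
// Split a property key such as "label@en" into the property and a language tag.
// Keys without "@" carry no language (language null).
function splitPropertyModifier(key) {
  const i = key.lastIndexOf("@");
  if (i < 0) return { property: key, language: null };
  return { property: key.slice(0, i), language: key.slice(i + 1) };
}

// Collect { property, language, value } entries from every key of an object.
function readModifiedProperties(obj) {
  return Object.entries(obj).map(([key, value]) => ({ ...splitPropertyModifier(key), value }));
}

const obj = JSON.parse('{ "label@en": "London", "label@fr": "Londres" }');
console.log(readModifiedProperties(obj));
```

Note that a plain `obj.label` lookup finds nothing here; getting "all labels" requires scanning and splitting keys.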

Benefits:

  • Can express languages
  • Potentially lighter to process than "in-string language"
  • Potentially smaller bytesize on the wire than either of the paired values options (and allows repetition)

Drawbacks:

  • Always requires special processing to use the data
  • Unfamiliar to most typical JSON users
  • Can be verbose when working with data in many languages (requires at least one property/value pair per language)


In-String Language

This approach involves including both the data and the language in a single quoted string, for example "花澄@ja"

note: the exact format of the combined string would be up for discussion, we may want to use IRIs for languages, may explicitly offer a set of predefined tokens (e.g. "@en"), may have the language prefixed ("ja@花澄") or postfixed ("花澄@ja") - many different approaches
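A sketch of the postfixed form, splitting on the last "@". The tag check is an assumption of this sketch, loosely modelled on Turtle's LANGTAG production (letters, then "-"-separated alphanumerics); `parseInStringLanguage` is a hypothetical name:

```javascript
// Rough language-tag check modelled on Turtle's LANGTAG (without the leading "@").
const LANG_TAG = /^[a-zA-Z]+(-[a-zA-Z0-9]+)*$/;

// Split a postfixed in-string language such as "花澄@ja" on the last "@".
// If there is no "@", or the suffix doesn't look like a tag, it is a plain string.
function parseInStringLanguage(s) {
  const i = s.lastIndexOf("@");
  if (i >= 0 && LANG_TAG.test(s.slice(i + 1))) {
    return { value: s.slice(0, i), language: s.slice(i + 1) };
  }
  return { value: s, language: null };
}

console.log(parseInStringLanguage("花澄@ja"));
console.log(parseInStringLanguage("bob@example.org"));
```

"bob@example.org" survives only because "example.org" fails the pattern; a string like "mail bob@home" would be misread as the text "mail bob" tagged "home", so any in-string format needs a stricter convention or an escape rule.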

Benefits:

  • Can express languages

Drawbacks:

  • Always requires special processing (including tracking back over parsed data)
  • Unfamiliar to most typical JSON users

Paired Values - value/language

Using either the object or array syntax from JSON, we could specify plain literals with languages like so:

{ "property": {
    "_value": "花澄",
    "_language": "ja"
  }
}
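A reader for the value/language pair, sketched under the assumption that any object with a "_value" member is a language-tagged literal and anything else is a plain value; `readLanguageValue` is a hypothetical name:

```javascript
// Read a property value that is either a plain JSON value or a
// { "_value": ..., "_language": ... } pair (member names from the example above).
function readLanguageValue(v) {
  if (v !== null && typeof v === "object" && !Array.isArray(v) && "_value" in v) {
    return { value: v._value, language: v._language === undefined ? null : v._language };
  }
  return { value: v, language: null };
}

const obj = JSON.parse('{ "property": { "_value": "花澄", "_language": "ja" } }');
console.log(readLanguageValue(obj.property));
console.log(readLanguageValue("London"));
```

Unlike the in-string option, no string content is ever inspected, which is why this form is lighter to process.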

Benefits:

  • Can express languages
  • Lighter to process than "in-string language"

Drawbacks:

  • Always requires special processing to use the data
  • Unfamiliar to most typical JSON users
  • Verbose

Paired Values - language arcs

Using either the object or array syntax from JSON, we could specify plain literals with languages like so:

{ "property": {"@en": "London"} }

note: see JSN3 for more examples
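A sketch of reading language arcs, assuming keys that start with "@" are language tags, other keys are ignored, and an array value means several strings in the same language; `readLanguageArcs` is a hypothetical name:

```javascript
// Expand a language-arc object such as { "@en": "London" } into a list of
// language-tagged literals. Assumptions: "@"-prefixed keys are language tags,
// other keys are skipped, and an array value means several literals.
function readLanguageArcs(arcs) {
  const literals = [];
  for (const [key, v] of Object.entries(arcs)) {
    if (!key.startsWith("@")) continue;
    for (const value of Array.isArray(v) ? v : [v]) {
      literals.push({ value, language: key.slice(1) });
    }
  }
  return literals;
}

const obj = JSON.parse('{ "property": { "@en": "London", "@fr": "Londres" } }');
console.log(readLanguageArcs(obj.property));
```

Several languages share one property key here, which is where the bytesize saving over the value/language pairs comes from.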

Benefits:

  • Can express languages
  • Lighter to process than "in-string language"
  • Smaller bytesize on the wire than the other paired values option (allows repetition)

Drawbacks:

  • Always requires special processing to use the data
  • Unfamiliar to most typical JSON users

Externalized Languages / Language Mapped to Property

Can't think of a decent, reliable way to do this? Maybe somebody can.