RE: Defining a common convention for marking up JSON from Michael Pizzo on 2013-09-24 (public-linked-json@w3.org from September 2013)

From: Michael Pizzo <mikep@microsoft.com>
Date: Tue, 24 Sep 2013 21:36:31 +0000
To: "public-linked-json@w3.org" <public-linked-json@w3.org>
Message-ID: <f69aaa71c80144e886c2f4d67bb9cdd9@BN1PR03MB220.namprd03.prod.outlook.com>
Picking back up on this thread (how quickly time passes).



I think we got a bit off track by talking about whether keywords were "data" or "markup". Regardless of whether we consider information such as context and type to be markup, we have the basic problem that we want to add information to the JSON payload that doesn't conflict with the data members (or "terms") being serialized.



The fact that JSON-LD prefixes keywords with "@" and recommends that terms NOT start with "@" illustrates the point exactly. We are trying to avoid conflicts between term names, keywords, and other potential (important) information about the collection of terms.



Prefixing such "keywords" with "@" works for one definer of keywords (i.e., those defined by JSON-LD), but if other parties want to add keywords to the same payload without conflicting with term names, JSON-LD, OR EACH OTHER, they each have to come up with their own convention. As a result, someone wanting to persist terms compatible with multiple systems needs to know about, and avoid, multiple conventions.



Again, if we take the example of XML, one thing XML does very well is allow a payload to be made up of elements defined by multiple different parties, with multiple different purposes. By namespacing the element names, these elements can all co-exist and processors understand exactly how to process the results according to their own set of rules.



JSON-LD and OData are both *so* close to this model. Both JSON-LD and OData add properties to the JSON payload that must not conflict with the terms being serialized OR EACH OTHER. JSON-LD chooses to prefix keywords with "@" to avoid conflicts with term names, but this convention is neither common nor general yet. OData defines a common mechanism for qualifying names so that multiple parties can add content without conflicts, but it relies on the mere presence of a dot in the name to distinguish term names from namespaced keywords (which precludes term names from containing dots).
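To make the limitation of the dot rule concrete, here is a minimal Python sketch. Only the rule itself (a member name containing a dot is a namespaced annotation) comes from the paragraph above; the helper `is_annotation_dot_rule` and the payload values are hypothetical illustrations, not OData's actual parser.

```python
# Hypothetical sketch of the dot rule: any member name that contains "."
# is read as a namespaced annotation rather than as a data term.
def is_annotation_dot_rule(name: str) -> bool:
    return "." in name


# Illustrative payload (values are made up for this example).
payload = {
    "odata.metadata": "http://example.org/service/$metadata#People/@Element",
    "Name": "John Lennon",
    "version.major": 3,  # an ordinary data term that happens to contain a dot
}

annotations = [name for name in payload if is_annotation_dot_rule(name)]
terms = [name for name in payload if not is_annotation_dot_rule(name)]

print(annotations)  # ['odata.metadata', 'version.major']
print(terms)        # ['Name']
```

The data term "version.major" is misread as an annotation, which is why the dot rule has to forbid dots in term names.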



If we could both bend just a little (if OData would prefix its keywords with "@odata.*", and JSON-LD would prefix its keywords with "@jsonld.*"), then we could share a convention for adding keywords that conflict neither with term names nor with each other (e.g., @odata.context is different from @jsonld.context). At the same time, we could allow other parties to add information to the payload (e.g., "@org.iso.unitsofmeasure":"meters") and further reduce the likelihood of term names conflicting with any of these keywords (e.g., even a term named "@context" would not conflict with JSON-LD's "@jsonld.context").
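The proposed convention can be sketched as a small classifier. This is a minimal Python sketch under assumptions the proposal does not spell out: a member is an annotation only if its name starts with "@" and contains a dot, and the text after the last dot is the annotation's local name (so "@org.iso.unitsofmeasure" gets namespace "org.iso"). `split_payload` and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Tuple


def split_payload(obj: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Dict[str, Any]]]:
    """Split members into data terms and annotations grouped by namespace.

    Assumed rule: "@<namespace>.<name>" is an annotation; everything else,
    including an undotted name such as "@context", is an ordinary term.
    """
    terms: Dict[str, Any] = {}
    annotations: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for name, value in obj.items():
        if name.startswith("@") and "." in name:
            namespace, _, local = name[1:].rpartition(".")
            annotations[namespace][local] = value
        else:
            terms[name] = value
    return terms, dict(annotations)


payload = {
    "@odata.context": "http://example.org/service/$metadata#People/$entity",
    "@jsonld.context": "http://json-ld.org/contexts/person.jsonld",
    "@org.iso.unitsofmeasure": "meters",
    "@context": "a data term that happens to start with @",
    "name": "John Lennon",
}

terms, annotations = split_payload(payload)
# A processor that only understands OData picks its own namespace and
# ignores the rest.
odata_only = annotations.get("odata", {})
```

"@odata.context" and "@jsonld.context" land in different buckets, and the undotted "@context" stays a data term, which is the non-conflict property the proposal is after.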



The OASIS OData Technical Committee has signed off on Committee Specification 01 as the final OData specification ready for standardization. I have recommended that the Technical Committee delay standardization by producing a Committee Specification 02 that aligns with JSON-LD by prefixing all annotations with "@". With that change, all OData keywords within a JSON payload will start with "@odata.", and other annotations that follow the convention will start with "@{namespace}".



It would be fantastic if JSON-LD would support this convention by prefixing keywords with "@jsonld.*".



On Friday, August 30, 2013 12:35 AM, Michael Pizzo wrote:

> Thanks for the quick response and thoughts Markus.  I'm glad to see,
> from the responses so far, that there is interest in exploring some
> type of alignment.



There definitely is, but please bear with me until I fully understand the
"problem", because at the moment I can't see it.



[...]

>> (please note that it is fine to use properties starting with an @ as long as it is
>> not a defined keyword from a JSON-LD perspective).
>
> I was wondering if the list of keywords was hard-coded or if the
> @ prefix were a general mechanism. There are advantages to both,
> of course; one is less restrictive for general property names and
> the other is more extensible.



The list is hardcoded. There is, however, the following statement in the spec
discouraging third parties from defining new keywords:

    To avoid forward-compatibility issues, a term SHOULD NOT start with
    an @ character as future versions of JSON-LD may introduce additional
    keywords.
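A JSON-LD author can check proposed member names against that hardcoded list. In this minimal Python sketch, the keyword set is an assumption based on the JSON-LD 1.0 keyword list and should be verified against the spec version you target; `check_term` is a hypothetical helper.

```python
# Assumption: the JSON-LD 1.0 keyword list; verify against the spec version
# you target, since future versions may add keywords.
JSONLD_KEYWORDS = frozenset({
    "@context", "@id", "@value", "@language", "@type", "@container",
    "@list", "@set", "@reverse", "@index", "@base", "@vocab", "@graph",
})


def check_term(name: str) -> str:
    """Classify a member name from a JSON-LD document author's perspective."""
    if name in JSONLD_KEYWORDS:
        return "keyword"
    if name.startswith("@"):
        # Legal today, but the spec says terms SHOULD NOT start with "@"
        # because a future version may turn the name into a keyword.
        return "discouraged"
    return "term"


for name in ("@id", "@odata.context", "name"):
    print(name, "->", check_term(name))
```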





>> Also, I don't think (at
>> least for JSON-LD) that we can differentiate between "markup" and "data".
>> It's not like HTML where you just markup some text. Losing, e.g., an
>> identifier of an entity is not really desired and most people wouldn't
>> classify that as markup - at least I wouldn't.
>
> Markup may be a poor choice of words. The general idea is that there is
> "data" and "meta" or "control information" (such as type, etc.). A simple
> JSON processor wouldn't know what to do with type, and wouldn't have to;
> it could just skip it.



That's the part where I think we disagree most. In JSON-LD the type, the
language, etc. are part of the data. There's no markup (well, you could argue
that @index is markup, but that's really it). That's one thing. The other
thing is that a "simple JSON processor" doesn't know what to do with any of
the properties. The whole document is an opaque structure. All it can do is
transform a string into an in-memory representation.
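The point about a plain JSON parser can be seen with Python's standard `json` module: after parsing, "@id" and "@type" are ordinary dictionary keys, no different from "name". The sample document is illustrative.

```python
import json

text = """{
  "@id": "http://dbpedia.org/resource/John_Lennon",
  "@type": "Person",
  "name": "John Lennon"
}"""

doc = json.loads(text)

# The parser only produces an in-memory dict; nothing distinguishes "@id"
# or "@type" from "name". Any special meaning has to come from the
# application (or a JSON-LD/OData processor) layered on top.
for key, value in doc.items():
    print(f"{key!r}: {value!r}")
```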



It is up to the application on top of the JSON parser to interpret the
data. Unfortunately, that application has to depend on out-of-band
information to be able to interpret it. JSON-LD tries to bring that
information in band (just as OData does) by making the data unambiguous.
AFAIK OData does much more, since it also defines service interfaces etc. And
that's probably the reason why you are talking about "markup". In JSON-LD
you would need a separate vocabulary to describe that "metadata". LDP [1] is
such a vocabulary; Hydra is another one [2-3].





> Even for the identifier, a general control that's just trying to paint
> data on a screen may be perfectly fine ignoring the identifier for an
> entity.



Right, just as you say it *may* be fine ignoring the identifier. But you
don't know. It is up to that application to decide which *data* it renders
and which it ignores.





> It's only a consumer that understands that this JSON is JSON-LD, and wants
> to do something like link to the object, that cares about the identifier.



That can be said about every other property as well.





> That doesn't mean it's not there for consumers that do care about it, just
> that a namespacing mechanism for properties enables generic parsers to be
> trained to look for the meta-information they care about and ignore the
> rest.



The question is whether that's metadata or not. Would you classify the
primary key in a DB record as metadata? I wouldn't. Of course, an
application might ignore it nevertheless because it doesn't need it.





[...]

>> I haven't had a look at the latest OData draft yet, but how does a processor
>> know what odata (or any other prefix) stands for? Who owns it? Is there a
>> central registry for those prefixes?
>
> Good question. The answer is currently somewhat specific to OData
> ("odata" is reserved, and the document references a metadata document that
> defines the prefixes).



Does OData still use application/json as its media type? If so, how
would a processor know whether this is really intended to be OData or
whether someone just accidentally called a property odata.something? JSON-LD
doesn't redefine the semantics of existing JSON. It has its own media type
(application/ld+json), which defines the semantics of those keywords in such
a document. If you want to serve it as JSON, you have to associate a
context with it (using an HTTP Link header with a very specific relation). So
there's no risk of overriding other namespaces, as OData does. Everything is
visible at the HTTP level.
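The decision described above happens before any member name is examined. Below is a minimal Python sketch, assuming a simplified Link-header parser (comma-separated entries, no commas inside URLs, a single rel value per entry). The rel URL is the one JSON-LD 1.0 defines for attaching a context to plain JSON; `interpretation` and its return labels are hypothetical.

```python
import re
from typing import Optional

# Link relation JSON-LD 1.0 uses to associate a context with application/json.
JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def context_from_link_header(link: Optional[str]) -> Optional[str]:
    """Return the context URL from a Link header entry carrying the JSON-LD rel.

    Simplified: assumes '<url>; rel="..."' entries separated by commas, with no
    commas inside URLs and a single rel value per entry.
    """
    if not link:
        return None
    rel_pattern = r'\brel\s*=\s*"?' + re.escape(JSONLD_CONTEXT_REL) + r'(?=["\s;]|$)'
    for entry in link.split(","):
        match = re.match(r"\s*<([^>]*)>\s*;(.*)", entry)
        if match and re.search(rel_pattern, match.group(2)):
            return match.group(1)
    return None


def interpretation(content_type: str, link: Optional[str] = None) -> str:
    """Decide how to read a response body from HTTP-level information only."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/ld+json":
        return "json-ld"           # keywords carry JSON-LD semantics
    if media_type == "application/json" and context_from_link_header(link):
        return "json-ld-via-link"  # plain JSON plus a context from the Link header
    return "plain-json"            # member names (including "odata.*") are opaque
```

A response served as plain application/json with no such Link header falls through to "plain-json": a processor has no HTTP-level signal that "odata.*" names mean anything special.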





> This is certainly an area that we could collaborate on as well. We could define
> a registry of well-known prefixes, together with a mechanism like XML has to
> define ad-hoc prefixes.



We already have such a mechanism: the context. It's completely
decentralized. You can host your context on any site and reference it from
any JSON-LD document.
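For comparison, the registry-plus-ad-hoc-prefixes idea from the quoted paragraph can be sketched in a few lines of Python. Everything here is hypothetical: the registry contents, the example.org namespace IRIs, and the rule that a document-local declaration overrides the registry (as an XML xmlns declaration would). A JSON-LD context plays the same role in a decentralized way by mapping prefixes to IRIs, although JSON-LD compact IRIs use ":" rather than "." as the separator.

```python
from typing import Dict, Optional

# Hypothetical registry of well-known prefixes; the IRIs are placeholders.
WELL_KNOWN_PREFIXES: Dict[str, str] = {
    "odata": "http://example.org/ns/odata#",
    "jsonld": "http://example.org/ns/jsonld#",
}


def resolve(name: str, local_prefixes: Dict[str, str]) -> Optional[str]:
    """Expand "prefix.local" to a full IRI, XML-namespace style.

    Prefixes declared by the document itself take precedence over the
    registry. Returns None for unprefixed names (ordinary terms) and for
    unknown prefixes.
    """
    prefix, sep, local = name.partition(".")
    if not sep:
        return None
    base = local_prefixes.get(prefix, WELL_KNOWN_PREFIXES.get(prefix))
    return None if base is None else base + local
```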





>> You can do that already, although you would have to add a context (which in
>> the case of a JSON document could also be referenced by an HTTP Link header
>> [1]) aliasing the keywords [2]. For the sake of simplicity, I embed it
>> directly in the following example:
>>
>> {
>>   "@context": [
>>     { "jsonld.id": "@id" },
>>     "http://json-ld.org/contexts/person.jsonld"
>>   ],
>>   "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
>>   "name": "John Lennon",
>>   "born": "1940-10-09",
>>   "spouse": "http://dbpedia.org/resource/Cynthia_Lennon"
>> }
>
> Interesting. So (except for context) you could make the JSON-LD keyword
> information look like OData JSON annotations.



Exactly.
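How a processor recognizes the aliased keyword in the example above can be sketched as follows. This is a deliberately minimal Python sketch, not a JSON-LD context processor: it only looks at inline context objects, treats any string definition starting with "@" as a keyword alias, skips remote context URLs instead of fetching them, and renames top-level members only. The helper names are hypothetical.

```python
from typing import Any, Dict


def keyword_aliases(context: Any) -> Dict[str, str]:
    """Collect alias -> keyword mappings from inline context objects."""
    contexts = context if isinstance(context, list) else [context]
    aliases: Dict[str, str] = {}
    for ctx in contexts:
        if not isinstance(ctx, dict):
            continue  # e.g. a remote context URL; a real processor would fetch it
        for term, definition in ctx.items():
            if isinstance(definition, str) and definition.startswith("@"):
                aliases[term] = definition
    return aliases


def expand_top_level_aliases(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Rename aliased top-level members back to the keywords they stand for."""
    aliases = keyword_aliases(doc.get("@context", []))
    return {aliases.get(key, key): value for key, value in doc.items()}


doc = {
    "@context": [
        {"jsonld.id": "@id"},
        "http://json-ld.org/contexts/person.jsonld",
    ],
    "jsonld.id": "http://dbpedia.org/resource/John_Lennon",
    "name": "John Lennon",
    "born": "1940-10-09",
    "spouse": "http://dbpedia.org/resource/Cynthia_Lennon",
}

expanded = expand_top_level_aliases(doc)
```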





> That's actually really encouraging, but still feels like a one-off
> for making JSON-LD look (mostly) like OData JSON, and not a general
> solution for custom/third party annotations.



Why not? Just define the context and host it at a well-known location,
e.g., http://odata.org/context.jsonld. Everyone who wants to use JSON-LD
in that way then simply references that context, and that's it. Documents
that aren't using those keywords can be automatically transformed to do so
by our API [4].
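The reverse direction, rewriting keywords into the aliases that a shared context defines, can be sketched as a naive recursive key rename. This minimal Python sketch assumes a hypothetical alias table like the one such a hosted context might contain. It is not the JSON-LD API's compaction algorithm [4], which also compacts IRIs and manages the @context itself.

```python
from typing import Any, Dict

# Hypothetical aliases such a hosted context might define (alias -> keyword).
SHARED_ALIASES: Dict[str, str] = {"jsonld.id": "@id", "jsonld.type": "@type"}


def apply_aliases(doc: Any, aliases: Dict[str, str]) -> Any:
    """Recursively rename JSON-LD keywords to their aliases."""
    keyword_to_alias = {keyword: alias for alias, keyword in aliases.items()}

    def rename(node: Any) -> Any:
        if isinstance(node, dict):
            return {
                keyword_to_alias.get(key, key): (
                    value if key == "@context" else rename(value)  # leave contexts untouched
                )
                for key, value in node.items()
            }
        if isinstance(node, list):
            return [rename(item) for item in node]
        return node

    return rename(doc)


doc = {
    "@id": "http://dbpedia.org/resource/John_Lennon",
    "@type": "Person",
    "spouse": {"@id": "http://dbpedia.org/resource/Cynthia_Lennon"},
}
aliased = apply_aliases(doc, SHARED_ALIASES)
```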





[...]

> I'm sure we could train processors to understand both OData's JSON format
> and JSON-LD as one-offs, but the problem arises when the next JSON-based
> format comes along and defines its own way to add control information,
> or when someone simply wants to add custom annotations to a JSON payload.



That's exactly why media types exist. You cannot override the
semantics of an existing media type. Of course you can define in your spec
that all properties starting with "odata." mean something very specific for
an OData processor, but such a processor wouldn't have any way to find out if
that is really what the author intended if the documents are served as
application/json. The author just tells you that it is JSON. If you go and
look up RFC 4627, which defines application/json, you obviously won't find
anything about "odata.".





> A namespacing mechanism allows a processor to understand a single, simple rule
> (like names containing a dot are namespaced) and anybody can add their own
> specific information to a payload, without worrying about conflicts.
> Processors/applications can pick and choose what they want to pay attention to.



Right, but that rule has to be defined at the media type level. We define it
for application/ld+json. We can try to align that with what you are doing
but we cannot force that on anyone using application/json. It is not under
our control.





>>> JSON parsers would have a common way to differentiate
>>> markup from data, and could consume/ignore/expose whatever markup they
>>> chose.
>>
>> As already said above, I don't think we can differentiate between markup and
>> data in JSON-LD.

>

> Really? I think it would be very useful for a general JSON processor to
> recognize the data properties of a JSON-LD payload, even if just to paint it
> on a screen, without needing to know/understand/ignore all of the JSON-LD
> keywords.



Yeah really :-) A general JSON processor will never recognize any property.
They are all opaque for a JSON processor. We have to talk about JSON-LD
processors and OData processors and see how we can align them.





> Again, thanks for taking the time for a detailed response. I actually
> learned a lot, and am encouraged that there may be a happy path here.
> I hope my answers above make sense, and help clarify the goal of moving
> from static, predefined keywords in each JSON-based format to a general,
> extensible, customizable annotation mechanism that everyone can
> use/understand.



They definitely helped me to understand your position. I think the key
difference between JSON-LD and OData is that OData does have metadata
properties whereas JSON-LD keywords are solely used as syntactic constructs
to express data. JSON-LD's goal is to make the data self-descriptive and
eliminate out-of-band information. It does not define service interfaces as
OData does. As such, I think there's no metadata in JSON-LD documents that
could be ignored without losing information, but there is in OData
documents. Is that classification correct? If so, it would be very valuable
to at least be able to interpret OData *data* as JSON-LD.







[1] http://www.w3.org/TR/ldp/
[2] http://www.w3.org/community/hydra/
[3] http://www.markus-lanthaler.com/hydra/
[4] http://www.w3.org/TR/json-ld-api/





--
Markus Lanthaler
@markuslanthaler
Received on Tuesday, 24 September 2013 21:37:17 UTC