From Media Annotations Working Group Wiki

Data Interchange format

Reviewed by Joakim Söderberg

This page is for discussing the issue of data interchange format. So far we have identified the following choices:
1) String

Description: A collection of zero or more Unicode characters

Example: "23 February", "Pierre-Antoine Champin"


2) URI only to identify vocabulary and the part in question

Description: From Dublin Core/RDF: a description with a single statement, which uses a single value string and a vocabulary encoding scheme to describe the value. "The resource has the subject named 'Ornitology' from the Vocabulary Encoding Scheme"


@prefix dcterms: <> .
@prefix ex: <> .
DescriptionSet (
 Description (
  ResourceURI ( <> )
  Statement (
   PropertyURI ( dcterms:subject )
   VocabularyEncodingSchemeURI ( ex:MyVocab )
   ValueString ( "Ornitology" )
  )
 )
)


3) String + URI (URI for identifying vocabularies)




4) String with an embedded URI

Description: the URI should be inside < >

Example: "Pierre-Antoine <>"

Comments: <pchampin> Applications could skip parsing and simply display the whole string.

Werner: There should be two fields.
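
To make the cost of option 4 concrete, here is a minimal Java sketch of what a consuming application would have to do to split such a string into the "two fields" Werner asks for. The class EmbeddedUriParser, its method names, and the URI used in the usage note are hypothetical and chosen only for illustration; nothing here is an agreed API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a "text <uri>" string into its text and URI parts.
final class EmbeddedUriParser {
    // Optional leading text, then a non-empty URI enclosed in < > at the end of the string.
    private static final Pattern EMBEDDED =
            Pattern.compile("^(.*?)\\s*<([^<>\\s]+)>\\s*$");

    // Returns the text before the <...> part, or the whole string if no URI is embedded.
    static String text(String value) {
        Matcher m = EMBEDDED.matcher(value);
        return m.matches() ? m.group(1) : value;
    }

    // Returns the embedded URI, or null if the string carries none.
    static String uri(String value) {
        Matcher m = EMBEDDED.matcher(value);
        return m.matches() ? m.group(2) : null;
    }
}
```

For a value such as "Pierre-Antoine <http://example.org/people/pa>" (an invented URI), text(...) would give the name and uri(...) the address; an application that does not bother, as pchampin notes, would just show the raw string.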

5) URI as Value

Description: From Dublin Core/RDF: a description with a single statement, which uses a value URI to identify the value. "The resource has the subject identified by the URI"


@prefix dcterms: <> .
DescriptionSet (
 Description (
  ResourceURI ( <> )
  Statement (
   PropertyURI ( dcterms:subject )
   ValueURI ( <> )
  )
 )
)

6) OO-like approach

Description: JSON is a language-independent data-interchange format that is easy for humans and computers to read and write. It uses conventions familiar from the C family of languages, including C, C++, C#, Java, JavaScript, Perl and Python.

JSON is built on two structures:

  • A collection of name/value pairs. Depending on the language, this can be realized as an object, record, struct, dictionary, hash table, keyed list or associative array.
  • An ordered list of values. In most languages, this is realized as an array, vector, list or sequence.
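
As a rough illustration of how these two structures map onto Java, the sketch below uses the standard collections: a LinkedHashMap for the name/value pairs and an ArrayList for the ordered list. The class JsonStructuresSketch and the names "date", "name" and "values" are made up for this example; the string values are the ones quoted under option 1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of JSON's two building blocks using java.util collections.
final class JsonStructuresSketch {
    // Counterpart of a JSON array: an ordered list of values.
    static List<String> buildArray() {
        List<String> values = new ArrayList<>();
        values.add("23 February");
        values.add("Pierre-Antoine Champin");
        return values;
    }

    // Counterpart of a JSON object: a collection of name/value pairs.
    static Map<String, Object> buildObject() {
        Map<String, Object> object = new LinkedHashMap<>();
        object.put("date", "23 February");            // hypothetical name
        object.put("name", "Pierre-Antoine Champin"); // hypothetical name
        object.put("values", buildArray());           // values may themselves be lists
        return object;
    }
}
```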

Comments: <ruben> Using objects we could wrap all the elements of our particular "value description model". This way the one who calls the API function does not need to deal with an extra grammar (e.g. DC-TEXT, DC-RDF, etc.).


(the return of our function will be the "value surrogate")

"- a value surrogate is either a literal value surrogate or a non-literal value surrogate

 - a literal value surrogate is made up of
   - exactly one value string
 - a non-literal value surrogate is made up of
   - zero or one value URIs
   - zero or one vocabulary encoding scheme URIs
   - zero or more value strings

- a value string is either a plain value string or a typed value string

 - a plain value string may be associated with a value string language
 - a typed value string is associated with a syntax encoding scheme URI

- a non-literal value may be described by another description"

If we were using Java it would be something like this:

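// Common base type for whatever the API function returns
abstract class Value {}

// A literal value is made up of exactly one value string
class LiteralValue extends Value {
    ValueString literalValueString;
}

// A non-literal value: optional URIs plus zero or more value strings
class NonLiteralValue extends Value {
    String nonLiteralValueURI;                         // zero or one (null if absent)
    String nonLiteralValueVocabularyEncodingSchemeURI; // zero or one (null if absent)
    ValueString[] valueStrings;                        // zero or more
}

// A value string is either plain (optional language) or typed (syntax encoding scheme URI)
class ValueString {
    String string;
    String stringLanguage;          // only for plain value strings
    String syntaxEncodingSchemeURI; // only for typed value strings
}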

<pierre-antoine> an option would be, as suggested at the last telecon, to return a JSON object with two attributes "uri" and "text". The burden on the developer would be low (append ".uri" or ".text" after the returned value), and the API would not have to "know" what is required.
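
A minimal sketch of that two-attribute return value, written in Java to match the example above, might look as follows. The class name AnnotationValue, the toJson helper, and the choice to represent a missing attribute as null are assumptions made for illustration; the group has only discussed the attribute names "uri" and "text".

```java
// Hypothetical return type: every call yields both attributes; either may be null.
final class AnnotationValue {
    final String uri;  // identifier of the value, or null if none is known
    final String text; // human-readable form, or null if none is known

    AnnotationValue(String uri, String text) {
        this.uri = uri;
        this.text = text;
    }

    // Serializes to a JSON object with exactly the two attributes "uri" and "text".
    String toJson() {
        return "{\"uri\": " + quote(uri) + ", \"text\": " + quote(text) + "}";
    }

    // Minimal JSON string quoting: escapes backslash and double quote only;
    // control characters are not handled in this sketch.
    private static String quote(String s) {
        if (s == null) {
            return "null";
        }
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

The caller then reads value.uri or value.text as needed, which is the low burden described above, and the API does not have to guess which of the two the application wants.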