Strawman API design and notes

From Media Annotations Working Group Wiki

Introduction

This is all in the context of a given media resource that has already been identified (e.g. by DOM node, URL, URN...). The resource might be a concept identified by a URN (e.g. "the movie Casablanca").

There are two return types possible: unstructured and structured. If a property has a value for the resource, an unstructured value can always be returned. Unstructured values are strings and could, for example, be displayed. Structured values are machine-processable and have a more rigidly defined structure (e.g. an ISO 6709 Annex H GPS string). If the source doesn't have suitable data for a structured return, the mapping function may return nothing for a structured value request, even if an unstructured value exists and is returned.

A question concerns properties which might have referential (URI) values, in addition to or instead of direct values. Are they part of the structured return type? If so, what happens when we only have a URI and we are asked for the unstructured value? ("What is the name of the cameraman?" -> "http://www.example.com/~jones"??)

Perhaps every call returns a structure, of which one component is "unstructured value representation". This may be better. The API could return an object with standard fields, one of which is a field whose value is an object with property-name specific fields.
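A minimal sketch of what such a return object could look like, in Python. The class name `PropertyResult` and all of its field names are assumptions made for illustration; nothing here is defined by the MAWG.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class PropertyResult:
    """Hypothetical uniform return object; every field name is illustrative."""
    property_name: str
    unstructured: str                            # display string, present whenever a value exists
    structured: Optional[Dict[str, Any]] = None  # property-name specific fields, or None
    uri: Optional[str] = None                    # referential value, if the source has one


# A source that only stores a plain string for the creator.
plain = PropertyResult("creator", "Bob Jones")

# A source that distinguishes name parts and also links to a URI.
rich = PropertyResult(
    "creator",
    "Bob Jones",
    structured={"given": "Bob", "family": "Jones"},
    uri="http://www.example.com/~jones",
)
```

With this shape the caller always has a string to display, and can check `structured` and `uri` for `None` rather than making a second call. It does not answer what `unstructured` should contain when the source only has a URI.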

For working on a collection, calls similar to some of those here apply (e.g. get a list of all properties with values). We also need, however, a way to get resource identifiers for the resources in a collection. Perhaps a relation type of 'member', or the like?

How do you tell whether you have the resource ID of a collection, a concept (as above), a specific resource, etc.? What about collections of collections (like a library of albums)?

If a resource can be externally annotated by metadata documents or user data, these are 'attached' to the resource before these calls are made.

Question: what happens if multiple attachments of the same type are made? How are they identified below? These calls assume there is at most one attachment/source-data-block of any given type.

We have a question over how to solve r09 (identifying provenance of metadata)

Getting information

get-mawg-unstructured-value( property-name,
   [source-format-filter],
    [sub-type-filter],
    [language-code-filter],
    [fragment-indicator])	// if used, must start with #
 returns list-of-values, the list elements might be (depending on the property name):
   {value}
   {value, sub-type} 
   {value, language}
   {value, sub-type, language}

Where the format of value is defined in the MAWG ontology for the given property-name, and where the valid sub-types are defined by MAWG for each property-name.

What happens if we filter on language when it is irrelevant (e.g. for width & height)?

  • the filter is ignored

Any filtering by sub-type has to be relevant to the property, or you get an error/nothing
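The filtering rules above can be sketched in Python as follows. The in-memory tables `VALUES`, `VALID_SUB_TYPES` and `LANGUAGE_AWARE` are hypothetical stand-ins for whatever the mappings would provide, the source-format filter and fragment indicator are left out for brevity, and raising an error for an irrelevant sub-type is only one of the two options mentioned (error or nothing).

```python
from typing import Dict, List, Optional

# Hypothetical store: property-name -> list of value records.
VALUES: Dict[str, List[Dict[str, str]]] = {
    "title": [
        {"value": "Casablanca", "sub-type": "main", "language": "en"},
        {"value": "Casablanca (1942)", "sub-type": "alternative", "language": "de"},
    ],
    "width": [{"value": "640"}],
}
# Assumed valid sub-types per property, and properties where language is meaningful.
VALID_SUB_TYPES = {"title": {"main", "alternative"}, "width": set()}
LANGUAGE_AWARE = {"title"}


def get_mawg_unstructured_value(
    property_name: str,
    sub_type: Optional[str] = None,
    language: Optional[str] = None,
) -> List[Dict[str, str]]:
    # A sub-type filter must be relevant to the property (here: an error).
    if sub_type is not None and sub_type not in VALID_SUB_TYPES.get(property_name, set()):
        raise ValueError(f"sub-type {sub_type!r} is not valid for {property_name!r}")
    results = []
    for record in VALUES.get(property_name, []):
        if sub_type is not None and record.get("sub-type") != sub_type:
            continue
        # The language filter is silently ignored where language is irrelevant.
        if (language is not None and property_name in LANGUAGE_AWARE
                and record.get("language") != language):
            continue
        results.append(dict(record))
    return results
```

For example, `get_mawg_unstructured_value("width", language="fr")` still returns the width, because the language filter is ignored for that property.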

This fails to relay the precision of the underlying format (e.g. for a name: some systems know which part is first/given and which is second/family, some just have a string).

For user-defined tags, we may need to be able to indicate the user who defined each value

  • (how do we identify the user? presumably a userID that is scoped to the source of the resource)
  • perhaps a different pair of calls, with different filtering?
  • and there may be user-entered values for standard tags, as well as user custom tags
  • something like...:
get-user-unstructured-value( property-name,
    [userID-filter],
    [sub-type-filter],
    [fragment-indicator])	// if used, must start with #
get-mawg-structured-value( params as above )...

The value type returned here depends on property-name, and the MAWG defines what the structure is

  • and the mapping into that structure for any given source format
  • if the source data cannot be put into the MAWG structure, nothing comes back (whereas the unstructured call above always succeeds if there is source data)
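As an illustration of the difference between the two calls, here is a Python sketch for a location property. The function names are hypothetical, and the pattern is a deliberately simplified assumption: it accepts only a decimal-degrees latitude/longitude string with a trailing solidus, not the full range of forms the ISO standard allows.

```python
import re
from typing import Dict, Optional

# Simplified pattern for strings such as "+48.8577+002.2950/" (assumption:
# decimal degrees only, no altitude, no coordinate reference system).
_POINT = re.compile(r"^([+-]\d{2}(?:\.\d+)?)([+-]\d{3}(?:\.\d+)?)/$")


def get_unstructured_location(source_value: Optional[str]) -> Optional[str]:
    """Succeeds whenever the source has any data: the string is passed through."""
    return source_value


def get_structured_location(source_value: Optional[str]) -> Optional[Dict[str, float]]:
    """Returns nothing (None) when the source data cannot be structured."""
    if source_value is None:
        return None
    match = _POINT.match(source_value)
    if match is None:
        return None
    return {"latitude": float(match.group(1)), "longitude": float(match.group(2))}
```

A source holding only "Paris, France" yields that string from the unstructured call and `None` from the structured one.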

Iterating

get-property-names-that-have-values([source-format-filter])
   -- returns a list of property-names which, if queried, would return at least one value
get-source-formats-that-have-at-least-one-property()
   -- returns a list of source formats which have at least one property with at least one value
get-original-data(source-format)		// the source data block without processing (e.g. ID3 tag block, XML)
   -- each mapping defines what you get back here (i.e. the return value is mapping specific)
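The two listing calls could behave roughly like this Python sketch. The `ATTACHED` table is a hypothetical stand-in for the source data blocks attached to one resource, and the format names in it are only examples.

```python
from typing import Dict, List, Optional

# Hypothetical attached data: source-format -> {property-name: [values]}.
ATTACHED: Dict[str, Dict[str, List[str]]] = {
    "ID3": {"title": ["Casablanca"], "creator": []},
    "XMP": {"title": ["Casablanca"], "width": ["640"]},
    "EXIF": {},
}


def get_property_names_that_have_values(source_format: Optional[str] = None) -> List[str]:
    """Property names that would return at least one value if queried."""
    names = set()
    for fmt, props in ATTACHED.items():
        if source_format is not None and fmt != source_format:
            continue
        names.update(name for name, values in props.items() if values)
    return sorted(names)


def get_source_formats_that_have_at_least_one_property() -> List[str]:
    """Source formats with at least one property that has at least one value."""
    return sorted(fmt for fmt, props in ATTACHED.items() if any(props.values()))
```

Here "creator" is never listed because its value list is empty, and "EXIF" is not listed as a source format because it has no properties with values.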

Setting

Setters have huge issues. The notes below just start to scratch the surface.


set ( property-name,
     value )			// always(?) additive; value may be a pair or triple, as above for returned values
remove (property-name, value )

Where are these edits happening? locally in memory, persistently but only from my point of view, or into the actual source resource? (The last has permissions issues)

  • What happens when I set something that can't be represented in the current formats? Does the API automatically create a tag in a format that can? Which one?
  • What happens if our semantics are broader than the target format's semantics for a given tag?
  • How do we distinguish add or replace?
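One possible answer to the add-or-replace question, sketched in Python for the simplest case of edits held locally in memory. The class `InMemoryAnnotations` and the `replace` flag are assumptions made for illustration. The sketch says nothing about persistence, permissions, or writing back into a source format.

```python
from collections import defaultdict
from typing import Any, DefaultDict, List


class InMemoryAnnotations:
    """Hypothetical local edit buffer; nothing is written to the resource."""

    def __init__(self) -> None:
        self._values: DefaultDict[str, List[Any]] = defaultdict(list)

    def set(self, property_name: str, value: Any, replace: bool = False) -> None:
        if replace:
            # Replace: discard all existing values for this property.
            self._values[property_name] = [value]
        elif value not in self._values[property_name]:
            # Default is additive: keep existing values, skip exact duplicates.
            self._values[property_name].append(value)

    def remove(self, property_name: str, value: Any) -> bool:
        """Remove one matching value; report whether anything was removed."""
        try:
            self._values[property_name].remove(value)
            return True
        except ValueError:
            return False

    def get(self, property_name: str) -> List[Any]:
        return list(self._values.get(property_name, []))
```

Values can be plain strings or tuples such as `("Casablanca", "main", "en")`, matching the pair or triple forms returned by the getters.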