ISSUE-52: Lightweight DataStore aligned with ECMAScript [RDFa 1.1 API]

http://www.w3.org/2010/02/rdfa/track/issues/52

Raised by: Nathan Rixham
On product: RDFa 1.1 API

ISSUE-51 moves the create*** methods from DataStore to DataContext.

This proposal/issue aligns the DataStore methods with ECMAScript-262 v5 to provide a familiar, lightweight store that is deliberately constrained so that it exposes no unexpected functionality.

New IDL:

[NoInterfaceObject]
interface DataStore {
    readonly attribute unsigned long length;
    RDFTriple           get (in unsigned long index);
    void                add (in RDFTriple triple);
    void                merge(in DataStore store);
    sequence<RDFTriple> toArray();

    boolean             some(in RDFTripleFilter callback);
    boolean             every(in RDFTripleFilter callback);
    boolean             mentions(in IRI iri);

    DataStore           filter (in RDFTripleFilter filter);
    void                forEach (in RDFTripleCallback callback);

    DataIterator        iterator();
};

This removes DataStoreIterator and introduces the following new interface:

[NoInterfaceObject, Callback, Null=Null]
interface RDFTripleCallback {
    void run (in RDFTriple triple, in optional unsigned long index, in optional DataStore store);
};

And changes the definition of RDFTripleFilter to:

[NoInterfaceObject, Callback, Null=Null]
interface RDFTripleFilter {
    boolean match (in RDFTriple triple, in optional unsigned long index, in optional DataStore store);
};

And changes the definition of DataIterator to:

[NoInterfaceObject]
interface DataIterator {
    readonly attribute DataStore store;
    boolean hasNext();
    RDFTriple next();
};


The full list of changes is as follows:

DataStore: changed `size` to `length` to align with ECMAScript and most languages (array.length)
DataStore: changed return type of `add` to void as it is impossible to ever return boolean False.
DataStore: removed `getter` property of get method to remove indexed sequence functionality
DataStore: removed `clear` method, it isn't needed
DataStore: changed return type of `merge` to void as it was impossible to ever return boolean False.

DataStore: added toArray() method which returns a sequence<RDFTriple> (an Array in ECMAScript)
This exposes a lot of functionality to implementations (such as map and reduce, join etc) and will often be used - is there a good reason not to add this?
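
As a rough illustration of what toArray() makes available, the sketch below uses a hypothetical SketchStore (not part of the proposal) that holds plain objects with subject, property and object fields in place of real RDFTriple instances. It assumes toArray() returns a copy; the proposal does not say whether it does.

```javascript
// Hypothetical stand-in for DataStore, for illustration only.
function SketchStore(triples) {
  this.triples = triples.slice();
}

// Assumption: toArray() hands back a copy, so callers cannot alter the store.
SketchStore.prototype.toArray = function () {
  return this.triples.slice();
};

var store = new SketchStore([
  { subject: "http://example.org/a", property: "http://example.org/name", object: "Alice" },
  { subject: "http://example.org/b", property: "http://example.org/name", object: "Bob" }
]);

// Native Array methods (map, join, reduce, ...) now apply directly.
var names = store.toArray().map(function (t) { return t.object; }).join(", ");
var totalLength = store.toArray().reduce(function (n, t) {
  return n + t.object.length;
}, 0);
```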

DataStore: Quantification methods
 - added .some (existential)
 - added .every (universal)
 - added .mentions (the most common existential needed - "does this store mention <foo>", many uses)
Both .some and .every are ECMAScript-262 v5 methods on Array, and the DataStore methods are aligned exactly with them, which makes them both useful for RDF and familiar to JavaScript developers.
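
A minimal sketch of how the three quantification methods could sit on top of the native Array methods. SketchStore is hypothetical, callbacks are passed as plain functions, and the reading of "mentions" as "the IRI appears as subject, property or object" is an assumption rather than something the proposal defines.

```javascript
// Hypothetical stand-in for DataStore, for illustration only.
function SketchStore(triples) {
  this.triples = triples.slice();
}

// some/every delegate to Array.prototype, passing (triple, index, store).
SketchStore.prototype.some = function (callback) {
  var store = this;
  return this.triples.some(function (t, i) { return callback(t, i, store); });
};
SketchStore.prototype.every = function (callback) {
  var store = this;
  return this.triples.every(function (t, i) { return callback(t, i, store); });
};

// Assumption: an IRI is "mentioned" if it appears in any of the three positions.
SketchStore.prototype.mentions = function (iri) {
  return this.some(function (t) {
    return t.subject === iri || t.property === iri || t.object === iri;
  });
};

var store = new SketchStore([
  { subject: "http://example.org/a",
    property: "http://example.org/knows",
    object: "http://example.org/b" }
]);
var mentionsB = store.mentions("http://example.org/b");
var mentionsC = store.mentions("http://example.org/c");
var allAboutA = store.every(function (t) {
  return t.subject === "http://example.org/a";
});
```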

DataStore: Aligned .filter() method with ECMAScript-262 v5 method on Array
removed `pattern` from filter(). Everything it could do can be handled by DataQuery or by RDFTripleFilter. `pattern` allows, and in fact forces, non-standardized implementation-specific functionality to be introduced; that is best left to libraries if they want to provide it (even though it isn't needed).
removed redundant `element` from filter - no need/use.
made `RDFTripleFilter filter` non optional
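
The aligned filter() could be sketched roughly as follows. SketchStore is a hypothetical stand-in, and the filter is passed as a plain function taking (triple, index, store); the key point is that a new store is returned and the original is left untouched.

```javascript
// Hypothetical stand-in for DataStore, for illustration only.
function SketchStore(triples) {
  this.triples = triples.slice();
  this.length = this.triples.length;
}

// filter returns a new store; the store it is called on is not modified.
SketchStore.prototype.filter = function (callback) {
  var store = this;
  return new SketchStore(this.triples.filter(function (t, i) {
    return callback(t, i, store);
  }));
};

var store = new SketchStore([
  { subject: "http://example.org/a", property: "http://example.org/name", object: "Alice" },
  { subject: "http://example.org/a", property: "http://example.org/age", object: "42" }
]);

// Keep only the triples that use the (illustrative) name property.
var namesOnly = store.filter(function (t) {
  return t.property === "http://example.org/name";
});
```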

RDFTripleFilter: Aligned with callback used on Array.filter/some/every in ECMAScript-262
Functionality is the same, and the callback is used in .filter, .some and .every. The method takes 3 params: the RDFTriple, the index, and the store which holds the triple.

RDFTripleCallback: New, Aligned with callback used on Array.forEach in ECMAScript-262
This is essentially the same as RDFTripleFilter, however it returns void rather than boolean, thus compatible with strictly typed languages.

DataStoreIterator: Removed!
This is replaced with RDFTripleCallback. Formerly DataStoreIterator accepted 4 params (index, subject, property, object). This meant that the triple itself could not be accessed (for instance to use the .toString() method), so using any other method in the API, such as DataStore.add(), would have required creating a new RDFTriple. Changing the method to accept the RDFTriple as param one exposes more functionality, saves the user from writing extra code, and aligns with the rest of the API.
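
To illustrate the point about receiving the triple itself, here is a small sketch in which a forEach callback hands the same triple object straight to another store's add(). SketchStore is hypothetical, and its add() is simplified: it does no duplicate checking, which a real DataStore would presumably need.

```javascript
// Hypothetical stand-in for DataStore, for illustration only.
function SketchStore() {
  this.triples = [];
  this.length = 0;
}

// Simplified add: no duplicate detection.
SketchStore.prototype.add = function (triple) {
  this.triples.push(triple);
  this.length = this.triples.length;
};

// forEach passes (triple, index, store), mirroring Array.prototype.forEach.
SketchStore.prototype.forEach = function (callback) {
  var store = this;
  this.triples.forEach(function (t, i) { callback(t, i, store); });
};

var source = new SketchStore();
source.add({ subject: "http://example.org/a", property: "http://example.org/name", object: "Alice" });
source.add({ subject: "http://example.org/a", property: "http://example.org/age", object: "42" });

// Because the callback receives the triple, it can be re-used as-is.
var copy = new SketchStore();
var seenIndexes = [];
source.forEach(function (triple, index) {
  seenIndexes.push(index);
  copy.add(triple);
});
```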

DataStore: added `iterator` method and removed indexed behaviour
Indexed behaviour and array accessors [] raised an important issue over expected functionality, namely setting by index: `store[23] = triple;`. In one case this meant editing the contents of the graph (something which couldn't be implemented with the functionality users would expect, i.e. changing the document source), and in another case it would make it almost impossible to prevent duplicates being added. From an implementation perspective it would also be impossible to implement in many languages, including ECMAScript. Instead the DataStore should be treated like a typed int hash, IntHash<RDFTriple>, which nicely wraps the array/collection and hides any inadvisable methods; this can easily be implemented in any language. Iteration is still possible in two common forms:
  for(i=0;i<store.length;i++) {
    triple = store.get(i);
  }
and
  iterator = store.iterator();
  while((triple = iterator.next()) !== null) {
    // work with triple
  }
and of course there's the callback interface .forEach too, and further the .toArray() method which gives you the triples in a native structure/array.


DataIterator: Changed - In line with all of the above.
changed `store` attribute to readonly (changing the store at runtime will produce unexpected results)
removed attribute `root`, unused
removed attribute `filter`, no need, constrains DataIterator to only be used with DataStore.filter method
removed attribute `triplePattern`, redundant/unused
added hasNext() method to make iteration easier and prevent users making double calls to `next` by accident / getting confused.
usage:
  while(iterator.hasNext()) {
    triple = iterator.next();
  }
basic ECMAScript DataStore.iterator implementation:
  DataStore.prototype.iterator = function() {
    return {
      cur : 0,
      store : this,
      // cur and store are properties of the iterator object, so reach them via `this`
      hasNext: function() { return this.cur < this.store.length; },
      next: function() { return this.store.get(this.cur++); }
    };
  };


Additional notes:
All of these changes (and the previous and following ones) are fully implemented in a prototype and proven to work. These proposals/issues refine previous suggestions made on list.

In the prototype library (which has expanded functionality on top of the API) I've found the need for two more methods which I haven't mentioned above:
 - DataStore.contains( triple ); this complements the existential methods by providing a common "is this triple in the store" method. Like mentions, this functionality can be implemented using .some
 - DataStore.apply( filter ); this allows you to reduce the contents of a data store by filter, like filter() but replacing the contents rather than returning a new store. The reason I added it in the library was memory management: rather than creating a new store (in addition to the previous store) it simply shrinks the existing one.
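
A rough sketch of how these two methods might look, using a hypothetical SketchStore. It assumes that two triples are equal when their subject, property and object are all equal, which is an assumption here and not something the proposal specifies.

```javascript
// Hypothetical stand-in for DataStore, for illustration only.
function SketchStore(triples) {
  this.triples = triples.slice();
  this.length = this.triples.length;
}

SketchStore.prototype.some = function (callback) {
  var store = this;
  return this.triples.some(function (t, i) { return callback(t, i, store); });
};

// contains built on some. Assumption: component-wise equality.
SketchStore.prototype.contains = function (triple) {
  return this.some(function (t) {
    return t.subject === triple.subject &&
           t.property === triple.property &&
           t.object === triple.object;
  });
};

// apply shrinks this store in place instead of returning a new one.
SketchStore.prototype.apply = function (filter) {
  var store = this;
  this.triples = this.triples.filter(function (t, i) {
    return filter(t, i, store);
  });
  this.length = this.triples.length;
};

var name = { subject: "http://example.org/a", property: "http://example.org/name", object: "Alice" };
var age = { subject: "http://example.org/a", property: "http://example.org/age", object: "42" };
var store = new SketchStore([name, age]);

var hadAge = store.contains(age);
store.apply(function (t) { return t.property === "http://example.org/name"; });
var hasAge = store.contains(age);
```

Note that this sketch still builds a temporary array inside apply(); the saving is only that no second store is kept alongside the original.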

Received on Wednesday, 27 October 2010 03:08:58 UTC