Re: New Editors draft of RDFa API spec from Manu Sporny on 2010-09-13 (public-rdfa-wg@w3.org from September 2010)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Sun, 12 Sep 2010 23:36:02 -0400
To: RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <4C8D9C22.5040501@digitalbazaar.com>

On 09/08/2010 10:00 AM, Ivan Herman wrote:
> here are my comments. They are not in any particular order or, more
> exactly, they are mostly in the order of the document; I just made notes
> as I went along. All in all: kudos:-)

Thanks :)

I've attempted to address all of your editorial suggestions... more below:

> ---- I miss much more cross links. If I see a term 'property group'
> somewhere, it should link to its definition. Shane did a marvellous
> job in the RDFa Core document in this respect, we should have that,
> too...

Fixed. I cross-referenced the f*#@ out of the document... may have
missed a couple, but now every mention of an interface is linked to its
WebIDL description in the document. We may want to link the reference to
the section that describes it - let's try this out for now and see what
everyone thinks about it.

There are a few exceptions... I didn't cross reference when the general
concept was being discussed. For example in the Introduction and Goals
sections, there are no references to DataStore or Property Group because
we are introducing the concept in an abstract form at that point.

One annoying thing is that the CSS makes things like the 's' on
PropertyGroups look really bad. Perhaps some tweaks to the CSS are in
order, or a different styling will be necessary. I'll try to ping Shane
about this and see what he thinks.

> ---- The document uses the latest editor's draft version of WebIDL:
> 
> http://dev.w3.org/2006/webapi/WebIDL/
> 
> instead of the latest, official version
> 
> http://www.w3.org/TR/2008/WD-WebIDL-20081219/
>
> This is significant because, for example, 'stringifier' does _not_
> appear in the 2008 version, it took me some time to find it... I
> think the reference should refer to the editor's draft for now.

Fixed. The document now refers to the latest Editor's Draft. We'll change
this back when WebIDL gets published as a TR.

> ---- The 1.2 example is a bit strange. It is correct, no doubt about
> it, but it is a way of using RDFa that might raise a lot of eyebrows.
> Indeed, it is really using <div>-s in nested manner to encode RDF
> without any textual content except for one paragraph.
> 
> Again, the example is correct. But a question that might come to a
> casual reader: why is the usage of RDFa justified for this example?
> One could encode the RDF code in, say, turtle or RDF/XML and let it
> retrieved run-time...
>
> Bottom line: maybe, for didactic reasons, using a different example
> might be better.

I've changed the example such that the human-readable and
machine-readable data is out in the open (not hidden).

> ---- 1.3: The indentation of the example with the dangling '<a' is a
> bit strange...

Fixed.

> ---- 1.4: indentation is missing

Fixed. Also fixed a bug in the markup from foaf:term_OnlineAccount to
foaf:OnlineAccount.

> ---- 1.5: why the nested span in the middle?

Just making sure the reviewers are paying attention. :P

Fixed.

> ---- 2. I wonder whether the right term to use in the document is
> JavaScript or ECMAScript. Looking at
> http://www.w3.org/TR/2010/CR-geolocation-API-20100907/ they use
> ECMAScript... Probably a global change should be done...

Done.

> B.t.w., There is a missing conformance requirement section. The
> geolocation document seems to contain a good one, also referring to
> the binding issues on webidl (ie, that the ecmascript binding to
> webidl must be used).
> 
> That also affects a bit the last paragraph in section 2. The WebIDL
> document includes a binding to Java, so it might be good to refer to
> the fact that Java implementations should use that one. Nothing can
> be said about the other languages, though...

Fixed. I added a conformance section to 4.1. This is fairly late in the
document, but every section up to that point is non-normative and
doesn't use MUST, SHOULD, etc. We can always move the conformance
section to the top of the document if people don't like how this reads.
I also added the Java clause as well as a clause stating that a best
effort should be performed for languages other than ECMAScript and Java.

> ---- 2.1. Goals: although there should be a general discussion on
> this, it may be worth emphasizing that not only the API allows for
> non-RDFa parsers to be used, but the interface offers some sort of a
> generic API to RDF...

Fixed. I added a sub-section called "A Modular RDF API" to try and
clarify this a bit more.

> Editorially: I think forward links into the document would be useful
> for terms like property groups...

Fixed. Added forward links to Property Group, Property Group Templates
and Data Store.

> ---- 2.2 Concept diagram: I am not sure how, but it might be good to
> have on the diagram and the accompanying text, references to some of
> the 'sections' of the document. We use, for example, the term 'RDF
> Interfaces' in the text; maybe using the same term on the diagram
> would be good (if the diagram is in SVG, it should be a clickable
> link to the relevant section...). Same for the others and the text
> itself.

I agree with you in principle - things start to fall apart after that...

I tried SVG without all of the extra non-W3C shim code required to make
SVG work cross-browser. I tried to make native SVG work for 4 hours
straight one day... couldn't get it to work across all browsers - sizing
issues. I gave up. The source document is in SVG if someone would like
to give it a shot.

> ---- Examples for managing elements with data: the terms 'nodes' and
> 'elements' are used interchangeably, and that is a bit disturbing. I
> know, they are 'Element Nodes' in DOM parlance but, nevertheless...
> let us try to keep to one (I do not really care which one, to be
> honest). Note that the specification uses 'node' (of course), so
> maybe we should keep to that.

Hmm... I tried to be very careful where Node and where Element is used
as they are not interchange-able per the DOM spec. Meaning, Node is a
super-class of Element. Sometimes we mean Node, sometimes we mean Element.

I did search through the document and found a few places where we used
Node when we should have used Element. I've corrected those, but the
rest seem to be correct. When we say DOM Node or DOM Element, we're
being very specific... don't know if there is a better way to express
that in the current spec.

> ---- Is the 'Issue' entry on 'Modifying DOM Element' still relevant?

Yes, we don't have a solution for this yet.

> ---- 3.2.1, first example uses the query, but it is not clear why one
> would use that one instead of
> 
> var people =
> document.getItemsByType("http://xmlns.com/foaf/0.1/Person");
> 
> which achieves the same thing.  I think another example should be
> found that cannot be expressed with the basic calls... Or making it
> clear that *that* particular call can be done in this case, but look
> at the examples below that become more complicated...

I implemented your second suggestion - we state that the particular
call can be done with getItemsByType(), but explain how it can also be
done w/ query to give people an idea of how the queries are similar.

> I must admit I had to look up what this foaf:myersBriggs property
> means. Can't we use a somewhat less esoteric example?

Could you provide a suggestion? It took me 2 hours to come up with and
implement that example for an advanced query :). I don't want to
implement something else unless we have some kind of general agreement
that the example is not esoteric. That and my brain hurts right now...
help? :)

> ---- PlainLiteral definition. Why does one need a stringifier for a
> value and not for the language attribute? Aren't both strings in the
> first place?

'stringifier' tells the language which value or method should be used if
the object is converted into a DOMString. In the PlainLiteral's case, if
one were to convert the PlainLiteral into a string, the value attribute
would be used to generate the string while the language attribute would
be ignored.

> Example later: it says that literal.toString() can be used. Where is
> that defined? 

literal.toString() would execute the stringifier behavior for the
interface if toString() is not defined on the interface. In other words,
'value' would be used.
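
As a rough sketch of that behaviour in ECMAScript: the constructor below
is a hypothetical stand-in for PlainLiteral, not the spec's
implementation, and the literal text is made up. It only shows that
string conversion picks up 'value' and ignores 'language'.

```javascript
// Hypothetical stand-in for the PlainLiteral interface (illustration only).
function PlainLiteral(value, language) {
  this.value = value;       // the text of the literal
  this.language = language; // the language tag, e.g. "en"
}

// 'stringifier' on the value attribute: string conversion returns value.
PlainLiteral.prototype.toString = function () {
  return this.value;
};

var literal = new PlainLiteral("Harry Potter", "en");

var asString = String(literal);     // explicit conversion uses value
var greeting = "Title: " + literal; // concatenation also calls toString()
```

The language tag is still available as literal.language; it just plays
no part in the conversion.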

> Put it another way, shouldn't it be defined on the RDF Node level so
> that it would be inherited by everyone?

It is defined on the RDF Node level. Perhaps we are mis-communicating?

> (Or is it
> defined in WebIDL in general and I just do not know it? Maybe worth
> emphasizing for outsiders like me...)

Understanding WebIDL is a pre-requisite of reading the spec. I know this
is somewhat esoteric and is deep WebIDL magic, so I've tried to
elaborate the RDF Node interface's value attribute to make it more
clear. Let me know if you think this is good enough.

> The comments says, for example, "The API supports direct assignment
> of PlainLiteral values" but the example is not assignment but just
> accessing the attribute. Besides, those attributes are defined to be
> 'readonly'. Isn't something wrong?

Good catch... changed to read as "The API supports attribute-based
access of..."

> ---- TypedLiteralConverter interface: I do not understand what the
> targetType parameter is for. Either give a good (and convincing:-)
> example, or drop it if it has only a very restricted use...

Added an example and another method to the DataContext to aid
TypedLiteral conversion. We can't remove this; Mark has a plan for
stating targetTypes for TypedLiteral converters. I don't fully
understand his plan, so the interface is a bit shoddy as I don't know
exactly how Mark wants to see it implemented. It's a bit clunky right
now... perhaps Mark has some insight into how we could make it cleaner.

> ---- DataStore interface: the latest version of Web IDL seems to have
> _dropped_ the [IndexGetter]. Quoting from appendix C:
>
> [[[ Turned [Callable], [IndexGetter], [IndexSetter], [IndexCreator],
> [IndexDeleter], [NameGetter], [NameSetter], [NameCreator],
> [NameDeleter] and [Stringifies] into real Web IDL syntax using
> caller, getter, setter, creator, deleter and stringifier. Dropped
> [NoIndexingOperations] in favor of an omittable keyword that can be
> used on the above six special operations. ]]]
> 
> this part should be redone then... which may mean that the method
> 'get' would be gone, too?

Fixed. The 'get' method doesn't get removed since it is the method that
is decorated with the 'getter' property.
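
To illustrate what the 'getter' decoration means in practice, here is a
rough ECMAScript sketch. It assumes an environment with Proxy support,
and makeIndexedStore is a made-up helper rather than anything in the
spec. The point is only that indexed access and the named get() method
end up in the same operation.

```javascript
// Hypothetical helper: wraps a list of triples so that store[i] is routed
// to the named get() operation, roughly what 'getter' implies in the binding.
function makeIndexedStore(triples) {
  var target = {
    get: function (index) {
      return triples[index];
    }
  };
  return new Proxy(target, {
    get: function (obj, key) {
      // Numeric property names are forwarded to get(index).
      if (typeof key === "string" && /^\d+$/.test(key)) {
        return obj.get(Number(key));
      }
      return obj[key];
    }
  });
}

var indexed = makeIndexedStore(["triple-0", "triple-1"]);
var viaIndex = indexed[1];      // indexed access
var viaMethod = indexed.get(1); // named method, same result
```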

> ---- Still the DataStore interface:
> 
> Just to specify it clearly: if I 'add' a triple twice, then the
> triple will not be repeated, right? 

That is correct.

> It sounds obvious but some triple
> stores (like AllegroGraph) do not do that check...

Clarified this in the spec in the documentation for the DataStore.add()
method.
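
A minimal sketch of the intended behaviour, assuming triples are
compared by their subject, property and object values. SketchStore and
the true/false return value are my own illustration here, not the
interface in the draft.

```javascript
// Hypothetical minimal store, for illustration only.
function SketchStore() {
  this.triples = [];
}

// Adds a triple unless an identical one is already present.
// Returns true if the triple was added, false if it was a duplicate.
SketchStore.prototype.add = function (triple) {
  for (var i = 0; i < this.triples.length; i++) {
    var t = this.triples[i];
    if (t.subject === triple.subject &&
        t.property === triple.property &&
        t.object === triple.object) {
      return false;
    }
  }
  this.triples.push(triple);
  return true;
};

var sketch = new SketchStore();
var first = sketch.add({ subject: "#me", property: "foaf:name", object: "Manu" });
var second = sketch.add({ subject: "#me", property: "foaf:name", object: "Manu" });
// sketch.triples still holds a single triple after the second add()
```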

> I continue to be puzzled by the filter method, ie, by the fact that
> it returns another DataStore, rather than just an array of
> RDFTriple-s. I just do not get it... The PropertyGroup, for example,
> returns a 'Sequence' argument, ie, it is possible to just return an
> array. This should be discussed.

We do this to support filter chaining, so you can do stuff like:

var abcStore =
document.data.store.filter(FILTER_A).filter(FILTER_B).filter(FILTER_C);

Keep in mind that you can only filter one subject, one property or one
object at a time. You may have an RDFTripleFilter function for each
FILTER_* that does things like "count but pass" triples. So FILTER_A and
FILTER_B could analyze each stage of the DataStore, but then FILTER_C
does the actual filtering based on data collected by FILTER_A and FILTER_B.

This is a /very/ advanced use case, but it allows very complex queries
to be done in fairly compact code.

We could achieve the same result by returning Sequences from the
DataStore.filter() method, but if we take that route, we have to make it
easy to construct a new DataStore... and the code is much more
verbose/bloated.
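
Here is a rough sketch of the "count but pass" idea. I'm simplifying
filter() to take only a callback, and ChainStore, the filter functions
and the sample triples are all made up for the example. It relies on
each filter() call finishing before the next one starts, so the counts
are complete by the time the last filter runs.

```javascript
// Hypothetical store whose filter() returns another store, so calls chain.
function ChainStore(triples) {
  this.triples = triples || [];
}

ChainStore.prototype.filter = function (test) {
  var kept = [];
  for (var i = 0; i < this.triples.length; i++) {
    if (test(this.triples[i])) {
      kept.push(this.triples[i]);
    }
  }
  return new ChainStore(kept);
};

var chainStore = new ChainStore([
  { subject: "#alice", property: "foaf:name", object: "Alice" },
  { subject: "#alice", property: "foaf:knows", object: "#bob" },
  { subject: "#alice", property: "foaf:nick", object: "al" },
  { subject: "#bob", property: "foaf:name", object: "Bob" }
]);

// "Count but pass": tallies triples per subject and lets everything through.
var counts = {};
function countButPass(triple) {
  counts[triple.subject] = (counts[triple.subject] || 0) + 1;
  return true;
}

// The actual filtering, based on the data collected by the earlier stage.
function keepBusySubjects(triple) {
  return counts[triple.subject] >= 2;
}

var busy = chainStore.filter(countButPass).filter(keepBusySubjects);
// busy holds only the three #alice triples
```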

> I am not fully convinced about the necessity of having the 'forEach'
> method. Sure, I can see its utility, but its functionality can easily
> be programmed by a cycle through the triples of the store and it
> seems to add too much to the Data store interface. I would consider
> removing it altogether, including the DataStoreIterator interface.

Sure, we could remove forEach... but given the two choices - procedural
iteration through the DataStore, or a functional iteration through the
DataStore, I would personally pick the functional one more times than
the procedural one. Doing functional programming in Javascript happens
more naturally than in Python or many of the other functional-supporting
languages.
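
For comparison, here are the two styles side by side. A plain array
stands in for the DataStore and Array's own forEach stands in for
DataStore.forEach, so treat this as a sketch of the style and not of the
actual interface; the triples are made up.

```javascript
// A plain array standing in for the DataStore (illustration only).
var sampleTriples = [
  { subject: "#alice", property: "foaf:name", object: "Alice" },
  { subject: "#alice", property: "foaf:knows", object: "#bob" },
  { subject: "#bob", property: "foaf:name", object: "Bob" }
];

// Procedural: an explicit loop with an index variable.
var namesFromLoop = [];
for (var i = 0; i < sampleTriples.length; i++) {
  if (sampleTriples[i].property === "foaf:name") {
    namesFromLoop.push(sampleTriples[i].object);
  }
}

// Functional: hand a callback to forEach and let it do the walking.
var namesFromForEach = [];
sampleTriples.forEach(function (triple) {
  if (triple.property === "foaf:name") {
    namesFromForEach.push(triple.object);
  }
});
```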

It's very difficult to explain this, but when I started out using
Javascript, I tended towards using procedural programming and it was
always very awkward. For some reason, programming in a more functional
way in Javascript ends up not biting you as much as programming in it
procedurally... and after a while, you start to enjoy using Javascript's
more functional aspects more often than the procedural ones. Our entire
engineering team went through this transition - hating it at first and
now it's something that is integral to the way we develop Javascript code.

So, while it's good to simplify... I'd be bothered by removing it at
this point in time... perhaps we should discuss this more as I don't
necessarily think the explanation I give above is good enough to be used
as the reason we have the forEach interface.

> ---- Data Parser interface.
> 
> The current parser is defined for a DOM-like parser, eRDF, Microdata,
> whatever. But I would like to be able to have a turtle parser that
> takes a URI as an argument, rather than an Element. Would that be
> possible to do in WebIDL? In any case, it would really be good to
> have that extension point to any type of parsing...

This is ISSUE-44:

http://www.w3.org/2010/02/rdfa/track/issues/44

We have some ideas for making this happen... dangerous ideas that are
bound to scare people. :)

> I am not sure what the role of the store is for a DataIterator. The
> way I understand it:
> 
> - parse puts all the triples into the Store and then one uses the
>   DataStore interface
> - iterator just gives you the triples one after the other. The user
>   'may' decide to add it to a store, of course, but that is outside
>   the realm of the iterator, isn't it?

Yes, that's correct.

> If so, I actually wonder whether the DataParser and the DataIterators
> are not two completely different interfaces, for different usages and
> it may be better to separate them altogether.

Perhaps... the division is fairly awkward at present. The idea is that
the DataParser has two modes of operation - read-and-store and
stream-and-discard:

parse() -> process the document and store every triple (read-and-store)
iterate() -> stream triples as they are found (stream-and-discard)

The first requires quite a bit of memory, the second is far more memory
efficient. Think desktop vs. smartphone.
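
A rough sketch of the two modes, assuming an environment with generator
support. findTriples is a hypothetical stand-in for walking the
document, and the function signatures are simplified, so none of this is
the interface as drafted.

```javascript
// Hypothetical triple source standing in for a document being processed.
function* findTriples() {
  yield { subject: "#alice", property: "foaf:name", object: "Alice" };
  yield { subject: "#alice", property: "foaf:knows", object: "#bob" };
  yield { subject: "#bob", property: "foaf:name", object: "Bob" };
}

// read-and-store: every triple found is kept in the store.
function parse(store) {
  for (var triple of findTriples()) {
    store.push(triple);
  }
  return store.length;
}

// stream-and-discard: triples are handed out one at a time and not kept.
function iterate() {
  return findTriples();
}

var stored = [];
var storedCount = parse(stored);

var streamedCount = 0;
for (var streamed of iterate()) {
  streamedCount++; // look at the triple, then let it go
}
```

In the second mode the caller decides whether anything gets kept, which
is where the memory saving comes from.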

> ---- Property group interface.
> 
> Editorial issue: the property group template section comes a bit out
> of the blue, because the query is defined later. I would expect this
> section to be moved down to the definition of Data Query...

Unfortunately, if we move it down there, people may not understand that
Property Groups are meant to be language-native containers for Linked
Data. Perhaps we need a better introduction to that section so it
doesn't come from out of the blue? Would that address your concern, Ivan?

> ---- Property group template section
> 
> I would think we should reuse the example on google snippet we have
> at the beginning of the document. Let us not proliferate those examples
> too much...

Fixed.

> ---- That may be a stylistic issue: isn't it more logical to have the
> getItemsBy*** methods defined on the DocumentData interface rather
> than the RDFaDocument? 
> After all, those can be considered as
> shorthands for specific query methods. I may also move the
> getElementsBy* methods there for symmetry, though they are closer to
> the 'usual' DOM methods.

The reason those methods are on RDFaDocument is because RDFaDocument is
a supplemental interface to DOM Document. In other words, we expect
anybody that implements the RDFa API in a DOM environment to implement
those interfaces on the DOM Document object... this is so people can do
stuff like this:

document.getItemsByType(...);

which is supposed to parallel calls like this:

document.getElementById(...);

You could say it's stylistic... we could move all the document data
calls to document.data, or we could get rid of those calls entirely.
IIRC, Mark felt strongly about this and I tend to agree with him, but
feel less strongly about it. I don't know how Benjamin feels about these
interfaces being on Document vs. DocumentData.

I've left them for now until we get more feedback.

Thanks for the thorough review, Ivan - it really helped a bunch :)

I'll publish a new Heartbeat-ready Working Draft in a few minutes.

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Saving Journalism - The PaySwarm Developer API
http://digitalbazaar.com/2010/09/12/payswarm-api/
Received on Monday, 13 September 2010 03:36:36 UTC