ACTION-187: extensibility and parsing from Philip Jägenstedt on 2010-09-22 (public-media-fragment@w3.org from September 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Wed, 22 Sep 2010 13:12:14 +0200
To: "Media Fragment" <public-media-fragment@w3.org>
Message-ID: <op.vjflelw2sr6mfa@kirk>

As requested, a short summary of the long-standing issue of syntax, parsing
and how that relates to extensibility.

By extensibility I am not primarily talking about 3rd parties extending  
MF, but about our own possibilities of updating the spec after MF 1.0. For  
the purpose of discussion, assume that we want to add a dimension for  
filtering the audio, e.g., freq=300,3000 to keep only the part of the  
audio that corresponds (approximately) to human voice (300Hz-3000Hz).

How will implementations of MF 1.0 handle t=10,500&freq=300,3000 ? This is  
the core point of disagreement, and the question is really about how MF  
1.0 parsers should work. Leaving it undefined is not a good option, as the  
history clearly shows. Two other options have been on the table:

1. Require that parsing follow a strict ABNF syntax like the one we have.  
Since freq is not part of the MF 1.0 syntax, parsing  
t=10,500&freq=300,3000 will fail and the whole fragment will be ignored,  
including t=10,500.

2. Require that parsing follow an algorithm or a more forgiving ABNF  
syntax. The concrete suggestion I've made is that the algorithm or syntax  
should match how query strings work. That is, a list of key-value pairs is
formed by splitting the string on & and =. As a second step, that list is
traversed to match the keys against the dimensions and parsed according to  
the ABNF syntax of each dimension. Crucially, unrecognized/invalid keys or  
values are ignored. That means that in the above example, the time  
dimension will keep working even if an unrecognized (to a MF 1.0  
implementation) freq dimension is used.
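
To make the difference between the two options concrete, here is a minimal
Python sketch, not a proposed normative algorithm. The names parse_time,
split_pairs, parse_lenient and parse_strict are hypothetical, the time
syntax is simplified to plain seconds, percent-decoding is omitted, and
letting the last occurrence of a repeated key win is an assumption rather
than something agreed on this list.

```python
import re

# Assumption: a simplified time dimension in plain seconds only
# ("start", "start,end" or ",end"). The real ABNF allows more forms.
TIME_RE = re.compile(r"^(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?$")


def parse_time(value):
    """Return (start, end) in seconds, or None if the value is invalid."""
    m = TIME_RE.match(value)
    if not m or (m.group(1) is None and m.group(2) is None):
        return None
    start = float(m.group(1)) if m.group(1) else 0.0
    end = float(m.group(2)) if m.group(2) else None
    if end is not None and start >= end:
        return None
    return (start, end)


# Dimensions known to this hypothetical MF 1.0 implementation.
KNOWN_DIMENSIONS = {"t": parse_time}


def split_pairs(fragment):
    """Step 1: split on & and = into a list of (key, value) pairs."""
    pairs = []
    for part in fragment.split("&"):
        key, sep, value = part.partition("=")
        if sep:  # parts without "=" are dropped
            pairs.append((key, value))
    return pairs


def parse_lenient(fragment):
    """Option 2: unrecognized keys and invalid values are ignored."""
    result = {}
    for key, value in split_pairs(fragment):
        parser = KNOWN_DIMENSIONS.get(key)
        if parser is None:
            continue  # unknown dimension, e.g. freq
        parsed = parser(value)
        if parsed is not None:
            result[key] = parsed  # assumption: last occurrence wins
    return result


def parse_strict(fragment):
    """Option 1: any unknown or invalid part voids the whole fragment."""
    result = {}
    for part in fragment.split("&"):
        key, sep, value = part.partition("=")
        parser = KNOWN_DIMENSIONS.get(key) if sep else None
        parsed = parser(value) if parser else None
        if parsed is None:
            return {}  # the whole fragment is ignored
        result[key] = parsed
    return result
```

With t=10,500&freq=300,3000 as input, parse_lenient keeps the time
dimension and drops freq, while parse_strict returns nothing at all.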

Note: Neither 1 nor 2 is a requirement to use any specific implementation
technique, only to behave *as if* you are, which still leaves plenty of
room for different approaches.

I strongly favor option number 2, and see these benefits:

* It works like query strings, just like one would expect from looking at  
the syntax. The algorithm I've suggested is actually from testing query  
string parsing in PHP, ASP, ASP.NET, CGI.pl and JSP, as reported earlier  
on this list.

* It's simpler for implementors, as we won't have to implement everything
at once. Incremental implementation is likely what's going to happen, as
the time dimension is ready to implement, while it is still not clear how
to apply the named dimension to e.g. a WebM or Ogg resource.

* It's better for extensibility, as adding new dimensions doesn't break  
all existing implementations. Imagine if adding a new element to HTML
caused pages to render completely blank in all existing browsers. Not
even XHTML is that strict.

Please comment; we need to reach some kind of consensus on this soon and
move on. If we can agree on what we want, we can then discuss how to
change the spec accordingly (algorithm or ABNF, etc.).

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Wednesday, 22 September 2010 11:23:02 UTC