Re: ACTION-187: extensibility and parsing

On Wed, Sep 22, 2010 at 9:12 PM, Philip Jägenstedt <philipj@opera.com> wrote:

> As requested, a short summary of the long-standing issue of syntax, parsing
> and how that relates to extensibility.
>
> By extensibility I am not primarily talking about 3rd parties extending MF,
> but about our own possibilities of updating the spec after MF 1.0. For the
> purpose of discussion, assume that we want to add a dimension for filtering
> the audio, e.g., freq=300,3000 to keep only the part of the audio that
> corresponds (approximately) to human voice (300Hz-3000Hz).
>
> How will implementations of MF 1.0 handle t=10,500&freq=300,3000 ? This is
> the core point of disagreement, and the question is really about how MF 1.0
> parsers should work. Leaving it undefined is not a good option, as the
> history clearly shows. Two other options have been on the table:
>
> 1. Require that parsing follow a strict ABNF syntax like the one we have.
> Since freq is not part of the MF 1.0 syntax, parsing t=10,500&freq=300,3000
> will fail and the whole fragment will be ignored, including t=10,500.
>
> 2. Require that parsing follow an algorithm or a more forgiving ABNF
> syntax. The concrete suggestion I've made is that the algorithm or syntax
> should match how query strings work. That is, a list of key-value pairs is
> formed by splitting the string on & and =. As a second step, that list is
> traversed to match the keys against the dimensions and parsed according to
> the ABNF syntax of each dimension. Crucially, unrecognized/invalid keys or
> values are ignored. That means that in the above example, the time dimension
> will keep working even if an unrecognized (to a MF 1.0 implementation) freq
> dimension is used.
>
> Note: Neither 1 nor 2 requires any specific implementation technique, only
> behaving *as if* you used one, which still leaves plenty of room for
> different approaches.
>
> I strongly favor option number 2, and see these benefits:
>
> * It works like query strings, just like one would expect from looking at
> the syntax. The algorithm I've suggested is actually from testing query
> string parsing in PHP, ASP, ASP.NET, CGI.pl and JSP, as reported earlier
> on this list.
>
> * It's simpler for implementors, as we won't have to implement everything
> at once. This is likely what's going to happen, as the time dimension is
> ready to implement, while it is still not clear how to apply the named
> dimension to e.g. a WebM or Ogg resource.
>
> * It's better for extensibility, as adding new dimensions doesn't break all
> existing implementations. Imagine if adding a new element to HTML would
> cause pages to render completely blank in all existing browsers. Not even
> XHTML is that strict.
>
> Please comment, we need to reach some kind of consensus on this soon and
> move on. If we can agree on what we want, we can then discuss how to change
> the spec accordingly (algorithm or ABNF, etc...)



I also strongly favor option number 2. I don't think anything else makes
sense, actually, because we would fail to interoperate with other schemes
that use fragments and queries on media resources. Only name-value pairs
that do not parse according to our ABNF will be ignored from the viewpoint
of media fragments. They can be used by the browser or server for other
purposes.
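
To make the difference between the two options concrete, here is a rough
sketch in Python. It is only an illustration under stated assumptions, not
spec text: the function names are made up, the time grammar is a simplified
stand-in for the real ABNF (plain seconds only), percent-decoding is left
out, and letting the last valid occurrence of a repeated name win is an
arbitrary choice of this sketch.

```python
import re

# Simplified, hypothetical stand-in for the time dimension's ABNF:
# "start", "start,end" or ",end", with plain non-negative numbers only.
TIME_VALUE = re.compile(r"(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?")


def parse_time(value):
    """Return (start, end) in seconds, or None if the value is invalid."""
    m = TIME_VALUE.fullmatch(value)
    if m is None or (m.group(1) is None and m.group(2) is None):
        return None
    start = float(m.group(1)) if m.group(1) is not None else None
    end = float(m.group(2)) if m.group(2) is not None else None
    return (start, end)


# Dimensions known to this (hypothetical) MF 1.0 implementation.
DIMENSIONS = {"t": parse_time}


def split_pairs(fragment):
    """Step 1 of option 2: split on '&' and '=' like a query string."""
    pairs = []
    for item in fragment.split("&"):
        name, _, value = item.partition("=")
        pairs.append((name, value))
    return pairs


def parse_lenient(fragment):
    """Option 2: unrecognized or invalid pairs are ignored."""
    result = {}
    for name, value in split_pairs(fragment):
        parser = DIMENSIONS.get(name)
        if parser is None:
            continue  # unknown dimension, e.g. a future "freq"
        parsed = parser(value)
        if parsed is not None:
            result[name] = parsed
    return result


def parse_strict(fragment):
    """Option 1: one unknown or invalid pair discards the whole fragment."""
    result = {}
    for name, value in split_pairs(fragment):
        parser = DIMENSIONS.get(name)
        parsed = parser(value) if parser is not None else None
        if parsed is None:
            return {}
        result[name] = parsed
    return result
```

With Philip's example, parse_lenient("t=10,500&freq=300,3000") keeps the time
dimension, while parse_strict on the same string returns nothing at all.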

Cheers,
Silvia.

Received on Wednesday, 22 September 2010 12:13:35 UTC