Re: ACTION-187: extensibility and parsing from Philip Jägenstedt on 2010-10-18 (public-media-fragment@w3.org from October 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Mon, 18 Oct 2010 10:31:16 +0200
To: public-media-fragment@w3.org
Message-ID: <op.vkrjaeyksr6mfa@kirk>
On Wed, 29 Sep 2010 15:50:35 +0200, Raphaël Troncy  
<raphael.troncy@eurecom.fr> wrote:

> Dear Philip,
>
>> As requested, a short summary of the long standing issue of syntax,
>> parsing and how that relates to extensibility.
>
> Thanks for having started this thread. We have closed today the  
> action-187 (and 186) as you may have seen in the minutes. We have also  
> resolved that we should do the option 2 you described below (consensus  
> from all minus one neutral among the people who have expressed an  
> opinion).
>
> We have further specified how the specification will manage extensibility:
>
> 1/ A media fragment URI is indeed a set of key/value pairs for which at  
> least one key is recognized by our grammar
> 2/ The ABNF grammar that describes the media fragment syntax will be  
> edited (see ACTION-189) so that:
>    . The production rule of 'mediasegment' is now:
> mediasegment = namesegment / axissegment / extensionsegment
> extensionsegment = extensionprefix '=' extensionparam
>    . Additional prose states that 'extensionsegment' cannot redefine one  
> of the current axis, so e.g., extensionprefix cannot be 't' or 'track'  
> or 'id' or 'xywh'
> 3/ We could add an additional paragraph stating how the parsing of the  
> media fragment URI should be done
>
>> How will implementations of MF 1.0 handle t=10,500&freq=300,3000 ? This
>> is the core point of disagreement, and the question is really about how
>> MF 1.0 parsers should work. Leaving it undefined is not a good option,
>> as the history clearly shows.
>
> Indeed. With the current decision:
>   . <uri>#t=10,500&freq=300,3000 will be a valid MF 1.0 URI
>   . <uri>#freq=300,3000 will *NOT* be a valid MF 1.0 URI
>
>> 1. Require that parsing follow a strict ABNF syntax like the one we
>> have. Since freq is not part of the MF 1.0 syntax, parsing
>> t=10,500&freq=300,3000 will fail and the whole fragment will be ignored,
>> including t=10,500.
>>
>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>> syntax. The concrete suggestion I've made is that the algorithm or
>> syntax should match how query strings work. That is, a list of key-value
>> pairs is formed by splitting the string on & and =. As a second step,
>> that list is traversed to match the keys against the dimensions and
>> parsed according to the ABNF syntax of each dimension. Crucially,
>> unrecognized/invalid keys or values are ignored. That means that in the
>> above example, the time dimension will keep working even if an
>> unrecognized (to a MF 1.0 implementation) freq dimension is used.
>
> Please comment on this decision stating either that you agree or you  
> disagree so that I can implement the ACTION-189.
> Thanks.
> Best regards.
>
>    Raphaël
>

Sorry for the delay; much traveling and an overflowing inbox take their  
toll.

I take it that the option 2 you refer to is this one:

On Wed, 22 Sep 2010 13:12:14 +0200, Philip Jägenstedt <philipj@opera.com>  
wrote:

> 2. Require that parsing follow an algorithm or a more forgiving ABNF  
> syntax. The concrete suggestion I've made is that the algorithm or  
> syntax should match how query strings work. That is, a list of key-value  
> pairs is formed by splitting the string on & and =. As a second step,  
> that list is traversed to match the keys against the dimensions and  
> parsed according to the ABNF syntax of each dimension. Crucially,  
> unrecognized/invalid keys or values are ignored. That means that in the  
> above example, the time dimension will keep working even if an  
> unrecognized (to a MF 1.0 implementation) freq dimension is used.

I'm glad we've finally been able to agree on this. The question is now  
only how to put it in the spec. The parsing must have the following  
properties for this to be like query strings:

* Percent-decoding must be performed on both names and values. I don't think  
this can be expressed in a single layer of ABNF, and at the very least it  
isn't expressed in the ABNF we currently have.

* When a name occurs twice, the last occurrence should be the one used.  
(#t=0&t=1 means 1 second)
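The two properties above can be sketched roughly as follows. This is only an illustration in Python, not proposed spec text. The function names `split_fragment` and `last_occurrence` are made up for the example, and skipping pieces that contain no '=' is an assumption. Checking values against the per-dimension syntax is deliberately left out here.

```python
from urllib.parse import unquote


def split_fragment(fragment):
    """Split on '&' and '=' and percent-decode both names and values."""
    pairs = []
    for piece in fragment.split("&"):
        if "=" not in piece:
            continue  # assumption: pieces without '=' are skipped
        # Split on the first '=' only, as query strings do.
        name, value = piece.split("=", 1)
        pairs.append((unquote(name), unquote(value)))
    return pairs


def last_occurrence(pairs):
    """Keep only the last value seen for each name."""
    result = {}
    for name, value in pairs:
        result[name] = value  # a later pair overwrites an earlier one
    return result
```

With these definitions, `last_occurrence(split_fragment("t=0&t=1"))` gives `{"t": "1"}`, and a percent-encoded name such as `%74=1` is read as `t=1`.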

About the suggested extension of mediasegment above, it's problematic to  
require that extensionprefix not be 't', 'track', 'id' or 'xywh'. That  
would mean that #t=foo:1&t=1 wouldn't parse to 1 second, making it  
impossible to ever add additional time formats. Always ignoring invalid  
things is simpler.

How should we move forward with the spec editing practicalities? IMO,  
validity and parsing are sufficiently different that they can't simply be  
merged into a single ABNF. A validator should warn against all unknown  
name-values, while a user agent should ignore them.

My suggestion:

Introduce the ABNF in  
<http://lists.w3.org/Archives/Public/public-media-fragment/2010Aug/0005.html>

By saying that the name and value are URI components, I believe it is  
implied that percent-decoding should be performed. In either case, the  
spec should say so for clarity.

The name-value byte arrays should be decoded as UTF-8 to give Unicode  
strings.
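As a rough sketch of that step (Python used for illustration): percent-decode a component to bytes, then decode the bytes as UTF-8. The name `decode_component` is hypothetical, and returning `None` for byte sequences that are not valid UTF-8 is an assumption; how such input should be handled is not settled here.

```python
from urllib.parse import unquote_to_bytes


def decode_component(raw):
    """Percent-decode a name or value to bytes, then decode as UTF-8."""
    data = unquote_to_bytes(raw)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: undecodable input is reported as None so the
        # caller can ignore the pair.
        return None
```

For example, `decode_component("J%C3%A4genstedt")` yields the string "Jägenstedt", while `decode_component("%FF")` yields `None`.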

Have a pair of ABNF for each of our dimensions that operate on these  
Unicode strings.

A valid MF is one where all name-value pairs match one of the predefined  
dimensions, and no name occurs twice.
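A validator following this definition might look something like the sketch below (Python, illustration only). The regular expressions are simplified stand-ins for the per-dimension ABNF and are assumptions, not the actual grammar; `DIMENSIONS` and `is_valid` are hypothetical names. The input is a list of already decoded name-value pairs.

```python
import re

# Simplified stand-ins for the per-dimension syntax (assumptions only).
DIMENSIONS = {
    "t": re.compile(r"\d+(\.\d+)?(,\d+(\.\d+)?)?"),
    "xywh": re.compile(r"\d+,\d+,\d+,\d+"),
    "track": re.compile(r".+"),
    "id": re.compile(r".+"),
}


def is_valid(pairs):
    """True if every pair matches a known dimension and no name repeats."""
    seen = set()
    for name, value in pairs:
        pattern = DIMENSIONS.get(name)
        if pattern is None or not pattern.fullmatch(value):
            return False  # unknown name or invalid value
        if name in seen:
            return False  # the same name occurs twice
        seen.add(name)
    return True
```

Under this definition `[("t", "10,500")]` is valid, while adding `("freq", "300,3000")` or repeating `t` makes the fragment invalid, which is what a validator would warn about.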

For parsing, one iterates over the list of name-value pairs, parsing any  
that are valid according to the ABNF. As a side-effect of the loop, the  
last valid pair of any dimension is the one that ends up being used.
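The loop could be sketched like this (Python, illustration only). Only the time dimension is shown, and its pattern is a simplified stand-in for the real ABNF, so treat it as an assumption; `parse_time`, `PARSERS` and `parse_pairs` are hypothetical names. The input is a list of already decoded name-value pairs.

```python
import re

# Simplified stand-in for the time dimension (assumption): plain seconds,
# optionally "start,end".
TIME = re.compile(r"(\d+(?:\.\d+)?)(?:,(\d+(?:\.\d+)?))?")


def parse_time(value):
    """Return (start, end) in seconds, or None if the value is invalid."""
    match = TIME.fullmatch(value)
    if match is None:
        return None
    start = float(match.group(1))
    end = float(match.group(2)) if match.group(2) is not None else None
    return (start, end)


PARSERS = {"t": parse_time}


def parse_pairs(pairs):
    """Iterate over the pairs, keeping the last valid one per dimension."""
    result = {}
    for name, value in pairs:
        parser = PARSERS.get(name)
        if parser is None:
            continue  # unrecognized name: ignored
        parsed = parser(value)
        if parsed is None:
            continue  # invalid value: ignored
        result[name] = parsed  # later valid pairs overwrite earlier ones
    return result
```

With this, the pairs for #t=foo:1&t=1 parse to a start time of 1 second, and the pairs for #t=10,500&freq=300,3000 keep the time range while the unrecognized freq pair is dropped.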

-- 
Philip Jägenstedt
Core Developer
Opera Software
Received on Monday, 18 October 2010 08:33:33 UTC