Re: ACTION-187: extensibility and parsing

On Mon, Oct 18, 2010 at 7:31 PM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Wed, 29 Sep 2010 15:50:35 +0200, Raphaël Troncy
> <raphael.troncy@eurecom.fr> wrote:
>
>> Dear Philip,
>>
>>> As requested, a short summary of the long-standing issue of syntax,
>>> parsing and how that relates to extensibility.
>>
>> Thanks for starting this thread. Today we closed
>> ACTION-187 (and 186), as you may have seen in the minutes. We have also
>> resolved to go with the option 2 you described below (consensus from
>> all, minus one neutral, among the people who have expressed an opinion).
>>
>> We have further specified how the specification will manage extensibility:
>>
>> 1/ A media fragment URI is indeed a set of key/value pairs for which at
>> least one key is recognized by our grammar
>> 2/ The ABNF grammar that describes the media fragment syntax will be
>> edited (see ACTION-189) so that:
>>   . The production rule of 'mediasegment' is now:
>> mediasegment = namesegment / axissegment / extensionsegment
>> extensionsegment = extensionprefix '=' extensionparam
>>   . Additional prose states that 'extensionsegment' cannot redefine one of
>> the current axes, so e.g., extensionprefix cannot be 't' or 'track' or 'id'
>> or 'xywh'
>> 3/ We could add an additional paragraph stating how the parsing of the
>> media fragment URI should be done
>>
>>> How will implementations of MF 1.0 handle t=10,500&freq=300,3000 ? This
>>> is the core point of disagreement, and the question is really about how
>>> MF 1.0 parsers should work. Leaving it undefined is not a good option,
>>> as the history clearly shows.
>>
>> Indeed. With the current decision:
>>  . <uri>#t=10,500&freq=300,3000 will be a valid MF 1.0 URI
>>  . <uri>#freq=300,3000 will *NOT* be a valid MF 1.0 URI
>>
>>> 1. Require that parsing follow a strict ABNF syntax like the one we
>>> have. Since freq is not part of the MF 1.0 syntax, parsing
>>> t=10,500&freq=300,3000 will fail and the whole fragment will be ignored,
>>> including t=10,500.
>>>
>>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>>> syntax. The concrete suggestion I've made is that the algorithm or
>>> syntax should match how query strings work. That is, a list of key-value
>>> pairs is formed by splitting the string on & and =. As a second step,
>>> that list is traversed to match the keys against the dimensions and
>>> parsed according to the ABNF syntax of each dimension. Crucially,
>>> unrecognized/invalid keys or values are ignored. That means that in the
>>> above example, the time dimension will keep working even if an
>>> unrecognized (to a MF 1.0 implementation) freq dimension is used.
>>
>> Please comment on this decision, stating whether you agree or
>> disagree, so that I can implement ACTION-189.
>> Thanks.
>> Best regards.
>>
>>   Raphaël
>>
>
> Sorry for the delay; much traveling and an overflowing inbox take their toll.
>
> I take it that the option 2 you refer to is this one:
>
> On Wed, 22 Sep 2010 13:12:14 +0200, Philip Jägenstedt <philipj@opera.com>
> wrote:
>
>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>> syntax. The concrete suggestion I've made is that the algorithm or syntax
>> should match how query strings work. That is, a list of key-value pairs is
>> formed by splitting the string on & and =. As a second step, that list is
>> traversed to match the keys against the dimensions and parsed according to
>> the ABNF syntax of each dimension. Crucially, unrecognized/invalid keys or
>> values are ignored. That means that in the above example, the time dimension
>> will keep working even if an unrecognized (to a MF 1.0 implementation) freq
>> dimension is used.
>
> I'm glad we've finally been able to agree on this, the question is now only
> how to put it in the spec. The parsing must have the following properties
> for this to be like query strings:
>
> * Percent-decoding must be performed on both names and values. I don't think
> this can be expressed in a single layer of ABNF, and at the very least it
> isn't expressed in the ABNF we currently have.
>
> * When a name occurs twice, the last occurrence should be the one used.
> (#t=0&t=1 means 1 second)
>
> About the suggested extension of mediasegment above, it's problematic to
> require that extensionprefix not be 't', 'track', 'id' or 'xywh'. That would
> mean that #t=foo:1&t=1 wouldn't parse to 1 second, making it impossible to
> ever add additional time formats. Always ignoring invalid things is simpler.
>
> How should we move forward with the spec editing practicalities? IMO,
> validity and parsing are sufficiently different that they can't simply be
> merged into a single ABNF. A validator should warn against all unknown
> name-values, while a user agent should ignore them.
>
> My suggestion:
>
> Introduce the ABNF in
> <http://lists.w3.org/Archives/Public/public-media-fragment/2010Aug/0005.html>
>
> By saying that the name and value are URI components, I believe it is
> implied that percent-decoding should be performed. In either case, the spec
> should say so for clarity.
>
> The name-value byte arrays should be decoded as UTF-8 to give unicode
> strings.
>
> Have a pair of ABNF for each of our dimensions that operate on these unicode
> strings.
>
> A valid MF is one where all name-value pairs match one of the predefined
> dimensions, but no name occurs twice.
>
> For parsing, one iterates over the list of name-value pairs, parsing any
> that are valid according to the ABNF. As a side-effect of the loop, the last
> valid pair of any dimension is the one that ends up being used.
>
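
For illustration, the proposal quoted above (split on & and =, percent-decode,
decode as UTF-8, then match each pair against its dimension) could be sketched
roughly as follows. This is only a sketch under stated assumptions: the
function names (split_pairs, parse, is_valid) and the DIMENSIONS table are
hypothetical, and the regular expressions are simplified stand-ins for the
per-dimension ABNF, not the grammar from the spec. Validity here follows
Philip's suggested definition (every pair matches a known dimension and no
name repeats).

```python
import re
from urllib.parse import unquote_to_bytes

# ASSUMPTION: simplified stand-ins for the per-dimension ABNF rules.
# The real grammars (e.g. for time formats) are richer than these patterns.
DIMENSIONS = {
    "t": re.compile(r"(?:npt:)?(?:\d+(?:\.\d+)?(?:,\d+(?:\.\d+)?)?|,\d+(?:\.\d+)?)"),
    "xywh": re.compile(r"(?:pixel:|percent:)?\d+,\d+,\d+,\d+"),
    "track": re.compile(r".+"),
    "id": re.compile(r".+"),
}


def split_pairs(fragment):
    """Split on & and =, percent-decode, and decode the bytes as UTF-8."""
    pairs = []
    for item in fragment.split("&"):
        name, sep, value = item.partition("=")
        if not sep:
            continue  # no "=", so this is not a name-value pair
        try:
            name = unquote_to_bytes(name).decode("utf-8")
            value = unquote_to_bytes(value).decode("utf-8")
        except UnicodeDecodeError:
            continue  # bytes that are not valid UTF-8 are ignored
        pairs.append((name, value))
    return pairs


def parse(fragment):
    """User-agent behaviour: ignore unknown or invalid pairs, last valid wins."""
    result = {}
    for name, value in split_pairs(fragment):
        rule = DIMENSIONS.get(name)
        if rule is not None and rule.fullmatch(value):
            result[name] = value  # a later valid pair overwrites an earlier one
    return result


def is_valid(fragment):
    """Validator behaviour: every pair is a known dimension, no name repeats."""
    pairs = split_pairs(fragment)
    names = [name for name, _ in pairs]
    return (
        len(pairs) == len(fragment.split("&"))  # nothing was dropped
        and len(set(names)) == len(names)       # no name occurs twice
        and all(
            name in DIMENSIONS and DIMENSIONS[name].fullmatch(value)
            for name, value in pairs
        )
    )
```

With these assumptions, parse("t=10,500&freq=300,3000") keeps only the time
dimension, and both parse("t=0&t=1") and parse("t=foo:1&t=1") yield
{"t": "1"}, while is_valid() rejects all three fragments because of the
unknown name, the repeated name, and the invalid value respectively.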

Sounds like nobody objects. Is this being included in the spec now?

Silvia.

Received on Tuesday, 19 October 2010 21:14:16 UTC