Re: ACTION-187: extensibility and parsing from Silvia Pfeiffer on 2010-09-24 (public-media-fragment@w3.org from September 2010)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, 24 Sep 2010 18:43:32 +1000
To: Philip Jägenstedt <philipj@opera.com>
Cc: public-media-fragment@w3.org
Message-ID: <AANLkTikuuWbzU8-Bq1eRhD8_6LEpb-FjXCm=2t4wi+bY@mail.gmail.com>
On Fri, Sep 24, 2010 at 6:09 PM, Philip Jägenstedt <philipj@opera.com> wrote:

> On Fri, 24 Sep 2010 06:56:33 +0200, Davy Van Deursen <davy.vandeursen@ugent.be> wrote:
>
> Quoting Silvia Pfeiffer <silviapfeiffer1@gmail.com>:
>>
>>> On Wed, Sep 22, 2010 at 9:12 PM, Philip Jägenstedt <philipj@opera.com> wrote:
>>>
>>>> As requested, a short summary of the long-standing issue of syntax,
>>>> parsing and how that relates to extensibility.
>>>>
>>>> By extensibility I am not primarily talking about 3rd parties
>>>> extending MF, but about our own ability to update the spec after
>>>> MF 1.0. For the purpose of discussion, assume that we want to add a
>>>> dimension for filtering the audio, e.g., freq=300,3000, to keep only
>>>> the part of the audio that corresponds (approximately) to human voice
>>>> (300Hz-3000Hz).
>>>>
>>>> How will implementations of MF 1.0 handle t=10,500&freq=300,3000?
>>>> This is the core point of disagreement, and the question is really
>>>> about how MF 1.0 parsers should work. Leaving it undefined is not a
>>>> good option, as the history clearly shows. Two other options have
>>>> been on the table:
>>>>
>>>> 1. Require that parsing follow a strict ABNF syntax like the one we
>>>> have. Since freq is not part of the MF 1.0 syntax, parsing
>>>> t=10,500&freq=300,3000 will fail and the whole fragment will be
>>>> ignored, including t=10,500.
>>>>
>>>> 2. Require that parsing follow an algorithm or a more forgiving ABNF
>>>> syntax. The concrete suggestion I've made is that the algorithm or
>>>> syntax should match how query strings work. That is, a list of
>>>> key-value pairs is formed by splitting the string on & and =. As a
>>>> second step, that list is traversed to match the keys against the
>>>> dimensions, and each value is parsed according to the ABNF syntax of
>>>> its dimension. Crucially, unrecognized/invalid keys or values are
>>>> ignored. That means that in the above example, the time dimension
>>>> will keep working even if a freq dimension (unrecognized by an MF 1.0
>>>> implementation) is used.
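The two options can be contrasted in a minimal Python sketch. It assumes a heavily simplified, hypothetical time syntax (plain seconds, "start" or "start,end") instead of the real MF 1.0 ABNF, and the names parse_strict, parse_lenient and DIMENSIONS are illustrative only, not taken from any spec or library.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical, simplified time syntax: "start" or "start,end" in seconds.
# The real MF 1.0 ABNF for t= is considerably richer.
_TIME = re.compile(r"(\d+(?:\.\d+)?)(?:,(\d+(?:\.\d+)?))?")


def parse_time(value: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (start, end) for a valid simplified t= value, else None."""
    m = _TIME.fullmatch(value)
    if m is None:
        return None
    end = float(m.group(2)) if m.group(2) is not None else None
    return (float(m.group(1)), end)


# Dimensions known to a hypothetical MF 1.0 implementation; a later version
# would add e.g. "freq" here without changing the parsing loop.
DIMENSIONS: Dict[str, Callable[[str], Optional[object]]] = {"t": parse_time}


def parse_strict(fragment: str) -> Optional[dict]:
    """Option 1: any unknown or invalid pair rejects the whole fragment."""
    result = {}
    for pair in fragment.split("&"):
        name, sep, value = pair.partition("=")
        parser = DIMENSIONS.get(name)
        parsed = parser(value) if sep and parser is not None else None
        if parsed is None:
            return None
        result[name] = parsed
    return result


def parse_lenient(fragment: str) -> dict:
    """Option 2: split on & and =, parse known dimensions, ignore the rest."""
    result = {}
    for pair in fragment.split("&"):
        name, sep, value = pair.partition("=")
        parser = DIMENSIONS.get(name)
        if not sep or parser is None:
            continue  # unrecognized key: ignored, not fatal
        parsed = parser(value)
        if parsed is not None:  # invalid values are ignored too
            result[name] = parsed
    return result


fragment = "t=10,500&freq=300,3000"
print(parse_strict(fragment))   # None: t=10,500 is lost as well
print(parse_lenient(fragment))  # {'t': (10.0, 500.0)}
```

The only difference between the two functions is what happens on an unknown key: option 1 returns nothing, option 2 skips the pair and keeps going, so t=10,500 survives.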
>>>>
>>>> Note: Neither 1 nor 2 requires using any specific implementation
>>>> technique, only behaving *as if* you do, which still leaves plenty of
>>>> room for different approaches.
>>>>
>>>> I strongly favor option number 2, and see these benefits:
>>>>
>>>> * It works like query strings, just as one would expect from looking
>>>> at the syntax. The algorithm I've suggested is actually based on
>>>> testing query string parsing in PHP, ASP, ASP.NET, CGI.pl and JSP, as
>>>> reported earlier on this list.
>>>>
>>>> * It's simpler for implementors, as we won't have to implement
>>>> everything at once. This is likely what's going to happen: the time
>>>> dimension is ready to implement, while it is still not clear how to
>>>> apply the named dimension to e.g. a WebM or Ogg resource.
>>>>
>>>> * It's better for extensibility, as adding new dimensions doesn't
>>>> break all existing implementations. Imagine if adding a new element
>>>> to HTML caused pages to render completely blank in all existing
>>>> browsers. Not even XHTML is that strict.
>>>>
>>>> Please comment; we need to reach some kind of consensus on this soon
>>>> and move on. If we can agree on what we want, we can then discuss how
>>>> to change the spec accordingly (algorithm or ABNF, etc.).
>>>>
>>>
>>>
>>>
>>> I also strongly favor option number 2. I don't think anything else
>>> makes sense, actually, because we would fail to interoperate with
>>> other schemes that use fragments and queries on media resources. Only
>>> name-value pairs that do not parse according to our ABNF will be
>>> ignored from the viewpoint of media fragments. They can be used by the
>>> browser or server for other purposes.
>>>
>>
>> Same opinion here; option 1 doesn't seem to make sense. However,
>> should we allow any unknown constructions in the URI fragment, or just
>> key-value pairs with an unknown key? For example:
>> - t=10,500&freq=300,3000: should be a valid fragment IMO, as indicated
>> by Philip's arguments;
>> - t=10,500&foo: is this a valid media fragment? According to Philip's
>> parsing algorithm, I think it is not. From an extension point of view,
>> disallowing such a construction should be fine, since we can rewrite
>> this as t=10,500&foo=true if we want to obtain key-value pairs. Note
>> that I'm not in favor of allowing anything other than key-value pairs;
>> I just wanted to point out this case.
>>
>
> The ABNF I suggested in
> <http://lists.w3.org/Archives/Public/public-media-fragment/2010Aug/0005.html>
> isn't complete; it's just the first level, defining name-value pairs. I
> think that we should define validity in a way that makes validators
> warn about things that aren't part of MF 1.0, to help authors find
> typos, etc. There are many ways we could achieve that spec-wise, if we
> agree on what we want. Validity and parsing can and should be separate,
> so we don't need to agree on exact details for the purposes of this
> discussion.
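To illustrate keeping validity separate from parsing, here is a minimal Python sketch. It again assumes a simplified, hypothetical time syntax, and the names parse and validate are illustrative, not from the spec. The parser stays forgiving, while a separate validator reports what an author might want to fix, including Davy's bare "foo" case.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical, simplified time syntax: "start" or "start,end" in seconds.
_TIME = re.compile(r"(\d+(?:\.\d+)?)(?:,(\d+(?:\.\d+)?))?")


def parse_time(value: str) -> Optional[Tuple[float, Optional[float]]]:
    """Return (start, end) for a valid simplified t= value, else None."""
    m = _TIME.fullmatch(value)
    if m is None:
        return None
    end = float(m.group(2)) if m.group(2) is not None else None
    return (float(m.group(1)), end)


# Dimensions understood by this hypothetical MF 1.0 implementation.
DIMENSIONS: Dict[str, Callable[[str], Optional[object]]] = {"t": parse_time}


def parse(fragment: str) -> dict:
    """Forgiving parser: keep valid known dimensions, silently skip the rest."""
    result = {}
    for pair in fragment.split("&"):
        name, sep, value = pair.partition("=")
        parser = DIMENSIONS.get(name)
        if sep and parser is not None:
            parsed = parser(value)
            if parsed is not None:
                result[name] = parsed
    return result


def validate(fragment: str) -> List[str]:
    """Separate validator: report anything that is not valid MF 1.0."""
    warnings = []
    for pair in fragment.split("&"):
        name, sep, value = pair.partition("=")
        if not sep:
            warnings.append(f"{pair!r}: not a name=value pair")
        elif name not in DIMENSIONS:
            warnings.append(f"{name!r}: not an MF 1.0 dimension")
        elif DIMENSIONS[name](value) is None:
            warnings.append(f"{name!r}: invalid value {value!r}")
    return warnings


fragment = "t=10,500&foo&tt=5"
print(parse(fragment))     # {'t': (10.0, 500.0)}
print(validate(fragment))  # ["'foo': not a name=value pair", "'tt': not an MF 1.0 dimension"]
```

Here the typo "tt" and the bare "foo" do not stop playback of t=10,500, but a validator would still flag both for the author.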


Assuming everyone is on board with that (which, of course, isn't clear
yet), would you be able to come up with spec text for this? You seem to
already have an idea in your head of what it should look like, so it
would be good to build on that.

Cheers,
Silvia.
Received on Friday, 24 September 2010 08:52:41 UTC