Re: ACTION-112 follow-up: Conrad's vs current's proposal for Media Fragment Processing from Conrad Parker on 2009-12-08 (public-media-fragment@w3.org from December 2009)

From: Conrad Parker <conrad@metadecks.org>
Date: Tue, 8 Dec 2009 09:50:53 +0900
To: Raphaël Troncy <Raphael.Troncy@cwi.nl>
Cc: Media Fragment <public-media-fragment@w3.org>
Message-ID: <dba6c0830912071650v2d21aa4bq6a6189f9b65c5045@mail.gmail.com>
2009/12/7 Raphaël Troncy <Raphael.Troncy@cwi.nl>:
> Hi Conrad,
>
>>> As you will read in the minutes of today's telecon, we distinguish two
>>> (ideal) cases (ignoring the fallback plans):
>>>  - case 1: the URI fragment can be resolved directly with the server's help
>>> and only one roundtrip (HTTP request / HTTP response) takes place
>>>  - case 2: we introduce a hack so that the URI fragment can be cached in
>>> current proxies, two roundtrips take place
>>
>> what hack?
>
> We mainly introduce the 4-way handshake (aka two roundtrips) in the spec so
> that current caches deployed on the web can actually cache fragments since
> they understand only byte ranges so far. We could speculate in the future
> that caches will understand ranges in other units and therefore also be able
> to cache an HTTP range request directly expressed in seconds. In this latter
> case, the 4-way handshake is no longer necessary ... thus the hack.

So the mechanism you are referring to is the use of HTTP/1.1
byte-ranges, in order to take advantage of HTTP/1.1 byte-range
caching.

By "hack" you mean that you consider it a temporary workaround. I find
this insulting because I believe that using HTTP/1.1 byte-range
caching is a good design decision, and one that I have been
advocating.

Most caches can remember a particular byte range that was already
seen, and serve it again, or serve out part of one previously seen
byte range. That is already useful to us, as it allows media fragment
data to be cached from the outset. By using existing byte-range
caching mechanisms we are also using the same data that is retrieved
by standard video players that use progressive download and byte-range
requests for seeking. Normal video playback and scrubbing populates
the same caches (keyed against the same URLs) as accessing our media
fragment schemes, if they use byte-ranges for the main data transfer.

To be most useful to us, caches should also do byte-range recombining,
which involves using multiple cached byte-ranges to serve a request
that spans them. This is in development (e.g. in Squid) and will take a
few more years to be fully available across the internet. One of the
most useful reasons for deploying this caching feature is video.
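
As a rough illustration of what recombining means, here is a minimal
sketch in Python. It is not how squid or any other cache implements it;
the function name serve_from_cache and the in-memory dict of ranges are
assumptions made purely for the example. Ranges are inclusive
(first byte, last byte) pairs, as in HTTP.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[int, int]  # inclusive (first_byte, last_byte), as in HTTP


def serve_from_cache(cached: Dict[Range, bytes], want: Range) -> Optional[bytes]:
    """Return the bytes for `want` if the cached ranges fully cover it, else None.

    Hypothetical helper: `cached` maps each previously seen byte range
    to the data that was returned for it.
    """
    first, last = want
    out = bytearray()
    pos = first  # next byte offset still needed
    # Walk the cached ranges in order of their start offset.
    for (start, end), data in sorted(cached.items()):
        if end < pos:
            continue  # entirely before the bytes still needed
        if start > pos:
            return None  # gap: a real cache would have to ask the origin
        # Copy the part of this cached range that overlaps the request.
        stop = min(end, last)
        out += data[pos - start: stop - start + 1]
        pos = stop + 1
        if pos > last:
            return bytes(out)
    return None  # ran out of cached data before reaching `last`
```

Serving part of a single previously seen range is just the special case
where one cached entry covers the whole request; recombining is the case
where two or more entries are needed.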

To put this in perspective, the possibility of byte-range caching has
been around since HTTP/1.1 was introduced. Byte-range recombining is
far simpler than any caching scheme based on media fragments, in which
part of the fragment identifier takes over the Range header. I
would expect a deployment window of 10 years for caching on media
fragment Ranges. I would also expect resistance from CDNs for any
scheme that removes the ability to cache byte ranges of arbitrary
data, especially when the scheme particularly impacts video.

I don't consider using byte-ranges for the main data transfer to be a "hack".

>>> For the case 1, it seems that your proposal and the current proposal are
>>> similar except that:
>>>  . you introduce two new headers ('Fragment' and 'Content-Fragment')
>>>  . your HTTP response is a 200 (and not a 206) and Yves argues that the
>>>  chance that a cached fragment will be reused and served from the cache
>>>  is pretty low [3].
>>>
>>> *Question:* could you please argue and explain what advantages the
>>> introduction of these two new headers brings?
>>
>> To argue advantages, please tell me what to compare against. As I
>> understand, no other mechanism has been proposed for dealing with
>> textual media fragments (track, id etc.) via HTTP.
>
> We are trying to compare your approach that introduces the headers
> 'Fragment' and 'Content-Fragment' and the one written in the spec. Indeed,
> the 'track' and 'id' dimensions are tricky and badly described so far. What
> makes you think that we will not be able to 'invent' units to address
> 'track' or even 'id' directly in a HTTP range request?

Obviously it's possible to imagine new units for Range.

The advantage of keeping things like the HTTP request representation
of "track=audio" out of Range is that you can keep Range available for
its current/intended purpose of transporting ranges of data.

The mechanism I proposed (a separate Fragment request/Content-Fragment
response header pair) would work and is cacheable on the current web.

The mechanism in the current spec is not cacheable at all until the
entire web changes to support an as-yet-undefined caching mechanism.

Basically I'm viewing things like "track=audio" the same way that
Language representations are handled. It works and it doesn't
interfere with caching.
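
To make the analogy concrete, here is a small sketch of how a cache
could key stored responses if the server listed the Fragment request
header in Vary, much as is done for Accept-Language. That use of Vary
is my assumption for the example, as are the function name cache_key
and the header value "track=audio"; the exact header syntax is not
defined here.

```python
from typing import Dict, Tuple


def cache_key(url: str, request_headers: Dict[str, str], vary: str) -> Tuple:
    """Build a cache key from the URL plus the request headers named in Vary.

    Hypothetical helper: `vary` is the value of the response's Vary header,
    e.g. "Fragment" (an assumed usage, by analogy with "Accept-Language").
    """
    # Header names are case-insensitive, so normalise them before comparing.
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)
```

Two requests for the same URL with different Fragment values get
different keys, while the Range header is left out of the key and stays
free for byte ranges.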

>>> For the case 2, both approaches require two round-trips:
>>>  . Yves argues that we should use a 307 response code for the first
>>> roundtrip (instead of a 200)
>>>  . The current proposal misses the information about the real time range
>>> it identifies when the byte-range request is issued. Should we simply
>>> fix it by adding 2 Range headers: one in bytes and one in e.g. time:npt?
>>
>> The byte-range request is a mechanism for retrieving some data. The UA
>> knows why it is retrieving that data, ie. it is the data corresponding
>> to an earlier request for a URI including a time range.
>
> Yes, but there are damned caches in the middle which might be interested in
> storing more information about the request in order to optimize the next
> similar one.

In the scheme I'm proposing, the first HTTP request receives the
information about byte ranges that make up a response. A cache in the
middle can see that mapping at that point and start optimizing its
behavior.

On the other hand, the whole point of the second HTTP request is that
it is not identifiable as some special "media fragments" request; it
is simply a byte-range request for part of a file. It is
indistinguishable from similar requests made for other purposes
(seeking, resuming a download etc.) and can use and populate the same
caches.
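
A minimal sketch of that second request, in Python with the standard
urllib module. It assumes the UA has already learned the byte ranges
from the first response; how that mapping is parsed is left out, since
the header syntax is not given here. The function name
byte_range_request and the example URL are hypothetical.

```python
from typing import Iterable, Tuple
from urllib.request import Request


def byte_range_request(url: str, ranges: Iterable[Tuple[int, int]]) -> Request:
    """Build a plain HTTP/1.1 byte-range GET for inclusive (first, last) pairs.

    Nothing in the request marks it as a media fragment request: it carries
    only the URL and a standard Range header in bytes.
    """
    spec = ",".join(f"{first}-{last}" for first, last in ranges)
    return Request(url, headers={"Range": f"bytes={spec}"})


# Example with a hypothetical URL and byte range.
req = byte_range_request("http://example.com/video.ogv", [(1000, 1999)])
```

A player seeking to the same offset would build exactly the same
request, which is why both can share one set of cache entries.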

>>> *Question:* is the role of the new headers introduced by Conrad
>>> ('Range-Refer' and 'Accept-Range-Refer') similar to the new headers
>>> introduced in the current proposal ('Range-Redirect' and
>>> 'Accept-Range-Redirect')?
>>
>> I renamed my Range-Redirect proposal to Range-Refer after some
>> feedback from this WG. It was modelled on a simpler Range-Redirect
>> mechanism from Annodex, which only handled changes of header data.
>
> OK, so I understand from your reply that:
>  - conrad:Range-Refer = spec:Range-Redirect
>  - conrad:Accept-Range-Refer = spec:Accept-Range-Redirect
> I don't really care at the moment about how we will finally name the headers,
> I'm just trying to spot the similarities and differences of both approaches.

I introduced the ability to have multiple such headers, and allowed the
body of the first response to provide data for any part of the
response, such as tailers (i.e. it is not limited to providing a single
block of header data).

cheers,

Conrad.
Received on Tuesday, 8 December 2009 00:51:26 UTC