UA Server HTTP Communication

From Media Fragments Working Group Wiki

Proposed Solutions

There are two different ways of retrieving the media fragments.

  • By using custom HTTP range units, defined along the different axes suitable for describing media fragments. This can be done in a single roundtrip.
  • Via the creation of custom resources that will direct the client on how and where to get all the data needed to construct a fragment. This generally means at least 2 roundtrips.

Unfortunately, neither approach is vastly superior, so the solution might be to use both, depending on which problem a Web application is trying to solve. Another concern to deal with is the cacheability of the resource.

  • Conrad proposed to use the term optional byte-range redirection instead of handshake. Yves argued that this is not really a byte-range redirection since there is no 3xx code used here. Hence we now refer to byte-range referral.

Proposal described in the FPWD

Single-step partial GET (aka 2-way handshake or 1 roundtrip)

A user requests a media fragment URI, for example using a web browser:

  • User → UA (1):
http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4#t=12,21

The UA chops off the fragment and turns it into an HTTP GET request with a time range header:

  • UA → Proxy (2) → Origin Server (3):
GET /2008/WebVideo/Fragments/media/fragf2f.mp4 HTTP/1.1
Host: www.w3.org
Accept: video/*
Range: seconds=12-21
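
As a minimal sketch of this step, the following Python shows how a UA might split the fragment from the URI and derive the time range header. The function name `build_time_range_request` is hypothetical, and only the plain `t=start,end` form in seconds is handled; other fragment dimensions and time formats are out of scope here.

```python
from urllib.parse import urldefrag, urlsplit


def build_time_range_request(media_fragment_uri):
    """Return (request path, headers) for a media fragment URI.

    Hypothetical helper: it only understands "t=start,end" in seconds.
    """
    # The fragment is never sent to the server, so strip it first.
    url, fragment = urldefrag(media_fragment_uri)
    parts = urlsplit(url)
    headers = {"Host": parts.netloc, "Accept": "video/*"}
    if fragment.startswith("t="):
        start, _, end = fragment[2:].partition(",")
        # Express the temporal fragment with the custom 'seconds' range unit.
        headers["Range"] = f"seconds={start}-{end}"
    return parts.path, headers


path, headers = build_time_range_request(
    "http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4#t=12,21"
)
print("GET", path, "HTTP/1.1")
for name, value in headers.items():
    print(f"{name}: {value}")
```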

The server has a module for slicing multimedia resources on demand, that is, for establishing the relationship between seconds and bytes, extracting the bytes corresponding to the requested fragment, and adding the new container headers in order to serve a playable resource. The server will then reply with the closest inclusive range in a 206 HTTP response:

  • Origin Server → Proxy (4) → UA (5):
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes, seconds
Content-Length: 3571437
Content-Type: video/mp4
Content-Range: seconds 11.85-21.16/3600

The user agent will then have to skip 0.15s (12 - 11.85) in order to start playing the multimedia fragment at 12s. Note that the server indicated that the full duration of the resource being served is 3600s.
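
The skip computation can be sketched as follows. This is an illustrative reading of the `Content-Range` value shown above, not a normative parser; the regular expression and the name `playback_skip` are assumptions made for this example.

```python
import re

# Matches values such as "seconds 11.85-21.16/3600" (total may also be "*").
CONTENT_RANGE = re.compile(r"^(\w+) ([\d.]+)-([\d.]+)/([\d.]+|\*)$")


def playback_skip(content_range, requested_start):
    """Seconds the UA must skip so that playback begins at requested_start."""
    match = CONTENT_RANGE.match(content_range)
    if match is None or match.group(1) != "seconds":
        raise ValueError(f"not a seconds Content-Range: {content_range!r}")
    served_start = float(match.group(2))
    # The server may start earlier than requested (closest inclusive range),
    # so the difference is the amount of media to discard before playing.
    return max(0.0, requested_start - served_start)


skip = playback_skip("seconds 11.85-21.16/3600", 12.0)
print(f"skip {skip:.2f}s")
```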

Dual-step partial GET (aka 4-way handshake or 2 roundtrips)

A user requests a media fragment URI, for example using a web browser:

  • User → UA (1):
http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4#t=12,21

The UA chops off the fragment and turns it into an HTTP GET request with a time range header:

  • UA → Proxy (2) → Origin Server (3):
GET /2008/WebVideo/Fragments/media/fragf2f.mp4 HTTP/1.1
Host: www.w3.org
Accept: video/*
Range: seconds=12-21
X-Accept-Range-Redirect: bytes

The Origin Server converts (maps) the time range to a byte range and puts into the HTTP response all header data occurring at the beginning of the media resource that cannot be cached but that the UA requires in order to receive a fully functional media resource. It also replies with an X-Accept-TimeURI header, which indicates to the client that it has processed the time request and converted it to bytes (this could similarly be extended to X-Accept-SpaceURI, X-Accept-TrackURI and X-Accept-NameURI). The message body of this answer contains the control section of fragf2f.mp4#t=12,21 (if required).

  • Origin Server → Proxy (4) → UA (5):
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Type: video/mp4
X-Accept-TimeURI: npt, smpte-25
X-Range-Redirect: bytes 1113724-2082711/4500000
Vary: X-Accept-Range-Redirect
Location: http://www.w3.org/2008/WebVideo/Fragments/media/fragf2f.mp4

The UA buffers the data it receives for hand-over to the media subsystem. It then proceeds to put the actual fragment request through:

  • UA → Proxy (6) → Origin Server (7):
GET /2008/WebVideo/Fragments/media/fragf2f.mp4 HTTP/1.1
Host: www.w3.org
Range: bytes=1113724-2082711
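
A sketch of how the UA might derive this second request from the first response is given below. It assumes the `X-Range-Redirect` value has the `unit first-last/total` shape shown in the example response, and it emits the `unit=first-last` form that HTTP Range request headers use. The function name `range_from_redirect` is hypothetical.

```python
def range_from_redirect(x_range_redirect):
    """Turn an X-Range-Redirect value into a Range request header value."""
    unit, _, spec = x_range_redirect.partition(" ")
    # Drop the "/total" suffix; a Range request carries only first-last.
    byte_range, _, _total = spec.partition("/")
    if not unit or "-" not in byte_range:
        raise ValueError(f"unexpected X-Range-Redirect: {x_range_redirect!r}")
    return f"{unit}={byte_range}"


second_range = range_from_redirect("bytes 1113724-2082711/4500000")
print("Range:", second_range)
```

Because the second request is an ordinary byte-range request, an existing web proxy can handle and cache it without knowing anything about time ranges.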

The Origin Server puts the data together and sends it to the UA:

  • Origin Server → Proxy (8) → UA (9):
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Type: video/mp4
Content-Range: bytes 1113724-2082711/4500000

The UA hands over the header and video data to the media subsystem and thus displays it to the user (9). Note that the server indicated that the full length of the resource being served is 4500000 bytes.

TO DO

  • Describe Silvia's proposal regarding the new headers Accept-TimeURI, Accept-SpaceURI, Accept-TrackURI and Accept-NameURI
  • Detail the header and body of the HTTP request and HTTP response in all circumstances
  • Propose a new mapping header in the HTTP response for the single-step partial GET, so that the resource becomes cacheable by smart media servers
    • Range-Mapping: bytes 1234-2345/58588; original=11234-12345/39849384

Conrad's proposal

General approach

As we are dealing with an extension to existing HTTP behaviour, we must provide adequate fallbacks in case either the client or the server is not familiar with the new mechanisms.

Generally such an HTTP transaction will have the following form, allowing for discovery and fallback:

  • A content author publishes a single URI to a media resource.
  • A user agent which understands Media Fragment URI mechanisms will try to use those mechanisms to access it.
  • It advertises to the server each mechanism that it is capable of through an appropriate request header.
  • The server specifies that a given response uses the specified mechanism through an appropriate response header.
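
The discovery-and-fallback pattern in the list above can be illustrated with a small client-side sketch. Conrad's proposal does not name concrete headers on this page, so the sketch borrows the header names from the FPWD examples (`Content-Range` with the `seconds` unit, `X-Range-Redirect`) purely as stand-ins; the function name and the returned labels are hypothetical.

```python
def classify_response(status, headers):
    """Guess which mechanism the server used, based on its response headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status == 206 and lowered.get("content-range", "").startswith("seconds "):
        return "single-step"  # the fragment itself is in this response
    if status == 200 and "x-range-redirect" in lowered:
        return "dual-step"    # a byte-range request must follow
    # The server ignored the new mechanisms: treat the body as the full
    # resource and let the UA seek locally.
    return "fallback"


print(classify_response(206, {"Content-Range": "seconds 11.85-21.16/3600"}))
print(classify_response(200, {"X-Range-Redirect": "bytes 1113724-2082711/4500000"}))
print(classify_response(200, {"Content-Type": "video/mp4"}))
```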

Proposals

We separate the various parts of HTTP media segment handling into three orthogonal proposals.

Examples

A user agent may use some combination of these proposals in order to meet the requirements of specific use cases.

References

(DEPRECATED) Pro & Cons

  • Pro
    1. The 2-way handshake needs only one roundtrip
    2. The 2-way handshake allows extracting a spatial region from a Motion JPEG2000 resource (see format table)
    3. The 2-way handshake usually achieves what we want without needing an HTTP protocol extension for any resource with an intrinsic time->data map, such as .mov or .mp4
    4. The 4-way handshake allows current web proxies to cache media fragments
  • Cons
    1. In both cases, we create a custom Range unit, namely 'seconds'. We would also need to create custom range units to convey the notions of pixels, tracks, etc.
    2. The 4-way handshake needs two roundtrips
    3. The 4-way handshake does not allow extracting a spatial region from a Motion JPEG2000 resource. Note though that all other media formats are characterized by a fixed non-cacheable header occurring at the beginning of the media stream and are thus compatible with the 4-way handshake approach
    4. The 2-way handshake requires specialized 'media' caches to cache media fragments