Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document complements the Media Fragments 1.0 specification. It describes various recipes for processing media fragment URIs when used over the HTTP protocol.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the First Public Working Draft of the Protocol for Media Fragments 1.0 Resolution in HTTP document. It has been produced by the Media Fragments Working Group, which is part of the W3C Video on the Web Activity. The Working Group intends to publish this document as a Working Group Note, as a starting point for future work.
Please send comments about this document to the public-media-fragment@w3.org mailing list (public archive).
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
1 Introduction
2 Protocol for URI fragment Resolution in HTTP
    2.1 UA mapped byte ranges
        2.1.1 UA requests URI fragment for the first time
        2.1.2 UA requests URI fragment it already has buffered
        2.1.3 UA requests URI fragment of a changed resource
    2.2 Server mapped byte ranges
        2.2.1 Server mapped byte ranges with corresponding binary data
        2.2.2 Server mapped byte ranges with corresponding binary data and codec setup data
        2.2.3 Proxy cacheable server mapped byte ranges
3 Protocol for URI query Resolution in HTTP
A Processing media fragment URIs in RTSP (Non-Normative)
    A.1 How to map Media Fragment URIs to RTSP protocol methods
        A.1.1 Dealing with the media fragment URI dimensions in RTSP
            A.1.1.1 Temporal Media Fragment URIs
            A.1.1.2 Track Media Fragment URIs
            A.1.1.3 Spatial Media Fragment URIs
            A.1.1.4 Id Media Fragment URIs
        A.1.2 Putting the media fragment URI dimensions together in RTSP
        A.1.3 Caching and RTSP for media fragment URIs
B Acknowledgements (Non-Normative)
Audio and video resources on the World Wide Web are currently treated as "foreign" objects, which can only be embedded using a plugin that is capable of decoding and interacting with the media resource. Specific media servers are generally required to provide for server-side features such as direct access to time offsets into a video without the need to retrieve the entire resource. Support for such media fragment access varies between different media formats and inhibits standard means of dealing with such content on the Web.
This specification provides for a media-format independent, standard means of addressing media fragments on the Web using Uniform Resource Identifiers (URI). In the context of this document, media fragments are regarded along four different dimensions: temporal, spatial, track, and id. The id dimension allows a temporal fragment to be marked with a name and then addressed through a URI using that name. The specified addressing schemes apply mainly to audio and video resources - the spatial fragment addressing may also be used on images.
The aim of this specification is to enhance the Web infrastructure for supporting the addressing and retrieval of subparts of time-based Web resources, as well as the automated processing of such subparts for reuse. Example uses are the sharing of such fragment URIs with friends via email, the automated creation of such fragment URIs in a search engine interface, or the annotation of media fragments with RDF. Such use case examples as well as other side conditions on this specification and a survey of existing media fragment addressing approaches are provided in the requirements mf-req document that accompanies this specification document.
The media fragment URIs specified in this document have been implemented and demonstrated to work with media resources over the HTTP protocol. This specification is not defining the protocol aspect of RTSP handling of a media fragment in the normative sections. We expect the media fragment URI syntax to be generic and a possible mapping between this syntax and RTSP messages can be found in an appendix of this specification A Processing media fragment URIs in RTSP. Existing media formats in their current representations and implementations provide varying degrees of support for this specification. It is expected that over time, media formats, media players, Web Browsers, media and Web servers, as well as Web proxies will be extended to adhere to the full specification. This specification will help make video a first-class citizen of the World Wide Web.
This section defines the protocol steps in HTTP rfc2616 to resolve and deliver a media fragment specified as a URI fragment.
In a well-known context, where the MIME type of the requested resource is known, various recipes are proposed depending on the dimension addressed in the media fragment URI, the container and codec formats used by the media resource, or some advanced processing features implemented by the User Agent. Hence, if the container format of the media resource is fully indexable (e.g. MP4, Ogg or WebM) and if the time dimension is requested in the media fragment URI, the User Agent MAY prefer the recipe described in the section 2 Protocol for URI fragment Resolution in HTTP, since it is in a position to directly issue a normal RANGE request expressed in terms of byte ranges. On the other hand, if the container format of the media resource is a legacy format such as AVI, the User Agent MAY prefer the recipe described in the section 2.2 Server mapped byte ranges, issuing a RANGE request expressed with a custom unit such as seconds and waiting for the server to provide the mapping in terms of byte ranges.
The User Agent MAY also implement so-called optimistic processing of URI fragments in particular cases where the MIME type of the requested resource is not yet known. Hence, if a URI fragment occurs within a particular context such as the value of the @src attribute of a media element (audio, video or source) and if the time dimension is requested in the media fragment URI, the User Agent MAY follow the scenario specified in section 2.2 Server mapped byte ranges and directly issue a range request using custom units, assuming that the requested resource is likely to be a media resource. If the MIME type of this resource turns out to be a media type, the server SHOULD interpret the RANGE request as specified in section 2.2 Server mapped byte ranges. Otherwise it SHOULD just ignore the RANGE header.
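The choice between the two recipes for a URI with a time dimension can be pictured as a small decision function. The following Python sketch is illustrative only: the function name `choose_recipe`, the returned labels, and the MIME types used to stand for "fully indexable" containers are assumptions made for this example and are not defined by this document.

```python
from typing import Optional

# Assumption: MIME types standing for the "fully indexable" containers
# named in the text (MP4, Ogg, WebM). A real UA would decide per resource.
INDEXABLE_TYPES = {"video/mp4", "video/ogg", "video/webm"}


def choose_recipe(mime_type: Optional[str]) -> str:
    """Pick a recipe label for a media fragment URI with a time dimension.

    Returns "ua-mapped" (the UA computes byte ranges itself) or
    "server-mapped" (the UA sends a Range header in a custom unit).
    Both labels are hypothetical names used only in this sketch.
    """
    if mime_type is None:
        # Optimistic processing: the type is not yet known, for example
        # when the URI was found in the @src attribute of a media element.
        return "server-mapped"
    if mime_type.lower() in INDEXABLE_TYPES:
        return "ua-mapped"
    # Legacy containers such as AVI: let the server do the mapping.
    return "server-mapped"
```

Because every use of these recipes is a MAY, a UA remains free to apply a different policy, for example falling back to server mapping for an Ogg resource that carries no index.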
Editorial note (Silvia): If the UA needs to retrieve a large part of the resource or even the full resource, it will probably decide to make multiple range requests rather than a single one. If the resource is, however, small, it may decide to just retrieve the full resource without a range request. The UA should make this choice given context information, e.g. if it knows that it will be a lot of data, it will retrieve it in smaller chunks. If it chooses to request the full resource in one go and not make use of a Range request, the result will be a 200 rather than a 206.
Editorial note: This section is ready to implement.
The optimal case is a user agent that knows how to map media fragments to byte ranges. This is typically the case where a user agent has already downloaded those parts of a media resource that allow it to do or guess the mapping, e.g. the headers of a resource, or an index of a resource.
In this case, the HTTP exchanges are exactly the same as for any other Web resource where byte ranges are requested rfc2616.
How the UA retrieves the byte ranges depends on the media type of the media resource. The examples here show only one byte range retrieval per time range; in practice, several such retrievals may be necessary to acquire the correct time range.
Here are the three principal cases a media fragment enabled UA and a media server will encounter:
A user requests a media fragment URI:
User → UA (1):
http://www.example.com/video.ogv#t=10,20
The UA has to check if a local copy of the requested fragment is available in its buffer - not in this case. But it knows how to map the fragment to byte ranges: 19147 - 22890. So, it requests these byte ranges from the server:
UA (1) → Proxy (2) → Origin Server (3):
                GET /video.ogv HTTP/1.1
                Host: www.example.com
                Accept: video/*
                Range: bytes=19147-22890
The server extracts the bytes corresponding to the requested range and replies in a 206 HTTP response:
Origin Server (3) → Proxy (4) → UA (5):
                HTTP/1.1 206 Partial Content
                Accept-Ranges: bytes
                Content-Length: 3743
                Content-Type: video/ogg
                Content-Range: bytes 19147-22880/35614993
                Etag: "b7a60-21f7111-46f3219476580"
                {binary data}
Assuming the UA has received the byte ranges that it requires to serve t=10,20, which may well be slightly more, it will serve the decoded content to the User from the appropriate time offset. Otherwise it may keep requesting byte ranges to retrieve the required time segments.
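As an illustration of this exchange, the following Python sketch formats the byte Range header and checks whether the Content-Range of a 206 response covers the bytes the UA needs. The helper names (`byte_range_header`, `parse_content_range`, `covers`) are hypothetical and chosen for this example; only the single-range form of Content-Range is handled.

```python
import re
from typing import NamedTuple, Optional


class ContentRange(NamedTuple):
    first: int
    last: int
    total: Optional[int]  # None when the server sends "*" as instance length


def byte_range_header(first: int, last: int) -> str:
    """Format the value of a Range request header for one byte range."""
    if first < 0 or last < first:
        raise ValueError("invalid byte range")
    return f"bytes={first}-{last}"


_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> ContentRange:
    """Parse a single-range Content-Range value such as 'bytes 0-52/100'."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    total = None if match.group(3) == "*" else int(match.group(3))
    return ContentRange(int(match.group(1)), int(match.group(2)), total)


def covers(received: ContentRange, first: int, last: int) -> bool:
    """True if the received range contains every byte in first..last."""
    return received.first <= first and received.last >= last
```

When `covers` returns False, the UA would issue a further Range request for the missing bytes, which corresponds to the "keep requesting byte ranges" step described above.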

A user requests a media fragment URI:
User → UA (1):
http://www.example.com/video.ogv#t=10,20
The UA has to check if a local copy of the requested fragment is available in its buffer - it is in this case. But the resource could have changed on the server, so it needs to send a conditional GET. It knows the byte ranges: 19147 - 22890. So, it requests these byte ranges from the server on the condition that the resource has changed:
UA (1) → Proxy (2) → Origin Server (3):
                GET /video.ogv HTTP/1.1
                Host: www.example.com
                Accept: video/*
                If-Modified-Since: Sat, 01 Aug 2009 09:34:22 GMT
                If-None-Match: "b7a60-21f7111-46f3219476580"
                Range: bytes=19147-22890
The server checks if the resource has changed by checking the date - in this case, the resource was not modified. So, the server replies with a 304 HTTP response. (Note that an If-Range header cannot be used, because if the entity has changed, the entire resource would be sent.)
Origin Server (3) → Proxy (4) → UA (5):
                HTTP/1.1 304 Not Modified
                Accept-Ranges: bytes
                Content-Length: 3743
                Content-Type: video/ogg
                Content-Range: bytes 19147-22880/35614993
                Etag: "b7a60-21f7111-46f3219476580"
So, the UA serves the decoded resource to the User out of its existing buffer.

A user requests a media fragment URI and the UA sends the exact same GET request as described in the previous subsection.
This time, the server checks if the resource has changed by checking the date and it has been modified. Since the byte mapping may not be correct any longer, the server can only tell the UA that the resource has changed and leave all further actions to the UA. So, it sends a 412 HTTP response:
Origin Server (3) → Proxy (4) → UA (5):
                HTTP/1.1 412 Precondition Failed
                Accept-Ranges: bytes
                Content-Length: 3743
                Content-Type: video/ogg
                Content-Range: bytes 19147-22880/22222222
                Etag: "xxxxx-yyyyyyy-zzzzzzzzzzzzz"
So, the UA can only assume the resource has changed and re-retrieve what it needs to get back to being able to retrieve fragments. For most resources this may mean retrieving the header of the file. After this it is possible again to do a byte range retrieval.
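The three cases above differ only in the request headers the UA sends and in how it reacts to the response status. The following Python sketch summarizes that reaction. The `Action` names and both function names are hypothetical, and the mapping reflects only the status codes discussed in this document (200, 206, 304 and 412).

```python
from enum import Enum
from typing import Dict


class Action(Enum):
    DECODE_PARTIAL = "decode the byte range carried in the 206 response"
    DECODE_FULL = "decode the full resource carried in the 200 response"
    USE_BUFFER = "serve the fragment from the existing buffer"
    REFETCH_SETUP = "re-retrieve headers or index data, then redo the mapping"


def conditional_range_headers(first: int, last: int,
                              etag: str, last_modified: str) -> Dict[str, str]:
    """Headers for re-requesting a byte range the UA already has buffered."""
    return {
        "If-Modified-Since": last_modified,
        "If-None-Match": etag,
        "Range": f"bytes={first}-{last}",
    }


# Assumption: only the status codes shown in this section are handled.
_ACTIONS = {
    200: Action.DECODE_FULL,
    206: Action.DECODE_PARTIAL,
    304: Action.USE_BUFFER,
    412: Action.REFETCH_SETUP,
}


def action_for_status(status: int) -> Action:
    """Map an HTTP status code to the UA behaviour described in the text."""
    try:
        return _ACTIONS[status]
    except KeyError:
        raise ValueError(f"status {status} is not covered by this sketch") from None
```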

Some User Agents cannot undertake the fragment-to-byte mapping themselves, because the mapping is not obvious. This typically applies to media formats where the setup of the decoding pipeline does not imply knowledge of how to map fragments to byte ranges, e.g. Ogg without OggIndex. Thus, the User Agent would be capable of decoding a continuous resource, but would not know which bytes to request for a media fragment.
In this case, the User Agent could either guess what byte ranges it has to retrieve, in which case the retrieval would follow the previous case, or it could hope that the server provides a special service that allows it to retrieve the byte ranges with a simple request for the media fragment ranges. In the latter case, the HTTP request of the User Agent will include a request for the fragment, hoping that the server can do the byte range mapping and send back the appropriate byte ranges. This is realized by introducing new dimensions for the HTTP Range header, next to the byte dimension.
The specification for all new Range Request Header dimensions is given through the following ABNF as an extension to the HTTP Range Request Header definition (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.2):
          Range                  = "Range" ":" ranges-specifier
          ranges-specifier       = byte-ranges-specifier | fragment-specifier
          ;
          ; note that ranges-specifier is extended from rfc2616
          ; to cover alternate fragment range specifiers
          ;
          fragment-specifier     = "include-setup" | fragment-range *( "," fragment-range )
          [ ";" "include-setup" ]
          fragment-range         = time-ranges-specifier | id-ranges-specifier
          ;
          ; note that this doesn't capture the restriction to one fragment dimension occurring
          ; maximally once only in the fragment-specifier definition.
          ;
          time-ranges-specifier  = npttimeoption / smptetimeoption / clocktimeoption
          npttimeoption          = pfxdeftimeformat "=" npt-sec   "-" [ npt-sec ]
          smptetimeoption        = pfxsmpteformat   "=" frametime "-" [ frametime ]
          clocktimeoption        = pfxclockformat   "=" datetime  "-" [ datetime ]
          id-ranges-specifier  = idprefix  "=" idparam
This specification is meant to be analogous to the one in URIs, but it is a bit stricter. The time unit is not optional. For instance, it can be "npt", "smpte", "smpte-25", "smpte-30", "smpte-30-drop" or "clock" for temporal. Where "npt" is used for a temporal range, only specification in seconds is possible. Where "clock" is used for a temporal range, only "datetime" is possible and "walltime" is fully specified in HHMMSS with fraction and full timezone. Indeed, all optional elements in the URI specification basically become required in the Range header.
There is an optional 'include-setup' flag on the fragment range specifier - this flag signals to the server whether delivery of the decoder setup information (i.e. typically file header information) is also required as part of the reply to this request. This can help avoid an extra roundtrip where a Media Fragment URI is, e.g. directly typed into a Web browser.
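To illustrate how a UA might turn the temporal or id dimension of a media fragment URI into a Range header value of the form defined by the ABNF above, here is a minimal Python sketch. The function name `range_header_for_fragment` is hypothetical. The sketch only handles npt times given in plain seconds, and it assumes that an omitted start time defaults to 0, as in the URI syntax of Media Fragments 1.0.

```python
import re
from urllib.parse import urlsplit

_NPT_SEC = re.compile(r"\d+(\.\d*)?")


def range_header_for_fragment(uri: str, include_setup: bool = False) -> str:
    """Build a Range header value from the t= and id= dimensions of a URI."""
    fragment = urlsplit(uri).fragment
    params = dict(pair.split("=", 1)
                  for pair in fragment.split("&") if "=" in pair)
    specs = []
    if "t" in params:
        value = params["t"]
        if value.startswith("npt:"):
            value = value[len("npt:"):]
        start, _, end = value.partition(",")
        start = start or "0"  # assumption: omitted start means 0
        for sec in (start, end):
            if sec and not _NPT_SEC.fullmatch(sec):
                raise ValueError("this sketch only supports npt seconds")
        # The unit is mandatory in the Range header, unlike in the URI.
        specs.append(f"t:npt={start}-{end}")
    if "id" in params:
        specs.append(f"id={params['id']}")
    if not specs:
        raise ValueError("no temporal or id dimension in the fragment")
    header = ",".join(specs)
    if include_setup:
        header += ";include-setup"
    return header
```

For the URI used throughout this document, `http://www.example.com/video.ogv#t=10,20`, the sketch yields `t:npt=10-20`, and with `include_setup=True` it yields `t:npt=10-20;include-setup`.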
Note that the specification does not foresee a Range dimension for spatial and track media fragments since they are typically resolved and interpreted by the User Agent (i.e., spatial and track fragment extraction is not performed on server-side) for the following reasons:
Spatial media fragments are typically not expressible in terms of byte ranges. Spatial fragment extraction would thus require transcoding operations resulting in new resources rather than fragments of the original media resource. Track media fragments are expressible in terms of byte ranges but addressing one track in a media resource typically results in a huge number of byte ranges (due to interleaved tracks). Spatial and track fragment extraction is in this case better represented by URI queries.
When a User Agent receives an extracted spatial media fragment, it is not trivial to visualize the context of this fragment. More specifically, spatial context requires a meaningful background, which will not be available at the User Agent when the spatial fragment is extracted by the server.
Next to the introduction of new dimensions for the HTTP Range request header, we also introduce a new HTTP response header, called Content-Range-Mapping, which provides the mapping of the retrieved byte range to the original Range request, which was not in bytes. It serves two purposes:
It indicates the actual mapped range in terms of fragment dimensions. This is necessary since the server might not be able to provide a byte range mapping that corresponds exactly to the requested range. Therefore, the User Agent needs to be aware of this variance.
It provides context information regarding the parent resource in case the Range request contained a temporal dimension. More specifically, the header contains the start and end time of the parent resource. This way, the User Agent is able to understand and visualize the temporal context of the media fragment.
The specification for the Content-Range-Mapping header is based on the specification of the Content-Range header (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16) and is shown below. Note that, for the temporal dimension, the Content-Range-Mapping header adds the instance start and end in seconds after a slash "/" character, in analogy to the Content-Range header. Also, we introduce an extension to the Accept-Ranges header (see http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.5).
          Content-Range-Mapping      = "Content-Range-Mapping" ":" '{'
          ( content-range-mapping-spec [ ";" def-include-setup ] ) / def-include-setup
          '}' '=' '{'
          byte-range-mapping-spec '}'
          def-include-setup          = %x69.6E.63.6C.75.64.65.2D.73.65.74.75.70  ; "include-setup"
          byte-range-mapping-spec    = bytes-unit SP
          byte-range-resp-spec *( "," byte-range-resp-spec ) "/"
          ( instance-length / "*" )
          content-range-mapping-spec = time-mapping-spec | id-mapping-spec
          time-mapping-spec          = timeprefix ":" time-mapping-options
          time-mapping-options       = npt-mapping-option / smpte-mapping-option / clock-mapping-option
          npt-mapping-option         = deftimeformat SP npt-sec   "-" npt-sec   "/"
          [ npt-sec ]   "-" [ npt-sec ]
          smpte-mapping-option       = smpteformat   SP frametime "-" frametime "/"
          [ frametime ] "-" [ frametime ]
          clock-mapping-option       = clockformat   SP datetime  "-" datetime  "/"
          [ datetime ]  "-" [ datetime ]
          id-mapping-spec          = idprefix  SP idparam
          Accept-Ranges              = "Accept-Ranges" ":" acceptable-ranges
          acceptable-ranges          = 1#range-unit *( "," 1#range-unit )| "none"
          ;
          ; note this does not represent the restriction that range-units can only appear once at most
          ;
          range-unit                 = bytes-unit | other-range-unit
          bytes-unit                 = "bytes"
          other-range-unit           = token | timeprefix | idprefix
Three cases can be distinguished when a User Agent needs assistance from a server to perform the byte range mapping. In the next subsections, we'll go through the protocol exchange step by step.
User → UA (1):
http://www.example.com/video.ogv#t=10,20
The UA has to check if a local copy of the requested fragment is available in its buffer. If it is, we revert back to the processing described in sections 2.1.2 UA requests URI fragment it already has buffered and 2.1.3 UA requests URI fragment of a changed resource, since the UA already knows the mapping to byte ranges. If the requested fragment is not available in its buffer, the UA sends an HTTP request to the server, including a Range header with temporal dimension. The request is shown below:
UA (1) → Proxy (2) → Origin Server (3):
                GET /video.ogv HTTP/1.1
                Host: www.example.com
                Accept: video/*
                Range: t:npt=10-20
If the server does not understand a Range header, it MUST ignore the header field that includes that range-set. This is in line with the HTTP RFC rfc2616. This means that where a server does not support media fragments, the complete resource will be delivered. It also means that we can combine both byte range and fragment range headers in one request, since the server will only react to the Range header it understands.
Assuming the server can map the given Range to one or more byte ranges, it will reply with these in a 206 HTTP response. Where multiple byte ranges are required to satisfy the Range request, these are transmitted as a multipart message-body. The media type for this purpose is called "multipart/byteranges". This is in sync with the HTTP RFC rfc2616.
Here is the reply to the example above, assuming a single byte range is sufficient:
Origin Server (3) → Proxy (4) → UA (5):
                HTTP/1.1 206 Partial Content
                Accept-Ranges: bytes, t, id
                Content-Length: 3743
                Content-Type: video/ogg
                Content-Range: bytes 19147-22880/35614993
                Content-Range-Mapping: { t:npt 9.85-21.16/0.0-653.79 } = { bytes 19147-22880/35614993 }
                Etag: "b7a60-21f7111-46f3219476580"
                {binary data}
Note the presence of the new reply header called Content-Range-Mapping, which provides the mapping of the retrieved byte range to the original Range request, which was not in bytes. As we return both byte and temporal ranges, the UA and any intermediate caching proxy are able to map byte positions to time offsets and fall back to a byte range request where the fragment is re-requested. Also note that the extended list in the Accept-Ranges header makes it possible to identify which fragment schemes a server supports.
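A UA or proxy that wants to use this mapping has to parse the header value. The following Python sketch does so for the temporal npt case with a single byte range, as in the response above. The names `TimeMapping`, `parse_time_mapping` and `seek_offset` are hypothetical, and the regular expression is a simplification of the ABNF rather than a complete implementation of it.

```python
import re
from typing import NamedTuple, Optional


class TimeMapping(NamedTuple):
    start: float                      # actual start of the delivered range
    end: float                        # actual end of the delivered range
    instance_start: Optional[float]   # start time of the parent resource
    instance_end: Optional[float]     # end time of the parent resource
    first_byte: int
    last_byte: int
    instance_length: Optional[int]    # None when given as "*"


# Simplified pattern: npt seconds and one byte range only.
_MAPPING = re.compile(
    r"^\{\s*t:npt\s+([\d.]+)-([\d.]+)/([\d.]*)-([\d.]*)\s*\}"
    r"\s*=\s*"
    r"\{\s*bytes\s+(\d+)-(\d+)/(\d+|\*)\s*\}$"
)


def parse_time_mapping(value: str) -> TimeMapping:
    """Parse a Content-Range-Mapping value for the temporal npt dimension."""
    m = _MAPPING.match(value.strip())
    if m is None:
        raise ValueError(f"unsupported Content-Range-Mapping: {value!r}")
    return TimeMapping(
        start=float(m.group(1)),
        end=float(m.group(2)),
        instance_start=float(m.group(3)) if m.group(3) else None,
        instance_end=float(m.group(4)) if m.group(4) else None,
        first_byte=int(m.group(5)),
        last_byte=int(m.group(6)),
        instance_length=None if m.group(7) == "*" else int(m.group(7)),
    )


def seek_offset(mapping: TimeMapping, requested_start: float) -> float:
    """Seconds to skip inside the delivered data to reach the requested start."""
    return max(0.0, requested_start - mapping.start)
```

With the header from the response above, the delivered data starts at 9.85 seconds, so a UA asked for t=10,20 would skip roughly 0.15 seconds of decoded content before starting playback.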

In the case where a media fragment results in a multipart message-body, the Content-Range headers will be spread throughout the binary data ranges, but the Content-Range-Mapping of the media fragment will only be with the main header. Note that requesting setup information with a temporal (or id) fragment typically results in a multipart message-body, as will be illustrated in section 2.2.2 Server mapped byte ranges with corresponding binary data and codec setup data.
Note that a caching proxy that does not understand a Range header must not cache "206 Partial Content" responses as per HTTP RFC rfc2616. Thus, the new Range requests won't be cached by legacy Web proxies.
Id fragments can be requested in a similar way. The following example illustrates a request for the temporal fragment with name 'chapter1':
UA (1) → Proxy (2) → Origin Server (3):
                GET /video.ogv HTTP/1.1
                Host: www.example.com
                Accept: video/*
                Range: id=chapter1
Assuming the server can map the given id to one or more byte ranges, it will for instance reply with the following HTTP response:
Origin Server (3) → Proxy (4) → UA (5):
                HTTP/1.1 206 Partial Content
                Accept-Ranges: bytes, t, id
                Content-Length: 3743
                Content-Type: video/ogg
                Content-Range: bytes 19147-22880/35614993
                Content-Range-Mapping: { id chapter1 } = { bytes 19147-22880/35614993 }
                Etag: "b7a60-21f7111-46f3219476580"
                {binary data}
When the User Agent needs help from the server to set up the initial decoding pipeline (i.e., the User Agent has no codec setup information at its disposal), the User Agent can request, next to the bytes corresponding to the requested fragment, the bytes necessary to set up its decoder. This is possible by adding the 'include-setup' flag to the Range header, as illustrated below:
UA (1) → Proxy (2) → Origin Server (3):
                GET /video.ogv HTTP/1.1
                Host: www.example.com
                Accept: video/*
                Range: t:npt=10-20;include-setup
Analogous to section 2.2.1 Server mapped byte ranges with corresponding binary data, if the server can map the given Range to one or more byte ranges, it will reply with these in a 206 HTTP response. Additionally, the server adds the bytes corresponding to the requested setup information to the response. Since this setup information usually appears at the front of a media resource, the response typically results in a multipart message-body. The response is shown below:
Origin Server (3) → Proxy (4) → UA (5):
                HTTP/1.1 206 Partial Content
                Accept-Ranges: bytes, t, id
                Content-Length: 3795
                Content-Type: multipart/byteranges; boundary=THIS_STRING_SEPARATES
                Content-Range-Mapping: { t:npt 11.85-21.16/0.0-653.79;include-setup } = { bytes 0-52,19147-22880/35614993 }
                Etag: "b7a60-21f7111-46f3219476580"
                --THIS_STRING_SEPARATES
                Content-type: video/ogg
                Content-Range: bytes 0-52/35614993
                {binary data}
                --THIS_STRING_SEPARATES
                Content-type: video/ogg
                Content-Range: bytes 19147-22880/35614993
                {binary data}
                --THIS_STRING_SEPARATES--
Note that the Content-Range-Mapping header indicates that the codec setup information is included in the response. In this example, the response consists of two parts of byte ranges: the first part corresponds to the setup information, the second part corresponds to the requested fragment.
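To make use of such a response, the UA has to separate the setup bytes from the fragment bytes. The following Python sketch splits a multipart/byteranges body into its parts. The function name `split_byteranges` is hypothetical, and the parsing is deliberately simplified: it assumes CRLF line endings and that the boundary string does not occur inside the binary data.

```python
from typing import Dict, List, Tuple


def split_byteranges(body: bytes, boundary: str) -> List[Tuple[Dict[str, str], bytes]]:
    """Split a multipart/byteranges body into (headers, payload) pairs.

    Simplified sketch: assumes CRLF line endings and that the delimiter
    never appears inside a payload.
    """
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # Everything before the first delimiter is a preamble and is ignored.
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter "--boundary--"
        if chunk.startswith(b"\r\n"):
            chunk = chunk[2:]
        head, _, payload = chunk.partition(b"\r\n\r\n")
        # The CRLF in front of the next delimiter is not part of the payload.
        if payload.endswith(b"\r\n"):
            payload = payload[:-2]
        headers = {}
        for line in head.decode("iso-8859-1").split("\r\n"):
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        parts.append((headers, payload))
    return parts
```

Each returned pair carries the part's own Content-Range header (under the lower-cased key `content-range`), so the UA can tell the setup bytes, which start at offset 0 in the example, from the bytes of the requested fragment.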

The server mapped byte ranges approach can be extended to work with existing caching Web proxy infrastructure. This is important, since video is a huge bandwidth eater in the current Internet and falling back to using existing Web proxy infrastructure is important, particularly since progressive download and direct access mechanisms for video rely heavily on this functionality. Over time, the proxy infrastructure will learn how to cache media fragment URIs directly as described in the previous section and then will not require this extra effort.
To enable media-fragment-URI-supporting UAs to make their retrieval cacheable, we introduce some extra HTTP headers, which will help tell the server and the proxy what to do. There is an Accept-Range-Redirect request header which signals to the server that only a redirect to the correct byte ranges is necessary and the result should be delivered in the Range-Redirect header.
The ABNF for these additional two HTTP headers is given as follows:
            Accept-Range-Redirect      = "Accept-Range-Redirect" ":" bytes-unit
            Range-Redirect             = "Range-Redirect" ":" byte-range-resp-spec *( "," byte-range-resp-spec )
Let's walk through an example. A user requests a media fragment URI:
User → UA (1):
http://www.example.com/video.ogv#t=10,20
The UA has to check if a local copy of the requested fragment is available in its buffer. In our case here, it is not. If it were, we would revert back to the processing described in sections 2.1.2 UA requests URI fragment it already has buffered and 2.1.3 UA requests URI fragment of a changed resource, since the UA already knows the mapping to byte ranges. The UA issues an HTTP GET request with the fragment, requesting to retrieve just the mapping to byte ranges:
UA (1) → Proxy (2) → Origin Server (3):
                GET /video.ogv HTTP/1.1
                Host: www.example.com
                Accept: video/*
                Range: t:npt=10-20
                Accept-Range-Redirect: bytes
The server converts the given time range to a byte range and sends an empty reply that refers the UA to the right byte range for the correct time range.
Origin Server (3) → Proxy (4) → UA (5):
                HTTP/1.1 307 Temporary Redirect
                Location: http://www.example.com/video.ogv
                Accept-Ranges: bytes, t, id
                Content-Length: 0
                Content-Type: video/ogg
                Content-Range-Mapping: { t:npt 11.85-21.16/0.0-653.79 } = { bytes 19147-22880/* }
                Range-Redirect: 19147-22880
                Vary: Accept-Range-Redirect
Note that codec setup information can also be requested in combination with the Accept-Range-Redirect header, which can be realized by adding the 'include-setup' flag to the Range request header.
The UA proceeds to put the actual fragment request through as a normal byte range request as in section 2.1.1 UA requests URI fragment for the first time:
UA (5) → Proxy (6) → Origin Server (7):
                GET /video.ogv HTTP/1.1
                Host: www.example.com
                Accept: video/*
                Range: bytes=19147-22880
The Origin Server puts the data together and sends it to the UA:
Origin Server (7) → Proxy (8) → UA (9):
                HTTP/1.1 206 Partial Content
                Accept-Ranges: bytes, t, id
                Content-Length: 3743
                Content-Type: video/ogg
                Content-Range: bytes 19147-22880/35614993
                Etag: "b7a60-21f7111-46f3219476580"
                {binary data}
The UA decodes the data and displays it from the requested offset. The caching Web proxy in the middle has now cached the byte range, since it adhered to the normal byte range request protocol. All existing caching proxies will work with this. New caching Web proxies may learn to interpret media fragments natively, and so will not require the extra packet exchange described in this section.
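The second request in this exchange can be derived mechanically from the redirect response. The following Python sketch shows one way a UA might do that. The function name `follow_range_redirect` and its parameters are hypothetical, and the sketch assumes that the Range-Redirect value lists plain first-last byte ranges as in the example above.

```python
from typing import Dict, Mapping, Tuple


def follow_range_redirect(status: int, headers: Mapping[str, str],
                          accept: str = "video/*") -> Tuple[str, Dict[str, str]]:
    """Turn a 307 response carrying Range-Redirect into a byte range request.

    Returns the URL to request and the request headers to send.
    """
    if status != 307:
        raise ValueError("not a redirect response")
    # Header names are case-insensitive in HTTP.
    lowered = {name.lower(): value for name, value in headers.items()}
    location = lowered.get("location")
    redirect = lowered.get("range-redirect")
    if location is None or redirect is None:
        raise ValueError("Location or Range-Redirect header is missing")
    specs = [spec.strip() for spec in redirect.split(",")]
    # A normal byte range request names its unit explicitly.
    return location, {"Accept": accept, "Range": "bytes=" + ",".join(specs)}
```

Because the resulting request is an ordinary byte range request, any caching proxy that already understands byte ranges can cache its response without knowing about media fragments.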

This section describes the protocol steps used in HTTP rfc2616 to resolve and deliver a media fragment specified as a URI query.
A user requests a media fragment URI using a URI query:
User → UA (1):
http://www.example.com/video.ogv?t=10,20
This is a full resource, so it is a simple HTTP retrieval process. The UA has to check if a local copy of the requested resource is available in its buffer. If yes, it does a conditional GET with e.g. an If-Modified-Since and If-None-Match HTTP header.
Assuming the resource has not been retrieved before, the following is sent to the server:
UA (1) → Proxy (2) → Origin Server (3):
            GET /video.ogv?t=10,20 HTTP/1.1
            Host: www.example.com
            Accept: video/*
If the server doesn't understand these query parameters, it typically ignores them and returns the complete resource. This is not a requirement of the URI or the HTTP standard, but the way it is typically implemented in Web servers.
A media fragment supporting server has to create a complete media resource for the URI query, which in the case of Ogg requires creation of a new resource by adapting the existing Ogg file headers and combining them with the extracted byte range that relates to the given fragment. Some of the codec data may also need to be re-encoded since, e.g. t=10 does not fall clearly on a decoding boundary, but the retrieved resource must match as closely as possible the URI query. This new resource is sent back as a reply:
Origin Server (3) → Proxy (4) → UA (5):
            HTTP/1.1 200 OK
            Content-Length: 3782
            Content-Type: video/ogg
            Etag: "b7a60-21f7111-46f3219476580"
            Link: <http://www.example.com/video.ogv#t=10,20>; rel="alternate"
            {binary data}
Note that a Link header MAY be provided indicating the relationship between the requested URI query and the original media fragment URI. This enables the UA to retrieve further information about the original resource, such as its full length. In this case, the user agent can also choose to display the dimensions of either the primary resource or the resource created by the query.
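A UA that wants to follow the Link header back to the original media fragment URI could extract the target as sketched here. The function name `alternate_link` is hypothetical, and the parsing is deliberately simplified: it assumes that parameter values contain no commas or semicolons, which holds for the reply shown above but not for every valid Link header.

```python
import re
from typing import Optional

# One link-value: "<target>" followed by zero or more ";param" parts.
_LINK_VALUE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')


def alternate_link(link_header: str) -> Optional[str]:
    """Return the target of the first link-value with rel="alternate"."""
    for match in _LINK_VALUE.finditer(link_header):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            # rel may hold several space-separated relation types.
            if name.lower() == "rel" and "alternate" in value.strip('"').split():
                return target
    return None
```

For the reply above, the function returns `http://www.example.com/video.ogv#t=10,20`, from which the UA can derive the URI of the primary resource by dropping the fragment.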
The UA serves the decoded resource to the user. Caching in Web proxies works as it has always worked - most modern Web servers and UAs implement a caching strategy for URIs that contain a query using one of the three methods for marking freshness: heuristic freshness analysis, the Cache-Control header, or the Expires header. In this case, many copies of different segments of the original resource video.ogv may end up in proxy caches. An intelligent media proxy in future may devise a strategy to buffer such resources in a more efficient manner, where headers and byte ranges are stored differently.
Further, media fragment URI queries can be extended to enable UAs to use the Range-Redirect HTTP header to revert to a byte range request. This is analogous to section 2.2.3 Proxy cacheable server mapped byte ranges.
Note that a server that does not support media fragments through either URI fragment or query addressing will return the full resource in either case. It is therefore not possible to first try URI fragment addressing and when that fails to try URI query addressing.
This appendix explains how the media fragment specification is mapped to an RTSP protocol activity. We assume here that you have a general understanding of the RTSP protocol mechanism as defined in rtsp. The general sequence of messages sent between an RTSP UA and server can be summarized as follows:
Note that the RTSP protocol is intentionally similar in syntax and operation to HTTP.
We illustrate for each of the four media fragment dimensions how it can be mapped onto RTSP commands. The following examples are used to illustrate each of the dimensions: (1) temporal: #t=10,20 (2) tracks: #track=audio&track=video (3) spatial: #xywh=160,120,320,24 (4) id: #id=Airline%20Edit
In RTSP, temporal fragment URIs are provided through the PLAY method. A URI such as
rtsp://example.com/media#t=10,20
will be executed as a series of the following methods (all shortened for readability).
The actual temporal selection is provided in the PLAY method:
C->S: PLAY rtsp://example.com/media
      Range: npt=10-20
The server tells the UA which temporal range is returned:
S->C: RTSP/1.0 200 OK
      Range: npt=9.5-20.1
This mapping can be described for each of the time schemes defined for media fragments. Also, several temporal media fragment URI requests can be sent as pipelined commands without having to re-send the DESCRIBE and SETUP commands.
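For the npt scheme, the translation from the temporal fragment value to the Range header of the PLAY request can be sketched as below. The function name `npt_range_header` is hypothetical, and the sketch covers only npt values; the smpte and clock schemes are rejected rather than mapped, since their RTSP equivalents are not worked out here.

```python
def npt_range_header(t_value: str) -> str:
    """Map a temporal fragment value such as '10,20' to an RTSP Range header.

    A missing start is taken as 0; a missing end leaves the range open.
    """
    if t_value.startswith(("smpte", "clock:")):
        raise ValueError("only the npt time scheme is handled in this sketch")
    if t_value.startswith("npt:"):
        t_value = t_value[len("npt:"):]
    start, _, end = t_value.partition(",")
    return f"Range: npt={start or '0'}-{end}"
```

For `#t=10,20` this yields `Range: npt=10-20`, the header shown in the PLAY request above.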
In RTSP, track fragment URIs are provided through the SETUP method. A URI such as
rtsp://example.com/media#track=audio&track=video
will be executed as a series of the following methods (all shortened for readability).
The discovery of available tracks is provided through the SDP reply to DESCRIBE, but it could be done through alternative methods, too. Several consecutive track media fragment URI requests can only be sent with new SETUP commands and cannot be pipelined.
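The following sketch derives one SETUP request line per requested track. It assumes, purely for illustration, that each track is addressed by appending its name to the path of the presentation URI; a real UA would take the control URLs from the SDP reply to DESCRIBE instead. The function name `setup_requests` is hypothetical.

```python
from typing import List
from urllib.parse import parse_qs, urlsplit, urlunsplit


def setup_requests(uri: str) -> List[str]:
    """Return one SETUP request line per track named in the URI fragment."""
    parts = urlsplit(uri)
    tracks = parse_qs(parts.fragment).get("track", [])
    requests = []
    for track in tracks:
        # Assumption: the track's control URL is <presentation path>/<track name>.
        path = parts.path.rstrip("/") + "/" + track
        url = urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
        requests.append(f"SETUP {url}")
    return requests
```

A URI without a track dimension produces an empty list, in which case the UA would set up the tracks it chooses by other means.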
In RTSP, spatial fragment URIs are not specifically provided for. Just like in HTTP, spatial fragments are interpreted at the UA and thus not communicated to the server. A URI such as
rtsp://example.com/media#xywh=160,120,320,24
will be executed as the URL rtsp://example.com/media.
A URI such as
rtsp://example.com/media#xywh=160,120,320,24&t=10,20&track=audio&track=video
will be executed as a series of the following methods (all shortened for readability). The data selection is provided both in the SETUP method and the PLAY method:
UA->S: DESCRIBE rtsp://example.com/media
S->UA: RTSP/1.0 200 OK (with an SDP description, see wiki)
UA->S: SETUP rtsp://example.com/media/video
S->UA: RTSP/1.0 200 OK
UA->S: SETUP rtsp://example.com/media/audio
S->UA: RTSP/1.0 200 OK
UA->S: PLAY rtsp://example.com/media
       Range: npt=10-20
S->UA: RTSP/1.0 200 OK
       Range: npt=9.5-20.1
It is the UA's task to display only the rectangle xywh=160,120,320,24. The dimensions are thus resolved at different levels of the protocol, but that does not create a problem.
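Since the spatial dimension is resolved entirely at the UA, the UA has to turn the xywh value into a crop rectangle for the decoded frames. A minimal sketch, assuming pixel units and a known frame size (the function name `crop_rectangle` and the `frame_width` and `frame_height` parameters are hypothetical):

```python
from typing import Tuple


def crop_rectangle(xywh: str, frame_width: int,
                   frame_height: int) -> Tuple[int, int, int, int]:
    """Parse an xywh value in pixels and clip it to the frame dimensions."""
    if xywh.startswith("percent:"):
        raise ValueError("percent units are not handled in this sketch")
    if xywh.startswith("pixel:"):
        xywh = xywh[len("pixel:"):]
    x, y, w, h = (int(part) for part in xywh.split(","))
    # Keep the rectangle inside the decoded frame.
    x = min(x, frame_width)
    y = min(y, frame_height)
    w = min(w, frame_width - x)
    h = min(h, frame_height - y)
    return (x, y, w, h)
```

The server never sees this value; it only receives the SETUP and PLAY requests shown above.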
This document is the work of the W3C Media Fragments Working Group. Members of the Working Group are (at the time of writing, and in alphabetical order): Eric Carlson (Apple, Inc.), Chris Double (Mozilla Foundation), Michael Hausenblas (DERI Galway at the National University of Ireland, Galway, Ireland), Philip Jägenstedt (Opera Software), Jack Jansen (CWI), Yves Lafon (W3C), Erik Mannens (IBBT), Thierry Michel (W3C/ERCIM), Guillaume (Jean-Louis) Olivrin (Meraka Institute), Soohong Daniel Park (Samsung Electronics Co., Ltd.), Conrad Parker (W3C Invited Experts), Silvia Pfeiffer (W3C Invited Experts), Nobuhisa Shiraishi (NEC Corporation), David Singer (Apple, Inc.), Thomas Steiner (Google, Inc.), Raphaël Troncy (EURECOM), Davy Van Deursen (IBBT).
The people who have contributed to discussions on public-media-fragment@w3.org are also gratefully acknowledged. In particular: Olivier Aubert, Werner Bailer, Pierre-Antoine Champin, Cyril Concolato, Franck Denoual, Martin J. Dürst, Jean Pierre Evain, Ken Harrenstien, Kilroy Hughes, Ryo Kawaguchi, Wim Van Lancker, Véronique Malaisé, Henrik Nordstrom, Yannick Prié, Yves Raimond, Julian Reschke, Sam Sneddon, Felix Sasaki, Jakub Sendor, Philip Taylor, Christian Timmerer, Jorrit Vermeiren, Jeroen Wijering and Munjo Yu.