Types of Fragment Addressing

From Media Fragments Working Group Wiki

With media fragment addressing, we have to assume that we are dealing with compressed content delivered inside a container format - as described in the general model of a media resource.

This page describes the list of desirable media fragment addressing types that have resulted from the use cases and requirements analysis.

It further analyses which format requirements a media resource has to meet in order to allow the extraction of the data that relates to each kind of addressing.

Track fragments

A typical media resource consists of multiple tracks of data somehow multiplexed together into the media resource. A media resource could for example consist of several audio, several video, and several textual annotation or metadata tracks. Their individual extraction / addressing is desirable in particular from a media adaptation point of view.

Whether the extraction of tracks from a media resource is supported depends on the container format of the media resource. Since a container format only defines a syntax and does not introduce any compression, it is always possible to describe the structures of a container format. Hence, if a container format allows the encapsulation of multiple tracks, then it is possible to describe the tracks in terms of byte ranges. Examples of such container formats are Ogg and MP4. Note that the tracks may be multiplexed, in which case the description of one track consists of a list of byte ranges. Also note that the extraction of tracks (and of fragments in general) from container formats often requires syntax element modifications in the headers.
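To illustrate how a multiplexed track maps to a list of byte ranges, the following is a minimal sketch. It assumes that a packet index (track id, byte offset, length) has already been parsed from the container; the `Packet` type, the `track_byte_ranges` function, and the example data are hypothetical and not taken from any container specification. Adjacent packets of the same track are merged into one range.

```python
from typing import List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical index entry for one multiplexed packet."""
    track_id: int
    offset: int   # byte offset of the packet in the media resource
    length: int   # packet size in bytes


def track_byte_ranges(packets: List[Packet], track_id: int) -> List[Tuple[int, int]]:
    """Return inclusive (first_byte, last_byte) ranges covering one track."""
    ranges: List[Tuple[int, int]] = []
    for p in sorted(packets, key=lambda p: p.offset):
        if p.track_id != track_id or p.length <= 0:
            continue
        first, last = p.offset, p.offset + p.length - 1
        if ranges and ranges[-1][1] + 1 == first:
            # Contiguous with the previous range: extend it.
            ranges[-1] = (ranges[-1][0], last)
        else:
            ranges.append((first, last))
    return ranges


# Assumed example data: two interleaved tracks.
index = [
    Packet(1, 0, 100), Packet(2, 100, 50),
    Packet(1, 150, 100), Packet(1, 250, 100),
]
print(track_byte_ranges(index, 1))  # [(0, 99), (150, 349)]
```

The sketch only locates the payload bytes of a track; any header modifications that the container requires after extraction are outside its scope.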

Temporal fragments

A temporal fragment of a media resource is a clipping along the time axis from a start to an end time that are within the duration of the media resource.

Whether a media resource supports temporal fragment extraction depends primarily on the coding format and, more specifically, on how the encoding parameters were set. For video coding formats, temporal fragments can be extracted if the video stream provides random access points (i.e., points that do not depend on previously encoded video data, typically corresponding to intra-coded frames) on a regular basis. The same holds true for audio coding formats, i.e., the audio stream needs to be accessed at a point where the decoder can start decoding without the need for previously coded data.
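The role of random access points can be sketched as follows: a requested start time is moved back to the latest random access point at or before it, since decoding cannot begin anywhere else. This is a minimal sketch that assumes a sorted list of random access point timestamps in seconds is available; `decodable_start` and the sample timestamps are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def decodable_start(rap_times: Sequence[float], start: float) -> float:
    """Return the latest random access point at or before `start` (seconds).

    `rap_times` is assumed to be sorted in ascending order.
    """
    i = bisect_right(rap_times, start) - 1
    if i < 0:
        raise ValueError("no random access point at or before the requested start")
    return rap_times[i]


# Assumed example: one random access point every two seconds.
raps = [0.0, 2.0, 4.0, 6.0]
print(decodable_start(raps, 5.2))  # 4.0
print(decodable_start(raps, 2.0))  # 2.0
```

The sparser the random access points, the further the extracted fragment may start before the requested time.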

Spatial fragments

A spatial fragment of a media resource is a clipping of an image region. For media fragment addressing we only regard rectangular regions.

Support for extraction of spatial fragments from a media resource in the compressed domain depends on the coding format. The coding format must allow spatial regions to be encoded independently from each other in order to support the extraction of these regions in the compressed domain. Note that there are currently two variants: region extraction and interactive region extraction. In the first case, the regions (i.e., Regions Of Interest, ROIs) are known at encoding time and coded independently from each other. In the second case, ROIs are not known at encoding time and can be chosen by a user agent. In this case, the media resource is divided into a number of tiles, each encoded independently from the others. Subsequently, the tiles covering the desired region are extracted from the media resource.
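For interactive region extraction, selecting the tiles that cover a requested region can be sketched as below. The sketch assumes a uniform tile grid and pixel coordinates with the origin at the top-left corner; the function name, parameters, and example values are hypothetical and do not correspond to any particular coding format.

```python
from typing import List, Tuple


def tiles_covering(x: int, y: int, w: int, h: int,
                   tile_w: int, tile_h: int,
                   frame_w: int, frame_h: int) -> List[Tuple[int, int]]:
    """Return (column, row) indices of the tiles that intersect the region."""
    if w <= 0 or h <= 0:
        raise ValueError("region must have positive width and height")
    # Clip the region to the frame.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, frame_w), min(y + h, frame_h)
    if x0 >= x1 or y0 >= y1:
        return []  # region lies entirely outside the frame
    first_col, last_col = x0 // tile_w, (x1 - 1) // tile_w
    first_row, last_row = y0 // tile_h, (y1 - 1) // tile_h
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# Assumed example: 640x384 frame split into 128x128 tiles.
print(tiles_covering(100, 50, 200, 100, tile_w=128, tile_h=128,
                     frame_w=640, frame_h=384))
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

The extracted tiles generally cover more than the requested region, so a user agent would still need to crop the decoded result.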

Named fragments

A named fragment of a media resource is a media fragment - either a track, a time section, or a spatial region - that has been given a name through some sort of annotation mechanism. Through this name, the media fragment can be addressed in a more human-readable form.

No coding format provides support for named fragments, since naming is not part of the encoding/decoding process. Hence, we have to consider container formats for this feature. In general, if a container format allows the insertion of metadata describing the named fragments, then the container format supports named fragments, provided that the underlying fragment class (track, temporal, or spatial) is also supported. For example, you can include a CMML or TimedText description in an MP4 or Ogg container and interpret this description to extract temporal fragments based on a name given to them in the description.
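Resolving a name to a temporal fragment can be sketched as a simple lookup. The sketch assumes the annotation (for example a CMML or TimedText description) has already been parsed into an in-memory mapping; the `TemporalFragment` type, the `resolve_named_fragment` function, and the chapter names are hypothetical.

```python
from typing import Dict, NamedTuple


class TemporalFragment(NamedTuple):
    start: float  # seconds
    end: float    # seconds


def resolve_named_fragment(annotations: Dict[str, TemporalFragment],
                           name: str, duration: float) -> TemporalFragment:
    """Look up a name and check that the fragment lies within the resource."""
    try:
        frag = annotations[name]
    except KeyError:
        raise KeyError(f"no fragment named {name!r}") from None
    if not 0 <= frag.start < frag.end <= duration:
        raise ValueError(f"fragment {name!r} lies outside the resource")
    return frag


# Assumed example annotation with two named sections.
chapters = {
    "intro": TemporalFragment(0.0, 12.5),
    "interview": TemporalFragment(12.5, 95.0),
}
print(resolve_named_fragment(chapters, "interview", duration=120.0))
```

Once resolved, the named fragment is handled like any other temporal fragment, which is why support for the underlying fragment class is required.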

Evaluation of Fitness

There is a large number of media codecs and encapsulation formats that we need to take into account as potential media resources on the Web. This section analyses a list of typical formats and determines which we see fit, which we see conditionally fit, and which we see unfit for supporting media fragment URIs.

Conditions

Media resources should fulfill the following conditions to allow extraction of fragments:

  • The media fragments can be extracted in the compressed domain.
  • No syntax element modifications in the bitstream are needed to perform the extraction.

Not all media formats will be compliant with these two conditions. Hence, we distinguish the following categories:

  1. Fit: The media resource meets the two conditions (i.e., fragments can be extracted in the compressed domain and no syntax element modifications are necessary). In this case, caching media fragments of such media resources on the byte level is possible.
  2. Conditionally fit: Media fragments can be extracted in the compressed domain, but syntax element modifications are required. These media fragments provide cacheable byte ranges for the data, but syntax element modifications are needed in headers that apply to the whole media resource/fragment. In this case, these headers could be sent to the client in the first response of the server.
  3. Unfit: Media fragments cannot be extracted in the compressed domain as byte ranges. In this case, transcoding operations are necessary to extract media fragments. Since these media fragments do not yield reproducible bytes, it is not possible to cache them. Note that media formats which enable extracting fragments in the compressed domain, but are not compliant with category 2 (i.e., the required syntax element modifications are not limited to headers that apply to the whole media resource), also belong to this category.
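The three categories can be expressed as a small decision function. This is a minimal sketch of the classification described above; the `SyntaxChanges` enumeration and the `fitness_category` function are hypothetical names introduced for illustration.

```python
from enum import Enum


class SyntaxChanges(Enum):
    """Hypothetical summary of the bitstream modifications an extraction needs."""
    NONE = "none"
    GLOBAL_HEADERS_ONLY = "headers applying to the whole resource/fragment"
    BEYOND_GLOBAL_HEADERS = "beyond global headers"  # e.g. changes in every frame


def fitness_category(compressed_domain: bool, changes: SyntaxChanges) -> int:
    """Return 1 (fit), 2 (conditionally fit), or 3 (unfit)."""
    if not compressed_domain:
        return 3  # transcoding is needed, so bytes are not cacheable
    if changes is SyntaxChanges.NONE:
        return 1
    if changes is SyntaxChanges.GLOBAL_HEADERS_ONLY:
        return 2
    # Extractable in the compressed domain, but not compliant with category 2.
    return 3


print(fitness_category(True, SyntaxChanges.NONE))                   # 1
print(fitness_category(True, SyntaxChanges.GLOBAL_HEADERS_ONLY))    # 2
print(fitness_category(True, SyntaxChanges.BEYOND_GLOBAL_HEADERS))  # 3
print(fitness_category(False, SyntaxChanges.NONE))                  # 3
```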

Evaluation of Fitness

In order to get a view on which media formats belong to which fitness category, the following table gives an overview of the media formats currently described in State_of_the_Art/Codecs and State_of_the_Art/Containers. The numbers 1, 2, and 3 correspond to the three categories defined above (fit, conditionally fit, and unfit). An 'n/a' entry indicates that the media format does not support a particular fragment axis.

Media format        Track      Temporal  Spatial  Name  Remark
MPEG-1 Video        n/a        1         3        n/a
H.262/MPEG-2 Video  n/a        1         3        n/a
MPEG-4 Visual       n/a        1         3        n/a
H.264/MPEG-4 AVC    n/a        1         2        n/a   Spatial fragment extraction is possible with Flexible Macroblock Ordering (FMO)
Motion JPEG         n/a        1         3        n/a
Motion JPEG2000     n/a        1         3        n/a   Spatial fragment extraction is possible in the compressed domain, but syntax element modifications are needed for every frame.
MPEG-1 Audio        n/a        1         n/a      n/a
Ogg Vorbis          n/a        1         n/a      n/a
AC-3/Dolby Digital  n/a        1         n/a      n/a
JPEG LS             n/a        n/a       3        n/a
HD Photo            n/a        n/a       2        n/a
MOV                 2          n/a       n/a      2     QTText provides named chapters
MP4                 2          n/a       n/a      2     MPEG-4 TimedText provides named sections
3GP                 2          n/a       n/a      2     3GPP TimedText provides named sections
MPEG-21 FF          2          n/a       n/a      2     MPEG-21 Digital Item Declaration provides named sections
OGG                 2          n/a       n/a      2     CMML provides named anchor points
ASF                 2          n/a       n/a      2     Marker objects provide named anchor points
FLV                 2          n/a       n/a      2     cue points provide named anchor points
RMFF                1 or 2(?)  n/a       n/a      ?
TIFF                2          n/a       n/a      2     Can store multiple images (i.e., tracks) in one file, possibility to insert "private tags" (i.e., proprietary information)

We have to deal with the complexities of codecs and media resources. Not all media types are currently capable of what server-side media fragment extraction would require. Those that are capable are of interest to us. For those that are not, the fall-back case applies (i.e., a full download followed by offsetting).