State of the Art/Containers
ISO Base Media File Format (MPEG-4 part 12)
See: MPEG-4 part 12
MPEG-4 part 12 was developed by the MPEG committee as part of the MPEG-4 standard. It is a base format for media file formats, and many other file formats are based on it. The file format is object-oriented: a presentation, a combination of one or more motion sequences and audio, is described by objects called boxes. A presentation can consist of one or more files. One file describes the structure of the presentation; the other files, if used, contain the media data or other information. This information and the media data can also be integrated into the presentation file. The file format can carry any type of media and supports both download and streaming.
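DivX and Xvid are sometimes mentioned alongside this format, but they are MPEG-4 Part 2 video codecs rather than variants of the container; their video streams are commonly stored in AVI files and can also be stored in MP4 files.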
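To make the box structure concrete, here is a minimal Python sketch that lists the top-level boxes of an in-memory file. It assumes the box header layout of ISO/IEC 14496-12: a 32-bit big-endian size, a four-character type, a size of 1 meaning a 64-bit "largesize" follows, and a size of 0 meaning the box runs to the end of the file. Extended `uuid` types and full-box version/flags fields are not handled; the function name is illustrative only.

```python
import io
import struct


def list_top_level_boxes(data: bytes):
    """Return (box_type, box_size) for each top-level box in an in-memory file."""
    f = io.BytesIO(data)
    boxes = []
    while True:
        start = f.tell()
        header = f.read(8)
        if len(header) < 8:
            break  # no complete box header left
        size, box_type = struct.unpack(">I4s", header)
        header_size = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type.
            (size,) = struct.unpack(">Q", f.read(8))
            header_size = 16
        elif size == 0:
            # The box extends to the end of the file (only valid for the last box).
            size = len(data) - start
        if size < header_size:
            raise ValueError(f"corrupt box size {size} at offset {start}")
        boxes.append((box_type.decode("latin-1"), size))
        f.seek(start + size)  # skip the payload without reading it
    return boxes
```

Because every box announces its own size, a reader can skip boxes it does not understand, which is what lets many derived formats share this structure.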
MOV/QuickTime File Format (basis of MPEG-4 part 12, the ISO Base Media File Format)
The MOV container was developed by Apple in 1991. It served as the basis for the ISO Base Media File Format and therefore for the MP4 container format. A MOV container can hold audio, video and chapters. Video streams can be in any format supported by the QuickTime codec manager, such as MPEG-4 and Sorensen; audio streams can be in any format supported by the Sound Manager and Core Audio, such as AIFF, WAV and MP3. A MOV container is built from atoms. These atoms are organized hierarchically: each atom is either a parent that contains other atoms, or a leaf that contains media or data. The media can be the media streams themselves or references to them. MOV containers keep a timeline separate from the media streams, which allows MOV files to be edited without copying the media streams. MOV containers are mainly used on Apple’s operating system, Mac OS X.
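The parent/leaf hierarchy of atoms can be illustrated with a small recursive walker. This is a Python sketch under stated assumptions: the set of parent atom types below is a small subset chosen for the example, only 32-bit atom sizes are handled, and the `mvhd`/`tkhd` payloads in the usage example are placeholder bytes, not valid headers.

```python
import struct
from typing import Optional

# Atom types treated as parents in this sketch (a small subset, not the full list).
CONTAINER_ATOMS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}


def walk_atoms(data: bytes, offset: int = 0, end: Optional[int] = None, depth: int = 0):
    """Yield (depth, type, size) for every atom, descending into parent atoms."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        size, atom_type = struct.unpack_from(">I4s", data, offset)
        if size < 8:
            break  # 64-bit and "to end of file" sizes are not handled here
        yield depth, atom_type.decode("latin-1"), size
        if atom_type in CONTAINER_ATOMS:
            # Parent atom: its payload is itself a sequence of atoms.
            yield from walk_atoms(data, offset + 8, offset + size, depth + 1)
        offset += size


def make_atom(atom_type: bytes, payload: bytes = b"") -> bytes:
    """Build an atom with a 32-bit size header (used for the example below)."""
    return struct.pack(">I4s", 8 + len(payload), atom_type) + payload


movie = make_atom(b"moov", make_atom(b"mvhd", b"\x00" * 4) + make_atom(b"trak", make_atom(b"tkhd")))
for depth, atom_type, size in walk_atoms(movie):
    print("  " * depth + f"{atom_type} ({size} bytes)")
```

Leaf atoms are skipped by size alone, so an editor can rewrite the timeline atoms without touching the (possibly large) media data elsewhere in the file.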
MP4 File Format (MPEG-4 part 14) (based on MPEG-4 part 12, the ISO Base Media File Format)
MP4, or MPEG-4 Part 14, is a multimedia container format, finalized in 2003, that is part of the MPEG-4 ISO standard. MP4 can contain video and audio streams. Video streams can be encoded in MPEG-1, MPEG-2, MPEG-4 or H.264/AVC; audio streams can be (HE-)AAC, MPEG-1 Audio Layer 1-2-3, CELP, TwinVQ, Vorbis or Apple Lossless. MP4 containers that contain only audio get the .m4a extension. MPEG-4 containers are widely used by media players such as the iPod and by DVD players that support MPEG-4 video and audio. An MP4 container can also contain so-called private streams, which can carry any sort of information; Nero, for instance, uses them to add DVD-compliant subtitles. MP4 containers can also contain images, hyperlinks, subtitles and chapters.
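The file extension is only a naming convention; inside the file, the `ftyp` box lists "brands" that tell a reader which specification the file follows. The sketch below splits an `ftyp` payload into its major brand, minor version and compatible brands. The `"M4A "` brand in the test is the one commonly found in audio-only files, but treat the brand values as illustrative.

```python
import struct


def parse_ftyp(payload: bytes):
    """Split an 'ftyp' box payload into (major_brand, minor_version, compatible_brands)."""
    major, minor = struct.unpack_from(">4sI", payload, 0)
    # The rest of the payload is a list of four-character compatible brands.
    compatible = [payload[i:i + 4].decode("latin-1") for i in range(8, len(payload) - 3, 4)]
    return major.decode("latin-1"), minor, compatible
```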
3GP (based on MPEG-4 part 12, the ISO Base Media File Format)
3GP, the 3GPP file format, is a multimedia container format developed by the Third Generation Partnership Project (3GPP) for use in 3G mobile phones. 3GP is a simplified version of the MP4 container format, designed to minimize storage and bandwidth requirements. It supports MPEG-4 Visual, H.264/AVC and H.263 video and AMR-NB, AMR-WB, AMR-WB+ and AAC audio, as well as subtitles. 3GP files can be streamed as well as downloaded (for example in MMS messages).
MPEG-21 File Format (MPEG-21 part 9) (based on MPEG-4 part 12, the ISO Base Media File Format)
See: MPEG-21 part 9
The MPEG-21 File Format is defined in part 9 of the MPEG-21 standard. It uses the structural definition of a container-based file from the ISO Base Media File Format, but without the extra definitions for time-based media. The file format defines how an MPEG-21 Digital Item and additional (meta)data, such as pictures or movies, are stored within one file. An MPEG-21 file consists of a generic meta-container that holds the MPEG-21 DID (Digital Item Declaration) description of the resource and a list of all related resources. These resources can be placed in a sub-container or represented by a link.
Motion JPEG2000 (based on MPEG-4 part 12, the ISO Base Media File Format)
See: Motion JPEG2000
Motion JPEG2000 is an object-oriented file wrapper based on the ISO Base Media File Format, designed for time-based audio-visual information, including video, audio and other tracks. All data within a conforming file is encapsulated in boxes (called atoms in QuickTime); there is no data outside the box structure. All metadata, including the placement and timing of the media, is contained in structured boxes defined by the specification. The media data (for example, frames of video) is referenced by this metadata and may be in the same file (contained in one or more boxes) or in other files, which the metadata refers to by URL. Tracks can be of various kinds, of which three are important here. Video tracks contain visual samples and audio tracks contain audio media. Hint tracks are rather different: they contain instructions telling a streaming server how to form packets for a streaming protocol from the media tracks in a file. Hint tracks can be ignored when a file is read for local playback; they are only relevant to streaming.
Ogg is a container format developed by the open source community around Xiph.org. It is a generic encapsulation format that can carry any time-continuously sampled data, but it is targeted at audio and video streams. Ogg has since been extended with Skeleton headers, which let a file describe its contents without decoding the content tracks it contains.
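Ogg stays codec-agnostic because it only frames data into pages; each page header carries a serial number identifying the logical stream it belongs to. The Python sketch below decodes the fixed 27-byte page header and its segment table, assuming the little-endian layout described in RFC 3533. It does not verify the CRC, and the returned dictionary keys are names chosen for this example.

```python
import struct

# "OggS", version, header type flags, granule position, serial number,
# page sequence number, CRC checksum, number of segments (27 bytes, little-endian).
OGG_HEADER = struct.Struct("<4sBBqIIIB")


def parse_ogg_page_header(data: bytes, offset: int = 0) -> dict:
    magic, version, flags, granule, serial, seqno, crc, nsegs = OGG_HEADER.unpack_from(data, offset)
    if magic != b"OggS":
        raise ValueError("not an Ogg page")
    seg_start = offset + OGG_HEADER.size
    lacing = data[seg_start:seg_start + nsegs]  # one length byte per segment
    return {
        "serial": serial,  # identifies the logical stream this page belongs to
        "sequence": seqno,
        "granule_position": granule,  # codec-defined position (e.g. samples)
        "begins_stream": bool(flags & 0x02),
        "body_size": sum(lacing),
        "header_size": OGG_HEADER.size + nsegs,
    }
```

A demultiplexer can route pages to decoders by `serial` without interpreting the payload, which is how several audio and video streams share one Ogg file.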
OGM, or Ogg Media, is a container format released in 2003 as an extension of Xiph.org’s Ogg container format. Ogg itself defines codec mappings for Speex, Theora and Vorbis; OGM adds support for VfW video codecs and ACM audio codecs. OGM is mainly seen as a temporary step until other container formats, such as Matroska, mature and offer the same capabilities, such as support for chapters, subtitles and multiple audio channels.
Matroska is a container format developed by the open source community. Its specifications are frozen, meaning that existing definitions can no longer be changed; only additions are still allowed. Matroska is based on EBML (Extensible Binary Meta Language), a binary, byte-aligned format built on the principles of XML. A Matroska file contains a header followed by a Metaseek section, which serves as an index for locating the other sections of the file. These sections can hold, for example, channel information, chapter information and tags. A Matroska file can contain audio only (.mka) or audio and video combined (.mkv). Nearly all video and audio encoding formats are supported: video streams can be in MPEG-1, MPEG-2, MPEG-4, QuickTime, Real or Theora formats, and audio streams in MPEG-1 Audio Layer 1-2-3, PCM, AC3, FLAC or AAC. In addition to audio and video streams, a Matroska container can hold files of any type. The number of video and audio streams is unlimited, and Matroska also supports subtitles (with embeddable fonts), streaming, chapters and DVD-like menus.
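EBML stores element IDs and sizes as variable-length integers: the number of leading zero bits in the first byte, plus one, gives the length in bytes. A minimal Python sketch of decoding one element header follows; it assumes the usual convention that IDs keep their length-marker bit while sizes drop it, and it does not handle the reserved "unknown size" value. The example ID 0x1A45DFA3 is the EBML header element that starts Matroska files; the size byte is illustrative.

```python
def read_vint(data: bytes, offset: int, keep_marker: bool = False):
    """Decode an EBML variable-length integer at data[offset].

    Returns (value, length_in_bytes). Element IDs are read with the
    length-marker bit kept; element sizes with it removed.
    """
    first = data[offset]
    if first == 0:
        raise ValueError("invalid VINT (longer than 8 bytes)")
    length, mask = 1, 0x80
    while not first & mask:  # count leading zero bits
        length += 1
        mask >>= 1
    value = first if keep_marker else first & (mask - 1)
    for b in data[offset + 1:offset + length]:
        value = (value << 8) | b
    return value, length


def read_element_header(data: bytes, offset: int = 0):
    """Return (element_id, data_size, offset_of_element_data)."""
    element_id, id_len = read_vint(data, offset, keep_marker=True)
    size, size_len = read_vint(data, offset + id_len)
    return element_id, size, offset + id_len + size_len
```

Because every element carries its own size, a reader can skip unknown elements, which is what allows new elements to be added to a frozen specification.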
MXF, short for Material eXchange Format, is a standard for professional use built on a set of SMPTE standards. An MXF container contains a header, video and audio streams, and an EDL (Edit Decision List). The EDL contains the information used by audiovisual editing systems and serves as a timeline. MXF containers can hold all video and audio formats, and MXF also supports the addition of files of any type, so anything from images to text can be included. MXF further supports streaming and professional features such as timecode. MXF is used in professional cameras such as Sony’s XDCAM, although in a slightly modified form. The standard also specifies a number of operational patterns ("OPs") intended to accommodate different levels of complexity in a file, e.g., one essence or multiple essences, "ganged" segments, or a set of segments from which sub-segments are to be played. Application Specifications are profiles constrained to a particular OP, encoding, metadata structure and other elements.
ASF, short for Advanced Systems Format, is a proprietary container format designed by Microsoft as part of the Windows Media Framework in 2004. Its main purpose is streaming. There are two versions of ASF. Version 1.0 is commonly used but closed; apart from a few details, nothing is known about its file format. Version 2.0 is open but rarely used in practice. ASF supports nearly all video and audio formats via VfW and ACM but is mainly used with Microsoft’s own formats. Subtitles and chapters are also supported, and ASF offers error correction techniques and a DRM framework.
AVI, short for Audio Video Interleave, is a container format introduced by Microsoft in November 1992. AVI containers can hold multiple audio and video streams. An AVI container contains a header with information about the video, such as frame size and frame rate, followed by the actual data. An optional index allows navigation through the container. AVI containers support nearly all audio and video formats available through DMO, VfW and ACM. Coding formats that use B-frames are not supported natively, although workarounds exist to add support. AVI is mainly used on Microsoft’s operating systems.
FLV, short for Flash Video, is a container format developed by Macromedia (now part of Adobe) in 2002. An FLV container can hold one video stream and one audio stream. Audio streams can be encoded in ADPCM, MP3, Linear PCM, Nellymoser, A-law, µ-law, AAC or device-specific sound formats; video streams can be encoded with H.263, VP6 or H.264/AVC. Plain Flash content can also be added to FLV containers. FLV files can be downloaded, streamed or embedded in Flash animations, and FLV is used by browser-based video services such as YouTube.
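The "one video and one audio stream" model shows up directly in the 9-byte FLV file header, whose flag byte simply says whether audio and/or video tags are present. A minimal Python sketch, assuming the header layout from Adobe's FLV specification (signature, version, type flags with audio in bit 2 and video in bit 0, then a 32-bit big-endian data offset); the dictionary keys are names chosen for this example.

```python
import struct


def parse_flv_header(data: bytes) -> dict:
    """Decode the 9-byte FLV file header."""
    signature, version, flags, data_offset = struct.unpack_from(">3sBBI", data, 0)
    if signature != b"FLV":
        raise ValueError("not an FLV file")
    return {
        "version": version,
        "has_audio": bool(flags & 0x04),
        "has_video": bool(flags & 0x01),
        "data_offset": data_offset,  # 9 for version 1 files
    }
```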
RMFF, short for RealMedia File Format, is a generic multimedia container format developed by RealNetworks in 1998. It is designed for streaming media data but also allows local playback. RMFF is data-independent, allowing any data type to be recorded, manipulated and played back. RealNetworks uses this format in its streaming solutions with the RealVideo 8, 9 and 10 video formats; for audio, HE-AAC, Cook, Vorbis or RealAudio Lossless are used.
WAV, short for Waveform Audio File Format, is a wrapper file format for audio. It was developed in 1991 by Microsoft and IBM and is based on the RIFF file format. A WAV file consists of chunks, which contain the audio and information about the audio. The audio is mainly stored as uncompressed LPCM, but other audio formats are supported, e.g. MP3, µ-law, A-law, DPCM and ADPCM. Because WAV files can hold lossless audio and are widely supported, they are used very often for archival purposes. Because the chunk size fields are 32-bit, a WAV file is limited to 4 GB, which can be an issue when archiving long recordings in a lossless format. The EBU therefore designed the RF64 format, which uses 64-bit size fields instead of 32-bit ones, allowing much larger WAV files.
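The chunk structure, and the 32-bit size fields behind the 4 GB limit, can be seen by writing a short file with Python's standard `wave` module and walking its RIFF chunks by hand. This is a sketch: the chunk walker is a hypothetical helper written for this example, and the one second of 8 kHz mono silence is arbitrary test data.

```python
import io
import struct
import wave


def list_riff_chunks(data: bytes):
    """Return (chunk_id, size) for each chunk inside a RIFF/WAVE file."""
    riff, riff_size, form = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF" or form != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    chunks = []
    offset = 12
    end = min(len(data), 8 + riff_size)
    while offset + 8 <= end:
        # Each chunk: 4-byte ID, 32-bit little-endian size, then the payload.
        chunk_id, size = struct.unpack_from("<4sI", data, offset)
        chunks.append((chunk_id.decode("latin-1"), size))
        offset += 8 + size + (size & 1)  # payloads are padded to an even length
    return chunks


# Write one second of 16-bit mono silence at 8 kHz into memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000)

print(list_riff_chunks(buf.getvalue()))
```

The size read with `"<4sI"` is an unsigned 32-bit value, so no chunk can describe more than 2**32 - 1 bytes; RF64 works around this with 64-bit sizes stored elsewhere in the file.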
AIFF, short for Audio Interchange File Format, was developed by Apple in 1988. AIFF is comparable to WAV: it is also based on the IFF format family, and the data is split into chunks. The main difference lies in how samples are stored: AIFF uses big-endian byte order, WAV little-endian. With the transition to Mac OS X, a variant of the format that uses little-endian byte order was introduced. AIFF containers can hold various audio bit streams, ranging from uncompressed waveforms to MIDI. AIFF is primarily used by Apple users, mainly in professional audio work, as a master file format since it can contain lossless audio.
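The byte-order difference between AIFF and WAV can be shown with the standard `struct` module. This minimal Python sketch packs one 16-bit PCM sample (the value 1000 is arbitrary) both ways and shows what happens when big-endian data is read with the wrong byte order.

```python
import struct

sample = 1000  # one signed 16-bit PCM sample, 0x03E8

aiff_bytes = struct.pack(">h", sample)  # AIFF: most significant byte first
wav_bytes = struct.pack("<h", sample)   # WAV: least significant byte first

print(aiff_bytes.hex(), wav_bytes.hex())  # 03e8 e803

# Reading big-endian AIFF data as if it were little-endian garbles the sample.
(misread,) = struct.unpack("<h", aiff_bytes)
print(misread)  # -6141
```

The audio content is identical; only the byte order differs, so converting between the two formats' sample data amounts to swapping each sample's bytes.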
XMF, short for eXtensible Music Format, was developed by the MIDI Manufacturers Association (MMA); the first specification was released in 2001. XMF containers can hold one or more existing files, ranging from Standard MIDI Files and DLS instrument files to WAV files and other digital audio files. These files are placed on a MIDI timeline. There are five types of XMF files, numbered 0 to 4. Type 0 and Type 1 files contain Standard MIDI Files and/or custom DLS instruments. Type 2 files, also called Mobile XMF files, were designed specifically for mobile phones. Type 3 files, also called audio clips for Mobile XMF, turn XMF into a rich recorded-music format by allowing audio clips to be placed on the MIDI timeline; these clips can be in any format registered with the MMA/AMEI. Type 4 files add interactivity to audio content.
See: SUN's AU
The Au file format is a simple audio file format introduced by Sun Microsystems. The format was common on NeXT systems and on early web pages. Originally it was headerless, consisting simply of 8-bit µ-law-encoded data at an 8000 Hz sample rate. Hardware from other vendors often used sample rates as high as 8192 Hz, often integer factors of video clock signals. Newer files have a header consisting of six 32-bit words, an optional information chunk and then the data (all in big-endian format). Although the format now supports many audio encodings, it remains associated with µ-law logarithmic encoding. This encoding was native to the SPARCstation 1 hardware, where SunOS exposed it to applications through the /dev/audio interface, and the encoding and interface became a de facto standard for Unix sound.
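The six-word header is simple enough to decode with one `struct` call. A Python sketch, assuming the commonly documented field order (magic ".snd", data offset, data size, encoding, sample rate, channels) and that encoding code 1 means 8-bit µ-law; other encoding codes are not listed here, and the dictionary keys are names chosen for this example.

```python
import struct

AU_MAGIC = 0x2E736E64  # ".snd"


def parse_au_header(data: bytes) -> dict:
    """Decode the six big-endian 32-bit words at the start of an .au file."""
    magic, data_offset, data_size, encoding, sample_rate, channels = struct.unpack_from(">6I", data, 0)
    if magic != AU_MAGIC:
        raise ValueError("not an .au file (or an old headerless one)")
    return {
        "data_offset": data_offset,  # where the audio data starts
        "data_size": data_size,      # 0xFFFFFFFF means "unknown"
        "encoding": encoding,        # 1 = 8-bit µ-law
        "sample_rate": sample_rate,
        "channels": channels,
        "info": data[24:data_offset],  # optional information chunk
    }
```

An old headerless file fails the magic-number check, so a player would fall back to assuming 8-bit µ-law at 8000 Hz.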
TIFF stands for Tagged Image File Format. It is effectively a container for images. Through tags, the compression technique and the color space can be specified freely. This information is stored in tags in the file's image file directories (IFDs), which can be filled as needed, so TIFF can be used with or without compression. Uncompressed TIFF is one of the most widely used formats for archiving images. TIFF also supports multiple pages and layers. This freedom to choose compression and color space can, however, lead to incompatibility problems; to address this, TIFF defines some basic profiles that most applications support.
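How tagging works can be sketched by reading the header and the first image file directory (IFD). This Python example assumes the classic TIFF 6.0 layout: a byte-order mark ("II" or "MM"), the number 42, a 32-bit offset to the first IFD, and 12-byte IFD entries (tag, type, count, value or offset). It decodes a single SHORT value inline and otherwise returns the raw 4-byte field; function names are illustrative only. Tag 259 is Compression and tag 262 is PhotometricInterpretation (the color space).

```python
import struct


def parse_tiff_header(data: bytes):
    """Return (byte_order, offset_of_first_IFD) for a classic TIFF file."""
    order = data[:2]
    if order == b"II":
        endian = "<"  # Intel, little-endian
    elif order == b"MM":
        endian = ">"  # Motorola, big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    return order.decode("ascii"), ifd_offset


def read_first_ifd_tags(data: bytes) -> dict:
    """Map tag number -> value for each entry of the first IFD."""
    order, ifd_offset = parse_tiff_header(data)
    endian = "<" if order == "II" else ">"
    (count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    tags = {}
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i
        tag, field_type, n = struct.unpack_from(endian + "HHI", data, entry)
        if field_type == 3 and n == 1:
            # A single SHORT sits in the first 2 bytes of the 4-byte value field.
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
        else:
            # Raw 4-byte field: a LONG value or an offset to the values.
            (value,) = struct.unpack_from(endian + "I", data, entry + 8)
        tags[tag] = value
    return tags
```

Because a reader must handle both byte orders and whatever tags a writer chose, baseline profiles that fix a common subset are what keep TIFF files interchangeable.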