State of the Art/Codecs
From Media Fragments Working Group Wiki
Video Coding Formats
See: H.261 Standard
H.261 is a video codec that belongs to the H.26x family of video coding standards in the domain of the ITU-T Video Coding Experts Group (VCEG). It was designed in 1990 for transmission over ISDN lines, primarily for videoconferencing and video telephony. ISDN lines have data rates that are multiples of 64 kbit/s, and the algorithm operates at video bit rates between 40 kbit/s and 2 Mbit/s. H.261 was the first practical digital video coding standard, and all subsequent international video coding standards have been based on its design. The coding algorithm is a hybrid of motion-compensated inter-picture prediction and spatial transform coding with scalar quantization, zig-zag scanning and entropy encoding.
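The quantization and zig-zag steps named above can be sketched as follows. This is a minimal Python illustration with several assumptions. It takes an 8×8 block of transform coefficients, applies a single uniform quantizer step (the hypothetical `step` parameter) and reads the result out in the JPEG-style zig-zag order. It is not taken from the H.261 specification, which defines its own quantizers and variable-length codes.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):              # s = row + col indexes one anti-diagonal
        lo, hi = max(0, s - (n - 1)), min(s, n - 1)
        # odd diagonals run top-right to bottom-left, even ones the other way
        rows = range(lo, hi + 1) if s % 2 else range(hi, lo - 1, -1)
        order.extend((r, s - r) for r in rows)
    return order


def quantize_and_scan(coeffs, step):
    """Uniformly quantize a square coefficient block (hypothetical single step) and zig-zag scan it."""
    n = len(coeffs)
    return [int(round(coeffs[r][c] / step)) for r, c in zigzag_order(n)]


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[0][1], block[1][0] = 640, -96, 48   # energy near the top-left corner
    block[7][7] = 3                                        # small high-frequency value
    scanned = quantize_and_scan(block, step=16)
    print(scanned[:4], "nonzero:", sum(v != 0 for v in scanned))
```

Quantized high-frequency coefficients tend to be zero. The zig-zag order therefore groups them into long trailing runs, which the entropy coder can represent cheaply.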
MPEG-1 Video/MPEG-1 Part 2
See: MPEG-1 Standard
MPEG-1 was designed in 1991. It was the first audio and video compression standard developed by the Moving Picture Experts Group (MPEG), and it was later used for the Video CD (VCD) format. The MPEG-1 video codec supports only progressive (non-interlaced) pictures. Besides compression, the standard also describes the synchronization and multiplexing of video and audio, conformance-testing procedures and reference software.
See: MPEG-2 Standard
MPEG-2 is widely used as the format of digital television signals that are broadcast by terrestrial (over-the-air), cable, and direct broadcast satellite TV systems. It also specifies the format of movies and other programs that are distributed on DVD and similar discs. As such, TV stations, TV receivers, DVD players, and other equipment are often designed for this standard. MPEG-2 was the second of several standards developed by the Moving Picture Experts Group (MPEG) and is an international standard (ISO/IEC 13818). Parts 1 and 2 of MPEG-2 were developed in a joint collaborative team with ITU-T, and they have a corresponding catalog number in the ITU-T Recommendation Series. The Video section, part 2 of MPEG-2, is similar to the previous MPEG-1 standard, but also provides support for interlaced video, the format used by analog broadcast TV systems. MPEG-2 video is not optimized for low bit rates, especially below 1 Mbit/s at standard-definition resolutions. However, it outperforms MPEG-1 at 3 Mbit/s and above. With some enhancements, MPEG-2 Video and Systems are also used in some HDTV transmission systems.
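An interlaced frame interleaves two fields captured at different instants: one field holds the even lines and the other holds the odd lines. The Python sketch below assumes a frame stored as a list of rows. It splits a frame into its two fields and weaves them back together. It illustrates the data layout only and says nothing about how MPEG-2 actually codes fields.

```python
def split_fields(frame):
    """Split a frame (list of rows) into its top field (even rows) and bottom field (odd rows)."""
    return frame[0::2], frame[1::2]


def weave(top, bottom):
    """Re-interleave two fields into a full frame."""
    frame = []
    for i in range(max(len(top), len(bottom))):
        if i < len(top):
            frame.append(top[i])
        if i < len(bottom):
            frame.append(bottom[i])
    return frame


if __name__ == "__main__":
    frame = [[row] * 4 for row in range(6)]   # row r is filled with the value r
    top, bottom = split_fields(frame)
    print([r[0] for r in top], [r[0] for r in bottom])   # [0, 2, 4] [1, 3, 5]
    assert weave(top, bottom) == frame
```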
See: H.263 Standard
H.263 is a video codec standard originally designed as a low-bitrate compressed format for videoconferencing. It was developed by the ITU-T Video Coding Experts Group (VCEG) in a project ending in 1995/1996, as one member of the ITU-T H.26x family of video coding standards. H.263 was developed as an evolutionary improvement based on experience from H.261, the previous ITU-T standard for video compression, and from the MPEG-1 and MPEG-2 standards. Its first version was completed in 1995 and provided a suitable replacement for H.261 at all bitrates. H.263 has since found many applications on the internet: much Flash Video content (as used on sites such as YouTube, Google Video and MySpace) is encoded in this format, though many sites now use VP6 encoding, which has been supported since Flash 8. The original version of the RealVideo codec was based on H.263 up until the release of RealVideo 8. The codec was first designed for use in H.324-based systems (PSTN and other circuit-switched network videoconferencing and video telephony), but it has since also found use in H.323 (RTP/IP-based videoconferencing), H.320 (ISDN-based videoconferencing), RTSP (streaming media) and SIP (Internet conferencing) solutions.
MPEG-4 Visual/MPEG-4 Part 2
See: MPEG-4 Standard
MPEG-4 is a collection of methods defining compression of audio and visual (AV) digital data. It was introduced in late 1998 and designates a standard for a group of audio and video coding formats and related technology agreed upon by the ISO/IEC Moving Picture Experts Group (MPEG) under the formal standard ISO/IEC 14496. Uses of MPEG-4 include compression of AV data for web (streaming media) and CD distribution, voice (telephone, videophone) and broadcast television applications. MPEG-4 absorbs many of the features of MPEG-1, MPEG-2 and other related standards, adding new features such as (extended) VRML support for 3D rendering, object-oriented composite files (including audio, video and VRML objects), support for externally specified Digital Rights Management and various types of interactivity. AAC (Advanced Audio Coding) was standardized as an adjunct to MPEG-2 (as Part 7) before MPEG-4 was issued. Initially, MPEG-4 was aimed primarily at low-bit-rate video communications; however, its scope was later expanded to make it much more of a general multimedia coding standard. MPEG-4 is efficient across a wide range of bit rates, from a few kilobits per second to tens of megabits per second.
MPEG-4 AVC/MPEG-4 Part 10
See: H.264 Standard
H.264 is a standard for video compression. It is also known as MPEG-4 Part 10, or MPEG-4 AVC (for Advanced Video Coding). As of 2005, it is the latest block-oriented, motion-compensation-based codec standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG). It was the product of a partnership effort known as the Joint Video Team (JVT). The intent of the H.264/AVC project was to create a standard capable of providing good video quality at substantially lower bit rates than previous standards (e.g. half or less the bit rate of MPEG-2, H.263, or MPEG-4 Part 2). This was to be done without increasing the complexity of the design so much that it would be impractical or excessively expensive to implement. An additional goal was to provide enough flexibility for the standard to be applied to a wide variety of applications on a wide variety of networks and systems. These include low and high bit rates, low- and high-resolution video, broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems. AVC is the new-generation compression algorithm for consumer digital video. Compared with MPEG-2, the current industry standard, AVC is about twice as efficient. It offers significantly higher video quality at the same bit rate, or the same video quality at roughly half the bit rate that MPEG-2 requires. This makes digital video services of acceptable quality possible over channels that were previously too narrow for them.
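Block-oriented motion compensation relies on finding, for each block of the current picture, a matching block in a reference picture. Below is a minimal full-search block-matching sketch in Python. It assumes greyscale frames stored as lists of rows, a sum-of-absolute-differences (SAD) cost and a small square search window; the function names and parameters are illustrative. Real H.264 encoders use variable block sizes, sub-pixel motion vectors and much faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of cur and a displaced block of ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def full_search(cur, ref, bx, by, size=4, radius=3):
    """Return ((dx, dy), cost) of the best match for the block whose top-left corner is (bx, by)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # skip candidate blocks that would fall outside the reference frame
            if not (0 <= by + dy and by + dy + size <= height
                    and 0 <= bx + dx and bx + dx + size <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


if __name__ == "__main__":
    ref = [[16 * y + x for x in range(12)] for y in range(12)]
    # current frame: content moved so that cur[y][x] == ref[y + 1][x + 2]
    cur = [[ref[y + 1][x + 2] if y + 1 < 12 and x + 2 < 12 else 0
            for x in range(12)] for y in range(12)]
    print(full_search(cur, ref, bx=2, by=2))   # ((2, 1), 0)
```

The encoder then transmits the motion vector plus the (transform-coded) difference between the block and its match, instead of the block itself.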
Scalable Video Coding (MPEG-4 SVC) was defined as an amendment to MPEG-4 AVC. It provides an efficient scalable representation of video through flexible, multi-dimensional resolution adaptation. This scalable representation greatly simplifies adapting a compressed stream to different transmission and storage conditions and to different network and terminal capabilities. It also significantly increases error robustness, because a lower-rate stream can be obtained by simple stream truncation. A subset bitstream is derived by dropping packets from the larger bitstream. Compared with the bitstream it is derived from, it can represent a lower spatial or temporal resolution, or a lower-quality video signal, each separately or in combination. In short, the following modalities are possible: temporal scalability, spatial scalability, SNR (quality/fidelity) scalability, and combinations of the three.
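Deriving a subset bitstream by dropping packets can be sketched in Python as a filter over packets tagged with layer identifiers. The sketch assumes each packet carries three identifiers, modelled on those in the SVC NAL unit header:

- a spatial layer (`dependency_id`)
- a temporal layer (`temporal_id`)
- a quality layer (`quality_id`)

The `Packet` class and the simple threshold rule are hypothetical simplifications, not the normative sub-bitstream extraction process.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical stand-in for an SVC NAL unit: layer identifiers plus payload."""
    dependency_id: int   # spatial layer
    temporal_id: int     # temporal layer
    quality_id: int      # quality (SNR) refinement layer
    payload: bytes = b""


def extract(packets, max_dependency, max_temporal, max_quality):
    """Keep only packets at or below the requested operating point; drop the rest."""
    return [
        p for p in packets
        if p.dependency_id <= max_dependency
        and p.temporal_id <= max_temporal
        and p.quality_id <= max_quality
    ]


if __name__ == "__main__":
    stream = [Packet(d, t, q) for d in range(2) for t in range(3) for q in range(2)]
    base = extract(stream, 0, 0, 0)           # lowest resolution, frame rate and quality
    reduced_rate = extract(stream, 1, 1, 1)   # full size and quality, top temporal layer dropped
    print(len(stream), len(base), len(reduced_rate))
```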
DivX is a brand name of products created by DivX, Inc. (formerly DivXNetworks, Inc.), including the proprietary DivX codec. The codec became popular for its ability to compress long video segments into small files while maintaining relatively high visual quality. It uses most of the lossy MPEG-4 Part 2 (also known as MPEG-4 Visual) compression techniques, balancing quality against file size. It is one of several codecs commonly associated with "ripping", in which audio and video content is copied to a hard disk and transcoded. Many newer "DivX Certified" DVD players can play DivX-encoded movies. However, the quarter-pixel (Qpel) and global motion compensation features are often omitted to reduce processing requirements. For compatibility reasons, they are also excluded from the base DivX encoding profiles.
Xvid is a video codec library following the MPEG-4 standard. Xvid supports MPEG-4 Advanced Simple Profile features such as B-frames, global and quarter-pixel motion compensation, lumi masking, trellis quantization, and H.263, MPEG and custom quantization matrices. Xvid is a primary competitor of the DivX Pro Codec (Xvid being DivX spelled backwards). In contrast with the DivX codec, which is proprietary software developed by DivX, Inc., Xvid is free software distributed under the terms of the GNU General Public License. This also means that, unlike the DivX codec, which is only available for a limited number of platforms, Xvid can be used on any platform and operating system for which its source code can be compiled.
Audio Video Standard (AVS) is a compression codec for digital audio and video that competes with H.264/AVC as a potential replacement for MPEG-2. Chinese companies own 90% of the AVS patents. China promoted AVS as a domestic codec in order to avoid paying royalties on patents such as those covering H.264/AVC. AVS is currently expected to be approved for the Chinese high-definition successor to the Enhanced Versatile Disc (EVD). Open-source implementations of an AVS video decoder can be found in the OpenAVS project and in the libavcodec library. The latter is integrated into free video players such as MPlayer, VLC and xine. xAVS is an open-source AVS encoder that also includes a working decoder. China's proposed high-definition video disc format, CBHD (China Blue High-Definition), will include support for AVS. AVS audio and video files use the .avs extension for their container format.
See: BBC's DIRAC
Dirac is an advanced royalty-free video compression format designed by the BBC for a wide range of uses, from delivering low-resolution web content to broadcasting HD and beyond, to near-lossless studio editing. It has been developed to address the growing complexity and cost of current video compression technologies, which provide greater compression efficiency at the expense of implementing a very large number of tools. It was presented by the BBC in January 2004 as the basis of a new codec for the transmission of video over the Internet. The codec was finalized on January 21, 2008, and further developments will only be bug fixes and constraints. The immediate aim is to be able to encode standard digital PAL TV definition (720 x 576i pixels per frame at 25 frames per second) in real time; the reference implementation can encode around 17 frames per second on a 3 GHz PC but extensive optimization is planned. Dirac is a powerful and flexible compression system. A key element of its flexibility is its use of the wavelet multi-resolution transform for compressing pictures and motion-compensated residuals, which allows Dirac to be used across a very wide range of resolutions without enlarging the toolset.
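The multi-resolution property of a wavelet transform can be illustrated with the simplest wavelet, the Haar wavelet, implemented with integer lifting steps. This Python sketch is one-dimensional and assumes a signal length divisible by 2 raised to the number of levels. Haar is chosen only for brevity; Dirac's actual wavelet filters, 2-D decomposition and coefficient coding are not reproduced here. Each level produces a low band that is a half-resolution (pairwise-averaged) version of its input, plus a band of detail coefficients.

```python
def haar_forward(x):
    """One integer Haar lifting step: returns (low, high) half-length bands."""
    low, high = [], []
    for a, b in zip(x[0::2], x[1::2]):
        d = b - a                  # predict: detail is the difference of the pair
        low.append(a + (d >> 1))   # update: floor of the pair's mean
        high.append(d)
    return low, high


def haar_inverse(low, high):
    """Exactly undo haar_forward (integer arithmetic, so reconstruction is lossless)."""
    out = []
    for s, d in zip(low, high):
        a = s - (d >> 1)
        out.extend((a, a + d))
    return out


def decompose(x, levels):
    """Multi-level transform: repeatedly split the low band. bands[0] is the finest detail."""
    low, bands = list(x), []
    for _ in range(levels):
        low, high = haar_forward(low)
        bands.append(high)
    return low, bands


def reconstruct(low, bands):
    """Invert decompose, coarsest level first."""
    for high in reversed(bands):
        low = haar_inverse(low, high)
    return low


if __name__ == "__main__":
    signal = [10, 12, 14, 13, 50, 52, 51, 49]
    low, bands = decompose(signal, 2)
    print(low, bands)
    assert reconstruct(low, bands) == signal
```

Smooth regions yield near-zero detail coefficients. The same transform therefore serves both small and large pictures simply by changing the number of levels.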
The Dirac Pro specification describes a subset of the main Dirac specification and is aimed at high-bitrate, I-frame-only applications for studio and professional use. It is being considered for approval as the SMPTE standard VC-2.
See: Motion JPEG Standard
In multimedia, Motion JPEG (M-JPEG) is an informal name for formats in which each video frame or interlaced field of a digital video sequence is compressed separately as a JPEG image. It is often used in mobile devices such as digital cameras. Motion JPEG uses intraframe coding that is very similar to the I-frame coding of video coding standards such as MPEG-1 and MPEG-2, but it does not use interframe prediction. Without interframe prediction some compression capability is lost. In exchange, video editing becomes easier: when all frames are I-frames, simple edits can be made at any frame. Video coding formats such as MPEG-2 can also be used in an I-frame-only fashion to provide similar compression capability and similar ease of editing. Using only intraframe coding also makes the degree of compression independent of the amount of motion in the scene, since temporal prediction is not used. Motion JPEG compresses substantially better than completely uncompressed video. However, it compresses substantially worse than codecs that use interframe motion compensation, such as MPEG-1. (One exception may be surveillance cameras that capture only one frame per second. Between such frames there can be so much motion that MPEG cannot compensate for it.) A more advanced variant, Motion JPEG 2000, uses JPEG2000 compression instead of JPEG. It is primarily used in digital cinema and is also under consideration by the Library of Congress as a digital archival format.
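The editing advantage of I-frame-only coding can be shown with a toy Python model. It assumes a simplified closed-GOP stream in display order, where decoding any frame requires starting from the most recent I-frame at or before it. B-frame reordering and open GOPs are ignored, and the function name `decode_start` is illustrative.

```python
def decode_start(frame_types, target):
    """Index of the frame where decoding must start to reconstruct frame `target`."""
    for i in range(target, -1, -1):
        if frame_types[i] == "I":
            return i
    raise ValueError("no I-frame before the target frame")


if __name__ == "__main__":
    mpeg_like = "IBBPBBPBBPBB" * 2   # one I-frame every 12 frames
    mjpeg_like = "I" * 24            # Motion JPEG: every frame is an I-frame
    for target in (5, 13, 20):
        print(target, decode_start(mpeg_like, target), decode_start(mjpeg_like, target))
```

With the MPEG-like pattern, a cut at frame 5 forces the editor to decode from frame 0 (or to re-encode). With Motion JPEG, every frame is a valid cut point.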
See: VC1 Standard
VC-1 is the informal name of the SMPTE 421M video codec standard initially developed by Microsoft. It was released on April 3, 2006 by SMPTE. It is now a supported standard for Blu-ray Discs and Windows Media Video 9. VC-1 is an evolution of the conventional DCT-based video codec design also found in H.261, H.263, MPEG-1, MPEG-2, and MPEG-4 Part 2. It is widely characterized as an alternative to the latest ITU-T and MPEG video codec standard known as H.264/MPEG-4 AVC. VC-1 contains coding tools for interlaced video sequences as well as progressive encoding. The main goal of VC-1 development and standardization is to support the compression of interlaced content without first converting it to progressive, making it more attractive to broadcast and video industry professionals. Microsoft has designated VC-1 as the Xbox 360 video game console’s official video codec, and game developers may use VC-1 for full motion video included with games.
Theora is an open and free lossy video compression technology being developed by the Xiph.Org Foundation as part of their Ogg project. Based upon On2 Technologies' VP3 codec, Theora competes with MPEG-4, WMV, and similar low-bitrate video compression schemes. The compressed video can be stored in any suitable container format. Theora video is generally included in the Ogg container format and is frequently paired with Vorbis format audio streams. The combination of the Ogg container format, Theora video and Vorbis audio allows for a completely open, royalty-free multimedia format. Other multimedia formats, such as MPEG-4 video and MP3 audio, are patented and subject to license fees for commercial use.
RealVideo is a proprietary video format developed by RealNetworks. It was first released in 1997 and, as of 2008, is at version 11. RealVideo is supported on many platforms, including Windows, Mac, Linux, Solaris, and several mobile phones. The first version of RealVideo was based on the H.263 codec. RealVideo continued to use H.263 until RealVideo 8, when the company switched to a proprietary video codec. RealVideo codecs are identified by four-character codes: RV10 and RV20 are the H.263-based codecs, while RV30 and RV40 are RealNetworks' proprietary formats. RealVideo 10 uses RV40. RealVideo can be played from a RealMedia file or streamed over the network using the Real Time Streaming Protocol (RTSP), a standard protocol for streaming media developed by the IETF. However, RealNetworks uses RTSP only to set up and manage the connection. To facilitate real-time streaming, RealVideo (and RealAudio) normally uses constant bit rate encoding, so that the same amount of data is sent over the network each second. More recently, RealNetworks has introduced a variable bit rate form called RealMedia Variable Bitrate (RMVB). It gives better video quality, but it is less suited to streaming.
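Four-character codes such as RV40 are commonly packed into a single 32-bit integer. The Python sketch below packs them little-endian, a convention used by many multimedia frameworks; the byte order RealMedia itself uses on disk is not assumed here. The lookup table only restates the codec families listed above.

```python
RV_FAMILIES = {
    "RV10": "H.263-based",
    "RV20": "H.263-based",
    "RV30": "RealNetworks proprietary",
    "RV40": "RealNetworks proprietary",
}


def fourcc_to_int(code: str) -> int:
    """Pack a four-character ASCII code into a 32-bit integer (first character = lowest byte)."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError(f"FourCC must be 4 ASCII characters, got {code!r}")
    return int.from_bytes(raw, "little")


def int_to_fourcc(value: int) -> str:
    """Inverse of fourcc_to_int."""
    return value.to_bytes(4, "little").decode("ascii")


if __name__ == "__main__":
    value = fourcc_to_int("RV40")
    print(hex(value), int_to_fourcc(value), RV_FAMILIES[int_to_fourcc(value)])
    # 0x30345652 RV40 RealNetworks proprietary
```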
See: Sony's DV
Digital Video (DV) is a digital video format developed by Sony in 1995 for storage on tape. Since then it has become the standard for semi-professional video production. The specification describes not only the compression but also the tape to be used. DV uses intraframe compression at a fixed bitrate of 25 Mbit/s. At the same bitrate, DV performs slightly better than the older MJPEG and is equivalent in video quality to intraframe MPEG-2. Its DCT-based compression is specially adapted for storage on tape. For professional use there are variants of the codec: Sony's DVCAM and Panasonic's DVCPRO.
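The fixed 25 Mbit/s rate makes DV easy to reason about. The Python arithmetic below estimates its compression ratio and storage needs for 625-line (PAL) DV video, under these assumptions:

- 720 × 576 pixels at 25 frames per second
- 8-bit samples
- 4:2:0 chroma subsampling

Audio, subcode and error-correction data on the tape are ignored, so the numbers cover the compressed video stream only.

```python
WIDTH, HEIGHT, FPS, BITS = 720, 576, 25, 8
DV_VIDEO_BPS = 25_000_000   # DV video bitrate, 25 Mbit/s

# 4:2:0: one luma sample per pixel plus two chroma planes at quarter resolution
samples_per_frame = WIDTH * HEIGHT + 2 * (WIDTH // 2) * (HEIGHT // 2)
uncompressed_bps = samples_per_frame * BITS * FPS
ratio = uncompressed_bps / DV_VIDEO_BPS
bytes_per_hour = DV_VIDEO_BPS * 3600 // 8

print(f"uncompressed: {uncompressed_bps / 1e6:.1f} Mbit/s")   # 124.4 Mbit/s
print(f"compression ratio: about {ratio:.1f}:1")               # about 5.0:1
print(f"storage: {bytes_per_hour / 1e9:.2f} GB per hour")      # 11.25 GB per hour
```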
See: Sony's Betacam
Betacam is, like DV, a family of formats for recording video on tape, and it was also developed by Sony. The first Betacam format, released in 1982, was analogue. In 1993 the digital format Digital Betacam (Digibeta) was launched; it is mainly used by broadcasters. Other developments of Betacam include:

- Betacam SX, a cheaper alternative to Digibeta that uses MPEG-2 compression with some temporal prediction, producing a sequence of I- and B-frames;
- MPEG IMX, which uses MPEG compression like Betacam SX but at higher bitrates;
- HDCAM, a high-definition version of the Digital Betacam format.
OMS Video is a free, open-source, royalty-free video codec currently under development by Sun Microsystems' Open Media Commons as part of the Open Media Stack. It was announced on April 11, 2008. OMS Video is based on an updated version of the H.261 codec, whose patents have now expired. Vorbis is currently planned for use as the audio codec.
See: FFMPEG's SNOW
Snow is an experimental video codec developed by Michael Niedermayer for the FFmpeg package. It can compress video either lossily or losslessly. Snow implements wavelet-based compression, aiming for good image quality at very low bitrates. It is open source, licensed under the LGPL. Snow is similar to Tarkin, Dirac, and numerous other wavelet-based codecs. The FFmpeg developers aim to have the codec published as a Request for Comments (RFC), which first requires finishing a version 1.0. The following open-source programs can encode to the Snow format: FFmpeg, Avidemux, LiVES, MeGUI, VirtualDubMod with ffdshow tryouts, and MEncoder.
Audio Coding Formats
MPEG-1 audio layer 2 & 3/MP2 & MP3
See: MP2 & MP3 Standard
MP2 and MP3 are both MPEG standards: MPEG-1 Audio Layer II and Layer III. Both are lossy audio compression formats based on a psychoacoustic model. The model relies on masking: when a signal is dominant at a certain frequency, humans do not hear weaker signals at neighbouring frequencies. MP2 and MP3 both exploit this principle. MP2 divides the spectrum into 32 subbands, MP3 into 576. MP3 also applies entropy (Huffman) coding, so it achieves greater compression ratios than MP2 for the same audio quality. Despite its higher frequency resolution, MP3 does not always perform better than MP2. MP3's hybrid filter bank gains frequency resolution at the cost of time resolution, and it is not always efficiently implemented. This is why MP2 performs better on impulses (short, complex sound waves). MP2 is also more resistant to transmission errors. That is why MP2 is more widely used by broadcasters, while MP3 is the dominant audio compression format for internet and PC applications.
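The difference in frequency resolution can be made concrete with a little arithmetic. Assuming a 44.1 kHz sampling rate, the Python snippet below computes the bandwidth of each of MP2's 32 subbands and of each of MP3's 576 frequency lines. MP3 obtains its 576 lines by splitting each of the 32 subbands into 18 further lines with an MDCT. Treating the bands as equal-width slices is a simplification of the real filter-bank responses.

```python
SAMPLE_RATE_HZ = 44_100
MP2_SUBBANDS = 32
MP3_LINES = 576
MDCT_LINES_PER_SUBBAND = MP3_LINES // MP2_SUBBANDS   # 18


def band_width_hz(sample_rate_hz, bands):
    """Width of each band when 0..Nyquist is split into `bands` equal parts."""
    return (sample_rate_hz / 2) / bands


if __name__ == "__main__":
    print(MDCT_LINES_PER_SUBBAND)                          # 18
    print(band_width_hz(SAMPLE_RATE_HZ, MP2_SUBBANDS))     # 689.0625
    print(band_width_hz(SAMPLE_RATE_HZ, MP3_LINES))        # 38.28125
```

Narrower bands let the psychoacoustic model keep quantization noise below the masking threshold more precisely, which is one source of MP3's coding gain.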
See: AAC Standard
AAC has been standardized by ISO and IEC as part of the MPEG-2 and MPEG-4 specifications. It is an enhancement of MP3 that solves some of the performance issues MP3 suffered from, so AAC achieves better compression ratios than MP3 for the same quality. In addition, AAC supports up to 48 audio channels, whereas MP3 supports at most 5.1 (6 audio channels). AAC is already used by the iPhone, iPod, iTunes, MPEG-4, PlayStation 3, PlayStation Portable, Sony Walkman and Nintendo Wii, and it is expected to replace MP3.
See: Ogg Vorbis
Ogg Vorbis is a fully open, non-proprietary, free, general-purpose compressed audio format for mid- to high-quality (8 kHz-48.0 kHz, 16+ bit, polyphonic) audio and music at fixed and variable bitrates from 16 to 128 kbps per channel. This places Vorbis in the same competitive class as audio representations such as MPEG-4 (AAC). It is similar to, but higher-performing than, MPEG-1/2 Audio Layer 3, MPEG-4 audio (TwinVQ), WMA and PAC. Ogg is the name of the container format, which can hold many different streams; Vorbis is the audio codec within it. The compression techniques are similar to those of MP3: Vorbis is also based on a psychoacoustic model and drops information that is not perceptible to the human ear.
FLAC stands for Free Lossless Audio Codec. FLAC is one of the fastest and most widely supported lossless audio compression formats, and it is free and open. FLAC typically reduces file size by 30-50%, whereas MP3 reaches about 80% (but is lossy). FLAC is supported by most media players, such as Winamp, XMMS, Media Player Classic, Foobar2000 and Songbird. FLAC supports up to 8 channels, so it can be used for surround sound. At the time of writing, the latest release is FLAC 1.2.1, released in September 2007.
Speex is a free software speech codec from the Xiph open-source community that may be used for VoIP applications and podcasts. It may be used with the Ogg container format or transmitted directly over UDP/RTP. The Speex designers see their project as complementary to the Vorbis general-purpose audio compression project. Speex is a lossy format, meaning quality is permanently degraded to reduce file size. Unlike many other speech codecs, Speex is not targeted at cellular telephony but rather at Voice over IP (VoIP) and file-based compression. The design goal was a codec optimized for high-quality speech at low bit rates. To achieve this, the codec supports multiple bit rates. It also supports three sampling rates: ultra-wideband (32 kHz), wideband (16 kHz) and narrowband (telephone quality, 8 kHz). Designing for VoIP instead of cell phone use means that Speex must be robust to lost packets but not to corrupted ones. With the User Datagram Protocol (UDP), packets normally either arrive unaltered or do not arrive at all. All this led to the choice of Code Excited Linear Prediction (CELP) as the encoding technique for Speex. One of the main reasons is that CELP has long been proven to do the job and to scale well to both low bit rates (as evidenced by DoD CELP at 4.8 kbit/s) and high bit rates (as with G.728 at 16 kbit/s).
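CELP codecs such as Speex start from linear prediction: each speech sample is predicted from a weighted sum of previous samples, with the weights recomputed for every short frame. The Python sketch below computes those weights with the Levinson-Durbin recursion from the frame's autocorrelation. It covers only this analysis step, not Speex's codebook search, quantization or bitstream, and the function names are illustrative.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum of frame[n] * frame[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor a so that x[n] ~ sum(a[k] * x[n - 1 - k]).

    Returns (a, prediction_error_energy).
    """
    a = [0.0] * order
    err = float(r[0])
    for i in range(order):
        # reflection coefficient for the step from order i to order i + 1
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= 1.0 - k * k
    return a, err


def residual(frame, a):
    """Prediction error; a CELP coder would then approximate this with its codebooks."""
    p = len(a)
    return [frame[n] - sum(a[k] * frame[n - 1 - k] for k in range(p))
            for n in range(p, len(frame))]


if __name__ == "__main__":
    # autocorrelation of a first-order process with correlation 0.9
    a, err = levinson_durbin([1.0, 0.9, 0.81], order=2)
    print(a, err)
```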
See: Dolby Digital
Dolby Digital is the name of a whole family of audio compression formats developed by Dolby Laboratories. AC-3, a lossy format, is the most popular of them. It supports up to 6 discrete audio channels (5.1 surround sound) and sample rates up to 48 kHz. AC-3 is mainly used in digital cinemas and on DVDs. The more advanced codecs Dolby Digital Plus and Dolby TrueHD (lossless) are now supported by Blu-ray and HD DVD.
See: True Audio
TTA stands for True Audio. It is a free, lossless audio compression format. The compression achieved with this format is up to 30%. TTA performs lossless compression on multichannel 8-, 16- and 24-bit audio from uncompressed WAV input files. Its compression is based on adaptive prediction filtering. Like MP3, TTA supports ID3 tags.
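Lossless coders built on adaptive prediction, as TTA is, rely on the encoder and decoder running the same predictor, so that only the prediction residual has to be stored. The Python sketch below shows that principle with a small integer sign-sign LMS predictor. The filter order, the fixed-point shift and the update rule are assumptions chosen for illustration, not TTA's actual filter. Because every step uses integers, the decoder reproduces the input exactly.

```python
ORDER = 4   # number of past samples used by the predictor (assumed)
SHIFT = 4   # weights are fixed-point values scaled by 2**SHIFT (assumed)


def _sign(v):
    return (v > 0) - (v < 0)


def _predict(weights, history):
    return sum(w * h for w, h in zip(weights, history)) >> SHIFT


def _adapt(weights, history, error):
    """Sign-sign LMS: nudge each weight by +/-1 in the direction that reduces the error."""
    s = _sign(error)
    return [w + s * _sign(h) for w, h in zip(weights, history)]


def encode(samples):
    weights, history, residuals = [0] * ORDER, [0] * ORDER, []
    for x in samples:
        error = x - _predict(weights, history)
        residuals.append(error)
        weights = _adapt(weights, history, error)
        history = [x] + history[:-1]          # newest sample first
    return residuals


def decode(residuals):
    weights, history, samples = [0] * ORDER, [0] * ORDER, []
    for error in residuals:
        x = error + _predict(weights, history)   # same prediction as the encoder
        samples.append(x)
        weights = _adapt(weights, history, error)
        history = [x] + history[:-1]
    return samples


if __name__ == "__main__":
    pcm = [round(1000 * ((n % 40) - 20) / 20) for n in range(200)]   # sawtooth test signal
    res = encode(pcm)
    assert decode(res) == pcm
    print(sum(abs(v) for v in pcm), sum(abs(v) for v in res))
```

The residuals would then be entropy-coded; the better the predictor tracks the signal, the smaller they are.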
Windows Media Audio
WMA (Windows Media Audio) is an audio compression format developed by Microsoft. The format is proprietary. It quickly became a de facto standard because it was included in Windows Media Player. The techniques used are similar to those of MP3, so it is also based on a psychoacoustic model. It drops all frequencies below 20 Hz and above 20 kHz. Its compression ratio is higher than that of MP3. WMA is mainly designed for streaming at low bitrates, but it is not the preferred streaming format because it is platform- and player-dependent.
Meridian Lossless Packing, also known as Packed PCM (PPCM), is a proprietary lossless compression technique for compressing PCM audio data developed by Meridian Audio, Ltd. MLP is the standard lossless compression method for DVD-Audio content (often advertised with the Advanced Resolution logo) and typically provides about 2:1 compression on most music material. All DVD-Audio players are equipped with MLP decoding, while its use on the discs themselves is at their producers' discretion. Dolby TrueHD, used in Blu-ray, employs MLP, but compared with DVD-Audio, adds higher bit rates, 8 full-range channels, extensive metadata, custom speaker placements (as specified by SMPTE), and timecode.
Image Coding Formats
See: JPEG Standard
JPEG, developed by ISO/ITU, was the first widespread compression standard for images. It was designed for digital images and is now the standard for images on the internet and the storage format used by digital cameras. JPEG typically reaches compression ratios of about 10:1, but the ratio depends on the image's detail and its high-frequency content. JPEG performs much better on photos than on graphs, lines and letters. This is because JPEG applies a DCT to 8×8 blocks and quantizes the high-frequency coefficients coarsely. Sharp edges carry a lot of high-frequency energy, so they come out with visible ringing and blocking artifacts.
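The transform at the heart of baseline JPEG is the 8×8 two-dimensional DCT-II. The plain-Python sketch below implements the orthonormal form directly from its definition; real codecs use fast factorizations. It shows the energy compaction JPEG relies on: a flat block produces a single non-zero (DC) coefficient, while a sharp edge spreads energy into several frequency coefficients. JPEG first subtracts 128 from 8-bit samples; the example skips that level shift for readability.

```python
import math


def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (list of 8 rows of 8 numbers)."""
    n = 8

    def alpha(u):
        return math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency
        for v in range(n):      # horizontal frequency
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


if __name__ == "__main__":
    flat = [[100] * 8 for _ in range(8)]
    print(round(dct_8x8(flat)[0][0], 6))   # 800.0: all energy in the DC coefficient
    # a sharp vertical edge puts energy into several horizontal-frequency coefficients
    edge = [[0] * 4 + [255] * 4 for _ in range(8)]
    print(sum(abs(c) > 1e-6 for row in dct_8x8(edge) for c in row))
```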
See: JPEG2000 Standard
JPEG2000 is a development of the JPEG format. Its main focus is not higher compression but better quality and functionality. The DCT used in JPEG does not work well under all circumstances, so JPEG2000 replaces it with a wavelet transform. As a result, it handles lines, letters and graphs better, and it performs much better than JPEG, especially at low bitrates. JPEG2000 supports, among other things:

- compression of continuous-tone images;
- a high dynamic range of pixel values;
- very large images;
- progressive transmission;
- region-of-interest coding.
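A wavelet transform can be computed with a few integer "lifting" steps. The Python sketch below shows one level of the 1-D reversible LeGall 5/3 transform, the filter JPEG2000 uses for lossless coding, with symmetric extension at the borders. It assumes an even-length integer signal and omits the 2-D, multi-level decomposition and all later coding stages.

```python
def lift53_forward(x):
    """One level of the reversible LeGall 5/3 lifting transform (even-length integer input)."""
    n = len(x)
    assert n % 2 == 0 and n >= 2
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # predict: detail = odd sample minus the mean of its even neighbours (mirrored at the end)
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    # update: smooth = even sample plus a rounded quarter of the neighbouring details
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    return s, d


def lift53_inverse(s, d):
    """Undo the lifting steps in reverse order; integer floors make this exact."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2) for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1) for i in range(half)]
    x = []
    for e, o in zip(even, odd):
        x.extend((e, o))
    return x


if __name__ == "__main__":
    ramp = list(range(8))
    print(lift53_forward(ramp))   # ([0, 2, 4, 6], [0, 0, 0, 1])
    assert lift53_inverse(*lift53_forward(ramp)) == ramp
```

On a smooth ramp the interior detail coefficients are zero. This is why smooth regions cost few bits after the transform.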
JPEG-LS is the current ISO/ITU standard for lossless compression and is part of ISO's recommendations for better compression of medical images. Lossless compression is very important for images that contain critical information. Other lossless formats such as GIF and PNG are efficient only for images that use a limited number of colors. JPEG-LS makes it possible to compress images with many more colors losslessly. JPEG-LS also offers near-lossless compression, in which each reconstructed sample may deviate from the original by at most a chosen maximum.
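JPEG-LS is based on the LOCO-I algorithm. It predicts each pixel from its already-coded neighbours with a simple edge-detecting rule and then codes the prediction error. The Python sketch below shows that median edge detector (MED) predictor together with near-lossless error quantization. A hypothetical `near` parameter bounds the per-sample deviation, and `near=0` is lossless. The sketch leaves out several parts of the real standard: context modelling, bias correction, value clamping, Golomb coding and its border rules (here, outside neighbours are simply 0).

```python
def med_predict(a, b, c):
    """LOCO-I median edge detector: a = left, b = above, c = above-left neighbour."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c


def quantize_error(e, near):
    """Near-lossless quantization: the dequantized error differs from e by at most `near`."""
    step = 2 * near + 1
    q = (abs(e) + near) // step
    return q if e >= 0 else -q


def code_image(img, near=0):
    """Return (quantized residuals, reconstruction) for a 2-D list of integers."""
    h, w = len(img), len(img[0])
    rec = [[0] * w for _ in range(h)]
    residuals = []
    for y in range(h):
        for x in range(w):
            # predict from reconstructed values so the decoder can do the same
            a = rec[y][x - 1] if x > 0 else 0
            b = rec[y - 1][x] if y > 0 else 0
            c = rec[y - 1][x - 1] if x > 0 and y > 0 else 0
            pred = med_predict(a, b, c)
            q = quantize_error(img[y][x] - pred, near)
            residuals.append(q)
            rec[y][x] = pred + q * (2 * near + 1)
    return residuals, rec


if __name__ == "__main__":
    img = [[(x * 37 + y * 11) % 256 for x in range(6)] for y in range(5)]
    _, lossless = code_image(img, near=0)
    _, near2 = code_image(img, near=2)
    print(lossless == img, max(abs(near2[y][x] - img[y][x]) for y in range(5) for x in range(6)))
```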
HD Photo/JPEG XR
See: HD Photo
HD Photo is an image compression technique developed and patented by Microsoft. Microsoft and the JPEG committee have announced that it is being considered for standardization as JPEG XR. HD Photo is an image codec that provides high-dynamic-range image encoding while requiring only integer operations (with no divides) for both compression and decompression. It supports monochrome, RGB, CMYK and even n-channel color representations. Samples can be up to 16-bit unsigned integers or up to 32-bit fixed-point or floating-point values, and RGBE (Radiance) is also supported. It may optionally include an embedded ICC color profile to achieve consistent color representation across multiple devices. An alpha channel may be present for transparency. All color representations are transformed to an internal color representation. The transformation is entirely reversible, so by using appropriate quantizers, both lossy and lossless compression can be achieved.
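A color transform can be integer-only and still exactly reversible. The Python sketch below demonstrates this with YCoCg-R, a well-known lifting-based transform that needs only additions, subtractions and shifts. It illustrates the idea only and is not claimed to be the exact transform HD Photo specifies.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R: integer adds, subtracts and arithmetic shifts only."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R: undoes each lifting step in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


if __name__ == "__main__":
    print(rgb_to_ycocg_r(100, 100, 100))                    # (100, 0, 0): grey has no chroma
    print(ycocg_r_to_rgb(*rgb_to_ycocg_r(255, 0, 0)))       # (255, 0, 0)
```

Grey pixels map to zero chroma, so decorrelating the color channels this way concentrates information in the Y channel without losing anything.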
Graphics Interchange Format (GIF) was a very popular compression format for images. It is lossless but supports only 256 colors per image. The colors are stored in a palette and each pixel is an index of at most 8 bits, while the palette entries themselves are 24-bit RGB values. GIF compresses the palette indices row by row with the LZW algorithm. The compression therefore depends on the number of colors and on how they repeat in the horizontal direction. When an image has a limited number of colors with some repeating patterns, GIF can achieve good compression results. When there are many colors, JPEG or PNG, which use 24 bits per pixel, are by far the better choice.
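GIF's LZW stage can be sketched in Python as follows, operating on a row of palette indices. The sketch assumes the dictionary starts with one entry per palette color. It omits GIF's clear and end-of-information codes, variable code widths and bit packing. It shows how repeating runs of the same colors collapse into progressively fewer codes.

```python
def lzw_encode(indices, palette_size):
    """Encode palette indices with LZW; returns a list of integer codes."""
    table = {(i,): i for i in range(palette_size)}
    next_code = palette_size
    current = ()
    codes = []
    for index in indices:
        candidate = current + (index,)
        if candidate in table:
            current = candidate                 # keep extending the known string
        else:
            codes.append(table[current])        # emit the longest known string
            table[candidate] = next_code        # and remember it plus one more index
            next_code += 1
            current = (index,)
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes, palette_size):
    """Rebuild the index sequence, growing the same table as the encoder."""
    if not codes:
        return []
    table = {i: (i,) for i in range(palette_size)}
    next_code = palette_size
    prev = table[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:                 # code defined by this very step
            entry = prev + (prev[0],)
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.extend(entry)
        table[next_code] = prev + (entry[0],)
        next_code += 1
        prev = entry
    return out


if __name__ == "__main__":
    row = [0, 0, 0, 0, 1, 1, 1, 1] * 4          # a 32-pixel row with a repeating pattern
    codes = lzw_encode(row, palette_size=2)
    print(len(row), "indices ->", len(codes), "codes")
    assert lzw_decode(codes, palette_size=2) == row
```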
PNG (Portable Network Graphics) is a lossless compression format that has existed since 1995. It was developed as a patent-free alternative to GIF, whose LZW compression algorithm was still patented at the time. That is why it was widely used in open-source applications. PNG supports more than 16 million colors, but it can also use a palette of 256 colors. A palette lowers the number of bits needed per pixel, which gives greater compression. Reducing an image with more colors to a palette is itself lossy, so PNG can also serve for lossy compression when storage space must be limited. PNG also supports transparency and animation (animation through the APNG extension).