MPEG-4: a Powerful Standard for Use in Web and Television Environments

Rob Koenen, KPN Research, the Netherlands (chairman MPEG Requirements Group) r.h.koenen@research.kpn.com

Abstract and Introduction

MPEG-4 is a new multimedia standard designed for use in broadcast, interactive and conversational environments. Its architecture allows MPEG-4 to be used in both Television and Web environments, and not merely one after the other: it also facilitates the integration of content coming from both channels in the same multimedia ‘scene’. Its strong points are inherited from the successful MPEG-1 and -2 standards (broadcast-grade synchronisation and the choice of on-line/off-line usage) and from VRML (the ability to create content using a ‘scene description’).

MPEG-4 adds to MPEG-1 and -2:

Integration of natural and synthetic content, in the form of ‘objects’. Such objects could represent 'recorded' entities (a person, a chair) or synthesised material (a voice, a face, an animated 3D model);
Support for 2D and 3D content;
Support for several types of interactivity;
Coding at very low rates (2 kbit/s for speech, 5 kbit/s for video) to very high ones (5 Mbit/s for transparent-quality video, 64 kbit/s per channel for CD-quality audio);
Support for management and protection of intellectual property.
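To put the "64 kbit/s per channel for CD-quality audio" figure in perspective, it helps to compare it with uncompressed CD audio (44.1 kHz sampling, 16 bits per sample). The sketch below only does that arithmetic; the helper names are illustrative and not part of any MPEG specification.

```python
# Uncompressed CD audio parameters (well-known values, not MPEG-specific).
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16


def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(channels: int, coded_kbps_per_channel: float = 64.0) -> float:
    """Ratio of CD-format PCM to audio coded at a fixed per-channel rate."""
    uncompressed = pcm_bitrate_kbps(CD_SAMPLE_RATE_HZ, CD_BITS_PER_SAMPLE, channels)
    return uncompressed / (coded_kbps_per_channel * channels)


if __name__ == "__main__":
    stereo_pcm = pcm_bitrate_kbps(CD_SAMPLE_RATE_HZ, CD_BITS_PER_SAMPLE, 2)
    print(f"CD stereo PCM: {stereo_pcm} kbit/s")           # 1411.2 kbit/s
    print(f"Coded stereo:  {64 * 2} kbit/s")               # 128 kbit/s
    print(f"Ratio:         {compression_ratio(2):.1f}:1")  # about 11:1
```

Because both rates scale with the number of channels, the ratio (roughly 11:1) is the same for stereo and for multichannel audio.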

MPEG-4 adds to VRML:

Native support for natural content and real-time streamed content, using URLs;
Efficient representation of the scene description.

Several forms of scalability support use over networks whose bandwidth is unknown at the time of encoding.
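One common form of scalability is layered coding: a base layer that is decodable on its own, plus enhancement layers that each improve quality. A receiver can then decide at delivery time how many layers its connection can carry. The sketch below assumes cumulative layers with hypothetical bit rates; it is an illustration of the idea, not MPEG-4's actual layer signalling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: float  # bit rate this layer adds on top of the layers below it


def select_layers(layers: list[Layer], available_kbps: float) -> list[Layer]:
    """Keep the base layer plus as many enhancement layers as fit.

    Layers are cumulative: enhancement layer k is only useful if layers
    0..k-1 are also received, so selection stops at the first layer that
    does not fit. An empty result means not even the base layer fits.
    """
    chosen: list[Layer] = []
    used = 0.0
    for layer in layers:
        if used + layer.kbps > available_kbps:
            break
        chosen.append(layer)
        used += layer.kbps
    return chosen


# Hypothetical stream: a 24 kbit/s base layer and two enhancement layers.
STREAM = [Layer("base", 24), Layer("enhancement-1", 40), Layer("enhancement-2", 64)]
```

With 100 kbit/s available, `select_layers(STREAM, 100)` keeps the base and first enhancement layer (64 kbit/s in total), since adding the second would need 128 kbit/s. The same encoded stream thus serves both slow and fast links.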

MPEG-4 preserves compatibility with major existing standards: MPEG-1, MPEG-2, ITU-T H.263, and VRML. MPEG-4 Version 1 is virtually ready (to be fixed in October this year), and the backward-compatible MPEG-4 Version 2, which will extend the capabilities of the standard, will be finalised at the end of '99.

While the full MPEG-4 toolbox is very rich and powerful, implementing all of it would be too expensive for many applications. That is why MPEG has defined 'Profiles', which group the capabilities into useful subsets. As a result, the standard is usable for simple applications now, and will remain useful as Web content gets richer and set-top boxes get more powerful. The figure below shows how an MPEG-4 device (hardware or software) can be built by choosing Audio, Visual and Graphics Profiles, and combining those with a Systems Profile. Note that, e.g., an 'Audio-only' device can be a perfectly valid MPEG-4 appliance; you don't need to use all four parts if you don't want to.
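The profile mechanism can be pictured as a simple capability check: a device implements certain profiles for the parts it supports, and content states which profile it needs from each part it uses. The sketch below is a simplification under stated assumptions: the profile names are hypothetical placeholders, and any ordering or levels within a part are ignored.

```python
from enum import Enum


class Part(Enum):
    """The parts from which a device picks its profiles."""
    SYSTEMS = "systems"
    AUDIO = "audio"
    VISUAL = "visual"
    GRAPHICS = "graphics"


# A device lists the profiles it implements per part; omitted parts are unsupported.
Device = dict[Part, set[str]]
# Content states which profile it needs from each part it actually uses.
Requirements = dict[Part, str]


def can_play(device: Device, needs: Requirements) -> bool:
    """True if the device implements every profile the content requires."""
    return all(profile in device.get(part, set()) for part, profile in needs.items())


# Hypothetical profile names, for illustration only.
audio_only_device: Device = {
    Part.SYSTEMS: {"systems-basic"},
    Part.AUDIO: {"audio-speech", "audio-synthetic"},
}
radio_programme: Requirements = {Part.SYSTEMS: "systems-basic", Part.AUDIO: "audio-speech"}
video_clip: Requirements = {Part.SYSTEMS: "systems-basic", Part.VISUAL: "visual-simple"}
```

Here the audio-only device can play the radio programme but not the video clip, even though it implements no Visual or Graphics profile at all.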

Integrating Several Types of 'Objects' to Create 'Multimedia'

MPEG-4 does not specify one single way of coding audio and visual information, but rather a toolbox of different coding methods for different types of content; each content type has its own optimised encoder. As a result, the multimedia that can be created is much richer than what is obtained by simply combining sound, (moving) images and text, one of each.

The following types of synthetic content are supported:

'Structured Audio' (SA). SA specifies an extremely bandwidth-efficient representation for creating rich synthetic audio content. SA is harmonised with MIDI, which it includes as a subset. It comprises a 'Score Language' and an 'Orchestra Language';
Facial animation. This can be used in combination with the MPEG-4 Text-to-Speech Interface, which transmits text together with the attributes necessary for its correct reproduction by a (proprietary) Text-to-Speech system, such as language, gender and prosody. MPEG is currently working on Body animation, which will be added in Version 2;
2D meshes with textures mapped onto them. Version 2 will add 3D meshes;
Scalable textures, with support for view-dependent scalability;
VRML-like content (lines, circles, boxes, text, etc.).
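The bandwidth efficiency of Structured Audio comes from sending instructions instead of samples: an orchestra defines how instruments synthesise sound, and a score says which notes to play and when. The toy sketch below illustrates that split in Python; it does not use SA's actual Orchestra and Score languages (SAOL and SASL), and the instrument, sample rate and score are assumptions chosen for brevity.

```python
import math

SAMPLE_RATE = 8000  # Hz; an assumed low rate that keeps the example small


def sine_instrument(freq_hz: float, n_samples: int) -> list[float]:
    """A trivial 'instrument': a sine tone with a linear fade-out."""
    return [
        math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) * (1 - i / n_samples)
        for i in range(n_samples)
    ]


# 'Orchestra': named instruments the decoder knows how to synthesise.
ORCHESTRA = {"sine": sine_instrument}

# 'Score': (start_s, duration_s, instrument, frequency_hz) events.
SCORE = [
    (0.0, 0.5, "sine", 440.0),
    (0.5, 0.5, "sine", 550.0),
    (1.0, 1.0, "sine", 660.0),
]


def render(score, orchestra) -> list[float]:
    """Synthesise every score event with its instrument and mix into one buffer."""
    end = max(start + dur for start, dur, _, _ in score)
    out = [0.0] * int(round(end * SAMPLE_RATE))
    for start, dur, instr, freq in score:
        offset = int(round(start * SAMPLE_RATE))
        tone = orchestra[instr](freq, int(round(dur * SAMPLE_RATE)))
        for i, sample in enumerate(tone):
            out[offset + i] += sample
    return out
```

Three short score events expand into 16,000 output samples at the decoder; only the score (and, once, the orchestra) would need to be transmitted.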

MPEG-4 Video supports interlaced as well as progressive content. Video content is not restricted to rectangles of a predefined aspect ratio: objects of any shape can be represented by sending explicit shape information. Coding is currently optimised for 5 kbit/s to 5 Mbit/s, but MPEG-4 is expected to be extended to cover studio-quality coding as well.
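To see why explicit shape information frees video objects from rectangles, consider a decoder that receives an object's pixels together with a binary shape mask (one way of representing shape). The sketch below assumes such a mask and tiny integer "frames"; it shows only the compositing step, not how MPEG-4 actually codes shape, and it assumes the object fits inside the background at the given position.

```python
Frame = list[list[int]]
Mask = list[list[int]]


def composite(background: Frame, obj: Frame, mask: Mask, top: int, left: int) -> Frame:
    """Paste an arbitrarily shaped object onto a copy of the background.

    Only pixels where the binary shape mask is 1 are copied; where the mask
    is 0 the background stays visible, so the object need not be rectangular.
    """
    out = [row[:] for row in background]
    for y, (obj_row, mask_row) in enumerate(zip(obj, mask)):
        for x, (pixel, inside) in enumerate(zip(obj_row, mask_row)):
            if inside:
                out[top + y][left + x] = pixel
    return out


background = [[0] * 4 for _ in range(4)]
diamond_obj = [[9] * 3 for _ in range(3)]
diamond_mask = [[0, 1, 0],
                [1, 1, 1],
                [0, 1, 0]]
```

Compositing `diamond_obj` at row 0, column 1 with `diamond_mask` leaves a diamond of 9s on the zero background, and the background outside the mask is untouched.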

MPEG-4 Audio covers the range from extremely low bit rates (mainly for speech and synthetic audio) to transparent-quality multichannel audio. Exceptional speech quality has been demonstrated at a mere 2 kbit/s.

The MPEG-4 Systems Layer takes MPEG from 'reproduction' to 'interaction'

The Systems layers for MPEG-1 and -2 were created for 'reproduction' of content. The MPEG-4 Systems layer has been defined from scratch, to support interactive content consumption, and the integration of the different content types in one scene. MPEG-4 does, however, re-use MPEG-2's synchronisation and buffering technology, which has proven itself in demanding broadcasting environments. The Scene Description makes it possible to integrate content coming from several sources; these sources could be broadcast channels, interactive servers or local storage. Using URLs, content from other sources can be 'included' in a scene.
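A scene description can be thought of as a tree whose leaves are media objects, each identified by a URL that may point to a broadcast channel, an interactive server or local storage. The toy tree below illustrates that idea; it is not BIFS or any MPEG-4 syntax, and the node layout and URLs (including the `broadcast://` scheme) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Origin(Enum):
    BROADCAST = "broadcast channel"
    SERVER = "interactive server"
    LOCAL = "local storage"


@dataclass
class MediaObject:
    name: str
    url: str  # hypothetical URL identifying the object's stream
    origin: Origin


@dataclass
class SceneNode:
    """A node in a toy scene tree: media objects plus child nodes."""
    name: str
    objects: list[MediaObject] = field(default_factory=list)
    children: list["SceneNode"] = field(default_factory=list)

    def all_objects(self) -> list[MediaObject]:
        """Depth-first list of every media object in this subtree."""
        found = list(self.objects)
        for child in self.children:
            found.extend(child.all_objects())
        return found


# One scene that mixes objects from all three kinds of source.
scene = SceneNode(
    "news",
    objects=[MediaObject("anchor video", "broadcast://channel1/anchor", Origin.BROADCAST)],
    children=[
        SceneNode(
            "sidebar",
            objects=[
                MediaObject("weather map", "http://example.com/weather", Origin.SERVER),
                MediaObject("station logo", "file:///logo", Origin.LOCAL),
            ],
        )
    ],
)
```

Walking `scene.all_objects()` yields the broadcast anchor video, the server-hosted weather map and the locally stored logo: one scene, three sources.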

A file format ('mp4') will be standardised in Version 2. It is designed for interchange and streaming of MPEG-4 objects.

MPEG has chosen not to specify transport mechanisms, so MPEG-4 can be used with many different transport systems. IP has, however, been on the minds of the developers as one of the most important transport mechanisms, and the Systems layer was built to suit IP transport well. Both Push and Pull paradigms are supported.

The MPEG-4 DMIF layer is designed to separate the application from the network. This means that the application need not know whether the objects come from an interactive server, from a broadcast channel or from local storage, allowing content producers to develop for one medium and publish to another (taking into account, of course, restrictions inherent to the medium).
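The separation DMIF provides can be illustrated with an ordinary abstraction layer: the application asks for an object by URL, and a delivery layer decides which backend serves it. The sketch below is only an analogy under stated assumptions; the class and method names are hypothetical and are not the DMIF interface, and the in-memory backends stand in for real broadcast, network and storage access.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class Delivery(ABC):
    """One way of delivering object streams (broadcast, interactive server, local)."""

    @abstractmethod
    def fetch(self, url: str) -> bytes:
        """Return the stream data for the object at `url`."""


class InMemoryDelivery(Delivery):
    """Stand-in backend; a real one would tune a channel, open a connection or read storage."""

    def __init__(self, streams: dict[str, bytes]) -> None:
        self.streams = streams

    def fetch(self, url: str) -> bytes:
        return self.streams[url]


class DeliveryLayer:
    """Routes each URL to a backend by its scheme, hiding the network from the application."""

    def __init__(self) -> None:
        self._backends: dict[str, Delivery] = {}

    def register(self, scheme: str, backend: Delivery) -> None:
        self._backends[scheme] = backend

    def fetch(self, url: str) -> bytes:
        return self._backends[urlparse(url).scheme].fetch(url)


def load_scene(layer: DeliveryLayer, urls: list[str]) -> int:
    """Application code: identical whether objects are broadcast, served or local."""
    return sum(len(layer.fetch(url)) for url in urls)
```

Swapping a backend (say, replacing a broadcast source with a server) changes only the `register` calls; `load_scene` stays the same, which is the property that lets producers publish the same content to different media.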

A separate paper on the workings of MPEG-4 Systems has been submitted to this workshop by Olivier Avaro.

Manage and Protect Intellectual Property - or Fail

The combination of the high compression efficiency that coding systems offer, ever-increasing transmission bandwidth and rapidly decreasing storage costs poses a very serious threat to multimedia services. Whoever wants to create and make available professional content had better think twice about making it available in digital form: there is a high risk that it will be copied and illegally redistributed, with a loss of revenues for all parties that hold rights. MPEG recognised this threat, and MPEG-4 will (in Version 2) have the necessary hooks for what MPEG calls 'Intellectual Property Management and Protection' (IPMP). While MPEG-4, by explicit choice, does not specify anything about the nature of the protection system, the hooks allow seamless integration of such proprietary systems with the MPEG-4 system.
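The "hooks" approach can be pictured as a plug-in point: the standard defines where a protection system is invoked and how it is identified, while the protection itself stays proprietary. The sketch below assumes a numeric system identifier and a handler that turns protected payload into decodable data; all names are hypothetical, and the XOR handler is a toy used only to exercise the hook, not a protection method.

```python
from typing import Callable, Dict, Optional

# A protection system turns protected payload bytes into decodable bytes.
IpmpHandler = Callable[[bytes], bytes]


class Terminal:
    """Hypothetical player exposing a hook for pluggable protection systems."""

    def __init__(self) -> None:
        self._ipmp: Dict[int, IpmpHandler] = {}

    def register_ipmp(self, system_id: int, handler: IpmpHandler) -> None:
        """Plug in a (proprietary) protection system under an identifier."""
        self._ipmp[system_id] = handler

    def prepare_for_decoding(self, payload: bytes, system_id: Optional[int] = None) -> bytes:
        """Pass unprotected data through; route protected data to its handler."""
        if system_id is None:
            return payload
        handler = self._ipmp.get(system_id)
        if handler is None:
            raise PermissionError(f"no IPMP system {system_id:#x} installed")
        return handler(payload)


def toy_xor_unprotect(data: bytes) -> bytes:
    """NOT real protection: a reversible XOR used only to exercise the hook."""
    return bytes(b ^ 0x5A for b in data)
```

The terminal never needs to know how a given system works; it only needs the identifier carried with the stream and a registered handler for it. Content whose protection system is not installed is refused rather than decoded.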

There is another side to protecting intellectual property: that of patents in decoders. Software decoders are easily copied, in which case patent holders see no return on their investments. The IPMP technology built into MPEG-4 can be used to protect this type of IP as well.

Where is the Interesting Content, anyway?

Because so much multimedia content is becoming available, it is getting more and more difficult to find, be it on the Web or in a broadcast environment. This is why MPEG has started MPEG-7, to define a 'Multimedia Content Description Interface'. MPEG understood, however, that waiting for the next century would be too long for many applications, which is why MPEG-4 includes a special stream type carrying 'Object Content Information' (OCI). OCI can carry textual information associated with MPEG-4 scenes or even with individual objects.
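Since OCI attaches text to scenes and objects, even a simple keyword search over it helps users find content. The sketch below assumes a hypothetical record with a target identifier, a title, keywords and a language; these fields are illustrative and do not reproduce the actual OCI descriptor syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectContentInfo:
    """Hypothetical OCI record: textual description attached to a scene or object."""
    target: str  # id of the scene or object being described
    title: str
    keywords: tuple[str, ...] = ()
    language: str = "en"


def search(records: list[ObjectContentInfo], term: str) -> list[str]:
    """Return ids of scenes/objects whose title contains, or keywords equal, the term."""
    term = term.lower()
    return [
        r.target
        for r in records
        if term in r.title.lower() or term in (k.lower() for k in r.keywords)
    ]


records = [
    ObjectContentInfo("scene:news", "Evening news", ("politics", "weather")),
    ObjectContentInfo("obj:map", "Weather map", ("weather",)),
    ObjectContentInfo("obj:logo", "Station logo"),
]
```

Searching for "weather" finds both the whole news scene (via its keywords) and the individual map object (via its title), which shows the value of describing objects as well as scenes.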

Background - MPEG standards are built by many companies and widely supported

MPEG has built the MPEG-1 and MPEG-2 standards, which have been very successful in meeting market requirements in virtually all areas of multimedia. More than 300 people from over 200 companies in more than 20 countries develop the standards at meetings held on a tight schedule of five one-week meetings a year. Many more people work on MPEG-4 ‘at home’, by email, in 'Ad Hoc Groups'.

Further reading

More material on the MPEG-4 standard and other MPEG standards can be found at the MPEG home page:

http://www.cselt.it/mpeg

Other MPEG-4 pages:

MPEG-4 Systems: http://garuda.imag.fr/MPEG4/

MPEG Video: http://www.hhi.de/mpeg-video

Synthetic content in MPEG-4: http://www.es.com/mpeg4-snhc/

MPEG Audio: http://www.tnt.uni-hannover.de/project/mpeg/audio/