Real-time extensions for HTML on interactive digital television

Kimmo Djupsjöbacka, Nokia Multimedia Network Terminals

Kimmo Löytänä, Nokia Multimedia Network Terminals

Tommi Riikonen, Nokia Multimedia Network Terminals

Abstract

In this paper, the use of HTML for supporting real-time elements in broadcast and networked environments for digital TV is discussed. For television, video is the most natural media type, followed by audio and graphics. Text is often the most difficult medium, not only due to technical display problems, but also because of the long viewing distance between the user and the TV set. The flexibility and simplicity of HTML offer very interesting possibilities for creating interactive applications for digital television. Some new features would be needed, however, most clearly a way of controlling and addressing real-time elements not only on the internet, but also in broadcast environments. The needs for these new features are discussed and some proposals are made in this paper.

Introduction

Digitalization of TV broadcasting in Europe started at the beginning of the 1990s, when ETSI (European Telecommunications Standards Institute) adopted a 34 Mbit/s digital encoding and decoding standard. This service allowed broadcasters to send one broadcast-quality program, and it was, and still is, mainly used for transmitting program streams between different stations, e.g. from a satellite to a cable network's head-end equipment or between different terrestrial stations. Further development of compression technology (MPEG-2) led to the possibility of multiplexing several programs in a single transport stream, allowing up to 8 broadcast-quality programs to be sent in one 45 Mbit/s stream. This means that about 8 programs can be sent in the same bandwidth that was previously reserved for a single analog channel. At somewhat lower picture quality, even 16 or more programs can be included in a single transport stream. Development has also led to a situation where decoding technology is affordable even for the domestic receiver market.

Digitalization means not only more channels in the same bandwidth but also data services on television. In the early 1990s, video-on-demand (VOD) seemed to be an obvious application for interactive television, but after a few years of trials, affordable VOD services still seem to be five years away. During the last few years, the internet and the Web have been hot topics and economically very promising technology, so it is not surprising that the internet discussion is pushing its way into the interactive television world as well.

Television and internet engineering cultures are traditionally very different. Television broadcasting is based on long-lasting, rarely changed standards, with products created for mass markets. Internet technology is based on constantly developing standards and solutions, with services traditionally created for individuals rather than for the public at large. Television sets are expected to last for over 10 years, but PCs should be updated every second year. During the last two years, these two worlds have clearly come closer to each other. Internet access and the Web are aggressively marketed to every home, and television is experiencing its biggest change since the introduction of color TV by promising individualized and interactive services for former couch potatoes.

Standardization organizations of interactive digital television

There are three organizations of interest when talking about interactive services and the internet in digital television.

The first is ISO's MPEG (Moving Picture Experts Group) working group. MPEG-1 defines a standard for video and audio coding, primarily meant for stand-alone PC and Video-CD applications, with data rates of around 150 kB/s (about 1.2 Mbit/s). Picture quality is comparable to VHS video. MPEG-2 has been developed for broadcast-quality video coding with high data rates of up to 15 Mbit/s and even higher. MPEG-2 does not define only video and audio coding; it also defines how data should be included in multiplexed audio and video streams, which can form either one program stream or a transport stream consisting of several, not necessarily related, video, audio and data streams. In addition to these, MPEG-2 also defines a standard for controlling the presentation of these streams with a protocol called DSM-CC (Digital Storage Media Command and Control).

DSM-CC offers several types and levels of interface for developing interactive services for digital television. The DSM-CC object carousel can be seen as an object-based file structure for broadcast data; it also provides a mechanism for pointing to video streams as DSM-CC objects. The DSM-CC user-to-user application interface can be used to access these object carousels, although the same interface is also used with fully interactive services in bi-directional network environments. DSM-CC has also defined data carousels, which can be used for broadcasting any data without a hierarchical structure. The data carousel is a lower-level protocol compared to the object carousel, and is meant for several kinds of downloading purposes.

The DVB (Digital Video Broadcasting) project was started in Europe in 1993. It aims to produce technical specifications for digital broadcasting, which would then be recognized by official standardization bodies like ETSI. Representatives from all sectors of the television programme chain take part in the work of DVB. These include broadcasters and programme producers, transmission companies and satellite operators, equipment and consumer electronics manufacturers, and representatives of government departments. There are currently over 200 members from over 25 countries. DVB has defined transmission standards for satellite and cable networks and will soon complete specifications for terrestrial and MMDS (often called wireless cable) networks. In addition to technical specifications for modulation etc. in different networks, DVB has defined, for example, the use of SI (Service Information) and PSI (Program Specific Information) in order to gain better interoperability between different operators and terminals and to allow easy navigation through a vast number of different services. DVB is based on the MPEG-2 video coding standard and has adopted DSM-CC, including the object carousel and data carousel, in its specifications for interactive services. DVB specifications are also starting to be used in many other countries in the Far East and South America, and by several operators in the US.

DAVIC (Digital Audio-Visual Council) is an international consortium defining common methods of creating end-to-end solutions for interactive television environments. DAVIC also has members from all over the world, but they are perhaps more manufacturing- and computer-oriented than the members of DVB. DAVIC was originally more oriented towards bi-directional services and networks than broadcast environments, whereas DVB started from broadcasting and is now moving towards more interactive services. DAVIC tries to select the most suitable solution for every part of the end-to-end concept, the idea being to use existing technologies whenever possible. DAVIC has included most of the DVB specifications as the basis of its own, and also parts of, for example, the DSM-CC, TCP/IP, OSI, MPEG and MHEG standards. HTML was chosen for presenting text with different styles, such as headings and italics.

All three standardization organizations work rather closely together, and many companies take an active part in all three. DAVIC has also contacted the W3C for closer cooperation in defining how to use HTML in interactive television environments.

Different uses of HTML-based services in interactive digital television

Since HTML has been one factor in making the internet so popular, it can be expected to play an important role in digital television as well. The main benefits of HTML have been its simplicity, platform independence and its capability of pointing to almost any object on the internet. Lately, platform independence has been partly sacrificed in order to gain better control of the appearance of documents, and increased complexity has been compensated for with better tools for document or application development. New terminal types, like portable devices and devices using TV sets as displays, have limited screen resolution and often cannot display the most advanced features defined for HTML browsers on PC platforms. In the case of television, this limitation is caused not only by the technical limitations of a TV screen, but also by the long viewing distance between the screen and the viewer. In comparison to a PC, however, television has excellent capabilities for showing video material; therefore many HTML services on TV will be based on video material. This means that we need good means of controlling, addressing and linking from these real-time streams. Most of the ideas presented here apply to audio streams and traditional computer environments as well. After all, HTML is a way of defining a document, or should we say an application, and the extensions defined should be usable on different platforms and network structures.

One possible use for HTML on television is advanced teletext functions. In analog television, teletext has been used for different services for many years, especially in Europe. In the USA it has been less popular, partly because there has been no agreed standard on its usage. In analog television, teletext is implemented by sending data in a limited-bandwidth area called the VBI (vertical blanking interval) together with the broadcast video stream. This has allowed broadcasters to send text and limited graphics for program information, news and simple advertisements. It can also be used, for example, in subtitling services. User interactivity is usually limited to selecting the desired page by entering a three-digit number with a remote control. In the US there are also some more advanced services based on the VBI and modem connections or cable return paths, but these proprietary solutions have not become very popular and they often demand additional hardware to be connected to the analog TV set.

Digitalization of television changes the possibilities of teletext-type services in two ways. First, the bandwidth used for teletext services can be decided by the broadcaster, since the multiplexed stream can hold video, audio and data in any proportion. Second, digital receivers always contain rather powerful processors in order to handle SI information and display it in an EPG (Electronic Program Guide). This processor can also be used to support more advanced interactivity than traditional teletext offers. HTML is a very good candidate for creating these applications, for the same reasons that made it popular on the internet.

Typical advanced teletext services could include interactive news, extended EPG services and home shopping applications. An example application could be a normal TV program or advertisement on which a link to additional information is shown at a specific time. If the user selects the link, he or she either sees new HTML on top of the same TV program or is taken to a completely new screen. This additional information can either be broadcast or fetched from the network, depending on the type of infrastructure used. However, this type of application demands that we define common methods for pointing to different pages and other elements in broadcast environments (discussed in the section on URL addressing) and a definition for scheduling video-based events (discussed in the section on Scheduling). Since video is the most natural media element for television, we also need methods to control video streams played from local devices, such as hard disc, CD-ROM, Video-CD and DVD, as well as video streams delivered over various network structures. These requirements are discussed in the following section.

Video control tags

It is quite easy to define what kind of features we need for controlling video streams. For television, an HTML document can often be just a video with buttons for pause and play, and a possibility to jump to the next document. In this case it makes little sense to use an external helper application for video control. It should be possible to control several video streams from the same document, and it should also be possible to define a video to be played in the background, without the need for user intervention.

The usual VCR control possibilities, namely play, stop, pause, fast forward, slow and rewind, are the most obviously needed control mechanisms. It would also be nice to be able to define the speed of the fast forward, slow and rewind functions. The possibility to start playing from a defined point and play until a defined point, and looping, are also needed. It should also be possible to show a selected frame, as is done with the slider in e.g. the Windows Media Player. To support several simultaneous videos from the same document, it would be good to use identifiers instead of URL addresses in all tags. The identifier could be tied to a specific URL in a separate tag. We also need to define a background video tag.

We have defined and used tags like video:///videos/myvideo.mpg for opening a video stream and control:///play and control:///pause for controlling it.
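As a rough sketch of how these experimental tags were used (the video:/// and control:/// schemes are our own trial names, not standardized URLs, and the link texts are only illustrations):

```html
<!-- Sketch only: video:/// and control:/// are experimental scheme
     names from our trials, not defined in any specification. -->
<A HREF="video:///videos/myvideo.mpg">Open the video</A>
<A HREF="control:///play">Play</A>
<A HREF="control:///pause">Pause</A>
```

In this scheme, following a control:/// link acts on the currently open video stream rather than loading a new document.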

These definitions were made before discussion of the OBJECT tag started. The OBJECT tag would seem to give the required possibilities, at least for identifying the object.

<OBJECT ID="vid1" DECLARE DATA="dvb:///right_path/myvideo.mpg"

TYPE="application/mpeg2video" >

<!-- here we could also define the display size for the object with WIDTH and HEIGHT -->

</OBJECT>

The actual control buttons could then be declared in forms or other object tags. The problem is how to inform the object which control option we want to use, and how to allow the user to set other parameters such as speed. One possible solution might be to use video players as objects and then use a technique, similar to Java applets, for specifying the method, like #vid1.play and #vid1.pause. PARAM elements could be used for setting speed or looping parameters.
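As a hedged sketch of this idea (the #vid1.play and #vid1.pause method notation and the SPEED and LOOP parameter names are purely illustrative assumptions, defined in no specification), the declared object could be controlled like this:

```html
<!-- Hypothetical sketch: the #vid1.play method syntax and the SPEED
     and LOOP parameter names are illustrations only. -->
<OBJECT ID="vid1" DECLARE DATA="dvb:///right_path/myvideo.mpg"
        TYPE="application/mpeg2video">
  <PARAM NAME="SPEED" VALUE="2">  <!-- speed factor for fast forward -->
  <PARAM NAME="LOOP" VALUE="ON">  <!-- restart the stream when it ends -->
</OBJECT>
<A HREF="#vid1.play">Play</A>
<A HREF="#vid1.pause">Pause</A>
```

Here the fragment-style reference names the declared object by its ID and the method to invoke on it, in the spirit of Java applet method calls.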

Scheduling

There are at least two cases in which we could use scheduling with video streams. We may want to define when, and for how long, each page should be shown in relation to a video stream shown in the background or in a video window. We may also want to define what happens if we select a link during a defined time interval. In the simpler case, the same link could be shown on screen all the time, e.g. "Press OK for additional information", but the reference address would depend on the time at which the selection occurs. This could be extended to allow several hot spots on the video image, with the selection depending on where and when the mouse button is pressed. This would be an image map with temporal and spatial dependency. We could follow the HyTime specifications or add TIME and TIMETYPE attributes to the AREA specification. The reference information could be provided in the object itself, in an external file with USEMAP, or through a server. Using a server might be difficult because of time constraints.

Here is an example of how we could define a video object with time and space related links. This is a modification of examples found in "Inserting objects into HTML, WD-object-960412, http://www.w3.org/public/WWW/TR/WD-object.html".

First, with a separate modified map file implementation:

<OBJECT ID="movie1" DATA="video:///movies/myvideo.mpg" TYPE="application/mpeg2video" USEMAP="#imap1">

</OBJECT>

<MAP NAME=imap1>

<AREA SHAPE=rect HREF="video:///movie2.mpg" COORDS="0,0,100,100" TIME="0.0,10.0" TIMETYPE=SEC ALT=Movie2 >

<AREA SHAPE=rect HREF="video:///movie3.mpg" COORDS="0,0,100,100" TIME="10.0,20.0" TIMETYPE=SEC ALT=Movie3 >

</MAP>

Another possibility would be to extend the anchor element to permit four new attributes: SHAPE, COORDS, TIME and TIMETYPE:

<OBJECT ID="movie1" SHAPES

DATA="video:///movies/myvideo.mpg" TYPE="application/mpeg2video" >

<A HREF="video:///movies/movie2.mpg" SHAPE=rect COORDS="0,0,100,100" TIME="0.0,10.0" TIMETYPE=SEC >Movie 2</A>

<A HREF="video:///movies/movie3.mpg" SHAPE=rect COORDS="0,0,100,100" TIME="10.0,20.0" TIMETYPE=SEC >Movie 3</A>

</OBJECT>

Instead of a TIMETYPE attribute, it might be easier to use "standard time units" (comparable to the standard units for lengths), like ms for milliseconds, s for seconds, etc. Instead of using a real URL address in HREF, it should be possible to use the ID of some other object. In these examples, the referenced videos cannot have any "image maps" of their own. If we wanted to use image maps with a referenced video, we could of course point to an HTML page containing an object defining the desired functions.
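Both alternatives could be sketched as follows (the "s" unit suffix and the #-reference to another object's ID are illustrative assumptions, not defined syntax):

```html
<!-- Sketch only: time units in the TIME value replace the TIMETYPE
     attribute, and the second AREA points to another object's ID
     instead of a real URL. -->
<AREA SHAPE=rect COORDS="0,0,100,100" TIME="0.0s,10.0s"
      HREF="video:///movie2.mpg" ALT=Movie2 >
<AREA SHAPE=rect COORDS="0,0,100,100" TIME="10.0s,20.0s"
      HREF="#movie3" ALT=Movie3 >
```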

It would be nice to have a tag within the document itself to specify the length of time that the document should be shown, and which page should be presented next. Actually, if we use time attributes, it might be possible to specify the appearance and disappearance of each object separately. Would this lead us too near to the MHEG and HyTime standards?
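The page-level case is approximated by the existing client-pull refresh mechanism, a browser extension rather than part of HTML proper; the per-object timing below is purely hypothetical, with the file names and the TIME attribute on OBJECT being assumptions for illustration:

```html
<!-- Client-pull refresh (a browser extension): show this page for
     30 seconds, then load the next one. -->
<META HTTP-EQUIV="Refresh" CONTENT="30; URL=next.html">

<!-- Hypothetical per-object timing: a TIME attribute of this kind is
     not defined anywhere; the object would appear 5 seconds into the
     document's lifetime and disappear at 25 seconds. -->
<OBJECT ID="logo1" DATA="logo.gif" TIME="5.0s,25.0s"></OBJECT>
```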

URL addressing

There are at least four different ways to point to a specific HTML file in an MPEG stream. These should be considered when defining URL addressing mechanisms for interactive television environments.

The first possibility is to use HTTP addresses as used on the internet. A typical address would be of the form http://www.server.net/right_path/right.file. This demands no changes on the network side, except in the network gateway, but it does demand implementation of a TCP/IP stack on the client and the use of IP over MPEG. IP-over-MPEG definitions are currently under development in the DAVIC and MPEG organizations. This type of addressing would be most suitable for files that are actually fetched from the internet.

DSM-CC client software will be implemented on many DVB-compliant terminals, so DSM-CC user-to-user addressing could be used for accessing broadcast object carousels and interactive service files. This would be very similar to internet addressing, since object carousels also form hierarchical structures and can have server names. It would allow the same addressing mechanism to be used for both interactive and broadcast data, and there would be no need to implement a TCP/IP stack on the terminals.

DSM-CC data carousels can be used for broadcasting any data without a hierarchical structure. They could be used for receiving HTML pages if we agree on a common addressing mechanism. Implementing support for data carousels demands fewer resources, and it would be an attractive addressing mechanism, especially for simple receivers supporting basic advanced teletext applications.

The MPEG transport stream addresses different services with PIDs (Packet IDentifiers). It would be possible to use PID values for pointing to a certain file, but this would lead to problems similar to those of using raw IP numbers on the internet. For various reasons, a broadcaster might change the PID of a certain service, just as a network administrator sometimes has to change a machine's IP number while restructuring the network. For this reason it would be better to use some other addressing mechanism in DVB broadcast streams. This could be based on the Service Information (SI) tables, in a fashion similar to DNS support in IP networks.
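To summarize, the four candidate mechanisms might yield URL forms like the following. Apart from http, every scheme and server or path name here is an assumption made for illustration; none of them is standardized:

```html
<!-- Illustrative address forms only: except for http, these scheme
     and path names are assumptions, defined in no specification. -->
<A HREF="http://www.server.net/right_path/right.file">internet file</A>
<A HREF="dsm:///carousel_server/right_path/right.file">object carousel file</A>
<A HREF="dvb:///data_carousel/right.file">data carousel file</A>
<A HREF="mpeg:///pid_0x100/right.file">PID-based address (fragile)</A>
```

The last form illustrates the problem discussed above: the address breaks as soon as the broadcaster reassigns the PID.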