Bringing the Web to the TV: Convergence Scenarios

Warner ten Kate and Hayder Radha
Philips Research Laboratories
23 June 1998



Summary

We discuss interoperability issues driven by the convergence of the Web and TV. While the services offered by the Web and TV are converging, the technologies that provide the required functionality remain largely incompatible. We identify three main layers of functionality at which interoperability is required: the composition, representation, and transport layers. The current incompatibility calls for translating interfaces, which may reside at the network side (a proxy server) or at the receiver side.

Approaches to realizing interoperability are driven by parameters such as efficiency, speed, and cost. To that end, we discuss how a reference terminal can help to save costs. Another obvious aid is to steer future standards so as to reduce both the need for translation and its overhead. We show that such opportunities are arising at the composition layer.


Three layers of functionality

The technology providing the converging Web/TV services can be split into three layers of operation, as depicted in Figure 1: a composition layer, a representation layer, and a transport layer.


Figure 1

The composition layer refers to the languages in which the service's presentation to the user is specified. It includes aspects such as synchronization and user interaction. Examples are d-HTML, CSS, SMIL, and MHEG. The representation layer concerns the encoding of the (media) objects from which the presentation is composed: MPEG, RA, RV, AIFF, AVI, and WAV are examples. The transport layer encompasses the access and transport protocols that deliver the content, namely the MPEG and IP suites.
In addition, one could think of a fourth, partly orthogonal, layer encompassing database-related issues such as naming (namespaces, URIs) and indexing (search, retrieval, alternates, languages).
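The layer model can be made concrete with a small sketch that asks, for two technology stacks, at which layers they share no technology and therefore need translation. The per-layer sets below are illustrative assumptions drawn from the examples above, and treating "MPEG" as one family is a deliberate simplification; the check is stack-level, so content encoded in, say, RA would still need transcoding even where a common format exists.

```python
from enum import Enum
from typing import Dict, List, Set


class Layer(Enum):
    COMPOSITION = "composition"
    REPRESENTATION = "representation"
    TRANSPORT = "transport"


Stack = Dict[Layer, Set[str]]

# Illustrative technology sets per domain (assumption, not an inventory).
WEB: Stack = {
    Layer.COMPOSITION: {"d-HTML", "CSS", "SMIL"},
    Layer.REPRESENTATION: {"MPEG", "RA", "RV", "AIFF", "AVI", "WAV"},
    Layer.TRANSPORT: {"IP"},
}
TV: Stack = {
    Layer.COMPOSITION: {"MHEG"},
    Layer.REPRESENTATION: {"MPEG"},
    Layer.TRANSPORT: {"MPEG-2 TS"},
}


def layers_needing_translation(a: Stack, b: Stack) -> List[Layer]:
    """Return, in top-down order, the layers at which the stacks share nothing."""
    return [layer for layer in Layer if not (a[layer] & b[layer])]


print(layers_needing_translation(WEB, TV))
```

With these sets, only the composition and transport layers come out as lacking any common technology, which mirrors the argument that converging on shared representation formats can remove one translation step.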

The Web and TV both provide technology implementing these layers of functionality. As each environment developed from its own set of requirements, characteristics, and business models, different technological solutions have been realized and installed. Convergence expands these sets so that they overlap, but it also adds the requirement to maintain the optimizations in price and performance that each domain has achieved.

Interfaces for interoperability

Figure 2 depicts the main components of interest in the interoperable system architecture. On the content provision side there are the TV programs and the Web pages, representing the content of the typical TV and Web services. On the content consumption side there are the TV (thin client), the PC (thick client), and a Web-enabled TV, representing three levels of complexity for the access terminal.


Figure 2

There are three main places where the Web and TV domains interface, labeled A, B, and C in the figure. The thin client fits in scenarios A and B. The thick client typically corresponds to scenario C.

The first interface, A, consists of a proxy server at the broadcaster's side. The proxy acts as a gateway between requests from the TV user and responses from the Web server. It bridges the transport protocols, transcodes the content formats, and translates the composition language.
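The three tasks of the proxy at interface A can be pictured as a chain of per-layer stages. The following is a minimal sketch under stated assumptions: the `Content` record, the stage functions, and the target labels ("MHEG", "MPEG-2", "MPEG-2 TS") are hypothetical placeholders, and the stages only relabel the content rather than actually converting it.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Content:
    transport: str    # e.g. "HTTP/IP"
    encoding: str     # media format of the objects
    composition: str  # presentation language
    body: str         # payload, untouched by these placeholder stages


# Hypothetical stages, one per layer; a real proxy would convert the body too.
def translate_composition(c: Content) -> Content:
    return replace(c, composition="MHEG")


def transcode(c: Content) -> Content:
    # Skip work when the media is already in the TV format.
    return c if c.encoding == "MPEG-2" else replace(c, encoding="MPEG-2")


def gateway_transport(c: Content) -> Content:
    return replace(c, transport="MPEG-2 TS")


Stage = Callable[[Content], Content]
PROXY_A: List[Stage] = [translate_composition, transcode, gateway_transport]


def proxy(c: Content, stages: List[Stage] = PROXY_A) -> Content:
    """Apply each translation stage in turn to a Web response."""
    for stage in stages:
        c = stage(c)
    return c


print(proxy(Content("HTTP/IP", "RV", "HTML", "hello")))
```

Keeping the stages separate reflects that each layer can be translated independently, so a deployment could, for instance, drop the transcoding stage when servers already offer a TV-compatible encoding.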

The second interface, B, translates TV programs into the Web's formats and stores the result on a server. This allows a PC user to request the resulting clips and have a video-on-demand (VoD) experience, albeit at Web video quality. The reverse process, translating Web pages off-line into a TV-compatible format, is also conceivable, for example as part of the preparation and editing of an enhanced-broadcast type of program.

The third interface, C, leaves the incompatibility to be resolved by the receiver. The receiver is equipped either with translation tools or with a double set of players to handle both forms of the information.

Reference Web Terminal

Whereas the TV is typically only capable of handling audio-visual data, a Web-enabled TV should be capable enough to support as much as possible of the Web experience enjoyed by PC users. At the same time, its complexity has to stay as close as possible to that of the thin client (TV).

These conflicting requirements for a Web-enabled TV receiver make it necessary to develop the concept of a "Reference Web Terminal" (RWT). The RWT concept is very similar to the standard-compliant (or reference) terminals developed by international standardization bodies such as DAVIC and ITU-T. The RWT can encompass different profiles, ranging from a minimum-complexity Web-browsing terminal to a multimedia receiver.

The specification of an RWT allows different vendors to manufacture thin, cost-saving access terminals that are interoperable and support content access within the specified range. Content providers can tune their content for optimal presentation knowing the RWT. They do not need to account for the various types of receivers offered in the market, trying to optimize for all of them or advising their visitors/viewers to use a particular receiver for "best viewing" results. It suffices to verify the application on an implementation of the RWT to make certain that the content has the required appearance on all terminals.

An example of an RWT is shown in Figure 3. At this juncture, it is important to address three key issues.

  1. In order to achieve the minimum complexity requirement stated above (i.e. driving the complexity of a Web-enabled TV toward the thin-client end of the spectrum), it is crucial to identify specific standards to support the components of the RWT.
  2. Some of the key functions in the RWT may be supported using the same resources available within the thin-client TV platform, notably the video and audio decoders. It is therefore opportune to converge to the corresponding formats in use, in the first place at the representation layer. Given the unrestricted number of content representations available on the Web, W3C faces the challenge of achieving some (open) standardization. Aside from benefiting Web-TV convergence, such standardization will also aid interoperability within the Web. We expect this need to grow as the Web offers more multimedia documents.
  3. It is also important to identify which functions of the RWT can be supported at the network/delivery side. One candidate is the functionality at the composition layer, as discussed below.


Figure 3

It is not the objective to specify the internal architecture of the RWT. The terminal manufacturer controls the specification of his (range of) products, just as the content provider controls the actual application. Rather, the RWT should specify the interfaces needed for uniform, interoperable content provision. The interfaces can be organized in a range of profiles corresponding to expanding and/or complementary domains of application types. That range can be multidimensional: instead of requiring that each profile include all lower profiles, the restriction could apply per dimension only. For example, a "version" dimension would support extensibility to evolve with future developments.
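A multidimensional profile range means that compatibility is checked per dimension rather than along a single ladder. The sketch below assumes profiles are maps from dimension names to integer levels; the dimension names ("browsing", "multimedia", "version") are hypothetical and not defined by any RWT specification. A dimension the terminal does not list is treated as level 0.

```python
from typing import Dict

# A profile maps a (hypothetical) dimension name to a level within that dimension.
Profile = Dict[str, int]


def supports(terminal: Profile, required: Profile) -> bool:
    """True if, in every dimension the content uses, the terminal's level
    is at least the required level. Levels are compared only within a
    dimension, so profiles form a partial order, not a single ladder."""
    return all(terminal.get(dim, 0) >= level for dim, level in required.items())


browser_tv: Profile = {"browsing": 2, "multimedia": 1, "version": 1}
media_tv: Profile = {"browsing": 1, "multimedia": 3, "version": 1}

print(supports(browser_tv, {"browsing": 2}))    # True
print(supports(browser_tv, {"multimedia": 2}))  # False
```

Note that `browser_tv` and `media_tv` are incomparable: each exceeds the other in one dimension, which is exactly what a single linear profile hierarchy cannot express.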

Besides specifying the interfaces, issues such as data delivery, memory usage, timing of object handling, and instruction execution need attention. For example, an application may call for a minimum memory size to guarantee its execution without failure. Likewise, buffer underflow and overflow in the video decoder may need to be controlled to prevent hiccups in playback.
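Buffer control of this kind can be checked ahead of time by simulating the decoder's input buffer. The following is a much-simplified leaky-bucket sketch, assuming discrete ticks, a fixed buffer capacity in bytes, and that each frame must be completely buffered at the tick it is decoded; it is not the buffer model of any particular video coding standard.

```python
from typing import List, Optional


def check_buffer(arrivals: List[int], frames: List[int], capacity: int) -> Optional[str]:
    """Simulate a decoder input buffer tick by tick.

    arrivals[t]: bytes delivered into the buffer during tick t
    frames[t]:   bytes the decoder removes at the end of tick t (0 = no frame)
    Returns None if the schedule is safe, else a description of the first fault.
    """
    occupancy = 0
    for t, (incoming, outgoing) in enumerate(zip(arrivals, frames)):
        occupancy += incoming
        if occupancy > capacity:
            return f"overflow at tick {t}"   # data arrived faster than it was consumed
        if outgoing > occupancy:
            return f"underflow at tick {t}"  # frame not fully present when needed
        occupancy -= outgoing
    return None


print(check_buffer([4, 4, 4], [0, 4, 4], capacity=10))  # None
print(check_buffer([8, 8], [0, 0], capacity=10))        # overflow at tick 1
```

The `capacity` parameter plays the role of the minimum memory an application would have to declare: a delivery schedule that passes for a given capacity is safe on any terminal providing at least that much buffer.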

Translation in the three layers

There is no a priori rule that the translation for all three layers must be implemented at the same place, although it is natural to include the translation of the higher layers when translating a lower layer. Another degree of freedom is whether to implement the translation as a real-time process or as a preprocessing step. The choice will likely reflect a trade-off between response time and quality.

Interoperability at the transport layer obviously requires conversion, although solutions exist to tunnel IP packets through the broadcast channel. Because such a tunnel carries the IP traffic as opaque payload, linking between broadcast content and IP content is not supported, certainly not when synchronized linking is required.
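The opacity of a tunnel can be seen in a sketch that slices an IP datagram into 188-byte packets with the MPEG-2 transport stream header layout (sync byte 0x47, payload-unit-start flag, 13-bit PID, continuity counter). This is deliberately simplified and does not produce a compliant stream: real encapsulation schemes add a section layer and signalling, and the padding with 0xFF here is an assumption. The receiver is assumed to recover the datagram length from the IP header.

```python
from typing import List

TS_PACKET_SIZE = 188
HEADER_SIZE = 4
PAYLOAD_SIZE = TS_PACKET_SIZE - HEADER_SIZE  # 184 bytes


def tunnel(datagram: bytes, pid: int, cc_start: int = 0) -> List[bytes]:
    """Split an IP datagram into transport-stream-style packets (simplified)."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    packets = []
    cc = cc_start & 0x0F
    for offset in range(0, len(datagram), PAYLOAD_SIZE):
        chunk = datagram[offset:offset + PAYLOAD_SIZE]
        pusi = 1 if offset == 0 else 0  # mark the packet that starts the datagram
        header = bytes([
            0x47,                               # sync byte
            (pusi << 6) | ((pid >> 8) & 0x1F),  # start flag + PID high 5 bits
            pid & 0xFF,                         # PID low 8 bits
            0x10 | cc,                          # payload only + continuity counter
        ])
        packets.append(header + chunk.ljust(PAYLOAD_SIZE, b"\xff"))
        cc = (cc + 1) & 0x0F
    return packets


pkts = tunnel(bytes(400), pid=0x100)
print(len(pkts), [len(p) for p in pkts])  # 3 [188, 188, 188]
```

Nothing in the packet headers refers to what the IP payload contains, so the broadcast side cannot express a link, let alone a synchronized one, between its own programs and the tunneled Web content.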

Interoperability at the representation layer will, in general, induce transcoding. While the RWT argues for convergence to a single format (also within the multimedia Web), this translation interface suggests that W3C consider a pledge that servers offer content in at least one version using TV-standard encoding formats. A terminal accessing the server can then select that encoding format, and no transcoding needs to be performed.
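Selecting a TV-standard variant on the server amounts to a simple negotiation step. HTTP/1.1 already offers content negotiation through the `Accept` header; the function below is a hypothetical stand-in that works on a plain table of variants, with illustrative format labels and file names. It prefers the first format the terminal accepts and otherwise flags the fallback for transcoding.

```python
from typing import Dict, List, Optional, Tuple


def select_variant(available: Dict[str, str],
                   accepted: List[str]) -> Tuple[Optional[str], bool]:
    """Return (location, needs_transcoding).

    available: format label -> location of that encoded version (hypothetical)
    accepted:  formats the terminal decodes natively, in order of preference
    """
    for fmt in accepted:
        if fmt in available:
            return available[fmt], False
    if available:
        # No native variant: take the first offered one and transcode it.
        return next(iter(available.values())), True
    return None, False


variants = {"RV": "clip.rm", "MPEG-2": "clip.mpg"}
print(select_variant(variants, ["MPEG-2"]))  # ('clip.mpg', False)
```

If every server honoured the pledge, the fallback branch would never be reached for TV terminals, which is precisely the transcoding cost the pledge aims to remove.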

Interoperability at the composition layer has been studied the most, and is even a subject of study within Web delivery itself. XML and XSL open the way to conversion between document formats obeying various DTDs. The declarative nature of XML applications enables relatively easy conversion. In that sense, the use of scripting as in d-HTML does not improve interoperability (also within the Web itself). Therefore, the recently proposed SMIL language is preferred over scripting-based solutions for realizing temporal and dynamic behavior.
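Why declarative markup converts easily can be shown with a rule-based tree rewrite, the same kind of mapping an XSL stylesheet would express. This sketch uses Python's `xml.etree.ElementTree` instead of XSL; the target vocabulary (`scene`, `sequence`, `stream`, `still`) is made up for illustration, namespaces are ignored, and unmapped elements are copied unchanged.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from SMIL element/attribute names to a made-up TV vocabulary.
ELEMENT_MAP = {"seq": "sequence", "par": "parallel",
               "video": "stream", "audio": "stream", "img": "still"}
ATTR_MAP = {"src": "ref", "dur": "duration"}


def convert(node: ET.Element) -> ET.Element:
    """Recursively rename elements and keep only attributes with a mapping."""
    out = ET.Element(ELEMENT_MAP.get(node.tag, node.tag))
    for name, value in node.attrib.items():
        if name in ATTR_MAP:
            out.set(ATTR_MAP[name], value)
    for child in node:
        out.append(convert(child))
    return out


def smil_to_scene(smil_text: str) -> str:
    """Convert the <body> of a SMIL-like document into a <scene> document."""
    root = ET.fromstring(smil_text)
    scene = ET.Element("scene")
    body = root.find("body")
    if body is not None:
        for child in body:
            scene.append(convert(child))
    return ET.tostring(scene, encoding="unicode")


DOC = ('<smil><head/><body><seq><video src="intro.mpg"/>'
       '<img src="logo.gif" dur="5s"/></seq></body></smil>')
print(smil_to_scene(DOC))
```

Because the timing structure (`seq`, `par`) is explicit in the markup, the rewrite needs no knowledge of program logic; a script that achieved the same sequencing in d-HTML would have to be analysed or rewritten by hand.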

In order to support data processing functionality at the composition layer, e.g. to enable data parsing and decision making at the client side, some procedural extension may be needed. For the sake of interoperability, such an extension should be specified as an API to a Virtual Machine (VM), where the VM is part of the RWT. The API may include interfaces to (part of) the declarative side of the document. In this respect, the Java language is a serious candidate, all the more so because Java has been selected to provide this functionality in the TV domain.

As the procedural extensions are intended to add data processing capability to an otherwise declarative description, they can best be modeled as objects within the document, similar to the media objects. As such, they would belong to the representation layer instead of the composition layer.
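The idea of a procedural object that touches the declarative document only through a narrow VM API can be sketched as follows. The text names Java as the candidate language; the sketch uses Python only for consistency with the other examples here. `DocumentAPI`, its `get`/`set` methods, the property names, and `pick_language` are all hypothetical.

```python
from typing import Callable, Dict


class DocumentAPI:
    """Hypothetical narrow interface a VM could expose to procedural objects:
    read and update named properties of the declarative document, nothing else."""

    def __init__(self, properties: Dict[str, str]):
        self._props = dict(properties)

    def get(self, name: str) -> str:
        return self._props[name]

    def set(self, name: str, value: str) -> None:
        # Objects may only modify properties the document already declares.
        if name not in self._props:
            raise KeyError(f"unknown property: {name}")
        self._props[name] = value


# A procedural object is referenced from the document like a media object
# and receives only the API, not the terminal's internals.
ProceduralObject = Callable[[DocumentAPI], None]


def pick_language(api: DocumentAPI) -> None:
    """Client-side decision: choose the audio track from a user setting."""
    api.set("audio.src", "audio_" + api.get("user.language") + ".mp2")


api = DocumentAPI({"user.language": "nl", "audio.src": "audio_en.mp2"})
pick_language(api)
print(api.get("audio.src"))  # audio_nl.mp2
```

Since the object sees nothing but the API, any terminal implementing the same API can run it, which is the interoperability property the VM approach is meant to provide.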

Another aspect of interoperability concerns the richness with which presentation details are specified. For instance, the basic layout functionality offered by SMIL will be preferred in many applications over the richer, and thereby more complex, styling of CSS. Similar to the pledge to add TV-standard encoding formats to the range of supported formats, interoperability will benefit from a pledge to supply alternative descriptions that use less complex functionality. This extends to other thin-client terminals such as mobile phones. Both pledges can be considered an additional aspect of the Web Accessibility Initiative.

Another route could be to specify documents in a hierarchical manner. A low-profile environment supports a basic subset of the DTD space, while higher profiles support extended features. XML namespaces offer an interesting handle for this. It remains to be studied how high-profile documents can be mapped to a low-profile DTD.
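The simplest possible mapping, dropping every element from a namespace the low profile does not support, can be sketched with `xml.etree.ElementTree`. The two namespace URIs are hypothetical. This covers only the trivial case: attributes in extension namespaces are not handled, and a meaningful mapping (for example substituting a simpler equivalent) is the open question raised above.

```python
import xml.etree.ElementTree as ET
from typing import Set

BASE_NS = "http://example.org/rwt/base"  # hypothetical low-profile namespace
EXT_NS = "http://example.org/rwt/ext"    # hypothetical extension namespace


def namespace_of(tag: str) -> str:
    """ElementTree writes qualified tags as '{uri}local'; return the uri part."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def downmap(elem: ET.Element, supported: Set[str]) -> None:
    """Remove, in place, every descendant whose namespace is not supported."""
    for child in list(elem):  # copy, since we remove while iterating
        if namespace_of(child.tag) not in supported:
            elem.remove(child)
        else:
            downmap(child, supported)


DOC = f"""<doc xmlns="{BASE_NS}" xmlns:x="{EXT_NS}">
  <title>News</title>
  <x:transition type="fade"/>
  <para>Headlines</para>
</doc>"""

root = ET.fromstring(DOC)
downmap(root, {BASE_NS})
print([child.tag.split("}")[1] for child in root])  # ['title', 'para']
```

A high-profile terminal would call `downmap` with both namespaces and keep the document intact, so the same source serves both profiles.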

Naming and locating

A final aspect of interoperability concerns the identification of content. As there are two environments where the content may reside, it is attractive to identify it with a single reference. The concept of URNs is interesting in this respect: the locations can be at a server or in the broadcast stream (Figure 2), with different schemes of access.
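A single name with several locations implies a resolution step that picks a location the terminal can actually reach. The sketch below assumes a local resolver table; the URN, the "broadcast" access scheme, and its locator syntax are hypothetical, and real URN resolution services are outside its scope.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Location:
    scheme: str   # access scheme, e.g. "http" or a hypothetical "broadcast"
    address: str  # scheme-specific locator


# Hypothetical resolver table: one URN, one location per environment.
RESOLVER: Dict[str, List[Location]] = {
    "urn:example:news-1998-06-23": [
        Location("broadcast", "service=12;component=video"),
        Location("http", "http://www.example.com/news/0623.mpg"),
    ],
}


def resolve(urn: str, reachable: Set[str]) -> Optional[Location]:
    """Return the first listed location whose access scheme the terminal supports."""
    for loc in RESOLVER.get(urn, []):
        if loc.scheme in reachable:
            return loc
    return None


urn = "urn:example:news-1998-06-23"
print(resolve(urn, {"broadcast"}))  # broadcast location for a thin client
print(resolve(urn, {"http"}))       # server location for a PC
```

The content keeps one identity across both environments; only the resolution result differs by terminal, so links in a document need not be rewritten when content moves between broadcast and server.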


Conclusion

In summary, we have discussed interoperability issues arising from the convergence of the Web and TV. We have shown how a Reference Web Terminal can be a useful tool to assist the convergence process. Interoperability requires translation at three layers of functionality: the composition, representation, and transport layers. At the composition layer, some opportunities for convergence are arising.

Concerning possible W3C initiatives it is suggested