Cloud Browser Taskforce Architecture

This page is dedicated to the discussion of an architecture within the Cloud Browser TF.


Abstract

This document describes an architecture for the Cloud Browser Runtime Environments.

Status of this document

This is a public working draft of an architecture that the Cloud Browser Task Force is discussing and exploring. It has no official standing of any kind and does not represent the support or consensus of any standards organization or contributor. It is the work of a subset of the W3C Public Web & TV Interest Group: the Cloud Browser Task Force.

As the Cloud Browser TF progresses its work, this section will be used to identify the architecture that has reached rough consensus within the group.

Technology Pitch

Today's browsers need a vast amount of hardware resources to process modern web sites, or simply fail to support modern technologies such as HTML5.

The Cloud Browser concept addresses these issues by moving the browser into a more powerful and more easily managed server or cloud environment.

When a client, for example a Set-Top Box, requests a resource such as a web page with a video element, the URL is forwarded to the server on which a browser is running. The browser fetches all the necessary data (HTML, CSS, JavaScript, images, video) and renders the page. Afterwards, the rendered page is packed into a transport stream that is sent back to the client. The client then only has to play back the transport stream, because the expensive rendering has already been done by the server.

Using this design, it is possible to provide a uniform user interface for a large range of devices, by simply creating a custom web application that gets processed by the Cloud Browser and streamed to the clients, such as Set-Top Boxes or HDMI dongles. Furthermore, it reduces the need for processing power inside the clients and helps in deploying new browser technologies faster.

Cloud Browser Pitch

For available approaches, please see the Chapter #Architecture.

Cloud Browser Lifecycle

User:

1. wants to access a TV portal page on her Cloud Browser client, e.g. a Video on Demand service

The Cloud Browser client:

2. translates any user input, e.g. a key press, into a request

3. sends the request together with a session id to the Cloud Browser environment

The Cloud Browser server:

4. verifies the session id and passes the requests to the Cloud Browser

The Cloud Browser:

5. fetches the requested resource (e.g. HTML), parses it, and fetches its dependencies (e.g. CSS, JavaScript, images, videos, ...)

6. decodes and transcodes the requested video, if necessary

7. creates the UI stream: executes & renders the previously fetched data into a transport stream

The Cloud Browser server (Orchestration):

8. sends stream to client

The Cloud Browser client:

9. plays back the UI stream
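
The lifecycle steps above can be sketched as a small in-memory simulation. This is only an illustrative sketch, not a description of any deployed system: the class names (`ClientRequest`, `CloudBrowser`, `CloudBrowserServer`), the session store, the key names, and the string that stands in for a transport stream are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientRequest:
    """Steps 2-3: user input translated into a request carrying a session id."""
    session_id: str
    key: str  # hypothetical key name, e.g. "OK" or "UP"


class CloudBrowser:
    """Steps 5-7: fetch, render, and produce a UI stream (stubbed as a string)."""

    def handle(self, request: ClientRequest) -> str:
        # A real browser would fetch HTML/CSS/JavaScript and render it here.
        return f"ui-stream(after key={request.key})"


class CloudBrowserServer:
    """Steps 4 and 8: verify the session, forward the request, return the stream."""

    def __init__(self) -> None:
        self._sessions: dict[str, CloudBrowser] = {}

    def open_session(self, session_id: str) -> None:
        self._sessions[session_id] = CloudBrowser()

    def dispatch(self, request: ClientRequest) -> str:
        browser = self._sessions.get(request.session_id)
        if browser is None:
            raise PermissionError(f"unknown session: {request.session_id}")
        return browser.handle(request)


server = CloudBrowserServer()
server.open_session("session-1")
stream = server.dispatch(ClientRequest(session_id="session-1", key="OK"))
print(stream)  # step 9: the client would play back this stream
```

A request with an unknown session id is rejected before it reaches the browser, which mirrors the verification in step 4.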

Terminology

Term: Client-side TV middleware
Definition: An execution environment for a TV User Interface (UI). It is an abstraction layer between the Set-Top Box (STB) operating system/hardware and TV-related applications.
Also known as: Set-Top Box (STB) Middleware
Software stack: see the entries below for TV local UI, TV UI Thin Client and TV UI Zero Client
Example: MHP

Term: TV local UI
Definition: Rendering and execution of the UI is done on the client side using local middleware and a local software stack. On top of this, native or Java-based applications are provided to drive the different services (e.g. EPG, channel lists, etc.).
Also known as: Native UI/ Client-rendered UI/ Thick client
Software stack: Thick Client
Example: Ericsson Mediaroom

Term: TV UI Thin Client approach
Definition: Rendering and execution of the UI is done on the client side using browser technology as the middleware layer on the local software stack, where most data (e.g. channel lists and EPG data) is provided by a remote server rather than being processed by the local software stack.
Also known as: Browser-based UI/ Client-rendered UI/ Thin client
Software stack: Thin Client
Example: Opera TV

Term: TV UI Zero Client approach
Definition: Rendering and execution of the UI resides in the cloud.
Also known as: Browser-less UI/ Cloud-rendered UI
Software stack: Zero Client
Example: CloudTV

Term: Cloud browser
Definition: User agent instance originated in the cloud that executes and renders the TV UI and delivers it to the client as a video stream.
Also known as: Remote browser/ Network-based browser

Term: Cloud environment
Definition: Environment that manages cloud browsers. Acts as the interface for the RTE.
Also known as: Orchestration

Term: Cloud Browser client
Definition: The client-side runtime environment that exists on actual hardware (e.g. a set-top box) and is the end-point of the cloud browser on the user side.

Term: In-band media
Definition: Video that is delivered to the Cloud Browser client from the Cloud Browser. The video is played back on the Cloud Browser server. Used in the single stream approach.
(Remaining columns: n.a.)

Term: Out-of-band media
Definition: Video that is delivered to the Cloud Browser client not from the Cloud Browser, but from another video resource (CDN, Content Server, etc.). The video is played back in the RTE (on the Cloud Browser client). Used in the double stream approach.
(Remaining columns: n.a.)

Evolution of the TV UI

TV local UI

The client-side TV middleware is an execution environment for a TV User Interface (UI) which acts as an abstraction layer between the Set-Top Box (STB) operating system/hardware and TV-related applications. This layer simplifies the deployment of TV applications: platform-specific application implementations no longer need to be considered. Only the TV middleware layer must be adapted to the embedded platform of the STB.

Typically, this layer includes application managers, runtime environments, libraries and operator-specific data. Different TV middleware systems have been introduced on the market in the past, e.g. proprietary ones such as OpenTV or MicrosoftTV and non-proprietary ones such as MHEG or MHP. In classical TV systems the middleware and applications are executed locally on the STB. This is referred to as a TV local UI. The TV local UI model is typically data driven, in that data is supplied to some generally static code on the STB.

TV UI Thin Client Approach

In parallel, web browsers have evolved from HTML interpreters into well-supported ecosystems for high-end applications. In the last two years, TV UI execution has undergone major changes. The convergence of TV and web has introduced the browser as a TV middleware technology. As a result, the client-side TV middleware is being replaced by browser execution environments. The TV UI has become a web application and is called a browser-based TV UI. With regard to the software stack structure on the client, this is the so-called TV UI Thin Client approach. The TV UI Thin Client approach is application-centric, with data being incorporated into the application either on the server (when the application is prepared) or in the STB. In this model, many different applications make up the user experience. The TV UI Thin Client model makes use of server-side/cloud processing of logic or data set analysis. The Thin Client approach is supported by various standardization organizations and represents the current state of the art in STB middleware technology. It has already been commercialized by some operators, Over-the-Top and Pay TV providers in products such as AppleTV or Google Chromecast.

TV UI Zero Client Approach

Over recent years, the trend towards centralizing services in the cloud has yielded new approaches for reaching TVs and mobile devices as well. As a consequence, a novel concept is derived here that shifts the browser runtime environment into the cloud, which may reside in the service provider domain. Because most of the functionality is shifted from the client to the cloud, this is the so-called TV UI Zero Client approach. The Cloud Browser is the enabler of the TV UI Zero Client approach: not only the TV middleware but also the rendering and execution of the UI reside in the cloud. The UI is then streamed down to the STB as a component of an MPEG TS video stream.

Indeed, the IPTV device market for Set-Top Boxes is becoming very fragmented. Legacy STBs often run proprietary media frameworks and will not be able to execute the latest HTML5 features. Moreover, a lot of legacy STBs do not have enough hardware power to support the browser. This also applies to new, low-cost Set-Top Boxes like HDMI Dongles and low-end STBs. Some of the STB challenges have already been addressed in the past by partial cloud-based infrastructures. However, the challenge of device fragmentation in the STB market can only be addressed by the browser executed in the cloud.

All things considered, the emergence of new cloud-based technologies exposes a gap in communication between the Cloud Browser and the client. The so-called Cloud Browser APIs enable the Cloud Browser to access local resources of the client and to execute conventional TV- and browser-related technologies in new Cloud Browser clients.

Architecture

Both the network load and degree of service composition complexity in the cloud increase when moving to the TV UI Zero Client approach. Taking this into account, the Zero Client Approach can be divided into two different sub-approaches with respect to the UI and media delivery to the client:

Single Stream Approach
the Cloud Browser combines both the media stream (called in-band media) and the UI/Apps (e.g., HTML, CSS, etc.) into one single transport stream that is then delivered to the client;
Double Stream Approach
the Cloud Browser renders the UI with applications only, while the media is delivered from another server (hence called out-of-band media). Thus, the UI/Apps and media streams are delivered separately to the client, which then has to combine both these streams to present them to the end user in a unified form.

Each of these sub-approaches can also be designed differently with regard to the location of the component that processes the media stream (Note: in both of these approaches the UI is processed in the Cloud Browser). This divides the Zero Client Approach into two execution modes:

Cloud Player Approach
the graphics library runs on the server, therefore, the client execution task is minimized;
Local Player Approach
the graphics library runs on the client, therefore, the client execution is extended to a rendering function, and the server is not responsible for video and audio rendering.


The main/ primary approaches used in current deployments due to their application use cases are:

Single Stream Cloud Player Approach

An approach where the service execution is completely shifted to the Cloud. This approach is the closest to the idea explained in Chapter #Technology_Pitch.

Double Stream Local Player Approach

An approach where the media processing is still handled on the client and only the UI is delivered to the client as a video stream. In this approach legacy media delivery (Multicast, Cable, etc.) is reused, which saves infrastructure costs.
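
The two delivery options and the two execution modes combine into four approaches, two of which the text names as primary. A minimal sketch of that classification follows; the type names (`Delivery`, `Player`, `Approach`) are assumptions made for this example and do not come from the architecture itself.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Delivery(Enum):
    SINGLE_STREAM = "Single Stream"  # UI and in-band media in one stream
    DOUBLE_STREAM = "Double Stream"  # UI stream plus a separate media stream


class Player(Enum):
    CLOUD_PLAYER = "Cloud Player"    # media stream processed in the cloud
    LOCAL_PLAYER = "Local Player"    # media stream processed on the client


@dataclass(frozen=True)
class Approach:
    delivery: Delivery
    player: Player

    @property
    def name(self) -> str:
        return f"{self.delivery.value} {self.player.value}"

    @property
    def is_primary(self) -> bool:
        # The text names these two combinations as the main/primary approaches.
        return (self.delivery, self.player) in {
            (Delivery.SINGLE_STREAM, Player.CLOUD_PLAYER),
            (Delivery.DOUBLE_STREAM, Player.LOCAL_PLAYER),
        }


approaches = [Approach(d, p) for d, p in product(Delivery, Player)]
for approach in approaches:
    print(approach.name, "(primary)" if approach.is_primary else "(secondary)")
```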


Cloud Browser Main/ Primary Approaches

Single Stream Cloud Player

Technical Description
Single Stream: cloud player

After the resources are downloaded, they are parsed and interpreted by the Cloud Browser. The JavaScript is processed and executed by the JavaScript engine of the browser. After the DOM and the Render Trees are built, the painting commands are sent to the Graphics Library over the Graphics Context.

The video is downloaded from the Media Sources by the browser and is sent to the media processing function of the Orchestration. The media processing function renders the data and writes it into the Buffer. However, in the case of the Cloud Browser, the data is not sent to the display. Instead, it is encoded using the required video codec and sent as a media stream to the Cloud Browser client located on the client device. Any application-layer protocol could be used for this delivery, depending on the implementation. The Cloud Browser client receives the media stream, which is directly decoded by the Decoder. The stream might first be processed by the Descrambler, if the content is encrypted by the Orchestration with an encryption scheme supported by the client device.
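
The order of operations described above (encode, optionally encrypt, then descramble and decode on the client) can be sketched as follows. This is a toy model under stated assumptions: `zlib` stands in for a video codec and a single-byte XOR stands in for the encryption scheme; neither is what a real deployment would use, and all function names are hypothetical.

```python
import zlib

KEY = 0x5A  # hypothetical single-byte key; XOR is a toy stand-in for scrambling


def encode(frame: bytes) -> bytes:
    """Cloud side: stand-in for video encoding (zlib is not a video codec)."""
    return zlib.compress(frame)


def scramble(data: bytes) -> bytes:
    """Toy XOR scrambler; applying it twice restores the input."""
    return bytes(b ^ KEY for b in data)


def server_send(rendered_frame: bytes, encrypted: bool) -> bytes:
    """Encode the rendered Buffer content and optionally encrypt it."""
    payload = encode(rendered_frame)
    return scramble(payload) if encrypted else payload


def client_receive(payload: bytes, encrypted: bool) -> bytes:
    """Client side: the Descrambler runs before the Decoder."""
    if encrypted:
        payload = scramble(payload)
    return zlib.decompress(payload)


frame = b"composited UI + video frame"
print(client_receive(server_send(frame, encrypted=True), encrypted=True) == frame)
```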

Operation Scope

Considered devices: legacy devices, low-end and low-power devices with hardware functions only.

In this approach the Cloud Browser takes over all client device software functionality. The Cloud Browser adapts the output data to the available hardware of the client device, e.g. secure processor, video decoding, network. Therefore, the client functions are limited to the available hardware. The reduced Runtime Environment of the client device is limited to supporting these legacy hardware capabilities, if required. It could contain specific video functions that cannot be removed, such as quality probes for quality measurements.

This approach could be used, for example, to simplify client decryption. The source encryption is terminated in the Cloud, and the stream is re-encrypted when it is sent down to the client device. This scenario implies decryption capabilities in the Orchestration. The client does not need to support the source encryption.

Double Stream Local Player

Technical Description
Double Stream: local player

The HTML, CSS and JavaScript resources are interpreted by the Cloud Browser, its JavaScript engine and the Graphics Library. From this data the Cloud Browser generates a UI stream. Some of the video elements might be downloaded by the Cloud Browser from the Media Sources, if required for building the UI. As opposed to the single stream approaches, the Cloud Browser does not process any video data that is not required for building the UI.

The UI stream data is received and processed by the client device. The video data is requested directly by the client device from the Media Sources. The video data is referred to as out-of-band video data, as it is not processed by the Cloud Browser. If the content is encrypted, the video data is first descrambled by the Secure Library.
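
On the client, the UI stream and the out-of-band video have to be combined into one picture. A minimal sketch of such a software composition is shown below, assuming frames are small grids of pixel values and that `None` marks a transparent UI pixel; the representation and the `compose` function are illustrative assumptions, not part of the architecture.

```python
from typing import Optional

Pixel = str
Frame = list[list[Pixel]]
UiFrame = list[list[Optional[Pixel]]]  # None marks a transparent UI pixel


def compose(video: Frame, ui: UiFrame) -> Frame:
    """Overlay the UI frame on the video frame; transparent UI pixels show video."""
    if len(video) != len(ui) or any(len(v) != len(u) for v, u in zip(video, ui)):
        raise ValueError("video and UI frames must have the same dimensions")
    return [
        [u if u is not None else v for v, u in zip(video_row, ui_row)]
        for video_row, ui_row in zip(video, ui)
    ]


video_frame = [["v", "v", "v"], ["v", "v", "v"]]   # out-of-band video
ui_frame = [[None, None, None], ["M", "M", None]]  # UI stream: a small menu
print(compose(video_frame, ui_frame))
# [['v', 'v', 'v'], ['M', 'M', 'v']]
```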

Operation Scope

Considered devices: legacy devices, low-end and low-power devices with software functions.

In this approach the UI content, e.g. HTML, CSS, etc., is executed in the Cloud Browser. The out-of-band video content is delivered to the client directly, bypassing the Cloud Browser. Here the Cloud Browser takes over only the UI execution and rendering functions. The remaining functions are the responsibility of the client device. The local Media Player, the Secure Library, networking functions and specific video functions make up the Runtime Environment on the client device.

This approach could be used with infrastructures, such as broadcast, where the media is unlikely to be routed through the Cloud Browser.

Cloud Browser Secondary Approaches

Single Stream Local Player

Technical Description
Single Stream: local player

The processing of the resources by the Cloud Browser is analogous to the Single Stream Cloud Player approach. The HTML, CSS, JavaScript and video data are processed and executed within the Cloud Browser and its Graphics Library.

The data is sent as a video stream to the client device. The client device receives the input stream that is processed by the Media Player of the client. If the content is encrypted by the Cloud Browser, the Secure Library first decrypts the content. The Media Player will render the content to the display.

Operation Scope

Considered devices: legacy devices, low-end and low-power devices with software functions.

In this approach the Cloud Browser takes over all Cloud Browser client software functionalities, as it was the case for the Single Stream Cloud Player approach. However, the Cloud Browser does not adapt the output data to the available hardware of the Cloud Browser client. The Runtime Environment of the Cloud Browser client includes the local Media Player, the Secure Library, networking functions and specific video functions, if required.

This approach is used in cases where the client supports decryption as a software function. If the client supports the source encryption, the video could be passed through the Cloud Browser without any changes at times when no UI elements are applied. This is applicable to implementations that do not support a continuous UI stream and send it to the Cloud Browser client in chunks. Therefore, there are virtually two streams sent to the client. One stream consists only of video data; the other contains the injected UI elements. However, they are distributed in time so that only one stream is sent at any given point in time.
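
The time-distributed behaviour described above can be sketched as a per-slot choice between the two virtual streams. The `Chunk` type, the `schedule` function and the source labels are assumptions made for this example; the text does not prescribe how an implementation decides when UI elements are applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int
    source: str  # "pass-through" (unchanged video) or "ui-composited"


def schedule(ui_active: list[bool]) -> list[Chunk]:
    """Pick, per time slot, which of the two virtual streams is sent."""
    return [
        Chunk(i, "ui-composited" if active else "pass-through")
        for i, active in enumerate(ui_active)
    ]


timeline = schedule([False, False, True, True, False])
print([chunk.source for chunk in timeline])
```

Exactly one chunk is emitted per slot, so the client only ever receives one stream at a time.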


Double Stream Cloud Player

Technical Description
Double Stream: cloud player

Analogous to the Single Stream Cloud Player approach, HTML, CSS and JavaScript data are parsed, processed and executed by the Cloud Browser. The Graphics Library renders the data according to the Render Tree built by the Cloud Browser. The data is written into the Buffer and is referred to as a User Interface stream (usually a stream of images). The Cloud Browser downloads the video from the Media Sources and passes it to the media processing function. The media processing function renders the video data and writes it into the Buffer. This data is referred to as a video stream.

These two separate streams, the UI stream and the video stream, are encoded using the codecs required by the client. The Cloud Browser client receives two input streams that are directly decoded and overlaid by the hardware Decoders. The encrypted video stream is processed by the Descrambler first, in cases where an encryption scheme is supported by the Cloud Browser client.

Operation Scope

Considered devices: legacy devices, low-end and low-power devices with extended hardware functions (e.g. DSPs).

In this approach the functionality of the Cloud Browser client is the overlaying of the UI and video streams at the hardware level. The remaining functionality is taken over by the Cloud Browser. As in the Single Stream Cloud Player approach, the Cloud Browser must adapt the coding and scrambling of the output data to the available hardware of the Cloud Browser client. The Runtime Environment of the Cloud Browser client is reduced to the support of legacy hardware functions.

This approach could be used if the Cloud Browser client is not able to play back the out-of-band media. The video stream in this case might rather be a picture sequence. This approach also simplifies the client decryption functions. Therefore, it is used in cases where the Cloud Browser client does not support the source encryption. Here the Cloud Environment takes over the decryption.

Functions

The functions of the Cloud Browser architecture are executed by the Orchestration, the Cloud Browser and the Cloud Browser client. The high-level description of the functions is presented below:

Orchestration
  • manage the cloud browser
  • manage sessions
  • provide the cloud browser stream
Cloud Browser
  • No specific functionality (should act as any other browser)
Cloud Browser client
  • receive the cloud browser stream
  • expose client capabilities (Cloud Browser API)

Description of Functions

The detailed description of the functions within the Cloud Browser architecture is presented in the Table below.

Functionality: UI Execution
Description: Parsing data and executing Web applications such as TV portal pages, EPG or other TV-related applications in JavaScript.
Execution layer: Software

Functionality: Composition
Description: Composing the User Interface (including applications) with video, fusing two streams into one picture that is displayed on a user’s end device.
  • Software composition of UI with video
  • Hardware composition made by dedicated digital signal processors (DSPs)

Functionality: Rendering
Execution layer: Software

Functionality: Fetching
Description:
  • For UI: CSS, JavaScript, images, videos (if required)
  • For video: Video data
Execution layer: Software

Functionality: Decoding/ encoding
Description: Applying video compression techniques to encode or decode the streams:
  • Video stream
  • UI stream
Execution layer: Software or hardware (dedicated hardware decoders/ encoders)

Functionality: UI capturing
Description: Creating a stream or a picture sequence out of the UI rendering.
Execution layer: Software

Functionality: Transcoding
Description: Conversion between encodings.
Execution layer: Software

Functionality: Encryption/ decryption
Description: Applying content protection techniques to encrypt or decrypt the content:
  • Video content
  • UI content (UI stream in case it contains video scenes that need to be encrypted)
Execution layer: Software or hardware (dedicated security processors)

Functionality: Networking
Description: TCP/IP functionality.
Execution layer: Software or hardware

Functionality: Input processing
Description: Translating user input into HTTP requests.
Execution layer: Software

Functionality: Channel tuning
Description: Switching or zapping between TV channels.
Execution layer: Software

Functionality: Session management
Description: Management of sessions for different cloud browser clients.
Execution layer: Software
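
As an illustration of the input processing function, the sketch below turns a key press into an HTTP request object without sending it. The endpoint URL, the JSON payload fields and the function name are hypothetical; the source does not define a URL scheme or a message format. Only Python's standard `urllib.request.Request` is used.

```python
import json
from urllib.request import Request

# Hypothetical endpoint; the source does not define a URL scheme or payload format.
ENDPOINT = "https://cloud-browser.example/input"


def key_press_to_request(session_id: str, key: str) -> Request:
    """Build (but do not send) an HTTP POST describing one key press."""
    body = json.dumps({"session": session_id, "type": "keydown", "key": key})
    return Request(
        ENDPOINT,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = key_press_to_request("session-1", "ArrowDown")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Carrying the session id in every request lets the session management function associate the input with the right cloud browser instance.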

Description of Functions Main/ Primary Approaches

Single Stream Cloud Player Approach Functions

sw: software, hw: hardware

Approach: Single Stream Cloud Player

Cloud environment
  • Video rendering
  • Decoding/ encoding (sw)
  • Transcoding
  • UI capturing
  • Decryption/ encryption (sw)
  • Channel tuning
  • Session mgmt
Cloud Browser
  • Execution
  • Composition (sw)
  • UI Rendering
  • Fetching (video + UI data)
Cloud Browser client
  • Decoding (hw)
  • Decryption (hw)
  • Networking (hw, sw)
  • Input processing (sw)

Double Stream Local Player Approach Functions

sw: software, hw: hardware

Approach: Double Stream Local Player

Cloud environment
  • UI capturing/ UI encoding
  • Session mgmt
  • (Optional: if the UI stream contains in-band media that needs to be protected, UI encryption might also be applied. In that case the video decryption function might also be required.)
Cloud Browser
  • Execution
  • UI Rendering
  • Fetching (UI data)
Cloud Browser client
  • Composition (sw)
  • Video rendering (sw)
  • Fetching (video data)
  • Decoding (sw)
  • Transcoding (if required)
  • Decryption (sw)
  • Networking (sw)
  • Input processing (sw)

Description of Functions Secondary Approaches

Single Stream Local Player Approach Functions

sw: software, hw: hardware

Approach: Single Stream Local Player

Cloud environment
  • Video rendering
  • Decoding/ encoding (sw)
  • Transcoding
  • UI capturing
  • Decryption/ encryption (sw)
  • Channel tuning
  • Session mgmt
Cloud Browser
  • Execution
  • Composition (sw)
  • UI Rendering
  • Fetching (video + UI data)
Cloud Browser client
  • Decoding (sw)
  • Decryption (sw)
  • Networking (sw)
  • Input processing (sw)

Double Stream Cloud Player Approach Functions

sw: software, hw: hardware

Approach: Double Stream Cloud Player

Cloud environment
  • Video rendering
  • Decoding/ encoding (sw)
  • Transcoding
  • UI capturing
  • Decryption/ encryption (sw)
  • Channel tuning
  • Session mgmt
Cloud Browser
  • Execution
  • Composition (sw)
  • UI Rendering
  • Fetching (video + UI data)
Cloud Browser client
  • Composition (hw, e.g. two dedicated DSPs)
  • Decoding (hw)
  • Decryption (hw)
  • Networking (hw or sw)
  • Input processing (sw)
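
The client-side entries of the four approach tables can be encoded as data and queried, for example to see which approaches expect hardware decoding on the client. The sketch below transcribes only the composition, decoding and decryption entries; the dictionary layout and the `approaches_where` helper are assumptions made for this example.

```python
# Client-side layer per function, transcribed from the four approach tables.
# "hw" = hardware, "sw" = software; the dictionary layout is an assumption.
CLIENT_LAYERS = {
    "Single Stream Cloud Player": {"decoding": "hw", "decryption": "hw"},
    "Double Stream Local Player": {
        "composition": "sw", "decoding": "sw", "decryption": "sw",
    },
    "Single Stream Local Player": {"decoding": "sw", "decryption": "sw"},
    "Double Stream Cloud Player": {
        "composition": "hw", "decoding": "hw", "decryption": "hw",
    },
}


def approaches_where(function: str, layer: str) -> list[str]:
    """Return the approaches whose client runs `function` on `layer`."""
    return sorted(
        name
        for name, functions in CLIENT_LAYERS.items()
        if functions.get(function) == layer
    )


print(approaches_where("decoding", "hw"))
# ['Double Stream Cloud Player', 'Single Stream Cloud Player']
print(approaches_where("composition", "sw"))
# ['Double Stream Local Player']
```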

Architectural Gaps

File:Cloud-browser-apis.pdf
