Strategic thinking for Web Video

Franck Denoual, Hervé Ruellan

Introduction

To make video a first-class object of the World Wide Web, video content should be highly configurable: easy to insert inside web pages (video widgets, personal videos), easy to exchange at a good quality level, easy to interact with (random access) and customizable in terms of display (keeping a clear separation between content and display, as in XHTML). This requires looking at the technical fields listed below.

Video Sharing

Sharing video is now very popular, as shown by the success of “YouTube” or, more recently, “Yahoo! Videocast”. It enables a user to instantly inform many friends about an interesting event. Even though such applications are really successful, they suffer from several drawbacks:

The user has no control over the final video quality; the content is often delivered at poor quality in Flash format, with only zoom and loop capabilities.
The user may need to watch all the content (and sometimes wait for it) to get to the part she is really interested in.

Enabling enhanced controls on the video will become a must-have feature: for example, using multiple reference links to the video in order to skip to specific parts by clicking on keywords or on part of the description of the video. The HTML5 <video> element enables the following functionalities:

Video basic control (play, pause, forward, backward)
Video advanced control (fast forward, jump to location…)
Video characteristic retrieval (length, dimensions…)
Video display control (size, quality, sound level…)
Video display advanced management (video as background, transparency, clipping videos…)

These functionalities should be accessible either directly from HTML (at least for the most “common” ones) or through JavaScript.

Video Tagging

Video tagging links video sharing and video searching. Tagging is a well-known web tool and could be adapted to video by taking the temporal axis into account: a user could set tags on portions of the content rather than on the whole content. This leads to new types of tags, dedicated not only to retrieving content but also to giving access to a specific part of it. For example, the SMIL 2.0 “metadata” element could be used to divide a video into sections, each section being described in RDF and Dublin Core. This description could contain a start time and an end time so that the section is directly accessible. A “skip” functionality in SMIL would then make it possible to start the video at the desired time. Automatic tag generation may also be possible if the video stream contains metadata. Then, when a video is uploaded, a text description could be immediately available for update or improvement.
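As an illustration of tags that carry a time range, the following minimal sketch models such tags as plain records and looks them up by keyword or by playback position. It does not use SMIL or RDF; the names TimedTag, find_sections and tags_at, as well as the sample tags, are hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedTag:
    """A tag attached to a time range of a video (times in seconds)."""
    label: str
    start: float
    end: float


def find_sections(tags, keyword):
    """Return the (start, end) ranges whose label contains the keyword, sorted by start."""
    needle = keyword.lower()
    matches = [(t.start, t.end) for t in tags if needle in t.label.lower()]
    return sorted(matches)


def tags_at(tags, position):
    """Return the labels of all tags covering the given playback position."""
    return [t.label for t in tags if t.start <= position < t.end]


# Hypothetical tags set by users on one video.
tags = [
    TimedTag("opening credits", 0.0, 45.0),
    TimedTag("interview", 45.0, 310.0),
    TimedTag("concert", 310.0, 600.0),
    TimedTag("interview", 600.0, 720.0),
]

# A player could seek to the first matching section.
sections = find_sections(tags, "interview")
first_start = sections[0][0] if sections else None
```

With such a structure, clicking on a keyword amounts to calling find_sections and seeking to the returned start time, while tags_at gives the tags to display for the part currently being played.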

Video Searching

Before viewing a video, a user needs to access it. If the user does not have the direct video URI, he must use a search engine. However, a search engine can currently extract very little information from the video itself (as is the case with most images). Therefore, videos need to be enhanced with metadata describing their content. This metadata includes:

Information about the video makers (actors, technical staff…). This answers the search “video starring Monica Bellucci”.
Information about the video content (subject, type of film…). This answers the search “video about mystery”.
Detailed information about each sequence in the video (sequence name, sequence subject, actors, sub-titles, dialogs…). This answers the search “sequence of the video where the suspect dies”.

For many “commercial” videos, part of this metadata already exists on the Web. For example, information about the video makers is already described in pages linked to each film. For all videos, part of this metadata is present directly in the video, but in non-textual form: the dialogs, for example, are contained in the video, but as an audio component. Links to video components already exist. Interoperable web links from video components would add a significant benefit to video searching.
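The three levels of metadata listed above (makers, content, sequences) can be sketched as a small searchable catalog. This is only an illustration under simple assumptions: the catalog layout, the search function and the sample entry are hypothetical, and matching is a naive case-insensitive substring test rather than what a real search engine would do.

```python
def search(catalog, query):
    """Return (title, start_seconds) hits for a query.

    start_seconds is None when the whole video matches (makers or subjects)
    and holds the sequence start time when a single sequence matches.
    Every query word must appear as a substring of the searched text.
    """
    words = query.lower().split()
    hits = []
    for video in catalog:
        video_text = " ".join(video["makers"] + video["subjects"]).lower()
        if all(w in video_text for w in words):
            hits.append((video["title"], None))
        for seq in video["sequences"]:
            if all(w in seq["description"].lower() for w in words):
                hits.append((video["title"], seq["start"]))
    return hits


# Hypothetical catalog entry; names and times are invented for the example.
catalog = [
    {
        "title": "Example Mystery",
        "makers": ["Jane Doe", "John Smith"],
        "subjects": ["mystery", "thriller"],
        "sequences": [
            {"start": 0, "description": "The detective arrives"},
            {"start": 1800, "description": "The suspect dies"},
        ],
    },
]

by_maker = search(catalog, "jane doe")
by_subject = search(catalog, "mystery")
by_sequence = search(catalog, "suspect dies")
```

A sequence-level hit carries a start time, so the result can link directly to the relevant part of the video instead of to its beginning.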

Video anywhere

A web page embedding video should also cope with device heterogeneity. To guarantee the success of video on the Web, mobile users should not have to wait too long when accessing a web page with video. The web server therefore needs to take device capabilities into account. Since CC/PP profiles are described in RDF, we can imagine tagging the video content using CC/PP profiles in order to check the compatibility between a piece of content and a client device. Negotiation between client and server could also leverage HTTP negotiation features. Information on the video format in use and on its scalability properties could be carried by these specific tags.
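The compatibility check between content and device can be sketched as a simple selection among several encodings of the same video. Plain dictionaries stand in here for CC/PP profiles and for the tags on the video; the field names, the pick_variant function and the sample values are assumptions made for the example, not part of CC/PP or HTTP.

```python
def pick_variant(variants, device):
    """Pick the highest-bitrate variant the device can decode, display and download.

    Returns None when no variant is compatible with the device.
    """
    usable = [
        v for v in variants
        if v["codec"] in device["codecs"]
        and v["width"] <= device["screen_width"]
        and v["bitrate_kbps"] <= device["bandwidth_kbps"]
    ]
    if not usable:
        return None
    return max(usable, key=lambda v: v["bitrate_kbps"])


# Hypothetical encodings of one video, as they might be described by tags.
variants = [
    {"name": "low", "codec": "h264", "width": 320, "bitrate_kbps": 300},
    {"name": "medium", "codec": "h264", "width": 640, "bitrate_kbps": 1000},
    {"name": "high", "codec": "h264", "width": 1280, "bitrate_kbps": 3000},
]

# Hypothetical device profiles.
phone = {"codecs": {"h264"}, "screen_width": 480, "bandwidth_kbps": 500}
desktop = {"codecs": {"h264", "theora"}, "screen_width": 1920, "bandwidth_kbps": 5000}
legacy = {"codecs": {"theora"}, "screen_width": 1024, "bandwidth_kbps": 2000}

phone_choice = pick_variant(variants, phone)
desktop_choice = pick_variant(variants, desktop)
legacy_choice = pick_variant(variants, legacy)
```

Returning None for an incompatible device lets the server fall back to another representation, such as a still image or a text description, rather than sending a video the device cannot play.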

Video for everyone (accessibility)

When integrating video into the Web, accessibility must be kept in mind. The main question is how to help disabled users get the best possible experience from the video, or at least how to avoid denying them access to the information contained in the video-enhanced web page. Accessibility features include:

Sub-titles for hearing-impaired users.
Scene description for visually-impaired users.
Compatibility with devices and technologies used by disabled users (e.g., a video must not prevent a visually-impaired user from listening to the text-to-speech rendering of the web page).

Video Authoring

In order to share his videos later, a user may add metadata or tags to them directly from his camera or mobile phone. A common metadata vocabulary has to be used on both the user side and the web site side. The vocabulary description could be downloaded onto the user's device from his favorite web site.

Conclusion

As devices become more powerful, embed more and more functionalities and provide continuous connectivity, users want to share live content with their friends and relatives.

It should be a goal for the W3C to promote the technologies through guidelines and standards that enable developers and users to create enriched content and to facilitate its sharing/retrieval.