W3C Video Workshop Position Paper

Drew Major

Metadata For User Generated Video

User generated Internet video is expected to keep growing at an accelerating pace, driven by wider broadband availability, ease of creation and interest.  User generated video will be stored and published both from devices in the home and through services and sites on the Internet.  What hasn’t emerged yet is a uniform method of identifying, classifying and finding video.  Today such data is usually kept only within the particular site on which the video is hosted.  A standard or set of standards is needed to enable finding and accessing user generated video across the entire Internet.

The TV-Anytime Forum has tackled many of these issues, though from the perspective of exploiting personal media storage.  Much of that work can be applied to this somewhat different problem of finding and dealing with user video potentially being published everywhere.

There are a number of aspects of Internet video for which it will be hard today to create a standard that could be broadly accepted in the near term.  Different technologies and approaches are in use, with no clear winner.  Most video web sites rely on proprietary clients implementing proprietary delivery protocols and different codecs (most of which require royalties).  But the whole area of identifying, classifying and finding video is as yet largely untouched and needs a solution, one that would be useful even while other aspects of video remain unstandardized.

Simplistic Use Scenarios

A user decides to store / archive the personal home videos that he shot and edited.  He wants to make all of it available to close friends and relatives.  Some of it (say soccer matches or school performances) he wants to make available to anyone.

The video is uploaded either to a home video storage / publishing device or to an Internet service.  Part of the publishing process involves identification (participants, activity or maybe time / GPS coordinates / location) of various clips within the video.  Access control is also set, but is probably used and enforced only at the publishing point.  All of this is transformed into metadata that, depending on type, is either embedded within the video or externally associated with the video.  A browser executable access path for the video is created (this path can be shared by the user so that others may directly access the video).
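As a rough illustration of what such externally associated metadata might look like, the sketch below models one published video and its identified clips.  This is not a proposed format: the field names, the choice of JSON, and the example URL are all assumptions made for illustration only.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional, Tuple


@dataclass
class Clip:
    """One identified segment of a video (all field names are hypothetical)."""
    start_s: float
    end_s: float
    participants: List[str] = field(default_factory=list)
    activity: str = ""
    location: Optional[Tuple[float, float]] = None  # (latitude, longitude)


@dataclass
class VideoRecord:
    access_path: str  # browser executable path the user can share
    visibility: str   # "public" or "restricted"; enforced at the publishing point
    clips: List[Clip] = field(default_factory=list)


def to_external_metadata(record: VideoRecord) -> str:
    """Serialize the record as JSON so it can be stored alongside the video."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical example: a publicly shared soccer match with one identified clip.
record = VideoRecord(
    access_path="https://video.example/home/soccer-final",
    visibility="public",
    clips=[Clip(0.0, 95.5, ["Alex"], "soccer match", (40.76, -111.89))],
)
metadata_json = to_external_metadata(record)
```

A crawler would only need the serialized record and the access path; the access decision itself stays with the publishing point.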

A video search engine uses a web crawler to discover the video and associated metadata and access path.

A neighbor wants to find out if there are videos available of a certain school performance.  A search is made, the video is identified, and access is specified via the path (which may invoke a proprietary video player).

A relative of the user wants to see all of the videos he is in.  A search is made by his name and / or other identifiers and the videos are found.  Upon access a login could be required, which may require the relative to contact the user for access.
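The search scenarios above can be sketched as a lookup over crawled metadata.  This is a minimal sketch under stated assumptions: the index layout, the field names, and the rule that anything not marked "public" requires a login are hypothetical, and a real search engine would of course not scan a list linearly.

```python
from typing import Dict, List, Tuple


def find_videos(index: List[Dict], person: str) -> List[Tuple[str, bool]]:
    """Return (access_path, login_required) for each video that lists `person`."""
    wanted = person.casefold()
    hits = []
    for video in index:
        # Collect every participant named in any clip of this video.
        names = {
            name.casefold()
            for clip in video["clips"]
            for name in clip["participants"]
        }
        if wanted in names:
            # Assumption: the crawler only sees a visibility flag; the login
            # itself is enforced by the publishing point when the path is used.
            login_required = video["visibility"] != "public"
            hits.append((video["access_path"], login_required))
    return hits


# Hypothetical crawled index with one public and one restricted video.
index = [
    {
        "access_path": "https://video.example/home/soccer-final",
        "visibility": "public",
        "clips": [{"participants": ["Alex", "Sam"]}],
    },
    {
        "access_path": "https://video.example/home/birthday",
        "visibility": "restricted",
        "clips": [{"participants": ["Sam"]}],
    },
]
results = find_videos(index, "sam")
```

Here the search returns both videos for the relative, flagging the second as one where he would have to log in or contact the user for access.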


Video (and other media like pictures and audio) is inherently more difficult to search than text.  Moreover, in the future more people may be inclined to upload video (and pictures) to the Internet for purposes of archival and access than to publish using text (it is harder to write than to just take pictures or shoot video).  All of this points to a need for a universal, standard way of characterizing, identifying and finding video.