Here is my position paper on my current work at Fraunhofer FOKUS. (I have not included anything about the work in the mCDN project: with regard to metadata, that project is simply a user of the TV Anytime standard and is therefore less relevant to the scope of the workshop.)

Content Management for the Digital Home

Fraunhofer FOKUS is developing a Content Management System for use in home storage devices. While Content Management Systems are widely used in business contexts, they are rarely used for multimedia data in the home. This will change significantly as more and more digital content enters private homes. Today, home users mainly store songs for their MP3 players and pictures from their digital cameras, but in the near future they will have large collections of digital video as well. Future generations of PVRs will provide storage capacities measured in terabytes. The more storage is available, the more content the user will store, and the more difficult it becomes to find a particular item. A Content Management System for the Digital Home can solve this problem.

Finding a piece of content depends on identifying the specific item, which means that the item has to be categorized. For movies and songs, a number of services available today provide detailed metadata (e.g. the Internet Movie Database, IMDb, for movies and CDDB for music). To be of real use for managing large amounts of content, however, this metadata has to be stored together with the related content, and the two must always be presented to the user as one coherent item.

Metadata may come from different sources. These include not only third-party service providers and data delivered directly as part of the content, but also the users themselves, who want to add their own metadata to third-party content (e.g. ratings, comments) and to categorize their home-grown content (mainly videos and pictures). "Home-grown" content such as holiday videos or images from a digital camera initially does not include any metadata beyond technical details. Adding meaningful metadata to a large amount of content by hand is not practical. Imagine, for example, a collection of several thousand digital photographs. Typically, camera and desktop software catalogue this content just by naming the photos in sequence, perhaps placing them in folders according to the date on which they were taken. Searching for a particular photo in this scheme is inconvenient. A system that provides better categorization support, e.g. a mechanism for grouping similar-looking pictures, would help the user add meaningful metadata to a whole batch of pictures at a time.
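As an illustration of such a grouping mechanism, the following minimal sketch groups photos by a simple average hash and then annotates a whole group in one step. It is not a description of our implementation. The tiny 2x2 grayscale thumbnails, the file names, the distance threshold and the function names are all assumptions made for the example; a real system would work on decoded, downscaled images and would probably use a more robust similarity measure.

```python
from typing import Dict, List, Sequence

# A thumbnail is a small grayscale grid with values 0..255 (assumed input format).
Thumbnail = Sequence[Sequence[int]]


def average_hash(thumb: Thumbnail) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the mean, else 0."""
    pixels = [p for row in thumb for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def group_similar(thumbs: Dict[str, Thumbnail], max_distance: int = 1) -> List[List[str]]:
    """Greedy grouping: a photo joins the first group whose first member is close enough."""
    hashes = {name: average_hash(t) for name, t in thumbs.items()}
    groups: List[List[str]] = []
    for name in thumbs:
        for group in groups:
            if hamming(hashes[name], hashes[group[0]]) <= max_distance:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def annotate_group(store: Dict[str, Dict[str, str]], names: List[str], **fields: str) -> None:
    """Attach the same user-supplied metadata to every photo in a group."""
    for name in names:
        store.setdefault(name, {}).update(fields)


# Hypothetical thumbnails: two shots with a bright top half, one with a bright right half.
THUMBS = {
    "IMG_0001": [[200, 210], [20, 30]],
    "IMG_0002": [[190, 220], [25, 35]],
    "IMG_0003": [[10, 240], [15, 230]],
}

groups = group_similar(THUMBS, max_distance=1)
metadata: Dict[str, Dict[str, str]] = {}
annotate_group(metadata, groups[0], event="Holiday", place="Beach")
```

With these made-up thumbnails the first two photos end up in one group and the third in a group of its own, so a single annotation step labels both beach shots.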

Historically, file systems were designed to retrieve content based only on the name of the file. The only metadata available was a minimal set of technical data, such as access rights and creation and modification dates. Such a file system is not flexible enough to store and retrieve data efficiently in a future home environment. Our goal is therefore to develop a content management system that can retrieve, store and index metadata in order to handle large amounts of multimedia content.

Importing all metadata from every available description and annotation source into a database as name/value pairs seems to be a feasible solution at first, but it has significant drawbacks. PVRs and other home entertainment devices usually lack the computing power to provide acceptable performance when operating on essentially unstructured data. In addition, two problems remain: identical tags are used for different metadata in different description schemes, and different tags are used for essentially the same information. Consequently, only a subset of this metadata can be handled in a meaningful way.
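The tag problem can be made concrete with a small sketch. It flattens two records into name/value triples and queries them by tag name. The item ids and values are made up, and the store layout is a hypothetical stand-in for a generic name/value table; the tag names follow the ID3v1 fields and the EXIF tags Artist, ImageDescription and DateTimeOriginal.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (item id, tag name, value)


def flatten(item_id: str, tags: Dict[str, str]) -> List[Triple]:
    """Store every tag as an unstructured name/value pair (names lower-cased)."""
    return [(item_id, name.lower(), value) for name, value in tags.items()]


def items_with_tag(store: List[Triple], tag: str) -> List[str]:
    """Return the ids of all items that carry the given tag name."""
    return sorted({item for item, name, _ in store if name == tag})


# Hypothetical records; all values are invented for the example.
store = (
    flatten("song-1", {"artist": "Some Band", "title": "Some Song", "year": "2003"})
    + flatten("photo-1", {
        "Artist": "Jane Doe",                      # the photographer, not a performer
        "ImageDescription": "Beach at sunset",     # plays the role of a title
        "DateTimeOriginal": "2004:07:15 18:30:00",
    })
)

# Same tag, different meaning: performer and photographer are mixed together.
same_name = items_with_tag(store, "artist")    # ['photo-1', 'song-1']

# Different tags, same meaning: the photo's description is not found as a title.
by_title = items_with_tag(store, "title")      # ['song-1']
```

A query on "artist" returns both items although the tag means something different in each scheme, while a query on "title" misses the photo altogether.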

This subset has to fulfil the following criteria:

The data has to be relevant for user-initiated search and retrieval of content
The metadata format has to be able to unify metadata from user-supplied sources (e.g. annotated images) and commercial sources (e.g. purchased audio files)
The metadata format has to cover audio, image and video content
A meaningful conversion or mapping from existing description formats to the metadata description format in the content management must be possible
Metadata can come from description formats that are embedded in the content data (e.g. EXIF or MP3 ID3 tags) or from external sources (e.g. CDDB to describe the content of a CD)
Metadata is required not only to describe single content items, but increasingly also to describe combinations of content items (e.g. playlists, slideshows, scenes cut together from different digital home video files)
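To show what these criteria imply for a data model, here is a minimal sketch of a record type for single items and a collection type for combinations of items. It is an assumption-laden illustration rather than a proposed format: the class names, the three descriptive fields (title, creator, created) and the origin categories are hypothetical and chosen only to keep the example short.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class MediaType(Enum):
    AUDIO = "audio"
    IMAGE = "image"
    VIDEO = "video"


class Origin(Enum):
    EMBEDDED = "embedded"  # e.g. EXIF or ID3 tags inside the file
    EXTERNAL = "external"  # e.g. a lookup service such as CDDB
    USER = "user"          # entered by the user


@dataclass
class ContentItem:
    """One audio, image or video item with a small fixed set of fields."""
    item_id: str
    media_type: MediaType
    title: str = ""
    creator: str = ""
    created: str = ""
    origins: Dict[str, Origin] = field(default_factory=dict)

    def set_field(self, name: str, value: str, origin: Origin) -> None:
        """Set a descriptive field and remember where the value came from."""
        if name not in ("title", "creator", "created"):
            raise KeyError(name)
        setattr(self, name, value)
        self.origins[name] = origin


@dataclass
class Segment:
    """A reference to an item, optionally restricted to a time range in seconds."""
    item_id: str
    start_s: float = 0.0
    end_s: Optional[float] = None  # None means "up to the end of the item"


@dataclass
class Collection:
    """A playlist, slideshow or cut made of segments from several items."""
    collection_id: str
    title: str
    segments: List[Segment] = field(default_factory=list)

    def media_types(self, library: Dict[str, ContentItem]) -> Set[MediaType]:
        return {library[s.item_id].media_type for s in self.segments}


# Hypothetical library and a cut that mixes two videos and one photo.
library = {
    "vid-1": ContentItem("vid-1", MediaType.VIDEO),
    "vid-2": ContentItem("vid-2", MediaType.VIDEO),
    "img-1": ContentItem("img-1", MediaType.IMAGE),
}
library["img-1"].set_field("title", "Beach at sunset", Origin.USER)
holiday_cut = Collection(
    "col-1",
    "Holiday highlights",
    [Segment("vid-1", 12.0, 47.5), Segment("img-1"), Segment("vid-2", 0.0, 30.0)],
)
```

Keeping the collection as a list of references means the combined item carries its own metadata without duplicating that of the underlying files.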

Our goal is to develop a content management system that uses a small but meaningful set of metadata which embedded systems can handle. Metadata formats in common use, such as ID3 and EXIF, will be converted to a common metadata format that forms the basis of our system. Our reason for participating in this workshop is to evaluate existing standards and formats and to gather ideas for this common metadata format.
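The conversion step might look roughly like the following sketch, which maps a few ID3v2 frames and EXIF tags onto a small common field set and drops everything else. The common field names, the choice of mapped tags and the date normalization are assumptions made for illustration; the actual common format is exactly what still has to be defined.

```python
from typing import Dict

# Assumed mapping from source tags to hypothetical common field names.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "id3": {"TIT2": "title", "TPE1": "creator", "TDRC": "created"},
    "exif": {
        "ImageDescription": "title",
        "Artist": "creator",
        "DateTimeOriginal": "created",
    },
}


def normalise_exif_date(value: str) -> str:
    """Turn 'YYYY:MM:DD HH:MM:SS' into 'YYYY-MM-DDTHH:MM:SS'."""
    date, _, time = value.partition(" ")
    date = date.replace(":", "-")
    return f"{date}T{time}" if time else date


def to_common(source_format: str, tags: Dict[str, str]) -> Dict[str, str]:
    """Keep only the mapped tags and rename them to the common field names."""
    mapping = FIELD_MAP[source_format]
    record: Dict[str, str] = {}
    for name, value in tags.items():
        if name not in mapping:
            continue  # unmapped tags are deliberately dropped
        if source_format == "exif" and name == "DateTimeOriginal":
            value = normalise_exif_date(value)
        record[mapping[name]] = value
    return record


# Hypothetical input; all values are invented for the example.
photo = to_common("exif", {
    "Artist": "Jane Doe",
    "ImageDescription": "Beach at sunset",
    "DateTimeOriginal": "2004:07:15 18:30:00",
    "FNumber": "5.6",  # technical detail with no common field, so it is dropped
})
song = to_common("id3", {"TIT2": "Some Song", "TPE1": "Some Band", "TDRC": "2003"})
```

After conversion, both records answer the same queries on title, creator and created, which is what lets a small embedded database index audio and image content uniformly.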

Christian Fuhrhop
Fraunhofer FOKUS
Kaiserin-Augusta-Allee 31
12043 Berlin
Germany

fuhrhop@fokus.fhg.de