
Authors: SusanneBoll

Use Case: Photo Use Case


1. Introduction

Currently, we are facing a market in which more than 20 billion digital photos are taken per year in Europe alone, for example (GFK, 2006). At the same time, the number of tools, both on the desktop and on the Web, that perform automatic as well as manual annotation of photo content is increasing. For example, many personal photo management tools extract information from the so-called EXIF header (EXIF) and add it to the photo description. These tools typically allow single photos to be tagged and described. There are also many Web tools for uploading photos in order to share, organize and annotate them. Sites such as Flickr (Flickr, 2007) enable tagging on a large scale. Sites like Riya (Riya, 2007) provide specific services such as face detection and face recognition for personal photo collections. Photo community sites such as (Foto Community, 2007) organize photos in categories and allow rating and commenting on them. Even though more and more tools exist to manage and share our photos, these tools come with different capabilities. What remains difficult is finding, sharing and reusing photo collections across the borders of tools and sites. Not only does the way in which photos are automatically and manually annotated differ, but the way in which this metadata is described and represented also follows many different standards. The starting point for managing personal photo collections is the semantic understanding of the photos.
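
As a rough illustration of the first step such tools perform, the following Python sketch reads the EXIF header of a photo and turns the raw tag IDs into human-readable names. It assumes the Pillow imaging library is available; the file name dsc5881.jpg is simply the example file name used later in this document.

```python
from PIL import Image, ExifTags

def read_exif(path):
    """Return the photo's EXIF header as a dict of readable tag names to values."""
    img = Image.open(path)
    exif = img.getexif()  # empty if the file carries no EXIF header
    return {ExifTags.TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

if __name__ == "__main__":
    for name, value in read_exif("dsc5881.jpg").items():
        print(f"{name}: {value}")
```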

2. Motivating Examples

From the perspective of an end user, let us consider the following scenario to describe what is missing and needed for next-generation digital photo services. Ellen Scott and her family were on a nice two-week vacation in Tuscany. They enjoyed the sun at the beaches of the Mediterranean, appreciated the great culture in Florence, Siena and Pisa, and traveled on the traces of the Etruscans through the small villages of the Maremma. During their marvelous trip, the family took pictures of the sightseeing spots, the landscapes and, of course, of the family members. The digital camera they use is already equipped with a GPS receiver, so every photo is stamped not only with the time when, but also with the geo-location where it has been taken.

2.1. Photo annotation and selection

Back home, the family uploads about 1000 pictures from the camera to the computer and wants to create an album for granddad. On this computer, the family uses a nice photo management tool which both extracts basic features such as the EXIF header and allows for entering tags and personal descriptions. Still filled with the memories of the nice trip, the mother of the family labels most of the photos. With a second tool, the track of the GPS receiver and the photos are merged using the time stamps. As a result, each photo is geo-referenced with the GPS position stored in its EXIF header. However, showing all the photos would take an entire weekend. So Ellen starts to create a nice and interesting excerpt of their trip and its highlights. Her photo album software takes in the 1000 pictures and makes suggestions for the selection and arrangement of the pictures in a photo album. For example, the album software shows her a map of Tuscany, visualises where she has taken which photos, and groups them together, suggesting which photos would best represent this part of the vacation. For places where the software detects highlights, the system offers to add information about the place to the album, stating, for instance, that on this piazza in front of the Palazzo Vecchio there is a copy of Michelangelo's famous David statue. Depending on the selected style, the software creates a layout and distributes all images over the pages of the album, taking into account color, spatial and temporal clusters and template preferences. So, in about 20 minutes Ellen has finished the album and orders a paper version as well as an online version. The paper album is delivered to her by mail three days later. It looks great, and the explanatory texts that her software has added almost automatically to the pictures are informative and help her remember the great vacation. They show the album to grandpa, and he can take his time to study their vacation and wonderful Tuscany.
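
The merging step described above, geo-referencing each photo by matching its time stamp against the GPS track, can be sketched as follows. This is a minimal illustration rather than the actual tool from the scenario; the track points and photo time are made-up values.

```python
from bisect import bisect_left
from datetime import datetime

# Made-up GPS track: (timestamp, latitude, longitude) tuples, sorted by time.
track = [
    (datetime(2007, 7, 14, 10, 2), 43.7696, 11.2558),   # Florence
    (datetime(2007, 7, 14, 15, 40), 43.3188, 11.3308),  # Siena
]

def geo_reference(photo_time, track):
    """Return the track point closest in time to the photo's EXIF time stamp."""
    times = [t for t, _, _ in track]
    i = bisect_left(times, photo_time)
    candidates = track[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda p: abs((p[0] - photo_time).total_seconds()))

# The resulting latitude/longitude would then be written back into the photo's EXIF header.
print(geo_reference(datetime(2007, 7, 14, 11, 30), track))
```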

2.2. Exchanging and sharing photos

Selecting the most impressive photos, the son of the family uploads a nice set to Flickr to give his friends an impression of the great vacation. However, all the descriptions and annotations from the personal photo management system are lost after the Web upload. Therefore, he adds a few tags of his own to the Flickr photos to describe the places, events and persons of the trip. Even the GPS track is lost, so he places the photos once again on the Flickr map application to geo-reference them. One friend finds a cool picture of the Spanish Steps in Rome by night and would like to get the photo and its location from Flickr. This is difficult again, as a pure download of the photo does not retain the geo-location. When aunt Mary visits the Web album and starts looking at the photos, she tries to download a few onto her laptop to integrate them into her own photo management software, as she would like to incorporate some of the pictures of her nieces and nephews into her own collection. And again, the system imports the photos, but the precious metadata that mother and son of the family have already annotated twice is gone.

3. The fundamental problem: semantic content understanding

What is needed is a better and more effective automatic annotation of digital photos that better reflects one's personal memory of the events captured by the photos and allows different applications to create value-added services on top of them, such as the creation of a personal photo album book. When it comes to understanding personal photos and overcoming the semantic gap, digital cameras leave us with files like dsc5881.jpg, a very poor reflection of the actual event. It is a 2D visual snapshot of a multi-sensory personal experience. The quality of the photos is often very limited (snapshots, overexposed, blurred, ...). On the other hand, digital photos come with a large potential for semantic understanding. Photographs are always taken in context. In comparison to analog photography, digital photos provide us with explicit contextual information (time, flash, aperture, ...), and a "unique id" such as the timestamp allows this contextual information to be merged later with the pure image content.

However, what we want to remember along with the photo is where it was taken, who was there with us, what can be seen in the photo, what the weather was like, whether we liked the event, and so on. In recent years, it has become clear that signal analysis alone will not be the solution. In combination with the context of the photo, such as the GPS position or time stamp, some hard signal processing problems can be solved better. Context analysis has therefore gained much attention and has become important and very helpful for photo understanding (Scherp et al 2007). The following figure gives a simple example of how signal analysis and context analysis can be combined to achieve better indoor/outdoor detection of photos. Moreover, not only since the advent of Web 2.0 has the actual user come into focus: both the manual effort of individual user annotations and collaborative effects are considered important for semantic photo understanding.

[Figure: Combining signal analysis and context analysis for multimodal indoor/outdoor detection]
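
As a purely illustrative sketch of such a multimodal combination (not the actual method behind the figure), the following Python code fuses a crude signal cue, the mean image brightness, with two context cues taken from the EXIF header, flash usage and time of day. The weights are arbitrary and chosen only for illustration; Pillow is assumed to be available.

```python
from datetime import datetime
from PIL import Image

def outdoor_score(path, flash_fired, local_time):
    """Fuse a crude signal cue (mean brightness) with context cues
    (flash usage, time of day) into a single outdoor-likelihood score in [0, 1]."""
    img = Image.open(path).convert("L").resize((64, 64))
    brightness = sum(img.getdata()) / (64 * 64 * 255)  # 0 = dark, 1 = bright

    signal_cue = brightness                     # bright scenes tend to be outdoors
    flash_cue = 0.2 if flash_fired else 0.8     # a firing flash suggests an indoor scene
    daylight_cue = 1.0 if 8 <= local_time.hour <= 19 else 0.3

    # Weighted fusion of the modalities; the weights are arbitrary, for illustration only.
    return 0.5 * signal_cue + 0.3 * flash_cue + 0.2 * daylight_cue

if __name__ == "__main__":
    score = outdoor_score("dsc5881.jpg", flash_fired=False,
                          local_time=datetime(2007, 7, 14, 12, 0))
    print("outdoor" if score > 0.5 else "indoor", round(score, 2))
```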

3.1. The role of metadata standards for photos

The role of metadata for this usage of photo collections is manifold:

* Save the experience: The central goal is to overcome the semantic gap and represent as much as possible of the human's impression of the moment when the photo was taken.
* Browse and find previously taken photos: Allow searching for events, persons, places, moments in time, ...
* Share photos together with their metadata: give your annotated photo from Flickr or from Foto Community to your friend's application.
* Use comprehensive metadata for value-added photo services: create an automatic photo collage or send a flash presentation to your aunt's TV, notify all friends that are interested in photos from certain locations, events, or persons, ...

The following figure illustrates how we use our photos today, both at home and on the Web.

[Figure: Usage of photos at home and on the Web]

So the social life of personal photos can be summarized as their capture, annotation, organization, sharing and reuse at home and on the Web.

For all of this, metadata plays a central role at all times and places in the social life of our photos.

4. The multimedia semantics interoperability problem

4.1. Different levels and types of metadata for photos

The problem we face here is that metadata is nowadays a precious asset for interesting services on top of a photo collection, yet it is created and enhanced by different tools and systems and follows different standards and representations. Even though there are many tools and standards around that aim to capture and maintain this metadata, they are not necessarily interoperable. So, on a technical level, we have the problem of finding a common representation of metadata that is helpful and relevant for photo management, sharing and reuse. The metadata an end user typically gets in touch with is descriptive metadata that stems from the context of the photo. At the same time, more than a decade of research in multimedia analysis has produced many results for extracting valuable features from multimedia content. For photos, this includes color histograms, edge detection, brightness, texture and so on. With MPEG-7, a very large standard has been developed that allows these features to be described in a metadata standard and content and metadata to be exchanged with other applications. However, both the size of the standard and its many optional attributes have led to a situation in which MPEG-7 is used only in very specific applications and has not become a widely accepted standard for adding (some) metadata to a media item. Especially in the area of personal media, in the same fashion as in the tagging scenario, a small but comprehensive, shareable and exchangeable description scheme for personal media is missing.
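
To make the idea of a small, shareable description scheme more concrete, the following sketch builds one lightweight record per photo that combines descriptive metadata (tags, GPS position) with a single low-level feature, a coarse color histogram, and serializes it as JSON. This is only an illustration under assumptions, not a proposal for an actual standard; Pillow is assumed for the image access.

```python
import json
from PIL import Image

def describe_photo(path, tags=(), gps=None):
    """Build one small, exchangeable description record for a photo:
    descriptive metadata plus a single low-level feature (a coarse color histogram)."""
    img = Image.open(path).convert("RGB").resize((128, 128))
    hist = img.histogram()  # 256 bins per channel, concatenated as R | G | B
    # Collapse to 8 bins per channel and normalise by the pixel count.
    coarse = [sum(hist[c * 256 + i * 32 : c * 256 + (i + 1) * 32]) / (128 * 128)
              for c in range(3) for i in range(8)]
    return {"file": path, "tags": list(tags), "gps": gps, "color_histogram": coarse}

if __name__ == "__main__":
    record = describe_photo("dsc5881.jpg", tags=["Tuscany", "beach"], gps=(43.77, 11.26))
    print(json.dumps(record, indent=2))
```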

4.2. Different standards for photo metadata and annotations

What is needed is a machine-readable description that comes with each photo and allows a site to offer valuable search and selection functionality on the uploaded photos. Even though approaches for photo annotation have been proposed, they still do not address the wide range of metadata and annotations that could and should be stored with an image in a standardized fashion.

5. Towards a solution

5.1. Identification of interoperability needs and use cases

The conclusion is clear: there is no single standardized representation and vocabulary for adding metadata to photos. Even though the different Semantic Web applications and developments should be embraced, a photo annotation standard that is a patchwork of too many different specifications is not helpful. The following figure illustrates some of the different activities described in the scenario above, what people do with their photos and which different local and Web tools they use for this.

[Figure: Photo activities from the scenario and the local and Web tools used for them]

What is missing, however, for content management, search, retrieval, sharing and innovative semantic (Web 2.0) applications is a limited and simple, but at the same time comprehensive, vocabulary in a machine-readable, exchangeable, yet not overly complicated representation. The individual standards described only solve part of the problem. For example, a standardization of tags is very helpful for a semantic search on photos in the Web. However, today the low(er)-level features are also lost. Even though semantic search works well at the search level, for a later use and exploitation of a set of photos, previously extracted and annotated lower-level features might be interesting as well. Maybe a Web site would like to offer a grouping of photos along the color distribution; then either the site needs to extract a color histogram itself, or the photo already carries this information in its standardized header information. A face detection tool might have found the bounding boxes on the photo where a face has been detected and might also provide a face count; the Web site could then allow searching for photos with two or more persons in them, and so on (see the sketch below). Even though low-level features do not seem relevant at first sight, for detailed search, visualization and later processing, the previously extracted metadata should be stored and available with the photo.
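
The following sketch illustrates how a site could exploit such pre-extracted metadata if it arrived together with the photos: filtering by a stored face count and ordering photos by color similarity. The metadata records and values are made up for illustration.

```python
# Made-up per-photo metadata records, as they might arrive together with the uploads.
photos = [
    {"file": "dsc5881.jpg", "face_count": 2, "color_histogram": [0.4, 0.3, 0.3]},
    {"file": "dsc5904.jpg", "face_count": 0, "color_histogram": [0.1, 0.2, 0.7]},
    {"file": "dsc5920.jpg", "face_count": 3, "color_histogram": [0.5, 0.3, 0.2]},
]

def histogram_distance(a, b):
    """L1 distance between two already normalised color histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Search: photos showing two or more persons, reusing the stored face count.
group_shots = [p["file"] for p in photos if p["face_count"] >= 2]

# Grouping: order all photos by color similarity to a chosen reference photo.
reference = photos[0]["color_histogram"]
by_color = sorted(photos, key=lambda p: histogram_distance(reference, p["color_histogram"]))

print(group_shots)
print([p["file"] for p in by_color])
```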

5.2. RDF Scheme for DIG35 and MPEG-7
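
The archived page does not contain the actual RDF schemes for DIG35 and MPEG-7. Purely as an illustration of the general direction, the following sketch uses the rdflib library and Dublin Core terms to attach descriptive and geo metadata to a photo resource; the http://example.org/photo# namespace and its property names are hypothetical and not part of any published DIG35 or MPEG-7 scheme.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC

# Hypothetical namespace for photo-specific terms; NOT a published DIG35 or MPEG-7 vocabulary.
PHOTO = Namespace("http://example.org/photo#")

g = Graph()
g.bind("dc", DC)
g.bind("photo", PHOTO)

photo = URIRef("http://example.org/photos/dsc5881.jpg")

g.add((photo, DC.title, Literal("Piazza della Signoria, Florence")))
g.add((photo, DC.creator, Literal("Ellen Scott")))
g.add((photo, DC.date, Literal("2007-07-14")))
g.add((photo, PHOTO.latitude, Literal(43.7696)))
g.add((photo, PHOTO.longitude, Literal(11.2558)))
g.add((photo, PHOTO.faceCount, Literal(2)))

print(g.serialize(format="turtle"))
```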

6. References

(GFK, 2006)

GfK Group for CeWe Color. Usage behavior digital photography, 2006.

(Flickr, 2007)

Flickr. Yahoo! Inc, USA. http://www.flickr.com/

(Riya, 2007)

Riya Foto Search. http://www.riya.com/

(Foto Community, 2007)

Foto Community. http://www.fotocommunity.com/

(International Imaging Industry Association, 2007)

DIG35 http://www.i3a.org/i_dig35.html

(W3C, 2002)

Photo RDF - Describing and retrieving photos using RDF and HTTP, W3C Note 19 April 2002, http://www.w3.org/TR/photo-rdf

(Dublin Core)

Dublin Core. http://dublincore.org/

(EXIF)

EXIF - Exchangeable Image File Format, Japan Electronic Industry Development Association (JEIDA). Specifications version 2.2 available in HTML and PDF

(XMP)

Adobe, Extensible Metadata Platform (XMP) http://www.adobe.com/products/xmp/index.html

(IPTC)

Information Interchange Model http://www.iptc.org/IIM/

(W3C, 2007)

Image Annotation on the Semantic Web. http://www.w3.org/2001/sw/BestPractices/MM/image_annotation.html#vocabularies