Semantic Web Image Annotation Interoperability

Editor's Draft $Id: interop.html,v 1.12 2006/04/11 20:57:51 gstamou Exp $

This version:
Latest version:
Previous version:
Jacco van Ossenbruggen, Center for Mathematics and Computer Science (CWI Amsterdam)
Raphaël Troncy, Center for Mathematics and Computer Science (CWI Amsterdam)
Giorgos Stamou, IVML, National Technical University of Athens
Jeff Z. Pan, University of Manchester
Christian Halaschek-Wiener, University of Maryland
Jane Hunter, invited expert
Nikolaos Simou, IVML, National Technical University of Athens
John Smith, IBM T. J. Watson Research Center
Vassilis Tzouvaras, IVML, National Technical University of Athens
Also see Acknowledgements.

Copyright © 2005 W3C ® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.


To complete.

Document Roadmap

To complete.

Target Audience

To complete.


To complete.

Status of this document

This is a public (WORKING DRAFT) Working Group Note produced by the Multimedia Annotation in the Semantic Web Task Force of the W3C Semantic Web Best Practices & Deployment Working Group, which is part of the W3C Semantic Web activity.

Discussion of this document is invited on the public mailing list public-swbp-wg@w3.org (public archives). Public comments should include "comments: [MM]" at the start of the Subject header.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Other documents may supersede this document.

Table of Contents

1. Introduction

Basic information on the image annotation standards

2. Image annotation standards: an inside view

An inside view: short descriptions of the standards to be covered by this document, such as MPEG-7, VRA, EXIF, DC, IPTC for "JPEG images", XMP, JPEG-2000 metadata, etc. We should provide the details that are important for interoperability.

MPEG-7: The MPEG-7 standard, formally named "Multimedia Content Description Interface", provides a rich set of audiovisual Description Tools (Descriptors and Description Schemes) and a Description Definition Language (DDL). These can be used to create metadata for multimedia documents and can serve as the basis for applications that search, filter, browse and retrieve multimedia content. MPEG-7 provides terms describing the creation and production process of the content (director, title), the usage of the content (copyright pointers, usage history, broadcast schedule), the storage features of the content (storage format, encoding), structural, spatial, temporal or spatiotemporal relationships of the content (scenes, regions, region motion tracking) and low-level features (color, texture, sound timbre, melody description). It also provides terms representing objects, events, interactions among objects, summaries, variations, user preferences, usage history, etc. (MPEG-7)

VRA: The VRA Core Categories are a metadata element set appropriate for the description of works of visual culture as well as their digital representations or copies in different formats and modalities. The set consists of terms suitable for describing the title, subject, creator, location, material, dimensions, style and period of the artistic creation. (VRA)

IPTC: The IPTC collection of metadata standards is used to improve news interchange. News Markup Language (NewsML) provides a structure, called a News Item, that relates to a specific news event. A News Item may consist of text, photos, video and audio relevant to the event, together with metadata describing the content and the interrelations of these modalities (http://www.newsml.org/pages/index.php). News Industry Text Format (NITF) is an XML format for metadata describing news articles in terms of content, structure and preferred format for end users (http://www.nitf.org/). Sports Markup Language (SportsML) is an XML vocabulary for the interchange of multimedia documents about sports events, covering data such as scores, schedules, standings and statistics (http://www.sportsml.com/). ProgramGuide Markup Language (ProgramGuideML) is an XML vocabulary, based on NewsML, for the interchange of radio and TV program information (http://www.programguideml.org/pages/index.php).

EXIF data description vocabulary: The EXIF vocabulary consists of terms that can be used to describe very specific technical attributes of an image, such as length, width, resolution, compression, the number of pixels per resolution unit in the image width direction, the name and version of the software or firmware of the camera or image input device used to generate the image, etc. (http://www.kanzaki.com/ns/exif)

XMP (Extensible Metadata Platform): Adobe's Extensible Metadata Platform (XMP) is a labeling technology that allows metadata such as the title, creator, copyright and subject of an image to be embedded in the file itself. XMP is flexible and extensible, so it can be used to manage and organize files, simplify permissions and copyright issues, and even to view camera settings for digital photographs. (http://www.adobe.com/products/xmp/main.html)

CIDOC-Conceptual Reference Model (CRM): CIDOC-CRM facilitates the integration and interchange of heterogeneous cultural heritage information. The CRM is the culmination of more than a decade of standards development work by the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM). CIDOC-CRM consists of terms describing entities, physical objects, man-made objects, events, places depicted in an image, etc. (http://cidoc.ics.forth.gr/)

Web Content Accessibility Guidelines 2.0 (WAI-WCAG 2.0): WCAG 2.0 contains principles, guidelines, success criteria, benefits, and examples that define and explain the requirements for making Web-based information and applications usable by a wide range of people with disabilities, including blindness and low vision, deafness and hearing loss, learning difficulties, cognitive limitations, limited movement and speech difficulties, as well as by older people and people who use a wide variety of assistive technologies. (http://www.w3.org/TR/WCAG20/)

Composite Capabilities/Preference Profiles (CC/PP): CC/PP vocabularies provide descriptions of device capabilities and user preferences. These are often referred to as a device's delivery context and can be used to guide the adaptation of content presented to that device. (http://www.w3.org/TR/2004/REC-CCPP-struct-vocab-20040115/)

3. Syntactic interoperability

Separate description of syntactic interoperability issues when converting non-SW standards into SW (RDF+OWL) and conversely.

ACTION: Chris should search the web to check whether another EXIF-2-RDF tool is available. If so, is the transformation the same as the one proposed by Norm Walsh/W3C? + Put some references ...

POSSIBLE CONTRIBUTION: Oscar and Roberto from DMAG about the difficulties encountered when automatically transforming XSD to OWL.

3.1 EXIF Interoperability

One of today's most commonly used image metadata standards is the Exchangeable Image File Format [EXIF]. This file format provides a standard specification for storing metadata about an image. Metadata elements pertaining to the image are stored in the image file header, and each is marked with a unique tag that serves as an element identifier.
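
As a rough illustration of this tag-based storage, the sketch below maps a few numeric EXIF tag identifiers to field names and decodes enumerated values into readable strings. The tag numbers and value meanings follow the EXIF 2.2 tag tables (only a small subset is shown). The `decode_tags` helper and the `raw_header` dictionary are hypothetical and stand in for whatever a real header parser would return.

```python
# Tag identifiers and enumerated values from the EXIF 2.2 tag tables.
# Only a small subset is shown here.
TAG_NAMES = {
    0x0132: "DateTime",
    0x8822: "ExposureProgram",
    0x9209: "Flash",
    0xA300: "FileSource",
    0xA301: "SceneType",
}

ENUM_VALUES = {
    "ExposureProgram": {1: "Manual", 2: "Normal program", 3: "Aperture priority"},
    "Flash": {0: "Flash did not fire", 1: "Flash fired"},
    "FileSource": {3: "DSC (Digital Still Camera)"},
    "SceneType": {1: "A directly photographed image"},
}


def decode_tags(raw):
    """Turn {tag_id: raw_value} into {field_name: readable_value}.

    Hypothetical helper: `raw` stands in for the output of a header parser.
    Unknown tags are kept under a hexadecimal placeholder name.
    """
    decoded = {}
    for tag_id, value in raw.items():
        name = TAG_NAMES.get(tag_id, "Unknown_0x%04X" % tag_id)
        # Fall back to the raw value when no enumeration is known.
        decoded[name] = ENUM_VALUES.get(name, {}).get(value, value)
    return decoded


raw_header = {0x8822: 2, 0x9209: 0, 0xA300: 3, 0xA301: 1,
              0x0132: "2003:04:05 08:53:19"}
fields = decode_tags(raw_header)
```

The tag number is what identifies an element in the file header. The readable names and strings are only a presentation layer added on top of it.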

As we note in this document, there is ongoing interest in representing image metadata using Semantic Web representation languages, such as RDF and OWL. Recently, there have been efforts to encode EXIF metadata in these Web standards. Encoding EXIF metadata in Semantic Web standards would provide a variety of benefits.

EXIF Semantic Web Ontologies

Recently, there have been various efforts to represent the EXIF metadata specification using RDFS. Below, we describe the results of two of these efforts:

The [Kanzaki-EXIF] RDFS ontology provides an encoding of the basic EXIF metadata tags in RDFS. Essentially, these are the tags defined in Section 4.6 of [EXIF]. The ontology also declares the relevant domains and ranges.

The [Walsh-EXIF] RDFS ontology provides another encoding of the basic EXIF metadata tags in RDFS. Again, these are the tags defined in Section 4.6 of [EXIF].

We note here that these two ontologies are semantically very similar, so semantic differences between them are not addressed here. Essentially, both are straightforward encodings of the EXIF metadata tags for images (see [EXIF]). There are some syntactic differences, but these are minor: the ontologies differ primarily in the naming conventions they use.
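
Because the two schemas differ mainly in naming, translating between them can be sketched as a namespace rewrite plus a small table of property renames. The example below is only an illustration under stated assumptions. Both namespace URIs are placeholders, and the single entry in `RENAMES` is invented, so neither reflects the actual terms of [Kanzaki-EXIF] or [Walsh-EXIF].

```python
# Placeholder namespaces: substitute the real schema URIs before use.
SOURCE_NS = "http://example.org/schema-a/exif#"
TARGET_NS = "http://example.org/schema-b/exif#"

# Hypothetical local-name difference between the two schemas.
RENAMES = {"imageLength": "height"}


def translate_predicate(predicate):
    """Rewrite a predicate URI from the source schema to the target schema."""
    if not predicate.startswith(SOURCE_NS):
        return predicate  # not a property of the source schema: leave as is
    local = predicate[len(SOURCE_NS):]
    return TARGET_NS + RENAMES.get(local, local)


def translate(triples):
    """Apply the predicate rewrite to (subject, predicate, object) tuples."""
    return [(s, translate_predicate(p), o) for s, p, o in triples]


triples = [
    ("http://example.org/a.jpg", SOURCE_NS + "flash", "Flash did not fire"),
    ("http://example.org/a.jpg", SOURCE_NS + "imageLength", "480"),
    ("http://example.org/a.jpg", "http://purl.org/dc/elements/1.1/title", "Alligator"),
]
translated = translate(triples)
```

A rewrite of this kind only works when the two schemas agree on the meaning of each property. Differences in domains, ranges or value encodings would need more than a rename table.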

EXIF Conversion Services

The creators of the EXIF RDFS ontologies mentioned above ([Kanzaki-EXIF] and [Walsh-EXIF]) also provide services that convert EXIF metadata to their respective schemas.

Exif-to-RDF Converter

EXIF-to-RDF ([Kanzaki-Converter]) is a metadata extractor for images that carry EXIF data. The service takes the URL of such an image and extracts the embedded EXIF metadata. It then converts this metadata to the [Kanzaki-EXIF] schema and returns the result to the user.

To demonstrate this service, we have extracted the EXIF metadata from a sample image. The resulting RDF/XML is provided in Table 1.

Table 1: Generated RDF/XML by EXIF-to-RDF


 <foaf:Image rdf:about="http://www.mindswap.org/~chris/alligator.jpg">
  <exifdata rdf:resource="#Primary_Image"/>
  <exifdata rdf:resource="#Thumbnail"/>
 </foaf:Image>
 <IFD rdf:ID="Primary_Image">
  <exposureProgram>Normal program</exposureProgram>
  <flash>Flash did not fire</flash>
  <fileSource>DSC (Digital Still Camera)</fileSource>
  <sceneType>A directly photographed image</sceneType>
 </IFD>
 <!-- thumbnail -->
 <IFD rdf:ID="Thumbnail">
  <!-- ... -->
 </IFD>
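
To make the shape of this output concrete, the following sketch builds a comparable RDF/XML document from a dictionary of already decoded EXIF fields, using only the Python standard library. It is not the converter's implementation. The `exif_to_rdf` function is hypothetical, the namespace URI for the [Kanzaki-EXIF] vocabulary is an assumption, and the sketch writes an explicit `exif:` prefix where the excerpt above uses unprefixed names.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
# Assumed namespace for the [Kanzaki-EXIF] vocabulary.
EXIF = "http://www.kanzaki.com/ns/exif#"

ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)
ET.register_namespace("exif", EXIF)


def exif_to_rdf(image_url, fields):
    """Build an RDF/XML string describing `image_url` with one IFD node."""
    root = ET.Element("{%s}RDF" % RDF)
    # The image resource points at its IFD node through exifdata.
    image = ET.SubElement(root, "{%s}Image" % FOAF,
                          {"{%s}about" % RDF: image_url})
    ET.SubElement(image, "{%s}exifdata" % EXIF,
                  {"{%s}resource" % RDF: "#Primary_Image"})
    # The IFD node carries one property element per EXIF field.
    ifd = ET.SubElement(root, "{%s}IFD" % EXIF,
                        {"{%s}ID" % RDF: "Primary_Image"})
    for name, value in fields.items():
        ET.SubElement(ifd, "{%s}%s" % (EXIF, name)).text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = exif_to_rdf(
    "http://www.mindswap.org/~chris/alligator.jpg",
    {"exposureProgram": "Normal program", "flash": "Flash did not fire"},
)
```

The image and its IFD are written as sibling nodes linked by `rdf:resource`, which mirrors the structure of the excerpt in Table 1.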

JPEGRDF - EXIF-RDF Manipulator

JPEGRDF ([JPEGRDF-Converter]) is a Java application written by Norm Walsh that provides an API to read and manipulate EXIF metadata stored in JPEG images. Currently, JPEGRDF can extract, query, and augment the EXIF/RDF data stored in the file headers. In particular, the API can be used to convert existing EXIF metadata in file headers to the [Walsh-EXIF] schema. The resulting RDF can then, for example, be stored in the image file header. (Note that the API's functionality extends well beyond what is briefly presented here.)

To demonstrate this service, we have again extracted the EXIF metadata from the same sample image, this time using the JPEGRDF API. The resulting RDF/XML is provided in Table 2.

Table 2: Generated RDF/XML by JPEGRDF


  <rdf:Description rdf:about="file:/Users/chalaschek/Desktop/jpegrdf-2.3.0/src//Users/chalaschek/alligator.jpg">
    <exif:dateTime>2003:04:05 08:53:19</exif:dateTime>
  </rdf:Description>
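
The `exif:dateTime` value above uses the EXIF date layout (`YYYY:MM:DD HH:MM:SS`, with colons in the date part), which is not an ISO 8601 or XML Schema `dateTime` lexical form. A consumer of such RDF may therefore want to normalize it. The sketch below shows one way to do so with the Python standard library. The `read_date_times` function, the sample document and the `http://example.org/exif#` namespace are placeholders rather than part of JPEGRDF or the [Walsh-EXIF] schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Placeholder namespace; replace with the namespace of the schema in use.
EXIF = "http://example.org/exif#"

SAMPLE = """<rdf:RDF xmlns:rdf="%s" xmlns:exif="%s">
  <rdf:Description rdf:about="http://example.org/alligator.jpg">
    <exif:dateTime>2003:04:05 08:53:19</exif:dateTime>
  </rdf:Description>
</rdf:RDF>""" % (RDF, EXIF)


def read_date_times(rdf_xml):
    """Map each described resource to its exif:dateTime as a datetime."""
    times = {}
    for desc in ET.fromstring(rdf_xml).iter("{%s}Description" % RDF):
        node = desc.find("{%s}dateTime" % EXIF)
        if node is not None and node.text:
            # EXIF writes dates as "YYYY:MM:DD HH:MM:SS", without a time zone.
            times[desc.get("{%s}about" % RDF)] = datetime.strptime(
                node.text.strip(), "%Y:%m:%d %H:%M:%S"
            )
    return times


times = read_date_times(SAMPLE)
iso_value = times["http://example.org/alligator.jpg"].isoformat()
```

Since EXIF date values carry no time zone, the resulting `datetime` objects are naive and should not be compared across cameras without further context.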

4. Semantic interoperability

Separate description of semantic interoperability issues when converting non-SW standards into SW (RDF+OWL) and conversely.

ACTION: Giorgos Stamou to contribute on how you can use OWL versions of standards (with specific reference to MPEG-7) in order to provide semantic interoperability of multimedia annotations.

5. Image ontology interoperability

Description of possible transformation issues. Some connections with the definition of mappings and with ontology alignment techniques should also be provided.

ACTION: Giorgos Stoilos to provide a contribution on multimedia ontology alignment.

6. Feasibility study and Good practices

Provide a feasibility report for the interoperability that clarifies the levels of interoperability that could be achieved in each case.


References

[Dublin Core]
The Dublin Core Metadata Initiative, Dublin Core Metadata Element Set, Version 1.1: Reference Description.
[EXIF]
The EXIF Standard, EXIF 2.2 Specification.
[Hunter, 2001]
J. Hunter. Adding Multimedia to the Semantic Web - Building an MPEG-7 Ontology. In International Semantic Web Working Symposium (SWWS 2001), Stanford University, California, USA, July 30 - August 1, 2001.
[Kanzaki-Converter]
Kanzaki.com EXIF Converter.
[Kanzaki-EXIF]
Kanzaki.com EXIF RDFS Schema.
[MPEG-7]
Information Technology - Multimedia Content Description Interface (MPEG-7). Standard No. ISO/IEC 15938:2001, International Organization for Standardization (ISO), 2001.
[Ossenbruggen, 2004]
J. van Ossenbruggen, F. Nack, and L. Hardman. That Obscure Object of Desire: Multimedia Metadata on the Web (Part I). In: IEEE Multimedia 11(4), pp. 38-48, October-December 2004.
[Ossenbruggen, 2005]
F. Nack, J. van Ossenbruggen, and L. Hardman. That Obscure Object of Desire: Multimedia Metadata on the Web (Part II). In: IEEE Multimedia 12(1), pp. 54-63, January-March 2005.
[OWL Semantics and Abstract Syntax]
OWL Web Ontology Language Semantics and Abstract Syntax, Peter F. Patel-Schneider, Patrick Hayes, and Ian Horrocks, Editors, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-owl-semantics-20040210/ . Latest version available at http://www.w3.org/TR/owl-semantics/ .
[RDF Syntax]
RDF/XML Syntax Specification (Revised), Dave Beckett, Editor, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/ . Latest version available at http://www.w3.org/TR/rdf-syntax-grammar/ .
[Stamou, 2005]
G. Stamou and S. Kollias (eds). Multimedia Content and the Semantic Web: Methods, Standards and Tools. John Wiley & Sons Ltd, 2005.
[Troncy, 2003]
R. Troncy. Integrating Structure and Semantics into Audio-visual Documents. In Second International Semantic Web Conference (ISWC 2003), pages 566-581, Sanibel Island, Florida, USA, October 20-23, 2003. Springer-Verlag Heidelberg.
Tsinaraki, C.: OWL soccer ontology available at http://elikonas.ced.tuc.gr/ontologies/soccer.zip.
[TV Anytime]
TV Anytime Forum, http://www.tv-anytime.org/
[VRA Core]
Visual Resources Association Data Standards Committee, VRA Core Categories, Version 3.0. Available at: http://www.vraweb.org/vracore3.htm.
aceMedia Visual Descriptor Ontology
[JPEGRDF-Converter]
JPEGRDF - Norm Walsh EXIF Converter
[Walsh-EXIF]
Norm Walsh EXIF RDFS Schema


Acknowledgements

The editors would like to thank the following Working Group members for their contributions to this document: Jeremy Carroll, Libby Miller, Michael Uschold and Mark van Assem.

This document is a product of the Multimedia Annotation in the Semantic Web Task Force of the Semantic Web Best Practices and Deployment Working Group.


1. MPEG-7 and TV Anytime

MPEG7 Description Type Hierarchy

Example 1: MPEG-7 Description of this image
          <?xml version="1.0" encoding="iso-8859-1"?>
          <Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001"
          xsi:schemaLocation="urn:mpeg:mpeg7:schema:2001 mpeg7-2001-valid.xsd">
          <Description xsi:type="ContentEntityType">
          <MultimediaContent xsi:type="ImageType">
          <Image id="a6a55234b-2562-4119-a41a-a5fe41e058b5">
          Auxerre - Metz (final score: 3-2). Jean Alain Boumsong scores with his head at 64' but the goal is disallowed for an active offside position
          Highlight of the player Djibril Cissé who is in an active offside position
          <Coords> 84 64 254 64 254 141 84 141 </Coords>
Example 2: TV Anytime metadata associated with the program of this image
          <?xml version="1.0" encoding="iso-8859-1"?>
          <tva:TVAMain xmlns="urn:mpeg:mpeg7:schema:2001"
          xsi:schemaLocation="urn:mpeg:mpeg7:schema:2001 mpeg7-2001-valid.xsd urn:tva:metadata:2002 tva_metadata_v13.xsd">
          <tva:ProgramInformation programId="crid://example.com/sports_magazine/Stade2">
          <tva:Title type="main" xml:lang="fr">Stade 2</tva:Title>
          <tva:Synopsis>Weekly sports magazine broadcast every Sunday</tva:Synopsis>
          <mpeg7:ParentalRating href="urn:tva:metadata:cs:ICRAParentalRatingCS"/>