Semantic Web Image Annotation Interoperability

Editor's Draft $Id: interop.html,v 1.12 2006/04/11 20:57:51 gstamou Exp $

This version:: http://www.w3.org/2001/sw/BestPractices/MM/interop.html
Latest version:: http://www.w3.org/2001/sw/BestPractices/MM/interop.html
Previous version:: http://www.w3.org/2001/sw/BestPractices/MM/interop.html
Editors:: Jacco van Ossenbruggen, Center for Mathematics and Computer Science (CWI Amsterdam); Raphaël Troncy, Center for Mathematics and Computer Science (CWI Amsterdam); Giorgos Stamou, IVML, National Technical University of Athens; Jeff Z. Pan, University of Manchester
Contributors:: Christian Halaschek-Wiener, University of Maryland; Jane Hunter, invited expert; Nikolaos Simou, IVML, National Technical University of Athens; John Smith, IBM T. J. Watson Research Center; Vassilis Tzouvaras, IVML, National Technical University of Athens
: Also see Acknowledgements.

Discussion of this document is invited on the public mailing list public-swbp-wg@w3.org (public archives). Public comments should include "comments: [MM]" at the start of the Subject header.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress. Other documents may supersede this document.

1. Introduction

2. Image annotation standards: an inside view

An inside view: short descriptions of the standards to be covered by this document, such as MPEG-7, VRA, EXIF, DC, IPTC for "JPEG images", XMP, JPEG-2000 metadata, etc. We should provide the details that are important for interoperability.

MPEG-7: The MPEG-7 standard, formally named "Multimedia Content Description Interface", provides a rich set of audiovisual Description Tools (Descriptors and Description Schemes) and a Description Definition Language (DDL) that can be used to create metadata for multimedia documents and can serve as the basis for applications enabling search, filtering, browsing and retrieval of multimedia content. MPEG-7 provides terms for describing the creation and production process of the content (director, title), the usage of the content (copyright pointers, usage history, broadcast schedule), the storage features of the content (storage format, encoding), the structural, spatial, temporal or spatiotemporal relationships of the content (scene, regions, region motion tracking), and low-level features (color, texture, sound timbre, melody description), as well as terms representing objects, events, interactions among objects, summaries, variations, user preferences, etc. (MPEG7)

VRA: The VRA Core Categories form a metadata element set appropriate for the description of works of visual culture as well as their digital representations or copies in different formats and modalities. The set consists of terms suitable for describing the title, subject, creator, location, material, dimensions, style and period of the artistic creation. (VRA)

IPTC: The IPTC collection of metadata standards is used to improve news interchange. News Markup Language (NewsML) provides a structure, called a News Item, that relates to a specific news event and may consist of text, photos, video and audio relevant to this event, together with metadata describing the content and the interrelations of these diverse modalities (http://www.newsml.org/pages/index.php). News Industry Text Format (NITF) is an XML format for metadata describing news articles from the point of view of content, structure and preferred format for end users (http://www.nitf.org/). Sports Markup Language (SportsML) is an XML vocabulary for the interchange of multimedia documents concerning different kinds of sports events, including scores, schedules, standings and statistics (http://www.sportsml.com/). ProgramGuide Markup Language (ProgramGuideML) is an XML vocabulary, based on NewsML, for the interchange of radio/TV program information (http://www.programguideml.org/pages/index.php).

EXIF data description vocabulary: The Exif vocabulary consists of terms that can be used to describe very specific technical attributes of an image, such as length, width, resolution, compression, the number of pixels per resolution unit in the image width direction, the name and version of the software or firmware of the camera or image input device used to generate the image, etc. (http://www.kanzaki.com/ns/exif)

XMP (Extensible Metadata Platform): Adobe's Extensible Metadata Platform (XMP) is a labeling technology that allows you to embed metadata about the title, creator, copyright and subject of an image. XMP is flexible and extensible, so it can be used to manage and organize files, simplify permissions and copyright issues, and even to view camera settings for digital photographs. (http://www.adobe.com/products/xmp/main.html)

CIDOC-Conceptual Reference Model (CRM): CIDOC-CRM facilitates the integration and interchange of heterogeneous cultural heritage information. The CRM is the culmination of more than a decade of standards development work by the International Committee for Documentation (CIDOC) of the International Council of Museums (ICOM). CIDOC-CRM consists of terms describing entities, physical objects, man-made objects, events, places depicted on an image, etc. (http://cidoc.ics.forth.gr/)

Web Content Accessibility Guidelines 2.0 (WAI-WCAG 2.0): WCAG 2.0 contains principles, guidelines, success criteria, benefits, and examples that define and explain the requirements for making Web-based information and applications usable by a wide range of people with disabilities, including blindness and low vision, deafness and hearing loss, learning difficulties, cognitive limitations, limited movement and speech difficulties, as well as by older people and people who use a wide variety of assistive technologies. (http://www.w3.org/TR/WCAG20/)

Composite Capabilities/Preference Profiles (CC/PP): CC/PP vocabularies provide descriptions of device capabilities and user preferences. Together these are often referred to as a device's delivery context, and they can be used to guide the adaptation of content presented to that device. (http://www.w3.org/TR/2004/REC-CCPP-struct-vocab-20040115/)

3. Syntactic interoperability

Separate description of syntactic interoperability issues when converting non SW standards into SW (RDF+OWL) and conversely.

ACTION: Chris should search the web to check whether another EXIF-2-RDF tool is available. If so, is the transformation the same as the one proposed by Norm Walsh/W3C? Also add some references.

POSSIBLE CONTRIBUTION: Oscar and Roberto from DMAG about the difficulties encountered when automatically transforming XSD to OWL.

3.1 EXIF Interoperability

One of today's most commonly used image metadata standards is the Exchangeable Image File Format [EXIF]. This file format provides a standard specification for storing metadata about an image. Metadata elements pertaining to the image are stored in the image file header, and each is marked with a unique tag that serves as the element's identifier.
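To make the tag-based layout concrete, the following minimal Python sketch maps a few numeric tag identifiers to element names. The tag numbers are those commonly listed for these fields in EXIF/TIFF tag tables; the `EXIF_TAGS` dictionary, the `describe` helper and the sample entries are illustrative names and data only, not part of any library or of [EXIF] itself.

```python
# Illustrative subset of EXIF/TIFF tag identifiers (not exhaustive).
EXIF_TAGS = {
    0x010F: "Make",
    0x0110: "Model",
    0x0112: "Orientation",
    0x829A: "ExposureTime",
    0x829D: "FNumber",
}


def describe(tag_id, value):
    """Return a (name, value) pair, using a hex label for unknown tags."""
    name = EXIF_TAGS.get(tag_id, f"Unknown(0x{tag_id:04X})")
    return name, value


# Hypothetical header entries, already decoded to (tag id, value) pairs.
raw_entries = [(0x010F, "NIKON"), (0x0110, "E950"), (0xFFFF, "?")]
decoded = [describe(tag, value) for tag, value in raw_entries]
```

A real reader would first have to locate and decode the header entries in the file; the sketch only shows the lookup step that turns a tag identifier into a named metadata element.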

As we note in this document, there is ongoing interest in representing image metadata using Semantic Web representation languages, such as RDF and OWL. Recently, there have been efforts to encode EXIF metadata in such Web standards. Encoding EXIF metadata in Semantic Web standards will provide a variety of benefits.

EXIF Semantic Web Ontologies

Recently, there have been various efforts to represent the EXIF metadata specification using RDFS. Below, we present the results of two of these efforts:

The [Kanzaki-EXIF] RDFS ontology provides an encoding of the basic EXIF metadata tags in RDFS. Essentially, these are the tags defined in Section 4.6 of [EXIF]. We also note that relevant domains and ranges are specified.

The [Walsh-EXIF] RDFS ontology provides another encoding of the basic EXIF metadata tags in RDFS. Again, these are the tags defined in Section 4.6 of [EXIF].

We note here that these two ontologies are semantically very similar, so semantic differences between them are not addressed here. Essentially, both are straightforward encodings of the EXIF metadata tags for images (see [EXIF]). There are some syntactic differences, but these are minor; the ontologies differ primarily in their naming conventions.

EXIF Conversion Services

The creators of the previously mentioned EXIF RDFS ontologies ([Kanzaki-EXIF] and [Walsh-EXIF]) additionally provide conversion services to their defined schemas.

Exif-to-RDF Converter

EXIF-to-RDF ([Kanzaki-Converter]) is a metadata extractor for EXIF images. In particular the service takes a URL to an EXIF image and extracts the embedded EXIF metadata. The service then converts this metadata to the [Kanzaki-EXIF] schema and returns this to the user.

To demonstrate this service, we have extracted the EXIF metadata from a sample image. The resulting RDF/XML is provided in Table 1.


<rdf:RDF
  xmlns:nikon="http://www.kanzaki.com/ns/exif/nikon#"
  xmlns="http://www.kanzaki.com/ns/exif#"
  xmlns:exif="http://www.kanzaki.com/ns/exif#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:image="http://jibbering.com/vocabs/image/#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
>

 <foaf:Image rdf:about="http://www.mindswap.org/~chris/alligator.jpg">
  <dc:date>2003-04-05T08:53:19</dc:date>
  <image:height>480</image:height>
  <image:width>640</image:width>
  <exifdata rdf:resource="#Primary_Image"/>
  <exifdata rdf:resource="#Thumbnail"/>
 </foaf:Image>
 <IFD rdf:ID="Primary_Image">
   <imageDescription></imageDescription>
   <make>NIKON</make>
   <model>E950</model>
   <orientation>top-left</orientation>
   <xResolution>300</xResolution>
   <yResolution>300</yResolution>
   <resolutionUnit>inch</resolutionUnit>
   <software>v981-76</software>
   <dateTime>2003-04-05T08:53:19</dateTime>
   <yCbCrPositioning>co-sited</yCbCrPositioning>
   <exif_IFD_Pointer>
    <IFD>
      <exposureTime>10/890</exposureTime>
      <fNumber>5.7</fNumber>
      <exposureProgram>Normal program</exposureProgram>
      <isoSpeedRatings>0</isoSpeedRatings>
      <exifVersion>2.10</exifVersion>
      <dateTimeOriginal>2003-04-05T08:53:19</dateTimeOriginal>
      <dateTimeDigitized>2003-04-05T08:53:19</dateTimeDigitized>
      <componentsConfiguration>YCbCr</componentsConfiguration>
      <compressedBitsPerPixel>4</compressedBitsPerPixel>
      <exposureBiasValue>3a</exposureBiasValue>
      <maxApertureValue>2.6</maxApertureValue>
      <meteringMode>Pattern</meteringMode>
      <lightSource>unknown</lightSource>
      <flash>Flash did not fire</flash>
      <focalLength>13.9</focalLength>
      <makerNote></makerNote>
      <userComment></userComment>
      <flashpixVersion>1.00</flashpixVersion>
      <colorSpace>sRGB</colorSpace>
      <pixelXDimension>1600</pixelXDimension>
      <pixelYDimension>1200</pixelYDimension>
      <fileSource>DSC (Digital Still Camera)</fileSource>
      <sceneType>A directly photographed image</sceneType>
      <interoperability_IFD_Pointer>
       <IFD>
         <interoperabilityIndex>R98</interoperabilityIndex>
         <interoperabilityVersion>1.00</interoperabilityVersion>
       </IFD>
      </interoperability_IFD_Pointer>
    </IFD>
   </exif_IFD_Pointer>
 </IFD>
<!-- thumbnail -->
 <IFD rdf:ID="Thumbnail">
   <compression>6</compression>
   <xResolution>300</xResolution>
   <yResolution>300</yResolution>
   <resolutionUnit>inch</resolutionUnit>
 </IFD>
</rdf:RDF>
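As a rough illustration of how such output might be consumed, the sketch below reads a shortened excerpt of the listing above with Python's standard `xml.etree.ElementTree` module and collects the literal-valued properties of one IFD. It treats the RDF/XML as plain XML, which only works because the converter output has this fixed shape; a general solution would use an RDF parser. The function name `literal_properties` is an illustrative choice.

```python
import xml.etree.ElementTree as ET

EXIF_NS = "http://www.kanzaki.com/ns/exif#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Shortened excerpt of the converter output shown above.
SAMPLE = """<rdf:RDF
  xmlns="http://www.kanzaki.com/ns/exif#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <IFD rdf:ID="Primary_Image">
   <make>NIKON</make>
   <model>E950</model>
   <exif_IFD_Pointer>
    <IFD>
      <fNumber>5.7</fNumber>
      <focalLength>13.9</focalLength>
    </IFD>
   </exif_IFD_Pointer>
 </IFD>
</rdf:RDF>"""


def literal_properties(rdf_xml, ifd_id):
    """Collect literal-valued EXIF properties of one IFD, including nested IFDs."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for ifd in root.findall(f"{{{EXIF_NS}}}IFD"):
        if ifd.get(f"{{{RDF_NS}}}ID") != ifd_id:
            continue
        for elem in ifd.iter():
            # Skip structural elements (those with children) and empty values.
            is_leaf = len(elem) == 0
            if elem.tag.startswith(f"{{{EXIF_NS}}}") and is_leaf and elem.text:
                result[elem.tag.split("}", 1)[1]] = elem.text.strip()
    return result


props = literal_properties(SAMPLE, "Primary_Image")
```

Flattening the nested IFD into a single dictionary loses the distinction between the primary IFD and the Exif IFD it points to, which is acceptable for a quick lookup but not for a faithful round trip.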

JPEGRDF - EXIF-RDF Manipulator

JPEGRDF ([JPEGRDF-Converter]) is a Java application written by Norm Walsh that provides an API to read and manipulate EXIF metadata stored in JPEG images. Currently, JPEGRDF can extract, query, and augment the EXIF/RDF data stored in the file headers. In particular, the API can be used to convert existing EXIF metadata in file headers to the [Walsh-EXIF] schema. The resulting RDF can then be stored in the image file header, etc. (Note that the API's functionality extends well beyond what is briefly presented here.)

To demonstrate this service, we have again extracted the EXIF metadata from the same sample image, this time using the JPEGRDF API. The resulting RDF/XML is provided in Table 2.



<rdf:RDF
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:wn="http://xmlns.com/wordnet/1.6/"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:gps="http://nwalsh.com/rdf/exif-gps#"
    xmlns:exif="http://nwalsh.com/rdf/exif#"
    xmlns:nikon950="http://nwalsh.com/rdf/exif-nikon950#"
    xmlns:dctype="http://dublincore.org/2003/03/24/dctype#"
    xmlns:canon="http://nwalsh.com/rdf/exif-canon#"
    xmlns:nikon="http://nwalsh.com/rdf/exif-nikon5700#"
    xmlns:jpegrdf="http://nwalsh.com/rdf/jpegrdf#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:exifi="http://nwalsh.com/rdf/exif-intrinsic#">

  <rdf:Description rdf:about="file:/Users/chalaschek/Desktop/jpegrdf-2.3.0/src//Users/chalaschek/alligator.jpg">
    <exifi:bitsPerPixel>8</exifi:bitsPerPixel>
    <exif:isoSpeedRatings>0</exif:isoSpeedRatings>
    <exifi:numberOfColorComponents>3</exifi:numberOfColorComponents>
    <exif:dateTime>2003:04:05 08:53:19</exif:dateTime>
    <nikon950:digitalZoom>16777985/33554432</nikon950:digitalZoom>
    <nikon950:ccdSensitivity>0</nikon950:ccdSensitivity>
    <exif:model>E950</exif:model>
    <exif:focalLength>139/10</exif:focalLength>
    <exif:make>NIKON</exif:make>
    <nikon950:whiteBalance>0</nikon950:whiteBalance>
    <nikon950:converter>0</nikon950:converter>
    <exif:maxApertureValue>13/5</exif:maxApertureValue>
    <exif:lightSource>0</exif:lightSource>
    <exif:exposureProgram>2</exif:exposureProgram>
    <nikon950:colorMode>1</nikon950:colorMode>
    <nikon950:imageAdjustment>0</nikon950:imageAdjustment>
    <exif:sceneType>%01</exif:sceneType>
    <nikon950:focus>436207616/16778497</nikon950:focus>
    <exif:fNumber>57/10</exif:fNumber>
    <exifi:width>640</exifi:width>
    <exif:exposureTime>1/89</exif:exposureTime>
    <exif:compressedBitsPerPixel>4</exif:compressedBitsPerPixel>
    <exifi:compression>Baseline</exifi:compression>
    <exif:exposureBiasValue>0</exif:exposureBiasValue>
    <exif:flash>0</exif:flash>
    <exifi:height>480</exifi:height>
    <nikon950:quality>12</nikon950:quality>
    <exif:meteringMode>5</exif:meteringMode>
  </rdf:Description>
</rdf:RDF>
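Comparing the two listings above, the same EXIF fields appear with different literal encodings: the first gives decimals and ISO-style timestamps (for example `5.7` and `2003-04-05T08:53:19`), while the second keeps rationals and the colon-separated date form (`57/10` and `2003:04:05 08:53:19`). The following Python sketch shows one possible way to normalize such values before comparing them. The helper names are illustrative, and the sample dictionaries simply copy values from the two listings.

```python
from datetime import datetime
from fractions import Fraction


def rational_to_float(text):
    """Convert a rational such as '57/10' (or a plain number) to a float."""
    return float(Fraction(text))


def exif_datetime_to_iso(text):
    """Convert 'YYYY:MM:DD HH:MM:SS' to 'YYYY-MM-DDTHH:MM:SS'."""
    return datetime.strptime(text, "%Y:%m:%d %H:%M:%S").isoformat()


# Values copied from the two listings above.
walsh = {"fNumber": "57/10", "focalLength": "139/10",
         "exposureTime": "1/89", "dateTime": "2003:04:05 08:53:19"}
kanzaki = {"fNumber": "5.7", "focalLength": "13.9",
           "exposureTime": "10/890", "dateTime": "2003-04-05T08:53:19"}

same_f_number = rational_to_float(walsh["fNumber"]) == float(kanzaki["fNumber"])
same_focal = rational_to_float(walsh["focalLength"]) == float(kanzaki["focalLength"])
# Rationals are compared exactly, so 10/890 and 1/89 are recognized as equal.
same_exposure = Fraction(walsh["exposureTime"]) == Fraction(kanzaki["exposureTime"])
same_date = exif_datetime_to_iso(walsh["dateTime"]) == kanzaki["dateTime"]
```

Enumerated fields such as `exposureProgram` (`2` versus `Normal program`) would additionally need a code-to-label table, which is not attempted here.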

4. Semantic interoperability

Separate description of semantic interoperability issues when converting non SW standards into SW (RDF+OWL) and conversely.

ACTION: Giorgos Stamou to contribute on how you can use OWL versions of standards (with specific reference to MPEG-7) in order to provide semantic interoperability of multimedia annotations.

5. Image ontology interoperability

Description of possible transformation issues. Some connections with definition of mappings and ontology alignment techniques should be also provided.

ACTION: Giorgos Stoilos to provide a contribution on multimedia ontology alignment.

6. Feasibility study and Good practices

Provide a feasibility report for the interoperability that clarifies the levels of interoperability that could be achieved in each case.

References

Acknowledgments

The editors would like to thank the following Working Group members for their contributions to this document: Jeremy Carroll, Libby Miller, Michael Uschold and Mark van Assem.

This document is a product of the Multimedia Annotation on the Semantic Web Task Force of the Semantic Web Best Practices and Deployment Working Group.

Annex

1. MPEG-7 and TV Anytime

Example 1: MPEG-7 Description of this image

          <?xml version="1.0" encoding="iso-8859-1"?>
          <Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:mpeg7="urn:mpeg:mpeg7:schema:2001"
          xsi:schemaLocation="urn:mpeg:mpeg7:schema:2001 mpeg7-2001-valid.xsd">
          <Description xsi:type="ContentEntityType">
          <MultimediaContent xsi:type="ImageType">
          <Image id="a6a55234b-2562-4119-a41a-a5fe41e058b5">
          <MediaLocator>
          <MediaUri>soccer.jpg</MediaUri>
          </MediaLocator>
          <TextAnnotation>
          <FreeTextAnnotation>
          Auxerre - Metz (final score: 3-2). Jean Alain Boumsong scores with his head at 64' but the goal is disallowed for an
          active offside position
          </FreeTextAnnotation>
          </TextAnnotation>
          <SpatialDecomposition>
          <StillRegion>
          <TextAnnotation>
              <FreeTextAnnotation>
          Highlight of the player Djibril Cissé who is in an active offside position
              </FreeTextAnnotation>
          </TextAnnotation>
          <SpatialLocator>
              <Polygon>
          <Coords> 84 64 254 64 254 141 84 141 </Coords>
              </Polygon>
          </SpatialLocator>
          </StillRegion>
          </SpatialDecomposition>
          </Image>
          </MultimediaContent>
          </Description>
          </Mpeg7>
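The region in Example 1 is given as a flat list of numbers inside `Coords`. As a small sketch of how an application might interpret it, the Python code below splits the list into points and computes a bounding box. It assumes the values are interleaved as x y pairs, which is an assumption about this sample rather than something stated in the example; `parse_coords` and `bounding_box` are illustrative names.

```python
def parse_coords(text):
    """Split a whitespace-separated coordinate list into (x, y) points.

    Assumes the values are interleaved as x1 y1 x2 y2 ...
    """
    values = [int(v) for v in text.split()]
    if len(values) % 2:
        raise ValueError("expected an even number of coordinates")
    return list(zip(values[0::2], values[1::2]))


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Coordinate string taken from the Coords element in Example 1.
points = parse_coords(" 84 64 254 64 254 141 84 141 ")
box = bounding_box(points)
```

Under that assumption the polygon is an axis-aligned rectangle, so the bounding box coincides with the annotated region.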

Example 2: TV Anytime metadata associated to the program of this image

          <?xml version="1.0" encoding="iso-8859-1"?>
          <tva:TVAMain xmlns="urn:mpeg:mpeg7:schema:2001"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:mpeg7="urn:mpeg:mpeg7:schema:2001"
          xmlns:tva="urn:tva:metadata:2002"
          xsi:schemaLocation="urn:mpeg:mpeg7:schema:2001 mpeg7-2001-valid.xsd urn:tva:metadata:2002 tva_metadata_v13.xsd">
          <tva:ProgramDescription>
          <tva:ProgramInformationTable>
          <tva:ProgramInformation programId="crid://example.com/sports_magazine/Stade2">
          <tva:BasicDescription>
          <tva:Title type="main" xml:lang="fr">Stade 2</tva:Title>
          <tva:Synopsis>Weekly Sports Magazine broadcasted every Sunday</tva:Synopsis>
          <tva:Genre>
          <tva:Name>urn:tva:metadata:cs:IntentionCS:2002:1.1</tva:Name>
          <tva:Definition>ENTERTAINMENT</tva:Definition>
          </tva:Genre>
          <tva:Genre>
          <tva:Name>urn:tva:metadata:cs:FormatCS:2002:2.1.2</tva:Name>
          <tva:Definition>Magazine</tva:Definition>
          </tva:Genre>
          <tva:Genre>
          <tva:Name>urn:tva:metadata:cs:ContentCS:2002:3.2</tva:Name>
          <tva:Definition>SPORTS</tva:Definition>
          </tva:Genre>
          <tva:ParentalGuidance>
          <mpeg7:ParentalRating href="urn:tva:metadata:cs:ICRAParentalRatingCS"/>
          </tva:ParentalGuidance>
          <tva:Language>fr</tva:Language>
          <tva:ReleaseInformation>
          <tva:ReleaseDate>
              <tva:DayAndYear>2002-03-17</tva:DayAndYear>
          </tva:ReleaseDate>
          <tva:ReleaseLocation>fr</tva:ReleaseLocation>
          </tva:ReleaseInformation>
          </tva:BasicDescription>
          </tva:ProgramInformation>
          </tva:ProgramInformationTable>
          </tva:ProgramDescription>
          </tva:TVAMain>