
Image Annotation on the Semantic Web: Vocabularies Overview

Status: this document is no longer actively maintained. Please update your links to point to the new version maintained by the W3C Multimedia Semantics Incubator Group.

This document is based on unfinished work performed in the context of the multimedia task force of the W3C Semantic Web Best Practices and Deployment Working Group. All information provided here serves purely as a list of examples, and inclusion on this page does not imply endorsement by the W3C membership or the Working Group.

This vocabulary collection was developed in the context of Image annotation on the Semantic Web.

Introduction

This document provides a collection of RDF and OWL vocabularies that are relevant for image annotation.

Many of the relevant vocabularies were developed before the Semantic Web, and a number of translations of such vocabularies to RDF or OWL are discussed below. Most notably, the key International Standard in this area, the Multimedia Content Description standard, widely known as MPEG-7, is defined using XML Schema. At the time of writing, there is no commonly accepted mapping from the XML Schema definitions in the standard to RDF or OWL, so we discuss the pros and cons of the alternative mappings.

Another relevant vocabulary is the VRA Core. Whereas Dublin Core specifies a small and commonly used vocabulary for on-line resources in general, VRA Core defines a similar set targeted specifically at visual resources. Dublin Core and VRA Core both refer to the terms in their vocabularies as elements, and both use qualifiers to refine elements in a similar way. The more general elements of VRA Core have direct mappings to comparable fields in Dublin Core. Furthermore, both vocabularies are defined in a way that abstracts from implementation issues and underlying serialization languages. A key difference, however, is that for Dublin Core there exists a commonly accepted mapping to RDF, along with the associated schema. At the time of writing, this is not the case for VRA Core, and we discuss the pros and cons of the alternative mappings.
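To illustrate what a direct element-level mapping means in practice, here is a minimal Python sketch that projects a VRA-style record onto Dublin Core properties. The crosswalk table is an assumption made for illustration: it covers only a few obvious one-to-one cases (title, creator, date, subject, description, rights) and is not an official VRA Core to Dublin Core crosswalk. The record keys and the function name vra_to_dc are hypothetical; only the Dublin Core element set namespace URI is the standard one.

```python
from __future__ import annotations

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative subset of a VRA Core -> Dublin Core crosswalk (an assumption of
# this sketch, not an official table): only obvious one-to-one cases.
VRA_TO_DC = {
    "title": "title",
    "creator": "creator",
    "date": "date",
    "subject": "subject",
    "description": "description",
    "rights": "rights",
}


def vra_to_dc(record: dict[str, str]) -> dict[str, str]:
    """Project a VRA-style record (hypothetical keys) onto Dublin Core properties.

    Elements missing from VRA_TO_DC are dropped.
    """
    return {
        DC_NS + VRA_TO_DC[name]: value
        for name, value in record.items()
        if name in VRA_TO_DC
    }


if __name__ == "__main__":
    record = {
        "title": "Example painting",
        "creator": "Example Artist",
        "technique": "oil on canvas",  # not in this sketch's crosswalk, so dropped
    }
    for prop, value in vra_to_dc(record).items():
        print(prop, "=", value)
```

Elements without a counterpart in the table, such as the technique field above, are simply dropped, which shows what is lost when visual-resource metadata is reduced to Dublin Core.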

TO DO: Discuss Simile and Mark van Assem's VRA RDF/OWL schema

Mark's version

Simile version

3.1 MPEG-7 translations to RDFS and OWL

The "Multimedia Content Description" standard, widely known as MPEG-7, aims to be the standard for describing any multimedia content. MPEG-7 standardizes tools for defining multimedia Descriptors (Ds), Description Schemes (DSs) and the relationships between them. The descriptors correspond to the data features themselves, generally low-level features, either visual (e.g. texture, camera motion) or audio (e.g. melody), while the description schemes refer to more abstract description entities. These tools and their relationships are represented using the Description Definition Language (DDL), a core part of the standard. The W3C XML Schema recommendation has been adopted as the most appropriate schema language for the MPEG-7 DDL. Note that several extensions (array and matrix datatypes) have been added in order to satisfy specific MPEG-7 requirements.
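Because the DDL is based on XML Schema, an MPEG-7 description is an ordinary XML document in which descriptor types are selected through xsi:type attributes. The Python sketch below lists the visual descriptor types used in a hand-written, simplified image description. The fragment has not been validated against the MPEG-7 schemas (the descriptors are left empty), and the function name visual_descriptor_types is hypothetical. A generic XML tool sees the xsi:type values only as strings, which illustrates the lack of formal semantics discussed below.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"
# ElementTree stores namespaced attributes as "{namespace-uri}local-name".
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Simplified MPEG-7 style image description (illustrative, not schema-validated).
SAMPLE = """<Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Description xsi:type="ContentEntityType">
    <MultimediaContent xsi:type="ImageType">
      <Image>
        <VisualDescriptor xsi:type="DominantColorType"/>
        <VisualDescriptor xsi:type="EdgeHistogramType"/>
      </Image>
    </MultimediaContent>
  </Description>
</Mpeg7>"""


def visual_descriptor_types(xml_text: str) -> list[str]:
    """Return the xsi:type of every VisualDescriptor element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        elem.get(XSI_TYPE, "")
        for elem in root.iter(f"{{{MPEG7_NS}}}VisualDescriptor")
    ]


if __name__ == "__main__":
    print(visual_descriptor_types(SAMPLE))
```

With the sample above, the function returns ["DominantColorType", "EdgeHistogramType"].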

The set of MPEG-7 XML Schemas defines 1182 elements, 417 attributes and 377 complex types, a size that is usually seen as a difficulty when managing MPEG-7 descriptions. Moreover, the MPEG-7 committee is still developing specific terminologies (named Classification Schemes) that should be used as values for specific descriptors [TO DO: give the list and the main subject of the various CS]. However, these additional vocabularies further increase the complexity of a standard that already produces overly complex and usually inadequate multimedia descriptions. Furthermore, several works have pointed out that the standard lacks the formal semantics that would turn traditional textual descriptions into machine-understandable ones. The attempts to bridge the gap between the multimedia community and the Semantic Web, covering either the whole standard or just one of its parts, are detailed below.

MPEG-7 Upper MDS Ontology by Hunter et al. http://maenad.dstc.edu.au/slittle/mpeg7.owl

Chronologically the first, this MPEG-7 ontology was initially developed in RDFS [1], then converted into DAML+OIL, and is now available in OWL. The ontology covers the upper part of the Multimedia Description Scheme (MDS) part of the MPEG-7 standard. It consists of about 60 classes and 40 properties. This is an OWL Full ontology.

Note: three small mistakes in the OWL file must be corrected to obtain a valid OWL file: each occurrence of &xsd;nil should be replaced by &rdf;nil.
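Since the wrong references appear literally as the entity reference &xsd;nil in the RDF/XML source, the correction can be applied with a plain text substitution. A minimal Python sketch, assuming a local copy of the ontology saved under the hypothetical file name mpeg7.owl; it writes the corrected version to a new file next to it and reports how many replacements were made (the note above mentions three).

```python
from __future__ import annotations

from pathlib import Path

WRONG = "&xsd;nil"
RIGHT = "&rdf;nil"


def fix_nil_references(owl_text: str) -> tuple[str, int]:
    """Replace every &xsd;nil entity reference by &rdf;nil.

    Returns the corrected text and the number of replacements made.
    """
    count = owl_text.count(WRONG)
    return owl_text.replace(WRONG, RIGHT), count


if __name__ == "__main__":
    path = Path("mpeg7.owl")  # hypothetical local copy of the ontology
    if path.exists():
        fixed, n = fix_nil_references(path.read_text(encoding="utf-8"))
        path.with_name("mpeg7-fixed.owl").write_text(fixed, encoding="utf-8")
        print(f"replaced {n} occurrence(s)")
```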

MPEG-7 MDS Ontology by Tsinaraki et al. http://elikonas.ced.tuc.gr/ontologies/av_semantics.zip

Starting from the previous ontology, this MPEG-7 ontology covers the full Multimedia Description Scheme (MDS) part of the MPEG-7 standard. It contains 420 classes and 175 properties. This is an OWL DL ontology [2a, 2b, 2c].

MPEG-7 Ontology by DMAG. http://dmag.upf.edu/ontologies/mpeg7ontos/

This MPEG-7 ontology has been produced fully automatically from the MPEG-7 standard in order to give it a formal semantics. For this purpose, a generic XSD2OWL mapping has been implemented. The definitions of the XML Schema types and elements of the ISO standard have been converted into OWL definitions according to the table given in [3]. This ontology could then serve as a top ontology, easing the integration of other, more specific ontologies such as MusicBrainz. The authors have also proposed to automatically transform the XML data (instances of MPEG-7) into RDF triples (instances of this top ontology).
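The general idea of such a schema-level mapping can be sketched in a few lines of Python. The correspondences used below (named complex type to owl:Class, type extension to rdfs:subClassOf, element to a property with rdfs:domain and rdfs:range) are simplifying assumptions chosen for illustration; this is not the DMAG XSD2OWL implementation, whose complete mapping table is given in [3]. The schema fragment is hand-written and only loosely modelled on MPEG-7 type names, and the function name xsd_to_owl is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Simplified, hand-written schema fragment (not copied from the MPEG-7 schemas).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="StillRegionType">
    <xs:complexContent>
      <xs:extension base="SegmentType">
        <xs:sequence>
          <xs:element name="SpatialLocator" type="RegionLocatorType"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>"""


def xsd_to_owl(schema_text: str) -> list[tuple[str, str, str]]:
    """Translate named complex types into (subject, predicate, object) triples.

    Assumed correspondences for this sketch:
      xs:complexType -> owl:Class
      xs:extension   -> rdfs:subClassOf
      xs:element     -> property with the enclosing type as rdfs:domain
                        and the element's declared type as rdfs:range
    """
    triples = []
    root = ET.fromstring(schema_text)
    for ctype in root.iter(XS + "complexType"):
        name = ctype.get("name")
        if name is None:
            continue  # anonymous types are ignored in this sketch
        triples.append((name, "rdf:type", "owl:Class"))
        for ext in ctype.iter(XS + "extension"):
            triples.append((name, "rdfs:subClassOf", ext.get("base")))
        for elem in ctype.iter(XS + "element"):
            prop = elem.get("name")
            if prop is None:
                continue  # element references (ref="...") are skipped
            triples.append((prop, "rdfs:domain", name))
            if elem.get("type") is not None:
                triples.append((prop, "rdfs:range", elem.get("type")))
    return triples


if __name__ == "__main__":
    for triple in xsd_to_owl(SCHEMA):
        print(triple)
```

For the sample schema, StillRegionType is declared as an owl:Class and a subclass of SegmentType, and SpatialLocator gets StillRegionType as its domain and RegionLocatorType as its range.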

This ontology aims to cover the whole standard and is thus the most complete of those mentioned so far. It contains 2372 classes and 975 properties. This is an OWL Full ontology since it employs the rdf:Property construct to cope with the fact that there are properties that have both datatype and object type ranges.
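The reason for falling back to rdf:Property can be made explicit with a small decision function. In this Python sketch, which is an illustration rather than the authors' code, each property is given the set of ranges it takes across the schema; ranges prefixed with xsd: are assumed to be datatypes and everything else is assumed to be a class. The function name and the example range names are hypothetical.

```python
from __future__ import annotations

# Assumption of this sketch: datatype ranges are written with the "xsd:" prefix.
XSD_PREFIX = "xsd:"


def property_kind(ranges: set[str]) -> str:
    """Pick the RDF/OWL construct for a property given all the ranges it takes."""
    if not ranges:
        raise ValueError("a property needs at least one range")
    has_datatype = any(r.startswith(XSD_PREFIX) for r in ranges)
    has_class = any(not r.startswith(XSD_PREFIX) for r in ranges)
    if has_datatype and has_class:
        # Neither owl:DatatypeProperty nor owl:ObjectProperty fits, and OWL DL
        # keeps the two disjoint, so fall back to rdf:Property (OWL Full).
        return "rdf:Property"
    if has_datatype:
        return "owl:DatatypeProperty"
    return "owl:ObjectProperty"


if __name__ == "__main__":
    print(property_kind({"xsd:string", "mpeg7:TextAnnotationType"}))
```

For example, a property that takes both xsd:string and a complex type such as mpeg7:TextAnnotationType as ranges is classified as rdf:Property, which is what places the whole ontology in OWL Full.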

INA Ontology. [TO DO: store this ontology at CWI for ease of reference]

This ontology is not really an MPEG-7 ontology since it does not cover the whole standard. It is rather a core audio-visual ontology inspired by several terminologies, either standardized (like MPEG-7 and TV Anytime) or still under development (ProgramGuideML). Furthermore, this ontology benefits from the practices of the French INA institute and of the British BBC and Italian RAI broadcasters, which have also developed complete terminologies for describing radio and TV programs [4, 5].

This core ontology currently contains 1100 classes and 220 properties and is represented in OWL Full.

3.2 Visual Ontologies

The MPEG-7 standard is divided into several parts reflecting the various media one can find in multimedia content. This section focuses on various attempts to design ontologies that correspond to the visual part of the standard.

aceMedia Visual Descriptor Ontology. Latest version: 9.0

The Visual Descriptor Ontology (VDO), developed within the aceMedia project for semantic multimedia content analysis and reasoning, contains representations of MPEG-7 visual descriptors and models Concepts and Properties that describe visual characteristics of objects. The term descriptor refers to a specific representation of a visual feature (color, shape, texture, etc.) that defines the syntax and the semantics of a specific aspect of the feature. For example, the dominant color descriptor specifies, among other things, the number and value of the dominant colors present in a region of interest and the percentage of pixels that each associated color value has. Although the construction of the VDO is tightly coupled with the specification of the MPEG-7 Visual Part, several modifications were carried out in order to adapt the XML Schema provided by MPEG-7 to an ontology and to the datatype representations available in RDF Schema [6].
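The quantities mentioned for the dominant color descriptor, the dominant color values and the percentage of pixels for each, can be illustrated with a short Python sketch over a list of RGB pixels. This is a deliberate simplification and not the MPEG-7 extraction procedure: colors are counted exactly, whereas a real extractor would first group similar colors together. The function name dominant_colors and its max_colors parameter are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def dominant_colors(pixels: Sequence[RGB], max_colors: int = 3) -> List[Tuple[RGB, float]]:
    """Return up to max_colors (color, percentage of pixels) pairs for a region.

    Simplification: identical RGB values are counted exactly; no clustering
    or quantization of similar colors is performed.
    """
    if not pixels:
        return []
    counts = Counter(pixels)
    total = len(pixels)
    # most_common() orders colors by decreasing pixel count.
    return [
        (color, 100.0 * n / total)
        for color, n in counts.most_common(max_colors)
    ]


if __name__ == "__main__":
    region = [(255, 0, 0)] * 3 + [(0, 0, 255)]
    print(dominant_colors(region))
```

For a region of three red pixels and one blue pixel, the function returns red with 75.0 percent followed by blue with 25.0 percent.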

mindswap Image Region Ontology. http://www.mindswap.org/2005/owl/digital-media

[7] TO DO

Hollink Visual Ontology. http://www.cs.vu.nl/~laurah/VO/visualWordnetschema2a.rdfs

[8] TO DO

3.3 MPEG-7 Classification Schemes translations to OWL

Tsinaraki et al. http://astral.ced.tuc.gr/delos/content/testbeds/MPEG703.zip

All MPEG-7 CSs have been translated into OWL in [2c].
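The structure of such a translation can be sketched as follows. This Python example shows one possible mapping, assumed here for illustration and not necessarily the one adopted in [2c]: each Term of a simplified, namespace-free, hypothetical classification scheme becomes an owl:Class, its Name becomes an rdfs:label, and term nesting becomes rdfs:subClassOf. Building class identifiers by appending the termID to the scheme URI is also an assumption of this sketch, as is the function name cs_to_owl.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Simplified, hypothetical classification scheme (no MPEG-7 namespace).
CS = """<ClassificationScheme uri="urn:example:cs:ExampleCS">
  <Term termID="1">
    <Name>Landscape</Name>
    <Term termID="1.1"><Name>Mountain</Name></Term>
    <Term termID="1.2"><Name>Seascape</Name></Term>
  </Term>
</ClassificationScheme>"""


def cs_to_owl(cs_text: str) -> list[tuple[str, str, str]]:
    """Turn each Term into an owl:Class and each nesting into rdfs:subClassOf."""
    root = ET.fromstring(cs_text)
    base = root.get("uri")
    triples = []

    def visit(term: ET.Element, parent_id: str | None) -> None:
        # Assumed identifier scheme: <scheme URI>:<termID>.
        term_id = f"{base}:{term.get('termID')}"
        triples.append((term_id, "rdf:type", "owl:Class"))
        name = term.find("Name")
        if name is not None and name.text:
            triples.append((term_id, "rdfs:label", name.text))
        if parent_id is not None:
            triples.append((term_id, "rdfs:subClassOf", parent_id))
        for child in term.findall("Term"):
            visit(child, term_id)

    for top in root.findall("Term"):
        visit(top, None)
    return triples


if __name__ == "__main__":
    for triple in cs_to_owl(CS):
        print(triple)
```

Modelling terms as classes is only one design choice; terms could equally be represented as individuals, depending on how the descriptions that use them are meant to be queried.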

3.4 Example

TO DO: Write a simple example using these vocabularies (preferably several of them). The idea is to show a small piece of XML code to give a hint of what a description looks like.
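As a possible starting point for such an example, the Python sketch below produces a small RDF/XML description of an image using Dublin Core properties, generated with the standard library so that the output stays well-formed. The namespace http://example.org/vocab# and its depicts property are hypothetical placeholders standing in for one of the visual or MPEG-7 ontologies above, and the image and resource URIs are invented; the function name describe_image is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/vocab#"  # hypothetical placeholder vocabulary

# Register prefixes so the serialized XML uses rdf:, dc: and ex:.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("ex", EX)


def describe_image(uri: str, title: str, creator: str, depicts: str) -> str:
    """Return an RDF/XML document describing one image."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    ET.SubElement(desc, f"{{{DC}}}creator").text = creator
    # Hypothetical property linking the image to what it shows.
    ET.SubElement(desc, f"{{{EX}}}depicts", {f"{{{RDF}}}resource": depicts})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(describe_image(
        "http://example.org/images/harbour.jpg",
        "Harbour at dusk",
        "Example Photographer",
        "http://example.org/things/harbour",
    ))
```

Running the script prints a single rdf:RDF element containing one rdf:Description with dc:title, dc:creator and the placeholder ex:depicts property; the order of the namespace declarations may vary between Python versions.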

3.5 References

[1] Adding Multimedia to the Semantic Web - Building an MPEG-7 Ontology: J. Hunter. In Proc. of the 1st International Semantic Web Working Symposium (SWWS 2001), Stanford, USA, 30 July - 1 August 2001.
[2a] Integration of OWL ontologies in MPEG-7 and TVAnytime compliant Semantic Indexing: C. Tsinaraki, P. Polydoros and S. Christodoulakis. In Proc. of the 16th International Conference on Advanced Information Systems Engineering (CAiSE 2004), Riga, Latvia, June 2004.
[2b] Coupling OWL with MPEG-7 and TV-Anytime for Domain-specific Multimedia Information Integration and Retrieval: C. Tsinaraki, P. Polydoros, N. Moumoutzis and S. Christodoulakis. In Proc. of RIAO 2004, Avignon, France, April 2004.
[2c] Interoperability support for Ontology-based Video Retrieval Applications: C. Tsinaraki, P. Polydoros and S. Christodoulakis. In Proc. of the 3rd International Conference on Image and Video Retrieval (CIVR 2004), Dublin, Ireland, 21-23 July 2004.
[3] Semantic Integration and Retrieval of Multimedia Metadata: R. Garcia and O. Celma. In Proc. of the 5th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot 2005) to be held with ISWC 2005, Galway, Ireland, 7 November 2005.
[4] Designing and Using an Audio-Visual Description Core Ontology: A. Isaac and R. Troncy. In Workshop on Core Ontologies in Ontology Engineering held in conjunction with the 14th International Conference on Knowledge Engineering and Knowledge Management (EKAW'04), Whittlebury Hall, Northamptonshire, UK, 8 October 2004.
[5] Integrating Structure and Semantics into Audio-visual Documents: R. Troncy. In Proc. of the 2nd International Semantic Web Conference (ISWC'03), LNCS 2870, pages 566-581, Sanibel Island, Florida, USA, 21-23 October 2003.
[6] Semantic Annotation of Images and Videos for Multimedia Analysis: S. Bloehdorn, K. Petridis, C. Saathoff, N. Simou, V. Tzouvaras, Y. Avrithis, S. Handschuh, I. Kompatsiaris, S. Staab, and M. G. Strintzis. In Proc. of the 2nd European Semantic Web Conference (ESWC 2005), Heraklion, Greece, May 2005.
[7] A Flexible Approach for Managing Digital Images on the Semantic Web: C. Halaschek-Wiener, A. Schain, J. Golbeck, M. Grove, B. Parsia and J. Hendler. In Proc. of the 5th International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot 2005) to be held with ISWC 2005, Galway, Ireland, 7 November 2005.
[8] Building a Visual Ontology for Video Retrieval: L. Hollink, M. Worring and G. Schreiber. In Proc. of ACM Multimedia, Singapore, November 2005.