Position paper for W3C Workshop on Annotations April 2, 2014

Better image area annotations

Robert Casties
Max Planck Institute for the History of Science (MPIWG)

Perspective on the topic of the workshop

The MPIWG is an active proponent of digital methods in the humanities. It has led efforts in the development of electronic online sources and research tools like Galileo Galilei's Notes on Motion of 1998, ECHO Cultural Heritage Online of 2003, and many other projects on historical sources and cultural heritage.

The author is also the main developer of the scaling image server digilib that is used at the MPIWG and by other research institutions and projects for presenting scanned historical documents, drawings, paintings, and other image material.

Since its beginning in 2002, digilib has had a simple client-side annotation function that encodes the state of the viewer and a point-like visual mark in a URL. Scholars regularly use this function in emails and documents. Since 2012 it has also had a plugin based on Annotator.js for point-like and rectangular server-side annotations.

The author is in close contact with the developers of the Annotorious image annotation tool and the HyperImage image annotation and linking environment. Both projects have expressed strong interest in a common format for interoperable rich image annotations.

The ability to reference image regions is an important part of scholarly electronic publications (HTML, PDF, ...) and annotation systems. The references should be resolution-independent to accommodate devices and user interfaces of different sizes and resolutions, and should also allow the specification of non-rectangular regions like polygons.


The status quo for specifying image regions seems to be either W3C Media Fragments, which only allow rectangular regions specified in absolute pixels or (integer) percent, or SVG, which is a full-fledged vector graphics language.

The Open Annotation Data Model proposes to use either Media Fragments or SVG. In the section on the use of SVG it recommends the use of only a subset of SVG primitives and coordinate systems and the provision of the SVG as a separate resource or inline XML in the RDF graph.

The Media Fragments syntax does not offer enough freedom: in its original form it does not accommodate resolution-independent coordinates, and it only specifies rectangles aligned with the coordinate axes. SVG, on the other hand, offers too much freedom, such as polygon as well as path elements and the ability to use different coordinate systems for different elements, together with an unwieldy XML-based format.

The author proposes the use of fractional relative coordinates for scalable resolution-independent images, both as an extension of the Media Fragments syntax and in conjunction with the Well-Known-Text and GeoJSON formats for specifying more complex image regions.

Fractional relative coordinates

Fractional relative coordinates are an image coordinate system in which each axis is expressed as a fraction of the length of the respective side of the original image. X and Y are decimal fractions between 0 and 1. Accuracy varies with the length of the decimal fraction. For example, (0.3333, 0.5) specifies a point at (roughly) 1/3 of the width of the image and half the height of the image.
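The mapping between fractional and pixel coordinates can be illustrated with a minimal Python sketch. The function names to_pixels and to_fraction are chosen here for illustration only and are not part of digilib or of any specification.

```python
def to_pixels(fx, fy, width, height):
    """Map fractional coordinates (0..1) to the nearest pixel of a rendition."""
    return (round(fx * width), round(fy * height))


def to_fraction(px, py, width, height):
    """Map pixel coordinates of a rendition to fractional coordinates."""
    return (px / width, py / height)


# The same fractional point addresses the same spot in any rendition.
point = (0.3333, 0.5)
print(to_pixels(*point, 3000, 2000))  # (1000, 1000)
print(to_pixels(*point, 600, 400))    # (200, 200)
```

The same pair of numbers resolves to the corresponding pixel in a large master scan and in a small preview, which is the property the proposal relies on.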

The digilib scaling image server has used fractional relative coordinates from the beginning, both for specifying the visible zoom area and for the annotation marks. The server component only requires the zoom area and the destination size to produce the visible image. The client does not need to know the pixel dimensions of the image on the server; therefore, only one HTTP request is needed to display the zoomed image in the resolution required by the client.

An additional benefit of relative coordinates is the opportunity to replace the image on the server with a higher resolution version without any problematic consequences for the client or existing references.

A shortcoming of the pure relative coordinate system is the inability to calculate angles and distances on the image from relative coordinates since both axes are stretched by different factors depending on the image's aspect ratio. This has not been a real obstacle so far.
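The effect of the different stretch factors can be seen in a small Python sketch. It assumes that the aspect ratio (width divided by height) is known from some other source, which the pure relative coordinate system does not provide. The function name distance is hypothetical.

```python
import math


def distance(p, q, aspect_ratio=1.0):
    """Distance between two fractional points, in units of the image height.

    aspect_ratio is width / height and has to come from outside the
    coordinate system (assumption of this sketch).
    """
    dx = (q[0] - p[0]) * aspect_ratio  # rescale x to the same unit as y
    dy = q[1] - p[1]
    return math.hypot(dx, dy)


p, q = (0.0, 0.0), (0.3, 0.4)
print(distance(p, q))        # treats the image as square
print(distance(p, q, 2.0))   # same points on an image twice as wide as high
```

Without the aspect ratio, the first result is only correct for square images; with it, the second call rescales the x axis before measuring.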

Media Fragments could add the "fraction" unit to the existing "pixel" and "percent" units for fragments like "#xywh=fraction:0.3333,0.5,0.1,0.314".
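How a client might read such a fragment can be sketched in Python with the standard library. Note that "fraction" is the unit proposed in this paper and not part of the current Media Fragments syntax; the function names parse_xywh and to_pixel_region and the example.org URL are illustrative assumptions.

```python
from urllib.parse import urldefrag, parse_qs


def parse_xywh(url):
    """Return (unit, x, y, w, h) from the xywh fragment of a URL.

    The 'fraction' unit is the extension proposed here, not a standard unit.
    """
    fragment = urldefrag(url).fragment
    value = parse_qs(fragment)["xywh"][0]
    unit, sep, numbers = value.partition(":")
    if not sep:
        # No unit prefix: Media Fragments defaults to pixels.
        unit, numbers = "pixel", value
    x, y, w, h = (float(n) for n in numbers.split(","))
    return unit, x, y, w, h


def to_pixel_region(region, width, height):
    """Resolve a parsed region against the pixel size of a rendition."""
    unit, x, y, w, h = region
    if unit == "fraction":
        scale_x, scale_y = width, height
    elif unit == "percent":
        scale_x, scale_y = width / 100, height / 100
    else:
        scale_x = scale_y = 1
    return (round(x * scale_x), round(y * scale_y),
            round(w * scale_x), round(h * scale_y))


region = parse_xywh("http://example.org/image.jpg#xywh=fraction:0.3333,0.5,0.1,0.314")
print(region)                               # ('fraction', 0.3333, 0.5, 0.1, 0.314)
print(to_pixel_region(region, 1000, 1000))  # (333, 500, 100, 314)
```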

WKT/GeoJSON specifiers

The Well-Known-Text (WKT) markup language for vector geometry is used by many GISs to describe geographical data (point, line, polygon, etc.). It is also used by the Maphub project in a format for annotations on images (historic maps) that is based on the Open Annotation data model.

The combination of WKT and Open Annotation seems very promising since the geometrical features of WKT are richer than Media Fragments but more restrained and unambiguous than SVG.

The examples in the Maphub documentation do not use a specific selector class for WKT geometries. It would be good to add a specific selector class to the Open Annotation model or its extensions.

The Maphub model also uses the absolute pixel coordinates of the map image for the WKT geometries, whereas the author would propose the use of fractional relative coordinates for scalable image servers.
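Converting an existing pixel-based WKT geometry to fractional relative coordinates only requires the pixel size of the image the geometry was drawn on. The following Python sketch handles just a single-ring POLYGON and uses no GIS library; the function name and the sample polygon are made up for illustration and are not taken from Maphub.

```python
import re


def wkt_polygon_to_fraction(wkt, width, height, digits=4):
    """Rescale a single-ring WKT POLYGON from pixel to fractional coordinates."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\((.*)\)\)\s*", wkt, re.IGNORECASE)
    if match is None or "(" in match.group(1):
        raise ValueError("this sketch only handles a single-ring POLYGON")
    points = []
    for pair in match.group(1).split(","):
        x, y = (float(v) for v in pair.split())
        points.append(f"{round(x / width, digits)} {round(y / height, digits)}")
    return "POLYGON((" + ", ".join(points) + "))"


# A triangle drawn on a 1000 x 500 pixel image (hypothetical values).
pixel_wkt = "POLYGON((100 100, 400 100, 400 300, 100 100))"
print(wkt_polygon_to_fraction(pixel_wkt, 1000, 500))
# POLYGON((0.1 0.2, 0.4 0.2, 0.4 0.6, 0.1 0.2))
```

The number of digits kept controls the accuracy of the reference, as with any fractional coordinate.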

With GeoJSON there is already a popular JSON equivalent of WKT that could be used with JSON-LD based Open Annotation data or the current JSON-based format of Annotator.js.
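As a rough illustration, a polygonal region in fractional relative coordinates could be written as a GeoJSON geometry and attached to a JSON annotation. In this Python sketch the "shape" key and the surrounding annotation object are hypothetical and do not reproduce the actual Annotator.js or Open Annotation vocabulary; only the geometry object follows the GeoJSON structure.

```python
import json


def fractional_polygon(points):
    """Build a GeoJSON Polygon geometry from fractional (x, y) points."""
    ring = [list(p) for p in points]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON rings repeat the first point at the end
    return {"type": "Polygon", "coordinates": [ring]}


annotation = {
    "uri": "http://example.org/image.jpg",   # placeholder target
    "text": "A triangular detail",
    # Hypothetical key for the region; not defined by any existing format.
    "shape": fractional_polygon([(0.1, 0.2), (0.4, 0.2), (0.4, 0.6)]),
}
print(json.dumps(annotation, indent=2))
```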