This paper will be published in the proceedings of the Semantic Web Kick-off Seminar in Finland, Nov 2, 2001, which is part of the Semantic Web in Finland effort.

Annotea: Applying Semantic Web Technologies to Annotations

Marja-Riitta Koivunen
marja@w3.org
World Wide Web Consortium
MIT Laboratory for Computer Science

ABSTRACT

Annotea is a shared annotation system for the Web that uses Semantic Web technologies. It is built on top of a general-purpose open RDF infrastructure and models annotations as RDF metadata. Annotations can be attached to any XML-based document, such as an XHTML or SVG document, or to other annotations, without modifying the original documents. The annotations are shared by storing them on annotation servers, which a community of users can query to retrieve the annotations with annotation-capable clients, such as the W3C Amaya editor/browser.

Unlike other Web annotation systems, Annotea is completely open and is built on top of standard W3C technologies. We use RDF to model the annotations, XPointer and XPath to attach annotations to documents, and HTTP for all data exchange between clients and servers.

The Annotea annotation model can be easily extended to support other similar collaborative applications that share metadata about Web pages. Some of these scenarios are described here.

Keywords: Semantic Web, annotation infrastructure, metadata, collaboration scenarios

1 INTRODUCTION

Users of the World Wide Web collaborate by sharing content through Web pages. This kind of collaboration is limited because readers can seldom write back to the pages to share annotations, such as comments or questions, even when they are members of a closed collaborative group. Instead, much effort is spent on establishing, and trying to understand, different e-mail conventions for commenting on or annotating Web documents.

Annotea [Annotea 2001] uses Semantic Web technologies to create annotations that support rich communication about Web pages without requiring write access to the annotated page. A user can attach an annotation to a Web page for a collaborator, who sees the annotation when retrieving the same page (see Figure 1), as long as both use the same annotation server.

Figure 1: A user shares a comment with another user by annotating a Web page.

Annotea annotations are metadata about Web pages or parts of Web pages. They use metadata vocabularies grounded in semantically rich ontologies that are themselves published on the Web. This metadata infrastructure opens up possibilities that extend beyond the basic annotation capabilities [KCAP 2001].

Section 2 of this paper describes a simple scenario showing how annotations can support collaboration and then broadens the scope of annotations with two additional scenarios. Section 3 explains the basic Annotea metadata infrastructure in more detail and then describes the extensions needed to support the additional scenarios of Section 2.

2 SCENARIOS

The following subsections describe three annotation scenarios. The first explains the use of annotations for basic collaboration, the second interprets shared bookmarks as annotations, and the last examines the use of annotations for communicating evaluation results. Annotea currently implements the annotations and replies discussed in the basic collaboration scenario; the other scenarios describe possible extensions.

2.1 Scenario: Using Annotations for Collaboration

A group of distance-education students is writing a report on whale communication. They collaborate over the Web to publish new material, to search for and share hypertext links to references, and to annotate the material they uncover. By using annotations for their commentary on what they read, the group avoids contention for write access to a single shared document and the potential loss of data from conflicting updates.

The group stores its annotations on an annotation (metadata) server dedicated to this seminar. Only users subscribed to the seminar annotation server can see the annotations. Figure 2 shows one of the annotations made by the group. Other students see it as a pencil icon, which opens an annotation window containing the content of the annotation.

Figure 2: A selected annotation is opened in an annotation window.

As the students read the reference papers they find on the Web, they mark each paper as interesting or uninteresting by attaching annotations to it. They also use annotations to mark or question unclear text, point out interesting perspectives, add keywords or categories, and share other general comments with each other.

At times, the students may disagree with comments made by others, or they may want to add information to them. When this happens, they use replies to create a discussion thread attached to the annotation. Figure 3 shows a sample reply to the annotation in Figure 2. The original annotation window now shows the thread of replies at the bottom, and each reply can be opened by double-clicking on it in the thread.

Figure 3: Annotation window with a reply thread at the bottom and an opened reply window.

Later in the process, the students assign one person to write more detailed replies to selected research questions raised in the annotations. Occasionally, this starts fruitful discussions in the context of the reference document. The results are gathered into summary pages that can in turn be annotated.

2.2 Scenario: Using Annotations for Shared Bookmarking

Figure 4: Annotations can attach categories to Web pages.

The student group from the previous scenario uses traditional Web search tools to locate references on the Web. They want to group the papers under categories and list them on a Web page. Instead of each of them manually editing a shared page, they attach special annotations with category information to interesting pages (see Figure 4). In a Semantic Web sense, the category is just one of many such extensions that the group can store with its annotation metadata.

These annotations can then be presented in a category hierarchy similar to the bookmark or favorites hierarchies in today's browsers (see Figure 5), or as icons on the bookmarked page. In this sense, annotations with categories are bookmarks.

Figure 5: A sample hierarchical presentation of bookmark categories, taken from the Internet Explorer Favorites menu.

The students may filter the list of bookmarks in different ways, for instance to show all categories in alphabetical order or to show categories grouped by user. Students can also define the categories themselves or select them from an existing ontology.

2.3 Scenario: Using Annotations to Present Evaluation Results

Figure 6: The teaching assistant marks the accessibility defects in students' pages by using annotations.

Annotations can also be used to attach either automatic or manual evaluations to a page. For instance, Kim, the teaching assistant for the seminar, uses annotations to remind the students that the readers of their documents may have different physical or cognitive abilities in receiving and interacting with the information and that their documents need to be accessible.

Kim uses the Web Accessibility Initiative [WAI] guidelines and some automatic tools to assess the markup of the Web pages the students create. These accessibility assessment tools rely on EARL [EARL 2001], a metadata language for expressing what is or may be wrong in a page, which cites by URI the specific guideline describing each accessibility issue.

Kim stores the EARL analysis of each document on the same annotation server that holds the seminar's other annotations. Kim also adds to the server some inferencing rules that transform the EARL vocabulary into the annotation vocabulary. Because the EARL vocabulary is a superset of the annotation vocabulary, she also supplies style rules that tell presentation clients how to render the extra properties of the EARL metadata.

When students view their pages, the inferencing rules cause the EARL report items to appear as annotations on the pages. For instance, in Figure 6 Kim has attached an annotation to the image of Minke whales stating that it has no alternative text and is therefore not accessible to users who cannot see the image. The students can address the accessibility issues in the context of the page and add metadata to the annotations, for instance by replying to them to mark them as fixed or to ask Kim for help. When Kim helps the group, she sends an e-mail to the discussion list explaining the problem and includes a link to the EARL annotation so that others in the group can benefit from the example.

When the corrections are done, the group can run the accessibility evaluation tools again. At this point, the document author can delete the earlier report annotations or simply mark them as obsolete. The group may also freeze a copy of the evaluated page together with the original annotations.

3 ANNOTEA METADATA INFRASTRUCTURE

The basic Annotea infrastructure is presented in Figure 7. When a user attaches an annotation to a Web page, the annotation metadata is stored on one or more annotation servers. When a user visits the annotated page with an Annotea-capable client, the client queries the annotation servers to which the user subscribes for annotations related to the document. In addition to a simple URI request for all annotations of a page, Annotea supports a special query language, Algae. The retrieved annotations can be presented to the user in several ways. In our Amaya [Amaya] client they appear as icons in the context of the Web page, but the metadata can also be queried and presented in many other ways.
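
As a minimal sketch of the simple "all annotations" request, the Python functions below build the GET URL a client might send to each subscribed server. They assume that the server reads the page URI from a `w3c_annotates` query parameter and that `http://annotea.example.org/annotations` is a server address; both are assumptions to check against the server actually deployed. Algae queries are not shown.

```python
from urllib.parse import urlencode


def all_annotations_query(server_url: str, page_uri: str) -> str:
    """Build a GET URL asking one annotation server for every annotation on page_uri.

    Assumption: the server reads the annotated page from a 'w3c_annotates'
    query parameter; urlencode percent-encodes the page URI.
    """
    return f"{server_url}?{urlencode({'w3c_annotates': page_uri})}"


def queries_for_page(subscribed_servers: list[str], page_uri: str) -> list[str]:
    """One query per server the user subscribes to; the client merges the results."""
    return [all_annotations_query(server, page_uri) for server in subscribed_servers]


if __name__ == "__main__":
    # Hypothetical seminar server and annotated page.
    for url in queries_for_page(["http://annotea.example.org/annotations"],
                                "http://example.org/whales.html"):
        print(url)
```

Sending one query per subscribed server keeps the servers independent; merging and presenting the results is left to the client.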

Figure 7: Annotea infrastructure

Annotea uses W3C technologies wherever possible. HTTP is used to store and retrieve the RDF/XML [RDF 1999] metadata describing annotations on the Annotea servers. Each Annotea server is a generic RDF store. XPointer is used to refer to the annotated part of the document, and XLink is used to present the annotations on a Web page.
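
The storage side of this exchange can be sketched as follows: the code prepares, but does not send, an HTTP POST carrying an RDF/XML annotation, and shows how an XPointer `xpointer()` expression can address part of the annotated page. The server URL and the `application/rdf+xml` content type are assumptions for illustration; a real client would also need authentication and error handling.

```python
from urllib.request import Request


def xpointer_context(page_uri: str, xpath: str) -> str:
    """Address part of a document with the XPointer xpointer() scheme."""
    return f"{page_uri}#xpointer({xpath})"


def store_annotation_request(server_url: str, rdf_xml: str) -> Request:
    """Prepare (without sending) an HTTP POST that stores one annotation.

    Assumption: the server accepts an RDF/XML body at server_url.
    Pass the result to urllib.request.urlopen() to actually send it.
    """
    return Request(
        server_url,
        data=rdf_xml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical page and server; the body is an empty RDF document.
    context = xpointer_context("http://example.org/whales.html", "/html[1]/body[1]/p[3]")
    req = store_annotation_request(
        "http://annotea.example.org/annotations",
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>',
    )
    print(context)
    print(req.get_method(), req.full_url, req.get_header("Content-type"))
```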

The metadata infrastructure of the Annotea project makes it easy to support the annotation scenarios presented above. The basic annotation schema and the extensions needed for the previous scenarios are discussed in the following sections.

3.1 Basic Annotea Annotations

In the first scenario, the students annotate Web pages and use reply threads as supported by the Annotea infrastructure. The annotations and replies in this scenario are metadata described with RDF.

Figure 8: An instance of the basic annotation schema

Figure 8 presents an instance of the basic annotation schema. It uses properties from multiple RDF schemas [RDFS 2000], e.g. Dublin Core (dc:) [DCMI 1999], to define the annotations. An annotation can apply to a whole document or just a part of it. The annotates property refers to the annotated document, and the context property identifies the location of the annotation within that document. The annotation content written by the user is stored in the body property, and a descriptive title is stored in the dc:title property. The other properties further describe the annotation.
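
To make the property list concrete, here is a hedged sketch that serializes one annotation in the spirit of Figure 8 using only Python's standard library. The annotation namespace URI, the use of `created` for the creation date, and treating `body` as a link to a separate body resource are assumptions for illustration rather than a normative schema; the page, XPath, and body URI are hypothetical.

```python
import xml.etree.ElementTree as ET

# RDF and Dublin Core 1.1 namespaces are standard; the annotation
# namespace is assumed here for illustration.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ANN = "http://www.w3.org/2000/10/annotation-ns#"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("rdf", RDF), ("a", ANN), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def annotation_rdf(page: str, xpath: str, body_uri: str,
                   title: str, creator: str, created: str) -> str:
    """Return RDF/XML describing one annotation of part of `page`."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description")
    ET.SubElement(desc, f"{{{RDF}}}type", {f"{{{RDF}}}resource": ANN + "Annotation"})
    # annotates: the whole document; context: the annotated part (XPointer).
    ET.SubElement(desc, f"{{{ANN}}}annotates", {f"{{{RDF}}}resource": page})
    ET.SubElement(desc, f"{{{ANN}}}context").text = f"{page}#xpointer({xpath})"
    # body: the user's comment, kept here as a separate resource.
    ET.SubElement(desc, f"{{{ANN}}}body", {f"{{{RDF}}}resource": body_uri})
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    ET.SubElement(desc, f"{{{DC}}}creator").text = creator
    ET.SubElement(desc, f"{{{ANN}}}created").text = created
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(annotation_rdf("http://example.org/whales.html", "/html[1]/body[1]/p[3]",
                         "http://annotea.example.org/body/1",
                         "Source for this claim?", "Student A", "2001-10-15T10:00Z"))
```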

3.2 Extending the Annotation Schema for Reply Threads

Annotea also has a reply concept: a reply can be attached to an annotation or to another reply. Replies form discussion threads that start from an annotation.

Figure 9: An instance of the reply schema

Figure 9 presents an instance of the reply schema. It is very similar to the annotation schema but adds two properties: inReplyTo, which identifies the annotation or reply that precedes this reply in the thread, and root, which always points to the annotation that starts the thread. The root property is a performance optimization: it lets a client find the replies belonging to a thread directly, without following the inReplyTo chain step by step.

The generic metadata-based design of our annotation server and the query language made it easy to incorporate the additional properties; the bulk of the work was in extending the user interface capabilities.
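
The division of labor between the two reply properties can be illustrated with a small in-memory sketch. Replies are modeled as plain dictionaries with hypothetical `id`, `inReplyTo`, and `root` keys: `root` selects every reply in a thread with a single filter, and `inReplyTo` then arranges them into a tree, so no link-by-link traversal is needed to collect the thread.

```python
from collections import defaultdict


def reply_tree(root_id: str, replies: list[dict]) -> tuple:
    """Rebuild the discussion thread under one annotation.

    Returns nested (id, [children]) tuples. The `root` property selects the
    thread in one pass; `inReplyTo` gives each reply its parent.
    """
    in_thread = [r for r in replies if r["root"] == root_id]
    children = defaultdict(list)
    for reply in in_thread:
        children[reply["inReplyTo"]].append(reply["id"])

    def build(node_id: str) -> tuple:
        return (node_id, [build(child) for child in sorted(children[node_id])])

    return build(root_id)


def print_thread(node: tuple, depth: int = 0) -> None:
    """Indent replies under their parents, as a client's thread list might."""
    node_id, kids = node
    print("  " * depth + node_id)
    for kid in kids:
        print_thread(kid, depth + 1)


if __name__ == "__main__":
    # Hypothetical replies: two threads, rooted at ann1 and ann2.
    replies = [
        {"id": "reply1", "inReplyTo": "ann1", "root": "ann1"},
        {"id": "reply2", "inReplyTo": "reply1", "root": "ann1"},
        {"id": "reply3", "inReplyTo": "ann1", "root": "ann1"},
        {"id": "other", "inReplyTo": "ann2", "root": "ann2"},
    ]
    print_thread(reply_tree("ann1", replies))
```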

3.3 Using Annotea for Shared Bookmark Annotations

When we add a category property to a schema very similar to the annotation schema, we can create bookmarks. These may be seen either as annotations of type bookmark or as a separate bookmark concept. RDF makes it easy to add the category property, and the DAML+OIL ontology construction vocabulary [DAML 2001] provides a framework for describing new properties with precise semantics and publishing those semantics on the Web.

The generic metadata approach naturally supports a variety of views of the bookmarks. These views need only new queries; no changes are needed in our annotation server. The user interface, however, needs some work. Bookmarks need a special icon that visually distinguishes them from more conventional annotations when they are presented on a visited page. We would also like to present all the bookmarks in a category hierarchy.
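
A hedged sketch of such a hierarchy view follows. It assumes each bookmark annotation carries a single category value and that the category hierarchy is given as a child-to-parent mapping, in the manner of an `rdfs:subClassOf` link in an ontology. The data and the `by_creator` grouping are hypothetical and correspond to the filters mentioned in Section 2.2.

```python
from collections import defaultdict


def category_tree(bookmarks: list[dict], parent_of: dict[str, str]) -> list[dict]:
    """Arrange bookmarked pages under a category hierarchy, like a favorites menu."""
    pages = defaultdict(list)
    for bm in bookmarks:
        pages[bm["category"]].append(bm["page"])
    kids = defaultdict(list)
    for child, parent in parent_of.items():
        kids[parent].append(child)

    def build(cat: str) -> dict:
        return {"category": cat,
                "pages": sorted(pages[cat]),
                "children": [build(c) for c in sorted(kids[cat])]}

    # Top-level categories are those that have no parent.
    every = set(pages) | set(parent_of) | set(parent_of.values())
    return [build(c) for c in sorted(every) if c not in parent_of]


def by_creator(bookmarks: list[dict]) -> dict[str, list[str]]:
    """Alternative view: bookmarked pages grouped by the user who made them."""
    grouped = defaultdict(list)
    for bm in bookmarks:
        grouped[bm["creator"]].append(bm["page"])
    return {user: sorted(p) for user, p in sorted(grouped.items())}
```

Both views are computed from the same bookmark metadata, which is the point made above: only the query and presentation change, not the stored data.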

In addition, the client needs to present the bookmark properties in a window, in the same way it presents annotation properties. Because new properties can easily be added to bookmarks or annotations, it would be useful to define a presentation style for each new property simply as another property of that property, within the same metadata framework.
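
One way to realize this idea is sketched below: presentation hints are stored as ordinary triples about the property URIs themselves, and the client falls back to the property's local name when it meets a property with no style information. The `style#label` vocabulary and the bookmark namespace are hypothetical placeholders.

```python
# Hypothetical style vocabulary: a label is just another property of a property.
STYLE_LABEL = "http://example.org/style#label"


def local_name(uri: str) -> str:
    """Last segment of a property URI, used when no style is defined."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def render_properties(props: list[tuple[str, str]],
                      style_triples: list[tuple[str, str, str]]) -> list[str]:
    """Produce 'Label: value' lines for a bookmark or annotation window."""
    labels = {subj: obj for subj, pred, obj in style_triples if pred == STYLE_LABEL}
    return [f"{labels.get(prop, local_name(prop))}: {value}" for prop, value in props]


if __name__ == "__main__":
    props = [("http://purl.org/dc/elements/1.1/title", "Whale songs"),
             ("http://example.org/bookmark#category", "Songs")]
    style = [("http://example.org/bookmark#category", STYLE_LABEL, "Category")]
    for line in render_properties(props, style):
        print(line)
```

The fallback matters because a client may encounter properties from ontologies it has never seen; it can still show them, just less nicely.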

3.4 Accessibility Evaluation Report Items as Annotea Annotations

Annotations can be used to present automatically generated report items, such as accessibility evaluation items or markup validation items. If the report items are described in a metadata format, mapping them to an annotation schema is straightforward. For instance, an EARL report item describing an accessibility problem has semantics that map easily to an annotation of part or all of the evaluated Web page. This mapping can be expressed as a collection of inference rules over the properties produced by the EARL tools.
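
The mapping can be pictured as a single forward-chaining pass over triples. In the sketch below, the `earl:` property names and the rule table are placeholders invented for illustration, not terms quoted from the EARL 1.0 specification. The pass keeps the original EARL triples, so style rules can still render the extra properties, and adds the derived annotation triples.

```python
# Placeholder rule table: EARL-style property -> annotation property.
# These earl: names are illustrative, not the EARL 1.0 vocabulary.
RULES = {
    "earl:testSubject": "a:annotates",
    "earl:location": "a:context",
    "earl:message": "a:body",
    "earl:guideline": "dc:relation",
}

Triple = tuple[str, str, str]


def apply_rules(triples: set[Triple], rules: dict[str, str] = RULES) -> set[Triple]:
    """Derive annotation triples from evaluation-report triples (one pass)."""
    derived: set[Triple] = set()
    for subj, pred, obj in triples:
        if pred in rules:
            derived.add((subj, rules[pred], obj))
        # A report item that records a failure is also typed as an annotation.
        if pred == "rdf:type" and obj == "earl:Fail":
            derived.add((subj, "rdf:type", "a:Annotation"))
    return triples | derived
```

A single pass suffices here because no rule consumes the output of another; chained rules would need to repeat the pass until no new triples appear.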

The generic metadata framework provides the flexibility to decide, case by case, whether to archive, delete, or revise annotations when a document is reprocessed by the evaluation tool. The tool can keep state information for successive runs in the same metadata store.
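
As an illustration of keeping state across runs, the sketch below compares the items of a new evaluation run with those stored from the previous run. Identifying an item by a (location, guideline) pair and the `obsolete` and `delete` policy names are assumptions made for this example; archiving is not modeled.

```python
Key = tuple[str, str]  # (location in the page, guideline URI)


def reconcile(previous: dict[Key, str], current_run: set[Key],
              policy: str = "obsolete") -> dict[Key, str]:
    """Update stored report-item states after re-running the evaluation tool.

    Items found again stay 'open'; new items are added as 'open'; items no
    longer reported are marked 'obsolete' or dropped, depending on policy.
    """
    if policy not in ("obsolete", "delete"):
        raise ValueError(f"unknown policy: {policy}")
    result: dict[Key, str] = {}
    for key in previous:
        if key in current_run:
            result[key] = "open"
        elif policy == "obsolete":
            result[key] = "obsolete"
    for key in current_run - set(previous):
        result[key] = "open"
    return result
```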

4 CONCLUSIONS

Annotea is a metadata-based infrastructure for sharing annotations on Web pages. It uses standard W3C technologies and can support a broad range of annotation needs. The generic property mechanism of RDF also allows us to build ontology-neutral data stores, so applications can use several ontologies simultaneously to describe different aspects of their annotations.

Annotea currently implements basic annotations and replies for creating discussion threads. The new scenarios need extensions to the basic annotation schemas, but the main work lies in customizing the user interfaces. More research is needed to ease the presentation of metadata, especially of new properties from ontologies the application (or user) has not seen before. More work is also needed on client-side or server-side inferencing for mapping between ontologies.

ACKNOWLEDGMENTS

Annotea is developed by a team of people. This paper is based on the innovative and hard work of Ralph Swick, Jose Kahan, Eric Prud'hommeaux, and Art Barstow. In addition, Eric Miller, Charles McCathieNevile and other W3C staff have contributed many ideas to Annotea. I also want to thank Elisa Communications for supporting this work.

Partial funding for the development of Annotea was provided by the Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-00-2-0593.

REFERENCES

[Amaya] Amaya Browser/Editor home page. http://www.w3.org/Amaya/

[Annotea 2001] José Kahan, Marja-Riitta Koivunen, Eric Prud'Hommeaux, Ralph R. Swick. Annotea: An Open RDF Infrastructure for Shared Web Annotations. In Proc. of the WWW10 International Conference, Hong Kong, May 2001. http://www10.org/cdrom/papers/488/index.html

[DAML 2001] Frank van Harmelen, Peter F. Patel-Schneider, Ian Horrocks (eds.). Reference Description of the DAML+OIL (March 2001) Ontology Markup Language. Joint United States / European Union ad hoc Agent Markup Language Committee. http://www.daml.org/2001/03/reference.html

[DCMI 1999] Dublin Core Metadata Initiative. Dublin Core Metadata Element Set, Version 1.1. http://purl.org/dc/documents/rec-dces-19990702

[EARL 2001] Sean Palmer (ed.). EARL 1.0 Specification. WAI ER WG Note, 09 December 2001. http://infomesh.net/2001/earl1.0/

[KCAP 2001] Marja-Riitta Koivunen, Ralph Swick. Metadata Based Annotation Infrastructure offers Flexibility and Extensibility for Collaborative Applications and Beyond. In Proc. of the KCAP 2001 Workshop on Knowledge Markup & Semantic Annotation.

[RDF 1999] Ora Lassila, Ralph R. Swick (eds.). Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation, 22 February 1999. http://www.w3.org/TR/1999/REC-rdf-syntax-19990222

[RDFS 2000] D. Brickley, R.V. Guha (eds.). Resource Description Framework (RDF) Schema Specification 1.0. W3C Candidate Recommendation, 27 March 2000. http://www.w3.org/TR/2000/CR-rdf-schema-20000327

[WAI] Web Accessibility Initiative home page. http://www.w3.org/WAI/

Revision 1.77 of 2002/03/20 15:36:14