Robert A. Morris and Paul Morris are members of the FilteredPush development group at the Harvard University Herbaria. Our immediate concern is the annotation of scientific data, but we bring two related viewpoints. First, scientific funding agencies increasingly require that the data supporting publications be made publicly available in usable form. Given this, we expect that scientific e-publications and the data supporting them will ultimately require reciprocal annotation. Second, modern web documents are typically rendered at access time from data held in back-end data stores. In our view, the actual knowledge therefore resides in those stores, and the web document is a view captured at a point in time. This holds whether the back end is structured, semi-structured, or unstructured. Thus, even if the back end is not directly accessible from the document generation system, we believe that some pitfalls of document annotation can be analyzed from this perspective.

System built

The FilteredPush project has built a configurable platform to support the semantic annotation of all forms of distributed data. It is presently deploying two instances of the platform. These instances support collaborative digitization and data quality control of georeferenced metadata for specimens in natural science collections in the U.S. Two similar projects are known in Australia and Europe [1, 2]. Our annotations use a small extension of the Open Annotation Ontology (OA), an annotation model central to academic scholarship, especially in the sciences.

Of importance to us is the ability to model, within an annotation, assertions about the results of a query, including assertions that are independent of some of the query details. For data quality control, we also require that annotations be actionable by their consumers, based on an expectation of the producer expressed in the annotation. This requirement has led us to consider what is needed for "annotation conversations" surrounding the actions taken by a consuming agent. It has also led us to consider how to convey the annotator's expectation about the actions consumers should take on their own datasets. In turn, the consumer must be able to issue an annotation that informs interested parties what action they took in response to an annotation. Further discussion can be found in [3].

In practice, we support annotation production and consumption as web services that can be used by third-party data management tools available to domain scientists. In addition, a semantic publish/subscribe component notifies interested parties when new annotations relevant to their interests are published.
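The annotation pattern described above can be sketched as follows. The pattern has three parts: an annotation that carries the producer's expectation, a consumer reply that reports the action taken, and notification of interested parties. This is a minimal illustration under stated assumptions, not the FilteredPush implementation. The oa: terms follow the Open Annotation vocabulary. The oad: namespace and the terms oad:hasExpectation, oad:Expectation_Update, oad:actionTaken, oad:proposedLatitude and oad:proposedLongitude are hypothetical placeholders. The urn:example identifiers and coordinate values are invented. The broker matches on identifier prefixes rather than performing any semantic matching.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

# "oa" is the Open Annotation namespace. The "oad" namespace and all oad:*
# terms are hypothetical placeholders for a data-annotation extension.
CONTEXT = {
    "oa": "http://www.w3.org/ns/oa#",
    "oad": "http://example.org/oad#",  # hypothetical
}


def make_annotation(ann_id: str, target_id: str, body: Dict, expectation: str) -> Dict:
    """Annotation proposing a change to a data record, carrying the producer's
    expectation of what consumers should do (hypothetical oad:hasExpectation)."""
    return {
        "@context": CONTEXT,
        "@id": ann_id,
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:editing"},
        "oa:hasTarget": {"@id": target_id},
        "oa:hasBody": body,
        "oad:hasExpectation": {"@id": expectation},
        "oa:annotatedAt": datetime.now(timezone.utc).isoformat(),
    }


def make_response(ann_id: str, original: Dict, action: str) -> Dict:
    """Consumer's reply: an annotation whose target is the original annotation
    and whose body records the action taken (hypothetical oad:actionTaken)."""
    return {
        "@context": CONTEXT,
        "@id": ann_id,
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:replying"},
        "oa:hasTarget": {"@id": original["@id"]},
        "oa:hasBody": {"oad:actionTaken": action},
        "oa:annotatedAt": datetime.now(timezone.utc).isoformat(),
    }


Interest = Callable[[Dict], bool]
Notify = Callable[[Dict], None]


class Broker:
    """Naive in-memory publish/subscribe: each subscriber supplies a predicate
    over annotations and a callback invoked for every matching publication."""

    def __init__(self) -> None:
        self.subs: List[Tuple[Interest, Notify]] = []

    def subscribe(self, interest: Interest, notify: Notify) -> None:
        self.subs.append((interest, notify))

    def publish(self, annotation: Dict) -> int:
        """Deliver to every interested subscriber; return the delivery count."""
        delivered = 0
        for interest, notify in self.subs:
            if interest(annotation):
                notify(annotation)
                delivered += 1
        return delivered


def targets(prefix: str) -> Interest:
    """Interest predicate: annotations whose target id starts with prefix."""
    return lambda a: a["oa:hasTarget"]["@id"].startswith(prefix)


broker = Broker()
curator_inbox: List[Dict] = []
annotator_inbox: List[Dict] = []
# The curator watches its specimen records; the annotator watches
# annotations that target its own annotations (i.e. replies).
broker.subscribe(targets("urn:example:specimen:"), curator_inbox.append)
broker.subscribe(targets("urn:example:ann:"), annotator_inbox.append)

ann = make_annotation(
    "urn:example:ann:1",
    "urn:example:specimen:12345",
    {"oad:proposedLatitude": 42.378, "oad:proposedLongitude": -71.115},
    "oad:Expectation_Update",
)
n_ann = broker.publish(ann)  # matches the curator's interest only
reply = make_response("urn:example:ann:2", ann, "georeference updated in local database")
n_reply = broker.publish(reply)  # matches the annotator's interest only
print(json.dumps(reply, indent=2))
```

Modeling the consumer's reply as an ordinary annotation that targets the original keeps the whole conversation within one data model. As a result, the same subscription mechanism can route the reply back to the annotator. A real semantic publish/subscribe service would match on the meaning of annotations rather than on identifier prefixes.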

Lessons learned

Lessons yet to be learned

What is missing from OA for data and other scientific web resource annotations