HCLSIG/SWANSIOC/Meetings/2010-05-10 Conference Call

Conference Details

Date of Call: Monday May 10, 2010
Time of Call: 10:00am Eastern Time
Dial-In #: +1.617.761.6200 (Cambridge, MA)
Dial-In #: +33.4.89.06.34.99 (Nice, France)
Dial-In #: +44.117.370.6152 (Bristol, UK)
Participant Access Code: 42572 ("HCLS2")
IRC Channel: irc.w3.org port 6665 channel #HCLS2 (see W3C IRC page for details, or see Web IRC)
Duration: ~1 hour
SubTask: Rhetorical Document
Convener: Anita de Waard
Presentation: Paolo Ciccarese will present his work on AO (Annotation Ontology), a provenance-aware positional model for annotating HTML and XML web documents.
Notes: Tim Clark
[10:13] Goal: Coarse grained rhetorical structure ontology
[10:13] Actions: (1) Annotation Ontologies (2) Rhetorical Structure (Alex & Tudor)
[10:14] (1) Paolo intro
[10:15] See slides at:
[10:16] <jodi> http://www.slideshare.net/paolociccarese/ao-annotation-ontology-4031043
[10:16] AO (Annotation Ontology) result of developing SWAN and SCF
[10:17] SWAN is full semantic web application - ontology of scientific discourse - but missing link between discourse elements and published text
[10:18] Also, we publish in SCF lots of content, which contains lots of entities we would like to publish using semantic web ontologies
[10:18] Paolo: starting point is Annotea
[10:19] Paolo: Slide 2, top left, annotation is a "post it" attached to some part of a document, in Annotea using XPointer (although not robust)
[10:20] Paolo: Annotea also introduced concept of bookmark - "has topic <something>" - idea to create topics network, annotating resources
[10:21] Paolo: Annotea's limitations are that it mainly handles HTML documents and images as a "whole thing" - but we want to annotate pieces of documents, of various kinds, and subject to curation
[10:21] Paolo: for our communities we need a more complex mechanism to support curation
[10:22] Paolo: Slide 3 - we introduce concept of Selector - different Selectors for different kinds of document, or even multiple Selectors for same type of document depending on the method used to refer to locations
[10:23] Paolo: Slide 4 - sketch of Annotation Ontology - similar to Annotea - we always consider SIOC as a "friend" ontology - we stay aligned - there is a text selector pointing to BACE1 -
[10:24] Paolo: linking BACE1 to a URI in PRO protein ontology - there is also use of PAV which is a SWAN based model of provenance annotation and versioning
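A minimal Python sketch of the Selector idea from Slides 3 and 4, not taken from the slides themselves. The class names, the prefix/suffix matching strategy, and both URIs are assumptions made for illustration; they are not the AO vocabulary, and the topic URI is a placeholder rather than a real PRO identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSelector:
    """Hypothetical selector: finds an exact string using surrounding context."""
    exact: str
    prefix: str = ""
    suffix: str = ""

    def locate(self, text: str) -> int:
        """Return the start offset of `exact` in `text`, or -1 if not found."""
        start = text.find(self.prefix + self.exact + self.suffix)
        return -1 if start < 0 else start + len(self.prefix)


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation: links a selected fragment to a topic URI."""
    document: str           # URI of the annotated document
    selector: TextSelector  # where in the document the annotation applies
    topic: str              # URI of the ontology term
    created_by: str         # provenance: who or what produced the annotation


text = "Cleavage of APP by BACE1 is the first step."
ann = Annotation(
    document="http://example.org/articles/1",   # placeholder URI
    selector=TextSelector(exact="BACE1", prefix="by ", suffix=" is"),
    topic="http://example.org/pro/BACE1",        # placeholder, not a real PRO URI
    created_by="Paolo",
)
offset = ann.selector.locate(text)
selected = text[offset:offset + len(ann.selector.exact)]
```

Keeping the selector separate from the annotation is what would let a different selector type (for example, one for image regions) be swapped in without changing the rest of the record.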
[10:24] Paolo: Slide 5 - can also annotate sections of images
[10:25] Anita: what are the "empty dotted ovals"?
[10:25] Paolo: Instances
[10:25] Paolo: not-dashed are classes, dashed empty are instances
[10:26] Paolo: Slide 6 - Annotea allowed users to subclass "types" of annotation, we do too, but we provide the subclasses, e.g. SWAN Comment, can be on any section of the document, can be kept private, can be shared, can be fully public
[10:27] Paolo: every annotation can have a Description - not shown - similar to Annotea "Body" see slide 2
[10:28] Paolo: Slide 7 - Automatic annotation to leverage text mining tools, call external service, return set of annotations, central dashed ball in Slide 7 was returned by a text mining service
[10:29] Paolo: problem, text mining is not fully accurate, so we allow curation - in this example "Paolo" curated this annotation and "accepted" it
[10:30] Paolo: Slide 8 - this shows the same annotation curated three times - first Paolo rejected the text mining annotation - then he changes his mind based on some feedback from a colleague
[10:30] Paolo: the annotations are ordered by date, here, but they can be explicitly linked with a property like "previousCuration"
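The explicit linking of curations could be sketched as a simple backward-linked chain. This is an illustration only: the `Curation` class, its fields, and the dates are hypothetical, and `previous` merely stands in for a property like "previousCuration". Only the two curation steps described in the notes (reject, then accept) are modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Curation:
    curator: str
    status: str                             # "accepted" or "rejected"
    on: date
    previous: Optional["Curation"] = None   # stands in for a "previousCuration"-like link


def history(latest: Optional[Curation]) -> List[Curation]:
    """Follow the `previous` links back from the latest curation; return oldest first."""
    chain = []
    node = latest
    while node is not None:
        chain.append(node)
        node = node.previous
    return list(reversed(chain))


# Illustrative dates only; the slides do not give any.
first = Curation("Paolo", "rejected", date(2010, 5, 1))
second = Curation("Paolo", "accepted", date(2010, 5, 3), previous=first)
statuses = [c.status for c in history(second)]
```

With explicit links the order of curations does not depend on timestamps, which may matter when several curators work on the same annotation.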
[10:31] Paolo: So this model supports automated, manual, and mixed annotations
[10:32] Paolo: another point - there are different kinds of specialized curation possible - can result in too much junk on a page, and mixed provenance, etc. So we have AnnotationSets (Slide 9)
[10:32] Paolo: AnnotationSets group annotations - can group by topic, source, etc. etc.
[10:33] Paolo: Can attach rules for access to these sets - can say e.g. "full community access" on one set, "private" on another
[10:33] Paolo: See slide 10 for example
[10:35] Paolo: these sets must allow an online portal to publish some selected sets
[10:36] Paolo: Slide 11 - Document Annotation - is a collection of AnnotationSets - to declare that some collection of sets is the annotation we desire to publish on this document
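A rough Python sketch of how AnnotationSets with access rules and a Document Annotation might fit together. The class layout, the access level names ("public", "community", "private"), and the sample data are assumptions for illustration, not definitions from AO.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AnnotationSet:
    name: str
    access: str                                  # hypothetical levels: "public", "community", "private"
    annotations: List[str] = field(default_factory=list)


@dataclass
class DocumentAnnotation:
    """Hypothetical collection of the sets selected for one document."""
    document: str
    sets: List[AnnotationSet] = field(default_factory=list)

    def visible_to(self, allowed: Set[str]) -> List[str]:
        """Return the names of the sets whose access level is in `allowed`."""
        return [s.name for s in self.sets if s.access in allowed]


doc_annotation = DocumentAnnotation(
    document="http://example.org/articles/1",    # placeholder URI
    sets=[
        AnnotationSet("text-mining", "community", ["BACE1 -> protein term"]),
        AnnotationSet("my-notes", "private", ["check this claim"]),
    ],
)
# A portal publishing to the community would skip the private set.
published = doc_annotation.visible_to({"public", "community"})
```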
[10:36] Paolo: Overall idea is that annotation lives outside the document, but this is not required, as in SCF you can - if editor decides - inject the selected annotation inside the document as RDFa
[10:37] Paolo: so next slide, we have for the same document, multiple DocumentAnnotation sets, giving you flavors of representation to select
[10:37] Paolo: Last slide - this can be integrated with both SIOC and Annotea as shown here
[10:38] Paolo: that's it
[10:38] Anita: Questions?
[10:38] Q: have you considered MOAT?
[10:39] Paolo: yes, I found the missing thing was curation - but some other properties from MOAT can be integrated
[10:39] Anita: can you loop back to coarse-grained rhetorical structure? do you see this as an annotation, or just a way to know where you are in the document?
[10:40] Paolo: if you have an article that doesn't have coarse grained structure published, you can overlay this
[10:44] Tony: I've done work with text mining for anatomical features - similar things crop up in my model - also have AnnotationSets concept
[10:46] Tony: in my model annotation is over regions - using XPointer for start and end points - how to annotate differences to map between XML and text for example - I have a standard annotation for mapping back and forth between representations
[10:47] Tony: So I have Annotations and Regions where regions are nodes in RDF graph
[10:47] Tony: I describe a Region Registry so that people will use the same model of region identifiers
[10:48] Tony: also standardize XML to text mapping bi-directionally
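One way to picture a bi-directional XML-to-text mapping is a table of aligned character runs. The sketch below is an assumption-laden illustration, not Tony's model: it uses plain character offsets rather than XPointer, strips tags with a regular expression, and ignores entities, comments, and CDATA. All function names are hypothetical.

```python
import re
from bisect import bisect_right

TAG = re.compile(r"<[^>]*>")


def build_map(xml):
    """Return (plain_text, segments); each segment is (text_start, xml_start, length)."""
    segments, parts = [], []
    text_pos = xml_pos = 0
    for tag in TAG.finditer(xml):
        run = xml[xml_pos:tag.start()]       # characters between two tags
        if run:
            segments.append((text_pos, xml_pos, len(run)))
            parts.append(run)
            text_pos += len(run)
        xml_pos = tag.end()
    tail = xml[xml_pos:]                     # characters after the last tag
    if tail:
        segments.append((text_pos, xml_pos, len(tail)))
        parts.append(tail)
    return "".join(parts), segments


def text_to_xml(segments, offset):
    """Map a plain-text offset to the corresponding offset in the XML string."""
    starts = [text_start for text_start, _, _ in segments]
    i = bisect_right(starts, offset) - 1
    if i >= 0:
        text_start, xml_start, length = segments[i]
        if offset < text_start + length:
            return xml_start + (offset - text_start)
    raise ValueError("offset is outside the text")


def xml_to_text(segments, offset):
    """Map an XML offset back to plain text; offsets inside markup are rejected."""
    for text_start, xml_start, length in segments:
        if xml_start <= offset < xml_start + length:
            return text_start + (offset - xml_start)
    raise ValueError("offset falls inside markup")


xml = "<p>The <b>BACE1</b> gene</p>"
plain, segs = build_map(xml)
text_offset = plain.find("BACE1")
xml_offset = text_to_xml(segs, text_offset)
```

A text mining tool could then report a hit at `text_offset` in the plain text, and the mapping would translate it to a position in the XML source and back.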
[10:48] Tony: the aim of splitting the two pieces of an Annotation is that we can build on annotations (rather than having them all independent)
[10:49] Tony: this is important for text mining
[10:49] Tony: can show an example of this
[10:49] Tony: formalized in OWL etc.
[10:50] Tony: mirrors nicely with Paolo's presentation
[10:50] Anita: nice complementarity between these two - let's move to Alex now
[10:51] Alex: actually Paolo's work is very much in line with what we are doing
[10:51] Alex: we are taking the XML version of a paper and tagging the document (humans or text mining in a tagging pipeline)
[10:52] Alex: we realized that Annotea wasn't good enough for biological knowledge - because you need pre-existing features
[10:52] Alex: we built a model called Biotea
[10:53] Alex: coarse grained annotation can be seen as an annotation - very unlikely to get everyone to change the way they annotate
[10:53] Alex: we are working this with DAS and people at EBI
[10:54] Alex: originally to facilitate annotation of sequence features - very close to what Paolo is doing
[10:54] Tony: all three are very similar - we can identify regions in objects and say anything you like about it
[10:56] Anita: can Paolo, Tony and Alex get together after the call and compare?
[10:57] Tim: we will put together a call - Paolo, Tony and Alex - to converge
[10:58] Anita: can the three of you report on progress after the discussion - in three weeks?

[10:59] Anita: that's it, great action item - another call in three weeks - put up anything to share on the WIKI