Meeting November 15 2010, 10 am EDT
1. Recap goals and use cases for the Rhetorical Document Model Subtask:
2. ORB (Ontology of Rhetorical Blocks) progress: ORB OWL file
3. Medium-grained progress: first pass at Medium, very much open for discussion.
4. DoCo (Document Components Ontology) - David Shotton: DoCO v1.0
5. Next steps.
Three alignment efforts:
- 1) ORB - MediumGrain,
- 2) MediumGrain - Data+Discourse,
- 3) ORB & MediumGrain - DoCO
- Who: to be decided in offline discussion.
- When: report back on next call.
Participants: Anita de Waard, Scott Marshall, Jodi Schneider, David Shotton, Joanne Luciano, Tim Clark, Silvio Peroni, Paolo Ciccarese
Scribe: Scott
- Anita: Shall we recap use cases?
- 1 - Would like to identify the section of the document where you've identified a gene, for example: an application of the coarse-grained structure model, "ORB" (Ontology of Rhetorical Blocks) http://esw.w3.org/images/d/d2/Orb-0_1.owl
- 2 - Coarse grained model is very simple: Introduction, Methods, Results and Discussion
- 3 - Medium-grained (DRO): identify parts of a section at paragraph level, e.g. the research question
- 4 - Fine-grained: sentence level, phrase, clause.
- Paolo: Coarse-grained - We assume contiguous piece of text.
- Anita: IMRaD Model = Introduction, Methods, Results and Discussion
- Header: in Dublin Core/PRISM; FaBiO/BIBO can model references
- Classes are disjoint: coarse-grained blocks are contiguous, not overlapping in the article
- Tim: (Tudor is traveling; he, Ron, Alex, Anita and Paolo made ORB) - ORB is straightforward, adds a way to talk about the entire article, is disjoint from the other levels, and is useful for text miners
- Joanne: Tim suggests we adopt it, I agree.
- Tim: notes that ORB is simple and useful
- Paolo: that was the motivation for this ontology; this is consensus and quite neat in terms of definition
- Joanne: The classes are Discussion, Header, Introduction, Methods, References, Results
- Anita: Write out a SIG note and send it to HCLS as a whole
- Scott: Tim is discussing annotation of a microarray corpus with Dietrich Rebholz-Schuhmann (Pistoia SESL project)
- Tim: how is Dietrich annotating these? Any further talks?
- Tim: wanted to collaborate; DRS hasn't responded yet. Scott, Tim and the Pfizer people are interested.
- Paolo: This is the PURL for the most recent ORB version: http://purl.org/orb/
- Joanne: There are folks here at RPI who are also interested, for provenance purposes. They couldn't make the call because of conflicting schedules, but will contact us offline.
- Anita: chasing the Pistoia corpus from the Elsevier end; will let the group know whether it is available, with or without annotation by EBI and others.
- Paolo: This is the PURL for version 0.1 only: http://purl.org/orb/0.1/
- Anita: asks Paolo and Tudor to write a note for releasing ORB to the world (wide web)
- Paolo: agrees - will take the lead, Tudor to help.
- Anita: Let's not subdivide header and references, since we already have standards for that...
- Paolo: don't give options?
- Paolo: do provide options! Let's start listing them and go through existing bibliographic standards.
- Tim: task for the next time we take up this thread - let's take the ones we like best: at least DC, PRISM and the XMP/ElPub standards, and also CiTO/BIBO
- Anita: Involve Ron Daniel in this discussion - keep placeholder
- Anita: Ron Daniel headed up the PRISM metadata project (?)
- Jodi: I think that paper type needs to be distinguished for medium grain. Experimental vs. theoretical, review, ...
- Scott: yes, Ron Daniel headed up PRISM, http://en.wikipedia.org/wiki/Publishing_Requirements_for_Industry_Standard_Metadata
- Jodi: For math, for instance, I don't think the method and results are going to work very well
- Anita: PAM = PRISM Aggregator Message: http://www.prismstandard.org/faq/
- Jodi: except perhaps positioning, central problem, definition
- Paolo: Objects of study: ?
- Tim: biomaterials or what
- Anita: Hi Jodi, good comments, will address them in the call in a minute!
- Tim: what is use case? How can we use this? Overlap with Data + Discourse + Experiment task
- Tim: offline, should chat with Philippe and Susana - overlap with
- Tim: link to Sudeshna's talk from Nov 1 on DEXI: http://www.slideshare.net/sdas617/sci-discourse-nov-2010
- Tim: link to cartoon of current Data+Discourse+Experiment (DEXI) ontology: http://esw.w3.org/File:SWAN-myExp-v4.jpg
- Tim/Anita: let's discuss integration between medium-grained structure and research data/workflow output, à la Beyond the PDF: https://sites.google.com/site/beyondthepdf/
- Tim: link to the Data+Discourse+Experiment Task: http://esw.w3.org/HCLSIG/SWANSIOC/Actions/SWANmyExpArray
- Jodi: this is life science focused - we need old stuff in other disciplines that we can apply retrospectively
- Tim: Well - this is a Life Sciences SIG! :-)
- Anita: Three distinctions: Life science/Physical sciences/everything else
- Second distinction: Research article, Review article, Quick research note
- Third distinction: new material vs. existing text
- Let's discuss other article types as well?
- Jodi: clinical reports are a nice example
- Scott: has been looking at clinical reports as well. This group is more aimed at scientific literature, right? See a very acute need to mine clinical reports.
- Scott: Assign mappings to terminologies (UMLS, SNOMED, etc.) - but clinical reports don't have a normalised structure, which makes it difficult to build a document ontology that is generally useful
- Jodi: I think that's really pragmatic, Tim!
- Jodi: I guess, let's just be explicit about what the scope is, when giving a medium-grain structure.
- Tim: my 2 cents - fan of restricted initial scope - let's start on Research papers in life sciences, make that use case work; do a stepwise incremental expansion. Take users in astronomy, etc., then look at other types
- Anita: volunteers for medium-grained model - Tim will send an email to Experimental Discourse Group
- Tim: Jodi - ?
- Howard: I'm interested in this discussion
- Joanne: interested in being involved in the medium-grained model discussion (RPI)
- Jodi: yes, interested ;)
- Anita: Sorry that's Data + Discourse + Experiment task
- David Shotton - access to figures and examples?
- Yes, I think, it's all linked from http://esw.w3.org/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20101115
- Anita: Significant overlap between DoCO and the medium-grained model - big blocks that describe structural components
- Joanne: This is Item #4 that David is talking about.
- Tim: Overlap between DoCO and the coarse-grained and medium-grained ontologies
- Tim: David, what is the motivation for this work?
- David: Tried to create an ontology that would accurately describe the components of a document.
- David: it could be used by publishers and researchers. Document sections have a rhetorical function.
- Tim: Is this a 1:1 mapping from the NLM DTD to DoCO? Answer: yes.
- Tim: could you do automated processing of the NLM DTD to DoCO?
- David: yes, but we haven't done that yet
- Tim: will you be using this in your JISC project?
- David: In Peter Murray-Rust's project for JISC
- Anita: did you look at any other publisher's DTDs?
- David: no
- Anita: there is a long history of DTD development
- David: we are looking at PLoS, then at BioMed Central
- Anita: Is there a way to integrate with the medium-grained system?
- David: how do we connect DoCO to the medium-grained structure?
- Silvio: developed the ontology with colleagues in Bologna; interested in patterns in XML documents - we identified structural patterns that capture the structure of a textual document; we study this topic in Masters
- Tim: so - what I see is multiple alignment tasks here: ORB - MediumGrain, MediumGrain - Data+Discourse, ORB & MediumGrain - DoCO
- Anita: Silvio, what do you mean by patterns?
- Silvio: a pattern is a general solution to a recurring problem. E.g. textures, and such. Paragraphs are blocks containing text and many other elements, such as emphasis, citations, etc.
- Silvio, what software tool did you use to generate the DoCO documentation?
- Anita: 1) ORB - MediumGrain, 2) MediumGrain - Data+Discourse, 3) ORB & MediumGrain - DoCO. Who: to be settled in offline discussion. When: report back on next call!
- David: Silvio's mail is email@example.com
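
Illustrative sketch (not part of the call): the coarse-grained constraint discussed above, that ORB blocks are contiguous and do not overlap, together with use case 1 (finding the section in which a gene mention falls), might be checked as below. The class names come from Joanne's list; the Block type, the character-offset representation, and the function names are assumptions made for illustration only and are not part of the ORB OWL file.

```python
from dataclasses import dataclass

# Class names as listed in the minutes for ORB's coarse-grained level.
ORB_CLASSES = {"Header", "Introduction", "Methods", "Results", "Discussion", "References"}


@dataclass(frozen=True)
class Block:
    """Hypothetical record: one contiguous span of article text.

    Offsets are half-open character positions [start, end).
    """
    orb_class: str
    start: int
    end: int


def check_blocks(blocks):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    for b in blocks:
        if b.orb_class not in ORB_CLASSES:
            problems.append(f"unknown class: {b.orb_class}")
        if b.start >= b.end:
            problems.append(f"empty or inverted span: {b.orb_class}")
    # After sorting by start offset, any overlap shows up between neighbours.
    ordered = sorted(blocks, key=lambda b: b.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            problems.append(f"overlap: {prev.orb_class} / {cur.orb_class}")
    return problems


def block_at(blocks, offset):
    """Use case 1: name the ORB class whose span contains a character offset."""
    for b in blocks:
        if b.start <= offset < b.end:
            return b.orb_class
    return None
```

For example, with blocks Introduction [0, 100), Methods [100, 250) and Results [250, 400), check_blocks reports no problems and block_at(blocks, 120) gives "Methods", which is the kind of answer a text miner would want for an entity found at offset 120.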