From W3C Wiki

Meeting November 15 2010, 10 am EDT


1. Recap goals and use cases for the Rhetorical Document Model Subtask:

2. ORB (Ontology of Rhetorical Blocks) progress: ORB OWL file

3. Medium-grained progress: first pass at Medium, very much open for discussion.

4. DoCo (Document Components Ontology) - David Shotton: DoCO v1.0

---DoCo Background

5. Next steps.

Three alignment efforts:

  • 1) ORB - MediumGrain,
  • 2) MediumGrain - Data+Discourse,
  • 3) ORB & MediumGrain - DoCO
  • Who: to be decided in offline discussion.
  • When: report back on next call.


Participants: Anita de Waard, Scott Marshall, Jodi Schneider, David Shotton, Joanne Luciano, Tim Clark, Silvio Peroni, Paolo Ciccarese

Scribe: Scott

  • Anita: Shall we Recap use cases?
  • 1 - Would like to identify the section of the document where you've identified a gene, for example. This is an application of the coarse-grained structure model, "ORB" (Ontology of Rhetorical Blocks) http://esw.w3.org/images/d/d2/Orb-0_1.owl

  • 2 - Coarse grained model is very simple: Introduction, Methods, Results and Discussion
  • 3 - Medium-grained (DRO), identify part of section, paragraph level, research question

  • 4 - Fine-grained sentence level, phrase, clause.
  • Paolo: Coarse-grained - We assume contiguous piece of text.
  • Anita: IMRaD Model = Introduction, Methods, Results and Discussion
  • Header: in Dublin Core/PRISM; Fabio/Bibo can model references
  • Classes are disjoint: coarse-grained blocks are contiguous, not overlapping in the article
  • Tim: (Tudor is traveling; he, Ron, Alex, Anita and Paolo made ORB) - ORB is straightforward, adds a way to talk about the entire article, is disjoint from the other levels, useful for text miners
  • Joanne: Tim suggests we adopt it, I agree.
  • Tim: notes that ORB is simple and useful
  • Paolo: that was motivation for this ontology, this is consensus and quite neat in terms of definition

  • Joanne: The classes are Discussion, Header, Introduction, Methods, References, Results

  • Anita: Write out SIG note and send to HCLS as a whole
  • Scott: Tim is discussing with Dietrich Rebholz-Schuhmann - Pistoia SESL Project, about microarray
  • Scott: annotation of microarray corpus -
  • Tim: how is Dietrich annotating these? Any further talks?
  • Tim: wanted to collaborate; DRS didn't respond yet. Scott and Tim and Pfizer people are interested.
  • Paolo: This is the purl for the most recent ORB version: http://purl.org/orb/
  • Joanne: There are folks here at RPI that are interested also, for provenance purposes, they couldn't make the call because of conflicting schedules, but contact us off-line.
  • Anita: Anita is chasing the Pistoia corpus from the Elsevier end - will let the group know if it is available with or without annotation by EBI and others.

  • Paolo: This is the PURL for the version 0.1 only http://purl.org/orb/0.1/
  • Anita: Anita asks Paolo and Tudor to write note for releasing on the world(wideweb)
  • Paolo: agrees - will take lead, Tudor to help.
  • Anita: Let's not subdivide header and references, since we already have standards for that...
  • Paolo: don't give options?
  • Paolo: do provide options! Let's start listing them and go through existing bibliographic standards.
  • Tim: task for the next time we take up this thread - let's take ones we like the best, at least DC and PRISM and XMP/ElPub standards and also CiTO/Bibo
  • Anita: Involve Ron Daniel in this discussion - keep placeholder
  • Anita: Ron Daniel headed up the PRISM metadata project (?)
  • Jodi: I think that paper type needs to be distinguished for medium grain. Experimental vs. theoretical, review, ...
  • Scott: - yes, Ron Daniel headed up PRISM, http://en.wikipedia.org/wiki/Publishing_Requirements_for_Industry_Standard_Metadata
  • Jodi: For math, for instance, I don't think the method and results are going to work very well
  • Anita: PAM: Prism Aggregator Message = http://www.prismstandard.org/faq/
  • Jodi: except perhaps positioning, central problem, definition
  • Paolo: Objects of study: ?
  • Tim: biomaterials or what
  • Anita: HI Jodi, good comments, will address in the call in a minute!
  • Tim: what is use case? How can we use this? Overlap with Data + Discourse + Experiment task
  • Tim: offline, should chat with Philippe and Susana - overlap with other task.

  • Tim: link to Sudeshna's talk from Nov 1 on DEXI: http://www.slideshare.net/sdas617/sci-discourse-nov-2010
  • Tim: link to cartoon of current Data+Discourse+Experiment (DEXI) ontology: http://esw.w3.org/File:SWAN-myExp-v4.jpg
  • Tim/Anita - let's discuss integration between medium-grained structure and research data/workflow output a la beyond the pdf: https://sites.google.com/site/beyondthepdf/
  • Tim: link to the Data+Discourse+Experiment Task: http://esw.w3.org/HCLSIG/SWANSIOC/Actions/SWANmyExpArray
  • Jodi: this is life science focused - we need old stuff in other disciplines, that we can apply retrospectively
  • Tim: Well - this is a Life Sciences SIG! :-)
  • Anita: Three distinctions: Life science/Physical sciences/everything else
  • Second distinction: Research article, Review article, Quick research note
  • Third distinction: new material vs. existing text
  • Let's discuss other article types as well?
  • Jodi: clinical reports are a nice example
  • Scott: has been looking at clinical reports as well. This group is more aimed at scientific literature, right? See a very acute need to mine clinical reports.
  • Scott: Assign mapping to terminologies, UMLS, SNOMED etc. - but clinical reports don't have a normalised structure - makes it difficult to make something that is generally useful in terms of ontology of doc


  • Jodi: I think that's really pragmatic, Tim!
  • Jodi: I guess, let's just be explicit about what the scope is, when giving a medium-grain structure.
  • Tim: my 2 cents - fan of restricted initial scope - let's start on Research papers in life sciences, make that use case work; do a stepwise incremental expansion. Take users in astronomy, etc., then look at other types of articles.

  • Anita: volunteers for medium-grained model - Tim will send an email to Experimental Discourse Group
  • Tim: jodi - ?
  • Howard: I'm interested in this discussion
  • Joanne: interested in being involved in the medium grained model discussion (RPI)
  • Jodi: yes, interested ;)
  • Anita: Sorry that's Data + Discourse + Experiment task
  • David Shotton - access to figures and examples?
  • Yes, I think, it's all linked from http://esw.w3.org/HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20101115
  • Anita: Significant overlap between DoCo and medium-grained model - big blocks that describe structural components
  • Joanne: This is Item #4 that David is talking about. rhetOnto.png
  • Tim: Overlap with DoCO and Coarsegrained and Medium grained ontology
  • Tim: David, what is motivation for this work?
  • David: Tried to create an ontology that would accurately describe the components of a document.
  • David: it could be used by publishers and researchers. Document sections have a rhetorical function
  • Tim: Is this a 1:1 map from the NLM DTD to DoCO? Yes.
  • Tim: could you do automated processing of NLM DTD to DoCo?
  • David: yes but haven't done that yet
  • Tim: will you be using this in your JISC project?
  • David: In Peter Murray Rust's project for JISC
  • Anita: did you look at any other publisher's DTDs?
  • David: no -
  • Anita: long history of DTD development
  • David: are looking at PLoS, then at BioMed Central
  • Anita: Is there a way to integrate with Medium-grained system
  • David: how do we connect DoCo to medium-grained structure?
  • Silvio: developed the ontology with colleagues in Bologna, interested in patterns in XML documents - we identified structural patterns that allow structuring of textual documents; we study this topic in Masters
  • Tim: so - what I see is multiple alignment tasks here: ORB - MediumGrain, MediumGrain - Data+Discourse, ORB & MediumGrain - DoCO
  • Anita: Silvio, what do you mean by patterns?
  • Silvio: pattern is a general solution to a current problem. E.g. textures, and such. Paragraphs are blocks containing text, and many other elements, such as emphasis, citations, etc.
  • Silvio, what software tool did you use to generate the DoCO documentation?
  • Anita: 1) ORB - MediumGrain, 2) MediumGrain - Data+Discourse, 3) ORB & MediumGrain - DoCO. Who: to be settled in offline discussion. When: report back on next call!

  • David: Silvio's mail is speroni@cs.unibo.it
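
Editor's illustration (not part of the call): the coarse-grained use case recapped above, finding which ORB block contains a gene mention when blocks are contiguous and non-overlapping, might be sketched roughly as follows. The six block names come from the minutes; the `Block` class, the character-offset representation, the function names and the sample offsets are all hypothetical and are not taken from the ORB OWL file.

```python
from dataclasses import dataclass

# The six coarse-grained block types listed in the minutes.
ORB_CLASSES = ("Header", "Introduction", "Methods",
               "Results", "Discussion", "References")


@dataclass(frozen=True)
class Block:
    """Hypothetical representation: one contiguous span of the article text."""
    orb_class: str  # one of ORB_CLASSES
    start: int      # character offset, inclusive
    end: int        # character offset, exclusive


def check_disjoint(blocks):
    """Raise ValueError if any block is unknown, empty, or overlaps another."""
    previous_end = 0
    for block in sorted(blocks, key=lambda b: b.start):
        if block.orb_class not in ORB_CLASSES:
            raise ValueError(f"unknown block type: {block.orb_class}")
        if block.end <= block.start:
            raise ValueError(f"empty block: {block.orb_class}")
        if block.start < previous_end:
            raise ValueError(f"overlap at offset {block.start}")
        previous_end = block.end


def block_for_offset(blocks, offset):
    """Return the block type containing the given character offset."""
    for block in blocks:
        if block.start <= offset < block.end:
            return block.orb_class
    raise ValueError(f"offset {offset} is outside every block")


# Made-up offsets for an imaginary article.
blocks = [
    Block("Header", 0, 120),
    Block("Introduction", 120, 900),
    Block("Methods", 900, 2000),
    Block("Results", 2000, 3500),
    Block("Discussion", 3500, 4800),
    Block("References", 4800, 5200),
]

check_disjoint(blocks)
# A gene mention found by a text miner at character offset 2410:
print(block_for_offset(blocks, 2410))
```

Because the blocks are disjoint, each offset maps to at most one block type, which is the property that makes the coarse-grained level convenient for text miners.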