Discussion re. Ontology of Rhetorical Blocks (ORB), October 7, 2010. Paolo Ciccarese, Tudor Groza, Ron Daniel, Alex Garcia.
1) Main blocks:
Proposal: get rid of Background and Conclusion, or make them optional.
2) Physical vs. rhetorical blocks, is there more discussion needed?
Clearly only physical blocks, no rhetorical blocks. Discussion with David Shotton and Tim: David introduced DoCO, and we discussed how we can merge or align our effort with it and with rhetorical document structure.
Tudor: we are talking about rhetorical blocks, i.e. text spans with a different or variable role, not tied directly to a physical section that has the same name. Aligning with David S. might be quite hard, because he is talking about physical sections while we are talking about rhetorical sections.
Everyone is for identifying sections as they are already identified in existing documents; in that case, we should focus on just Introduction, Methods, Results, Discussion. In CS: Motivation, Related Work.
Anita: coarse-grained is IMRaD; medium-grained is finer distinction, including e.g. Research Question, Other Work, etc.; fine-grained is Claim, Research Statement, etc.
Paolo: it is hard to differentiate physical from rhetorical. How do we differentiate Introduction from Methods, or handle elements that are Methods in a non-Methods section? Annotation tool: annotate that part of the document. Don’t care if that is physical or rhetorical; don’t overlap. Motivation, Conclusion etc. can be physical components.
Anita: indeed, can even be intrasentential! But that is a different set of terms: then, I’d like to have Goal, Fact, Problem, Hypothesis and Implication included, next to Methods and Results. There are other systems (e.g. paper by Liakata compares three of them [will send around]).
Alex: Introduction, Conclusion etc. is the first step. It might not reflect the actual content of the paper! But then also identify smaller-grained structure. I see this as steps in a pipeline; here I’d like to go to the exact section that I am interested in. Does the physical part match the rhetorical component?
Ron: we are trying right now to make a list of coarse-grained sections; later on, we can analyze at a finer grain. Let’s see what people make of that, e.g. limit my queries to the Methods section; and then do a finer analysis, material in one section or another.
Tudor: we have two ways to go from here: either we strip everything ambiguous from the ontology and only keep IMRaD, or we take what David Shotton did. Is that the same thing as we are trying to do?
So let’s take a poll: do we do an IMRaD coarse-grained one; and do we take David Shotton’s, or bake our own?
Paolo, if there is e.g. Discussion embedded into the Introduction, isn’t that also coarse-grained? An individual hypothesis is certainly fine-grained.
Let me rephrase in David Shotton’s terms – the physical components (Introduction, Conclusion) are different from e.g. Conclusion as a rhetorical role; if it is a smaller part of a section, it becomes medium-grained. We don’t distinguish between physical and rhetorical; at the coarse-grained level, the physical is the rhetorical!
Paolo: Sometimes papers don’t have headings, don’t have sections – so how can we identify these sections?
Ron: general structure: Introduction, Methods, Results, Discussion – let’s see where that fails.
Ron: Question: what do we mean by physical? These are little bits of markup. Let’s separate automated and manual markup.
Tudor: physical means the section has a title.
Ron: Pre-existing markup that is intended to carry this rhetorical block structure.
Coarse-grained rhetorical ontology.
Alex: I have a text mining tool that reads the paragraph name; now if it is a Background, then I will call it Introduction as well?
Anita: yes. Let’s get this out, first.
Alex: if no papers, then if e.g. Introduction, then add Introduction; if no pre-existing markup, then coarse-grained.
Ron: yes, coarse-grained is contiguous, not at sentence level or smaller.
Anita: save other components for Medium-grained, now publish Coarse-grained which determines IMRaD. Medium-grained allows you to split in smaller sections.
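A minimal sketch of the coarse-grained labelling discussed above, assuming section headings are available as pre-existing markup. Only the Background-to-Introduction mapping comes from the discussion; the lookup table, the extra spelling "materials and methods", and the function names are hypothetical and purely illustrative.

```python
import re

# Hypothetical heading-to-block table. Only "background" -> Introduction
# is taken from the discussion; the other spellings are assumptions.
COARSE_BLOCKS = {
    "introduction": "Introduction",
    "background": "Introduction",
    "methods": "Methods",
    "materials and methods": "Methods",
    "results": "Results",
    "discussion": "Discussion",
}


def coarse_block(heading):
    """Return the coarse-grained block for a section heading, or None."""
    # Drop leading numbering such as "2." or "3.1" and normalise case.
    key = re.sub(r"^[\d.\s]+", "", heading).strip().lower()
    return COARSE_BLOCKS.get(key)


def label_sections(headings):
    """Pair each heading with its coarse-grained block, in document order."""
    return [(h, coarse_block(h)) for h in headings]
```

Headings that are not in the table (for example "Related Work") come back as None, which leaves them for a later medium-grained pass rather than forcing them into IMRaD.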
Add heads and tails?
Ron: Not same level of historic usage for heads and tails; let’s leave sections as sections; References. Can include Acknowledgements. Tail: just making clear what section we are talking about. Supplementary data etc. can be included, let’s
Header: Dublin Core?
Ron: there are other choices; NLM DTD has its own header; but would propose Dublin Core. Title, Authors, Date, ISSN etc. Original scale as DC: abstract, authors, title. Dublin Core RDF schema.
Tudor: yes, one of the main reasons we should use it! DC is de facto standard.
Anita: We can just point to it, then.
Paolo: yes, DC is default, we can then add new things. Don’t like Creator; we call it differently, I use DC:Creator and define my own subfield.
Tudor: We can also have a look at SWRC Ontology.
Ron: let’s stick with Dublin Core, for a start! And see where it falls short.
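As an illustration of what a Dublin Core header could look like, here is a small sketch that writes header fields as N-Triples using the Dublin Core element namespace. The function name, the document URI and the choice to put an ISSN in dc:identifier are assumptions for the example, not decisions from the meeting.

```python
# Dublin Core Metadata Element Set 1.1 namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_header(doc_uri, title, creators, date=None, identifier=None):
    """Return N-Triples lines describing a document header with Dublin Core."""

    def literal(value):
        # Escape backslashes and double quotes for an N-Triples literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    triples = [(DC_NS + "title", title)]
    # One dc:creator statement per author, in the order given.
    triples += [(DC_NS + "creator", name) for name in creators]
    if date is not None:
        triples.append((DC_NS + "date", date))
    if identifier is not None:
        # Assumption: an ISSN or similar goes into dc:identifier.
        triples.append((DC_NS + "identifier", identifier))
    return [f"<{doc_uri}> <{prop}> {literal(value)} ." for prop, value in triples]
```

Calling `dc_header("http://example.org/paper/1", "An example paper", ["A. Author"])` with these placeholder values yields one line per field, which can then be extended where Dublin Core falls short.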
3) What use cases are we serving?
Only use case 3 is served by coarse-grained; 1 and 2 need medium-grained.
Propose to keep ORB
5) How to implement/loop back to other groups.
Write white paper after medium-grained is done.