HCLSIG/SWANSIOC/Actions/RhetoricalStructure/meetings/20091201

Rhetorical Structure Group HCLS SIG W3C, Phone Meeting December 1 2009

Attendees: Tudor Groza (TG), Paolo Ciccarese (PC), Keith Gutfreund (KG), Anita de Waard (AdW - chair + scribe)

1. Use cases:

All: Markup can be manual, automatic, or automation-assisted; the aim is to add markup to documents so we can identify claims.
TG: can be coarse-grained: larger pieces of text that form a larger rhetorical block
PC: need to define rhetorical blocks within the documents, e.g. materials and methods, conclusions, etc.
AdW: need a unique terminology for larger vs. smaller-grained blocks
TG: too much semantics – can call them discourse elements, larger ones:
PC: structure of the document: cannot assure anything about it – one thing is to point to the content, identify claim, hypotheses etc. Can simply call it section, that is different than calling it Results
TG: in SALT: clear distinction between linear and rhetorical structure; rhetorical block ‘contribution’ can be inside Introduction– section Introduction is part of the linear structure.
AdW: One use case should include authoring!
TG: we should both add an authoring use case
PC: if we have a document and are not performing any nano-annotation: need to recognise the section headings! That would be helpful already for text-mining. E.g. search text, but exclude references, or Experimental Methods.
TG: we did this, but section titles are very varied, at least in computer science.
PC: in biology most section headings are recognisable.
PC – write into use case, for biology papers.
AdW: in Cell, section headings represent the content quite well!
TG: let’s do a small study – I have a parser for the Elsevier corpus – let’s see what are usual titles that are found – can we do a generalisation of this? Let’s see if the section headings are indeed like triples!
PC: even if not complete: as long as we can recognize Methods and Materials and References we would gain a lot.
KG: we could ask Ellen if this was done?
PC: We agree that we have to recognise the sections of the articles – within those sections we have different blocks
TG: That is one way to go. We are thinking about a rhetorical structure; having linear structure in place could help –
KG: I’ve worked on 10 projects that have tried to establish a structure for documents, e.g. for ScienceDirect – they were never successful – in general they are too broad and never fit nicely – nothing beats success! If you can find a set of documents that nicely adhere to a template, then you can make a case that if you do follow these rules, you get all kinds of benefits that outweigh the cost of composing the template. At least a subset of documents should follow a template; in pharmacology, for example, we can mark up these sections, then semi-automate with a plug-in, and be able to do a lot!
TG: From my understanding, we are planning a core terminology + model from different domains: solves issue of too broad a model. As for having a template: this is one thing we are trying to do here, but we are still far away from convincing people not to have their own artistic touch...
AdW: Add an authoring use case for pharmacology – we do need document structure, and either author in templates or recognise the structure afterwards
PC: we do everything manually, are now moving in a different direction, manual annotation doesn’t scale...
AdW: did you talk to Agnes Sandor?
PC: SWAN does not want something that is 80% right, no way to do it automatically, easier for curators manually than to check an automated system.
Define a structure beforehand and then apply.
Use cases themselves? We will comment on use cases.
PC will try to get wiki: get comments in before next meeting. TG and AdW will add authoring task to their own use case!
Keith to add a template for authoring use cases!
PC Will send around link to wiki and ensure we all can make changes...
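
Illustration of the section-heading point raised by PC and TG above (search text but exclude References or Experimental Methods; see which headings are usual in a corpus). This is only a minimal sketch, not something the group has built: the category names, the regular expressions and all function names are assumptions made up for the example, and real headings (especially outside biology) will need a richer mapping.

```python
import re
from collections import Counter

# Hypothetical mapping from a canonical section type to a heading pattern.
# The patterns are illustrative assumptions, not an agreed terminology.
CANONICAL_PATTERNS = {
    "methods": re.compile(r"\b(materials|methods|experimental procedures)\b"),
    "references": re.compile(r"\b(references|bibliography|literature cited)\b"),
    "results": re.compile(r"\bresults\b"),
    "introduction": re.compile(r"\b(introduction|background)\b"),
    "discussion": re.compile(r"\b(discussion|conclusions?)\b"),
}


def normalise_heading(heading):
    """Lowercase a heading, drop leading numbering such as '2.1', collapse spaces."""
    text = heading.strip().lower()
    text = re.sub(r"^\d+(\.\d+)*\.?\s+", "", text)
    return re.sub(r"\s+", " ", text)


def classify_heading(heading):
    """Return the first canonical type whose pattern matches, else 'other'."""
    text = normalise_heading(heading)
    for label, pattern in CANONICAL_PATTERNS.items():
        if pattern.search(text):
            return label
    return "other"


def searchable_text(sections, exclude=("methods", "references")):
    """Keep the text of sections whose heading type is not in `exclude`.

    `sections` is a list of (heading, text) pairs in document order.
    """
    return [text for heading, text in sections
            if classify_heading(heading) not in exclude]


def heading_frequencies(headings):
    """Count normalised headings, e.g. to see which titles are usual in a corpus."""
    return Counter(normalise_heading(h) for h in headings)
```

Usage, on a made-up document: `searchable_text([("Introduction", "a"), ("Experimental Methods", "b"), ("Results", "c"), ("References", "d")])` keeps only the Introduction and Results text. Headings that match no pattern fall into "other" and are kept, so an incomplete mapping never drops text silently.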

2. Document structure:

AdW: TG, can you make a start at the Chinese menu structure?
TG: Hard to propose a core model until people get the use cases – not a problem agreeing on it, but details will be the discussion
KG will start on making an XML schema of abcde format: am starting to do that! + RDF-S for entities?
TG: are going towards a terminology without deciding what we will do! How about we start the discussion from abcde – this is the core structure – each of the 5 ‘letters’ one by one, and discuss whether they are enough: Do we have a clear semantics, can they be divided further, and then basically as we go further modify the schema?
AdW: Core sentences is not in the acronym – we do need to discuss!
Let’s discuss it during the next call: AdW will start a take
TG: Are semantics clear? What does it mean? Contribution, do a little presentation on abcde – AdW will make a table explaining each of the components
PC: what if we have too many rows?
AdW: then we can consolidate the rows
PC: start updating the use cases?

3. Dates:

Use cases etc. up on 8th December
Next call: December 14th, 9 am EST / 2 pm GB / 3 pm AMS