From W3C Wiki
Rhetorical Structure Group, HCLS SIG W3C: Phone Meeting, December 1, 2009
Attendees: Tudor Groza (TG), Paolo Ciccarese (PC), Keith Gutfreund (KG), Anita de Waard (AdW; chair and scribe)
1. Use cases:
- All: Markup can be manual, automatic, or automation-assisted; we add markup to documents so that we can identify claims.
- TG: markup can also be coarse-grained: larger pieces of text that form a larger rhetorical block.
- PC: we need to define rhetorical blocks within the documents, e.g. Materials and Methods, Conclusions, etc.
- AdW: we need a single, consistent terminology for larger- vs. smaller-grained blocks.
- TG: that implies too much semantics; we can call them discourse elements, and the larger ones:
- PC: on the structure of the document, we cannot guarantee anything about it. Pointing to the content and identifying claims, hypotheses, etc. is one thing; we can simply call a block a section, which is different from calling it Results.
- TG: SALT makes a clear distinction between linear and rhetorical structure: a rhetorical block such as ‘Contribution’ can sit inside the Introduction, while the Introduction section itself is part of the linear structure.
- AdW: One use case should include authoring!
- TG: we should both add an authoring use case
- PC: even if we have a document and are not performing any nano-annotation, we need to recognise the section headings! That alone would already help text mining, e.g. searching the text while excluding the References or Experimental Methods sections.
- TG: we did this, but section titles vary widely, especially in computer science.
- PC: in biology most section headings are recognisable.
- PC: will write this into the use case, for biology papers.
- AdW: in Cell, section headings represent the content quite well!
- TG: let’s do a small study. I have a parser for the Elsevier corpus, so let’s see which titles usually occur and whether we can generalise from them. Let’s see if the section headings are indeed like triples!
- PC: even if it is not complete, as long as we can recognise Materials and Methods and References, we would gain a lot.
- KG: we could ask Ellen whether this has been done.
- PC: We agree that we have to recognise the sections of the articles; within those sections we have different blocks.
- TG: That is one way to go. We are thinking about a rhetorical structure; having the linear structure in place could help.
- KG: I’ve worked on 10 projects that tried to establish a structure for documents, e.g. for ScienceDirect, and none were successful. In general the structures are too broad and never fit nicely. Nothing beats success: if you can find a set of documents that adheres nicely to a template, you can make the case that following these rules brings all kinds of benefits that outweigh the cost of composing the template. At least a subset of documents should follow a template; in pharmacology, for example, we can mark up these sections, then semi-automate with a plug-in, and be able to do a lot!
- TG: As I understand it, we are planning a core terminology plus model drawing on different domains, which solves the issue of too broad a model. As for having a template: this is one thing we are trying to do here, but we are still far from convincing people to give up their own artistic touch...
- AdW: add an authoring use case for pharmacology; we do need document structure, and either author in templates or recognise the structure afterwards.
- PC: we do everything manually but are now moving in a different direction; manual annotation doesn’t scale...
- AdW: did you talk to Agnes Sandor?
- PC: SWAN does not want something that is only 80% right; there is no way to do it automatically, and it is easier for curators to annotate manually than to check the output of an automated system.
- Define a structure beforehand and then apply it.
- The use cases themselves: we will comment on them.
- PC will try to set up a wiki so that comments come in before the next meeting. TG and AdW will each add an authoring task to their own use case!
- KG to add a template for authoring use cases!
- PC will send around the link to the wiki and ensure we can all make changes...
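PC's proposal to recognise section headings, for example to search the text while excluding References or Experimental Methods, can be illustrated with a minimal sketch. It assumes a small, hypothetical keyword table that maps heading variants to canonical section types. The patterns are placeholders: the actual set of usual titles is what TG's proposed study of the Elsevier corpus would establish, and headings that match nothing are left as "unknown".

```python
import re

# Hypothetical keyword table (an assumption, not study results):
# canonical section type -> regex over heading text.
# Dict order matters: the first matching entry wins.
SECTION_PATTERNS = {
    "introduction": re.compile(r"\b(introduction|background)\b", re.I),
    "methods": re.compile(
        r"\b(materials?\s+and\s+methods|methods|experimental\s+procedures)\b", re.I),
    "results": re.compile(r"\bresults\b", re.I),
    "discussion": re.compile(r"\b(discussion|conclusions?)\b", re.I),
    "references": re.compile(r"\b(references|bibliography|literature\s+cited)\b", re.I),
}


def classify_heading(heading):
    """Map a free-text section heading to a canonical section type."""
    for label, pattern in SECTION_PATTERNS.items():
        if pattern.search(heading):
            return label
    return "unknown"


def searchable_text(sections, exclude=("methods", "references")):
    """Join the bodies of (heading, body) pairs, skipping excluded section types."""
    return "\n".join(
        body for heading, body in sections
        if classify_heading(heading) not in exclude
    )


if __name__ == "__main__":
    paper = [
        ("Introduction", "intro text"),
        ("Experimental Methods", "methods text"),
        ("Results", "results text"),
        ("References", "reference list"),
    ]
    print(searchable_text(paper))
```

Even a partial table like this follows PC's point that recognising only Materials and Methods and References already lets a search skip most of the non-claim text. Headings that fall through to "unknown" are simply kept in the search.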
2. Document structure:
- AdW: TG, can you make a start at the Chinese menu structure?
- TG: it is hard to propose a core model until people have the use cases; agreeing on it is not a problem, but the details will be the discussion.
- KG will start on an XML schema for the abcde format (already starting to do that!), plus RDF-S for the entities?
- TG: we are heading towards a terminology without deciding what we will do! How about we start the discussion from abcde, since this is the core structure: go through each of the 5 ‘letters’ one by one and discuss whether they are enough. Do they have clear semantics? Can they be divided further? And then modify the schema as we go?
- AdW: Core sentences are not in the acronym; we do need to discuss them!
- Let’s discuss it during the next call; AdW will make a first attempt.
- TG: Are the semantics clear? What does, e.g., Contribution mean? Do a little presentation on abcde. AdW will make a table explaining each of the components.
- PC: what if we have too many rows?
- AdW: then we can consolidate the rows
- PC: start updating the use cases?
- Use cases etc. to be up by December 8th.
- Next call: December 14th, 9 am EST / 2 pm GMT / 3 pm AMS