Rhetorical Document Structure Group HCLS SIG W3C, Phone Meeting January 29th 2010

Attendees: Tudor Groza (TG), Paolo Ciccarese (PC), Keith Gutfreund (KG), Anita de Waard (AdW - chair), Alexandre Passant (AP), Jodi Schneider (JS), Joanne Luciano, Scott Marshall (scribe), Tim Clark, Jack Park


  • Paolo: Structuring a reference remains to be done (extracting and modeling in RDF)
  • Tudor: I'm working on that
  • Paolo: If I encounter a reference to another article, I should be able to attach that reference.
  • Matthias: What do you mean by a more 'document-centric' representation?
  • Paolo: We don't yet refer to the article and the chunk of text where the claim is coming from.
    • ... We want the annotator to be able to attach other references to the article.
  • Anita: How do you encode that?
  • Paolo: We encode it in RDF.
    • ... Example: We can use CSS selectors to point to locations in HTML3
  • Anita: Back to top of the agenda.
  • Paolo: We could have 'core' sentences in the ABCDE model so I am fine with this.
    • ... We can label a sentence as 'scientific discourse' (discourse element) according to the model (abstract).
    • ... Usually, the Abstract represents the structure of the article but not all discourse within it.
    • I am saying that ABCDE can be used to label the different sections according to Abstract type and sentences can be labelled at a finer granularity according to SWAN-SIOC labels. In this way, they can be combined.
  • Joanne: Is the intent to capture the discourse within the paper as well as the literature?
  • Anita: Yes.
  • Joanne: Has this been done manually with a paper to create a standard?
  • Anita: We haven't decided yet on the final structure so it's all manual at the moment.
  • Tudor: We have all done this by hand with ABCDE, SWAN, and SALT.
  • Anita: Would it be possible for Paolo and Tudor to upload an example marked up document?
  • Tudor: Yes.
  • Paolo: You can already browse it in SWAN.
  • Anita: It would be easier if you upload a single example.
  • Paolo: Just having the triples, you might find yourself going back to SWAN to look at it through the interface.
    • ... I can give you the link to one document in the SWAN browser, as well as the triples.
  • Anita: Back to your question Joanne: We are looking at various ways of modeling the structure of documents. We have started with ABCDE because it is the simplest.
    • Medium is the structure that I made while I was writing my PhD. In this model, references are within the document. The Abstract could be a table of contents. The idea is that it should be very easy to access different components of a document.
    • ... Do you think that this model would be useful?
  • Tudor: It handles references better than ABCDE. Some terminology could be improved.
    • ... i.e. made more general.
    • Maybe we could take Background and split into Context and Motivation as an experiment.
  • Anita: Maybe you could create a comparison grid.
  • Paolo: We could consult with the SWAN curators (biologists that are reading the papers).
  • Jack: I work at KMI with climate change articles and could try out the models on those articles.
    • ... Working on federation of structured discourse, centered around IBIS
  • Anita: Your input would be welcome.
  • Scott: Do you mean aggregation of information from disparate discourse sources, such as discussion forums, blogs, articles, etc. (à la SWAN-SIOC)?
  • Jack: Yes, automatically without human curation
  • Anita: Jack, could you send us something about your work?
  • Jack: Yes, I can send a PDF to individuals (it's a tech report). I'm also working on a platform for a global climate change portal in Korea (?), which is the deliverable of my PhD work.
  • Anita: I spoke with Paul Groth recently (Concept Web Alliance), where people are working on a model for 'nanopublications'.
    • ... Significant overlap with our material.
  • Joanne: The C-SHALS conference is in Boston on Feb. 24-26; lots of people from HCLS will be there.
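
A minimal sketch of the combination Paolo describes in the discussion above: a coarse section label (as in ABCDE), a finer label on a single sentence, and a CSS selector locating that sentence in the HTML, all expressed as RDF triples. The namespace, the property names (sectionType, discourseType, partOf, cssSelector), the article IRI, and the selector are hypothetical and chosen only for illustration; they are not taken from the SWAN, SIOC, SALT, or ABCDE vocabularies.

```python
# Hypothetical namespace and article IRI, for illustration only.
EX = "http://example.org/rds#"
DOC = "http://example.org/articles/1"


def iri(value):
    """Wrap an IRI in N-Triples angle brackets."""
    return "<" + value + ">"


def literal(text):
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def annotate(section_label, selector, sentence_label):
    """Return triples that label one sentence at two granularities."""
    section = DOC + "#section-" + section_label.lower()
    sentence = DOC + "#sentence-1"
    return [
        # Coarse label: the type of the enclosing section.
        (iri(section), iri(EX + "sectionType"), literal(section_label)),
        # Fine label: the discourse type of the sentence itself.
        (iri(sentence), iri(EX + "discourseType"), literal(sentence_label)),
        # Link the sentence to its section so the two labels combine.
        (iri(sentence), iri(EX + "partOf"), iri(section)),
        # Where the sentence sits in the HTML, as a CSS selector.
        (iri(sentence), iri(EX + "cssSelector"), literal(selector)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = annotate("Background", "div#background > p:nth-of-type(2)", "Claim")
print(to_ntriples(triples))
```

Keeping the location as a separate triple means the same sentence could later be tied to additional references without changing its labels.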

Action items

  • Jack to try models on global climate change discourse
  • Paolo to consult with curators about models
  • Tudor and Paolo to upload a single marked-up example document
  • Tudor to do an intermediate medium-grained model (in a comparison grid)


  • Next call: February 15th 2010, 9AM Boston / 2PM Irish / 3PM Amsterdam
  • Paul Groth from the Concept Web Alliance to present his nanopublications format