HCLSIG/SWANSIOC/Meetings/2010-09-20 Conference Call


Dial-in & IRC Information

  • Dial-In #: +1.617.761.6200 (Cambridge, MA)
  • Dial-In #: +33.4.89.06.34.99 (Nice, France)
  • Dial-In #: +44.117.370.6152 (Bristol, UK)
  • Participant Access Code: 42572 ("HCLS2")
  • IRC Channel: irc.w3.org, port 6665, channel #HCLS2 (see the W3C IRC page for details, or use Web IRC)

Agenda

  • Housekeeping
  • Next Steps

Minutes

  • <Anita> Anita scribes, assisted by Scott and Tim
  • <Tim> On the call: Gully Burns, Howard Burrows, Paolo Ciccarese, Tim Clark, Sudeshna Das, Anita deWaard, Rob Frost, Joanne Luciano, Scott Marshall, David Newman, Jodi Schneider, Tony Scerri.
  • <Anita> Link to the slides on SlideShare; everyone is looking at them now...
  • <Anita> Topic: aligning text mining + Semantic web communities
  • <Anita> Text mining goal: algorithmically derive knowledge from documents; needs universal persistence that is consistent with web architecture
  • <Anita> SW goal: represent high-quality information so you can make it directly computable; needs bridges to the web of documents where science is created...
  • <Anita> Rob: Add a check for text clustering as well: if the goal is building an ontology, initially running clustering to generate candidate concepts is also needed
  • <Anita> Gully: we are developing techniques to build curation pipelines for biocurators: modeling their workflow and finding out what they need, modeling it in UML, then going to text miners and telling them what is needed and when
  • <Anita> Gully will send email offline to fill in missing elements in slide
  • <Anita> Tim: post it to the wiki? Gully: can't, will send it around => Tim posts
  • <Anita> Tim: all checks ok?
  • <Anita> Scott: there is a page that has related projects that Anita sent - share?
  • <Anita> Anita: sure
  • <Anita> Scott scribes while Anita puts the page on the wiki
  • <mscottm> Tim: Text mining can assist in creating ontologies
  • <mscottm> ..and be used in mapping statements to documents on the web.
  • <mscottm> X1: wonders whether using ontologies to represent text mining is viable.
  • <mscottm> ..unless it's lightweight, it wouldn't be practical.
  • <mscottm> Rob: You can use statistical associations and classifiers.
  • <Tim> Scott: it's a lightweight process - you can use SWAN/SIOC
  • <Tim> Sudeshna: AO covers some of that - it captures the provenance of where assertions and mappings came from
  • <Tim> Paolo: it depends on what you are targeting - some entities can be detected, but discourse is a bit more abstract; text mining can help for genes, proteins, etc. if supervised
  • <jluciano> Looking for feedback here - would it be useful to say that we are looking to bridge the gap between scientific thinking (science) and the language that's used to record it (text mining/NLP)?
  • <mscottm> I would agree Joanne - that's one way of putting it.
  • <Anita> OK - I think this is the page Scott meant? http://groups.google.com/group/forcnet/web/list-of-projects-that-aim-to-define-new-formats-research-objects?hl=en . I have made all the FoRC docs publicly available now.
  • <jodi> Joanne: I would say that it's more about bridging the gap between scientific writing (papers, etc.) and the meaning behind it (text mining/NLP to identify appropriate ontologies, relationships)
  • Anita slaps mscottm around a bit with a large fishbot
  • mscottm says fishbot, Huh?
  • <Anita> mscottm: Scott, do you want me to scribe again? (You're doing a great job and my RSI is acting up, so I'm very happy if you keep scribing :-))
  • <mscottm> Paolo: Once you've created it (for instance from software), it's possible that human intervention created the results. Creation tokens can be used easily by humans, should be possible by machines.
  • <jodi> Anita: thanks, that's a useful list!
  • mscottm asks Anita to join in as much as the RSI allows.
  • mscottm will keep trying to keep up!
  • <Anita> Sorry - where are the instructions on how to interact with this chat software? Or even the instructions for finding the instructions?
  • <mscottm> http://www.w3.org/2001/12/zakim-irc-bot
  • <Anita> Gully: If we have a formal structure for the process of annotating a document for text mining, then the text mining can analyze those representations and annotators can review the marked-up document; proceeding in a formal, computational way going forward is what's required.
  • <Anita> Scott: if a PPI (protein-protein interaction) is produced by a text mining tool (e.g. in Taverna), then disclose the fact that it was made by software, as a type of assertion (so a type of modality, AdW)
  • <paoloC_> http://code.google.com/p/annotation-ontology/wiki/Curation
  • <Anita> Scott: it helps to have clearly identified entities for text mining. Rob(?): yes, and differentiate them from the Gold Standard!
  • <mscottm> Scott: There are two sides: 1) using text mined assertions in linked data information retrieval and 2) using the well-defined ontological markup to properly train statistical models
  • <mscottm> Rob: you would want to build up toward a gold standard.
  • <mscottm> Gully: Text miners are most interested in the text mining algorithm itself and not so interested in infrastructure.
  • <Anita> Gully: most of the people who do this 1) work on a challenge-by-challenge, research-oriented basis rather than building sharable infrastructure, and 2) are resistant to the idea of incorporating systems that force conventions, such as ontologies
  • <Anita> (Gully) If annotation technology were easy to use and had good standards, they would be much more inclined to do that
  • <Anita> At the BioCreative III meeting in Bethesda last week there was a discussion about this: thinking about how to develop standards here, but reluctance to adhere to Semantic Web approaches
  • <Anita> (More Gully) Need to understand needs of text mining/computational linguistic community
  • <Anita> (Tim) Gully's point is like when you get biomedical researchers to use ontologies: it has to reduce their effort, if you want it to work
  • <Anita> Gully knows a number of people and is interested in talking to them (I think that's what you were going to say, Gully?)
  • <Anita> Tim: on last week's call we made AO and related text mining interactions a 'first-class subtask', to work on integration between text mining and the Semantic Web
  • <Anita> Tim: potential alignments are the Annotation Ontology; Anita's subtask, Biomedical Article of the Future (Anita: I'd rather not call it that - the name is already taken by Cell!); generating documents. Any others?
  • <Tim> Okay Anita - please rename it!
  • <mscottm> http://www.calbc.eu/
  • <Anita> Thanks, Sudeshna
  • <sudeshna> Integration of claims with experimental data and computation - need to take into account that a claim could be generated by text mining
  • <Anita> Scott: Gave a talk at EBI; the silver standard corpus, now in XML, can be translated into RDF by taking the named entities and using fully qualified URIs instead of strings.
  • <Anita> Scott: met Adrien Coulet at NCBO, who mined PubMed using the PharmGKB lexicon to get gene/disease/drug relations
  • <Anita> (Scott, continued) Use Scientific Discourse ontologies, then microarray datasets, etc., and linked data queries to pull out information from PubMed based on the PharmGKB lexicon, with hypothetical relations
  • <Anita> In summary: a catalogue of assertion types (automated/ lightweight/silver/gold) is needed!
  • <mscottm> Rob: Generating a gold standard through group contributions. Recently read about a similar effort from I2D2.
  • <Anita> Tim: good idea, Harvard + Elsevier are working to define a corpus where document set is very focused and translational in reach, e.g. on curing a disease
  • <mscottm> Tim: I think it would be beneficial to use a focused document set.
  • <Anita> Anita: Happy to make that available to this community!
  • <Anita> Working on integration of UIMA with the Annotation Framework
  • <Anita> Gully: we (ISI/USC) are working with U Colorado on SciKnowMine, and also with Tsujii in Tokyo; if there are ways we can create a simplified, software-based approach to sharing annotations, that would make everything go faster
  • <Anita> Scott: Dietrich Rebholz-Schuhmann is part of Pistoia; Tim and Anita will recognize that it's similar to the CODE_Neuro project we applied for: put together a bunch of vocabularies to access different resources in a semweb way.
  • <Anita> Scott: Ian Harrow of Pfizer, ?? and Dietrich were co-authors on ways to share text-mined information: federated linked data! In a query, enter gene, disease, tissue, species; get a set of results: gene co-occurrences linked to papers
  • <Anita> Scott, what is Pistoia?
  • <mscottm> http://www.pistoiaalliance.org/
  • <mscottm> It includes many pharma companies, pooling IT and knowledge resources
  • <Anita> Tim: 11:01, end of slot! We established that it is beneficial to align these communities; we will reconvene in two weeks, on the October 4th call, for further alignment.
  • <jodi> http://esw.w3.org/HCLSIG/SWANSIOC
  • <Zakim> SW_HCLS(Disc)10:00AM has ended