HCLSIG/SWANSIOC/Meetings/2010-09-20 Conference Call

Dial-in & IRC Information

Dial-In #: +1.617.761.6200 (Cambridge, MA)
Dial-In #: +33.4.89.06.34.99 (Nice, France)
Dial-In #: +44.117.370.6152 (Bristol, UK)
Participant Access Code: 42572 ("HCLS2")
IRC Channel: irc.w3.org, port 6665, channel #HCLS2 (use the IRC direct link, see the W3C IRC page for details, or use Web IRC)

Agenda

Housekeeping

Discussion - Slides for Discussion: Aligning Text Mining and SemWeb Communities

Next Steps

Minutes

<Anita> Anita scribes, assisted by Scott and Tim
<Tim> On the call: Gully Burns, Howard Burrows, Paolo Ciccarese, Tim Clark, Sudeshna Das, Anita deWaard, Rob Frost, Joanne Luciano, Scott Marshall, David Newman, Jodi Schneider, Tony Scerri
<Anita> Slideshare to slides - everyone's looking at it now...
<Anita> Topic: aligning text mining + Semantic web communities
<Anita> Text mining goal: algorithmically derive knowledge from documents, needs universal persistence that is consistent with web architecture
<Anita> SW goal: represent high-quality information so you can make it computable directly, need bridges to web of documents where science is created...
<Anita> Rob: Add a check for text clustering as well - if the goal is building an ontology, you can initially run clustering to generate candidate concepts - that is also needed
<Anita> Gully: we are developing techniques to make curation pipelines for biocurators, modeling their workflow + finding out what they need, model in UML, then going to text miners and telling them what they need when
<Anita> Gully will send email offline to fill in missing elements in slide
<Anita> Tim: post to wiki; Gully: can't, will send around => Tim posts
<Anita> Tim: all checks ok?
<Anita> Scott: there is a page that has related projects that Anita sent - share?
<Anita> Anita: sure
<Anita> Scott scribes as Anita puts the page on the wiki
<mscottm> Tim: Text mining can assist in creating ontologies
<mscottm> ..and be used in mapping statements to documents on the web.
<mscottm> X1: wonders whether using ontologies to represent text mining is viable.
<mscottm> ..unless it's lightweight, it wouldn't be practical.
<mscottm> Rob: You can use statistical associations and classifiers.
<Tim> scott: it's a lightweight process - you can use SWAN/SIOC
<Tim> Sudeshna: AO covers some of that - captures provenance of where assertions and mappings came from
<Tim> Paolo: depends on what you are targeting - some entities can be detected but discourse is a bit more abstract - text mining can help for genes, proteins, etc. if supervised
<jluciano> Looking for feedback here - would it be useful to say that we are looking to bridge the gap between scientific thinking (science) and the language that's used to record it (text mining/NLP)?
<mscottm> I would agree Joanne - that's one way of putting it.
<Anita> Ok - I think this is the page Scott meant? http://groups.google.com/group/forcnet/web/list-of-projects-that-aim-to-define-new-formats-research-objects?hl=en = I made all the FoRC docs publicly available now
<jodi> Joanne: I would say that it's more about bridging the gap between scientific writing (papers, etc) and the meaning behind it (textmining/NLP to identify appropriate ontologies, relationships)
Anita slaps mscottm around a bit with a large fishbot
mscottm says fishbot, Huh?
<Anita> => * <mscottm> Scott - you want me to scribe again? (you're doing a great job and my RSI is acting up so very happy if you want to scribe :-))
<mscottm> Paolo: Once you've created it (for instance from software), it's possible that human intervention created the results. Creation tokens can be used easily by humans, should be possible by machines.
<jodi> Anita: thanks, that's a useful list!
mscottm asks Anita to join in as much as the RSI allows.
mscottm will keep trying to keep up!
<Anita> Sorry - where are the instructions to find how to interact with this chat software? Or even the instructions to find the instructions?
<mscottm> http://www.w3.org/2001/12/zakim-irc-bot
<Anita> Gully: If we have a formal structure to formalize the process of annotating the document for text mining, and then allow the text mining to analyze its representations, then annotators look at the marked-up document - going forward in a formal computational way is what's required.
<Anita> Scott: if PPI is made by text mining tool (e.g. in Taverna) then disclose fact it was made by software - as a type of assertion (so type of modality, AdW)
<paoloC_> http://code.google.com/p/annotation-ontology/wiki/Curation
<Anita> Scott: it helps to have clearly identified entities for text mining - Rob(?) - yes, differentiate from Gold Standard!
<mscottm> Scott: There are two sides: 1) using text mined assertions in linked data information retrieval and 2) using the well-defined ontological markup to properly train statistical models
<mscottm> Rob: you would want to build up toward a gold standard.
<mscottm> Gully: Text miners are most interested in the text mining algorithm itself and not so interested in infrastructure.
<Anita> Gully: most of the people who do this solve things on a challenge-by-challenge basis, research oriented, rather than building sharable infrastructure; they are resistant to the idea of incorporating systems that force conventions, such as ontologies -
<Anita> (Gully) If annotation technology was easy to use, with good standards, then they would be much more inclined to do that
<Anita> At Biocreative III meeting in Bethesda last week discussion about this - thinking about how to develop standards on this, reluctance to adhere to Semantic Web approaches
<Anita> (More Gully) Need to understand needs of text mining/computational linguistic community
<Anita> (Tim) Gully's point is like when you get biomedical researchers to use ontologies: it has to reduce their effort, if you want it to work
<Anita> Gully knows a number of people, he is interested in talking to them (I think you were going to say Gully?)
<Anita> Tim: on last week's call we made AO & related text mining interactions a 'first-class subtask'
<Anita> to work on integration between text mining and semantic web
<Anita> Tim: potential alignments are Annotation Ontologies; Anita's subtask, Biomedical Article of the Future (Anita - I'd rather not call it that - name is already taken by Cell!) - generating documents - any others?
<Tim> Okay Anita - please rename it!
<mscottm> http://www.calbc.eu/
<Anita> Thanks Sudeshna
<sudeshna> Integration of claims with experimental data and computation - need to take into account that claim could be generated from text mining
<Anita> Scott: Gave a talk at EBI; Silver standard corpus now in XML can be translated into RDF by taking named entities + using fully qualified URIs instead of strings;
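The XML-to-RDF translation Scott describes (taking named entities and replacing strings with fully qualified URIs) could look roughly like the sketch below. It is only an illustration: the XML element and attribute names, the `example.org` namespaces and the `mentions` property are all assumptions made here, not the actual corpus schema or any vocabulary discussed on the call.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus snippet: element and attribute names are assumptions,
# not the real silver standard corpus schema.
SAMPLE_XML = """
<doc id="12345">
  <s>Mutations in <e type="gene" id="BRCA1">BRCA1</e> raise the risk of
  <e type="disease" id="D001943">breast cancer</e>.</s>
</doc>
"""

# Hypothetical namespaces standing in for fully qualified URIs.
ENTITY_BASE = "http://example.org/entity/"
DOC_BASE = "http://example.org/doc/"
MENTIONS = "http://example.org/vocab/mentions"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def entities_to_ntriples(xml_text):
    """Turn each annotated entity into a URI and emit N-Triples lines."""
    root = ET.fromstring(xml_text)
    doc_uri = DOC_BASE + root.get("id")
    triples = []
    for ent in root.iter("e"):
        # Build a URI from the entity type and identifier instead of
        # keeping the surface string as the only handle on the entity.
        ent_uri = ENTITY_BASE + ent.get("type") + "/" + ent.get("id")
        label = " ".join((ent.text or "").split())
        triples.append(f"<{doc_uri}> <{MENTIONS}> <{ent_uri}> .")
        triples.append(f'<{ent_uri}> <{LABEL}> "{escape_literal(label)}" .')
    return triples


if __name__ == "__main__":
    for line in entities_to_ntriples(SAMPLE_XML):
        print(line)
```

Each entity yields two triples: one linking the document to the entity URI, and one keeping the original text string as a label.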
<Anita> Scott: met Adrien Coulet at NCBO and mined PubMed using PharmGKB lexicon to get gene/disease/drug relations
<Anita> Scott still - use Scientific Discourse ontologies and then microarray datasets, etc. and linked data queries to pull out info from PubMed based on PharmGKB Lexicon with hypothetical relations
<Anita> In summary: a catalogue of assertion types (automated/ lightweight/silver/gold) is needed!
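One possible way to picture such a catalogue, purely as a sketch: each assertion carries one of the four types named in the summary plus a note on who or what produced it, so consumers can pick the types they trust. The class names, field names and example data below are hypothetical and do not come from AO, SWAN or any other vocabulary mentioned on the call.

```python
from dataclasses import dataclass
from enum import Enum


class AssertionType(Enum):
    """The four types named in the summary; no ranking between them is implied."""
    AUTOMATED = "automated"
    LIGHTWEIGHT = "lightweight"
    SILVER = "silver"
    GOLD = "gold"


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str
    kind: AssertionType
    source: str  # hypothetical provenance field: tool or curator that made it


def select(assertions, accepted):
    """Keep only assertions whose type is in the accepted set."""
    return [a for a in assertions if a.kind in accepted]


# Made-up example data, for illustration only.
EXAMPLES = [
    Assertion("gene:A", "interacts_with", "gene:B",
              AssertionType.AUTOMATED, "text-mining tool"),
    Assertion("gene:A", "associated_with", "disease:X",
              AssertionType.GOLD, "human curator"),
]

if __name__ == "__main__":
    for a in select(EXAMPLES, {AssertionType.GOLD, AssertionType.SILVER}):
        print(a.subject, a.predicate, a.obj, a.kind.value, a.source)
```

Filtering by an explicit set of accepted types avoids assuming any ordering between, say, "lightweight" and "silver", which the call did not define.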
<mscottm> Rob: Generating a gold standard by group contributions. Recently read about something similar from I2D2
<Anita> Tim: good idea, Harvard + Elsevier are working to define a corpus where document set is very focused and translational in reach, e.g. on curing a disease
<mscottm> Tim: I think it would be beneficial to use a focused document set.
<Anita> Anita: Happy to make that available to this community!
<Anita> Working on integration of UIMA with Annotation Framework
<Anita> Gully: we (ISI/USC) are working with U Colorado in SciKnowMine; also working with Tsujii in Tokyo, if there are ways we can create a simplified software-based approach to share annotations that would make everything go faster
<Anita> Scott: Dietrich Rebholz-Schuhmann is part of Pistoia, Tim and Anita will recognize it's similar to CODE_Neuro project we applied for - put together a bunch of vocabularies to access different resources in a semweb way;
<Anita> Scott: Ian Harrow of Pfizer, ?? and Dietrich were co-authors: ways to share text-mined information: federated linked data! In a markup query, enter gene, disease, tissue, species; get a set of results: gene co-occurrence, link to paper
<Anita> Scott: Pistoia, is what?
<mscottm> http://www.pistoiaalliance.org/
<mscottm> includes many pharma companies for pooling IT and knowledge resources
<Anita> Tim: 11:01, end of slot! We established it is beneficial to align these communities; move to 2 weeks hence! October 4th call for further alignment.
<jodi> http://esw.w3.org/HCLSIG/SWANSIOC
<Zakim> SW_HCLS(Disc)10:00AM has ended