HCLSIG/SWANSIOC/Meetings/2010-09-20 Conference Call
Dial-in & IRC Information
- Dial-In #: +1.617.761.6200 (Cambridge, MA)
- Dial-In #: +33.4.89.06.34.99 (Nice, France)
- Dial-In #: +44.117.370.6152 (Bristol, UK)
- Participant Access Code: 42572 ("HCLS2")
- IRC Channel: irc.w3.org port 6665 channel #HCLS2 (see W3C IRC page for details, or see Web IRC)
Agenda
- Housekeeping
- Next Steps
Minutes
- <Anita> Anita scribes, assisted by Scott and Tim
- <Tim> On the call: Gully Burns, Howard Burrows, Paolo Ciccarese, Tim Clark, Sudeshna Das, Anita deWaard, Rob Frost, Joanne Luciano, Scott Marshall, David Newman, Jodi Schneider, Tony Scerri
- <Anita> Slideshare to slides - everyone's looking at it now...
- <Anita> Topic: aligning text mining + Semantic web communities
- <Anita> Text mining goal: algorithmically derive knowledge from documents, needs universal persistence that is consistent with web architecture
- <Anita> SW goal: represent high-quality information so you can make it computable directly, need bridges to web of documents where science is created...
- <Anita> Rob: Add a check for text clustering as well - if the goal is building an ontology, you can initially run clustering to generate candidate concepts - also needed
- <Anita> Gully: we are developing techniques to make curation pipelines for biocurators, modeling their workflow + finding out what they need, model in UML, then going to text miners and telling them what they need when
- <Anita> Gully will send email offline to fill in missing elements in slide
- <Anita> Tim: post to wiki; Gully: can't, will send around => Tim posts
- <Anita> Tim: all checks ok?
- <Anita> Scott: there is a page that has related projects that Anita sent - share?
- <Anita> Anita: sure
- <Anita> Scott scribes as Anita puts the page on the wiki
- <mscottm> Tim: Text mining can assist in creating ontologies
- <mscottm> ..and be used in mapping statements to documents on the web.
- <mscottm> X1: wonders whether using ontologies to represent text mining is viable.
- <mscottm> ..unless it's lightweight, it wouldn't be practical.
- <mscottm> Rob: You can use statistical associations and classifiers.
- <Tim> scott: it's a lightweight process - you can use SWAN/SIOC
- <Tim> Sudeshna: AO covers some of that - captures provenance of where assertions and mappings came from
- <Tim> Paolo: depends on what you are targeting - some entities can be detected but discourse is a bit more abstract - text mining can help for genes, proteins, etc if supervised
- <jluciano> Looking for feedback here - would it be useful to say that we are looking to bridge the gap between scientific thinking (science) and the language that's used to record it (text mining/NLP)?
- <mscottm> I would agree Joanne - that's one way of putting it.
- <Anita> Ok - I think this is the page Scott meant? http://groups.google.com/group/forcnet/web/list-of-projects-that-aim-to-define-new-formats-research-objects?hl=en = I made all the FoRC docs publicly available now
- <jodi> Joanne: I would say that it's more about bridging the gap between scientific writing (papers, etc) and the meaning behind it (textmining/NLP to identify appropriate ontologies, relationships)
- Anita slaps mscottm around a bit with a large fishbot
- mscottm says fishbot, Huh?
- <Anita> => * <mscottm> Scott - you want me to scribe again? (you're doing a great job and my RSI is acting up so very happy if you want to scribe :-))
- <mscottm> Paolo: Once you've created it (for instance from software), it's possible that human intervention created the results. Creation tokens can be used easily by humans, should be possible by machines.
- <jodi> Anita: thanks, that's a useful list!
- mscottm asks Anita to join in as much as the RSI allows.
- mscottm will keep trying to keep up!
- <Anita> Sorry - where are the instructions to find how to interact with this chat software? Or even the instructions to find the instructions?
- <mscottm> http://www.w3.org/2001/12/zakim-irc-bot
- <Anita> Gully: If we have a formal structure to formalize the process of annotating the document for text mining and then allowing the text mining to analyze its representations - then annotators look at the marked-up document - then doing this in a formal computational way going forward is what's required.
- <Anita> Scott: if PPI is made by text mining tool (e.g. in Taverna) then disclose fact it was made by software - as a type of assertion (so type of modality, AdW)
- <paoloC_> http://code.google.com/p/annotation-ontology/wiki/Curation
- <Anita> Scott: it helps to have clearly identified entities for text mining - Rob(?) - yes, differentiate from Gold Standard!
- <mscottm> Scott: There are two sides: 1) using text mined assertions in linked data information retrieval and 2) using the well-defined ontological markup to properly train statistical models
- <mscottm> Rob: you would want to build up toward a gold standard.
- <mscottm> Gully: Text miners are most interested in the text mining algorithm itself and not so interested in infrastructure.
- <Anita> Gully: most of the people who do this, 1) solve challenge-by-challenge basis, research oriented - rather than building sharable infrastructure, resistant to the idea of incorporating systems that force conventions, such as ontologies -
- <Anita> (Gully) If annotation technology was easy to use, good standards then much more inclined to do that
- <Anita> At Biocreative III meeting in Bethesda last week discussion about this - thinking about how to develop standards on this, reluctance to adhere to Semantic Web approaches
- <Anita> (More Gully) Need to understand needs of text mining/computational linguistic community
- <Anita> (Tim) Gully's point is like when you get biomedical researchers to use ontologies: it has to reduce their effort, if you want it to work
- <Anita> Gully knows a number of people, he is interested in talking to them (I think you were going to say Gully?)
- <Anita> Tim: on last week's call we made AO & related text mining interactions a 'first-class subtask'
- <Anita> to work on integration between text mining and semantic web
- <Anita> Tim: potential alignments are Annotation Ontologies; Anita's subtask, Biomedical Article of the Future (Anita - I'd rather not call it that - name is already taken by Cell!) - generating documents - any others?
- <Tim> Okay Anita - please rename it!
- <mscottm> http://www.calbc.eu/
- <Anita> Thanks Sudeshna
- <sudeshna> Integration of claims with experimental data and computation - need to take into account that claim could be generated from text mining
- <Anita> Scott: Gave a talk at EBI; Silver standard corpus now in XML can be translated into RDF by taking named entities + using fully qualified URIs instead of strings;
- <Anita> Scott: met Adrien Coulet at NCBO and mined PubMed using PharmGKB lexicon to get gene/disease/drug relations
- <Anita> Scott still - use Scientific Discourse ontologies and then microarray datasets, etc and linked data queries to pull out info from PubMed based on PharmGKB lexicon with hypothetical relations
- <Anita> In summary: a catalogue of assertion types (automated/ lightweight/silver/gold) is needed!
- <mscottm> Rob: Generating a gold standard by group contributions. Recently read about similar from I2D2
- <Anita> Tim: good idea, Harvard + Elsevier are working to define a corpus where document set is very focused and translational in reach, e.g. on curing a disease
- <mscottm> Tim: I think it would be beneficial to use a focused document set.
- <Anita> Anita: Happy to make that available to this community!
- <Anita> Working on integration of UIMA with Annotation Framework
- <Anita> Gully: we (ISI/USC) are working with U Colorado in SciKnowMine; also working with Tsujii in Tokyo, if there are ways we can create simplified software-based approach to share annotations that would make everything go faster
- <Anita> Scott: Dietrich Rebholz-Schuhmann is part of Pistoia, Tim and Anita will recognize it's similar to CODE_Neuro project we applied for - put together a bunch of vocabularies to access different resources in a semweb way;
- <Anita> Scott: Ian Harrow of Pfizer, ?? and Dietrich were co-authors: ways to share text-mined information: federated linked data! in a markup query enter gene, disease, tissue, species; get a set of results: gene co-occurrence linked to paper
- <Anita> Scott: Pistoia, is what?
- <mscottm> http://www.pistoiaalliance.org/
- <mscottm> includes many pharma companies for pooling IT and knowledge resources
- <Anita> Tim: 11:01, end of slot! We established it is beneficial to align these communities; move to 2 weeks hence! October 4th call for further alignment.
- <jodi> http://esw.w3.org/HCLSIG/SWANSIOC
- <Zakim> SW_HCLS(Disc)10:00AM has ended
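Illustrative note on Scott's point above about translating an XML corpus into RDF by replacing named-entity strings with fully qualified URIs: a minimal sketch of that idea, under stated assumptions, might look like the following. The XML layout (`doc` and `e` elements with `id` and `type` attributes), the `example.org` base URIs, and the `mentions` predicate are all hypothetical and are not taken from the Silver standard corpus or from any vocabulary discussed on the call. Only the `rdfs:label` URI is a real term.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces, chosen only for illustration.
DOC_BASE = "http://example.org/doc/"
ENTITY_BASE = "http://example.org/entity/"
MENTIONS = "http://example.org/vocab#mentions"
# Real RDFS term, used to keep the original string as a label.
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def entities_to_ntriples(xml_text):
    """Turn each annotated entity into triples keyed by a URI, not a string."""
    root = ET.fromstring(xml_text)
    doc_uri = DOC_BASE + root.attrib["id"]
    triples = []
    for ent in root.iter("e"):
        # Build a fully qualified URI from the entity type and identifier.
        ent_uri = ENTITY_BASE + ent.attrib["type"] + "/" + ent.attrib["id"]
        triples.append(f"<{doc_uri}> <{MENTIONS}> <{ent_uri}> .")
        # Keep the surface string as a label on the entity.
        label = escape_literal("".join(ent.itertext()).strip())
        triples.append(f'<{ent_uri}> <{LABEL}> "{label}" .')
    return triples


# Hypothetical one-sentence document with a single gene annotation.
SAMPLE = (
    '<doc id="d1">Mutations in '
    '<e type="gene" id="g42">BRCA1</e> are discussed.</doc>'
)

if __name__ == "__main__":
    for line in entities_to_ntriples(SAMPLE):
        print(line)
```

Emitting plain N-Triples lines keeps the sketch free of third-party libraries. A real conversion would presumably map entity identifiers to URIs from existing resources rather than to a made-up base.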