16:19:14 <RRSAgent> RRSAgent has joined #diw2012
16:19:14 <RRSAgent> logging to http://www.w3.org/2012/02/02-diw2012-irc
16:21:11 <Nsfuser> Nsfuser has joined #diw2012
16:22:01 <LarryHunter> All of the first theme topics (except robotics, I guess) require formal knowledge representations (visualization may not require it, but certainly benefits).  I think that would be a good way to scope this sprawling topic
16:22:23 <PaoloCiccarese> +1
16:23:15 <Helena> Haym: no need to be convinced about big data
16:23:31 <Helena> Haym: given a world where we can get knowledge from data, what will be the challenges?
16:23:52 <LarryHunter> I prefer "formally represented knowledge" to "models", since not all models (e.g. ODEs) are what I think we mean
16:26:11 <evelyne> evelyne has joined #diw2012
16:26:15 <PaoloCiccarese> "formally represented knowledge" survives tools and allows better integration
16:26:28 <LarryHunter> It also covers all of the second theme topics
16:26:53 <LarryHunter> +1 to Paolo
16:32:40 <Helena> LarryHunter: improving the discovery process involves understanding what it is
16:33:03 <andrey_rzhetsky> ++1
16:33:46 <HuanLiu> +1
16:34:07 <Helena> steve: efficiency vs effectiveness
16:34:26 <LarryHunter> Characterizing the work of scientists in a way that identifies what kinds of problems there are that computation might help solve
16:34:47 <Helena> +1 to LarryHunter 
16:35:00 <LarryHunter> This is not what CS money usually gets spent on, and more should go to that kind of effort
16:36:20 <LarryHunter> What links all of these "state of the art" things you list is definitely formal representations of scientific knowledge
16:38:13 <LarryHunter> Especially for your vis examples, Munzner's nested model of design paper is spot on. http://www.cs.ubc.ca/labs/imager/tr/2009/NestedModel/NestedModel.pdf
16:38:53 <Haym_> Haym_ has joined #diw2012
16:39:45 <Karsten> +1 to LarryHunter
16:40:17 <Helena> "all models are wrong, some are useful" - should we be focusing on building models or building the tools that will build those models from the data? 
16:41:16 <Alexander> Alexander has joined #diw2012
16:42:18 <LarryHunter> Building models and creating tools that help people build models (from data or otherwise) are both valuable
16:42:21 <Karsten> I think both fall within the scope
16:43:07 <DavidJensen> Much of theory development is actually a search in representation space, not a search for a good model within a given representation.
16:44:27 <LarryHunter> I object to characterizing Galaxy as a workflow system.  The workflow aspect is not what drove adoption, it's a side benefit.  I think the same will happen when VisTrails gets integrated into Cytoscape
16:45:19 <Helena> knowledge representation models themselves should allow for self-correction based on new facts, like science itself
16:45:36 <DavidJensen> We should consider how well workflows "work" within current non-scientific fields.  If current workflow systems don't work well in some non-scientific fields, we should learn from that.
16:46:44 <PaoloCiccarese> Text extraction can be made more usable by annotation/visualization tools through agreed models for sharing the results
16:46:44 <LarryHunter> Yes, knowledge bases should be dynamic and updated.  One advantage of formal knowledge representation is that it has the potential not only to capture the most recent publications, but to link new results to existing knowledge
16:47:27 <Helena> i would argue that a paper does not contain "knowledge", it contains interpretation of results
16:47:29 <DavidJensen> Some current uses of IT within science (not to mention the current format of the scientific paper) are about presentation, not discovery.
16:47:39 <susan> susan has joined #diw2012
16:47:52 <LizBradley> LizBradley has joined #diw2012
16:48:30 <LarryHunter> +1 for Mariah's comment
16:48:40 <CeciliaAragon_> +1 for Mariah as well
16:48:41 <DavidJensen> ...or, I should say, not about representing the actual discovery workflow.
16:48:52 <HuanLiu> Knowledge representation changes with the wide use of the Internet, the Web, and social media.
16:49:41 <HuanLiu> from a book/article form to a wiki one.
16:51:13 <Helena> a word of caution: if we are collecting our "facts" or knowledge representation from papers, we should be able to clearly distinguish good science from bad science (e.g. http://www.economist.com/node/21528593 about a recent scandal on bad science in a good institution)
16:51:41 <Helena> PaoloCiccarese: knowledge about applications bv domain models
16:53:36 <LarryHunter> Celia's point about automated reasoning on knowledge representations is of course fundamental.  KR isn't useful except as it supports reasoning (or perhaps communication/visualizations)
16:53:55 <Helena> why would knowledge representations help scientists?
16:54:13 <Helena> Haym: reuse, revalidate, etc - they are a means to an end
16:55:08 <Helena> David: are workflows useful, under what circumstances?
16:56:27 <Helena> Phil: pharma companies already run a lot of procedural workflows that do not exist as well formalized in academia
16:56:35 <Helena> Phil: e.g. pipeline pilot
16:56:55 <Helena> PhilBourne: these break down in the "experimentation mode", where changes happen all the time
16:57:01 <LarryHunter> "knowledge" as in knowledge representation isn't necessarily what philosophers would call it (justified true belief).  
16:57:08 <Helena> PhilBourne: it will be interesting to encompass that problem
16:58:13 <Helena> @LarryHunter: do you consider modeling uncertainty relevant (e.g. 60% of pancreatic cancer patients respond to drug Y)?
16:58:34 <LarryHunter> Modeling uncertainty certainly could be important
16:59:31 <Helena> Pat: models also lead to data, not just data lead to models
17:00:05 <Vasant> Vasant has joined #diw2012
17:00:13 <LarryHunter> In my domain (interpreting genome scale results), most of the work is taking a large number of existing models (of what genes do) and integrating (certain aspects of) them
17:00:31 <LarryHunter> together into an explanation of the experimental results
17:00:42 <LizBradley> Incorporating a priori knowledge is critical - Carla
17:00:53 <LarryHunter> +1 to carla -- 
17:01:47 <LarryHunter> Scientists also could use help in evaluating models
17:03:19 <Helena> there are thousands of "beautiful" models that are completely untestable because the technology to measure certain parameters under certain conditions simply does not exist
17:03:30 <Helena> should we not focus on building testable models?
17:03:31 <LizBradley> Support generation of models in formalisms that scientists already use - Pat
17:04:32 <Vasant> On theme 1, it would be good to see some discussion of processes and tools for generating and prioritizing questions, designing and prioritizing experiments.
17:04:54 <LarryHunter> +1 to Vasant
17:05:09 <DavidJensen> Big problem right now is that we don't have sufficiently expressive representations for data and models.  Scientists know huge amounts of things that affect their data analysis and reasoning, but that can't currently be represented formally.
17:05:25 <Vasant> On theme 2, it would be good to see some discussion of linking model based reasoning, experiment design, and focused data acquisition
17:05:34 <LarryHunter> Philosophers of science also have interesting things to say about discovery and insight, especially say Lindley Darden in biology
17:06:23 <Vasant> +1
17:07:24 <Helena> Raul: can patterns be classified as models? 
17:07:47 <LarryHunter> Patterns are certainly subject to formal knowledge representations
17:08:20 <andrey_rzhetsky> I think, patterns belong to models (and "laws" too)
17:08:24 <LizBradley> Visualizations that combine models, data, AND UNCERTAINTY - Janet
17:08:35 <Vasant> Model means different things in different communities - maybe articulating what this group means by models would be good.
17:09:03 <LarryHunter> +1 to Vasant, another reason to eschew "model" and use "formal knowledge representation" instead
17:10:25 <LizBradley> My previous post was quoting Maria, not Janet
17:10:33 <Helena> Hod: models are means to optimize things
17:10:36 <andrey_rzhetsky> Larry, if your "formal knowledge representation" would include in its definition quantitative models, I would agree
17:10:39 <Vasant> What about utility of models, experiments, hypotheses?
17:11:07 <mrigas> mrigas has joined #diw2012
17:11:23 <Helena> this discussion is begging the question... what is Knowledge? (and how do we represent it)
17:13:33 <DavidJensen> Pat's point: We need to bring together simulation and machine learning
17:14:43 <DavidJensen> I concur entirely -- current ML systems don't learn models sufficient to drive simulation.  One gap is that the vast majority of simulations require specification of causal dependencies, and the vast majority of ML systems learn associational models.
17:14:50 <LarryHunter> In biology, we mostly take the assertions in the literature (including textbooks and journal articles, but also now databases of annotations) as "knowledge".  Representations of that knowledge are being built based on community-curated ontologies, like the Gene Ontology
17:15:58 <LarryHunter> For that reason, biological knowledge isn't always true :-).  We are always looking for changes
17:16:04 <Vasant> In addition to Pat and DavidJensen’s comments regarding machine learning and simulation - I would add experimentation
17:16:10 <PaoloCiccarese> +1 to Larry: databases of annotations are starting to grow, and they cut across applications and domains
17:16:35 <PaoloCiccarese> annotation allows flattening the variety of formats down to the same level
17:17:34 <DavidJensen> "Explaining" an observation with current theory can require a huge amount of background knowledge about the setting in which the data were collected, the sources of uncertainty, etc. 
17:17:56 <Karsten> +1 DavidJensen
17:17:57 <Vasant> Two points about scientific knowledge - not necessarily true and subject to change; also different competing perspectives from different investigators / communities
17:18:58 <LarryHunter> Andrey, "formal knowledge representation" could include quantitative models, especially if they are couched in terms that can be linked to community curated ontologies
17:19:19 <andrey_rzhetsky> Larry, then I am with you :)
17:19:19 <DavidJensen> +1 to Vasant, we need to have facilities to allow multiple competing hypotheses and theories.
17:20:10 <Vasant> +1 to DavidJensen on the need for causal models - for both driving simulations and designing experiments, testing hypotheses
17:20:22 <Helena> @Larry can you give an example of a database of annotations? or is gene ontology one such database?
17:20:23 <CeciliaAragon_> I agree with what Alex Szalay is saying: large scale simulations from physical principles are increasingly accepted, but fewer people know how to program these high performance simulations - this gap could lead to isolation between HPC and other scientists -  we need to focus on usability and accessibility of simulations, models, visualizations
17:20:24 <Karsten> And along with that we need (formal) mechanisms for comparing, testing, evaluating those hypotheses
17:20:44 <Vasant> +1 to Karsten
17:20:59 <Karsten> @Cecilia: +1 and climate science is another great example of this.
17:21:03 <PaoloCiccarese> Karsten we go back to the way the hypotheses are represented and shared
17:21:25 <LizBradley> Argumentation frameworks are perfect for that - not only because they naturally handle that, but also because they reduce what Cecilia called the "impedance mismatch."  Scientists communicate largely by arguing, and showing them your results in the form of an argument helps them.
17:21:41 <LarryHunter> Gene Ontology annotations, stored with the various model organism databases and EBI for the human (e.g. http://www.ebi.ac.uk/GOA/ )
17:22:34 <andrey_rzhetsky> to DavidJensen and Vasant: representing competing models and theories is hard/not addressed with current knowledge representation tools, I think
17:23:09 <Helena> @andrey_rzhetsky: good point - knowledge representations can be incompatible
17:23:30 <Vasant> +1 to andrey - that is why we need research on this topic!
17:23:34 <CeciliaAragon_> Thanks Liz: it is an underexplored area, understanding how to lower the impedance mismatch between human cognition and computation.
17:24:16 <nigam_> nigam_ has joined #diw2012
17:25:02 <LizBradley> Andrey, argumentation frameworks really are a good way to handle competing models & theories.
17:25:14 <Vasant> Checking compatibility or coherence of models of one phenomenon with models of related phenomena would be important
17:26:21 <LarryHunter> Formal knowledge representations allow us to write code that helps support the evaluation / comparison of alternative models
17:26:29 <DavidJensen> andrey_rzhetsky +1
17:26:30 <Helena> if a knowledge representation is a collection of relationships (facts..? annotations?), then these facts can be true, false, or uncertain (with a degree of trust)
17:27:43 <Helena> to decide the certainty of one of those relationships, ML tools should be able to trace back to the source of that relationship
17:27:59 <nigam_> IIS supported some work on evaluation / comparison of alternative models using Sem Web technology
17:28:00 <DavidJensen> That said, ML algorithms regularly compare huge numbers of models, but only in (often) a very constrained space.  The real disagreement is often about which space to search in.
17:28:03 <Vasant> Provenance of “discoveries” with respect to data used, ontologies used, analysis tools used is important if we want to automate aspects of discovery - without it, we cannot recover from erroneous conclusions when data need to be corrected
17:28:53 <Helena> rssagent, create minutes
17:29:07 <nigam_> Michel Dumontier is working on storing the provenance of "discoveries" in the same framework as the data, ontologies and the reasoning rule.
17:30:11 <Helena> RRSAgent, create minutes
17:30:11 <RRSAgent> I have made the request to generate http://www.w3.org/2012/02/02-diw2012-minutes.html Helena
17:56:43 <andrey_rzhetsky> Helena, your link leads to a page that indicates that I have insufficient privileges to view the minutes
17:57:04 <Helena> RSSAgent, make logs world-visible
17:58:02 <LarryHunter> Not world visible yet
17:58:32 <Helena> RSSAgent, make minutes world-visible
17:58:40 <Helena> rssagent is misbehaving :)
17:59:15 <Helena> RSSAgent, publish minutes
17:59:21 <Helena> RSSAgent, make logs world-visible
18:00:16 <Helena> RSSAgent, please make logs world-visible
18:00:58 <PaoloCiccarese> rrsagent, [please] make [these] logs world-visible
18:00:58 <RRSAgent> I'm logging. I don't understand '[please] make [these] logs world-visible', PaoloCiccarese.  Try /msg RRSAgent help
18:00:58 <Helena> RSSAgent, please behave!
18:01:32 <PaoloCiccarese> rrsagent, make logs world-visible
18:02:07 <LarryHunter> worked
18:08:38 <Susan> Susan has joined #diw2012
18:10:48 <nigam> nigam has joined #diw2012
18:11:24 <DavidJensen> DavidJensen has joined #diw2012
18:11:39 <lt> lt has joined #diw2012
18:12:32 <DavidJensen> I'm here
18:12:36 <lt> lt has joined #diw2012
18:12:40 <nigam> here
18:12:43 <sbs11ny> Here
18:13:09 <LizBradley> LizBradley has joined #diw2012
18:14:17 <Karsten> Karsten has joined #diw2012
18:14:20 <Haym> Haym has joined #diw2012
18:15:50 <Haym> just testing that I'm finally online
18:18:01 <LarryHunter> I am against social curation of molecular biology data.  Not even the authors of the articles are good enough at it (see FEBS Letters experiment on annotating protein-protein interactions).  Curation / Annotation is a skilled task.
18:19:50 <HelenaDeus> HelenaDeus has joined #diw2012
18:21:40 <DavidJensen> Being able to use the same workflows, data, and software is important.  However, this is duplication, not replication.  Both are important, but we don't want to computationally enable large numbers of common-cause failures in science.
18:22:25 <HelenaDeus> forensic bioinformatics is the science of uncovering scientific fraud
18:22:59 <HelenaDeus> but this happens post-publication, not during peer-review
18:24:00 <Karsten> it is a significant effort and there is no incentive during peer review
18:25:10 <Karsten> this may go back to credit -- if you uncover fraud pre-publication you don't get credit, if you do it post-publication you do
18:25:20 <HelenaDeus> peer-reviewing is voluntary, no-one can force reviewers to take the appropriate steps to replicate the results :(
18:25:23 <Haym> topic2 Renoir topic 3 DaVinci
18:27:30 <Karsten> Karsten has left #diw2012
18:37:01 <AlexanderSchliep> AlexanderSchliep has joined #diw2012
18:41:07 <HuanLiu> HuanLiu has joined #diw2012
18:42:01 <LarryHunter> topic: improving the experimentation and discovery (scientific) process
18:44:07 <LarryHunter> We need to characterize the scientific discovery process (in detail). 
18:44:17 <LarryHunter> Computer science can help in the planning of experiments
18:50:36 <Zakim> Zakim has left #diw2012
19:31:19 <HuanLiu> HuanLiu has joined #diw2012
19:31:48 <HuanLiu> Perils of `Bite Size' Science http://www.nytimes.com/2012/01/29/opinion/sunday/the-perils-of-bite-size-science.html?_r=1&src=tp&smid=fb-share
20:06:59 <Vasant> Vasant has joined #diw2012
20:08:07 <Vasant> How is the breakout session coming along?
20:13:13 <LarryHunter> Not bad.  Useful results, but not getting as far as I might have hoped
20:24:40 <Yolanda> Yolanda has joined #diw2012
20:41:35 <HelenaDeus> HelenaDeus has joined #diw2012
20:46:29 <Vasant> Vasant has joined #diw2012
20:48:03 <LarryHunter> Computational support
20:48:22 <SBS11NY> SBS11NY has joined #diw2012
20:49:49 <Vasant> It is important to articulate the research opportunities in Discovery Informatics as distinct from infrastructure needs
20:50:02 <Yolanda> Global needs slide: they discussed the formulation of assumptions
20:50:10 <LarryHunter> Could have told us that in advance.
20:50:23 <Yolanda> That is the topic of the breakout tomorrow :)
20:50:26 <mrigas> mrigas has joined #diw2012
20:51:21 <PhilBourne> PhilBourne has joined #diw2012
20:51:21 <LarryHunter> Research opportunities can be formulated from the "process" slide.  We tried to pick (and describe) steps that could be supported computationally.  
20:51:36 <Vasant> OK
20:51:56 <Yolanda> Collaboration slide: collaboration is very important, but the group did not get time to discuss it
20:52:21 <LarryHunter> It was group 3's task, right?
20:52:35 <Yolanda> yep
20:53:23 <Yolanda> Actions slide: important to attach provenance and trust: how the result came about (provenance) and 
20:53:35 <Yolanda> what do other scientists think about the result (trust)
20:54:52 <Yolanda> Actions slide: making the body of knowledge in an area broadly available to scientists
20:55:21 <Yolanda> Pat Langley points out the report is focused more on experimental sciences rather than observational sciences
20:55:56 <Yolanda> Pat: a lot of issues carry over, but there are additional aspects
20:56:29 <jls> jls has joined #diw2012
20:57:05 <jls> observational science should be considered part of the research enterprise for these purposes
20:57:57 <jls> this process refers primarily to current, active computational action that could further research today
20:59:01 <jls> listing needs should be coupled with possible solutions for research directions
20:59:23 <Karsten> Karsten has joined #diw2012
20:59:34 <jls> NSF wants to know both short- and long-term research needs and directions
21:00:21 <jls> what informatics research challenges need to be addressed before we can move forward with the science?
21:00:55 <jls> who needs to be brought together to accomplish leaps forward in research? Clearly more collaboration is needed
21:00:59 <Yolanda> Maria: who needs to be involved in addressing challenges
21:01:29 <HelenaDeus> PhilBourne: take a step back - decide what has succeeded, what has failed and why
21:01:52 <Yolanda> Phil: recent NAS meeting this week on what we can learn about what has worked and what hasn't in terms of scientific data
21:02:17 <jls> what are the tools that we are using currently for our own endeavors and what is missing?
21:04:19 <jls> this group attempted to tackle defining what computational research methods and processes are
21:05:05 <jls> needed computational tools are still missing; the group mentioned current opportunities, but the process is generic
21:05:36 <Yolanda> Theme 1 perhaps would be better named as "computational support for the scientific process"
21:06:45 <LarryHunter> +1
21:07:13 <CeciliaAragon> CeciliaAragon has joined #diw2012
21:08:11 <Yolanda> Theme 2 perhaps could be named "computational means to integrate models and data"
21:10:15 <Yolanda> On slide "State of the Art": often the formulation of new representations happens in our heads, then the data is shaped to fit that representation
21:11:05 <Yolanda> some models may be powerful but end up having little uptake in science
21:13:38 <Yolanda> TurboTax model: tool tells you "typically most users don't do things that way", etc
21:17:14 <jls> we see examples of a type of science where people find entirely novel, unexpected uses for data and models (e.g. the Framingham data used to show obesity runs in social circles, YouTube videos used to study animal music understanding, using “Where’s George” to track flight patterns, using baseball data to study mortality and handedness)
21:17:42 <Yolanda> Haym: the existence of some datasets enables interesting research (eg the Framingham data  and subsequent obesity findings)
21:18:22 <Yolanda> Haym: serendipitous or opportunistic uses of datasets
21:18:33 <jls> here we focus more on dataset selection and availability, need to publicize or find ways to connect the data with these unanticipated uses—metadata may ameliorate some of the problems that keep data from finding its full use
21:19:20 <jls> Metadata expansion may allow data to be utilized more fully
21:19:39 <Yolanda> Steve: good metadata is very important if you want to foster unintended uses of data
21:20:00 <jls> Concern that young researchers are collecting data and then reverse engineering the result sought
21:21:13 <jls> Secondary use of data from health EHR/informatics is limited by concerns about data ownership, data availability
21:22:13 <jls> In IT, a problem of protection: for each solution discovered by a computer scientist, there is someone working around it
21:23:30 <Yolanda> Possible title for Theme 2: computational frameworks to integrate models and data??
21:23:47 <Yolanda> Group agrees (for now) on Theme 2 title: "Data and Models"
21:26:47 <Yolanda> Social computing challenges (I) slide: these new social techniques may carry a risk, which may result in only the more senior/established people using them
21:27:45 <LizBradley> LizBradley has joined #diw2012
21:31:46 <Karsten> Karsten has joined #diw2012
21:34:20 <Yolanda> Social computing challenges (III) slide: enabling micro-contributions -- people are willing to do them if they are given the right small task
21:36:51 <Yolanda> Haym on the Google Flu Trends bullet: some people are contributing without knowing that they are
21:37:04 <jls> Micro-work can create large gains through amassing tiny tasks
21:38:12 <Yolanda> Rephrase that bullet: it is not about data about social activity
21:38:40 <jls> Important to include cases in which people do not intend to contribute to science
21:39:02 <jls> social activity creates data which has value
21:40:05 <jls> analysis of what scientists do (“droppings”) also has value and can teach about sciences, collaboration etc
21:41:01 <jls> unlike a lot of areas where we have the ability to do computational work from data, here there are large gaps that we may be missing and not anticipating
21:42:19 <jls> but framework provided should be flexible
21:42:32 <jls> spend more money on studying science!!!
21:43:04 <HelenaDeus> steve: figure out where the science happens
21:43:30 <jls> Where does science happen if it starts to become part of day to day life even while it continues to exist in the scientific sphere?
21:43:49 <LarryHunter> Analysis of what social / computing artifacts scientists leave (citations, collaborations, databases they access) is a good opportunity for studying what scientists do (that computation could support)
21:44:25 <jls> What is it about scientific social computing that sits between science and social computing in general?
21:44:54 <Yolanda> social scientists need to be part of the groups/teams that undertake projects in this area, not just CS researchers and domain scientists -- this is a point made by many people today!
21:45:34 <LarryHunter> User-centered designers are not necessarily social scientists, but they can play a similar role
21:45:39 <SBS11NY> Most of us have learned scientific practice by doing what others do in our particular communities, scientific practices are more local than one might imagine.
21:46:18 <jls> There is an interesting area including human computation – perhaps this is a way of capturing human insight
21:47:43 <HelenaDeus> PhilBourne: can't separate research and education - social computing will change both the way we do research and the way we educate
21:48:25 <jls> How will these technologies change the way we educate and perform research? It seems science is still reacting to technology rather than exploiting it
21:51:46 <Karsten> Karsten has left #diw2012
21:51:46 <DavidJensen> DavidJensen has joined #diw2012
21:53:10 <Karsten> Karsten has joined #diw2012
21:53:15 <Karsten> Karsten has left #diw2012
22:35:06 <HuanLiu> HuanLiu has joined #diw2012