16:19:14 RRSAgent has joined #diw2012
16:19:14 logging to http://www.w3.org/2012/02/02-diw2012-irc
16:21:11 Nsfuser has joined #diw2012
16:22:01 All of the first theme topics (except robotics, I guess) require formal knowledge representations (visualization may not require it, but certainly benefits). I think that would be a good way to scope this sprawling topic
16:22:23 +1
16:23:15 Haym: no need to be convinced about big data
16:23:31 Haym: given a world where we can get knowledge from data, what will be the challenges?
16:23:52 I prefer "formally represented knowledge" to "models", since not all models (e.g. ODEs) are what I think we mean
16:26:11 evelyne has joined #diw2012
16:26:15 "formally represented knowledge" survives tools and allows better integration
16:26:28 It also covers all of the second theme topics
16:26:53 +1 to Paolo
16:32:40 LarryHunter: improving the discovery process involves understanding what it is
16:33:03 ++1
16:33:46 +1
16:34:07 steve: efficiency vs effectiveness
16:34:26 Characterizing the work of scientists in a way that identifies what kinds of problems there are that computation might help solve
16:34:47 +1 to LarryHunter
16:35:00 This is not what CS money usually gets spent on, and more should go to that kind of effort
16:36:20 What links all of these "state of the art" things you list is definitely formal representations of scientific knowledge
16:38:13 Especially for your vis examples, Munzner's nested model of design paper is spot on. http://www.cs.ubc.ca/labs/imager/tr/2009/NestedModel/NestedModel.pdf
16:38:53 Haym_ has joined #diw2012
16:39:45 +1 to LarryHunter
16:40:17 "all models are wrong, some are useful" - should we be focusing on building models or building the tools that will build those models from the data?
16:41:16 Alexander has joined #diw2012
16:42:18 Building models and creating tools that help people build models (from data or otherwise) are both valuable
16:42:21 I think both fall within the scope
16:43:07 Much of theory development is actually a search in representation space, not a search for a good model within a given representation.
16:44:27 I object to characterizing Galaxy as a workflow system. The workflow aspect is not what drove adoption, it's a side benefit. I think the same will happen when VisTrails gets integrated into Cytoscape
16:45:19 knowledge representation models themselves should allow for self-correction based on new facts, like science itself
16:45:36 We should consider how well workflows "work" within current non-scientific fields. If current workflow systems don't work well in some non-scientific fields, we should learn from that.
16:46:44 Text extraction can be made more usable by annotation/visualization tools through agreed models for sharing the results
16:46:44 Yes, knowledge bases should be dynamic and updated. One advantage of formal knowledge representation is that it has the potential not only to capture the most recent publications, but to link new results to existing knowledge
16:47:27 i would argue that a paper does not contain "knowledge", it contains interpretation of results
16:47:29 Some current uses of IT within science (not to mention the current format of the scientific paper) are about presentation, not discovery.
16:47:39 susan has joined #diw2012
16:47:52 LizBradley has joined #diw2012
16:48:30 +1 for Mariah's comment
16:48:40 +1 for Mariah as well
16:48:41 ...or, I should say, not about representing the actual discovery workflow.
16:48:52 Knowledge representation changes around the wide use of Internet, Web, social media.
16:49:41 from a book/article form to a wiki one.
16:51:13 a word of caution: if we are collecting our "facts" or knowledge representation from papers, we should be able to clearly distinguish good science from bad science (e.g. http://www.economist.com/node/21528593 about a recent scandal on bad science in a good institution)
16:51:41 PaoloCiccarese: knowledge about applications bv domain models
16:53:36 Celia's point about automated reasoning on knowledge representations is of course fundamental. KR isn't useful except as it supports reasoning (or perhaps communication/visualizations)
16:53:55 why would knowledge representations help scientists?
16:54:13 Haym: reuse, revalidate, etc - they are a means to an end
16:55:08 David: are workflows useful, under what circumstances?
16:56:27 Phil: pharma companies already run a lot of procedural workflows that do not exist as well formalized in academia
16:56:35 Phil: e.g. Pipeline Pilot
16:56:55 PhilBourne: these break down in the "experimentation mode", where changes happen all the time
16:57:01 "knowledge" as in knowledge representation isn't necessarily what philosophers would call it (justified true belief).
16:57:08 PhilBourne: it will be interesting to encompass that problem
16:58:13 @LarryHunter: do you consider modeling uncertainty relevant (e.g. 60% of pancreatic cancer patients respond to drug Y)?
16:58:34 Modeling uncertainty certainly could be important
16:59:31 Pat: models also lead to data, not just data lead to models
17:00:05 Vasant has joined #diw2012
17:00:13 In my domain (interpreting genome scale results), most of the work is taking a large number of existing models (of what genes do) and integrating (certain aspects of that).
17:00:31 together into an explanation of the experimental results
17:00:42 Incorporating a priori knowledge is critical - Carla
17:00:53 +1 to carla --
17:01:47 Scientists also could use help in evaluating models
17:03:19 there are thousands of "beautiful" models that are completely untestable because the technology to measure certain parameters under certain conditions simply does not exist
17:03:30 should we not focus on building testable models?
17:03:31 Support generation of models in formalisms that scientists already use - Pat
17:04:32 On theme 1, it would be good to see some discussion of processes and tools for generating and prioritizing questions, designing and prioritizing experiments.
17:04:54 +1 to Vasant
17:05:09 Big problem right now is that we don't have sufficiently expressive representations for data and models. Scientists know huge amounts of things that affect their data analysis and reasoning, but that can't currently be represented formally.
17:05:25 On theme 2, it would be good to see some discussion of linking model based reasoning, experiment design, and focused data acquisition
17:05:34 Philosophers of science also have interesting things to say about discovery and insight, especially say Lindley Darden in biology
17:06:23 +1
17:07:24 Raul: can patterns be classified as models?
17:07:47 Patterns are certainly subject to formal knowledge representations
17:08:20 I think patterns belong to models (and "laws" too)
17:08:24 Visualizations that combine models, data, AND UNCERTAINTY - Janet
17:08:35 Model means different things in different communities - maybe articulating what this group means by models would be good.
17:09:03 +1 to Vasant, another reason to eschew "model" and use "formal knowledge representation" instead
17:10:25 My previous post was quoting Maria, not Janet
17:10:33 Hod: models are means to optimize things
17:10:36 Larry, if your "formal knowledge representation" would include in its definition quantitative models, I would agree
17:10:39 What about utility of models, experiments, hypotheses?
17:11:07 mrigas has joined #diw2012
17:11:23 this discussion is begging the question... what is Knowledge? (and how do we represent it)
17:13:33 Pat's point: We need to bring together simulation and machine learning
17:14:43 I concur entirely. Current ML systems don't learn models sufficient to drive simulation. One gap is that the vast majority of simulations require specification of causal dependencies, and the vast majority of ML systems learn associational models.
17:14:50 In biology, we mostly take the assertions in the literature (including textbooks and journal articles, but also now databases of annotations) as "knowledge". Representations of that knowledge are being built based on community-curated ontologies, like the Gene Ontology
17:15:58 For that reason, biological knowledge isn't always true :-). We are always looking for changes
17:16:04 in addition to Pat and DavidJensen’s comment regarding machine learning and simulation - I would add experimentation
17:16:10 +1 to Larry, databases of annotations are starting to grow and they cut across applications and domains
17:16:35 annotation allows flattening the variety of formats to the same level
17:17:34 "Explaining" an observation with current theory can require a huge amount of background knowledge about the setting in which the data were collected, the sources of uncertainty, etc.
17:17:56 +1 DavidJensen
17:17:57 Two points about scientific knowledge - not necessarily true and subject to change; also different competing perspectives from different investigators / communities
17:18:58 Andrey, "formal knowledge representation" could include quantitative models, especially if they are couched in terms that can be linked to community curated ontologies
17:19:19 Larry, then I am with you :)
17:19:19 +1 to Vasant, we need to have facilities to allow multiple competing hypotheses and theories.
17:20:10 +1 to DavidJensen on the need for causal models - for both driving simulations and designing experiments, testing hypotheses
17:20:22 @Larry can you give an example of a database of annotations? Or is Gene Ontology one such database?
17:20:23 I agree with what Alex Szalay is saying: large scale simulations from physical principles are increasingly accepted, but fewer people know how to program these high performance simulations - this gap could lead to isolation between HPC and other scientists - we need to focus on usability and accessibility of simulations, models, visualizations
17:20:24 And along with that we need (formal) mechanisms for comparing, testing, evaluating those hypotheses
17:20:44 +1 to Karsten
17:20:59 @Cecilia: +1 and climate science is another great example of this.
17:21:03 Karsten, we go back to the way the hypotheses are represented and shared
17:21:25 Argumentation frameworks are perfect for that - not only because they naturally handle that, but also because they reduce what Cecilia called the "impedance mismatch." Scientists communicate largely by arguing, and showing them your results in the form of an argument helps them.
17:21:41 Gene Ontology annotations, stored with the various model organism databases and EBI for the human (e.g.
http://www.ebi.ac.uk/GOA/ )
17:22:34 to DavidJensen and Vasant: representing competing models and theories is hard/not addressed with current knowledge representation tools, I think
17:23:09 @andrey_rzhetsky: good point - knowledge representations can be incompatible
17:23:30 +1 to andrey - that is why we need research on this topic!
17:23:34 Thanks Liz: it is an underexplored area, understanding how to lower the impedance mismatch between human cognition and computation.
17:24:16 nigam_ has joined #diw2012
17:25:02 Andrey, argumentation frameworks really are a good way to handle competing models & theories.
17:25:14 Checking compatibility or coherence of models of one phenomenon with models of related phenomena would be important
17:26:21 Formal knowledge representations allow us to write code that helps support the evaluation / comparison of alternative models
17:26:29 andrey_rzhetsky +1
17:26:30 if a knowledge representation is a collection of relationships (facts..? annotations?), then these facts can be true, false, or uncertain (with a degree of trust)
17:27:43 to decide the certainty of one of those relationships, ML tools should be able to trace back to the source of that relationship
17:27:59 IIS supported some work on evaluation / comparison of alternative models using Sem Web technology
17:28:00 That said, ML algorithms regularly compare huge numbers of models, but (often) only in a very constrained space. The real disagreement is often about which space to search in.
17:28:03 Provenance of “discoveries” with respect to data used, ontologies used, analysis tools used is important if we want to automate aspects of discovery - we cannot recover from erroneous conclusions when data need to be corrected
17:28:53 rssagent, create minutes
17:29:07 Michel Dumontier is working on storing the provenance of "discoveries" in the same framework as the data, ontologies and the reasoning rules.
17:30:11 RRSAgent, create minutes
17:30:11 I have made the request to generate http://www.w3.org/2012/02/02-diw2012-minutes.html Helena
17:56:43 Helena, your link leads to a page that indicates that I have insufficient privileges to view the minutes
17:57:04 RSSAgent, make logs world-visible
17:58:02 Not world visible yet
17:58:32 RSSAgent, make minutes world-visible
17:58:40 rssagent is misbehaving :)
17:59:15 RSSAgent, publish minutes
17:59:21 RSSAgent, make logs world-visible
18:00:16 RSSAgent, please make logs world-visible
18:00:58 rrsagent, [please] make [these] logs world-visible
18:00:58 I'm logging. I don't understand '[please] make [these] logs world-visible', PaoloCiccarese. Try /msg RRSAgent help
18:00:58 RSSAgent, please behave!
18:01:32 rrsagent, make logs world-visible
18:02:07 worked
18:08:38 Susan has joined #diw2012
18:10:48 nigam has joined #diw2012
18:11:24 DavidJensen has joined #diw2012
18:11:39 lt has joined #diw2012
18:12:32 I'm here
18:12:36 lt has joined #diw2012
18:12:40 here
18:12:43 Here
18:13:09 LizBradley has joined #diw2012
18:14:17 Karsten has joined #diw2012
18:14:20 Haym has joined #diw2012
18:15:50 just testing that I'm finally online
18:18:01 I am against social curation of molecular biology data. Not even the authors of the articles are good enough at it (see FEBS Letters experiment on annotating protein-protein interactions). Curation / Annotation is a skilled task.
18:19:50 HelenaDeus has joined #diw2012
18:21:40 Being able to use the same workflows, data, and software is important. However, this is duplication, not replication. Both are important, but we don't want to computationally enable large numbers of common-cause failures in science.
18:22:25 forensic bioinformatics is the science of uncovering scientific fraud
18:22:59 but this happens post-publication, not during peer-review
18:24:00 it is a significant effort and there is no incentive during peer review
18:25:10 this may go back to credit -- if you uncover fraud pre-publication you don't get credit, if you do it post-publication you do
18:25:20 peer-reviewing is voluntary, no-one can force reviewers to take the appropriate steps to replicate the results :(
18:25:23 topic2 Renoir topic 3 DaVinci
18:27:30 Karsten has left #diw2012
18:37:01 AlexanderSchliep has joined #diw2012
18:41:07 HuanLiu has joined #diw2012
18:42:01 topic: improving the experimentation and discovery (scientific) process
18:44:07 We need to characterize the scientific discovery process (in detail).
18:44:17 Computer science can help in the planning of experiments
18:50:36 Zakim has left #diw2012
19:31:19 HuanLiu has joined #diw2012
19:31:48 Perils of `Bite Size' Science http://www.nytimes.com/2012/01/29/opinion/sunday/the-perils-of-bite-size-science.html?_r=1&src=tp&smid=fb-share
20:06:59 Vasant has joined #diw2012
20:08:07 How is the breakout session coming along?
20:13:13 Not bad. Useful results, but not getting as far as I might have hoped
20:24:40 Yolanda has joined #diw2012
20:41:35 HelenaDeus has joined #diw2012
20:46:29 Vasant has joined #diw2012
20:48:03 Computational support
20:48:22 SBS11NY has joined #diw2012
20:49:49 It is important to articulate the research opportunities in Discovery Informatics as distinct from infrastructure needs
20:50:02 Global needs slide: they discussed the formulation of assumptions
20:50:10 Could have told us that in advance.
20:50:23 That is the topic of the breakout tomorrow :)
20:50:26 mrigas has joined #diw2012
20:51:21 PhilBourne has joined #diw2012
20:51:21 Research opportunities can be formulated from the "process" slide. We tried to pick (and describe) steps that could be supported computationally.
20:51:36 OK
20:51:56 Collaboration slide: collaboration is very important, but the group did not get time to discuss it
20:52:21 It was group 3's task, right?
20:52:35 yep
20:53:23 Actions slide: important to attach provenance and trust: how the result came about (provenance) and
20:53:35 what do other scientists think about the result (trust)
20:54:52 Actions slide: making the body of knowledge in an area broadly available to scientists
20:55:21 Pat Langley points out the report is focused more on experimental sciences rather than observational sciences
20:55:56 Pat: a lot of issues carry over, but there are additional aspects
20:56:29 jls has joined #diw2012
20:57:05 observational science should be considered part of the research enterprise for these purposes
20:57:57 this process refers primarily to current, active computational action that could further research today
20:59:01 listing needs should be coupled with possible solutions for research directions
20:59:23 Karsten has joined #diw2012
20:59:34 NSF wants to know both short- and long-term research needs and directions
21:00:21 what informatics research challenges need to be addressed before we can move forward with the science?
21:00:55 who needs to be brought together to accomplish leaps forward in research? Clearly more collaboration is needed
21:00:59 Maria: who needs to be involved in addressing challenges
21:01:29 PhilBourne: take a step back - decide what has succeeded, what has failed and why
21:01:46 who needs to be brought together to accomplish leaps forward in research? Clearly more collaboration is needed
21:01:52 Phil: recent NAS meeting this week about what can we learn about what has worked and what hasn't in terms of scientific data
21:02:17 what are the tools that we are using currently for their own endeavors and what is missing?
21:04:19 this group attempted to tackle defining what are computational research methods and processes
21:05:05 there are still missing computational tools needed, mentioned current opportunities but the process is generic
21:05:36 Theme 1 perhaps would be better named as "computational support for the scientific process"
21:06:45 +1
21:07:13 CeciliaAragon has joined #diw2012
21:08:11 Theme 2 perhaps could be named "computational means to integrate models and data"
21:10:15 On slide "State of the Art": often the formulation of new representations happens in our heads, then the data is shaped to fit that representation
21:11:05 some models may be powerful but end up having little uptake in science
21:13:38 TurboTax model: tool tells you "typically most users don't do things that way", etc
21:17:14 - we see examples of a type of science where people find entirely novel, unexpected uses for data and models (e.g. the Framingham data used to show obesity runs in social circles, YouTube videos used to study animal music understanding, using “Where’s George” to track flight patterns, using baseball data to study mortality and handedness)
21:17:42 Haym: the existence of some datasets enables interesting research (eg the Framingham data and subsequent obesity findings)
21:18:22 Haym: serendipitous or opportunistic uses of datasets
21:18:33 here we focus more on dataset selection and availability, need to publicize or find ways to connect the data with these unanticipated uses; metadata may ameliorate some of the problems that keep data from finding its full use
21:19:20 - Metadata expansion may allow data to be utilized more fully
21:19:39 Steve: good metadata is very important if you want to foster unintended uses of data
21:20:00 - Concern that young researchers are collecting data and then reverse engineering the result sought
21:21:13 - Secondary use of data from health EHR/informatics limited by concerns about data ownership, data availability
21:22:13 - In IT, a
problem of protection: for each solution discovered by a computer scientist, there is someone working around it
21:23:30 Possible title for Theme 2: computational frameworks to integrate models and data??
21:23:47 Group agrees on Theme 2 title: "Data and Models" (for now)
21:26:47 Social computing challenges (I) slide: these new social techniques may carry a risk, which may result in only the more senior/established people using them
21:27:45 LizBradley has joined #diw2012
21:31:46 Karsten has joined #diw2012
21:34:20 Social computing challenges (III) slide: enabling micro-contributions -- people are willing to do them if they are given the right small task
21:36:51 Haym on the Google Flu Trends bullet: some people are contributing without knowing that they are
21:37:04 - Micro-work can create large gains through amassing tiny tasks
21:38:12 Rephrase that bullet: it is not about data about social activity
21:38:40 - Important to include cases in which people do not intend to contribute to science
21:39:02 social activity creates data which has value
21:40:05 - analysis of what scientists do (“droppings”) also has value and can teach about sciences, collaboration etc
21:41:01 - unlike a lot of areas where we have the ability to do computational work from data there are large gaps that we may be missing and not anticipating
21:42:19 but framework provided should be flexible
21:42:32 spend more money on studying science!!!
21:43:04 steve: figure out where the science happens
21:43:30 - Where does science happen if it starts to become part of day to day life even while it continues to exist in the scientific sphere?
21:43:49 Analysis of what social / computing artifacts scientists leave (citations, collaborations, databases they access) is a good opportunity for studying what scientists do (that computation could support)
21:44:25 - What is it about scientific social computing science that is between science and social computing in general?
21:44:54 social scientists need to be part of the groups/teams that undertake projects in this area, not just CS researchers and domain scientists -- this is a point made by many people today!
21:45:34 User-centered designers are not necessarily social scientists, but they can play a similar role
21:45:39 Most of us have learned scientific practice by doing what others do in our particular communities; scientific practices are more local than one might imagine.
21:46:18 - There is an interesting area including human computation - perhaps this is a way of capturing human insight
21:47:43 PhilBourne: can't separate research and education - social computing will change both the way we do research and the way we educate
21:48:25 - How will these technologies change the way we educate and perform research? It seems science is still reacting to technology rather than exploiting it
21:51:46 Karsten has left #diw2012
21:51:46 DavidJensen has joined #diw2012
21:53:10 Karsten has joined #diw2012
21:53:15 Karsten has left #diw2012
22:35:06 HuanLiu has joined #diw2012