SV_MEETING_TITLE -- 02 Feb 2012

<LarryHunter> All of the first theme topics (except robotics, I guess) require formal knowledge representations (visualization may not require it, but certainly benefits). I think that would be a good way to scope this sprawling topic

<PaoloCiccarese> +1

Haym: no need to be convinced about big data
... given a world where we can get knowledge from data, what will be the challenges?

<LarryHunter> I prefer "formally represented knowledge" to "models", since not all models (e.g. ODEs) are what I think we mean

<PaoloCiccarese> "formally represented knowledge" survives tools and allows better integration

<LarryHunter> It also covers all of the second theme topics

<LarryHunter> +1 to Paolo

LarryHunter: improving the discovery process involves understanding what it is

<andrey_rzhetsky> ++1

<HuanLiu> +1

steve: efficiency vs effectiveness

<LarryHunter> Characterizing the work of scientists in a way that identifies what kinds of problems there are that computation might help solve

+1 to LarryHunter

<LarryHunter> This is not what CS money usually gets spent on, and more should go to that kind of effort

<LarryHunter> What links all of these "state of the art" things you list is definitely formal representations of scientific knowledge

<LarryHunter> Especially for your vis examples, Munzner's nested model of design paper is spot on. http://www.cs.ubc.ca/labs/imager/tr/2009/NestedModel/NestedModel.pdf

<Karsten> +1 to LarryHunter

"all models are wrong, some are useful" - should we be focusing on building models or building the tools that will build those models from the data?

<LarryHunter> Building models and creating tools that help people build models (from data or otherwise) are both valuable

<Karsten> I think both fall within the scope

<DavidJensen> Much of theory development is actually a search in representation space, not a search for a good model within a given representation.

<LarryHunter> I object to characterizing Galaxy as a workflow system. The workflow aspect is not what drove adoption, it's a side benefit. I think the same will happen when VisTrails gets integrated into Cytoscape

knowledge representation models themselves should allow for self-correction based on new facts, like science itself

<DavidJensen> We should consider how well workflows "work" within current non-scientific fields. If current workflow systems don't work well in some non-scientific fields, we should learn from that.

<PaoloCiccarese> Text extraction can be made more usable by annotation/visualization tools through agreed models for sharing the results

<LarryHunter> Yes, knowledge bases should be dynamic and updated. One advantage of formal knowledge representation is that it has the potential not only to capture the most recent publications, but to link new results to existing knowledge

i would argue that a paper does not contain "knowledge", it contains interpretation of results

<DavidJensen> Some current uses of IT within science (not to mention the current format of the scientific paper) are about presentation, not discovery.

<LarryHunter> +1 for Mariah's comment

<CeciliaAragon_> +1 for Mariah as well

<DavidJensen> ...or, I should say, not about representing the actual discovery workflow.

<HuanLiu> Knowledge representation changes around the wide use of Internet, Web, social media.

<HuanLiu> from a book/article form to a wiki one.

a word of caution: if we are collecting our "facts" or knowledge representation from papers, we should be able to clearly distinguish good science from bad science (e.g. http://www.economist.com/node/21528593 about a recent scandal on bad science in a good institution)

PaoloCiccarese: knowledge about applications bv domain models

<LarryHunter> Celia's point about automated reasoning on knowledge representations is of course fundamental. KR isn't useful except as it supports reasoning (or perhaps communication/visualizations)

why would knowledge representations help scientists?

Haym: reuse, revalidate, etc - they are a means to an end

David: are workflows useful, and under what circumstances?

Phil: pharma companies already run a lot of procedural workflows that do not exist as well formalized in academia
... e.g. pipeline pilot

PhilBourne: these break down in the "experimentation mode", where changes happen all the time

<LarryHunter> "knowledge" as in knowledge representation isn't necessarily what philosophers would call it (justified true belief).

PhilBourne: it will be interesting to encompass that problem

@LarryHunter: do you consider modeling uncertainty relevant (e.g. 60% of pancreatic cancer patients respond to drug Y)?

<LarryHunter> Modeling uncertainty certainly could be important

Pat: models also lead to data, not just data lead to models

<LarryHunter> In my domain (interpreting genome scale results), most of the work is taking a large number of existing models (of what genes do) and integrating them (or certain aspects of them) together into an explanation of the experimental results

<LizBradley> Incorporating a priori knowledge is critical - Carla

<LarryHunter> +1 to carla --

<LarryHunter> Scientists also could use help in evaluating models

there are thousands of "beautiful" models that are completely untestable because the technology to measure certain parameters under certain conditions simply does not exist

should we not focus on building testable models?

<LizBradley> Support generation of models in formalisms that scientists already use - Pat

<Vasant> On theme 1, it would be good to see some discussion of processes and tools for generating and prioritizing questions, designing and prioritizing experiments.

<LarryHunter> +1 to Vasant

<DavidJensen> A big problem right now is that we don't have sufficiently expressive representations for data and models. Scientists know huge amounts of things that affect their data analysis and reasoning, but that can't currently be represented formally.

<Vasant> On theme 2, it would be good to see some discussion of linking model based reasoning, experiment design, and focused data acquisition

<LarryHunter> Philosophers of science also have interesting things to say about discovery and insight, especially, say, Lindley Darden in biology

<Vasant> +1

Raul: can patterns be classified as models?

<LarryHunter> Patterns are certainly subject to formal knowledge representations

<andrey_rzhetsky> I think, patterns belong to models (and "laws" too)

<LizBradley> Visualizations that combine models, data, AND UNCERTAINTY - Janet

<Vasant> Model means different things in different communities - maybe articulating what this group means by models would be good.

<LarryHunter> +1 to Vasant, another reason to eschew "model" and use "formal knowledge representation" instead

<LizBradley> My previous post was quoting Maria, not Janet

Hod: models are means to optimize things

<andrey_rzhetsky> Larry, if your "formal knowledge representation" would include in its definition quantitative models, I would agree

<Vasant> What about utility of models, experiments, hypotheses?

this discussion is begging the question... what is Knowledge? (and how do we represent it)

<DavidJensen> Pat's point: We need to bring together simulation and machine learning

<DavidJensen> I concur entirely - current ML systems don't learn models sufficient to drive simulation. One gap is that the vast majority of simulations require specification of causal dependencies, and the vast majority of ML systems learn associational models.

<LarryHunter> In biology, we mostly take the assertions in the literature (including textbooks and journal articles, but also now databases of annotations) as "knowledge". Representations of that knowledge are being built based on community-curated ontologies, like the Gene Ontology

<LarryHunter> For that reason, biological knowledge isn't always true :-). We are always looking for changes

<Vasant> In addition to Pat and DavidJensen’s comment regarding machine learning and simulation - I would add experimentation

<PaoloCiccarese> +1 to Larry: databases of annotations are starting to grow, and they cut across applications and domains

<PaoloCiccarese> annotation allows the variety of formats to be flattened down to the same level

<DavidJensen> "Explaining" an observation with current theory can require a huge amount of background knowledge about the setting in which the data were collected, the sources of uncertainty, etc.

<Karsten> +1 DavidJensen

<Vasant> Two points about scientific knowledge - not necessarily true and subject to change; also different competing perspectives from different investigators / communities

<LarryHunter> Andrey, "formal knowledge representation" could include quantitative models, especially if they are couched in terms that can be linked to community curated ontologies

<andrey_rzhetsky> Larry, then I am with you :)

<DavidJensen> +1 to Vasant, we need to have facilities to allow multiple competing hypotheses and theories.

<Vasant> +1 to DavidJensen on the need for causal models - for both driving simulations and designing experiments, testing hypotheses

@Larry can you give an example of a database of annotations? Or is Gene Ontology one such database?

<CeciliaAragon_> I agree with what Alex Szalay is saying: large scale simulations from physical principles are increasingly accepted, but fewer people know how to program these high performance simulations - this gap could lead to isolation between HPC and other scientists - we need to focus on usability and accessibility of simulations, models, visualizations

<Karsten> And along with that we need (formal) mechanisms for comparing, testing, evaluating those hypotheses

<Vasant> +1 to Karsten

<Karsten> @Cecilia: +1 and climate science is another great example of this.

<PaoloCiccarese> Karsten we go back to the way the hypotheses are represented and shared

<LizBradley> Argumentation frameworks are perfect for that - not only because they naturally handle that, but also because they reduce what Cecilia called the "impedance mismatch." Scientists communicate largely by arguing, and showing them your results in the form of an argument helps them.

<LarryHunter> Gene Ontology annotations, stored with the various model organism databases and EBI for the human (e.g. http://www.ebi.ac.uk/GOA/ )

<andrey_rzhetsky> to DavidJensen and Vasant: representing competing models and theories is hard/not addressed with current knowledge representation tools, I think

@andrey_rzhetsky: good point - knowledge representations can be incompatible

<Vasant> +1 to andrey - that is why we need research on this topic!

<CeciliaAragon_> Thanks Liz: it is an underexplored area, understanding how to lower the impedance mismatch between human cognition and computation.

<LizBradley> Andrey, argumentation frameworks really are a good way to handle competing models & theories.
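One way to make the argumentation-framework suggestion concrete is a Dung-style abstract argumentation framework, in which competing hypotheses and pieces of evidence are arguments that attack one another. Whether this is the formalism the speaker had in mind is an assumption; the function name `grounded_extension` and the labels `H1`, `H2`, and `E` are hypothetical. The sketch computes the grounded extension, that is, the most cautious set of arguments that can be accepted.

```python
def grounded_extension(arguments, attacks):
    """Return the grounded extension of an abstract argumentation framework.

    arguments: set of argument labels.
    attacks:   set of (attacker, target) pairs.
    """
    # For each argument, collect the arguments that attack it.
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}

    accepted = set()
    while True:
        # An argument is defended if each of its attackers is itself
        # attacked by some argument that is already accepted.
        defended = {
            a for a in arguments
            if all(any((d, b) in attacks for d in accepted) for b in attackers[a])
        }
        if defended == accepted:
            return accepted
        accepted = defended


# Two competing hypotheses that attack each other: neither can be accepted.
rivals = grounded_extension({"H1", "H2"}, {("H1", "H2"), ("H2", "H1")})

# Adding evidence E that attacks H2 lets H1 be accepted alongside E.
with_evidence = grounded_extension(
    {"H1", "H2", "E"},
    {("H1", "H2"), ("H2", "H1"), ("E", "H2")},
)
```

With only the two rival hypotheses the result is empty, so both stay open; once evidence against one of them is added, the other becomes acceptable. That is the sense in which such a framework can hold competing theories side by side.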

<Vasant> Checking compatibility or coherence of models of one phenomenon with models of related phenomena would be important

<LarryHunter> Formal knowledge representations allow us to write code that helps support the evaluation / comparison of alternative models

<DavidJensen> andrey_rzhetsky +1

if a knowledge representation is a collection of relationships (facts..? annotations?), then these facts can be true, false, or uncertain (with a degree of trust)

to decide the certainty of one of those relationships, ML tools should be able to trace back to the source of that relationship
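A minimal sketch of the two points above, assuming a knowledge representation is a list of subject-predicate-object assertions, each carrying a degree of trust and a pointer to its source. The class `Assertion`, the helpers `status` and `trace_sources`, the thresholds, and the example records are all hypothetical illustrations, not anything proposed in the meeting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str
    trust: float  # 0.0 = believed false, 1.0 = believed true
    source: str   # hypothetical identifier of a paper or database


def status(assertion, low=0.2, high=0.8):
    """Classify an assertion as true, false, or uncertain (thresholds are arbitrary)."""
    if assertion.trust >= high:
        return "true"
    if assertion.trust <= low:
        return "false"
    return "uncertain"


def trace_sources(kb, subject, predicate, obj):
    """Return the sorted sources that assert a given relationship."""
    wanted = (subject, predicate, obj)
    return sorted({a.source for a in kb if (a.subject, a.predicate, a.obj) == wanted})


# Made-up records for illustration only.
kb = [
    Assertion("geneA", "regulates", "geneB", 0.9, "paper:1"),
    Assertion("geneA", "regulates", "geneB", 0.5, "database:X"),
    Assertion("geneC", "binds", "geneD", 0.1, "paper:2"),
]

sources = trace_sources(kb, "geneA", "regulates", "geneB")
```

Keeping the source on every assertion is what lets a downstream tool weigh the same relationship differently depending on where it came from.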

<nigam_> IIS supported some work on evaluation / comparison of alternative models using Sem Web technology

<DavidJensen> That said, ML algorithms regularly compare huge numbers of models, but (often) only in a very constrained space. The real disagreement is often about which space to search in.

<Vasant> Provenance of “discoveries” with respect to data used, ontologies used, analysis tools used is important if we want to automate aspects of discovery - without it, we cannot recover from erroneous conclusions when data need to be corrected
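The recovery scenario can be sketched as a dependency lookup, assuming provenance is recorded as a mapping from each conclusion to the data, ontologies, tools, and earlier conclusions it was derived from. The function `affected_by` and all identifiers in the example are hypothetical.

```python
from collections import deque


def affected_by(derived_from, corrected):
    """Return every item that depends, directly or transitively, on `corrected`.

    derived_from maps an item to the set of inputs it was derived from.
    """
    # Invert the mapping: for each input, which items used it?
    dependents = {}
    for item, inputs in derived_from.items():
        for used in inputs:
            dependents.setdefault(used, set()).add(item)

    # Breadth-first walk from the corrected input through its dependents.
    seen = set()
    queue = deque([corrected])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Made-up provenance records for illustration only.
provenance = {
    "conclusion:1": {"dataset:A", "ontology:GO", "tool:T"},
    "conclusion:2": {"conclusion:1", "dataset:B"},
    "conclusion:3": {"dataset:B"},
}

to_recheck = affected_by(provenance, "dataset:A")
```

If `dataset:A` is corrected, only the conclusions that trace back to it need to be revisited; `conclusion:3` is left alone.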

rssagent, create minutes

<nigam_> Michel Dumontier is working on storing the provenance of "discoveries" in the same framework as the data, ontologies and the reasoning rule.
