Jan 25, 2006


Start of meeting:


Tonya and Eric


Go over agenda, go over ground rules

Want to start and define what the group is interested in and look for overlap of topics 


Introductions:

Eric Miller

This is NOT a conference - we are a working group that has read the charter and want to lend our energy to the group

Alfredo Morales - Cerebra vendor of semantic tech 

David Martin - inter institutional information exchange

Dennis Quan - UI semantic web 

Giles Day - Pfizer - Informatics 

Marijia - Usability topics 

Mary Chitty - Librarian, online taxonomy since 2000, terminology 

Christoff, Teranode - software developer 

Matt Shannahan - Teranode - strategically deal with RDF data 

Beni - SRI international knowledge management looking for 

Joanne Luciano - Pathway data and knowledge management

Jonathan Rees - Science Commons - Neuromedicine 

Molly Ullman - Modeling genetic/genomic link to clinical data

Ralph - ontology consulting tools/ontology architecture 

Davide Zaccagnini - RDF/NLP vendor active in health care and life science 

Tim Clark - Semantic Web - SWAN - support neurodegenerative disease

Ray Hookway - HP - Works with Harvard Partners (thanks for Lunch!)

Matt Cockerill - BioMed Central - capture more semantic information 

John Madden - SNOMED - Duke - distributed ontology dev

XX - Siemens - Point of care decision support

Galway - HP - social semantic desktops 

Sean Martin - IBM - advanced tech group - Internet stack builds cancer modeling 

Mark Musen - Stanford - Protégé, ontologies, semantic integration 

Kei Cheung - Yale - tool interoperation - genomics - ISHOP 

David Hansen - Australia 

National library of medicine - NLM - medical terminologies 

Mass general - modeling and integration 

Agfa healthcare - clinical workflow - personalize your healthcare plan 

XX - Elsevier - Neuroscience 

Dresden U. - ontology-based - semantic "writ" browser 

Elsevier - RDFizing content and systems - structure scientific articles to enhance content

Manchester - Semantic Web for web services

Georgia U. - Decision support - public ontology development

Bob North Eastern - Biological Knowledge Lab - Biological Knowledge 

Susie Stephens - Oracle - RDF data model in the database - want to support semantic web standards

Cindy Bailey - Plexis? BioSystems - Systems 

David Crofte - Merck - bench-to-bedside interest

Brian Osborne - BioPerl 

Mitre - MIT - Text mining area and bio-ontologies exploited 

Alan - pathway databases - BioPax - how to use OWL integrate pathway data

Ted Slater - SemWeb interpret data 

Vipul - MGH - Clinical decision support

Stevens - Reassign development in OWL - OWL training

Roger Cutler - Chevron - Semantic Web - "oil company dog waiting for scraps for life science domain"

Sandy - Harvard Center genetics and genomics - translational medicine - help biomarker discovery - integration into clinical systems

Eric Neumann - focused on drug discovery, Teranode Corp, use of knowledge in organizations

Tonya - Partners - build decision support for clinical research and development - not sure whether you want people to help her get her problems solved for Partners - show leadership in the country in decision support and personalized medicine

Charles Tilford - BMS Bioinformatics


Eric talks about expectations a little more:

Q's - common issues, activities, coordination, roadmap draft, next Meeting? 


We're not going to set standards - coordinate with HL7, CDISC -- need to form some sort of dialogue 


Range of perspectives - NCBC (National Centers for Biomedical Computing) - Tonya gives talk first:

Can't buy software that has all the data that Partners has; HIT is hard to interoperate with existing systems 

Think about where are the intersection points of knowledge - data is important but meaning is needed - continuously enhance knowledge repo - drive structured research annotations 


Composite interaction - composite application building at Partners - present existing data about a patient in a structured way - generate recommended orders - behind the example are about 400 rules - get the right lists - recognition engines don't work with high frequency of transactions

Summary - reduce cost/duration/risk of drug discovery, clinical trials, decision support, clinical performance measurement


--

Key:

Q = Question

A = Answer

C = Comment 

P = Park(ed) Item

--


Q: David Martin - Is there really going to be a change? What's the driver? Value-Based Purchasing changed a lot of people. Pay for performance. Can't put content to work - knowledge bases are too primitive to get people where they need to go. Consumerism: consumers need to be the watchdogs for themselves, but can't do it if they can't measure quality of care. 


A: Citibank pays billions of dollars in customer relationship management systems 

A: Matt Shannahan - cost structure is different in finance - accounting systems are integrated - only have to pay for IT - no laboratory - the transactional system is the only cost - data complexity is higher in life sciences

A: Consumers probably can't manage themselves 


Q: Having problem figuring out deliverables. What are they for this group? (JM)

A: Aspirations (Brandt) - consumption and digestion of new medical knowledge; drive it into the point of care; have common semantics that allow these updates to get consumed without rewriting the system 


C: Eric Miller - Nirvana points on parking lot board (Brian takes down some items on parking lot)


Eric Neumann - Translational Medicine 


Q: Bob - Lab result is published in English - not as data in a machine-readable format - 

A: Doesn't define natural language mechanisms for text extraction - but rather a way of recoding information in a machine-readable format


C: Giles - Vast majority of data is structured - decision making is made on structured data - but systems don't know about each other

C: Tonya - reports are formatted in unstructured format - want to repurpose data for other means

C: XX RDF needs to be a vehicle of what we were doing before 

C: Robert Stevens - European library services recognize the increasing predominance of online, semantically described literature linking methods to results and pointing to databases

C: Sean Martin - Where's semantic information? - trapped in UI, business logic and in schema - is OWL stored with data so that we can reuse data 

C: Davide, Language and Computing - NLP is there and you can automatically make ontologies out of free text - once an ontology is there the problem is not solved - visualization still needs to be there, integration work is still a problem - we should focus on building on top of NLP applications

C: Tim Clark - bench-to-bedside acceleration is the essence; need to get bench-to-bench communication acceleration. Need to think about a way of connecting scientists, de-fragmenting and de-isolating little islands of knowledge

C: XX - Partners (woman red hair) - Want to report genomic information into eMedical record - want to build the infrastructure to modify for clinical 

C: Elsevier - RDF is about entities and relationships - the low-hanging fruit is entities - need to get our act together to identify entities - then the next step is relationships - as a publisher you need to define your entities - don't be funny, just be clear 

C: UN has established in food and ag for each country. 


Park - structured and unstructured data 

Park - How do we stitch things together using RDF


Eric Miller Talks - 

Q: Can you go over difference between XML schema and RDF? 

A: No, we can talk later 
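
A minimal sketch of the distinction the question raises, using invented gene and property names: the same fact once as XML, where meaning lives in element names and nesting, and once as RDF triples, where each relationship is named by a URI and statements from different sources can be merged.

```python
# Sketch: the same fact as XML (tree) vs. RDF (graph of triples).
# All identifiers below are invented for illustration.
from rdflib import Graph

xml_doc = """<gene symbol="GSK3B">
  <encodes protein="glycogen synthase kinase 3 beta"/>
</gene>"""  # meaning is carried by element names and nesting, fixed by a schema

turtle_doc = """
@prefix ex: <http://example.org/bio#> .
ex:GSK3B ex:encodes ex:GSK3B_protein .
ex:GSK3B_protein ex:label "glycogen synthase kinase 3 beta" .
"""  # meaning is carried by explicit, URI-named relationships

g = Graph()
g.parse(data=turtle_doc, format="turtle")
for subj, pred, obj in g:
    print(subj, pred, obj)   # each statement stands alone and can merge with any other graph
```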


Want to develop technologies and standards that build off each other. Want to define a clear working relationship with each other

Facilitate framework for integration 

Capability of connecting data to its definitions and context

Functionality to draw new conclusions

Want to leverage existing web - evolution not revolution

Development of an ontology and terminology is quite costly - want to lower the barriers of entry


Q: Stitching is the "now" part. There are incompatible ontologies how do we deal with this issue?

A: Need to take existing ontologies and stitch them together (different communities are at different stages of evolutionary web)


C: Speak to incentives other than reuse - 

C: Speak to most highly evolved communities and what the benefits have been 


C: Tim Clark - Legos Analogy - Morphing things is a great analogy to biology 

C: John Madden - stitching may not scale - A is the same as B and "really it's a little different" - need to figure out where the connections can be made to get more information

C: Vipul - Probabilistic approach to semantic web 


--


TODO: Rules and RIF need to be discussed by the group 

Where can rules work? Metabolic network can be represented as a rule network. Can produce a complicated reasoning engine 

TODO: Use Case needs to be posted to the website - Schroeder: 3 areas where it becomes important - providing rules to reason, rules to specify workflows, and graphs represented and reasoned over


--


Demos and Applications:


Regulatory limits on data - but data needs to be freed from the application - value of "as needed" data integration 

Vendors that do this well are valuable to industry 

CiteSeer provides RDF data - other places do not - Eric M. screen scrapes - wraps the pages and exposes them as RDF
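
A rough sketch of this kind of page wrapping, assuming a hypothetical page, markup, and vocabulary (not the actual demo code): scrape two fields out of HTML and re-express them as RDF triples.

```python
# Sketch of "wrap a page and expose it as RDF"; URL, markup and vocabulary
# are hypothetical, not the demo that was shown.
import requests
from bs4 import BeautifulSoup
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/store#")

html = requests.get("http://example.org/menu.html").text   # hypothetical page
soup = BeautifulSoup(html, "html.parser")

g = Graph()
g.bind("ex", EX)
for item in soup.select("div.item"):                        # hypothetical markup
    name = item.select_one("span.name").get_text(strip=True)
    price = item.select_one("span.price").get_text(strip=True)
    subject = EX[name.replace(" ", "_")]
    g.add((subject, EX.name, Literal(name)))
    g.add((subject, EX.price, Literal(price)))

print(g.serialize(format="turtle"))                         # scraped data, now RDF
```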

When do you violate copyright? - unclear - OCLC (example library one)

Would it be worth showing what kind of RDF was created? - Not going to do it

Scraper? Isn't that code? Translator -> RDF description; more semantics than in the original Starbucks page - no specification, just a bunch of code 

W3C is more declarative than application oriented - No API etc. 

Show RDF code on Day 2 

Antibodies - need to find them across multiple different vendors - Use Cases 


P:  Roadmap 


Public health side of the community would find Eric's demo interesting

Can we tell the system (today) that I've got 2 coordinates, and have a distance calculation be made from the coordinates they provide?
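
A sketch of one way this could work today, assuming the W3C WGS84 geo vocabulary for the coordinates and invented site identifiers; the distance calculation itself is ordinary application code outside the triple store.

```python
# Sketch: two points described in RDF (WGS84 geo vocabulary), distance computed
# by plain application code. The sites and coordinates are invented.
from math import radians, sin, cos, asin, sqrt
from rdflib import Graph, Namespace

GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")
EX = Namespace("http://example.org/site#")

data = """
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix ex:  <http://example.org/site#> .
ex:siteA geo:lat "42.36" ; geo:long "-71.06" .
ex:siteB geo:lat "39.95" ; geo:long "-75.17" .
"""

g = Graph()
g.parse(data=data, format="turtle")

def coords(subject):
    return float(g.value(subject, GEO.lat)), float(g.value(subject, GEO.long))

def haversine_km(p, q):
    (lat1, lon1), (lat2, lon2) = p, q
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

print(haversine_km(coords(EX.siteA), coords(EX.siteB)))   # roughly 436 km for these points
```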

Oracle - 10.2 db utilizes RDF 

****AKT - ActiveSpace

SWOOP - ontology editors 

GoPubMed - mapping 


****Eureka - inference engine  (Nokia)


P: Collecting demonstrations and applications


P: SPARQL front end to a relational database





Day 2 - Eric Neumann discusses the agenda and asks to identify areas where people want to be involved and that are in line with the HCLS charter


Point 1: What's not available today that is enabled by the semantic web

Draft Items: 


Need to have meetings online, conference calls etc. logistics need to be taken care of

Collaborative ontology development based on semantic web approaches 

Workflow pathways

Representing protocols - merging of protocols in healthcare - Alfredo, Davide, Vipul 


Vipul thinks that this is a "knowledge merging" problem

Decision Support - Workflow Generation


Structured/Unstructured text extraction/manipulation


Vipul - HL7/W3C liaison


Efficiently assign metadata to "edges" - metadata associated with URIs (an Affymetrix chip that says two things are related) - see the statement-metadata sketch after this list of draft items

Area of publishers 

Webservices for clinical systems - what do they have to expose 

probabilistic modeling - 

Robert Stevens - Ontology representation 

Vipul - Knowledge Acquisition - Versioning/Knowledge Provenance 

Agents 

Inference

Rule Systems (RIF) - (Davide) Rules and protocol execution (clinical)


Eric Miller: Need to specify narrow topic 

Scalability and benchmarking 

Roger - Access Control and policy

Scientific Literature - 

Database integration and interoperability 

Critical research topics?
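
On the earlier draft item about assigning metadata to "edges": one candidate mechanism is RDF reification, sketched below with invented chip and gene identifiers, where the statement "geneA is related to geneB" becomes a resource that chip provenance can point at.

```python
# Sketch of attaching metadata to an individual RDF statement ("edge")
# via reification. All identifiers are invented.
from rdflib import Graph, Namespace, BNode, Literal, RDF

EX = Namespace("http://example.org/array#")

g = Graph()
g.bind("ex", EX)

# the edge itself
g.add((EX.geneA, EX.relatedTo, EX.geneB))

# a resource standing for that statement, so metadata can point at it
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, EX.geneA))
g.add((stmt, RDF.predicate, EX.relatedTo))
g.add((stmt, RDF.object, EX.geneB))
g.add((stmt, EX.assertedBy, EX.chip_U133A))          # hypothetical chip identifier
g.add((stmt, EX.evidence, Literal("co-expression")))

print(g.serialize(format="turtle"))
```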



Eric M. There's a charter and we probably need to review the problems that this group is going to tackle:

Core ontologies - need some "glue" that bridges some key areas - how do they work in the semantic web context

Best Practices - citation/versioning/cross mapping

Cross community data integration

Policy access control

Privacy and security de-identification 


 

Key to *'s

--

* = first level

** = second level 

n..

--


Bob:

Ontology best practice and management (what's the best practice and use of ontologies):

*Modeling data and process

**Workflows 

**Rules and inferences 


*Unstructured -> Structured (representing forms of information in RDF - BioRDF community)

**Parsing text (Piggy Bank scraper)

*** Data converter how-to (a minimal CSV-to-RDF sketch follows this list)

***Relational database -> RDF

***Excel -> RDF

***XML -> RDF
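
A minimal sketch of the Excel -> RDF converter item above, assuming the sheet has been exported to CSV with a header row and using an invented vocabulary: one row becomes one subject, one column becomes one predicate.

```python
# Sketch of a spreadsheet-to-RDF converter: assumes the Excel sheet was exported
# to CSV with a header row. File name, columns and vocabulary are invented.
import csv
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/assay#")

g = Graph()
g.bind("ex", EX)

with open("assay_results.csv", newline="") as fh:          # hypothetical file
    for row in csv.DictReader(fh):
        subject = EX[row["sample_id"]]                     # one subject per row
        for column, value in row.items():
            if column != "sample_id" and value:
                g.add((subject, EX[column], Literal(value)))   # one predicate per column

g.serialize(destination="assay_results.ttl", format="turtle")
```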


Semantic web ecosystem representing knowledge -> process oriented:

*Protocols/Processes and Context 

**Tim Clark - processes of doing research and exchanging information



-----

Scenarios/Use Cases and value proposition 


T: Path/Roadmap 


T: Time/Feasibility 


-----


Notes 1/25/06



Robert Stevens

Describing life sciences experiments


- Lots of computers and lots of data. 

- Generated by high throughput experiments, in silico bioinformatics, combinatorial chemistry, etc.

- Functional genomics needs lots of omic experiments, in order to see what is happening to populations of molecules. 

- Want nice unambiguous identifiers and vocabularies for describing the experiments. 

- FuGE (Functional Genomics Experiments) is developing vocabularies for describing experiment. For example, what technology is being used. 

- FuGE is very open. Early recognition of the importance of openness, unlike the cheminformatics community, which is only just beginning to recognize its importance. 

- FuGE uses OWL in Protégé.

- Many people in FuGE including Susie Lewis, Barry Smith, etc.

- Standard ontology development cycle. What are the upper layers? Need to decide whether to go modular or central. Who owns what? Who does what? Different people want different views.

- eScience and semantic grid provide high level views of semantic experiments.

- Lots of Web services for bioinformatics. Taverna can access 3000. You can then build a workflow of processes. Example of workflow for disease use case. 

- Experiments have a life cycle. Need to build workflow, gather and coordinate data. Can then analyze data, and compare to other data. 

- Have an OWL ontology that describes web services. Have registries that support RDF. All metadata in RDF. Coordinate data with RDF, with derivation paths. Have all provenance info. Can use tools such as BioDASH to look at data. Then build a big repository that can then be queried.
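
A small sketch of what derivation paths in RDF could look like, using an invented vocabulary rather than the actual myGrid/Taverna metadata: each result records the service that produced it and the data it was derived from, so the provenance chain can be walked back.

```python
# Sketch of workflow provenance as RDF: each result points to the service that
# produced it and its input. Vocabulary and identifiers are invented.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/workflow#")

g = Graph()
g.bind("ex", EX)
g.add((EX.result42, EX.producedBy, EX.blastService))
g.add((EX.result42, EX.derivedFrom, EX.sequence17))
g.add((EX.result42, EX.performedAt, Literal("2006-01-25T16:00:00")))
g.add((EX.sequence17, EX.producedBy, EX.sequencingRun3))

# walk the derivation path back from a result
node = EX.result42
while node is not None:
    print(node)
    node = g.value(node, EX.derivedFrom)
```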

- Collecting stuff, describing stuff. All classic SW tasks. Identifying, naming and mapping things. Getting data into RDF is fine, but how do you do the aggregation? Overcoming the semantic heterogeneity is a headache. Much work around the world that describes experiments.


RalphH – How generic is the solution for describing experiments?


RS - FuGE ontology, is only applicable for describing functional genomics research. The workflow technology is very generic, moving from bioinfo. to cheminfo. It basically works, but coming across interesting issues. What is good enough for food and drug administration is different to what is good enough for a bench scientist. Has been used for astronomy data. 


John – We use object technology. Can you tell me more about re-use of terminology?


RS – Recording knowledge about workflow is relatively easy. If things are nicely semantically annotated, then you know the input, output and time performed. Has 3000 web services, perhaps 100 are described semantically. Doing the annotation is the hard work. Might want to record deeper biology of what’s going on. Could re-use ontologies. Again and again with work on metadata, we find biologists like using metadata, but they don’t like generating it. Need to separate the knowledge model from the user model. It’s all possible, but how to make it happen? Especially when you get the human involved.


Xiaoshu Wang - What are the criteria for a module? What is the scope of modules?


RS – Technically you can deal with modules in Protégé OWL. It’s getting easier to import ontologies. After that you’re on your own. For example, how to incorporate public and private user access. How do I represent uncertainty? Environmental genomics will want to use some of the modules of toxicogenomics. This is difficult unless you make one module per class, which is nonsense. 


Ralph H. – Like to sign up for this, and put it on parking lot. XW thinking of the same problems. Commonality variability analysis. DARPA sponsored ODM.


Eric N. – If working with data in RDF, can do a lot without ontologies to start with. Which should come first, the data or the ontologies? 


Xiaoshu W – 1000 terms in 1 namespace. To use 1 term, need to import the whole namespace.





Dennis Quan – IBM

The Semantic Web browser and BioDASH – Integrating complex HCLS data sources


- The high level perspective is that biology is like a puzzle that needs to be connected. 

- We have found many puzzle pieces. Now we are focused on snapping the puzzle pieces together, to help cure disease.

- There are many technical challenges. Many puzzle pieces don’t actually fit together, e.g. different naming conventions. 

- There are many good individual web sites, but need to link data from many web sites. 

- When you discover a new connection, how do you record that, so that other people can use it?

- Customized user interfaces should be able to help people create and share information.

- RDF can help create a universal, portable standard for creating, and assembling knowledge fragments. RDF allows you to put the puzzle pieces together.

- BioDASH – shows the integration of data relating to GSK-3 beta. One view shows GSK3beta and its interaction with a few chemical compounds. Another view shows the wnt pathway, which includes GSK3Beta (although with a synonym). Data from both views can easily be aggregated in one view using ‘drag and drop’. SNP data can be added to the gene information. Can create a collection in BioDASH, which is like a bookmark in a browser. While doing research, scientists find things of interest, and these can be stored in a collection called something like ‘follow up’. Can annotate any data on the screen. Possible to view the same data in different ways based upon filters, e.g. toxicity data related to GSK3beta.

- Trying to lower the barriers of integrating data through the RDF data model. Creating user interfaces quickly that show the connections in data sets. Making integrated data more tangible and manageable by the user. Capturing knowledge at all levels of supposition.
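
A sketch of the aggregation idea, with invented identifiers rather than the BioDASH data: a compound view and a pathway view both mention the same GSK3beta URI, so loading both into one model connects them without any mapping code.

```python
# Sketch of the aggregation behind "drag and drop": two independently produced
# graphs share a URI, so merging them links compound data to pathway data.
# Data and vocabulary are invented.
from rdflib import Graph

compound_view = """
@prefix ex: <http://example.org/bio#> .
ex:compound123 ex:inhibits ex:GSK3B .
"""

pathway_view = """
@prefix ex: <http://example.org/bio#> .
ex:GSK3B ex:memberOf ex:WntPathway .
"""

g = Graph()
g.parse(data=compound_view, format="turtle")
g.parse(data=pathway_view, format="turtle")   # "drag and drop" = just add the triples

# the merged graph now connects the compound to the pathway through ex:GSK3B
for s, p, o in g:
    print(s, p, o)
```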


Eric N. – Views and semantic lenses are a very important part of the SW.


Xiaoshu W. – What is the backend of BioDASH?


DQ – Uses a customized RDF store in C++.


Vipul K - What if the graphs are huge?


Eric M. – The demo shows the power of the SW, without people needing to see angle brackets. With SPARQL people can sub-select graphs. So sub-dividing the space based upon user steerage. Lenses help the user draw into the specifics. This is a very powerful notion. The puzzle pieces are lining up.


Vipul K. – The University of Mexico has plugins for zooming in and out of graphs. 


DQ – Scalability of the UI – filtration is the right answer for technological and human computer interaction reasons. Lenses are designed to filter these things. 


Vipul K. – Clustering is commonly used for scalability.


DQ – Scalability at the data layer, think of it like a huge graph. Think of the Web. You don’t want to load the whole Web onto your PC. You just load a component, which is then flushed from the cache. Relational databases have been designed to handle the scalability question. 


Amit S. – What do you think of the use of context? Use an ontology lens for the filtering, for example.


DQ – Lenses are for filtering information or specific purposes. Lenses can be gathered in views, for particular uses of the data.


Yong G. – How do you do data entry?


Eric N. – Converted Excel to RDF and then put it into BioDASH


DQ – Could drag and drop, and thereby enter new data into the data model. 


Alan R. – What are RDF and SW adding to this? Used to the relational model showing different views. A lot of functionality is enabled by existing technology.


DQ – Whole system could have been built using technology other than SW. But using it simplified the development model. Well-associated semantics for combining data together. Can’t easily bring XML data together. Now all data in one representation. You can now use one query language for querying the data. Any DLG could have done the trick, but it would have been isomorphic to RDF.
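
A sketch of the "one query language" point, again with invented data: a single SPARQL query spans what used to be two separate sources once they sit in one RDF graph.

```python
# Sketch: once compound and pathway data are in one RDF graph, a single SPARQL
# query spans both. Data and vocabulary are invented.
from rdflib import Graph

g = Graph()
g.parse(format="turtle", data="""
@prefix ex: <http://example.org/bio#> .
ex:compound123 ex:inhibits ex:GSK3B .
ex:GSK3B       ex:memberOf ex:WntPathway .
""")

query = """
PREFIX ex: <http://example.org/bio#>
SELECT ?compound ?pathway WHERE {
    ?compound ex:inhibits ?target .
    ?target   ex:memberOf ?pathway .
}
"""
for row in g.query(query):
    print(row.compound, row.pathway)
```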


Eric N. – Integrate drug interaction into BioPAX and it would take a couple of months. Could very easily add with RDF.


Alan R. – Like to see focus on that. Want to better understand how RDF is helping.


Eric M. – One big difference is the Web. Any DLG could have done this. Reduced cost by wrapping it into the web infrastructure. The network effect is enabling it.




Ted Slater, Pfizer


Goal of the presentation is to let us know what people at Pfizer deal with every day. Providing general comments about semantics in pharma R&D. 


At the SWLS meeting in Oct. ’04, presented on challenge of using long lists of identifiers that go up or down. Pfizer are completely stymied by such data. Don’t know what the identifiers are trying to say. A little search box on different Web pages doesn’t help us to understand the data better.


Data integration and keyword queries are necessary things in pharma. But they are not a sufficient set of tools to tell stories about physiology. Need next steps. Basically, need the information from the laboratories. Experiments will support interpretation of the data. Ad hoc annotations are research projects on their own. But they are too much time and effort. Need context, and need to push relevant results to individual researchers who have expressed an interest. Not interested in lists. Can’t do queries all day long. Need to be able to do inference over semantically interoperable data. Need to stop banging heads against a wall. Need to make sure results are discoverable, and software can interpret what is going on in a semantic way. Will help understand what’s going on, and being able to push data out to the relevant people. Technology will consist of agents. Need to be able to express what I’m interested in. Want a ‘my agents’ tag. Want agents to find the data that I’m interested in. If new data is entered about RNAi that is relevant to me, then I want to be notified. Can’t search for the data all of the time. Need software that is looking for this stuff for me.


What do we do in terms of disease, biomarkers and annotation? Deal with these topics all of the time.


Diseases – pathophysiology - disease areas include oncology, and arthritis. Need to keep up with literature. Don’t want to have to query everyday. Want an agent to scan literature for me, and to report the results to me, even if I don’t know all of the right keywords. 


Parachuting compounds. Some data about a compound may have been entered into a data store. Currently I wouldn’t even know if data was entered. Might not know that the target would be of interest. Very important to Pfizer. Can do it with technologies we are talking about here.


How can you find out if a compound is effective? Safety concerns are also a very serious business. One example at Pfizer is drug-induced vasculitis, either in research or in the clinic. We have a discoverable test case; so we can find patterns as to which proteins are present in the plasma. Need to monitor internal and external data stores. It’s difficult for us to do, because of the way that all of the different stores are formatted, and the semantics. The more cleaned up the data is, the better. Adverse events information is also very important. Think the BioDASH demo is great. Discoverable is taking on meaning. Can use information without much more ado.


Biomarkers are also a big deal. There are different types of biomarkers, e.g. mechanistic biomarkers, safety biomarkers, and disease biomarkers. If a person has a disease, we want to make sure that the compound is hitting the target, and alleviating the disease. Want information to be pushed out as to how good the biomarkers are. Need semantics associated with all of the data. Correlation is not as good as causation. 


Annotation is always a tough one, and has been for a long time. Unambiguous naming is very important. Then we’re happy. We can deal with all of the synonyms, as we can get them all, and we know what they all are. As there are many synonyms we are forced to do crazy things, and include complex keywords in the search. Can get all of the synonyms, but using them is a pain. Ontologies are important in life sciences, because of all of the synonyms.


? - Can you always know whether something exists?


TS – You can’t always. Sometimes you know that it exists in a different species.


Eric M. – I hear your pain! Frequently! The ‘my agent’ scenario would be a wonderful use case. If you could articulate that, then it would help rule vendors understand how to interact with the description logic folks. Try to hit the key pain points.


TS – sure.


Eric M. – What are the critical inferences that are needed? Hear need for different inferencing for different application domains. Sometimes OWL DL is very powerful. Sometimes RDFS with OWL Tiny works. Is there a pattern that the community has in terms of functional requirements?


TS – Don’t know. Can come up with use cases.


Joanne L. – Aggregation is one level, when merge data then you get another level, inferencing is a further level still.


TS – Some function that compares RNA profiles and provides a similarity score. Also knowledge about pathways. A rule base could say ‘if a new RNA profile is in the DB, and if it is similar to another profile, and if the RNA metadata says we are looking at compound a on target x, and if …, then tell Ted that the data has shown up’.
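
A toy version of that rule, with an invented similarity measure, threshold, metadata fields, and notification hook standing in for a real rule engine and agent:

```python
# Toy sketch of the rule: "if a new RNA profile is similar to a known one, and its
# metadata says compound A on target X, tell Ted". Similarity function, threshold
# and metadata fields are all invented.
def similarity(profile_a, profile_b):
    # placeholder score: fraction of shared genes whose direction of change agrees
    shared = set(profile_a) & set(profile_b)
    if not shared:
        return 0.0
    agree = sum(1 for gene in shared if profile_a[gene] == profile_b[gene])
    return agree / len(shared)

def notify(person, message):
    print(f"notify {person}: {message}")          # stand-in for a real alerting agent

interests = {"Ted": {"compound": "compoundA", "target": "targetX"}}
known_profiles = {"profile_001": {"GSK3B": "up", "AKT1": "down", "TP53": "up"}}

def on_new_profile(profile_id, profile, metadata, threshold=0.8):
    for known_id, known in known_profiles.items():
        if similarity(profile, known) >= threshold:
            for person, interest in interests.items():
                if (metadata.get("compound") == interest["compound"]
                        and metadata.get("target") == interest["target"]):
                    notify(person, f"{profile_id} resembles {known_id} and matches your interest")

on_new_profile("profile_002",
               {"GSK3B": "up", "AKT1": "down", "TP53": "up"},
               {"compound": "compoundA", "target": "targetX"})
```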


Xiaoshu W. – This is rule based, customized inference.




John Wilbanks

Science Commons


Aggregation of literature should happen on the web, rather than in a central database. Aggregate through web of relationships instead of centralization.


Implement SW in a single tissue. Want to establish how hard it is to do this. What exactly is the benefit to a bench scientist.


Copyright is a monopoly on creative works. Arts have outstripped science. Give copyright to publisher for 75 years. Makes it difficult to aggregate data. The entertainment industry is causing the problems.


Going to text mine and curate the open access neuro literature. Text mine PubMed abstracts. Provide web client resources, download graph for re-use.


Even in journal titles there is lots of useful information. Interesting information includes co-occurrence and assertion of a biological relationship. 


Will have a web of relationships onto which other data can be attached.


Will generate an open API and use cases, documentation and metrics, methodologies, materials, and reference implementations.


All the data will snap together because it’s RDF.



Amit S – How can people help?


JW – Want help with ontologies, text mining approaches, scrapers for databases. Want help from content providers. Want to work with people who can help provide access to the information. Using a commercial text miner. Opportunity for publishers to provide peer review for lenses. Content providers can provide a peer reviewed RDF graph.


Eric M. – Bravo. Important way to advance the community. Will help technologically and from licensing perspective. Put stake in the ground in terms of follow up.   




Vipul K

Partners


Examining the role of SW technologies in managing knowledge change and provenance. 

Now look in health care context. Discuss OWL efficiencies.


Definitions: close relationship between provenance and change. One constant is that knowledge will change. Want to know why it changed, and who changed it. The business problem is a lack of consistency, etc. Will document template over time.


The doctor fills out information about a patient in an electronic medical record. Need to know things like, are there any contraindications to fibric acid for this patient? We are modeling rules using ontologies. Each question can consist of a domain, the question itself, the attribute on which the question is based, and the value. Use Cerebra Toolkit to build an OWL DL model. 


There is tons of complex knowledge. These things change over time. Mode of managing change is over 3 time periods. This is a knowledge change event. E.g. the normal range of AST may change. This propagates many changes. The problem is very complex. Could impact a whole knowledgebase. This is NP Complete. 


Ontologies are decision support tools. This is a classic case of dependency propagation. Need tools to manage the knowledge change provenance. For example, the definition of fibric acid may have changed. You may want to know why it changed, etc.
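
A stripped-down sketch of the dependency propagation idea, using an invented toy dependency graph rather than the actual OWL DL model and reasoner: when a definition such as the normal AST range changes, everything that depends on it is flagged for review, transitively.

```python
# Toy sketch of knowledge-change propagation: when one definition changes,
# flag everything that (transitively) depends on it. Items and edges are invented;
# the actual project uses an OWL DL model and a reasoner rather than this graph.
from collections import defaultdict, deque

# "X depends on Y" edges
depends_on = {
    "liver_panel_rule": ["AST_normal_range"],
    "fibric_acid_contraindication_check": ["liver_panel_rule", "fibric_acid_definition"],
    "order_set_lipid_mgmt": ["fibric_acid_contraindication_check"],
}

# invert to "Y is used by X" for forward propagation
used_by = defaultdict(list)
for item, deps in depends_on.items():
    for dep in deps:
        used_by[dep].append(item)

def impacted_by(changed_item):
    """Return every knowledge item that transitively depends on the changed one."""
    seen, queue = set(), deque([changed_item])
    while queue:
        current = queue.popleft()
        for dependent in used_by[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen

print(impacted_by("AST_normal_range"))
# -> liver_panel_rule, fibric_acid_contraindication_check, order_set_lipid_mgmt
```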


This is an ongoing project. We are collaborating with Cerebra. Semantics can play a crucial role. A reasoner can navigate a semantic model of knowledge and propagate change. 


 Xiaoshu W – SW based on open world assumptions, so if model changes then the model was originally wrong.


VK – The model was thought correct at the time it was built.


Brian G. – Are you trying to minimize type II errors?


Tonya H – Trying to get away from the problem of being a heterogeneous, multi-center organization. If same concepts in many silos, then need to unite data, and when changes are made for this to be propagated across all of the modes. Relationship between drug information, and when one order set customer changes information relating to drug information change, then it can take 6 months. Want to reduce that time. Reduce labor needed to keep knowledge base up to date.


Amit S – Using logic based reasoner to propagate change. What about the named relationships?


VK – Determination of what has changed. The owner is notified. For use case that they have, they think that OWL is expressive enough for the propagation. Reasoner can reason over the relationships.


Tonya H – This is an experiment to see if OWL DL is rich enough.


VK – Looking at hybrid rule for temporal and spatial reasoning.


Brian G. – Is the clinician being informed by the model?


Tonya H – The knowledge engineering process, trying to manage through inheritance of propagation model. Think there must be an equivalent problem in R&D. 





Mark Musen

Stanford


The National Center for Biomedical Ontology


NIH is getting into big science. New centers are being formed to help build an informatics infrastructure. 


Got funding 3 months ago. It’s a large consortium that is made up of a variety of centers. The project is focused at Stanford, but also includes Berkeley, Mayo, Victoria, Buffalo, UCSF, Oregon, and Cambridge.


Many resources are available on the web. These provide opportunities for biologists, and generate many problems. There is much incompatible data. It’s important that it can all work together.


Everything that was originally thought about ontologies is wrong. For example, data for inferencing, and text mining. Ontologies are used by biologists for simple tasks such as annotating data. Didn’t catch the real dissonance between people with an AI background and the biologists’ view of using ontologies for tags. One focus of the center is to reconcile the 2 different views.


Ontologies being used by biologists include GO. GO has had an unbelievable effect on biology. It’s not even a light ontology. It’s a lighter than air ontology! Used for annotation. But gives biologists some insight into using ontologies beyond annotation.


Many ontologies are available for biologists, and many of these are listed in OBO. Some ontologies are very rich, some are parsimonious, and some are very specialized. Ontologies aren’t peer reviewed. Many will have errors. Some are in Protégé, some in a DAG, some unknown.


Building ontologies is currently a cottage industry. People throw ontologies over the wall and see what happens. Want to move to the industrial age. Want to build ontologies based on standards. More confidence in semantics and content, and ability to track how they are used. 


In its wisdom, the NIH has recognized the problem of there being an inefficient infrastructure for biocomputing. Elias Zerhouni took resources from all NIH groups to fund the NIH Roadmap to build the infrastructure. 7 centers have been built over the last 2 years.


We were funded in September. Bring together many people, who contribute in a variety of ways. Protégé is funded separately through NLM. 


There are 3 main pieces to the technology. Center will take over OBO, will reconstitute it into a way that will be more useful and interesting. Allowing indexing and alignment of ontologies, etc. BioPortal to access OBO ontologies. Open Biomedical data (OBD) – will be annotated with the new OBO ontologies. Will deal with the evolution of ontologies, so that data evolves with the ontology. The synergy will allow biologists, anywhere in the world, to be able to access the ontologies, and annotations. 


Need to add a lot of work with OBO. Help people build and extend ontologies. Build a new generation of Protégé. The OWL plugin currently sits on top of Protégé, which isn’t ideal. Locate ontologies. Some ontologies may be stored locally, while others may be stored remotely. Enable people to index them all. Have peer review of ontologies. Have an evaluation of concepts and relations, and make nuanced comments. Understand the differences in modeling approaches.


The goals of the center are to create integrated ontology libraries in cyberspace, meta-data standards for ontology annotation, etc.


We believe we have structure that enables people to store ontologies that use different standards. To understand what meta-data is needed.


Most exciting, is that one of the reasons that NIH has given us lots of money, is to ensure collaboration. Contributing libraries would be great. Special granting opportunities, including ‘collaborating R01’. Proposals in Jan. and May. Component of center is involved in driving biological projects.


Excited by opportunity to build a service layer for ontologies.


Eric M. – Parking Lot - HCLS challenges – OBO-edit and DAG-edit for RDF. 


MM – Challenge is to make bio community feel more comfortable with web standards. They are uncomfortable with web standards and angle brackets.


Xiaoshu W – Ontologies in many languages in OBO. Will they be converted to OWL? Then problem with URI for the ontologies, e.g. what is namespace for GO?


MM- Berkeley uncomfortable using namespaces


Roger C. – Ontologies with labels.


MM – Won’t own all ontologies. Will point to other ontologies.


Robert S – Not all ontologies comply with all rules. Big ontology alignment is an issue. They won’t all just snap together, if they aren’t in overlapping domains.



Joanne Luciano

BioPAX


I’m going to describe the pieces and attributes of BioPAX, to explore whether or not they fit into their global use case. Want to make sure that the lego piece makes sense. Have learnt through the process, and will share that information.


Pathways research has broad impact. Want to combine knowledge from many sources. How do we integrate such data?


What’s a pathway? It can be a metabolic pathway, a molecular interaction network, a signaling pathway, or a gene regulatory network. These are the 4 main conceptual models for pathways. They all get published in papers, and end up in pathways databases. They are not primary databases. Information is extracted from literature. The different pathways data stores need to be integrated. 


What’s the problem? Each pathways databank has its own format and syntax. To do integrative systems biology research, you need to use all of the data sources. People write parsers for all of the data stores. Now you only need to integrate with BioPAX.


OWL DL was chosen for its expressivity. Want to enable computability. OWL lite wasn’t expressive enough.


BioPAX uses other ontologies, e.g. uses GO compartment for cellular location.


Pathways are represented as a set of interacting parts.


Looking into how SBML can be annotated with BioPAX. 


BioPAX uses external references, e.g. Xref.


Can list all synonyms in BioPAX.


Use Protégé, GKB Editor SRI and SWOOP for editing ontologies. Use many reasoners.


Areas of interest: integration (combine sources in a meaningful way), identity (recognize same things in different contexts and names), composition (re-useable representations), exchange  


Eric N. – How do you see the BioPAX and SW efforts working together?


Alan R – Current version states you can’t add sub-classes. In future, you couldn’t add additional proteins, as the goal is to define an exchange language.


Xiaoshu W – No additional property associated with annotate. Still think about a tree structure. No common properties. Arbitrarily group everything under one umbrella. 


Robert S – Paper in Nature Biotechnology. Easy to find holes in ontologies. Less easy to get around to doing it well or properly. Have worked with JL to review use of OWL in BioPAX. Long list of known areas of concern. Lots of things not said that should be. BioPAX has a class called ‘utility class’, which is effectively miscellaneous, and therefore not very useful.


Matthew C – What role for BioPAX in literature with respect to pathways?


Xiaoshu W – Sent BioPAX people a manuscript regarding guidelines. 


JL – Want to discuss over next day or so.




Davide Zaccagnini, L&C

Semantics and Literature


Process text with NLP.

Apply semantics to schemas.

Easily integrate databases with ontologies

Life Scientists need query, browsing and visualizing


Steps.

1. Text is analyzed on a low level.

2. NLP, DB integration

3. Applying semantics

4. knowledge discovery (queries and visualization)


NLP has been worked on for decades.

- Structural and grammatical parsing (paragraphs, sentences, words, etc)

- Syntactical parsing (dependencies between terms)

- Ontology-based semantic tagging (infers concepts and relationships)
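
A toy rendering of the three steps above, with an invented mini ontology; a production system does full grammatical and syntactic parsing, whereas this only splits sentences and does dictionary lookup.

```python
# Toy sketch of the three steps listed above: structural splitting, a placeholder
# for syntactic parsing, and ontology-based tagging by dictionary lookup.
# The sentences and mini ontology are invented.
import re

text = "GSK3B phosphorylates tau. Lithium inhibits GSK3B in neurons."

# 1. structural/grammatical parsing: text -> sentences -> words
sentences = re.split(r"(?<=[.!?])\s+", text.strip())
tokens_per_sentence = [re.findall(r"\w+", s) for s in sentences]

# 2. (placeholder for syntactic parsing: dependencies between terms)

# 3. ontology-based semantic tagging: map tokens to concepts
mini_ontology = {
    "gsk3b": "Protein",
    "tau": "Protein",
    "lithium": "Chemical",
    "neurons": "CellType",
    "phosphorylates": "Interaction",
    "inhibits": "Interaction",
}

for sentence, tokens in zip(sentences, tokens_per_sentence):
    tags = [(t, mini_ontology[t.lower()]) for t in tokens if t.lower() in mini_ontology]
    print(sentence, "->", tags)
```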


Ontology created from text, and ontology used with text.


Use case: anti-diabetic drug at advanced development. Unexpected QT elongation in clinical trial.


Tried to understand through mining the literature, SwissProt, proprietary databases, and GO. All information was integrated into the ontology. Then interrogated the information. Asked open and closed queries. Ended up with a plausible explanation as to the pathways that were causing the effect.


Bob F – It obviously works. Still well known that there are outstanding problems in NLP that no one knows the answers to. What they’ve done is to weight the evidence. Caution people that NLP is very difficult and complex. Yet they received reasonable results.


Davide Z. – Solid NLP won’t be seen for 10+ years. Already at 95% accuracy. Some fields aren’t interested in 100% accuracy. 



Scribe: Eric Neumann      Date: 25.1.06  Time: 4:25


John Madden , Duke U - Terminology Federation

Involved in SNOMED (pathology term)

Taxonomies and terminology :  Aristotle: Chain of Being

Different views of uses

AI or batch retrieval? Doctors do not make such distinctions

o Store records retrievably

o Ad hoc search

o Explore/mine records

o Support MD-machine interactions

o Decision support

o Medical AI

Facts, exam contextual facts, workflow context facts, interpretations, theories…

Wittgenstein: what we can say about things…

What’s missing? Context, relevance, agenda, workflow, para-consistency

Spheres of influence: which CV to choose from

De facto interoperability, inferencing, maintenance

Complex terminologies scale poorly, inference is brittle, restricts speech, maintenance in practice is hard 

Hierarchical semantics

Weakest link: Lousy ontologies at mid level can ruin cross mappings

Onto reconciliation

Federated semantics

o Different groups (with their ontologies) try to work across their ontologies (maps)

o No global guarantees of consistencies

o Locally consistent

o Globally tolerant of inconsistencies

o Finitely granular

UMLS thought about this: peer to peer or hierarchical


Helen Chen, AGFA – Adaptable Clinical Pathway

HC plan, for treating patients

Usually rigid, and not flexible for personalized medicine

Research framework

o Cycle SW -> action plan, work flows, data mgmt,  re-used in planning cycle

o Scalable, 

o Knowledge,  proofs (trust), rules, med K, process K, patient Info

o Action, workflow, decision, actions

o WOPEG/AJAR – web of proof engine (Euler, CWM, Jena)

o Full distributed knowledge, Declarative plan for step-wise clinical pathways, web based

o How to combine pathways: femur surgery + induced cardiac arrest

o ACP ontology !!!

o Handling unforeseeable side effects – really requires SW flexibility

o X-ray Contrast  clearance rules….

o Check for drug adverse effects








Scribe: Eric Neumann    Date 26.1.06 Time 1:45pm


Sam (Siemens), Helen Chen (AGFA) – Healthcare Adaptive Process 

o Protocols/Plans for HC; coordinate in HC Institutions, interact with other flows, which prototypes are germane, events and response; actions ; conclusions/beliefs; no other standard party addressing these issues currently; 

o Deliverables: semantics and rules for representing protocols; relationships to other groups (GLIF); what we learned and what we can do; HL7 gap to be addressed – 2-year time frame

o ACTION: write extended abstract, names of participants, define itemized deliverables; Proposal to be written and sent in

Tim Clark (Alzheimer’s Forum), Giles Day (Pfizer) – Research Knowledge Lifecycle

o Propagate through academic communities, knowledge artifacts of research process

o Make semantically explicit all connections

o Deliverable: key scenarios, in R&D

o Draft Ontologies and grounding applications with defined interaction points

o ACTION:  Come up with scenarios quickly; Proposal to be written and sent in 

Robert Futrelle (NEU), Matt Cockerill (BioMed Central) – Unstructured Data (Text)

o Definition: 

Obtain Unstructured and semi-structured text 

Generate semi-structured text: finding Entities

Move to strong structures, e.g., RDF-OWL

Agreement resolution of concepts and Ids

Identify common vocabulary (DC like)

Expose to users (views) – feedback? Reviewing 

Source and context

Tool development

o DELIVERABLES: 

Publish flow model as example, Document demos, use cases, critique (form vs content), fix

o ACTION: Proposal to be written and sent in

Susie Stephens—Structured Data 

o Build demo – Neurosciences, NC, SWAN; 

o Timeline:

Stage 1 – 3 mth

Data sources and access; show data integration; scalability

Bench-to-bedside? Try out feasibility

Tools GRDDL, SPARQL, OWL, help from Brian Gilman

Neuroscience input from J Wilbanks

Set up WIKI

Stage 2 – 6 months, 

convert data sets into RDF: doc from Excel, Word, RDB, XML

Analysis of semantic requirements (Onto coord)

Screen scraping  API

Documentation

Stage 3 - 1 year

New Scientific insights

Explore data integration – validation 

Use (?) ontologies

Teaching tool - best practice learned

o DELIVERABLES :  Data source to RDF plan and implementations 

o ACTION: Proposal  to be written and sent in


Vipul Kashyap – Ontologies Best Practices and Mgmt 

o Use Cases: 

share template for all groups to contribute

Mass Spec, 

lab test and observations

pathologist reports (ratings), 

Brain Atlas, 

DNA seq + web services, 

micro array data (tox onto and extensions), 

glycomic data annotated and merged with other annotations, 

diabetic patient visit in an emergency

o Deliverables: use-cases; guidelines for ontology; solution design for a particular use case; GO+SNOMED conversion; Development WIKI (WIKI of WIKIS); answers to present questions

o ACTION: Use case completion; monitoring/interaction with other groups; common vision with other groups (HL7 links) - HL7 contacts: John, Molly, Vipul, Eric; work with NIST on mapping URI to ID stds (EMBL, DDBJ, GenBank); Proposal to be written and sent in

o Questions (for ontologies in general, or from a SW perspective?)

What is an Ontology?

What Knowledge should the represent? URI = ID

How should K be represented? Probabilistic representations

How should ontologies be created? Different actors: Subj, modeler, consumer; building blocks and templates for HCLS; ontology registries; best practices

How should ontologies be maintained? Life cycle alignment

How should Ontologies be evaluated?

How should ontologies be used?