See also: IRC log
<ericP> Q: is there a way to share templates?
<ericP> Lena: we have bioportal
<Lena> new version of my slides: https://confluence.deri.ie:8443/download/attachments/40304735/bos_linkedData4LifeSciences1.pptx
<Lena> these are the slides for the afternoon: https://confluence.deri.ie:8443/download/attachments/40304735/BigLinkedDataLifeSciences_20120827_bos.pptx
<ericP> scribenick: dbooth
<ericP> dbooth: David Booth, involved with Cleveland Clinic for several years
<ericP> ... working with SemWeb for research
<ericP> ... worked with Pangenex (SP?) for decision support
<ericP> ... keen user of SPARQL as a rules language
Maryam: Working in sem in LS. Project in sem in LS. PubMed.
Nima: Harvard Med School. Partners Healthcare clinical decision support system.
__: MIT. Financial models.
<ericP> Maryam Panahiazar - MUlti-Dimensional integrative approach to comparing genes for knowledge DIScovery
... Cancer research.
Sheng Yu: Research fellow, Harvard. Med records.
<ericP> Sheng Yu
<ericP> Loren Wilde
___: Clinician in audiology.
<ericP> Peter Mager
Peter: CS background. Looking at gene seq and synth biology. Also run a seminar series for IEEE. Having someone from Church lab about writing things on DNA sequences.
<ericP> stuart turner
StuartTurner: Veterinarian. Postdoc in bioinformatics. Research in biosurveillance. Want to sem webify a project at NCI.
Franz: Entagen, int in deriv of structured data from unstructured.
ChrisBouton: Neurobio, engineer, sponsor, TripleMap.
Terence: Up-to-date, CDS, many teaching hospitals. In charge of info retrieval, and to get better int w EMR sys.
Yakubo: Visiting MIT for dissertation. Nat Lan Proc to symbolic then computation. Background philos. Formal modeling, logic.
Luke: Lead dev SADI, linking dynamic data into SW.
DanielaBourges: Harv Med School, working on SW years, Eagle-I project.
<Lena> (created a Twitter hashtag for this hackathon: #hclshack)
JustinLancaster: CSO QuickBio BioMedServer. Dynamic modeling of sys biology. Came to LS after env sci 10 yrs ago. Simulate complex sys from the knowledge, in the pattern of OpenBEL, then dev automated hypothesis.
Dipankar: Classifiers, clustering. Don't work w SW but want to learn.
<ericP> julie mcmurry
JulieMcMurry: Eagle-I, int biomed resource data. Background in vaccine, immun.
<ericP> juliane schneider
JulianeSchneider: metadata librarian, Harvard Med School. LD, entity extraction, trying to hook MeSH and Medline and Astrophysics.
<Frans> small correction - Entagen product is TripleMap - triplemap.com
<Justin> Hi Justin Lancaster (BiomedServer.com -- kwiKBio project) email@example.com --- based in Boston area.
<Lena> tweethash is #hclshack
<Frans> anyone have a link for the meetup just described?
-> http://www.meetup.com/The-Cambridge-Semantic-Web-Meetup-Group/ Cambridge Semantic Web Meetup
<luke> Frans: this looks like the meetup http://www.meetup.com/The-Cambridge-Semantic-Web-Meetup-Group/
<Frans> thanks - signed up
Q: How do you specify where to get the data for the query?
A: That will be explained in a moment.
scribe: A query interface often has a place to put the URL of the data you wish to query.
Try query at: http://librdf.org/query/
<mary> cannot get the slides from the web <http://www.cambridgesemantics.com/semantic-university/sparql-by-example#q_negation_new_not_exists_r>
Q: How often are the XML datatype URIs dereferenced?
A: Almost never. The app doesn't look up the URIs, typically just the developer who needs to look up a detail about that datatype.
Q: SPARQL queries RDF resources. But how do you create RDF from other data sources and keep it updated?
A: There are a lot of tools to do that. It isn't standardized. There are software tools for mapping relational databases to RDF.
scribe: Same kind of ETL and data integration approaches already existing.
Q: When you design relational DBs, there are design considerations. Are there similar design considerations for RDF?
A: The design is in the developer's lap. There are fewer things that the DBA must think about, but it may increase in the future.
Q: In rel DB, there's a limited number of columns. But in PubMed, you have a gazillion docs, and all free text. What's the best practice to make sure the user can take advantage of that free text DB? how do you define the API or protocol between a DB designer and user?
A: SPARQL wiped them out. Make it so that the person who writes the queries has an intuitive graph pattern to walk.
Q: How do you pick out the data most useful to the users?
scribe: If you designed RDF data for PubMed, how do you pick what data is most useful?
A: One approach is to get users together and design what attributes you want. At the other end of the spectrum, we can pull out tons of information. But because in RDF we aren't limited by the number of columns, we're going to pull them all out, and let the users do SPARQL queries to pull out what they want.
scribe: For the first approach you may as well use a rel DB. For the other extreme, it isn't really practical. The approach I've seen in practice is an incremental path on that continuum. But incrementally adding new things on user requirements can be done in RDF without breaking existing data. You don't have to go back and create new tables.
dbooth: Great answer!
Q: Are the entailment features in SPARQL intended for integrity constraints and checking?
Lee: No. Different tools and platforms are treating constraints and checking their own way.
EricP: in the TMO (Translational Medicine Ont), we were evolving the ont and it would break the results of SPARQL queries. We would track what queries broke (went from 5 answers to 0).
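The tracking EricP describes can be sketched as a comparison of per-query answer counts before and after an ontology change. This is a minimal sketch, not the TMO tooling; the function name, the query labels, and the counts are all hypothetical.

```python
def broken_queries(before, after):
    """Return names of queries that had answers before the change but none after."""
    return sorted(
        name for name, count in before.items()
        if count > 0 and after.get(name, 0) == 0
    )

# Hypothetical answer counts per saved query, before and after an ontology edit.
before = {"drugs_for_disease": 5, "trials_by_site": 12}
after = {"drugs_for_disease": 0, "trials_by_site": 12}

print(broken_queries(before, after))
```

A query that drops from 5 answers to 0 is reported; one whose count is unchanged is not.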
dbooth: You can also think of constraints as SPARQL queries that look for violations.
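The idea of a constraint as a query that looks for violations can be sketched in plain Python over a set of triples. The data, the ex: names, and the rule ("every patient has a birth date") are invented for illustration; the docstring shows roughly how the same check might be phrased in SPARQL 1.1.

```python
# Triples are (subject, predicate, object) tuples; all names are made up.
triples = {
    ("ex:p1", "rdf:type", "ex:Patient"),
    ("ex:p1", "ex:birthDate", "1970-01-01"),
    ("ex:p2", "rdf:type", "ex:Patient"),
}

def patients_missing_birth_date(graph):
    """Rough equivalent of:
    SELECT ?p WHERE { ?p a ex:Patient . FILTER NOT EXISTS { ?p ex:birthDate ?d } }
    """
    patients = {s for s, p, o in graph if p == "rdf:type" and o == "ex:Patient"}
    with_date = {s for s, p, o in graph if p == "ex:birthDate"}
    return sorted(patients - with_date)

violations = patients_missing_birth_date(triples)
print(violations)
```

An empty result means the constraint holds; any row returned is a violation to investigate.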
Q: Is it possible to query for an artist that has at least one of those attributes?
A: Yes. BOUND is a SPARQL construct that asks whether a variable has a value in the query, so you can use that.
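The BOUND approach can be sketched as follows, assuming each attribute was matched with its own OPTIONAL pattern so that it may be left unbound. The rows stand in for SPARQL solutions; the artists and attribute names are invented.

```python
# Each row mimics a SPARQL solution after two OPTIONAL patterns;
# None stands for an unbound variable. Data and names are invented.
rows = [
    {"artist": "ex:a1", "genre": "jazz", "label": None},
    {"artist": "ex:a2", "genre": None, "label": None},
    {"artist": "ex:a3", "genre": None, "label": "ex:BlueNote"},
]

def bound(row, var):
    """Mimic SPARQL BOUND(?var): true when the variable has a value."""
    return row.get(var) is not None

# Rough equivalent of FILTER (BOUND(?genre) || BOUND(?label))
matches = [r["artist"] for r in rows if bound(r, "genre") or bound(r, "label")]
print(matches)
```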
Lee: I've been interested in doing a life-sciences-specific version of this tutorial. Let me know if you're interested in helping on that.
Q: Is there a place where these best practices are being collected?
Lee: Semantic University is one place.
dbooth: A good way to develop a CONSTRUCT query is first to develop and debug it as a SELECT query, and then convert it to CONSTRUCT after you've debugged the WHERE clause.
Q: When would it be better to create a view using CONSTRUCT versus converting the query?
EricP: Question of materializing the view or not. Same trade-offs as in DB world.
Q: What if a value is unbound when you're doing CONSTRUCT?
A: That triple is automatically filtered out, per the SPARQL standard.
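The behavior described in this answer can be sketched as a small template instantiation: for each solution, any template triple that mentions an unbound variable is skipped, while the other triples for that solution are still produced. This is a simplified illustration, not a SPARQL engine; the template, predicates, and values are made up.

```python
def construct(template, solutions):
    """Instantiate each template triple per solution, skipping triples
    that contain a variable with no value in that solution."""
    out = set()
    for sol in solutions:
        for triple in template:
            terms = []
            for term in triple:
                if term.startswith("?"):
                    terms.append(sol.get(term[1:]))  # None if unbound
                else:
                    terms.append(term)
            if None not in terms:
                out.add(tuple(terms))
    return out

template = [("?river", "ex:name", "?name"), ("?river", "ex:lengthInKm", "?km")]
solutions = [
    {"river": "ex:nile", "name": "Nile", "km": 6650},
    {"river": "ex:amazon", "name": "Amazon"},  # ?km is unbound here
]
graph = construct(template, solutions)
print(sorted(graph, key=str))
```

The Amazon solution still contributes its name triple; only the length triple is dropped.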
Q: Are you guaranteed that the Amazon and Nile lengths are in the same units?
A: No. You'd better be careful in your query. Good practice that I like (but it bothers ontologists): put the units in the predicate name, e.g. :lengthInKm
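A minimal sketch of why unit-bearing predicate names help: the consumer can tell from the predicate alone which conversion to apply. Only :lengthInKm comes from the discussion; ex:lengthInMiles, the helper function, and the river lengths are illustrative assumptions, not authoritative figures.

```python
KM_PER_MILE = 1.609344  # exact by definition of the international mile

# Illustrative data only; the lengths are not authoritative.
graph = [
    ("ex:nile", "ex:lengthInKm", 6650.0),
    ("ex:amazon", "ex:lengthInMiles", 4000.0),
]

def length_km(graph, subject):
    """Return the subject's length in km, converting if the predicate says miles."""
    for s, p, o in graph:
        if s != subject:
            continue
        if p == "ex:lengthInKm":
            return o
        if p == "ex:lengthInMiles":
            return o * KM_PER_MILE
    return None  # no length recorded for this subject

print(length_km(graph, "ex:nile"), length_km(graph, "ex:amazon"))
```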
Q: Queries involving time durations?
A: Yes, there is time arithmetic.
scribe: They're defined in terms of the XML Schema operators spec.
... But it isn't required for SPARQL 1.1 conformance.
Q: Is there a difference between MINUS and using the old !BOUND idiom?
A: They're pretty much the same, except maybe some edge cases.
EricP: SADI talk will be next, after lunch, at 1:30pm Eastern. Then Helena's talk after that.
<luke> Questions from Max:
<luke> How are triples hashed/indexed?
<luke> What sits between SPARQL and the web data?
<luke> Where is the logic stored for the relations?
<luke> How good are SPARQL queries for proprietary data?
[Lunch break until 1:30pm Eastern US]
<luke> Slides for this talk are at http://sadiframework.org/slides/MIT2012.pdf
<frago> thank you, I was about to ask about the slides
<ericP> scribenick: ericP
luke: use case for computed
... my clinic recently changed their gold-standard for COPD factors
... was costly 'cause the data was all stored in the old format
luke: semantic web services (e.g. OWL-S) aim for the world where the service models the state of the universe before and after
... e.g. tell Siri to purchase plane tickets and do all the debiting etc.
<mary> sadi slide is not available!
<frago> i'm following from http://sadiframework.org/slides/MIT2012.pdf
mary, just takes a while to download
luke: given a db of heights and weights, use a SADI service to query BMI
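The BMI example can be sketched as a function that takes triples in and returns new triples about the same individuals, which is the RDF-in/RDF-out shape of a SADI service. This is not the SADI API; the function name and the ex: predicates are hypothetical, and plain tuples stand in for RDF.

```python
def bmi_service(input_triples):
    """Given height and weight triples, return one ex:bmi triple for each
    individual that has both; individuals missing either are skipped."""
    heights = {s: o for s, p, o in input_triples if p == "ex:heightInM"}
    weights = {s: o for s, p, o in input_triples if p == "ex:weightInKg"}
    return {
        (s, "ex:bmi", round(weights[s] / heights[s] ** 2, 1))
        for s in heights.keys() & weights.keys()
    }

# Invented input data.
patients = {
    ("ex:p1", "ex:heightInM", 2.0),
    ("ex:p1", "ex:weightInKg", 80.0),
    ("ex:p2", "ex:heightInM", 1.7),  # no weight, so no output triple
}
output = bmi_service(patients)
print(output)
```

The output attaches the computed property to the same subject that came in, so a client can merge it straight back into its own data.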
Maryam: can you orchestrate SADI services?
Luke: the goal is that *you* don't have to, that it happens for you
... the SHARE client exports the latest Taverna workflow format
luke: the input is a named individual, and the output is the same individual with a hello:greeting property
luke: ideally service description would describe services
... for now, use the SADI registry
luke: OWL reasoning in Java 'cause that's where the reasoners are
luke: SHARE is a SPARQL processor which matches queries against the local store + the services in
... also decomposes OWL classes
... if you have a bunch of triples which constitute an entity, you can use an OWL class to capture them
Maryam: how do you invoke WSDL services?
luke: SHARE is for SADI
... you can describe that in WSDL, but it's just RDF-in/RDF-out
... there are ways to use e.g. SAWSDL or wrappers to make WSDL services available as SADI
... example: increasing creatinine (blood urea nitrogen) level indicates a rejected transplant
includes OWL class patients:AtRiskPatient
Luke: I want the genes in a pathway and the proteins they code for
... interface completes from LSRN (Life Sciences Resources Network)
... could use identifiers.org (if they export RDF)
[something close to http://sadiframework.org/content/2010/06/10/cardioshare-walkthrough/ ]
<luke> Download SHARE command-line client: https://code.google.com/p/sadi/wiki/SHAREClient
<luke> SHARE example queries: http://biordf.net/cardioSHARE/queries.html
-> http://sadi.googlecode.com/files/SHARE-client-0.1.jar share client jar
-> http://biordf.net/cardioSHARE/queries.html example queries
<Justin> What was command line string to execute the jar file?
<Justin> ... for the SHARE client
<mary> if anybody found the link for paper?
<dbooth> justin, java -Xmx1024m -jar SHARE-client-0.1.jar
luke: [Re: http://biordf.net/cardioSHARE/queries.html ]
... phd student in our lab trying to emulate clinical classification in OWL
... i.e. have the OWL reasoner perform diagnosis support like a clinician
... the measurement units in his data were inconsistent and frequently unspecified
... #14 demos that SHARE maps units to a standard, and can guess them when not specified
Maryam: phylogeny analysis is a hard case
Lena: no single tree of
... also they are huge
luke: we needed a stable taxonomy so we're using one from NCBO
... if you build the tree from the ribosomal RNA, you hard-code your biases
luke: SADI can't solve the social prob, but can address the size
... we had a group doing molecular modeling
... a query for the polygons on a molecular surface exceeded 4G
... so back to the phylogeny use case, probably need to pack the hierarchy as a literal and unpack when needed
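The pack-and-unpack idea can be sketched as serializing the whole hierarchy into one string literal instead of one triple per edge. JSON is an assumption made here for illustration; the talk did not name a serialization, and the predicate and taxonomy names are invented.

```python
import json

def pack(tree):
    """Serialize a nested-dict hierarchy into a single compact string literal."""
    return json.dumps(tree, sort_keys=True, separators=(",", ":"))

def unpack(literal):
    """Restore the nested-dict hierarchy from its string literal."""
    return json.loads(literal)

# A toy hierarchy; a real tree would be far larger.
tree = {"Bacteria": {"Proteobacteria": {}}, "Archaea": {}}

# One triple carries the whole hierarchy as its object.
triple = ("ex:taxonomy1", "ex:hierarchyLiteral", pack(tree))
restored = unpack(triple[2])
print(triple[2])
```

The trade-off is that the packed hierarchy is opaque to SPARQL graph patterns until a client unpacks it.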
Peter Mager: can't you pass a parameter?
luke: yep, we have URIs and we can use them for this
... there's a predicate called rdfs:isDefinedBy which SADI derefs
maryam: phylogeny researchers care about methods used
... can i go through the info for methods?
luke: if they write it down, but we don't force them
... Jim McCusker proposed pointing to the code for the service in a public repo
ericP: for e.g. homology, you code the way something is known to be homologous
luke: we use OWL to infer that e.g. BLAST homology is a form of homology
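The inference Luke mentions can be approximated, for illustration only, by walking subclass links upward. This is far short of an OWL reasoner: it assumes single inheritance with no cycles, and the class names, including the intermediate ex:SequenceHomology, are hypothetical.

```python
# Hypothetical subclass links: child -> parent.
sub_class_of = {
    "ex:BlastHomology": "ex:SequenceHomology",
    "ex:SequenceHomology": "ex:Homology",
}

def superclasses(cls, hierarchy):
    """Collect all ancestors by following child -> parent links
    (assumes single inheritance and no cycles)."""
    found = []
    while cls in hierarchy:
        cls = hierarchy[cls]
        found.append(cls)
    return found

def is_a(cls, target, hierarchy):
    """True if cls is target or one of target's descendants."""
    return cls == target or target in superclasses(cls, hierarchy)

print(is_a("ex:BlastHomology", "ex:Homology", sub_class_of))
```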
... there are tools to use SAWSDL to make WSDL services be SADI services
StuartTurner: where do you see SADI going in the next few years?
luke: i described SADI in terms of a toolkit 'cause i work on the toolkit
... but SADI is just a set of practices
... i worked on a submission to W3C
<luke> link to SADI summary/spec: https://code.google.com/p/sadi/wiki/SADITrail
ericP: the trust axis can also capture latent nuances which make data more applicable
mark: the prob with federation is that we can't move 1k genome sets around on the network
... (without exotic fiber infrastructure)
Lena: but are you using all that data? can you select for the data you need?
mark: need to be able to run analysis on computers that you don't own
... e.g. i don't want to pull the 1k genome data and swissprot to my local computer
ericP: [mumbles about the Grid marrying the SemWeb]
[shipping code to data a la SciDB]
Justin: are you losing too much by believing that you will capture all of the understanding of the publishers? [@@corrections please]
Lena: i think the question is can we capture that knowledge
... if we can eliminate human subjectivity, you increase the quality of the data
... having spent a year manually capturing data, i recognize how imprecise it is
Justin: i'm not convinced that letting the machine crank on 1k dimensions will capture the intuitions of the scientist
peter: there's a lot of interesting data which is lost
... e.g. astronomers who fedex disk arrays around the country to do fourier analysis
... that data could still be useful but is not available
ericP: [meta data tracking of raw data, e.g. disk arrays]
Lena: [fold-it example]
<tez> "Big Data" should enable both cases debated
@@1: going back a few slides, you spoke of identifiers disappearing on you. that seems like a higher priority
<tez> How can the "crackpot" be accelerated ?
Lena: yes that's first, but it's basically a solved problem
TFMorris: i don't know what your example was, but how has this been solved?
Lena: through frameworks, exposing e.g. RDBs on the SemWeb
... my issue was URLs which were simply non-dereferenceable
maryam: one of the probs with reactome is that it's just for humans
... on KEGG we can't find a molecule in a pathway on certain days
Lena: need to capture context and
... e.g. reactome is curated human data
maryam: how can we decide whether KEGG or Reactome is better?
Lena: we invented pathways to keep things in boxes
... (like species)
... we need to capture the underlying data
... imo, we need to talk about system states instead of pathways
Justin: there are so many variables which can be measured on a patient
... the data mining problem is so huge, but coming at it with a big data approach might allow us to compartmentalize and analyze
StuartTurner: in clinical care we have clinical practice to avoid opinion-based medicine
... humans make cognitive mistakes and have biases
Justin: we want the machine to amplify the human
Lena: can't we have the machine perform the standardized tests while humans work on e.g. new methodologies?
Minutes generated by scribe.perl (Revision: 1.136) from the IRC log of 27 Aug 2012. Scribes: dbooth, ericP. Minutes URL (as guessed by scribe.perl): http://www.w3.org/2012/08/27-hack-minutes.html