IRC log of dwbp-DataUsage on 2014-04-01

Timestamps are in UTC.

09:21:43 [RRSAgent]
RRSAgent has joined #dwbp-DataUsage
09:21:43 [RRSAgent]
logging to
09:21:44 [newton_]
09:23:45 [yaso]
We should work here
09:24:39 [Zakim]
Zakim has joined #dwbp-DataUsage
09:26:44 [ericstephan]
09:26:55 [ericstephan]
My mic is not working but i can listen
09:27:11 [ericstephan]
Good morning everyone
09:34:20 [ericstephan]
Data set selection might mean many things to many groups....
09:39:50 [ericstephan]
Yes I can hear a bit :-)
09:41:22 [ericstephan]
Knowing data is credible or trustworthy would be extremely important
09:42:04 [yaso]
Maybe we can have feedback on the data to reinforce the quality of the data
09:42:40 [ericstephan]
Yes very good especially feedback from respected experts
09:43:07 [ericstephan]
A leading researcher's feedback versus a non-expert's
09:45:05 [ericstephan]
Would this help?
09:47:00 [ericstephan]
So citations, scholarly value, might be useful?
09:50:32 [BernadetteLoscio_]
BernadetteLoscio_ has joined #dwbp-DataUsage
09:51:09 [adrianov_]
Eric: suggests
09:51:36 [BernadetteLoscio_]
Hi Eric!
09:52:00 [ericstephan]
Hello Bernadette!
09:54:46 [ericstephan]
Isn't one simple use case of API and data selection the use of MIME type?
10:02:45 [yaso]
10:05:49 [yaso]
I’m not feeling good :/
10:06:07 [ericstephan]
I use the msm to describe services
10:09:44 [BernadetteLoscio_]
really interesting
10:09:49 [ericstephan]
For some datasets (terascale and petascale) and in science it is more advantageous to move the API to the data rather than having the API operate on the data. From a data usage perspective I might want to know what API I could operate on data types I was already aware of.
10:10:38 [BernadetteLoscio_]
I'm not sure if I understand :(
10:12:20 [yaso]
Is it a problem of performance, Eric?
10:13:49 [ericstephan]
10:13:53 [ericstephan]
That is correct
10:14:03 [ericstephan]
It would also be true of streaming data
10:14:42 [ericstephan]
Streaming data as opposed to a fixed dataset
10:16:20 [newton_]
Streaming data is about real-time data, you mean?
10:18:00 [ericstephan]
10:20:20 [ericstephan]
That is also good providing information about the organization or individual using the project
10:20:28 [ericstephan]
10:21:14 [ericstephan]
Data provenance is very important, it might also be important to describe what doesn't work with the dataset
10:24:48 [ericstephan]
Who is publishing and who is using the data, how it is being used, and what APIs are capable of using the data?
10:28:35 [ericstephan]
Yes, as Phil was talking about yesterday: how is data associated with other data?
10:29:07 [ericstephan]
A PDF file was read and a table was generated from that. How do we describe that association?
10:32:41 [HadleyBeeman]
rrsagent, make logs public
10:35:39 [ericstephan]
sooo...describing dataset from a Data governance, ownership, stewardship, access from a community perspective? Interesting
10:36:29 [yaso]
yes you can also describe the associations of the datasets you’re using
10:36:59 [ericstephan]
It's almost like differentiating data in the wild from "formalized" datasets?
10:37:28 [yaso]
10:38:06 [ericstephan]
A use case might be relying on Google Maps for some data but adding my own Point of Interest mapping points to a map. You could rely on Google Maps but maybe not my POI data
10:39:48 [yaso]
Yes, this is a good use case. Just like with drugs: if I have an index of drugs and take a dataset from the FDA (if it was open) and then I add my impressions about each drug, to share
10:39:59 [yaso]
(I saw something like this in Brazil)
10:40:00 [ericstephan]
Yes great example
10:41:05 [ericstephan]
Depending on your perspective the "expertise" could be relative. What the FDA says versus personal experience
10:41:20 [yaso]
10:41:32 [yaso]
I add these 2 cases on the wiki
10:41:42 [yaso]
Now I’m gonna take some coffee
10:41:44 [ericstephan]
10:41:47 [ericstephan]
Okay me too
10:59:14 [ericstephan]
Are we going back to the main group?
11:01:33 [HadleyBeeman]
Not sure yet, eric. We should be starting up again in a minute or two
11:01:39 [HadleyBeeman]
^ ericstephan
11:01:57 [ericstephan]
Okay thank you Hadley
11:06:08 [newton_]
Eric, you left the hangout?
11:06:30 [fkyanai]
fkyanai has joined #dwbp-DataUsage
11:06:36 [ericstephan]
It left me :-)
11:06:45 [newton_]
11:06:48 [newton_]
We can start a new
11:07:49 [ericstephan]
Okay I am back on.
11:08:16 [ericstephan]
Sorry no video from my side, but it is still dark and my picture looks creepy working by the light of the monitor :-)
11:08:58 [newton_]
It is ok
11:09:11 [newton_]
Yaso, Berna and Adriano are coming
11:09:23 [ericstephan]
11:11:24 [yaso]
yaso has joined #dwbp-datausage
11:19:51 [ericstephan]
11:27:37 [ericstephan]
visual analytics might be another example
11:44:05 [adrianov_]
scriber adrianov
11:44:30 [newton_]
11:44:35 [newton_]
You can edit also
11:44:40 [newton_]
11:45:34 [adrianov_]
ericstephan: vocab should enable privacy config
11:46:41 [adrianov_]
newton_: an important point (related to revenue) is how to value the data
11:47:36 [adrianov_]
BernadetteLoscio_: the vocab should reflect the process of charging
11:49:22 [adrianov_]
BernadetteLoscio_: discussion on whether or not SLAs are in the scope of data usage
11:51:22 [ericstephan]
Maybe the providers of data need to know the fee for serving their data
11:51:36 [ericstephan]
If it is served for instance on AWS
11:53:45 [Zakim]
Zakim has left #dwbp-DataUsage
11:54:39 [BernadetteLoscio_]
11:55:09 [BernadetteLoscio_]
rrsagent, make logs public
12:03:58 [adrianov_]
BernadetteLoscio_: discussing the scope: privacy, revenue, traceability and gathering feedback
13:04:04 [HadleyBeeman]
HadleyBeeman has joined #dwbp-DataUsage
13:11:59 [HadleyBeeman]
rrsagent, draft minutes
13:11:59 [RRSAgent]
I have made the request to generate HadleyBeeman
13:18:09 [fkyanai]
fkyanai has joined #dwbp-DataUsage
13:18:58 [adrianov]
adrianov has joined #dwbp-DataUsage
13:19:19 [fkyanai]
fkyanai has left #dwbp-DataUsage
13:19:22 [fkyanai]
fkyanai has joined #dwbp-DataUsage
13:19:31 [fkyanai]
Hi !
13:19:37 [fkyanai]
Eric, are you online ?
13:19:45 [fkyanai]
The new link to the hangout
13:19:46 [fkyanai]
13:21:08 [newton]
newton has joined #dwbp-DataUsage
13:22:00 [newton]
Hi Eric
13:23:17 [adrianov]
ALL: scope includes traceability, gathering feedback, and other aspects, namely privacy and revenue
13:23:42 [adrianov]
scriber: adrianov
13:24:55 [BernadetteLoscio]
BernadetteLoscio has joined #dwbp-DataUsage
13:25:00 [adrianov]
BernadetteLoscio: other aspects also include provenance
13:25:22 [newton]
scribe: adrianov
13:25:56 [ericstephan]
Hello is everyone coming back? I'll get back on line
13:26:08 [adrianov]
BernadetteLoscio: our focus is on who is using the data
13:31:51 [newton]
BernadetteLoscio: we are going to organize all items collected in the first brainstorm
13:35:11 [newton]
... and classify them into categories: Traceability, Feedback, Other aspects (including data provenance, revenue and privacy).
13:43:12 [ericstephan]
I think there are many aspects of the provenance vocabulary we could borrow or use as a basis. The difference being that PROV describes what happened, while Data Usage describes what is possible.
13:46:45 [newton]
This is the link to the Data usage notes
13:46:46 [newton]
13:50:15 [adrianov]
general challenges
13:50:26 [ericstephan]
To me, the points in Dataset selection/Processing/usability can be organized under: Who, What, When, Why, How
14:04:46 [ericstephan]
I think of provenance as just "Data Usage History" from our perspective
14:05:04 [ericstephan]
Does this make sense?
14:06:12 [ericstephan]
I have a dataset A, here is how it was used, who used it, and here is how they used it. This is the data usage history...
14:06:15 [yaso]
yaso has joined #dwbp-datausage
14:06:33 [adrianov]
makes sense to me
14:08:03 [ericstephan]
It's past (PROV) and present/future (Data Usage) tense use of data.
14:08:56 [ericstephan]
It is very complementary to provenance
14:08:58 [ericstephan]
14:09:38 [HadleyBeeman]
rrsagent, draft minutes
14:09:38 [RRSAgent]
I have made the request to generate HadleyBeeman
14:09:53 [ericstephan]
I've heard it called predictive provenance, now we call it data usage
14:09:55 [ericstephan]
14:10:47 [yaso]
scribe: newton
14:11:32 [yaso]
scribe: yaso
14:14:24 [ericstephan]
I added a few points to our wiki page
14:19:20 [newton]
BernadetteLoscio: the point now is "how can we, as consumers of data, give feedback about the dataset"
14:19:50 [ericstephan]
There might be different kinds of feedback, blogging versus following a protocol?
14:20:03 [BernadetteLoscio]
14:20:33 [BernadetteLoscio]
can we describe data feedback in a machine-readable format?
14:20:46 [ericstephan]
14:20:49 [BernadetteLoscio]
14:20:58 [newton]
What do you suggest to do that?
14:20:59 [ericstephan]
In some cases the machines will be giving the feedback
14:21:02 [BernadetteLoscio]
do you know if there is a vocab for this?
14:21:11 [ericstephan]
Not off hand
14:21:19 [ericstephan]
It's a great question
14:21:30 [ericstephan]
It almost falls under document transclusion
14:22:24 [BernadetteLoscio]
do you think that this should be included in the data usage vocabulary?
14:22:29 [ericstephan]
I wonder if something is available under BFO
14:23:12 [BernadetteLoscio]
BFO is a vocab?
14:23:32 [ericstephan]
Basic Formal Ontology. It's something that came out of the biomedical community to manage research data
14:23:55 [BernadetteLoscio]
yeah... maybe... i dont know this
14:23:57 [ericstephan]
OBI and AIO use BFO....
14:24:41 [ericstephan]
I know a little about it, OBI is used to describe how data is processed or used and I'm not sure if it handles feedback but I can check just a sec....
14:25:12 [yaso]
we’re gonna have more coffee
14:25:13 [yaso]
14:25:15 [ericstephan]
sounds good
14:42:37 [newton]
newton has joined #dwbp-DataUsage
14:43:33 [yaso]
yaso has joined #dwbp-datausage
14:45:18 [BernadetteLoscio]
hi Eric!
14:45:21 [BernadetteLoscio]
we're back!
14:47:54 [ericstephan]
14:57:08 [yaso]
15:00:29 [BernadetteLoscio]
15:08:21 [ericstephan]
Example way of leveraging PROV as a baseline: instead of prov:wasGeneratedBy we use duv:Generates
15:16:12 [HadleyBeeman]
HadleyBeeman has joined #dwbp-DataUsage
15:20:51 [newton]
15:20:59 [newton]
Is someone here?
15:21:20 [newton]
15:21:49 [fkyanai]
fkyanai has left #dwbp-DataUsage
15:22:20 [HadleyBeeman]
rrsagent, prepare minutes
15:22:20 [RRSAgent]
I'm logging. I don't understand 'prepare minutes', HadleyBeeman. Try /msg RRSAgent help
15:22:46 [HadleyBeeman]
rrsagent, draft minutes
15:22:46 [RRSAgent]
I have made the request to generate HadleyBeeman
15:23:56 [yaso]
yaso has joined #dwbp-datausage
15:28:18 [yaso]
RRSAgent, draft minutes
15:28:18 [RRSAgent]
I have made the request to generate yaso
15:28:35 [BernadetteLoscio]
15:36:03 [ericstephan]
Hi is everyone back?
15:50:23 [yaso]
yaso has joined #dwbp-datausage
15:51:35 [yaso]
yaso has left #dwbp-DataUsage
16:18:08 [HadleyBeeman]
rrsagent, draft minutes
16:18:08 [RRSAgent]
I have made the request to generate HadleyBeeman
16:27:59 [newton]
newton has joined #dwbp-DataUsage