IRC log of dwbp-DataUsage on 2014-04-01

Timestamps are in UTC.

09:23:45 [yaso]
We should work here
My mic is not working, but I can listen
Good morning everyone
Dataset selection might mean many things to many groups...
Yes, I can hear a bit :-)
Knowing whether data is credible or trustworthy would be extremely important
Maybe we can have feedback on the data to reinforce its quality
Yes, very good, especially feedback from respected experts
A leading researcher's feedback versus a non-expert's
Would this help?
So citations and scholarly value might be useful?
Eric: suggests
Hi Eric!
Hello Bernadette!
Isn't one simple use case of API and data selection the use of MIME types?
I’m not feeling good :/
I use the msm to describe services
really interesting
For some datasets (terascale and petascale), and in science generally, it is more advantageous to move the API to the data rather than moving the data to the API. From a data usage perspective, I might want to know which APIs could operate on data types I am already aware of.
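Eric's point about finding APIs that can operate on known data types, together with the MIME-type suggestion above, can be sketched as a lookup over API descriptions. This is a minimal Python sketch, not an agreed design: the registry, the `runs_near_data` flag (standing in for "move the API to the data"), and all API names are hypothetical, and the MIME types are only illustrative.

```python
from dataclasses import dataclass


# Hypothetical API description; names and MIME lists are illustrative only.
@dataclass(frozen=True)
class ApiDescription:
    name: str
    accepted_mime_types: frozenset[str]
    runs_near_data: bool  # True if the API can be deployed where the data lives


def apis_for(mime_type: str, registry: list[ApiDescription],
             require_near_data: bool = False) -> list[str]:
    """Return names of registered APIs that accept the given MIME type."""
    return [
        api.name
        for api in registry
        if mime_type in api.accepted_mime_types
        and (api.runs_near_data or not require_near_data)
    ]


registry = [
    ApiDescription("csv-summary", frozenset({"text/csv"}), runs_near_data=False),
    ApiDescription("hdf5-slicer", frozenset({"application/x-hdf5"}), runs_near_data=True),
    ApiDescription("table-validator", frozenset({"text/csv", "application/json"}),
                   runs_near_data=True),
]

print(apis_for("text/csv", registry))                          # ['csv-summary', 'table-validator']
print(apis_for("text/csv", registry, require_near_data=True))  # ['table-validator']
```

For terascale or petascale data, a consumer would pass `require_near_data=True` so that only APIs which can be shipped to the data are offered.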
I'm not sure I understand :(
Is it a performance problem, Eric?
That is correct
It would also be true of streaming data
Streaming data as opposed to a fixed dataset
By streaming data, you mean real-time data?
It is also good to provide information about the organization or individual using the project
Data provenance is very important; it might also be important to describe what doesn't work with the dataset
Who is publishing and who is using the data, how it is being used, and what APIs are capable of using the data?
Yes, as Phil was talking about yesterday: how is data associated with other data?
For example, a PDF file was read and a table was generated from it. How do we describe that association?
Sooo... describing datasets in terms of data governance, ownership, stewardship, and access from a community perspective? Interesting
Yes, you can also describe the associations of the datasets you’re using
It's almost like differentiating data in the wild from "formalized" datasets?
A use case might be relying on Google Maps for some data but adding my own Point of Interest (POI) points to a map. You could rely on Google Maps, but maybe not on my POI data
Yes, this is a good use case. It's similar with drugs: if I have an index of drugs, take an FDA dataset (if it were open), and then add my impressions about each drug to share them
10:41:05 [ericstephan]
Depending on your perspective, "expertise" could be relative: what the FDA says versus personal experience
I added these 2 cases to the wiki
visual analytics might be another example
ericstephan: the vocab should enable privacy configuration
11:46:41 [adrianov_]
newton_: an important point (related to revenue) is how to value the data
11:47:36 [adrianov_]
BernadetteLoscio_: the vocab should reflect the process of charging
11:49:22 [adrianov_]
BernadetteLoscio_: discussion on whether or not SLAs are in the scope of data usage
Maybe the providers of data need to know the fee for serving their data
BernadetteLoscio_: discussing the scope: privacy, revenue, traceability, and gathering feedback
ALL: scope includes traceability, gathering feedback, and other aspects, namely privacy and revenue
13:25:00 [adrianov]
BernadetteLoscio: other aspects also include provenance
BernadetteLoscio: our focus is on who is using the data
BernadetteLoscio: we are going to organize all items collected in the first brainstorm
13:35:11 [newton]
... and classify them into categories: Traceability, Feedback, Other aspects (including data provenance, revenue and privacy).
13:43:12 [ericstephan]
I think there are many aspects of the provenance vocabulary we could borrow or use as a basis. The difference is that PROV describes what happened, while Data Usage describes what is possible.
This is the link to the Data Usage notes
general challenges
13:50:26 [ericstephan]
To me, the points in Dataset selection/Processing/Usability can be organized under: Who, What, When, Why, How
14:04:46 [ericstephan]
I think of provenance as just "Data Usage History" from our perspective
14:05:04 [ericstephan]
Does this make sense?
14:06:12 [ericstephan]
I have a dataset A, here is how it was used, who used it, and here is how they used it. This is the data usage history...
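The "data usage history" of dataset A can be sketched as a list of usage records, using the Who/What/When/Why/How grouping suggested earlier as field names. This is a minimal Python sketch under that assumption; the class, function, and sample values are hypothetical and not part of any agreed vocabulary.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsageEvent:
    """One entry in a dataset's usage history, organized as Who/What/When/Why/How."""
    who: str    # person, organization, or machine agent that used the data
    what: str   # dataset (or part of it) that was used
    when: date  # date of use
    why: str    # purpose of the use
    how: str    # tool, API, or process applied


def usage_history(events: list[UsageEvent], dataset: str) -> list[UsageEvent]:
    """Return the recorded uses of one dataset, oldest first."""
    return sorted((e for e in events if e.what == dataset), key=lambda e: e.when)


# Hypothetical sample records.
events = [
    UsageEvent("lab-B", "dataset-A", date(2014, 3, 2), "model calibration", "python script"),
    UsageEvent("agency-C", "dataset-A", date(2014, 1, 15), "annual report", "spreadsheet"),
    UsageEvent("lab-B", "dataset-X", date(2014, 2, 1), "testing", "R"),
]

for e in usage_history(events, "dataset-A"):
    print(e.when, e.who, e.why, e.how)
```

Read this way, each record is past tense like a PROV statement; a Data Usage description would add what uses are possible, not only what happened.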
makes sense to me
14:08:03 [ericstephan]
It's past-tense (PROV) versus present/future-tense (Data Usage) use of data.
14:08:56 [ericstephan]
It is very complementary to provenance
14:08:58 [ericstephan]
BernadetteLoscio: the point now is "how can we, as consumers of data, give feedback about the dataset?"
14:19:50 [ericstephan]
There might be different kinds of feedback: blogging versus following a protocol?
14:20:33 [BernadetteLoscio]
Can we describe data feedback in a machine-readable format?
In some cases the machines will be giving the feedback
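Feedback from both people and machines could share one machine-readable record. This is a minimal Python sketch that serializes such records as JSON; every field name (including the `expert` flag, echoing the earlier point about weighting expert feedback) is a hypothetical illustration, not an existing vocabulary.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Feedback:
    """Hypothetical feedback record about a dataset (field names are illustrative)."""
    dataset: str
    author: str
    author_type: str              # "person" or "machine"
    rating: Optional[int] = None  # 1-5, or None when the feedback is free text only
    comment: str = ""
    expert: bool = False          # lets consumers weight expert feedback differently

    def __post_init__(self) -> None:
        # Reject values outside the small, assumed controlled vocabulary.
        if self.author_type not in ("person", "machine"):
            raise ValueError(f"unknown author_type: {self.author_type!r}")
        if self.rating is not None and not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")


human = Feedback("dataset-A", "dr-smith", "person", rating=4,
                 comment="Units in column 3 are undocumented.", expert=True)
bot = Feedback("dataset-A", "link-checker", "machine",
               comment="2 of 40 distribution URLs return HTTP 404.")

print(json.dumps([asdict(human), asdict(bot)], indent=2))
```

Keeping `author_type` explicit lets a consumer filter or weight machine-generated feedback separately from human comments.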
do you know if there is a vocab for this?
It almost falls under document transclusion
Do you think this should be included in the Data Usage vocabulary?
I wonder if something is available under BFO
BFO is a vocab?
Basic Formal Ontology. It's something that came out of the biomedical community to manage research data
Yeah... maybe... I don't know it
OBI and AIO use BFO...
I know a little about it. OBI is used to describe how data is processed or used; I'm not sure if it handles feedback, but I can check. Just a sec...
An example of leveraging PROV as a baseline: instead of prov:wasGeneratedBy, we use duv:Generates
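The PROV-as-baseline idea can be sketched as a term mapping over triples. This is a minimal Python sketch with two loudly labeled assumptions: that duv:Generates is meant as the inverse of prov:wasGeneratedBy (the activity generates the entity), and that triples are held as plain tuples of prefixed names rather than loaded with an RDF library. The resource names reuse the PDF-to-table example from earlier in the discussion and are hypothetical.

```python
# Triples are plain (subject, predicate, object) tuples using prefixed names.
Triple = tuple[str, str, str]

# ASSUMPTION: duv:Generates points from the activity to the entity,
# i.e. it is read as the inverse of prov:wasGeneratedBy.
INVERSE_MAPPINGS = {
    "prov:wasGeneratedBy": "duv:Generates",
}


def to_data_usage(triples: list[Triple]) -> list[Triple]:
    """Rewrite mapped PROV statements into their (assumed) Data Usage form."""
    out: list[Triple] = []
    for s, p, o in triples:
        if p in INVERSE_MAPPINGS:
            out.append((o, INVERSE_MAPPINGS[p], s))  # swap subject and object
        else:
            out.append((s, p, o))  # keep unmapped PROV terms unchanged
    return out


# Hypothetical graph: a table generated by extracting it from a PDF.
prov_graph = [
    (":table1", "prov:wasGeneratedBy", ":pdfExtraction"),
    (":pdfExtraction", "prov:used", ":reportPdf"),
]
for triple in to_data_usage(prov_graph):
    print(*triple)
```

Unmapped PROV terms pass through unchanged, which fits reusing PROV as a base and only adding Data Usage terms where the perspective differs.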
