Re: thoughts on the dataset usage vocab from Annette Greiner on 2015-08-13 (public-dwbp-wg@w3.org from August 2015)

From: Annette Greiner <amgreiner@lbl.gov>
Date: Thu, 13 Aug 2015 13:27:12 -0700
To: Eric Stephan <ericphb@gmail.com>
Cc: Bernadette Farias Loscio <bfl@cin.ufpe.br>, "Purohit, Sumit" <sumit.purohit@pnnl.gov>, Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
Message-Id: <B4D4492D-9A30-4801-94E2-71A90C2F77A9@lbl.gov>
Hi Eric,
(1) I was thinking it would be good to support the notion of fragment identifiers. I think that is more granular than a dataset.
(2) prov:Usage is interesting. The example of a podium being used leaves me wondering how/whether the authors expected it to be used in the context of datasets. It does seem like that’s something prov should be concerned with. I don’t know how well that would work in a context that was not using the rest of prov.
-Annette
--
Annette Greiner
NERSC Data and Analytics Services
Lawrence Berkeley National Laboratory
510-495-2935

On Aug 11, 2015, at 2:54 PM, Eric Stephan <ericphb@gmail.com> wrote:

> Hi Annette, 
> 
> Sorry for the long delay in responding to you. I'll respond in two parts:
> 
> 1) On the question of granularity, I am wondering whether the resolution for issue 169 (https://www.w3.org/2013/dwbp/track/issues/169), discussed at the July 3 meeting, is sufficient to meet your needs, since it includes both the Dataset and the Distribution in the model. We still need to update the model. Does this meet your granularity needs?
> 
> 2) From a usage standpoint, the way you are describing usage seems to tie in with the previous discussions (see http://www.w3.org/2013/meeting/dwbp/2015-07-03 and search for "Eric Stephan: good comments about vocab reuse" for the beginning of the discussion on https://www.w3.org/2013/dwbp/track/issues/170), where we looked at reusing the PROV vocabulary to describe usage. I did a bit more digging and found a discussion of the class "prov:Usage" at http://www.w3.org/TR/2013/REC-prov-o-20130430/#Usage . The example shows how "prov:Role" conveys the context of the usage.
> 
> Thoughts?
> 
> Cheers,
> 
> Eric 
> 
>  
> 
> On Mon, Jul 27, 2015 at 1:10 AM, Annette Greiner <amgreiner@lbl.gov> wrote:
> Hi Eric,
> Following up on Friday's discussion about the DUV, first I want to say that you have done a great job of thinking through a lot of the relationships between existing vocabularies and terms that we would need in a dataset usage vocabulary. There is already a lot of good information and useful stuff in there. I'd like to push things a little further toward addressing the use cases that come to mind for me when I think about what a dataset usage vocabulary might offer. As a developer, I want to find out about uses that others are making of the data that I make available, and there are a few aspects of those usages that are of particular interest. I think it would be very helpful if the vocabulary could provide means of expressing them.
> 
> It would be interesting to know whether others are using the full dataset or parts thereof. That helps me understand what is deemed useful and helps prioritize future work. One of the reasons I've been thinking of positioning an instance of dataset usage as an oa:annotation is that those annotations can apply at a pretty granular level, so it would be possible to express the usage of a subset of a dataset.
> 
> It would be useful to know whether others are using a dataset that I've published as an ongoing dependency or not. That is, did they pull the data once and are they using it without need to pull again, or are they calling the API at runtime? It's pretty common for at least one project I've worked on (the Materials Project) to have users that pull from their API a single time, to get a database of their own from which they can work locally. It is also possible for them to create a new web application that calls the API at runtime, which creates a dependency. If I needed to inform those who were using my API on an ongoing basis of some issue, knowing which people's work had dependencies on it would be a great help.
> 
> It would be useful for reporting to granting agencies to know how a published dataset is being used, whether for analysis, republishing, visualization, remixing, citation, description, correction, rating, critique, or feedback. Some of these uses have much clearer value to the granting agency than others.
> 
> In the current model, it seems that feedback is the sole term that inherits from oa:annotation. I think of feedback as just one type of usage, and it seems more logical to me to have all types of usage inherit from oa:annotation, so that one can annotate the dataset with any of them. I imagine the original dataset would be the target and the new usage would be the body of an annotation with a motivation like "commenting" or "describing", or an extension motivation such as "visualizing" or "analyzing" or "remixing".
> 
> -Annette
>
Received on Thursday, 13 August 2015 20:29:51 UTC