Re: dwbp-ISSUE-94 (Git for data): Dataset versioning and dataset replication [Use Cases & Requirements Document]

About this issue

I read Rufus's post last year and paid attention to the cases that could be
relevant to the WG. Although OKFN has implemented many tools for data
using git concepts, I think the best experience we have had with
GitHub and data was a hackathon we held with the Ministry of
Justice. We received "data" as PDF files [1], and the participants decided to
convert these files to CSV; all of the work was done on
GitHub.

We opened a list of issues and worked on the dataset until it was
ready to be used in the projects [2].

Indeed, all the items Peter Hanečák raised in the issue proved true
for git:

- track changes in data
- provide the possibility to review the history of changes
- provide an audit trail
- get access to any previous version of the data, not only to the most
  recent version
- get dataset updates more efficiently

However, I keep wondering whether it is appropriate for the WG to
endorse a particular version control tool for data as a best practice,
or whether we should just list use cases and try to extract procedures
that can become best practices.

If the second option is the right one, maybe we can look at http://data.okfn.org/

Just to illustrate what I mean by tools, this one [3] for
geodata versioning is also interesting and is inspired by git's structure.


[1] https://github.com/W3CBrasil/PerguntasMJ/issues?q=is%3Aissue+is%3Aclosed

[2] http://dados.gov.br/dataset?groups=defesa-seguranca&tags=acidentes+de+tr%C3%A2nsito

[3] http://geogig.org/


yaso
On 11/10/14, 5:49 PM, Data on the Web Best Practices Working Group Issue
Tracker wrote:
> dwbp-ISSUE-94 (Git for data): Dataset versioning and dataset replication [Use Cases & Requirements Document]
> 
> http://www.w3.org/2013/dwbp/track/issues/94
> 
> Raised by: Phil Archer
> On product: Use Cases & Requirements Document
> 
> Another use case (https://www.w3.org/2013/dwbp/wiki/Second-Round_Use_Cases#Dataset_versioning_and_dataset_replication) from Peter Hanečák poses the problem of tracking changes to datasets, which, AFAIAC, is part of provenance, but he goes deeper than that, which might be instructive. He goes on to suggest that the way to provide this info is to host datasets in a Git repository.
> 
> How does the WG wish to handle this use case?


-- 
Brazilian Internet Steering Committee - CGI.br
W3C Brazil Office
@yaso - yaso.eu

55 11 5509-3537 (4025)
skype: yasocordova

Received on Tuesday, 11 November 2014 19:07:21 UTC