Timisoara/Scribe/Robert Ulrich - re3data.org - making research data repositories discoverable
From Share-PSI EC Project
- Facilitator: Robert Ulrich
- Scribe: Benedikt Kämpgen
Summary
- Publishing open research data poses problems similar to those of publishing general Open Data, but aggravated:
- Diversity of data, requirements, use cases.
- No technicians
- Not to mention archiving requirements
- Tools for publishing research data are available
- Data repositories (re3data.org lists more than 1,000 of them)
- Semantic MediaWiki
- Linked Data
- However, we see the fundamental problem in the way many researchers do their research:
- Downloading data
- Transforming it into formats of commercial tools
- Being funded for their publications and not for datasets.
- A more web-oriented way would be better:
- To have research pipelines that are fully reproducible.
- Access and produce Open Research Data over the web.
- Possible best practices:
- Research data should not be forgotten as part of PSI and should serve as a stress test for the results of Share-PSI.
- Researchers should be trained in data science techniques.
Scribe Notes
- Research data is very specific to each field.
Round of introductions:
- Robert Ulrich
- http://www.re3data.org/
- Here for gathering feedback about a best practice in publishing research data
- Camelia
- What exactly is published?
- Robert
- No research papers.
- Research data repositories are indexed.
- Andrew McKenzie
- Research and teaching
- Open Knowledge Foundation
- Open Academy
- Data Journalism
- Social sciences
- Camelia Margea
- More recently confronted with the trend towards openness during activities for the Faculty’s journal:
- http://www.tjeb.ro
- West University of Timisoara, Faculty of Economics and Business Administration - Business Information Systems Department
- Opening up research
- Adina Barbulescu
- West University of Timisoara, Faculty of Economics and Business Administration - Economics department
- Benedikt
- KIT in Karlsruhe
- Two ways to publish data
- Semantic MediaWiki
- Linked Data (RDF)
- Andras
- Hungary
- Research institute of informatics
- Social science repository
- Publishing a bilingual economics journal.
- RePEc (indexing mechanism for articles, intelligent journal: text, data, algorithms)
- RDF browser and editor (LODmilla)
- Social science data archive: http://openarchive.tk.mta.hu/117/
- Sharing Linked Data visualization: http://lodmilla.sztaki.hu/lodmilla/?id=hhaeuotogto2k565i7644folcg
- Benedikt: How much manual effort is needed to connect articles with datasets?
- Andras: Too much for most researchers.
- Benedikt: I guess the problem is that annotation is only done after the research is finished.
- Wondering whether there are approaches for publishing your data during the research.
- Robert: 1) This is a problem because of privacy and data security. 2) Researchers do not want to make raw data available.
- Andrew: Also, even after the paper has been published, they would not do it.
- It is also a funding topic. If people are funded to publish their data, they are more willing to do it.
- Andras: Benefit: having the dataset as a publication in its own right.
- Andrew
- A lot of what drives Open Data is outside of the public market.
- People need rewards; money may not be enough, but publication can be one.
- Raw data usually are not project deliverables (as final published papers are); this could be another reason why researchers are not motivated to publish data, even if the research is funded (and thus financially rewarded). Robert and Camelia agreed.
- Andrew also said that researchers can be motivated (or informed about that aspect) by citations to their work, which could actually be doubled when both paper and data are referenced.
- Robert
- Scientists are coming and asking how to publish data.
- Researchers are focusing on their research; they only want a short "how to publish research data" guide.
- Benedikt: How to compare data portals?
- Robert: icon system
- Ranked by the icon
- Currently, around 1,000 data repositories are indexed.
- Some would also demand money for their service.
- Adina
- Are repositories categorised?
- Robert: Categorisation is difficult if it goes into details.
- Robert
- We use classification of DFG in Germany.
- The fields are adapted every two years.
- Andrew: Do you scrape the data, or how do you organise?
- We collect research data repositories manually.
- We have experts evaluating such repositories.
- Currently manual.
- Goal would be to crawl the metadata about the repository.
- Andras:
- Why not link to DBpedia?
- Andrew
- Would it not be possible to broadcast instead of approaching single repositories?
- Robert:
- Librarians would concentrate on the accuracy of the data.
- Technicians (with a Web background) concentrate more on completeness and on minimising manual effort.
- Benedikt:
- What are the disadvantages of simply providing your research data as CSV or RDF on the institute's website?
- Robert:
- The problem is long-term preservation.
- There are discussions about what data to forget.
- Andras:
- I.e. data cemetery
- Also magnetic tape?
- Robert: Still problematic because there may not be software to read the data.
- Andras:
- What are your plans?
- Archive may not be large.
- Andrew
- I think you should view the archiving problem separately from the problem of making research data available during a project.
- The problem is the funding of the publishers.
- Andrew
- Are there tools?
- We had problems with DCAT.
- Are there tools to automatically create RDF?
- Andras
- The problem is that researchers use commercial software.
- Insufficient interoperability of tools.
- Requires a change in how data-driven research is done.
- Camelia
- Probably also a teaching issue.
- Robert: Old habits are hard to change (he has observed over time how reluctantly users switch from Google Drive to a wiki; in fact, almost nobody switches, no matter how friendly the editor/GUI is).
- Camelia
- Presenting open-source alternatives to each piece of software could help prepare future researchers to see and use their data in other ways (a broader perspective).
- Probably also a teaching issue.
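Illustration (not from the session): Andrew's question about tools that create RDF automatically, and Benedikt's point about publishing data as CSV or RDF, can be made concrete with a minimal sketch. It assumes a CSV file with one identifier column and turns each remaining cell into an N-Triples statement using only the Python standard library. The namespace `http://example.org/`, the function names and the sample rows are hypothetical; a real pipeline would map columns to an agreed vocabulary instead of generating property URIs from column names.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, chosen only for this illustration.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Turn every non-empty cell into one triple: record, column, value."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}record/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the identifier itself, surplus cells (key None) and empty cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Made-up sample data, only to show the shape of the output.
sample = "id,label,value\n1,alpha,10\n2,beta,20\n"
lines = csv_to_ntriples(sample, "id")
for line in lines:
    print(line)
```

All values are emitted as plain string literals; typed literals and links to existing resources (for example DBpedia, as Andras suggested) would need an explicit mapping on top of this.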