Timisoara/Scribe/Robert Ulrich - re3data.org - making research data repositories discoverable

From Share-PSI EC Project
  • Facilitator: Robert Ulrich
  • Scribe: Benedikt Kämpgen

Summary

  • Publishing open research data poses problems similar to those of publishing general Open Data, only aggravated:
    • Diversity of data, requirements, use cases.
    • No technicians.
    • Not to mention archiving requirements.
  • Tools for publishing research data are available
    • Data repositories (re3data.org indexes more than 1,000 of them)
    • Semantic MediaWiki
    • Linked Data
  • However, we see the fundamental problem in the way many researchers do their research:
    • Downloading data
    • Transforming it into formats of commercial tools
    • Being funded for their publications and not for datasets.
  • A "webby" way would be better:
    • To have research pipelines that are fully reproducible.
    • To access and produce Open Research Data over the web.
    • Possible best practices:
      • Research data should not be forgotten as part of PSI and should serve as a stress test for the results of Share-PSI.
      • Researchers should be trained in data science techniques.

Scribe Notes

  • Research data is very specific to the fields.

Everyone introducing themselves:

  • Robert Ulrich
    • http://www.re3data.org/
    • Here for gathering feedback about a best practice in publishing research data
  • Camelia
    • What exactly is published?
  • Robert
    • No research papers.
    • Research data repositories are indexed.
  • Andrew McKenzie
    • Research and teaching
    • Open Knowledge Foundation
    • Open Academy
    • Data Journalism
    • Social sciences
  • Camelia Margea
    • More recently confronted with the opening-up of research during activities for the faculty's journal: http://www.tjeb.ro
    • West University of Timisoara, Faculty of Economics and Business Administration - Business Information Systems Department
    • Opening up research
  • Adina Barbulescu
    • West University of Timisoara, Faculty of Economics and Business Administration - Economics department
  • Benedikt
    • KIT in Karlsruhe
    • Two ways to publish data
      • Semantic MediaWiki
      • Linked Data (RDF)
  • Andras
    • Hungary
    • Research institute of informatics
    • Social science repository
    • Publishing a bilingual economics journal.
    • RePEc (indexing mechanism for articles, intelligent journal: text, data, algorithms)
    • RDF browser and editor (LODmilla)
    • Social science data archive: http://openarchive.tk.mta.hu/117/
    • Sharing Linked Data visualization: http://lodmilla.sztaki.hu/lodmilla/?id=hhaeuotogto2k565i7644folcg

  • Benedikt: How much manual effort is it to connect articles with datasets?
    • Andras: Too much for most researchers.
  • Benedikt: I guess the problem is that annotation is only done after the research is finished.
  • Wondering whether there are approaches for publishing your data during the research.
  • Robert: 1) This is a problem because of privacy and data security. 2) Researchers do not want to make raw data available.
  • Andrew: Also, even after the paper has been published, they would not do it.
  • It is also a funding topic. If people are funded to publish their data, they are more willing to do it.
  • Andras: Benefit: the dataset counts as a publication in its own right.
  • Andrew
    • A lot of what drives Open Data is outside of the public market.
    • People need rewards; money may not be enough, but publication credit may be.
    • Raw data usually are not project deliverables (as final published papers are). This could be another reason why researchers are not motivated to publish data, even if the research is funded (and thus financially rewarded). Robert and Camelia agreed.
    • Andrew also said that researchers can be motivated (or informed about that aspect) by citations to their work, which could actually double when both paper and data are referenced.


  • Robert
    • Scientists are coming and asking how to publish data.
    • Researchers are focused on their research; they only want a short "how to publish research data" guide.
  • Benedikt: How to compare data portals?
    • Robert: icon system
    • Ranked by the icons
    • Around 1,000 data repositories are currently indexed.
    • Some also charge for their service.
  • Adina
    • Are repositories categorised?
    • Robert: Categorisation is difficult if it goes into details.
  • Robert
    • We use the classification of the DFG in Germany.
    • The fields are adapted every two years.
  • Andrew: Do you scrape the data, or how do you organise it?
    • Robert: We collect research data repositories manually.
    • We have experts evaluating such repositories.
    • Currently manual.
    • The goal would be to crawl the metadata about the repositories.
  • Andras:
    • Why not link to DBpedia?
  • Andrew
    • Would it not be possible to broadcast instead of approaching individual repositories?
  • Robert:
    • Librarians would concentrate on the accuracy of the data.
    • Technicians (with a Web background) concentrate more on completeness and on minimising manual effort.
  • Benedikt:
    • What are the disadvantages of simply providing your research data as CSV or RDF on the institute's website?
    • Robert:
      • The problem is long term preservation.
      • There are discussions about what data to forget.
    • Andras:
      • I.e., a data cemetery
      • Also magnetic tape?
      • Robert: Still problematic because there may not be software to read the data.
  • Andras:
    • What are your plans?
    • Archive may not be large.
  • Andrew
    • I think you should view the archiving problem separately from the problem of making research data available during a project.
    • The problem is the funding of the publishers.
  • Andrew
    • Are there tools?
    • We had problems with DCAT.
    • Are there tools to automatically create RDF?
  • Andras
    • The problem is that researchers use commercial software.
    • Not sufficient interoperability of tools.
    • Requires a change in how data-driven research is done.
  • Camelia
    • Probably also a teaching issue.
      • Robert: Old habits are hard to change (he has observed over time how hard it is for users to switch from Google Drive to a wiki; in practice almost nobody switches, no matter how friendly the editor/GUI is).
      • Camelia
        • Presenting open source alternatives to each piece of software could help prepare future researchers to see and use their data in other ways (a broader perspective).
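
Editorial illustration of the question raised above about creating RDF automatically from tabular research data: the following is a minimal sketch, not a tool mentioned by any participant. It turns CSV text into N-Triples lines using only the Python standard library. The base URI, the "row/N" and "column/NAME" URI scheme, and the sample values are all assumptions made up for the example; a real publication would map columns to an agreed vocabulary and add datatypes.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, base_uri: str) -> list[str]:
    """Emit one triple per non-empty cell: row URI, column URI, plain literal.

    The URI layout (base_uri + "row/N", base_uri + "column/NAME") is a
    hypothetical convention chosen for this sketch only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for index, row in enumerate(reader, start=1):
        subject = f"<{base_uri}row/{index}>"
        for column, value in row.items():
            # Skip surplus cells (column is None) and empty or missing cells.
            if column is None or not value:
                continue
            predicate = f"<{base_uri}column/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Invented sample data, for illustration only.
sample = "station,year,value\nA,2014,3.1\nB,2014,4.2\n"
for line in csv_to_ntriples(sample, "http://example.org/dataset/"):
    print(line)
```

Every value is emitted as a plain string literal, which keeps the sketch short but loses type information; this also does nothing for the long-term preservation concern Robert raised.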