Precise Data Identification Services for Long Tail Research Data

by Stefan Proell, Kristof Meixner, Andreas Rauber

Published by Figshare.

2016  

Abstract

While sophisticated research infrastructures assist scientists in managing massive volumes of data, the so-called long tail of research data frequently suffers from a lack of such services. This is mostly due to the complexity caused by the variety of data to be managed and a lack of easily standardisable procedures in highly diverse research settings. Yet, as even domains in this long tail of research data are increasingly data-driven, scientists need efficient means to communicate precisely which version and subset of data was used in a particular study, to enable reproducibility and comparability of results and to foster data re-use.

This paper presents three implementations of systems supporting such data identification services for comma-separated value (CSV) files, a dominant format for data exchange in these settings. The implementations are based on the recommendations of the Working Group on Dynamic Data Citation of the Research Data Alliance (RDA). They provide implicit change tracking of all data modifications, while precise subsets are identified via the respective subsetting process. This enhances the reproducibility of experiments and allows efficient sharing of specific subsets of data even in highly dynamic data settings.
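The general idea of timestamped change tracking combined with subset identification can be illustrated with a minimal sketch. It is not one of the three implementations described in the paper; the names `VersionedTable` and `identify_subset`, the logical clock, and the equality-only filter format are all assumptions made for illustration.

```python
import csv
import hashlib
import io
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Hypothetical append-only store: rows are never overwritten, only closed."""
    rows: list = field(default_factory=list)
    clock: int = 0  # logical timestamp (assumption; a real system would use time)

    def _tick(self):
        self.clock += 1
        return self.clock

    def _close(self, key, now):
        # Mark the currently valid version of a record as deleted at `now`.
        for row in self.rows:
            if row["key"] == key and row["deleted"] is None:
                row["deleted"] = now

    def insert(self, key, data):
        now = self._tick()
        self.rows.append({"key": key, "data": dict(data),
                          "added": now, "deleted": None})
        return now

    def update(self, key, data):
        # An update closes the old version and adds a new one at the same time.
        now = self._tick()
        self._close(key, now)
        self.rows.append({"key": key, "data": dict(data),
                          "added": now, "deleted": None})
        return now

    def as_of(self, timestamp):
        # Return the rows that were valid at the given timestamp.
        return [r for r in self.rows
                if r["added"] <= timestamp
                and (r["deleted"] is None or r["deleted"] > timestamp)]


def identify_subset(table, filters, columns, timestamp):
    """Re-execute a stored subset definition against the state at `timestamp`."""
    selected = [r for r in table.as_of(timestamp)
                if all(r["data"].get(c) == v for c, v in filters.items())]
    selected.sort(key=lambda r: r["key"])  # fixed order keeps the hash stable
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for r in selected:
        writer.writerow([r["data"][c] for c in columns])
    content = buf.getvalue()
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return {"filters": filters, "columns": columns, "timestamp": timestamp,
            "sha256": digest, "csv": content}


table = VersionedTable()
table.insert(1, {"site": "A", "temp": "20.1"})
t1 = table.insert(2, {"site": "A", "temp": "19.5"})
t2 = table.update(2, {"site": "A", "temp": "19.8"})

old = identify_subset(table, {"site": "A"}, ["site", "temp"], t1)
new = identify_subset(table, {"site": "A"}, ["site", "temp"], t2)
```

In this sketch, the subset used in a study is described by its filters, columns, and timestamp rather than by a copy of the data, and the hash allows a later re-execution to be checked against the original result even after the data has changed.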

Type  article-journal
Stage   published
Date   2016-09-22