Identifying data sharing in biomedical literature

by Heather A Piwowar, Wendy W Chapman

Published in AMIA Annual Symposium Proceedings.

2008, pp. 596-600

Abstract

Many policies and projects now encourage investigators to share their raw research data with other scientists. Unfortunately, it is difficult to measure the effectiveness of these initiatives because data can be shared through many different mechanisms and in many different locations. We propose a novel approach to finding shared datasets: using natural language processing (NLP) techniques to identify declarations of dataset sharing within the full text of primary research articles. Using regular expression patterns and machine learning algorithms on open access biomedical literature, our system identified 61% of the articles with shared datasets (recall) at 80% precision. A simpler version of our classifier achieved higher recall (86%), though lower precision (49%). We believe our results demonstrate the feasibility of this approach and hope to inspire further study of dataset retrieval techniques and policy evaluation.
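The general idea of matching sharing declarations with regular expressions and scoring the result by precision and recall can be sketched as follows. This is only an illustrative sketch: the two patterns and the four example sentences are invented for this record and are not the expressions, features, or data used by the authors, and the machine learning step mentioned in the abstract is not shown.

```python
import re

# Hypothetical patterns (assumptions, not taken from the paper): they flag
# sentences that mention depositing data in a database or repository, or
# that mention accession numbers.
SHARING_PATTERNS = [
    re.compile(
        r"\b(?:data|datasets?)\b[^.]*\b(?:deposited|submitted|available)\b"
        r"[^.]*\b(?:database|repository)\b",
        re.IGNORECASE,
    ),
    re.compile(r"\baccession\s+(?:numbers?|codes?)\b", re.IGNORECASE),
]


def declares_sharing(text):
    """Return True if any pattern matches the given article text."""
    return any(p.search(text) for p in SHARING_PATTERNS)


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if not p and a)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy sentences with hand-assigned labels (True = data was shared).
EXAMPLES = [
    ("The microarray data were deposited in a public database.", True),
    ("Raw data are available from the authors upon request.", True),
    ("Accession numbers for the reference genomes we reused are in Table 1.", False),
    ("Patients were recruited from three clinics.", False),
]

predicted = [declares_sharing(text) for text, _ in EXAMPLES]
actual = [label for _, label in EXAMPLES]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On these toy sentences the patterns miss the "upon request" declaration (lowering recall) and wrongly flag the reuse sentence (lowering precision), which is the same kind of trade-off the abstract reports between its two classifier versions.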

Archived Files and Locations

application/pdf  181.8 kB
file_zl3e53ov45faveqdahotekwpc4
europepmc.org (repository)
web.archive.org (webarchive)
Type: article-journal
Stage: published
Date: 2008-11-06
Language: en
PubMed: 18998887
PMC: PMC2655927
Proceedings Metadata
Open Access Publication
Not in DOAJ
In ISSN ROAD
In Keepers Registry
ISSN-L:  1559-4076