Data Curation with Deep Learning [Vision]

by Saravanan Thirumuruganathan, Nan Tang, Mourad Ouzzani, AnHai Doan

Released as an article.

2019  

Abstract

Data curation - the process of discovering, integrating, and cleaning data - is one of the oldest and hardest data management problems, yet it is unavoidable. Despite decades of effort from both researchers and practitioners, it remains one of the most time-consuming and least enjoyable tasks for data scientists. In most organizations, data curation plays an important role in fully unlocking the value of big data. Unfortunately, current solutions are not keeping up with the ever-changing data ecosystem, because they often require a high human cost. Meanwhile, deep learning has achieved remarkable successes in multiple areas, such as image recognition, natural language processing, and speech recognition. In this vision paper, we explore how some of the fundamental innovations in deep learning could be leveraged to improve existing data curation solutions and to help build new ones. In particular, we provide a thorough overview of the current deep learning landscape, identify interesting research opportunities, and dispel common myths. We hope that the synthesis of these important domains will unleash a series of research activities that will lead to significantly improved solutions for many data curation tasks.

Archived Files and Locations

application/pdf  4.2 MB
arxiv.org (repository)
web.archive.org (webarchive)
Type: article
Stage: submitted
Date: 2019-03-24
Version: v2
Language: en
arXiv: 1803.01384v2