Data Curation with Deep Learning [Vision]
release_ymch4jazxzanzpv7dbmhl5beiy
by
Saravanan Thirumuruganathan, Nan Tang, Mourad Ouzzani, AnHai Doan
2019
Abstract
Data curation (the process of discovering, integrating, and cleaning data) is
one of the oldest and hardest, yet most unavoidable, data management problems.
Despite decades of effort from both researchers and practitioners, it remains
one of the most time-consuming and least enjoyable tasks for data scientists.
In most organizations, data curation is essential to fully unlocking the value
of big data. Unfortunately, current solutions are not keeping up with the
ever-changing data ecosystem, because they often require substantial human
cost. Meanwhile, deep learning has achieved remarkable success in multiple
areas, such as image recognition, natural language processing, and speech
recognition. In this vision paper, we explore how some of the fundamental
innovations in deep learning could be leveraged to improve existing data
curation solutions and to help build new ones. In particular, we provide a
thorough overview of the current deep learning landscape, identify interesting
research opportunities, and dispel common myths. We hope that the synthesis of
these important domains will unleash a series of research activities that will
lead to significantly improved solutions for many data curation tasks.
arXiv:1803.01384v2