End-to-End Entity Resolution for Big Data: A Survey
release_xw4jd57oyzdepkcnsrjkhub33y
by Vassilis Christophides, Vasilis Efthymiou, Themis Palpanas, George Papadakis, Kostas Stefanidis
2019
Abstract
One of the most important tasks for improving data quality and the
reliability of data analytics results is Entity Resolution (ER). ER aims to
identify different descriptions that refer to the same real-world entity, and
remains a challenging problem. While previous works have studied specific
aspects of ER (and mostly in traditional settings), in this survey, we provide
for the first time an end-to-end view of modern ER workflows, and of the novel
aspects of entity indexing and matching methods in order to cope with more than
one of the Big Data characteristics simultaneously. We present the basic
concepts, processing steps and execution strategies that have been proposed by
different communities, i.e., database, semantic Web and machine learning, in
order to cope with the loose structuredness, extreme diversity, high speed and
large scale of entity descriptions used by real-world applications. Finally, we
provide a synthetic discussion of the existing approaches, and conclude with a
detailed presentation of open research directions.
Archived Files and Locations
application/pdf 7.4 MB
file_dngqytrsknablkiyaytrpt2h7i
arxiv.org (repository), web.archive.org (webarchive)
1905.06397v2