Benchmarking Declarative Approximate Selection Predicates
by Oktie Hassanzadeh, 2009
Abstract
Declarative data quality has been an active research topic. The fundamental
principle behind a declarative approach to data quality is the use of
declarative statements to realize data quality primitives on top of any
relational data source. A primary advantage of such an approach is the ease of
use and integration with existing applications. Several similarity predicates
have been proposed in the past for common quality primitives (approximate
selections, joins, etc.) and have been fully expressed using declarative SQL
statements. In this thesis, new similarity predicates are proposed along with
their declarative realization, based on notions of probabilistic information
retrieval. Then, full declarative specifications of previously proposed
similarity predicates in the literature are presented, grouped into classes
according to their primary characteristics. Finally, the thesis presents a
thorough performance and accuracy study comparing a large number of similarity
predicates for data cleaning operations.
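A "declarative realization" means the similarity predicate runs as ordinary SQL over the relational source. The minimal sketch below illustrates the idea. It assumes a q-gram Jaccard predicate, a classic token-based measure; this is not necessarily the thesis's exact formulation. It also assumes SQLite as the relational engine. Tokenization is done in Python for brevity, while the similarity score and threshold selection are a single SQL query. The table names (base, base_tokens, query_tokens), the function names (build_index, approximate_select) and the q-gram length are hypothetical choices made for illustration.

```python
import sqlite3

Q = 2  # hypothetical q-gram length, chosen only for illustration


def qgrams(s: str, q: int = Q) -> set[str]:
    # Pad with '#' and '$' so that prefixes and suffixes yield their own q-grams.
    padded = "#" * (q - 1) + s.lower() + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def build_index(records: list[str]) -> sqlite3.Connection:
    # Hypothetical schema: base strings plus one row per distinct (tid, q-gram).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE base(tid INTEGER PRIMARY KEY, str TEXT)")
    conn.execute("CREATE TABLE base_tokens(tid INTEGER, token TEXT, PRIMARY KEY (tid, token))")
    conn.execute("CREATE TABLE query_tokens(token TEXT PRIMARY KEY)")
    for tid, s in enumerate(records):
        conn.execute("INSERT INTO base VALUES (?, ?)", (tid, s))
        conn.executemany("INSERT INTO base_tokens VALUES (?, ?)",
                         [(tid, g) for g in qgrams(s)])
    return conn


# Jaccard(A, B) = |A ∩ B| / (|A| + |B| - |A ∩ B|), computed entirely in SQL.
SELECT_SQL = """
WITH sizes AS (
    SELECT tid, COUNT(*) AS n FROM base_tokens GROUP BY tid
),
overlap AS (
    SELECT b.tid, COUNT(*) AS inter
    FROM base_tokens AS b JOIN query_tokens AS q ON b.token = q.token
    GROUP BY b.tid
),
scored AS (
    SELECT o.tid,
           CAST(o.inter AS REAL)
             / (s.n + (SELECT COUNT(*) FROM query_tokens) - o.inter) AS jaccard
    FROM overlap AS o JOIN sizes AS s ON s.tid = o.tid
)
SELECT r.tid, r.str, sc.jaccard
FROM scored AS sc JOIN base AS r ON r.tid = sc.tid
WHERE sc.jaccard >= ?
ORDER BY sc.jaccard DESC, r.tid
"""


def approximate_select(conn: sqlite3.Connection, query: str, threshold: float):
    # Load the query's q-grams into a table, then let SQL do the selection.
    conn.execute("DELETE FROM query_tokens")
    conn.executemany("INSERT INTO query_tokens(token) VALUES (?)",
                     [(g,) for g in qgrams(query)])
    return conn.execute(SELECT_SQL, (threshold,)).fetchall()


conn = build_index(["Oktie Hassanzadeh", "Oktie Hasanzadeh", "Jon Smith"])
for tid, s, score in approximate_select(conn, "Oktie Hassanzadeh", 0.5):
    print(tid, s, round(score, 3))
# prints:
# 0 Oktie Hassanzadeh 1.0
# 1 Oktie Hasanzadeh 0.944
```

The misspelled record shares 17 of the query's 18 bigrams, so its score is 17/18. "Jon Smith" shares only one bigram and falls below the threshold. Any relational engine that supports joins, grouping and common table expressions could run the same query.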
arXiv: 0907.2471v1