An Efficient Data Structure for Fast Mining High Utility Itemsets
by
Zhi-Hong Deng, Shulei Ma, He Liu
2015
Abstract
In this paper, we propose a novel data structure called the PUN-list, which
maintains both the utility information of an itemset and a utility upper bound
to facilitate the mining of high utility itemsets. Based on
PUN-lists, we present a method called MIP (Mining high utility Itemset using
PUN-Lists) for fast mining of high utility itemsets. The efficiency of MIP is
achieved with three techniques. First, itemsets are represented by a highly
condensed data structure, the PUN-list, which avoids costly, repeated utility
computation. Second, the utility of an itemset can be calculated efficiently by
scanning its PUN-list, and the PUN-lists of long itemsets can be
constructed quickly from the PUN-lists of shorter itemsets. Third, by using the
utility upper bound stored in the PUN-lists as the pruning strategy, MIP
discovers high utility itemsets directly from the search space, a
set-enumeration tree, without generating numerous candidates. Extensive
experiments on various synthetic and real datasets show that the PUN-list is very
effective: on average, MIP is at least an order of magnitude faster than recently
reported algorithms.
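To make the problem setting concrete, the following is a minimal sketch of high utility itemset mining over a set-enumeration tree with upper-bound pruning. It does not reproduce the PUN-list, whose layout the abstract does not specify. Instead it assumes the simpler, widely used transaction-weighted utility (TWU) as the upper bound, and it rescans the database for every candidate, which is exactly the repeated cost a condensed list structure is meant to avoid. The toy database, the profit table, and the function names are all hypothetical.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 3},
]
# Hypothetical unit profit of each item.
profit = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions, profit):
    """Utility of `itemset`: quantity * profit of its items, summed over
    every transaction that contains all of them."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def twu(itemset, transactions, profit):
    """Transaction-weighted utility: total utility of all transactions that
    contain `itemset`. It is an upper bound on the utility of the itemset
    and of every superset (assumed bound, not the paper's)."""
    return sum(
        sum(q * profit[i] for i, q in t.items())
        for t in transactions
        if all(i in t for i in itemset)
    )


def mine(transactions, profit, min_util):
    """Depth-first walk of the set-enumeration tree, pruning any branch
    whose upper bound is below `min_util`."""
    items = sorted(profit)
    result = {}

    def expand(prefix, start):
        for k in range(start, len(items)):
            cand = prefix + (items[k],)
            if twu(cand, transactions, profit) < min_util:
                continue  # no superset of cand can reach min_util
            u = itemset_utility(cand, transactions, profit)
            if u >= min_util:
                result[cand] = u
            expand(cand, k + 1)

    expand((), 0)
    return result


high_utility = mine(transactions, profit, min_util=10)
print(high_utility)
```

With a threshold of 10, the branches for {a, b} and {b, c} are cut by the bound before their utilities are ever computed, which is the role the abstract assigns to the upper bound kept in the PUN-lists.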
arXiv: 1510.02188v1