An Efficient Data Structure for Fast Mining High Utility Itemsets

by Zhi-Hong Deng, Shulei Ma, He Liu

Released as an article.

2015  

Abstract

In this paper, we propose a novel data structure called PUN-list, which maintains both the utility information of an itemset and a utility upper bound to facilitate the mining of high utility itemsets. Based on PUN-lists, we present a method called MIP (Mining high utility Itemset using PUN-Lists) for fast mining of high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, the PUN-list, which avoids costly, repeated utility computation. Second, the utility of an itemset can be efficiently calculated by scanning its PUN-list, and the PUN-lists of long itemsets can be quickly constructed from the PUN-lists of short itemsets. Third, by employing the utility upper bound stored in the PUN-lists as the pruning strategy, MIP discovers high utility itemsets directly from the search space, a set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that the PUN-list is very effective: MIP is, on average, at least an order of magnitude faster than recently reported algorithms.
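
The abstract does not give the internal layout of a PUN-list, so the sketch below does not attempt to reproduce it. Instead it uses a simpler per-transaction list of (utility, remaining utility) pairs, in the style of the utility-list approach, to illustrate the three ideas the abstract names: reading an itemset's utility off its list, building the list of a longer itemset from the lists of shorter ones, and pruning the set-enumeration tree with a utility upper bound. All names here (`build_item_lists`, `join`, `mine`, `high_utility_itemsets`) and the toy data are assumptions made for illustration, not the authors' MIP implementation.

```python
from typing import Dict, List, Optional, Tuple

# For one itemset: transaction id -> (utility of the itemset in that
# transaction, utility of the items that come after it in the item order).
UList = Dict[int, Tuple[int, int]]


def build_item_lists(db: List[Dict[str, int]], profit: Dict[str, int]):
    """Scan the database once and build the list of every single item."""
    order = sorted(profit)  # any fixed total order on items works here
    lists: Dict[str, UList] = {item: {} for item in order}
    for tid, transaction in enumerate(db):
        present = [i for i in order if i in transaction]
        remaining = sum(transaction[i] * profit[i] for i in present)
        for i in present:
            utility = transaction[i] * profit[i]
            remaining -= utility  # utility of items after i in this transaction
            lists[i][tid] = (utility, remaining)
    return order, lists


def join(lx: UList, ly: UList, lp: Optional[UList]) -> UList:
    """List of P+x+y from the lists of P+x and P+y (lp is the list of P)."""
    out: UList = {}
    for tid, (ux, _) in lx.items():
        if tid in ly:
            uy, ry = ly[tid]
            up = lp[tid][0] if lp is not None else 0  # P was counted twice
            out[tid] = (ux + uy - up, ry)
    return out


def mine(prefix, lp, extensions, min_util, result):
    """Depth-first walk of the set-enumeration tree with upper-bound pruning."""
    for k, (x, lx) in enumerate(extensions):
        itemset = prefix + (x,)
        utility = sum(u for u, _ in lx.values())
        remaining = sum(r for _, r in lx.values())
        if utility >= min_util:
            result[itemset] = utility
        # utility + remaining bounds the utility of every superset below here.
        if utility + remaining >= min_util:
            children = [(y, join(lx, ly, lp)) for y, ly in extensions[k + 1:]]
            children = [(y, l) for y, l in children if l]
            mine(itemset, lx, children, min_util, result)


def high_utility_itemsets(db, profit, min_util):
    order, lists = build_item_lists(db, profit)
    result: Dict[Tuple[str, ...], int] = {}
    mine((), None, [(i, lists[i]) for i in order], min_util, result)
    return result


if __name__ == "__main__":
    # Toy data: each transaction maps item -> purchased quantity.
    profit = {"a": 5, "b": 2, "c": 1}
    db = [{"a": 1, "b": 2}, {"a": 2, "c": 3},
          {"a": 1, "b": 1, "c": 1}, {"b": 3, "c": 2}]
    print(high_utility_itemsets(db, profit, 15))
    # {('a',): 20, ('a', 'b'): 16, ('a', 'c'): 19}
```

In this toy run the subtree under `('a', 'b')` is explored because its bound (16 + 1 = 17) reaches the threshold, while `('a', 'b', 'c')` is visited but not reported since its utility is only 8. No separate candidate set is kept and re-verified against the database, which is the property the abstract attributes to MIP.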

Archived Files and Locations

application/pdf  1.3 MB
file_qozqro4hqbdavnetohoxqwyi2q
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2015-10-08
Version   v1
Language   en
arXiv  1510.02188v1