Forest Fire Clustering: Cluster-oriented Label Propagation Clustering and Monte Carlo Verification Inspired by Forest Fire Dynamics
by
Zhanlin Chen, Philip Tuckman, Jing Zhang, Mark Gerstein
2021
Abstract
Clustering methods group data points together and assign them group-level
labels. However, it has been difficult to evaluate the confidence of the
clustering results. Here, we introduce a novel method that not only finds
robust clusters but also provides a confidence score for the label of each data
point. Specifically, we reformulated label-propagation clustering, modeling it
after forest fire dynamics. The method has only one parameter, a fire
temperature term describing how easily one label propagates from one node to
the next. By iteratively starting label propagations across a graph, we
can discover the number of clusters in a dataset with minimal prior
assumptions. Further, we can validate our predictions and uncover the posterior
probability distribution of the labels using Monte Carlo simulations. Lastly,
our iterative method is inductive and does not need to be retrained when new
data arrive. Here, we describe the method and summarize how it performs on
common clustering benchmarks.
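
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian-kernel affinity graph and a hypothetical ignition rule: an unlabeled node catches fire when `fire_temp` times its mean affinity to the currently burning nodes exceeds 1. The function names (`gaussian_affinity`, `forest_fire_sketch`, `co_assignment`), the kernel width `sigma`, and the threshold of 1 are all illustrative assumptions. The Monte Carlo step here simply repeats the clustering from random seeds and records how often two points share a label, which is one simple stand-in for a label confidence score.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian-kernel affinity matrix with a zero diagonal (assumed graph)."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

def forest_fire_sketch(A, fire_temp, rng):
    """Start a fire at a random unlabeled node, let it spread, repeat until all are labeled."""
    n = A.shape[0]
    labels = np.full(n, -1)
    next_label = 0
    while (labels < 0).any():
        # Each new fire starts at a random unlabeled seed and defines a new cluster.
        seed = rng.choice(np.flatnonzero(labels < 0))
        labels[seed] = next_label
        while True:
            burning = labels == next_label
            unlabeled = labels < 0
            if not unlabeled.any():
                break
            # Assumed rule: heat = temperature * mean affinity to burning nodes.
            heat = fire_temp * A[burning][:, unlabeled].mean(axis=0)
            ignite = heat > 1.0
            if not ignite.any():
                break  # the fire has burned out
            labels[np.flatnonzero(unlabeled)[ignite]] = next_label
        next_label += 1
    return labels

def co_assignment(A, fire_temp, n_runs=50, seed=0):
    """Fraction of random-seed runs in which each pair of points shares a label."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    together = np.zeros((n, n))
    for _ in range(n_runs):
        labels = forest_fire_sketch(A, fire_temp, rng)
        together += labels[:, None] == labels[None, :]
    return together / n_runs
```

In this sketch the number of clusters is not supplied in advance: it is simply the number of fires needed to label every node. Lowering `fire_temp` makes fires burn out sooner and yields more, smaller clusters, while pairs with a co-assignment fraction well below 1 are the ones whose labels depend on where the fires happened to start.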
2103.11802v1