Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures
by
Martin Azizyan and Aarti Singh and Larry Wasserman
2014
Abstract
We consider the problem of clustering data points in high dimensions, i.e.
when the number of data points may be much smaller than the number of
dimensions. Specifically, we consider a Gaussian mixture model (GMM) with
non-spherical Gaussian components, where the clusters are distinguished by only
a few relevant dimensions. The method we propose is a combination of a recent
approach for learning parameters of a Gaussian mixture model and sparse linear
discriminant analysis (LDA). In addition to cluster assignments, the method
returns an estimate of the set of features relevant for clustering. Our results
indicate that the sample complexity of clustering depends on the sparsity of
the relevant feature set, while only scaling logarithmically with the ambient
dimension. Additionally, we require much milder assumptions than existing work
on clustering in high dimensions. In particular, we require neither spherical
clusters nor mean separation along the relevant dimensions.
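
The last point, that a feature can be relevant without the cluster means differing along it, can be illustrated with a small numerical sketch. This is not the authors' algorithm: it assumes a toy two-component mixture with known means and a shared covariance (all values below are hypothetical), whereas the method described above has to estimate such quantities from data. It uses the standard LDA direction beta = Sigma^{-1} (mu1 - mu2) and treats the features with non-zero entries in beta as the relevant ones.

```python
import numpy as np

# Hypothetical toy parameters (not taken from the paper).
# The two cluster means differ only in feature 0.
mu1 = np.array([1.0, 0.0, 0.0])
mu2 = np.array([-1.0, 0.0, 0.0])

# Assumed shared, non-spherical covariance: feature 1 is correlated
# with feature 0, while feature 2 is independent noise.
sigma = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Standard LDA direction: beta = Sigma^{-1} (mu1 - mu2).
beta = np.linalg.solve(sigma, mu1 - mu2)

# Features with a non-zero coefficient influence the cluster assignment.
relevant = np.flatnonzero(np.abs(beta) > 1e-12)


def assign(x):
    """Assign x to cluster 1 or 2 using the linear rule defined by beta."""
    midpoint = (mu1 + mu2) / 2.0
    return 1 if beta @ (x - midpoint) > 0 else 2


print("beta:", beta)
print("relevant features:", relevant)
```

In this sketch feature 1 receives a non-zero coefficient even though both means are zero along it, because its correlation with feature 0 carries information about the cluster label. Feature 2 receives a zero coefficient and could be discarded without affecting the assignments.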
arXiv: 1406.2206v1