Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures

by Martin Azizyan, Aarti Singh, and Larry Wasserman

Released as an article.

2014  

Abstract

We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (LDA). In addition to cluster assignments, the method returns an estimate of the set of features relevant for clustering. Our results indicate that the sample complexity of clustering depends on the sparsity of the relevant feature set, while scaling only logarithmically with the ambient dimension. Additionally, we require much milder assumptions than existing work on clustering in high dimensions. In particular, we require neither spherical clusters nor mean separation along the relevant dimensions.

Type: article
Stage: submitted
Date: 2014-06-09
Version: v1
Language: en
arXiv: 1406.2206v1