scRAE: Deterministic Regularized Autoencoders with Flexible Priors for Clustering Single-cell Gene Expression Data
by
Arnab Kumar Mondal, Himanshu Asnani, Parag Singla, Prathosh AP
2021
Abstract
Clustering single-cell RNA sequencing (scRNA-seq) data poses statistical and
computational challenges due to its high dimensionality and sparsity, the
latter also known as 'dropout' events. Recently, Regularized Auto-Encoder (RAE) based
deep neural network models have achieved remarkable success in learning robust
low-dimensional representations. The basic idea in RAEs is to learn a
non-linear mapping from the high-dimensional data space to a low-dimensional
latent space and vice versa, while simultaneously imposing a distributional prior on
the latent space, which has a regularizing effect. This paper argues
that RAEs, in their naive formulation, suffer from the well-known bias-variance
trade-off. While a simple AE without latent regularization overfits the data,
a very strong prior leads to under-representation and thus poor clustering.
To address these issues, we propose a modified RAE framework, called scRAE,
for effective clustering of single-cell RNA sequencing data. scRAE consists of
a deterministic AE with a flexibly learnable prior generator network that is
trained jointly with the AE. This allows scRAE to strike a better trade-off
between bias and variance in the latent space. We demonstrate the efficacy of
the proposed method through extensive experiments on several real-world
single-cell gene expression datasets.
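
To make the structure described in the abstract concrete, the following is a minimal sketch of one forward pass and loss evaluation for a deterministic autoencoder whose latent codes are pulled toward samples from a learnable prior generator. It is an illustration under stated assumptions, not the authors' implementation: the single-layer networks, the layer sizes, the weight `lam`, the helper names, and the toy Poisson data are all hypothetical, and the RBF-kernel MMD penalty is a stand-in chosen for brevity, since the abstract does not say which divergence scRAE uses. No training step is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Hypothetical helper: random weights and zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def rbf_mmd2(a, b, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples a and b (RBF kernel).

    Used here only as a stand-in latent regularizer (an assumption).
    """
    def kernel(x, y):
        sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return kernel(a, a).mean() + kernel(b, b).mean() - 2.0 * kernel(a, b).mean()

# Illustrative sizes, not taken from the paper.
n_cells, n_genes, latent_dim, noise_dim = 64, 200, 10, 10

# Toy sparse, count-like matrix standing in for scRNA-seq data (log1p-transformed).
x = np.log1p(rng.poisson(0.3, size=(n_cells, n_genes)).astype(float))

W_enc, b_enc = linear(n_genes, latent_dim)
W_dec, b_dec = linear(latent_dim, n_genes)
W_gen, b_gen = linear(noise_dim, latent_dim)

def encode(x):
    # Deterministic encoder: no sampling in the latent space.
    return x @ W_enc + b_enc

def decode(z):
    return z @ W_dec + b_dec

def prior_generator(eps):
    # Learnable prior: maps simple Gaussian noise to latent-space samples.
    return np.tanh(eps @ W_gen + b_gen)

z = encode(x)
x_hat = decode(z)
z_prior = prior_generator(rng.normal(size=(n_cells, noise_dim)))

recon_loss = np.mean((x - x_hat) ** 2)   # data-fit term
latent_penalty = rbf_mmd2(z, z_prior)    # matches encoded latents to the prior
lam = 1.0                                # hypothetical regularization weight
total_loss = recon_loss + lam * latent_penalty

print(f"reconstruction={recon_loss:.4f}  latent penalty={latent_penalty:.4f}")
```

In this sketch, setting `lam` to zero recovers a plain autoencoder, while a large `lam` with a fixed (non-learnable) generator corresponds to the strong-prior extreme; letting the generator's parameters be trained together with the encoder and decoder is what the abstract refers to as a flexible prior.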
arXiv: 2107.07709v1