KiWi: A Scalable Subspace Clustering Algorithm for Gene Expression
Analysis
by
Obi L. Griffith, Byron J. Gao, Mikhail Bilenky, Yuliya Prichyna,
Martin Ester, Steven J.M. Jones
2009
Abstract
Subspace clustering has gained increasing popularity in the analysis of gene
expression data. Among subspace cluster models, the recently introduced
order-preserving sub-matrix (OPSM) has demonstrated high promise. An OPSM,
essentially a pattern-based subspace cluster, is a subset of rows and columns
in a data matrix for which all the rows induce the same linear ordering of
columns. Existing OPSM discovery methods do not scale well to increasingly
large expression datasets. In particular, twig clusters, which have few genes
and many experiments, incur explosive computational costs and are pruned away
entirely by existing methods. Yet small groups of genes that are tightly
coregulated across many conditions are of particular interest. In
this paper, we present KiWi, an OPSM subspace clustering algorithm that is
scalable to massive datasets, capable of discovering twig clusters and
identifying negative as well as positive correlations. We extensively validate
KiWi using relevant biological datasets and show that KiWi correctly assigns
redundant probes to the same cluster, groups experiments with common clinical
annotations, differentiates real promoter sequences from negative control
sequences, and shows good association with cis-regulatory motif predictions.
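
The OPSM definition in the abstract (a subset of rows and columns for which all rows induce the same linear ordering of the columns) can be illustrated with a small check. This is only a sketch of the definition, not the KiWi algorithm: the function names `column_order` and `is_opsm`, the toy `expression` matrix, and the choice to treat tied values as breaking the ordering are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def column_order(row: Sequence[float], cols: Sequence[int]) -> Tuple[int, ...]:
    """Return the selected columns sorted by this row's values, ascending."""
    return tuple(sorted(cols, key=lambda c: row[c]))


def is_opsm(matrix: Sequence[Sequence[float]],
            rows: Sequence[int],
            cols: Sequence[int]) -> bool:
    """Check whether every selected row induces the same column ordering.

    Assumption: values must be strictly increasing along the shared
    ordering, so a tie within a row disqualifies the sub-matrix.
    """
    if not rows or not cols:
        return True
    # Use the first selected row to fix the candidate ordering of columns.
    reference = column_order(matrix[rows[0]], cols)
    for r in rows:
        values = [matrix[r][c] for c in reference]
        # Every row must rise strictly along the reference ordering.
        if any(a >= b for a, b in zip(values, values[1:])):
            return False
    return True


# Hypothetical toy data: rows are genes, columns are experiments.
expression: List[List[float]] = [
    [0.2, 1.5, 0.9, 3.1],  # gene 0
    [1.0, 4.0, 2.5, 7.2],  # gene 1
    [5.0, 0.3, 2.0, 1.1],  # gene 2
    [0.1, 0.8, 0.4, 0.9],  # gene 3
]

print(is_opsm(expression, rows=[0, 1, 3], cols=[0, 1, 2, 3]))
print(is_opsm(expression, rows=[0, 1, 2], cols=[0, 1, 2, 3]))
print(is_opsm(expression, rows=[0, 2], cols=[1, 3]))
```

In this toy matrix, genes 0, 1 and 3 share one ordering over all four experiments, while gene 2 agrees with gene 0 only on a smaller subset of columns. Verifying a candidate sub-matrix is cheap; the scalability problem the abstract describes lies in searching for such row and column subsets.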
Archived at arxiv.org (repository) and web.archive.org (webarchive): arXiv 0904.1931v1, application/pdf, 968.4 kB