Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support
by Marius Erbert, Steffen Rechner, Matthias Müller-Hannemann
2016
Abstract
A basic task in bioinformatics is the counting of k-mers in genome strings.
The k-mer counting problem is to build a histogram of all substrings of
length k in a given genome sequence. We present the open source k-mer
counting software Gerbil that has been designed for the efficient counting of
k-mers for k≥32. Given the technology trend towards long reads of
next-generation sequencers, support for large k becomes increasingly
important. While existing k-mer counting tools suffer from excessive memory
resource consumption or degrading performance for large k, Gerbil is able to
efficiently support large k without much loss of performance. Our software
implements a two-disk approach. In the first step, DNA reads are loaded from
disk and distributed to temporary files that are stored on a working disk. In the
second step, the temporary files are read again, split into k-mers and
counted via a hash table approach. In addition, Gerbil can optionally use GPUs
to accelerate the counting step. For large k, we outperform state-of-the-art
open source k-mer counting tools for large genome data sets.
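The abstract describes the two-disk approach only at a high level. What follows is a minimal Python sketch under stated assumptions, not Gerbil's actual implementation. It assumes reads contain only A, C, G and T, and it counts k-mers as they appear, without merging a k-mer with its reverse complement. It also assumes a routing rule the abstract does not specify: each read is cut into "super-mers" (maximal runs of consecutive k-mers that share a minimizer, here the lexicographically smallest m-mer). Every occurrence of a k-mer therefore lands in the same temporary file, and each file can be counted independently with its own hash table. All names and parameters (`count_kmers`, `m`, `num_buckets`, the bucket file names) are hypothetical. K-mers are packed at 2 bits per base. Python integers have no fixed width, but with 64-bit machine words a k-mer with k > 32 no longer fits in one word (4^32 = 2^64), which is one reason large k is harder to support efficiently.

```python
import os
import tempfile

# 2-bit code per nucleotide; reads are assumed to contain only A, C, G, T.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(seq: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base (any length in Python)."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def decode(value: int, k: int) -> str:
    """Invert encode() for a k-mer of known length k."""
    bases = []
    for _ in range(k):
        bases.append("ACGT"[value & 3])
        value >>= 2
    return "".join(reversed(bases))


def minimizer(kmer: str, m: int) -> str:
    """Hypothetical signature: the lexicographically smallest m-mer of the k-mer."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def split_into_supermers(read: str, k: int, m: int):
    """Yield (minimizer, super-mer): maximal runs of consecutive k-mers sharing a minimizer."""
    current, start = None, 0
    for i in range(len(read) - k + 1):
        mz = minimizer(read[i:i + k], m)
        if current is None:
            current, start = mz, i
        elif mz != current:
            # k-mers starting at positions start..i-1 form one super-mer.
            yield current, read[start:i - 1 + k]
            current, start = mz, i
    if current is not None:
        yield current, read[start:]


def pass_one(reads, k, m, num_buckets, workdir):
    """Step 1: distribute super-mers to temporary bucket files on the working disk."""
    paths = [os.path.join(workdir, f"bucket_{b}.txt") for b in range(num_buckets)]
    files = [open(p, "w") for p in paths]
    try:
        for read in reads:
            for mz, supermer in split_into_supermers(read, k, m):
                # All occurrences of a k-mer share its minimizer, so they go to one file.
                files[encode(mz) % num_buckets].write(supermer + "\n")
    finally:
        for f in files:
            f.close()
    return paths


def pass_two(paths, k):
    """Step 2: re-read each bucket, split it into k-mers and count them in a hash table."""
    counts = {}
    for path in paths:
        table = {}  # per-bucket hash table: 2-bit-encoded k-mer -> count
        with open(path) as f:
            for line in f:
                s = line.rstrip("\n")
                for i in range(len(s) - k + 1):
                    key = encode(s[i:i + k])
                    table[key] = table.get(key, 0) + 1
        # Buckets hold disjoint sets of k-mers, so per-bucket results never collide.
        for key, n in table.items():
            counts[decode(key, k)] = n
    return counts


def count_kmers(reads, k, m=4, num_buckets=8):
    """Two-pass k-mer count via temporary files; m must not exceed k."""
    with tempfile.TemporaryDirectory() as workdir:
        paths = pass_one(reads, k, m, num_buckets, workdir)
        return pass_two(paths, k)
```

For example, `sorted(count_kmers(["ACGTACGT"], k=4, m=2).items())` returns `[('ACGT', 2), ('CGTA', 1), ('GTAC', 1), ('TACG', 1)]`. Routing by minimizer rather than by whole read is what lets the second step count each file in isolation. If a whole read went to a single file, the same k-mer could be split across several files and its per-file counts would have to be merged afterwards.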
arXiv: 1607.06618v1