High-performance sparse matrix-matrix products on Intel KNL and
multicore architectures
by
Yusuke Nagasaka, Satoshi Matsuoka, Ariful Azad, Aydın Buluç
2018
Abstract
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive
that is widely used in areas ranging from traditional numerical applications to
recent big data analysis and machine learning. Although many SpGEMM algorithms
have been proposed, hardware-specific optimizations for multi- and many-core
processors are lacking, and a detailed analysis of their performance under
various use cases and matrices is not available. We first identify and
mitigate multiple bottlenecks with memory management and thread scheduling on
Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and
many-core processors, we develop a hash-table-based algorithm and optimize a
heap-based shared-memory SpGEMM algorithm. We examine their performance
together with other publicly available codes. Unlike prior work,
our evaluation also includes use cases that are representative of real graph
algorithms, such as multi-source breadth-first search or triangle counting. Our
hash-table and heap-based algorithms show significant speedups over existing
libraries in the majority of cases, while different algorithms dominate the
other scenarios depending on matrix size, sparsity, compression factor, and
operation type. We summarize our in-depth evaluation results in a recipe for
choosing the best SpGEMM algorithm for a target scenario. A critical finding is that
hash-table-based SpGEMM gets a significant performance boost if the nonzeros
are not required to be sorted within each row of the output matrix.
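
As a rough illustration of the hash-table idea and of the sorted versus unsorted output distinction, the following is a minimal sequential sketch in Python. It is not the authors' implementation: it assumes CSR inputs given as plain lists (row pointers, column indices, values), uses a Python dict as a stand-in for the per-row hash table, and omits the threading and memory-management optimizations the abstract refers to. The function name `spgemm_hash` and the `sort_rows` flag are hypothetical.

```python
def spgemm_hash(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, sort_rows=True):
    """Row-wise SpGEMM sketch, C = A * B, with all matrices in CSR form.

    Hypothetical helper for illustration only; a dict plays the role of
    the per-row hash table.
    """
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # hash accumulator: column index -> partial sum for row i
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i, k] and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Sorting the column indices is the optional extra step; skipping
        # it leaves the nonzeros in hash-table (here: insertion) order.
        cols = sorted(acc) if sort_rows else list(acc)
        c_idx.extend(cols)
        c_val.extend(acc[j] for j in cols)
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

With `sort_rows=False` the per-row sort is skipped entirely, which is the step the abstract identifies as costly when sorted output is not required. Both settings produce the same matrix; only the order of the nonzeros within each row differs.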
arXiv:1804.01698v1