Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication

by Grzegorz Kwasniewski, Joost VandeVondele (Department of Computer Science, ETH Zurich; Swiss National Supercomputing Centre)

Released as an article.

2019  

Abstract

We propose COSMA: a parallel matrix-matrix multiplication (MMM) algorithm that is near communication-optimal for all combinations of matrix dimensions, processor counts, and memory sizes. The key idea behind COSMA is to derive a sequential schedule that is optimal to within 0.03% for 10MB of fast memory, and then parallelize it while preserving I/O optimality. To achieve this, we use the red-blue pebble game to precisely model MMM dependencies and derive constructive and tight sequential and parallel I/O lower bound proofs. Compared to 2D or 3D algorithms, which fix the processor decomposition upfront and then map it to the matrix dimensions, COSMA reduces communication volume by up to √3 times. COSMA outperforms the established ScaLAPACK, CARMA, and CTF algorithms in all scenarios, by up to 12.8x (2.2x on average), achieving up to 88% of Piz Daint's peak performance. Our work requires no hand tuning and is maintained as an open-source implementation.
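To make the idea of an I/O-counted sequential schedule concrete, here is a minimal sketch under a simplified two-level memory model (an assumption, not COSMA's implementation): fast memory holds S words, an a x a tile of C stays resident, and one column segment of A and one row segment of B (a words each) are streamed in per step as rank-1 (outer-product) updates. The function names `outer_product_schedule` and `io_lower_bound` are hypothetical, and the bound 2mnk/√S + mn is taken here as the sequential MMM I/O lower bound for comparison.

```python
import math


def outer_product_schedule(A, B, S):
    """Multiply A (m x k) by B (k x n) with a tiled outer-product schedule.

    Simplified model (assumption): fast memory holds S words; an a x a tile
    of C stays resident while one column segment of A and one row segment
    of B (a words each) are streamed in per step. Returns the product and
    the counted loads/stores between slow and fast memory.
    """
    m, k, n = len(A), len(A[0]), len(B[0])
    a = math.isqrt(S + 1) - 1  # largest a with a*a + 2*a <= S
    if a < 1:
        raise ValueError("fast memory too small for this schedule")
    C = [[0] * n for _ in range(m)]
    loads = stores = 0
    for i0 in range(0, m, a):
        for j0 in range(0, n, a):
            rows = range(i0, min(i0 + a, m))
            cols = range(j0, min(j0 + a, n))
            # The C tile starts at zero in fast memory, so nothing is loaded for it.
            tile = {(i, j): 0 for i in rows for j in cols}
            for t in range(k):
                a_col = {i: A[i][t] for i in rows}  # load a column segment of A
                b_row = {j: B[t][j] for j in cols}  # load a row segment of B
                loads += len(a_col) + len(b_row)
                for i in rows:
                    for j in cols:
                        tile[(i, j)] += a_col[i] * b_row[j]  # rank-1 update
            for (i, j), value in tile.items():
                C[i][j] = value  # store the finished tile to slow memory
                stores += 1
    return C, loads, stores


def io_lower_bound(m, n, k, S):
    """Sequential MMM I/O lower bound 2mnk/sqrt(S) + mn (as assumed here)."""
    return 2 * m * n * k / math.sqrt(S) + m * n


if __name__ == "__main__":
    m = n = k = 64
    A = [[1] * k for _ in range(m)]
    B = [[1] * n for _ in range(k)]
    C, loads, stores = outer_product_schedule(A, B, S=80)  # tile side a = 8
    ratio = (loads + stores) / io_lower_bound(m, n, k, 80)
    print(loads, stores, round(ratio, 3))
```

With m = n = k = 64 and S = 80, the tile side is a = 8, giving 65536 loads and 4096 stores, i.e. 2mnk/a + mn. That is roughly 11% above the bound because a = 8 is smaller than √80 ≈ 8.94. Since a ≈ √S − 1 in this model, the relative gap shrinks as fast memory grows, which is the kind of near-tightness the abstract describes for realistic memory sizes.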

Archived Files and Locations

application/pdf  2.4 MB
arxiv.org (repository)
web.archive.org (webarchive)
Type  article
Stage   submitted
Date   2019-08-29
Version   v2
Language   en
arXiv  1908.09606v2