Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave
Scattering
by
Mustafa Abduljabbar, Mohammed Al Farhan, Noha Al-Harthi, Rui Chen, Rio
Yokota, Hakan Bagci, David Keyes
2018
Abstract
Algorithmic and architecture-oriented optimizations are essential for
achieving performance worthy of anticipated energy-austere exascale systems. In
this paper, we present an extreme-scale boundary integral equation solver for
wave scattering, accelerated by the fast multipole method (FMM), which performs
the matrix-vector multiplication inside the GMRES iterative method. Our FMM Helmholtz kernels
treat nontrivial singular and near-field integration points. We implement
highly optimized kernels for both shared and distributed memory, targeting
emerging Intel extreme performance HPC architectures. We extract the potential
thread- and data-level parallelism of the key Helmholtz kernels of FMM. Our
application code is well optimized to exploit the AVX-512 SIMD units of Intel
Skylake and Knights Landing architectures. We provide different performance
models for tuning the task-based tree traversal implementation of FMM, and
develop optimal architecture-specific and algorithm-aware partitioning, load
balancing, and communication-reducing mechanisms to scale up to 6,144 compute
nodes of a Cray XC40 with 196,608 hardware cores. With shared-memory
optimizations, we achieve roughly 77% of peak single-precision floating-point
performance on a 56-core Skylake processor and, on average, 60% of peak
single-precision floating-point performance on a 72-core KNL. These figures
correspond to speedups of nearly 5.4x and 10x on Skylake and KNL, respectively,
over the baseline scalar code. With distributed-memory optimizations,
we report near-optimal efficiency in the weak scalability study with respect to
both the logarithmic communication complexity and the theoretical
scaling complexity of FMM. In addition, we exhibit up to 85% efficiency in
strong scaling. We compute in excess of 2 billion degrees of freedom (DoF) at
the full scale of the Cray XC40 supercomputer.
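
The abstract describes FMM as the matrix-vector product inside GMRES. The sketch below illustrates only that structure and is not the authors' code: GMRES never needs the matrix itself, only a routine that applies it to a vector. Here a direct O(N^2) summation of the free-space Helmholtz kernel exp(ikr)/(4 pi r) stands in for FMM, the singular self-term is simply skipped (the paper treats singular and near-field points specially), and the point set, the wavenumber, the weight 0.1, and all function names are assumptions chosen so that the toy system is well conditioned.

```python
import cmath
import math

def fibonacci_sphere(n):
    """Return n roughly evenly spaced points on the unit sphere (toy geometry)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        pts.append((rho * math.cos(golden * i), rho * math.sin(golden * i), z))
    return pts

def helmholtz_matvec(points, k, weight, x):
    """Apply A = I + weight * G, with G_ij = exp(i k r_ij) / (4 pi r_ij), i != j.
    Direct O(N^2) summation; an FMM would replace exactly this routine."""
    y = list(x)
    for i, pi in enumerate(points):
        acc = 0j
        for j, pj in enumerate(points):
            if i != j:
                r = math.dist(pi, pj)
                acc += cmath.exp(1j * k * r) / (4.0 * math.pi * r) * x[j]
        y[i] += weight * acc
    return y

def norm(v):
    return math.sqrt(sum(abs(z) ** 2 for z in v))

def gmres(matvec, b, tol=1e-10, max_iter=50):
    """Unrestarted GMRES with modified Gram-Schmidt and Givens rotations.
    matvec is any callable mapping a vector to A times that vector."""
    n = len(b)
    beta = norm(b)
    if beta == 0.0:
        return [0j] * n, 0
    V = [[bi / beta for bi in b]]   # orthonormal Krylov basis
    R = []                          # columns of the rotated (triangular) Hessenberg
    rots = []                       # stored Givens rotations (c, s)
    g = [beta + 0j]                 # rotated right-hand side
    for j in range(max_iter):
        w = matvec(V[j])
        h = []
        for v in V:                 # orthogonalize w against the basis
            hij = sum(vi.conjugate() * wi for vi, wi in zip(v, w))
            h.append(hij)
            w = [wi - hij * vi for wi, vi in zip(w, v)]
        hnext = norm(w)
        h.append(hnext)
        for i, (c, s) in enumerate(rots):   # apply earlier rotations
            h[i], h[i + 1] = (c.conjugate() * h[i] + s.conjugate() * h[i + 1],
                              -s * h[i] + c * h[i + 1])
        r = math.hypot(abs(h[j]), abs(h[j + 1]))
        c, s = h[j] / r, h[j + 1] / r       # new rotation zeroes h[j + 1]
        rots.append((c, s))
        h[j] = r
        g.append(-s * g[j])
        g[j] = c.conjugate() * g[j]
        R.append(h[: j + 1])
        if abs(g[j + 1]) <= tol * beta or hnext == 0.0:
            break
        V.append([wi / hnext for wi in w])
    m = len(R)
    y = [0j] * m
    for i in reversed(range(m)):            # back substitution on R y = g
        acc = g[i] - sum(R[col][i] * y[col] for col in range(i + 1, m))
        y[i] = acc / R[i][i]
    x = [sum(y[col] * V[col][i] for col in range(m)) for i in range(n)]
    return x, m

if __name__ == "__main__":
    pts = fibonacci_sphere(40)
    k = 2.0
    apply_A = lambda v: helmholtz_matvec(pts, k, 0.1, v)
    rhs = [cmath.exp(1j * k * p[2]) for p in pts]   # plane wave sampled at the points
    sol, iters = gmres(apply_A, rhs)
    res = norm([bi - ai for bi, ai in zip(rhs, apply_A(sol))]) / norm(rhs)
    print(iters, res)
```

Because `gmres` only ever calls `matvec`, swapping the direct summation for a hierarchical approximation changes the cost of each iteration without touching the solver, which is the design the abstract relies on.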
arXiv: 1803.09948v1