Venkata, Shamis, Sampath, Graham, Ladd, 2013. Optimizing blocking and nonblocking reduction operations for multicore systems: Hierarchical design and implementation. IEEE. https://doi.org/10.1109/cluster.2013.6702676