Li, et al. Locality-Aware CTA Clustering for Modern GPUs. ACM Press, 2017, doi:10.1145/3037697.3037709.