In this paper, we analyze a particular spatial locality case (called horizontal locality) inherent to manycore accelerator architectures employing barrel execution of SPMD kernels, such as GPUs. We then propose an adaptive memory access granularity framework to exploit and enforce the horizontal locality in order to reduce the interferences among accelerator cores memory accesses and hence improve DRAM efficiency. With the proposed technique, DRAM efficiency grows by 1.42X on average, resulting in 12.3% overall performance gain, for a set of representative memory intensive GPGPU applications.
- Memory hierarchy
- Multi-core/single-chip multiprocessors
- SIMD processors