TY - JOUR
T1 - The Case for Polymorphic Registers in Dataflow Computing
AU - Ciobanu, Catalin Bogdan
AU - Gaydadjiev, Georgi
AU - Pilato, Christian
AU - Sciuto, Donatella
PY - 2018/12
Y1 - 2018/12
N2 - Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to exploit them efficiently. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve the throughput by up to 56.17x and show that the PRF-augmented system outperforms the GPU for 9 × 9 or larger mask sizes, even in bandwidth-constrained systems.
AB - Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to exploit them efficiently. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve the throughput by up to 56.17x and show that the PRF-augmented system outperforms the GPU for 9 × 9 or larger mask sizes, even in bandwidth-constrained systems.
KW - Dataflow computing
KW - Parallel memory accesses
KW - Polymorphic register file
KW - Bandwidth
KW - Vector lanes
KW - Convolution
KW - High performance computing
KW - High-level synthesis
KW - Architecture
UR - http://www.scopus.com/inward/record.url?scp=85019115897&partnerID=8YFLogxK
U2 - 10.1007/s10766-017-0494-1
DO - 10.1007/s10766-017-0494-1
M3 - Article
SN - 0885-7458
VL - 46
SP - 1185
EP - 1219
JO - International Journal of Parallel Programming
JF - International Journal of Parallel Programming
IS - 6
ER -