Abstract
Heterogeneous systems are becoming increasingly popular, delivering high performance through hardware specialization. However, sequential data accesses may have a negative impact on performance. Data-parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high-speed, parallel access to performance-critical data. This article shows how PRFs can be integrated into dataflow computational platforms. Our semi-automatic, compiler-based methodology generates customized PRFs and modifies the computational kernels to efficiently exploit them. We use a separable 2D convolution case study to evaluate the impact of memory latency and bandwidth on performance compared to a state-of-the-art NVIDIA Tesla C2050 GPU. We improve the throughput by up to 56.17X and show that the PRF-augmented system outperforms the GPU for 9 x 9 or larger mask sizes, even in bandwidth-constrained systems.
Original language | English |
---|---|
Pages (from-to) | 1185-1219 |
Number of pages | 35 |
Journal | International Journal of Parallel Programming |
Volume | 46 |
Issue number | 6 |
Publication status | Published - Dec 2018 |
Externally published | Yes |
Keywords
- Dataflow computing
- Parallel memory accesses
- Polymorphic register file
- Bandwidth
- Vector lanes
- Convolution
- High performance computing
- High-level synthesis
- Architecture