TY - GEN
T1 - On implementability of polymorphic register files
AU - Ciobanu, Cătălin
AU - Kuzmanov, Georgi
AU - Gaydadjiev, Georgi
PY - 2012/11/23
Y1 - 2012/11/23
N2 - This paper studies the implementability of performance-efficient multi-lane Polymorphic Register Files (PRFs). Our PRF implementation uses a 2D array of p x q linearly addressable memory banks, with customized addressing functions to avoid address routing circuits. We target one single-view and a set of four non-redundant multi-view parallel memory schemes that cover all widely used access patterns in scientific and multimedia applications: 1) p x q rectangle, p·q row, p·q main and secondary diagonals; 2) p x q rectangle, p·q column, p·q main and secondary diagonals; 3) p·q row, p·q column, aligned p x q rectangle; 4) p x q and q x p rectangles (transposition). Reconfigurable hardware was chosen for the implementation because of its potential to enhance the runtime adaptability of the PRF. As a proof of concept, we prototyped a PRF with 2 read ports and 1 write port on a Virtex-7 XC7VX1140T-2 FPGA. We consider four sizes for the 16-lane PRFs (16x16, 32x32, 64x64 and 128x128) and three multi-lane configurations (8, 16 and 32 lanes) for the 128x128 PRF. Synthesis results suggest clock frequencies between 111 MHz and 326 MHz while utilizing less than 10% of the available LUTs. By using customized addressing functions, LUT usage is reduced by up to 29% and the clock frequency is up to 77% higher compared to a straightforward implementation.
AB - This paper studies the implementability of performance-efficient multi-lane Polymorphic Register Files (PRFs). Our PRF implementation uses a 2D array of p x q linearly addressable memory banks, with customized addressing functions to avoid address routing circuits. We target one single-view and a set of four non-redundant multi-view parallel memory schemes that cover all widely used access patterns in scientific and multimedia applications: 1) p x q rectangle, p·q row, p·q main and secondary diagonals; 2) p x q rectangle, p·q column, p·q main and secondary diagonals; 3) p·q row, p·q column, aligned p x q rectangle; 4) p x q and q x p rectangles (transposition). Reconfigurable hardware was chosen for the implementation because of its potential to enhance the runtime adaptability of the PRF. As a proof of concept, we prototyped a PRF with 2 read ports and 1 write port on a Virtex-7 XC7VX1140T-2 FPGA. We consider four sizes for the 16-lane PRFs (16x16, 32x32, 64x64 and 128x128) and three multi-lane configurations (8, 16 and 32 lanes) for the 128x128 PRF. Synthesis results suggest clock frequencies between 111 MHz and 326 MHz while utilizing less than 10% of the available LUTs. By using customized addressing functions, LUT usage is reduced by up to 29% and the clock frequency is up to 77% higher compared to a straightforward implementation.
UR - http://www.scopus.com/inward/record.url?scp=84869216676&partnerID=8YFLogxK
U2 - 10.1109/ReCoSoC.2012.6322873
DO - 10.1109/ReCoSoC.2012.6322873
M3 - Conference contribution
AN - SCOPUS:84869216676
SN - 9781467325721
T3 - ReCoSoC 2012 - 7th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip, Proceedings
BT - ReCoSoC 2012 - 7th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip, Proceedings
PB - IEEE
T2 - 7th International Workshop on Reconfigurable and Communication-Centric Systems-on-Chip, ReCoSoC 2012
Y2 - 9 July 2012 through 11 July 2012
ER -