Implementation of Computation Group

Impact of Parallelism and Memory Architecture on FPGA Energy Consumption

Edin Kadric, David Lakata, and André DeHon
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 9, Number 4, Article No. 30, DOI: 10.1145/2857057, August 2016.


The energy in FPGA computations is dominated by data communication energy, either in the form of memory references or data movement on interconnect. In this paper, we explore how to use data placement and parallelism to reduce communication energy. We show that parallelism can reduce energy and that the optimal level of parallelism increases with the problem size. We further explore how FPGA memory architecture (memory block size(s), memory banking, and spacing between memory banks) can impact communication energy, and determine how to organize the memory architecture to guarantee that the energy overhead compared to the optimally matched architecture for the design is never more than 60%. We specifically show that an architecture with 32-bit wide, 16Kb internally-banked memories placed every 8 columns of 10 4-LUT Logic Blocks is within 61% of the optimally matched architecture across the VTR 7 benchmark set and a set of parallelism-tunable benchmarks. Without internal banking, the worst-case overhead is 98%, achieved with an architecture with 32-bit wide, 8Kb memories placed every 9 columns, roughly comparable to the memory organization on the Cyclone V (where memories are placed about every 10 columns). Monolithic 32-bit wide, 16Kb memories placed every 10 columns (comparable to 18Kb and 20Kb memories used in Virtex 4 and Stratix V FPGAs) have a 180% worst-case energy overhead. Furthermore, we show practical cases where designs mapped for optimal parallelism use 4.7x less energy than designs using a single processing element.

Copyright Kadric, Lakata, DeHon 2016. Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Reconfigurable Technology and Systems (TRETS), http://dx.doi.org/10.1145/2857057.



Room 315, Electrical and Systems Engineering Department, University of Pennsylvania, 200 South 33rd Street, Philadelphia, PA 19104.