Implementation of Computation Group

REFINE: Runtime Execution Feedback for INcremental Evolution on FPGA Designs

Dongjoon Park and André DeHon
Proceedings of the International Symposium on Field-Programmable Gate Arrays (FPGA, March 3–5, 2024)



FPGA design optimization is challenging for developers for two main reasons. First, developers cannot easily identify the bottleneck in a design to know where to focus optimization effort to improve application execution time. Second, slow, monolithic FPGA compilation makes evaluating each design change costly. Together, these make FPGA development different from, and more challenging than, traditional software development, where software engineers are accustomed to using rich profiling tools to improve their designs through a series of quick, incremental refinements. To address these issues, we propose a fast bottleneck identification scheme using runtime feedback and separate FPGA compilation. Our scheme systematically identifies bottlenecks in streaming computations based on FIFO event counters extracted from hardware execution and guides developers to the operations that limit performance. We showcase our support for bottleneck identification with fast, automatic design space exploration, iterating on initial design points quickly with a separate, incremental compilation strategy. When the design reaches the point where latency can no longer improve with the separate compilation approach, we migrate to the monolithic design flow, which does not have the area overhead and communication bandwidth limit of the separate compilation approach. The remaining design space, if any, is then explored with the monolithic flow. When tested on the AMD ZCU102 embedded platform with realistic HLS dataflow designs, our approach correctly identifies bottlenecks, improving application latency by 2.2–12.7× while reducing tuning time by 1.3–2.7× compared to the monolithic flow.
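
To illustrate how FIFO event counters could point to a bottleneck in a streaming design, here is a minimal Python sketch. It is not the algorithm from the paper: the counter names (`full_cycles`, `empty_cycles`), the scoring rule, and the operator names are all assumptions made for illustration. The heuristic assumed here is that a FIFO that is often full suggests its consumer is too slow, while a FIFO that is often empty suggests its producer is too slow.

```python
from dataclasses import dataclass


@dataclass
class FifoCounters:
    """Hypothetical per-FIFO event counts read back after a hardware run."""
    producer: str      # operator writing into the FIFO
    consumer: str      # operator reading from the FIFO
    full_cycles: int   # cycles the FIFO was full (producer stalled)
    empty_cycles: int  # cycles the FIFO was empty (consumer starved)


def bottleneck_scores(fifos, total_cycles):
    """Accumulate a per-operator 'blame' score (assumed heuristic).

    A full FIFO blames its consumer; an empty FIFO blames its producer.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    scores = {}
    for f in fifos:
        scores.setdefault(f.producer, 0.0)
        scores.setdefault(f.consumer, 0.0)
        scores[f.consumer] += f.full_cycles / total_cycles
        scores[f.producer] += f.empty_cycles / total_cycles
    return scores


def find_bottleneck(fifos, total_cycles):
    """Return the operator with the highest blame score."""
    scores = bottleneck_scores(fifos, total_cycles)
    return max(scores, key=scores.get)


# Made-up counters for a three-stage pipeline: load -> filter -> store.
example = [
    FifoCounters("load", "filter", full_cycles=900, empty_cycles=10),
    FifoCounters("filter", "store", full_cycles=5, empty_cycles=850),
]
print(find_bottleneck(example, total_cycles=1000))
```

In this made-up run, the FIFO feeding `filter` is almost always full and the FIFO leaving it is almost always empty, so `filter` receives the highest score and would be the operator to optimize next.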

Copyright Park, DeHon 2024. Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version was published in the Proceedings of the International Symposium on Field-Programmable Gate Arrays, http://dx.doi.org/10.1145/3626202.3637560.

Room 315, 200 South 33rd Street, Electrical and Systems Engineering Department, University of Pennsylvania, Philadelphia, PA 19104.