HLS-Compatible, Embedded-Processor Stream Links
Eric Micallef, Yuanlong Xiao, and André DeHon
Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM, May 9-12, 2021)
Fine-grained dataflow streaming between parallel compute operators provides both a simple form of concurrency and high-performance operation. These streams are regularly used to support concurrency within HLS computations on the FPGA. We provide a compatible stream API and implementation that allows FPGA operators to interoperate with operators implemented on the embedded, hard-core processors on SoC FPGAs. With our stream interface, individual operators can be written in C and compiled to either the embedded core or the FPGA from a single source file, and neither FPGA-mapped nor processor-mapped operators need to know whether the other side of the stream is implemented on an embedded core or on the FPGA. This capability also eases processor integration for debugging and development. Our streams support over 100 MB/s per core between the ARM A53 cores on the Zynq UltraScale+ and the FPGA fabric even when all four A53 cores concurrently share a single AXI channel.
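To make the idea of a location-agnostic operator concrete, the following is a minimal sketch in C. It is not the paper's API: `stream_t`, `stream_read`, `stream_write`, `STREAM_DEPTH`, and `scale_operator` are hypothetical names, and the stream is modeled as a plain in-memory FIFO rather than an AXI-backed link. The point it illustrates is only that the operator body touches nothing but the read and write calls, so it does not depend on where the other end of each stream lives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STREAM_DEPTH 16 /* hypothetical FIFO depth (power of two) */

/* Hypothetical software stand-in for a stream link: a simple FIFO. */
typedef struct {
    uint32_t data[STREAM_DEPTH];
    size_t head; /* count of tokens read so far */
    size_t tail; /* count of tokens written so far */
} stream_t;

static bool stream_empty(const stream_t *s) { return s->head == s->tail; }

static bool stream_full(const stream_t *s) {
    return s->tail - s->head == STREAM_DEPTH;
}

/* Non-blocking write; returns false when the stream is full. */
static bool stream_write(stream_t *s, uint32_t v) {
    if (stream_full(s)) return false;
    s->data[s->tail % STREAM_DEPTH] = v;
    s->tail++;
    return true;
}

/* Non-blocking read; returns false when the stream is empty. */
static bool stream_read(stream_t *s, uint32_t *v) {
    if (stream_empty(s)) return false;
    *v = s->data[s->head % STREAM_DEPTH];
    s->head++;
    return true;
}

/* Example operator: doubles each available token. It uses only the
 * stream calls, so it is independent of what sits on the far side of
 * `in` and `out`. Returns the number of tokens processed. */
static size_t scale_operator(stream_t *in, stream_t *out) {
    size_t n = 0;
    while (!stream_empty(in) && !stream_full(out)) {
        uint32_t v = 0;
        bool ok = stream_read(in, &v) && stream_write(out, 2u * v);
        assert(ok); /* guaranteed by the loop condition */
        (void)ok;
        n++;
    }
    return n;
}
```

In this sketch, retargeting the operator would mean swapping the bodies of `stream_read` and `stream_write` for a platform-specific implementation while leaving `scale_operator` unchanged.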
© 2021 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
This material is presented to ensure timely
dissemination of scholarly and technical work. Copyright and all
rights therein are retained by authors or by other copyright
holders. All persons copying this information are expected to
adhere to the terms and constraints invoked by each author's
copyright. In most cases, these works may not be reposted without
the explicit permission of the copyright holder.