Pipelining Saturated Accumulation
Karl Papadantonakis, Nachiket Kapre, Stephanie Chan, and André DeHon
IEEE Transactions on Computers, Volume 58, Number 2, pp. 208-219, February 2009.
Aggressive pipelining and spatial parallelism allow integrated circuits (e.g., custom VLSI, ASICs, FPGAs) to achieve high throughput on many digital signal processing applications. However, cyclic data dependencies in the computation can limit parallelism and reduce the efficiency and speed of an implementation. Saturated accumulation is an important example where such a cycle limits the throughput of signal processing applications. We show how to reformulate saturated addition as an associative operation so that we can use a parallel-prefix calculation to perform saturated accumulation at any data rate supported by the device. This allows us, for example, to design a 16-bit saturated accumulator that can operate at 280 MHz on a Xilinx Spartan-3 (XC3S-5000-4) FPGA, the maximum frequency supported by the component's digital clock manager (DCM).
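The central idea above is that a saturated add with a fixed input is a function of the running sum, and such functions can be composed associatively. The Python sketch below illustrates one way to encode this, assuming each step is represented as the function x -> clamp(x + offset, lo, hi). This encoding and the names `SatFn`, `compose`, `parallel_prefix`, `sat_accumulate_sequential`, and `sat_accumulate_prefix` are hypothetical illustrations, not the paper's exact formulation. The Kogge-Stone-style scan is only one of several parallel-prefix networks that could be used. The sketch checks the prefix result against a plain sequential saturated accumulator.

```python
from dataclasses import dataclass


def clamp(v: int, lo: int, hi: int) -> int:
    """Saturate v to the closed range [lo, hi]."""
    return min(max(v, lo), hi)


@dataclass(frozen=True)
class SatFn:
    """Hypothetical encoding of the function x -> clamp(x + offset, lo, hi), with lo <= hi."""
    offset: int
    lo: int
    hi: int

    def __call__(self, x: int) -> int:
        return clamp(x + self.offset, self.lo, self.hi)


def compose(f: SatFn, g: SatFn) -> SatFn:
    """Return x -> g(f(x)): apply f (the earlier input) first, then g."""
    # g(f(x)) = clamp(clamp(x + f.offset + g.offset, f.lo + g.offset, f.hi + g.offset), g.lo, g.hi),
    # and nested clamps collapse: clamp(clamp(z, a, b), c, d) == clamp(z, clamp(a, c, d), clamp(b, c, d)).
    return SatFn(
        offset=f.offset + g.offset,
        lo=clamp(f.lo + g.offset, g.lo, g.hi),
        hi=clamp(f.hi + g.offset, g.lo, g.hi),
    )


def parallel_prefix(fns: list[SatFn]) -> list[SatFn]:
    """Kogge-Stone-style inclusive scan: result[i] applies fns[0], then fns[1], ..., then fns[i].

    Uses ceil(log2(n)) rounds; every update within one round reads only the previous round.
    """
    result = list(fns)
    d = 1
    while d < len(result):
        result = [result[i] if i < d else compose(result[i - d], result[i])
                  for i in range(len(result))]
        d *= 2
    return result


def sat_accumulate_sequential(xs: list[int], lo: int, hi: int, acc: int = 0) -> list[int]:
    """Reference: the loop-carried recurrence acc = clamp(acc + x, lo, hi)."""
    out = []
    for x in xs:
        acc = clamp(acc + x, lo, hi)
        out.append(acc)
    return out


def sat_accumulate_prefix(xs: list[int], lo: int, hi: int, acc: int = 0) -> list[int]:
    """Same outputs, computed by applying each prefix composition to the initial value."""
    return [p(acc) for p in parallel_prefix([SatFn(x, lo, hi) for x in xs])]


LO, HI = -32768, 32767  # 16-bit two's-complement range, matching the abstract's example width
samples = [30000, 10000, -20000, -32768, -32768, 5000]
print(sat_accumulate_sequential(samples, LO, HI))  # [30000, 32767, 12767, -20001, -32768, -27768]
assert sat_accumulate_prefix(samples, LO, HI) == sat_accumulate_sequential(samples, LO, HI)
```

In the sequential version, each output depends on the previous one through the accumulator. In the prefix version, each round of `parallel_prefix` depends only on the round before it. The feedback cycle therefore becomes a log-depth tree of `compose` operations, which is the kind of structure that can be pipelined to match the input rate.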
© 2009 IEEE. Authors/employers may reproduce or authorize others to reproduce The Work, material extracted verbatim from the Work, or derivative works to the extent permissible under United States law for works authored by U.S. Government employees, and for the author's personal use or for company or organizational use, provided that the source and any IEEE copyright notice are indicated, the copies are not used in any way that implies IEEE endorsement of a product or service of any employer, and the copies themselves are not offered for sale.