Notes on Context Distribution
Original Issue: February, 1995
Last Updated: Sat Apr 8 21:32:47 EDT 1995
This note briefly describes some basic configurations for context distribution for DPGAs (tn95).
In its simplest form, the context line may come from off-chip; it is buffered and distributed to each array-element (gate) memory and each interconnect (crossbar, switchbox, etc.) memory. This distribution occurs in a balanced form (e.g. a balanced H-tree), much like a clock network (See Figure ).
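To make the balanced-distribution idea concrete, here is a minimal Python sketch. It assumes an idealized tree in which every buffer drives exactly `fanout` loads and the number of memories is a power of `fanout`; the names `Node`, `build_balanced_tree`, and `leaf_depths` are hypothetical. The point is only that every memory sits the same number of buffer stages from the context source, which is what keeps the arrival times balanced, as in a clock H-tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A fanout buffer (internal node) or a context memory (leaf)."""
    name: str
    children: list["Node"] = field(default_factory=list)


def build_balanced_tree(leaves: list[str], fanout: int = 2) -> Node:
    """Build a balanced buffer tree that drives every leaf from one source.

    Assumes len(leaves) is a power of `fanout`, so every source-to-leaf
    path passes through the same number of buffer stages."""
    level = [Node(name) for name in leaves]
    depth = 0
    while len(level) > 1:
        if len(level) % fanout:
            raise ValueError("leaf count must be a power of fanout")
        depth += 1
        # Group consecutive nodes under one new buffer per group.
        level = [Node(f"buf{depth}_{i // fanout}", level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    return level[0]


def leaf_depths(node: Node, depth: int = 0) -> dict[str, int]:
    """Return the number of buffer stages from the source to each leaf."""
    if not node.children:
        return {node.name: depth}
    out: dict[str, int] = {}
    for child in node.children:
        out.update(leaf_depths(child, depth + 1))
    return out


memories = [f"mem{i}" for i in range(16)]
root = build_balanced_tree(memories, fanout=2)
print(set(leaf_depths(root).values()))  # {4}: every memory is 4 stages deep
```

Raising `fanout` shortens the tree (16 memories with `fanout=4` need only 2 stages) at the cost of a heavier load on each buffer.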
The context source may be on or off chip. In the most minimal case, an off-chip controller provides the context signal (See Figure ). The "context source" in Figure then becomes the i/o buffers that bring the context specification onto the chip. In a more sophisticated case, the context source may be a piece of hardwired logic on-chip, an on-chip sequencer or processor, or even another piece of programmable logic on-chip (See Figure ).
The context distribution may also be configurable. This configuration would, of course, either not itself be switched by context or be controlled by a different context signal than the array configurations. Figure shows a generic example of such a configurable context scheme.
Figure shows a specific configuration on top of the generic scheme. In this configuration the top-right quadrant acts as a controller for the rest of the array: the logic in the top-right quadrant produces the context signal, which is then distributed through the configurable context distribution to the rest of the array. The example also shows the top-right quadrant controlling its own context selection.
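A minimal behavioral sketch of the configurable scheme follows. It assumes the array is split into four quadrants and that each quadrant's context select passes through a configuration-controlled multiplexer choosing between the external context pins and the output of the top-right quadrant's controller logic. `ContextSource`, `distribute_context`, and the quadrant labels are hypothetical names; the mux settings stand in for the separately loaded, non-contexted distribution configuration.

```python
from enum import Enum


class ContextSource(Enum):
    """Hypothetical settings of a quadrant's context-select multiplexer."""
    EXTERNAL = "external"   # context specification arriving via the i/o pins
    TR_QUADRANT = "tr"      # context produced by logic in the top-right quadrant


QUADRANTS = ("tl", "tr", "bl", "br")


def distribute_context(config: dict[str, ContextSource],
                       external_ctx: int, tr_ctx: int) -> dict[str, int]:
    """Return the context each quadrant sees on one cycle.

    `config` models the distribution configuration; it is assumed to be
    loaded separately and not switched by the array's own context."""
    sources = {ContextSource.EXTERNAL: external_ctx,
               ContextSource.TR_QUADRANT: tr_ctx}
    return {q: sources[config[q]] for q in QUADRANTS}


# The specific configuration: the top-right quadrant drives every quadrant,
# including its own context selection.
controller_config = {q: ContextSource.TR_QUADRANT for q in QUADRANTS}
print(distribute_context(controller_config, external_ctx=0, tr_ctx=2))
# {'tl': 2, 'tr': 2, 'bl': 2, 'br': 2}
```

Reloading `config` with a mix of `EXTERNAL` and `TR_QUADRANT` settings models other configurations on the same generic scheme without changing any array logic.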
The context distribution may be purely combinational from the context source, or it may be pipelined. Distribution can entail a large delay, especially when the context source is off-chip. Without pipelining, the context distribution time adds directly to the cycle time. By adding pipeline registers in the distribution path (e.g. Figure ), this distribution time need not limit the clock frequency for array operation. Pipeline registers would generally be added at selected fanout points; for example, a register might be placed at every second or fourth fanout buffer.
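The effect of pipelining on cycle time can be estimated with a rough model. The sketch below assumes a chain of identical fanout-buffer levels with a fixed per-level delay, an optional register after every `regs_every` levels, and an array whose own critical path is `array_delay_ns`; all parameters and the function name are hypothetical. It also assumes the array and the pipelined distribution segments run concurrently, so the slower of the two sets the clock. The price of pipelining is latency: the context must be issued a few cycles before it is needed.

```python
import math
from typing import Optional, Tuple


def context_timing(array_delay_ns: float, levels: int, buf_delay_ns: float,
                   regs_every: Optional[int] = None,
                   reg_overhead_ns: float = 0.0) -> Tuple[float, int]:
    """Return (minimum clock period in ns, context latency in cycles).

    Simplified model: `levels` buffer stages of `buf_delay_ns` each separate
    the context source from the memories; a pipeline register (with setup
    plus clock-to-q overhead `reg_overhead_ns`) ends every `regs_every` levels."""
    distribution_ns = levels * buf_delay_ns
    if regs_every is None:
        # Purely combinational: distribution delay adds to the array cycle.
        return array_delay_ns + distribution_ns, 0
    if regs_every < 1:
        raise ValueError("regs_every must be >= 1")
    segments = math.ceil(levels / regs_every)  # one register ends each segment
    longest_segment_ns = min(regs_every, levels) * buf_delay_ns + reg_overhead_ns
    # Distribution segments overlap array evaluation; the slower sets the clock.
    return max(array_delay_ns, longest_segment_ns), segments


print(context_timing(5.0, 8, 0.5))  # (9.0, 0)
print(context_timing(5.0, 8, 0.5, regs_every=4, reg_overhead_ns=0.3))  # (5.0, 2)
```

With these illustrative numbers, registering every fourth level removes the distribution delay from the cycle entirely, at the cost of issuing each context two cycles early.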
At the tail of the context distribution, it may make sense to distribute decoded context lines to individual memory arrays. For example, in the first-generation prototype DPGA, each group of four array elements shared a single local decoder. As shown in Figure the two context lines in the prototype were decoded into the four context select lines that were actually distributed to each memory. Each group of four crossbar context memories shared a local decoder in the same manner.
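The local decode described above is an ordinary 2-to-4 decoder. The following behavioral sketch assumes the two context lines are presented as a (high bit, low bit) pair; the function names and that bit ordering are assumptions, not taken from the prototype schematic. It also shows one decoder's four select lines fanned out to a group of four memories.

```python
def decode_context(ctx_lines: tuple[int, int]) -> list[int]:
    """2-to-4 decoder: two encoded context lines -> four one-hot selects.

    ctx_lines is (c1, c0), most-significant bit first (assumed ordering)."""
    c1, c0 = ctx_lines
    if c1 not in (0, 1) or c0 not in (0, 1):
        raise ValueError("context lines must be 0 or 1")
    index = (c1 << 1) | c0
    return [1 if i == index else 0 for i in range(4)]


def drive_group(ctx_lines: tuple[int, int], group_size: int = 4) -> list[list[int]]:
    """One shared local decoder drives identical selects to each memory in its group."""
    selects = decode_context(ctx_lines)
    return [list(selects) for _ in range(group_size)]


print(decode_context((1, 0)))  # [0, 0, 1, 0]: context 2 selected
```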
Pragmatically, there is a trade-off between distribution bandwidth and decoder space. We save routing area by distributing fully encoded lines; in the extreme, however, that requires a decoder at every memory to turn the encoded lines into memory selects. We save decoder space by sharing each decoder across multiple memory elements; in the extreme here, we have a single decoder and distribute one line per context, i.e. $n$ lines for $n$ contexts rather than $\lceil \log_2 n \rceil$ encoded lines. In general, one chooses a point between these extremes according to the relative premium on routing channels and on area in the particular design.
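The trade-off can be made concrete with a deliberately simple, hypothetical cost model. Assume the memories sit at unit pitch along one row with the context source at position 0; each group's shared decoder sits at the group's first memory; the $\lceil \log_2 n \rceil$ encoded lines run from the source to the farthest decoder; and each group's $n$ decoded select lines run from its decoder to the group's last memory. The function name and the geometry are assumptions for illustration only, not a layout from the prototype.

```python
import math


def distribution_cost(num_memories: int, num_contexts: int,
                      group_size: int) -> tuple[int, int, int]:
    """Return (decoders, encoded wire length, decoded wire length).

    Wire lengths are summed over all lines, in memory pitches, under the
    hypothetical 1-D placement described in the lead-in."""
    if num_memories < 1 or num_contexts < 2 or not 1 <= group_size <= num_memories:
        raise ValueError("invalid parameters")
    encoded_lines = math.ceil(math.log2(num_contexts))
    starts = range(0, num_memories, group_size)  # decoder position per group
    decoders = len(starts)
    encoded_len = encoded_lines * starts[-1]     # bus reaches the farthest decoder
    decoded_len = sum(num_contexts * (min(group_size, num_memories - s) - 1)
                      for s in starts)           # selects span each group
    return decoders, encoded_len, decoded_len


for g in (1, 2, 4, 16):
    print(g, distribution_cost(16, 4, g))
# 1 (16, 30, 0)    decoder per memory: most decoders, least wire
# 2 (8, 28, 32)
# 4 (4, 24, 48)    prototype-like sharing
# 16 (1, 0, 60)   single decoder: fewest decoders, most wire
```

Weighting the decoder count against total wire length by their relative area cost in a given process is one way to pick the group size.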
In the DPGA prototype, separate, decoded context read and context write signals were actually distributed across the memory columns. Figure is the memory column showing the decoded read and write lines required by each memory cell. Figure shows the local decode schematic for the prototype. This local decode: