Review of Major Concepts
After reading this thesis, you should appreciate the following:
- Our reconfigurable computing space, RP-space, is largely
characterized by architectural choices surrounding the storage,
distribution, binding, and control of instructions.
[Chapters and ]
- These choices about instruction resources, in turn, are largely
responsible for defining the circumstances under which a given
architecture within the RP-space is most efficient.
- Using a multilevel configuration scheme, the deployment of
chip resources, including those for instructions,
can be deferred until run-time. Consequently, resource
allocation, instruction distribution, and control can be
tailored to the needs of the application, making such a device
efficient over a broader range of application characteristics than
architectures whose resources are bound at fabrication time.
- There are three primary consumers of area on reconfigurable
components: (1) instructions, (2) interconnect, and (3) intermediate data.
- Task descriptions (instructions) are small compared to their physical
realizations. [Chapter , Chapter , and ]
- Nonetheless, instruction storage space is not trivial. A large
number of instructions (typically 10-100) often takes up as much space
as the active interconnect and computational elements required to
actually execute an instruction. [Chapter ,
Chapter , and Section ]
- We can compress the area for an implementation by increasing the
instruction-to-active-area ratio, but the benefits diminish past the
point where the total area for stored instructions and data equals the
active area on which they are evaluated. [Chapter ]
- The "optimal" amount of each of these resources
arises from a different source. [Section ]
- Instructions and intermediates are dictated by the computational
task to be performed.
- Active interconnect and, to a lesser extent, active compute
resources are dictated by the ratio between desired
computational throughput and primitive computational speed.
- Interconnect is the dominant feature determining device area in
conventional FPGAs. [Sections ,
, and ]
- Interconnect requirements grow superlinearly with array size.
Consequently, either interconnect area will continue to grow
relative to non-interconnect area, or gate utilization will
decrease as array sizes grow. [Sections ]
- Since the non-interconnect area is trivial compared to network
area for conventional FPGAs, optimizing for gate utilization is often
shortsighted and can result in unnecessarily large implementations.
- There are two interconnect functions typically required to realize
a computation -- spatial transport and temporal transport. To use
silicon area most efficiently, these should be separated and
handled via different mechanisms. [Chapter ,
especially Section ]
- Data values can be transported forward in time through
registers or memories. While this ties up register area for the
period of transport, it is much cheaper than tying up critical
active routing resources, which occupy much more area.
- Active interconnect can easily be the dominant area feature on a
general-purpose device. It is used most efficiently when
its resources are pipelined and reused at their capacity level --
i.e. wires and switches should not sit idle holding a value
once it has propagated past them. Rather, they should be
redeployed to route new data once they have performed their
spatial transport task.
- Memory plays two fundamental roles in reconfigurable computing
architectures: (1) storage for instructions and (2) retiming of
intermediate data. Both roles arise from the sharing of expensive,
active hardware resources among multiple logical functions.
[Identified in Chapters through and
summarized in Section ]
- Since interconnect is the major consumer of space on FPGAs,
conventional architectures limit the interconnect by depopulating
interconnect switches as much as possible.
- Physical place and route on devices with limited interconnect is
computationally difficult because it is necessary to simultaneously
satisfy a large number of constraints in order to find a valid
mapping of the design netlist onto the physical network.
- We can alleviate the place and route problem in several different
ways, each with different costs:
- Provide rich interconnect (e.g. HP PLASMA). Easier
mapping comes at the cost of greater cell area and lower
computational density. [Section ]
- Provide rich, time-switched interconnect (e.g. UCB
DHARMA). Rigid evaluation levels and lack of retiming can
make this an expensive solution as well, especially for larger
arrays. [Section ]
- Provide rich retiming and time-switching (e.g. TSFPGA).
Cell area can actually be lower than in conventional FPGAs, but is
higher than in DPGAs. This scheme sacrifices the high peak
computational throughput of traditional FPGAs.
- Eliminate interconnect (e.g. University of Toronto VEGA).
This approach saves some additional area over DPGAs, but at the
cost of significantly lower computational throughput and density
than all other options. [Section ]
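The tradeoff between stored instructions and active area summarized in the
list above can be illustrated with a small area model. This is a minimal
sketch under stated assumptions, not the model developed in the thesis: it
assumes a task of num_ops operations can be perfectly time-multiplexed, that
each active element (interconnect plus compute) costs active_area, and that
each stored context (instruction plus retimed data) costs stored_area. The
function name, parameter names, and numbers are all hypothetical.

```python
from math import ceil


def total_area(num_ops, contexts, active_area, stored_area):
    """Area of a design that shares each active element among `contexts`
    stored instructions (hypothetical model, arbitrary area units)."""
    # Each active element evaluates `contexts` operations in sequence,
    # so fewer elements are needed as the context count grows.
    elements = ceil(num_ops / contexts)
    # Every element pays for its active resources once, plus storage
    # for each of its contexts.
    return elements * (active_area + contexts * stored_area)


if __name__ == "__main__":
    num_ops = 6400        # operations in the task (made-up value)
    stored_area = 1.0     # area of one stored context (assumed unit)
    active_area = 64.0    # assumed; a ratio of 64 lies in the 10-100 range
    floor = num_ops * stored_area  # storage alone, ignoring active area
    for contexts in (1, 4, 16, 64, 256, 1024):
        area = total_area(num_ops, contexts, active_area, stored_area)
        print(f"contexts={contexts:5d}  area={area:9.0f}  "
              f"area/floor={area / floor:6.2f}")
```

Under these assumptions, once contexts * stored_area equals active_area the
total area is within a factor of two of the storage floor
num_ops * stored_area, so adding further contexts can recover at most that
remaining factor.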
Our focus on and demonstration of these characteristics have been
confined to the limited realm of RP-space. Nonetheless, most of the
features that characterize RP-space also appear more broadly in
general-purpose computational devices. Consequently, many of the
characteristics identified here may have broader application to the extent
they are not dominated by effects abstracted away in the RP-space