
Review of Major Concepts

After reading this thesis, you should appreciate the following major concepts:

Our reconfigurable computing space, RP-space, is largely characterized by architectural choices surrounding the storage, distribution, binding, and control of instructions. [Chapters and ]
These choices about instruction resources, in turn, are largely responsible for defining the circumstances under which a given architecture within the RP-space is most efficient. [Chapter ]
Using a multilevel configuration scheme, the deployment of chip resources, including those for instructions, can be deferred until run-time. Consequently, resource allocation, instruction distribution, and control can be tailored to the needs of the application, making such a device efficient over a broader range of application characteristics than architectures whose resources are bound at fabrication time. [Chapter ]
There are three primary consumers of area on reconfigurable components: (1) instructions, (2) interconnect, and (3) intermediate data.
- Task descriptions (instructions) are small compared to their physical realizations. [Chapter , Chapter , and Section ]
- Nonetheless, instruction storage space is not trivial. A large number of instructions (typically 10-100) often take up as much space as the active interconnect and computational elements required to actually perform the instruction. [Chapter , Chapter , and Section ]
- We can compress the area for an implementation by increasing the instruction-to-active-area ratio, but the benefits diminish past the point where the total area for stored instructions and data equals the active area on which they are evaluated. [Chapter ]
- The "optimal" amount of each of these resources arises from a different source. [Section ]
  - Instructions and intermediates are dictated by the computational task to be performed.
  - Active interconnect and, to a lesser extent, active compute resources are dictated by the ratio between desired computational throughput and primitive computational speed.
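
The diminishing return described in the bullets above can be illustrated with a toy area model. This is only a sketch under stated assumptions: the names `area_per_instruction` and `balance_point` are hypothetical, the active area is normalized to 1.0, and the per-instruction storage area of 0.05 is an illustrative value picked so that 20 instructions match the active area (inside the 10-100 range quoted above). Intermediate data storage is ignored for simplicity.

```python
def area_per_instruction(contexts, active_area=1.0, instr_area=0.05):
    """Amortized area per stored instruction (assumed toy model).

    One active element (interconnect plus compute) of area `active_area`
    is shared by `contexts` stored instructions of area `instr_area` each.
    """
    if contexts < 1:
        raise ValueError("contexts must be >= 1")
    total_area = active_area + contexts * instr_area
    return total_area / contexts


def balance_point(active_area=1.0, instr_area=0.05):
    """Instruction count at which instruction storage equals the active area."""
    return active_area / instr_area


if __name__ == "__main__":
    # Illustrative sweep: amortized area falls quickly at first, then flattens.
    for contexts in (1, 5, 20, 80, 320):
        print(contexts, area_per_instruction(contexts))
```

In this model the amortized area at the balance point is exactly twice `instr_area`, and it can never fall below `instr_area`, so adding instruction storage past the balance point recovers at most another factor of two.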
Interconnect is the dominant feature determining device area in conventional FPGAs. [Sections , , and ]
Interconnect requirement growth is superlinear in array size. Consequently, either interconnect area will continue to grow relative to non-interconnect area, or gate utilization will decrease as array sizes grow. [Sections and ]
Since the non-interconnect area is trivial compared to network area for conventional FPGAs, optimizing for gate utilization is often shortsighted and can result in unnecessarily large implementations. [Section ]
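
The two outcomes named above (growing interconnect area, or falling gate utilization) can be sketched numerically. The model here is an assumption and is not taken from this section: it supposes a Rent's-rule-style bisection requirement of `n**p` wires for `n` gates with `p > 0.5`, a two-dimensional layout with a fixed number of wiring layers, and hypothetical function names and default parameters.

```python
def wiring_area_per_gate(n_gates, p=0.7, wire_pitch=1.0):
    """Per-gate area when the array is sized to carry all required wires.

    Assumption: the bisection needs n_gates**p wires, so the chip side
    must be about wire_pitch * n_gates**p and the area is side squared.
    """
    side = wire_pitch * n_gates ** p
    return side * side / n_gates


def gate_utilization(n_array, p=0.7, wires_per_unit_side=1.0):
    """Fraction of gates usable when area per gate is held constant.

    Assumption: a fixed-area-per-gate array has side ~ sqrt(n_array), so
    it supplies about wires_per_unit_side * sqrt(n_array) bisection wires.
    A design of n gates fits only while n**p does not exceed that supply.
    """
    supply = wires_per_unit_side * n_array ** 0.5
    usable_gates = supply ** (1.0 / p)
    return min(1.0, usable_gates / n_array)


if __name__ == "__main__":
    for n in (256, 1024, 4096, 16384):
        print(n, wiring_area_per_gate(n), gate_utilization(n))
```

Under these assumptions the per-gate area grows as `n**(2*p - 1)` in the first case and the utilization falls as `n**(1/(2*p) - 1)` in the second; with `p = 0.5` both stay constant.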
There are two interconnect functions typically required to realize a computation -- spatial transport and temporal transport. To use silicon area most efficiently, these should be separated and handled via different mechanisms. [Chapter , especially Section ]
- Data values can be transported forward in time through registers or memories. While this ties up register area for the period of transport, it is much cheaper than tying up critical, active routing resources, which occupy much more area.
- Active interconnect can easily be the dominant area feature on a general-purpose device. It is used most efficiently when its resources are pipelined and reused at their capacity level -- i.e. wires and switches should not sit idle holding a value once it has propagated past them. Rather, they should be redeployed to route new data once they have performed their spatial transport task.
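
The argument for separating the two kinds of transport can be made concrete with an area-time comparison. This is a minimal sketch: the function name `holding_cost` is hypothetical, and any areas passed to it are illustrative relative values, not measurements from the thesis.

```python
def holding_cost(delay_cycles, path_area, register_area):
    """Area-time cost (area x cycles) of delivering a value `delay_cycles` later.

    Returns a pair (on_wire, in_register):
      on_wire      the routed path holds the value for the whole delay
      in_register  the path is used for one cycle of spatial transport,
                   then a register holds the value for the remaining cycles
    """
    if delay_cycles < 1:
        raise ValueError("delay_cycles must be >= 1")
    on_wire = path_area * delay_cycles
    in_register = path_area + register_area * (delay_cycles - 1)
    return on_wire, in_register


if __name__ == "__main__":
    # Assumed example: a routed path ten times the area of a register.
    print(holding_cost(delay_cycles=8, path_area=10.0, register_area=1.0))
```

For a one-cycle delay the two costs are equal; the gap widens with every extra cycle the value must be held, in proportion to how much larger the routed path is than a register.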
Memory plays two fundamental roles in reconfigurable computing architectures: (1) storage for instructions and (2) retiming of intermediate data. Both roles arise from the sharing of expensive, active hardware resources among multiple logical functions. [Identified in Chapters through and summarized in Section ]
Since interconnect is the major consumer of space on FPGAs, conventional architectures limit the interconnect by depopulating interconnect switches as much as possible. [Chapter especially Sections and ]
Physical place and route on devices with limited interconnect is computationally difficult because it is necessary to simultaneously satisfy a large number of constraints in order to find a valid mapping of the design netlist onto the physical network. [Chapter ]
We can alleviate the place and route problem in several different ways, each with different costs:
- Provide rich interconnect (e.g., HP PLASMA). Easier mapping comes at the cost of greater cell area and lower computational density. [Section ]
- Provide rich, time-switched interconnect (e.g., UCB DHARMA). Rigid evaluation levels and lack of retiming can make this an expensive solution as well, especially for larger arrays. [Section ]
- Provide rich retiming and time-switching (e.g., TSFPGA). Cell area can actually be lower than conventional FPGAs, but is higher than in DPGAs. This scheme sacrifices the high peak computational throughput of traditional FPGAs. [Chapter ]
- Eliminate interconnect (e.g., University of Toronto VEGA). This approach saves some additional area over DPGAs, but at the cost of significantly lower computational throughput and density than all other options. [Section ]

Our focus on, and demonstration of, these characteristics have been within the limited realm of RP-space. Nonetheless, most of the features which characterize RP-space show up more generally in general-purpose computational devices. Consequently, many of the characteristics identified here may have broader application to the extent they are not dominated by effects abstracted away in the RP-space model.

André DeHon <andre@mit.edu> Reinventing Computing MIT AI Lab