
Terminology

τ
See tau.
λ
See lambda.
active computing resources
The portions of a general-purpose architecture which actually compute results or transport data -- e.g. ALUs, switches, wires. The term is typically used to distinguish such resources from overhead resources used to store descriptions or intermediate data.
active interconnect
Switches and wires which actually produce a physical connection between a source and a destination. The term is used to distinguish resources used to actually perform switching from descriptions of switching operations or storage for intermediate data. Chapter is primarily focussed on active interconnect, while Chapters and introduce forms of switched interconnect where the distinction becomes quite important.
bit processing element
A generic term for the primitive computational unit which produces one bit of result. Conventionally, each FPGA LUT is a bit processing element, as is each bit-slice in an SIMD ALU datapath. See Chapter .
context
A generic term used to refer to a slice of instructions and intermediate data used by a general-purpose device on a single cycle. See configuration context and data context.
control stream
An independent thread of execution. When the computation varies with time and data, the control stream determines which sets of instructions are executed on a given cycle. A computational device may support a single control stream (e.g. processors, SIMD, pure VLIW) or multiple control streams (e.g. MSIMD, MIMD). See Section .
configurable computing
Computing by configuring interconnect between programmable function units to wire up computations spatially. See Sections and .
configurable computing architectures
Architectures where only one or a few instructions are loaded per active computing element and there is limited bandwidth to reload an entire configuration context. These architectures are used for configurable computing, where the computation is typically arranged via spatial interconnect of computing elements, as opposed to programmable computing architectures, which realize computation by rapid temporal reuse of a few, central active computing resources. See Section .
computational density
See functional density.
computational throughput
Computations performed per unit time, i.e. operations completed per unit time.
configuration context
The collection of bits which describe the behavior of a general-purpose machine on one operation cycle. Equivalently, the collection of all instructions required to specify the behavior of a general-purpose device at one point in time. See Section .
data context
The data used by a general-purpose device on one cycle of execution.
distance delay
The critical path delay through a placed circuit taking into account the distance between logically adjacent functional units. See Section .
datapath granularity
Datapath width. The number of bit processing elements or interconnect switches controlled in SIMD fashion by a single instruction. See Section .
deployable resources
Resources whose role can be determined at run-time, e.g. a memory which can be used as an instruction store or as a data store, or interconnect which can be used to distribute instructions or to deliver data between functional units. Distinguished from resources which are dedicated to a single function at fabrication time. See Section .
dynamic
Marked by continuous, usually productive, activity or change. In this context, usually used to distinguish quantities, particularly instructions, which change on a cycle-by-cycle basis. Contrast with static and quasistatic. See Section .
dynamic instruction distribution
Instruction distribution allowing instructions to change on a cycle-by-cycle basis. See Section .
DPGA
Dynamically Programmable Gate Array -- Fine-grained programmable array where each processing element has a small, local configuration memory allowing processing elements to change instructions, array-wide, on a cycle-by-cycle basis. See Chapters and .
FPGA
Field Programmable Gate Array -- A collection of configurable processing units embedded in a configurable interconnection network. See Sections and .
functional density
Computations performed per unit space-time. Usually measured in . See Section .
functional diversity
The number of different functions which are resident and rapidly accessible from a unit of computational area. The density of instructions stored on a general-purpose computing device. See Section .
general-purpose computing
Computing using devices which can be configured to solve any number of computing tasks. See Section .
iDPGA
Dynamically Programmable Gate Array with input retiming registers -- A DPGA including input retiming registers. See Chapter .
input depth
The temporal range of input retiming registers in the iDPGA or similar architectures. See Chapter .
input folding
A style for reducing the amount of active switching interconnect by sharing crossbar inputs among multiple sources. See Section .
instruction
The set of bits which describe the behavior of one computational unit and its associated interconnect. See Section .
instruction context
See configuration context.
instruction density
See functional diversity.
instruction depth
Number of instructions per compute element stored local to the compute element.
irregular computing task
Tasks which require a large sequence of different computations and where operations are heavily data-dependent. See Section .
Kolmogorov complexity
Of all programs which can be used to calculate a particular set of values, the length of the shortest one. Ultimately, this is the least number of bits with which a piece of data can be described. Kolmogorov complexity is primarily a conceptual description of the lower bound, as there is no algorithmic way to find such a bound. See any information theory text such as [CT91].
lambda
(λ) -- half the minimum feature size in a silicon process. Lambda is used to normalize out the effects of different process sizes when comparing implementations. Area normalized to λ² units is roughly comparable between processes which differ primarily in feature size. See Section .
low instruction entropy
Computing tasks which require a limited set of operations with very regular flow, admitting to heavy compression of instruction distribution requirements. See Section .
lookup table
A small, typically programmable, memory where the address bits act as inputs and data read out serves as an output. An n-input, m-output lookup table can implement any deterministic mapping between n input bits and m output bits. We frequently refer to a k-input, 1-output lookup table as a k-LUT. See Section .
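As a rough illustration of the lookup-table idea, the sketch below models a 1-output LUT in Python as a list of stored bits addressed by its inputs. The names `make_lut` and `lut_read`, and the choice of the first input as the most significant address bit, are assumptions made for this example only.

```python
from itertools import product

def make_lut(k, func):
    """Program a k-input, 1-output LUT: one stored bit per input combination.

    Assumption for this sketch: the first input is the most significant
    address bit, which matches the enumeration order of itertools.product.
    """
    return [func(*bits) & 1 for bits in product((0, 1), repeat=k)]

def lut_read(table, *inputs):
    """Evaluate the LUT: the input bits form the memory address."""
    addr = 0
    for bit in inputs:
        addr = (addr << 1) | (bit & 1)
    return table[addr]

# The same 8-bit memory structure implements any 3-input function;
# only the stored contents differ.
xor3 = make_lut(3, lambda a, b, c: a ^ b ^ c)
majority = make_lut(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
```

Reprogramming the LUT amounts to rewriting the stored bits; the read path is unchanged.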
LUT
Look Up Table -- see lookup table.
MATRIX
Multiple ALU architecture with Reconfigurable Interconnect -- A flexible general-purpose computing architecture which defers binding of instructions and instruction resources until use. Instruction storage and distribution resources are unified with datapath compute, memory, and interconnect resources, allowing the basic instruction architecture to be defined at run-time. See Chapter .
metaconfiguration
A higher and more primitive level of configuration than traditional instructions which defines the sources and distribution paths for dynamic control including instructions. See multi-level configuration. See Section and .
microcycle
One primitive machine cycle on architectures which evaluate logical tasks over several smaller clock cycles. See Section . Microcycle evaluation is a common theme in Chapters through .
multicontext
Having more than one configuration for the entire general-purpose device. Usually used to refer to devices or architectures which hold multiple such configurations on chip. Also used to describe evaluation schemes which compute a result using more than one device-wide configuration. See Chapter .
multi-level configuration
Hierarchical configuration where higher levels of configuration describe the architecture, behavior, and distribution used by lower levels of configuration. See metaconfiguration. See Section .
output folding
A style for reducing the amount of active switching interconnect by sharing crossbar outputs among multiple sinks. See Section .
partial reconfiguration
The ability for individual or small numbers of processing units to change instructions without requiring an entire reload of all instructions across a general-purpose computing device. See Sections and .
quasistatic
Changing, but on a time scale much slower than standard operation. An intermediate point of activity between dynamic and static.
quasistatic instruction distribution
Instructions which change during an application, but do so slowly compared to the rate of execution. A quasistatic instruction might be in effect for hundreds of cycles before changing. See Section .
Rent's Rule
An empirical relationship between the number of i/o's in and out of a cluster of logic and the number of logical elements inside the cluster (N_io = C · N^p, for empirically determined constants C and p). See Section .
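Rent's Rule is commonly stated as a power law, IO = C · N^p. The sketch below evaluates that form; the function name `rent_io` and the constants used are hypothetical values chosen for illustration, not figures taken from this work.

```python
def rent_io(n_elements, c, p):
    """Estimate the i/o count of a cluster of n_elements logic elements.

    Uses the common power-law form IO = c * N**p. The constants c and p
    are empirical; the values below are illustrative assumptions only.
    """
    return c * n_elements ** p

# With p = 0.5, quadrupling the cluster size only doubles its i/o.
small = rent_io(256, c=2.0, p=0.5)
large = rent_io(1024, c=2.0, p=0.5)
```

The exponent p controls how quickly i/o demand grows with cluster size: values near 1 mean i/o grows almost as fast as the logic itself.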
regular computing task
Tasks which need to repeatedly perform the same collection of operations to a large amount of data with little data-dependent flow control. See Section .
retiming
Changing the time at which particular events occur. In this work, used largely to describe the transportation of signals forward in time from the point in time when they are generated to the point in time when they are consumed. See Section . Retiming is a major theme in Chapters through .
robust architectural points
Design points where we can bound the inefficiency to some constant percentage when the task has different characteristics from the architecture. See Chapter starting in Section .
RP-space
A high-level abstraction of the reconfigurable computing design space parameterized by key instruction and interconnect features. See Chapter .
run-time reconfiguration
The ability to change device configuration during a computational task.
segmentable datapath
A SIMD-controlled n-bit datapath which can be dynamically or quasistatically reconfigured to treat the datapath as k, (n/k)-bit words, for certain restricted values of k. See Section .
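One way to picture a segmentable datapath is an adder whose carry chain can be cut at word boundaries. The sketch below assumes the definition means an n-bit datapath treated as k independent words of n/k bits each; the function `segmented_add` and its parameters are hypothetical names for this example.

```python
def segmented_add(a, b, n=32, k=4):
    """Add two n-bit values as k independent (n/k)-bit words.

    Carries are discarded at each word boundary, so one instruction
    drives all k segments in SIMD fashion. Assumes k divides n.
    """
    if n % k:
        raise ValueError("k must divide n")
    width = n // k
    mask = (1 << width) - 1
    result = 0
    for i in range(k):
        shift = i * width
        word_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (word_sum & mask) << shift  # drop the carry out of this word
    return result

# The same operands give different results depending on the segmentation.
as_one_word = segmented_add(0x0000FFFF, 0x00000001, n=32, k=1)
as_two_words = segmented_add(0x0000FFFF, 0x00000001, n=32, k=2)
as_four_words = segmented_add(0x0000FFFF, 0x00000001, n=32, k=4)
```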
subarray
An organizational unit in array architectures composed of multiple processing elements but not the entire device. In the DPGA and TSFPGA, the subarray defines the extent of local interconnect and the set of processing elements which share common resources such as decoders and instruction distribution. See Section .
spatial transport
Movement of intermediate data in space from the point of production to the point of consumption. See Section .
static
Showing little change; characterized by a lack of movement, animation or progression. In this context used primarily to distinguish values and instructions which do not change during an operational epoch. Contrast with dynamic and quasistatic. See Section .
static instruction distribution
Instruction distribution where instructions are set at the beginning of a computational task and do not change during execution. See Section .
programmable computing architectures
General-purpose computing architectures which heavily and rapidly reuse a single or small number of active computing resources for many different functions (e.g. conventional microprocessors). See Section .
tau
(τ) The delay parameter for a process. One τ is the delay required for one inverter to drive a single, equally large inverter.
temporal pipelining
Reusing general-purpose resources in time to evaluate different components of a single logical task. Like spatial pipelining, the result is produced after traversing a number of pipelining stages. Unlike spatial pipelining, the same physical resources are used to evaluate each stage of the pipeline. Temporal pipelining reduces spatial requirements, whereas spatial pipelining increases throughput. See Sections and .
temporal transport
Movement of intermediate data in time from the microcycle on which the value is produced to the one where it is consumed. See Section .
timestep
A particular microcycle in the evaluation of a computing task. See Section .
time-switched input register
An input register supporting data retiming on architectures which time-switch their interconnect. The input register loads the value from its associated network output only when the current timestep matches a programmed value. See Section .
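The load rule for a time-switched input register can be sketched as a small behavioral model. The class and method names here (`TimeSwitchedInputRegister`, `clock`, `load_timestep`) are hypothetical and are not taken from the TSFPGA design; only the match-then-load behavior comes from the definition above.

```python
class TimeSwitchedInputRegister:
    """Behavioral model: capture the network output on one programmed timestep."""

    def __init__(self, load_timestep):
        self.load_timestep = load_timestep  # programmed (configured) value
        self.value = None                   # nothing captured yet

    def clock(self, timestep, network_output):
        """Advance one microcycle; load only when the timestep matches."""
        if timestep == self.load_timestep:
            self.value = network_output
        return self.value

# The shared network output carries a different signal on each timestep;
# the register holds on to the one it was programmed to receive.
reg = TimeSwitchedInputRegister(load_timestep=2)
history = [reg.clock(t, v) for t, v in enumerate([10, 11, 12, 13])]
```

Because the register ignores the network on every other timestep, the same physical wire can deliver values to different consumers on different microcycles.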
TSFPGA
Time-Switched Field Programmable Gate Array -- Fine-grained programmable array where the physical interconnect is shared and switched in time. See Chapter .
yielded computational density
The effective computational density which an application or task extracts from a computational device. Mismatches in datapath granularity, interconnect richness, or control may cause a device to provide computational capacity below its peak. See Section and examples given in Chapter .


André DeHon <andre@mit.edu> Reinventing Computing MIT AI Lab