List of Figures
- First Order Size Comparison for Configurable Designs
- LUT and Interconnect Primitives for Multicontext FPGA
- TSFPGA Organization
- MATRIX Basic Functional Unit
- Temporal Reuse of Limited Active Silicon on General-Purpose Computing Devices
- High-Level FPGA Abstraction
- FPGA Array
- Canonical 4-LUT Processing Element
- Parallel and
- Serial and
- Basic Organization for a Processor
- Inner Loop of Processor Implementation for Windowed Average
- Processor Implementation for Parity Computation
- Gate Implementation of any Function Computed by 7-input Lookup Table
- Windowed Average -- Pipelined FPGA Implementation
- 32-bit Parity -- 4-LUT Implementation
- Abacus (SIMD) Implementation of Windowed Average
- Windowed Average -- MATRIX Implementation
- 32-bit Parity -- MATRIX Implementation
- Comparison of Programmable and Custom Multiply Functional Densities
- Conventional FPGA Interconnect Topology
- FPGA Interconnect Caricature
- Logical Structure of Hierarchical Interconnect
- Switching node in 2-ary Hierarchical Interconnect
- Switches per LUT -- Equation versus Direct Calculation
- Switches per LUT -- Equation versus Direct Calculation
- Overhead Growth versus for various
- Overhead for versus
- Continuous Overhead for versus
- Continuous Efficiency for versus
- Continuous Efficiency for versus (Log Scale)
- Sample versus Overheads
- E(overhead) versus for Uniform Distribution
- Network Bits per LUT v/s Rent Exponent for (K=4)
- Network Bits per LUT v/s Number of LUTs for (K=4)
- Single Context FPGA Area
- Multicontext FPGA Area
- Peak Computational Density Versus Contexts and Datapath Width
- Compute and Instruction Densities Versus Contexts and Datapath Width
- Efficiency as a Function of Architectural and Task Granularity for Single Context Architectures
- Efficiency as a Function of Architectural and Task Granularity
- Efficiency as a Function of Architectural and Task Granularity
- Efficiency versus Task Data Width for a 1024-context, 32-bit Granularity Device
- Efficiency as a Function of Task Path Length and Architectural Contexts
- Efficiency as a Function of Task Path Length and Architectural Contexts
- Efficiency versus Task Path Length for a 16-context, Single-bit Granularity Device
- Efficiency versus Task Path Length for a 256-context, 128-bit Granularity Device
- Efficiency for Conventional FPGA Design Point (, )
- Efficiency for Coarse-Grain, Deep Memory Design Point (, )
- Efficiency for Fixed ,
- Efficiency for DPGA Design Point (, )
- LUT and Interconnect Primitives for Multicontext FPGA
- ASCII Hex→Binary Task Description
- ASCII Hex→Binary Circuit Retimed for Full Pipelining
- Typical Multicomponent System
- Multifunction Component in System
- Function Distribution in System
- Annotated Die Photo of DPGA Prototype
- Area Breakdown versus Number of Contexts for des Benchmark
- Area Breakdown versus Number of Contexts for C880 Benchmark
- Area Breakdown versus Number of Contexts for alu2 Benchmark
- Area versus Throughput for Multicontext Implementations of alu2 Benchmark
- versus for Coarse-grain Interleaved Contexts
- Simple FSM Example
- Two Context Implementation of Simple FSM Example
- Area and Delay versus Number of Contexts for cse FSM Benchmark (Area Target)
- Area and Delay versus Number of Contexts for cse FSM Benchmark (Delay Target)
- Memory-based Implementation for Simple FSM Example
- Canonical Video Coding Pipeline
- Temporally Systolic Video Coding Pipeline
- Control Distribution on DPGA Prototype
- Multiple Controllers -- Hardwired Control
- Multiple Controllers -- Configurable Control
- Array Self Control Example
- FPGA Array Element
- DPGA Array Element
- DPGA Array Element with Input Registers
- iDPGA Array Element ,
- ASCII Hex→Binary Implementation versus Contexts and Input Register Depth
- alu2 Implementation Area versus Throughput
- alu2 Area Ratios versus Throughput
- Average Area Ratios versus Throughput
- Average Area Ratios versus Contexts and Throughput
- 4-LUT with Time-Switched Input Register
- Output Folding
- Input Folding
- Input and Output Folding
- Two-Context DPGA as Input and Output Fold
- TSFPGA Subarray Composition
- TSFPGA Array Element Composition
- Sample Inter-Subarray Network Connections
- Sample Delay Increases with Context Packing
- MATRIX BFU
- BFU Control Logic
- BFU Port Architecture
- VLIW/MSIMD Convolution Implementation
- Configurable Datapaths
- Datapath Composition: MATRIX versus Conventional Architecture
- Configurable Instruction Streams
- Configurable Control Streams
- MATRIX BFU Composition
- MATRIX Implementation of Full 8-TAP, 4096 shift, VSR
- Processor Implementation of VSR
- MATRIX RVF Array
- RVF Dataslice and Logic for Cells Below th Position
- Control for MATRIX RVF for Cells Below th Position
- Processor Implementation of RVF
- MATRIX BFIR Datapath
- Processor Implementation of BFIR
- Efficiency for MATRIX and Fixed 8-bit Architecture ()
- Efficiency for MATRIX
- FPGA and DPGA efficiency in RP-space
- Comparing efficiency of FPGA and Processor idealizations in RP-space