Transit Note #103
Computational Quasistatics
Andre DeHon
Ian Eslick
Original Issue: February, 1994
Last Updated: Wed Oct 25 01:23:17 EDT 1995
Abstract:
We present a quasistatic computational model which provides the
performance advantages of static computation while achieving the
expressiveness and flexibility inherent in dynamic computing models.
Feedback information from run time provides the compiler with details of
program dynamics. This additional information enables the compiler to
evolve efficient and specialized static execution sequences for commonly
occurring dynamic computations. By incorporating the compiler into the
lifetime of each program, quasistatic computation allows the compiler to
automatically adapt programs to actual usage. This paper introduces a
conceptual framework intended to facilitate system-level thought about the
role of feedback and program adaptation in the design and optimization of
computer systems. We conclude the paper by examining the implications of
this computational paradigm for future computer systems, programming
languages, and architectures.
Introduction
In present computing systems, we can distinctly characterize many
elements of our computations as static or dynamic. Informally, static
computations are characterized by predictable control flow, timing, and
data access patterns. Static elements of computation can be determined and
scheduled at compile time. In contrast, dynamic elements of computation
are characterized by frequent, often unpredictable, changes in control
flow, variable timing, and random data access patterns. Over the past
decade, we have seen continuous advances in computational performance.
Notably, however, the rate of performance increase for static computations
has been much larger than the rate for dynamic computation. There are two
key reasons for the growing performance gap:
- Compiler Technology -- Advances in compiler technology have
allowed us to analyze and aggressively reorganize code around information
known at compile time in order to reduce execution latency. Since static
computations expose more information that is known and predictable at
compile time, the compiler can apply higher-quality and more aggressive
optimization to static computations. Many of these optimizations
exploit properties and values known at compile time to effectively move
computation from run time to compile time.
- Architecture Trends -- Our ability to deliver high throughput
at reasonable costs has increased more rapidly than our ability to deliver
comparable reductions in latency. For instance, we can easily employ
parallelism at various levels ( e.g. ILP, pipelining, multiprocessing)
to increase raw throughput. With predictable control flow and data
access, we can better exploit throughput to reduce task execution latency.
When dynamic dependencies must be resolved to determine the execution path
or data access, resolution latency is the key factor limiting the rate of
task execution rather than available throughput.
Nonetheless, dynamic computations are essential to our computing systems:
- Natural Unpredictability -- Many events and objects with
which our computations must contend are naturally unpredictable.
- Expressiveness -- Many computations are more easily
expressed in a dynamic form.
- Flexibility -- The late binding inherent in dynamic
computations often defers commitments offering considerably more
flexibility than purely static computations.
The goal of quasistatic computation is to narrow the gap between static
and dynamic computation by capturing the best elements of both.
Quasistatics introduces a feedback path between run-time execution and
compilation to provide the compiler with information about run-time
dynamics. Consequently, the compiler obtains information about program
behavior that is not traditionally available during the compilation of
dynamic code. This feedback allows the compiler to better predict control
flow, timing, and data access patterns.
In micro forms, we have already seen quasistatic approaches used to
reduce execution latency ( e.g. Trace Scheduling
[Fis81], Profile-driven Compilation [Wal91]
[Sam91], Synthesis [MP89]). Nonetheless, the
techniques to date have been relatively independent and ad hoc. In
this paper, we provide a paradigmatic framework for integrating feedback
and recompilation into the basic computational model. This framework is
intended to facilitate system-level thought about the role of feedback and
program adaptation in the design and optimization of computer systems.
Since ``static'' and ``dynamic'' are actually relative terms, the
following section examines static and dynamic computation in more depth to
better characterize their computational properties and implications.
The section on Quasistatic Computation then formulates the quasistatic
computational paradigm and provides examples of quasistatic utility and
employment. Mature research in several areas provides many of the basic
building blocks which make quasistatic techniques ripe for exploitation. We
highlight some of these enabling technologies and recognize the intellectual
heritage of quasistatic computation in the section on Foundational and
Enabling Work. Before concluding, we take a brief look at the implications
which the adoption of quasistatic computing styles will have on future
computing systems (see Implications).
Static and Dynamic Computation
Static: showing little change; characterized by a lack of movement,
animation or progression
Dynamic: marked by a continuous usually productive activity or change
When describing computer systems and artifacts, we generally apply the
words ``static'' and ``dynamic'' in concrete terms to distinguish
characteristics. The terms are applied to indicate when changes may occur,
when decisions are made, or when binding occurs. Usually these terms are
used to refer to particular, often orthogonal, aspects of a computer system
such as typing, scheduling, or memory management. In this section we
highlight common static and dynamic aspects of computer systems and their
implications.
Static Computation
Loosely, computer scientists define aspects of a computation which
can be resolved and fixed at compile time as static. Hence, we have:
- Static typing ( e.g. FORTRAN)
- Static scheduling ( e.g. traditional basic-block scheduling;
traditional, single-threaded control flow)
- Statically allocated memory ( e.g. top-level globals in most
languages)
Beyond these examples, where our nomenclature clearly identifies
them as static, many other aspects of our programming languages can exhibit
static properties:
- Operators -- In languages like C, Pascal, and FORTRAN the
basic operators have fixed meaning which can be unambiguously
determined at compile time.
- Functions, Subroutines -- Some languages and systems require
that all functions be defined and resolvable at compile or link time.
- Data Structures -- Arrays in most languages and fixed
record structures in languages like COBOL, Pascal, or C, can exhibit
regular structure which makes data access predictable and allows
compile-time computation of many addresses and offsets.
Modern compiler technologies allow us to exploit all of these static
structures to achieve high performance. With full knowledge of operators
and subroutines, a compiler can aggressively reorder code for lower latency
execution, sometimes breaking down artificial abstraction barriers like
procedure and function invocations. Much of the work associated with
evaluating a computation can be performed at compile time. The executable
produced simply evaluates the data- and control-flow dependent residue of
the computation described by the original program.
Almost no practical programming language can be termed purely static,
but languages do vary in their static expressiveness. FORTRAN serves as a
strong example of a mostly static language. FORTRAN does have dynamic
aspects in flow control, but the language tends to encourage structures
with large amounts of predictability. Numerical computations in FORTRAN
constitute the most canonical example of ``static'' computations. The
literature contains considerable evidence that the static aspects of this
class of computations can be exploited to achieve very high computational
performance.
Dynamic Computation
Similarly, computer scientists define aspects of a computation
which are not resolved until employed at run time as dynamic. We
have:
- Dynamic typing ( e.g. run-time, polymorphic typing in LISP and
many object-oriented programming languages)
- Dynamic scheduling ( e.g. dynamic dataflow, traditional
operating-system scheduling and multithreading)
- Dynamically allocated memory ( e.g. heap allocated data such as
cons in LISP, or malloc in C)
- Dynamic exception handling and error detection ( e.g. page faults)
- Dynamic address translation ( e.g. virtual memory systems)
- Dynamic linking ( e.g. Multics, SunOS)
Again, beyond our nomenclature, many aspects of our programming
languages can exhibit dynamic properties:
- Operators -- In most dialects of LISP, the
basic operators can be redefined as part of a computation and hence
must be resolved each time they are used at run time.
- Functions, Subroutines -- Languages like LISP
provide expressive freedom by allowing operator creation and
redefinition during computation. This expressive power is accompanied
by the requirement for run-time resolution. Similarly, many
object-oriented languages have convenient operator inheritance
relationships which can only be resolved at run time.
- Pointers -- Languages like C and LISP allow flexible and
spatially efficient data structures to be created from pointers.
Data access requires run-time traversal of these data structures.
- Aliasing -- All languages suffer to some extent from the
dynamic aspects of aliasing. Two different ``names'' can refer to
the same ultimate object. This property forces compilers to forgo
many potential optimizations to preserve the language semantics.
- Unpredictable Latency -- Almost all non-dedicated computer
systems exhibit unpredictable latencies in memory and network access
times. Time-shared systems further complicate the predictability of
task completion rates. This lack of predictability makes static
scheduling inefficient, virtually necessitating some level of dynamic
scheduling to efficiently tolerate variable latency operations.
In many ways, dynamic support allows programs to be more efficient,
expressive, and flexible than they would be otherwise. Nonetheless,
bindings and decisions which must be resolved at run time are costly for
several reasons:
- Interpretive overhead -- Decisions cost execution time to
resolve when postponed to run time.
- Indirection overhead -- Run-time resolution often requires
translation or lookup. This adds overhead time, often requires
additional memory bandwidth for resolution, and makes memory
access less predictable.
- Reduced knowledge and power -- Decisions
made at run time must be cheap to compute and must rely on a minimum
amount of information. Consequently, these decisions are forced
to be simple and local. In contrast, off-line decisions made by
the compiler can afford to take a more global perspective and
perform more detailed analysis before committing to a decision.
LISP and C both exhibit a wealth of dynamic properties. Clever
programmers are aware of the dynamic and static aspects of their respective
programming languages and the system implementations they employ.
Consequently, they will strategically strength reduce the facilities they
employ as dictated by the requirements of their application.
Unfortunately, the distinction between static and dynamic computational
aspects is often subtle. The programmer must have a sophisticated
understanding of operator semantics to properly perform this kind of
optimization. Further, if a value is going to change at all after
compilation, the programmer is forced to adopt a dynamic solution
regardless of the actual frequency of the change.
Symbolic processing and system-level management are canonical examples of
``dynamic'' computations. These are characterized by heavy pointer
manipulation, frequent and statically unpredictable jumps in control flow,
statically unpredictable operation latency, and regular rebinding of data
and operators. The literature presents evidence that this class of
programs is much harder to optimize than FORTRAN numerical code.
Quasistatic Computation
Outside of computer science the terms ``static'' and ``dynamic'' serve
as relative quantifiers rather than absolutes. ``Static'' does not mean
fixed and unchanging. Rather it connotes infrequent change. Similarly, we
can adopt a compilation and computation model which does not make such a
rigid and absolute resolution-time distinction. We use the term
``quasistatic'' to describe this computational model which allows dynamic
changes to occur but also judiciously employs recompilation to update the
executable instruction stream to reflect information acquired from
run time. Here, the executable may remain unchanged for relatively long
periods of time, for example a program run, but is allowed to change to
adapt to common usage conditions.
An important observation is that entities whose values and properties are
allowed to change under dynamic computational models do not, necessarily,
change as frequently as they are used. For example, an operator binding
does not, normally, change with every use of the operator. Further, the
sequence of operator bindings employed by a program may remain mostly
constant from run to run. Similarly, a variable's value may remain mostly
constant over a program run and may even be constant or highly
predictable over a sequence of program runs. If we can discover these
quasistatic values, bindings, or other characteristics of a program, we can
feed them back to the compiler. The compiler can use its knowledge of the
quasistatic values to produce code optimized for the stable and predictable
epochs of these values. While optimizing around fed back quasistatic data,
the compiler may perform non-local code reorganization and take advantage
of other techniques normally restricted to the compilation of mostly static
code. Of course, the compiler must also assure that code so compiled
can properly handle changes that invariably occur and signal a change
of quasistatic epochs.
We can view this invariant factorization as a complex form of code motion
or hoisting. By empirically recognizing epochs and properties which are
invariant within each epoch, we can factor computations involving the
invariant out of the epoch, leaving only the residue computation which
results from the invariant evaluation. Factoring and residue generation
are exactly the function performed by partial evaluation
[JGS93].
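As a minimal illustration of factoring and residue generation (the routine
and values here are hypothetical, chosen only to show the shape of the
transformation), suppose feedback shows that the exponent passed to a small
integer-power routine is invariably 3 over long stretches of execution:

/* General routine: the exponent is a run-time value. */
int power(int x, int n)
{
  int result = 1;
  while (n-- > 0)
    result *= x;
  return result;
}

/* Residue after factoring out the empirically invariant n == 3:
   all loop control has been evaluated away at compile time. */
int power_3(int x)
{
  return x * x * x;
}

/* Guarded use: the specialized residue is valid only while the
   invariant holds; otherwise fall back to the general routine. */
int power_dispatch(int x, int n)
{
  return (n == 3) ? power_3(x) : power(x, n);
}

The check in power_dispatch is the price paid for having no static
guarantee that the invariant actually holds.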
With quasistatics, we use run-time feedback to identify invariants which
empirically exist over portions of code for relatively long periods of
execution time even when there is no static evidence or guarantee that such
invariants exist. There are two key reasons why such invariants may
arise:
- A value or property may have the potential to be dynamic, but
that option is not being exercised or the option is being exercised
only infrequently. For example, a variable in a
program may be taken from a configuration file read when the program
is invoked. The user may have the option to change the
configuration, but may change it very infrequently. By discovering
that the value is constant across many computations, the compiler
can generate an executable for the user where this
value is treated as a constant. The constant can be propagated
using classical propagation techniques to generate a more
efficient, specialized executable for the user. If the user does
change the variable, the compiled code recognizes that its
invariant has changed and invokes the more general execution
sequence instead of the specialized one. If this new value remains
invariant, the compiler can then generate a version specialized
around the new quasistatic value.
- A value or property may, in fact, be invariant. Discovery of the
invariant may be beyond the state of the art in static analysis
or may require exponential computation. Without resorting
to overly expensive static analysis, empirical, run-time feedback can
be used to identify promising invariants.
Dynamic, Run-time Feedback
Dynamic, run-time feedback is the key to quasistatic computation. By
monitoring run-time behavior we can acquire information about quantities
unknown at compile time. For these unknown quantities, we can use feedback
to obtain:
- Values -- the typical value or range of values exhibited by
a variable or property ( e.g. a given variable is always an integer;
executing a given procedure requires 1000 cycles)
- Stable Epochs -- ranges over which a variable or property
is usually constant ( e.g. a given variable is not changed during this
procedure)
- Variance -- how tightly bound is a value, or how well can we
predict a value ( e.g. 98% of the time a given procedure
executes in 1000±10 cycles)
- Correlations -- predictability of values based on other
values ( e.g. when one variable holds a particular value, 99.9% of the
time a given procedure executes in about 100 cycles; a sketch of how such
statistics might be collected follows this list)
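As a concrete, purely illustrative sketch of how such value statistics
might be gathered cheaply, the following fragment shows the kind of
compiler-inserted instrumentation a quasistatic system could attach to an
indirect call site; the structure and function names are hypothetical, not
part of any existing profiler:

#include <stddef.h>

#define PROFILE_SLOTS 4

/* One record per profiled call site: a small table of observed
   callee addresses and how often each was seen. */
struct callsite_profile {
  void *callee[PROFILE_SLOTS];
  long  count[PROFILE_SLOTS];
  long  other;                 /* values that did not fit in the table */
};

/* Compiler-inserted instrumentation: executed just before the indirect
   call, it records which function the pointer resolved to. */
static void profile_indirect_call(struct callsite_profile *p, void *callee)
{
  int i;
  for (i = 0; i < PROFILE_SLOTS; i++) {
    if (p->callee[i] == callee) { p->count[i]++; return; }
    if (p->callee[i] == NULL)   { p->callee[i] = callee; p->count[i] = 1; return; }
  }
  p->other++;                  /* table full: lump remaining values together */
}

A later recompilation pass that finds a single callee dominating the counts
can specialize the call site around that binding, as in the placement
example below.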
It is important to remember that information obtained empirically from
run-time is only predictive. We can never use this information to
conclude that we will only see certain values. Nonetheless, we can
determine the normal cases and specialize the executable code accordingly.
When we take advantage of run-time feedback, we make the explicit
assumption that the future resembles the past. This is often a good
assumption to make, and the variance over time provides a good indication
of where it is most fruitful to apply this assumption.
To see why the assumption is useful, we can review some of the components
which contribute to each execution signature captured by run-time feedback:
- (1) Some of the characteristics we are capturing are simply
representative of the program being run ( e.g. function
usage, certain access patterns).
- (2) Some characteristics are representative of the program running on a
particular computer ( e.g. access time for references,
miss rate).
- (3) Some characteristics will represent the effects of external
resource usage and sharing ( e.g. network loading while the
program was run).
- (4) Some characteristics are indicative of the gross data set
( e.g. the characteristics may be a signature of running spice
on a 100-node CMOS circuit).
- (5) Some characteristics will be representative of the particular data
sets employed.
We can optimize around characteristics generated by (1) and (2) and
reasonably expect that the future will resemble the past. By determining
variance and correlation, we can determine the predictable effects of (3)
and optimize around them. We may also be able to determine strategic
correlations ( e.g. at 1 AM, we can expect to see higher
throughput and lower latency out of our LAN than at 1 PM). To the
extent that a user runs similar datasets, we can expect the future to
display similar characteristics due to (4) and (5). Again, variance
over time helps the compiler to separate gross characteristics which recur
from characteristics which are unique to a few datasets.
A Simple Example
Consider the following code snippet from a hypothetical placement program
written in C:
/* Cheap cost metric: Manhattan distance between the two endpoints. */
manhattan_cost(x1,y1,x2,y2,placed)
int x1,y1,x2,y2;
OBJECT_ARRAY placed;
{return(abs(x1-x2)+abs(y1-y2));
}
/* More expensive, more accurate metric based on channel densities. */
density_cost(x1,y1,x2,y2,placed)
int x1,y1,x2,y2;
OBJECT_ARRAY placed;
{return(horizontal_channel_density_approximation(x1,x2,placed)+
        vertical_channel_density_approximation(y1,y2,placed));
}
. . .
place_objects()
{
. . .
 for (obj=all_objects;obj!=NULL;obj=obj->next)
   {
    . . .
    pick_new_location(&xnew,&ynew,obj,placed);
    new_cost = 0;
    /* The cost metric is selected indirectly through a function pointer. */
    for (links=obj->connections;links!=NULL;links=links->next)
      new_cost += (*cost_function)(xnew,ynew,links->x,links->y,placed);
    . . .
   }
. . .
}
. . .
. . .
place_objects is reused during different phases of placement and
optimization to place the objects and recompute the value of each
placement. During early phases of optimization, the cheap cost function
( manhattan_cost) is used. In later phases, a more expensive and
accurate cost function ( density_cost) is used instead. At a high
level, the optimization manager decides when to change the cost metric used
for each phase of optimization.
Unfortunately, since the cost function is accessed via indirection
through a pointer whose value is not constant, the cost function
computation cannot be inlined and optimized at compile time. With run-time
feedback, the compiler can discover the range of values taken on by
cost_function. During a recompilation pass, the compiler can then
inline the computation. For example, after determining that 80% of the
time cost_function resolves to manhattan_cost, the compiler might
reorganize the code as follows:
place_objects()
{
. . .
if (cost_function == &manhattan_cost)
{for (obj=all_objects;obj!=NULL;obj=obj->next)
{
. . .
pick_new_location(&xnew,&ynew,obj,placed);
new_cost = 0;
for (links=obj->connections;links!=NULL;links=links->next)
new_cost +=(abs(xnew-links->x)+abs(ynew-links->y));
. . .
}
}
else
{ [original code with indirection]
}
. . .
}
. . .
Here, we avoid a pointer indirection and anonymous procedure call overhead
in a frequently executed portion of code.
If the compiler cannot guarantee that the value of the
cost_function pointer is not modified inside the if block, it may
mark the page containing cost_function as read-only. In this
manner, the code can identify any unanticipated change in the value. When
an unexpected change does occur, patch-up code can be executed to assure
proper program semantics.
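A minimal sketch of this page-protection guard follows, assuming a POSIX
system on which cost_function has been placed on a page by itself; the
helper names are ours, and a production system would need considerably more
care, so this is illustrative only:

#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

extern int (*cost_function)();     /* assumed to sit alone on its own page */
static volatile sig_atomic_t invariant_broken = 0;

/* Round an address down to the start of its page (page size assumed
   to be a power of two). */
static void *page_of(void *addr)
{
  uintptr_t mask = ~((uintptr_t)sysconf(_SC_PAGESIZE) - 1);
  return (void *)((uintptr_t)addr & mask);
}

/* Fault handler: an unexpected write to the protected page means the
   quasistatic assumption no longer holds.  Re-enable writes so the
   store can complete, and remember to use the general code path. */
static void on_protection_fault(int sig, siginfo_t *si, void *ctx)
{
  invariant_broken = 1;
  mprotect(page_of(si->si_addr), (size_t)sysconf(_SC_PAGESIZE),
           PROT_READ | PROT_WRITE);
}

static void guard_cost_function(void)
{
  struct sigaction sa;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = on_protection_fault;
  sigaction(SIGSEGV, &sa, NULL);

  /* Write-protect the page holding cost_function; any later update to
     the pointer traps into on_protection_fault above. */
  mprotect(page_of((void *)&cost_function), (size_t)sysconf(_SC_PAGESIZE),
           PROT_READ);
}

The specialized code would consult invariant_broken (or simply re-test the
pointer) at a coarse granularity; lighter-weight, trap-free alternatives are
discussed in the section on enabling work.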
Range of Application
Feedback can be beneficial for almost any quantity which is unknown at
initial compile time. For example:
- Aliasing -- Feedback can indicate typical address values
and hence aliasing. The compiler can compile code specialized to
the aliasing which actually occurs. To assure correctness, the
code can insert dynamic checks to guarantee that specialized code is
only used in place of general code when appropriate. Of course,
this kind of transformation is only beneficial when the aggregate
savings due to specialization justify the added checks. As we
saw in the placement example above, these assumption checks can
often be hoisted out of frequently executed code segments (a sketch of
such an aliasing guard follows this list).
- Timing -- Often unpredictable timing necessitates dynamic
scheduling. With feedback on operation timings, including data
access latencies and interprocessor communication delays, the
compiler can more reasonably schedule many computations statically.
( e.g. With accurate access latency estimates, the compiler
can schedule prefetch to minimize stalls in computations.)
- Procedure Bindings -- As demonstrated in the previous
section, anonymous function bindings can be resolved via
feedback. When bindings are highly predictable, they can be
inlined and optimized just as efficiently as function calls
which are statically apparent.
- Values -- Common values can be identified via feedback.
Using partial-evaluation techniques, we can generate code
segments specialized around the common values. When
encountered, the specialized code can be used in place of the
general code.
- Ranges -- Knowing the typical range of a data value often
enables optimizations which are not possible when the value is
completely unknown. Here again, we can specialize code to run
most efficiently when the data value falls into the range most
commonly seen at run time.
- Control Flow -- Information on past control flow increases
the predictability of program control flow. Increased
predictability allows the compiler to exploit additional
throughput to reduce task execution latency.
- Access/Usage Patterns -- Good a priori estimates
of resource and memory usage allow the compiler to tailor better
strategies for resource management. ( e.g. If we can
predict the typical amount of dynamic memory and chunk-size used
by a program, we can do a better job of allocating the memory to
reduce management overhead and avoid memory fragmentation than
traditional, knowledgeless algorithms
[BZ93] [GZ92]. Knowledge
of access and communication patterns allows the compiler to
redistribute data and code to improve locality and minimize cache
conflicts.)
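To make the aliasing guard above concrete, here is a small, hedged sketch
(a hypothetical routine; the restrict qualifiers stand in for whatever
no-aliasing assumption the compiler would exploit) of specialized and
general versions selected by a run-time overlap check:

#include <stddef.h>
#include <stdint.h>

/* Version compiled under the no-aliasing assumption: the restrict
   qualifiers let the compiler reorder, unroll, or vectorize freely. */
static void scale_noalias(double * restrict dst, const double * restrict src,
                          size_t n, double k)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] = k * src[i];
}

/* General version: makes no assumption about overlap. */
static void scale_general(double *dst, const double *src, size_t n, double k)
{
  size_t i;
  for (i = 0; i < n; i++)
    dst[i] = k * src[i];
}

/* Dispatch: a cheap run-time check selects the specialized code only
   when the no-aliasing assumption actually holds. */
void scale(double *dst, const double *src, size_t n, double k)
{
  uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
  if (d + n * sizeof(double) <= s || s + n * sizeof(double) <= d)
    scale_noalias(dst, src, n, k);
  else
    scale_general(dst, src, n, k);
}

The guard costs a handful of instructions and, as noted above, pays off only
when the specialized loop does enough work to amortize it; where possible it
should be hoisted outside frequently executed inner loops.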
Advantages
Quasistatic computation is aimed at reducing run-time execution latency
while allowing dynamic expressiveness and flexibility. In this section we
examine the general advantages of quasistatic computing from several
perspectives.
Eliminate Unused Functionality
One key optimization is
aggressive operator and data structure strength reduction. When values and
functions are statically known, the compiler can often determine the exact
functionality required by an operator or data structure and strength reduce
it to cheaper operations which have the same semantic effect. Using
feedback data in a quasistatic computing model, we gain the same advantages
by noting the features actually employed. As always, we need to provide an
escape mechanism which allows the computation to be patched up if the
additional functionality is, unexpectedly, required. In many ways, this
strength reduction can take the burden of optimization and function
selection off the programmer since unused functionality contained in the
operators will be discovered and specialized away. For instance, a LISP
list which is created with uniform data elements and is not
arbitrarily ``grown'' can be replaced with an array. In many cases, this
will make traversal and access much cheaper since the array requires less
indirection to access any given element. This class of transforms can
achieve a significant reduction in the run-time overhead associated with
more powerful data structures by moving the overhead associated with
data-structure traversal from run time to compile time.
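A rough C rendering of this transformation (illustrative only) shows the
per-element indirection that disappears once feedback establishes that the
sequence has a fixed length and a uniform element type:

#include <stddef.h>

/* General representation: a heap-allocated list of uniform elements.
   Every access chases a pointer per element. */
struct cell { double value; struct cell *next; };

double sum_list(struct cell *head)
{
  double total = 0.0;
  struct cell *c;
  for (c = head; c != NULL; c = c->next)
    total += c->value;
  return total;
}

/* Specialized representation after feedback shows the list is never
   grown and always holds doubles: a flat array, no per-element
   indirection, and addresses computable at compile time. */
double sum_array(const double *values, int n)
{
  double total = 0.0;
  int i;
  for (i = 0; i < n; i++)
    total += values[i];
  return total;
}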
Common-Case Optimization
Automated, common-case optimization
will be another key benefit of quasistatic compilation. Feedback from
run-time tells the compiler what paths are most commonly executed and what
data values occur most commonly. The compiler can reorganize and
restructure executable code to reduce the execution latency in the most
common cases.
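A simple cost model (our own hedged formulation, not drawn from the cited
literature) makes the trade-off explicit. If the specialized path applies
with probability p, costs T_spec when it applies, and is guarded by a check
costing T_check, then the expected latency is

    E[T] = p\,(T_{\mathrm{check}} + T_{\mathrm{spec}}) + (1-p)\,(T_{\mathrm{check}} + T_{\mathrm{general}})

and specialization is a net win over always running the general code exactly
when

    p\,(T_{\mathrm{general}} - T_{\mathrm{spec}}) > T_{\mathrm{check}},

that is, when the savings on the common path outweigh the cost of the guard.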
Execution Latency Compression
We can also view quasistatic
computation from an information theoretic perspective. If a quantity is
known statically, or is highly predictable, the run-time value of the
quantity conveys little or no information. The known value can be folded
into computation at little run-time cost on average. Values whose
probability distributions are highly non-uniform admit compression using
non-uniform coding sequences, or, in this case, non-uniform execution
sequences. Only
quantities which are completely unpredictable cannot be compressed by
recoding and hence must be fully evaluated each time they are encountered
at run time. Quasistatic computation including adaptive recompilation,
then, is directly analogous to adaptive data compression.
Embodiments
Quasistatic computation is only a net benefit to the extent that it
reduces execution latency. Certainly feedback analysis and recompilation
require additional computation. Fortunately, here we can readily exploit
throughput to perform the additional computation.
At a coarse granularity, we can exploit otherwise ``idle'' machine cycles
between program runs to perform analysis and recompilation. When the
user is away pondering the results of a previous run, or at home for the
night, his workstation can analyze feedback and regenerate executables. In
the most basic incarnation, the compiler can tune execution and replace
whole executable binaries between program invocations. This approach is
probably the easiest to introduce into existing computer systems. However,
limited collection bandwidth and storage space may necessitate very frugal
collection of feedback data. Additionally, this scheme does not allow
programs to adapt to usage during a single execution.
On parallel or multithreaded systems, we might implement one or more
compilation servers. When application tasks do not consume all the
resources on the machine, compiler threads can analyze feedback data and
generate new execution sequences. Compilation threads will be
coarse-grained, so they can easily and efficiently exploit the throughput
available on parallel computers. ``Idle'' workstations on a LAN can also
serve efficiently in this capacity. Again, care must be taken to limit the
quantity of feedback data. Dynamic adjustment can be used to
match the data to the available collection bandwidth and analysis cycles.
Both of the previous suggestions borrow only spare throughput to perform
recompilation. However, strategic analysis and recompilation interspersed
with a program can also result in net reductions in application latency.
Interspersed analysis requires much more attention to analysis and
recompilation efficiency. This approach allows executable adaptation
during long program runs.
In addition to these cases where recompilation is initiated by an
external agent in response to feedback, the executable can explicitly
request compilation. In cases where earlier feedback suggests that a value
is constant past a point in the program, but the value of the constant is
unpredictable, it may be beneficial for the program to explicitly request
recompilation at that point during program execution. Several researchers
have demonstrated the advantages of this kind of run-time code synthesis in
various scenarios. Massalin and Pu employ run-time compilation in the
Synthesis kernel [MP89] to generate specialized kernel
operations. Keppel, Eggers, and Henry demonstrate that the explicit use of
run-time code generation can often produce better executable code than is
possible with purely static compilation [KEH93].
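To fix ideas, the following sketch shows what an explicit request might look
like at the source level. The quasistatic_specialize() hook and its behavior
are purely hypothetical, standing in for whatever interface a quasistatic
runtime would actually export:

extern void simulate_network(int node_count);
extern int  more_timesteps(void);

/* Hypothetical runtime hook: return a version of the given routine
   specialized around the supplied invariant, generating it on the fly
   if necessary. */
extern void (*quasistatic_specialize(void (*general)(int), int invariant))(int);

void run_simulation(int node_count)
{
  void (*simulate)(int);

  /* Feedback indicates node_count never changes after this point in a
     run, but its value cannot be predicted at initial compile time. */
  simulate = quasistatic_specialize(simulate_network, node_count);

  while (more_timesteps())
    simulate(node_count);
}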
Foundational and Enabling Work
Quasistatics continues the intellectual legacy of common-case
optimization, trace-scheduling, and profile-driven compilation. Manual
optimization for the normal case is a well known system technique
[Lam84] for achieving good average performance. As a system
design technique, common-case optimization relies on the designer to
collect data, analyze the feedback, and optimize accordingly. Trace
Scheduling [Fis81] [FERN94] and
profile-driven ( e.g. [Wal91] [Sam91])
compilation take this idea one step further and automate code
reorganization based on profile traces from sample program runs.
Quasistatics takes the next step beyond these techniques by integrating
optimization based on observed common cases into the lifetime of the
program.
Run-time code generation research also plays an inspirational and
foundational role in quasistatic computation. The Synthesis kernel
effectively demonstrates the power of invariant factorization to produce
highly efficient, specialized functions at run time
[MP89]. Value-specific optimization demonstrates more
general and portable application of run-time code generation
[KEH93]. Deferred compilation employs static, staging
analysis on functional programming languages to automatically determine
opportunities for invariant factorization [LL93].
Quasistatic computation completes the picture by allowing the compiler to
automatically recognize opportunities for run-time code generation which
are not easily identified by static analysis.
Maturing research in partial evaluation and specialization ( e.g.
[Ber90] [JGS93]) provides us with
the tools to take advantage of known data values to produce streamlined
executables.
Quasistatic optimization would be of limited utility without efficient
techniques for lightweight profiling and tracing of execution flow and data
values. Many recent developments in this area play a key role enabling
quasistatic computation. Appel introduces simple, compiler-inserted
profiling annotations which support lightweight profile collection
[ADM88]. Larus demonstrates the efficiency of modern
techniques for compiler-based program tracing
[Lar93]. Wahbe presents efficient schemes for data
value monitoring during program execution [WLG93]. His data
tracing techniques allow programs to identify when values change and can be
much more lightweight than trap-based techniques such as the read-only,
virtual-memory page scheme suggested in the placement example above.
In quasistatic computation, the compiler may be called upon continuously
to incrementally recompile portions of the program. Several compilers for
object-oriented languages, including the SELF compiler
[CUL91], employ dynamic compilation to generate executable
code from a simple intermediate form during execution. Notably, since the
SELF compiler is employed dynamically, it can often exploit knowledge of
data values which are not known until run time. Researchers at Sun
Microsystems have developed techniques for modifying more conventional
compilers for efficient incremental recompilation
[FST91].
Implications
As a discipline, we recognize the importance of analyzing dynamic
behavior and optimizing our systems accordingly. While feedback is used in
spot applications, we have not, traditionally, included it as a fundamental
part of our computational model. As noted herein, there are many good
reasons to integrate automatic feedback generation, consumption, and
feedback-oriented adaptation into our base-line computing model.
Once we accept this new model, many optimizations which the clever
programmer could make can now be automated and performed by a clever
compiler. This has the obvious benefit of taking some of the burden for
extracting performance off the application and system programmers. More
fundamentally, though, the model protects our programs against inevitable
changes in technology, changes in external systems, and shifts in usage
patterns.
One key shift which quasistatic computation encourages is a general move
to more explicit management of resources ( e.g. caches, virtual memory
paging, memory management schemes). Rather than relying on oblivious
mechanisms which simply react to events at run time, we can predict likely
events and usage patterns. We can often avoid the overhead associated with
belated discovery of an exceptional case ( e.g. data item not in the
cache, TLB miss) by explicitly predicting that it will occur and providing
a hint in the instruction stream to explicitly handle or prepare for the
event.
Future Computer Systems
Our systems should be designed to:
- Collect statistical and usage information -- Collection should be
cheap and flexible. At times the compiler will require more
detailed information. Ideally, it should be possible to get more
detailed feedback data on some aspect or portion of a system while
collecting less information on the rest of the system.
- Communicate statistical and usage information to the compiler --
Many of our contemporary systems can collect this kind of
information, but the consumers have traditionally been people.
Future systems will need to seamlessly provide this information to
the compiler as needed.
- Accept hints issued explicitly by the instruction
stream -- When a priori knowledge of future events can make
processing more efficient, the system should be designed to accept
and utilize hints provided by the executing program. Ideally,
there will be correlation between the kinds of statistics and usage
information a system collects and the kinds of hints it can
exploit.
- Support speculative operations whenever possible -- Much of the
advantage afforded by quasistatic computation results from
aggressive optimization for identified, common-case execution.
Speculative support allows programs to execute down paths which may
turn out to be incorrect without causing irreparable damage.
The more freedom the system provides for such speculation, the more
performance quasistatic computation can extract.
Future Architectures
To some extent, quasistatic computation allows us to better exploit the
characteristics we already know how to architect well. In particular,
quasistatic computation sets out to increase predictability allowing us to
better exploit throughput to increase performance. This shows up partially
as larger statically scheduled code blocks. It also shows up in the
accurate estimates of timing and data requirements, allowing data access
latency to be more completely overlapped with useful computation.
New architectures should explicitly consider the system support suggested
in the previous section:
- Feedback -- Computational devices should be designed
to allow lightweight extraction of relevant statistical
information.
- Hints -- Almost any heuristically managed resource (
e.g. caches, TLBs) should provide support for explicit
management from the instruction stream. Support for software
prefetching [CKP91] is a good example of such a
hint; non-binding prefetch [RL92] interacts better with speculatively
executed prefetch operations (a sketch of a feedback-guided prefetch hint
follows this list).
- Speculation -- In traditional architectures, a crude form of
common-case speculation and backout is provided via faults and
traps. Unfortunately, the overhead associated with these traps on
modern processors is extremely high, making such optimization
fruitful only for highly predictable events. Support for
common-case speculation with lightweight backout would increase the
range of worthwhile application. Torch, for example, introduces
support for instruction boosting across basic-block boundaries to
efficiently exploit speculation on superscalar architectures
[SLH90].
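As a small illustration of such a hint, the sketch below uses the GCC-style
__builtin_prefetch intrinsic as a stand-in for the hint mechanisms discussed
here; the prefetch distance of 8 is hypothetical and would in practice be
derived from measured miss rates and access latencies:

struct record { double weight; };

/* Walk a table of records whose access pattern feedback has shown to be
   regular.  The prefetch hint asks the memory system to begin fetching a
   future element so its latency overlaps useful computation. */
double sum_weights(struct record **table, int n)
{
  double total = 0.0;
  int i;
  for (i = 0; i < n; i++) {
    if (i + 8 < n)
      __builtin_prefetch(table[i + 8], 0, 1);  /* read, low temporal locality */
    total += table[i]->weight;
  }
  return total;
}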
With the advent of reconfigurable logic, specialization is not limited to
the executable instruction stream. Architecturally, we can now design
general-purpose computing systems which can be configured, on the fly, to
provide application-specific acceleration. While the field of custom
computing using reconfigurable logic is relatively young, researchers using
modest amounts of reconfigurable logic have already demonstrated orders of
magnitude higher performance than general-purpose microprocessors or
workstations on numerous applications [BP93]. One of the key
advantages of reconfigurable logic is that the architecture can be
specialized to match the computational requirements of the application.
Quasistatic computation can employ feedback to extract the critical
computational requirements for an application. Further, the quasistatic
compiler gathers information on the properties of the data normally seen
during the computation. Combining this information, the compiler can
synthesize minimum cost computing structures tailored to common-case
execution requirements.
Future Programming Languages
One key benefit of quasistatic computation is that we do not have to
forgo dynamic expressibility in order to achieve high performance. We have
seen that dynamic features do not inherently limit performance. Rather,
the unpredictability of dynamic quantities ultimately limits performance.
By adopting a quasistatic computational model, we recognize that all
dynamics are not equal, and we only pay a run-time cost for dynamic
features when they are used in unpredictable ways.
Languages and programming environments should be designed with the
feedback model in mind. Libraries should be able to provide multiple
implementation options for routines or data structures. Run-time feedback
will then allow the compiler to select the appropriate version for each
application ( e.g. [Low78] [Sam91]). As noted earlier,
some versions of a routine or data structure may be applicable only in a
limited context; that is, a version may not provide all the functionality
or be valid over the whole range of values supported by other versions.
As long as the programmer communicates the
range of applicability of each version to the compiler, the compiler can
select appropriate versions based on actual requirements seen during
execution.
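One possible shape for such a multi-version library interface, sketched in C
with hypothetical names; the essential point is simply that each version
advertises the conditions under which it is valid, so the compiler can bind
call sites to a version based on feedback and guard them with the
corresponding predicate:

#include <stddef.h>

/* A sort routine packaged with an applicability predicate.  The
   insertion-sort version is only worthwhile for short inputs; the
   general version is always valid. */
typedef struct {
  void (*sort)(int *data, size_t n);
  int  (*applicable)(size_t n);        /* 1 if this version may be used */
} sort_version;

extern void insertion_sort(int *data, size_t n);
extern void merge_sort(int *data, size_t n);

static int small_only(size_t n) { return n <= 32; }
static int always(size_t n)     { (void)n; return 1; }

/* The library exports its versions; feedback on the typical n seen at a
   call site lets the compiler bind that site to one version and guard it
   with the matching predicate. */
static const sort_version sort_versions[] = {
  { insertion_sort, small_only },
  { merge_sort,     always     },
};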
Conclusion
We have articulated a paradigm for computing which differs substantially
from traditional static and dynamic computational models. Quasistatic
computation allows us to have dynamic expressibility and flexibility while
obtaining performance comparable to purely static code. Feedback and
adaptive recompilation throughout program lifetime provide the key elements
which make this kind of optimization possible. In effect, quasistatic
computation embodies classical common-case optimizations in an automated
manner. Quasistatic computation can also be viewed as a technique which
exploits computational regularities to adaptively compress program run
time. Through continuous adaptation and recompilation, quasistatic
computation allows programs to deliver high performance even across changes
in external assumptions, technology, and usage patterns. Many technologies
which enable quasistatic computation are beginning to mature, making this
style of computing ripe for exploitation. Future computing systems should
support feedback, hints, and speculation to take full advantage of the
opportunities afforded by quasistatic computing.
References
- ADM88
-
Andrew Appel, Bruce Duba, and David MacQueen.
Profiling in the Presence of Optimization and Garbage Collection.
CS-TR 197-88, Princeton University, November 1988.
- Ber90
-
Andrew A. Berlin.
Partial Evaluation Applied to Numerical Computation.
In ACM Conference on LISP and Functional Programming. ACM, 1990.
- BP93
-
Duncan A. Buell and Kenneth L. Pocek, editors.
Proceedings of the IEEE Workshop on FPGAs for Custom Computing
Machines, Los Alamitos, California, April 1993. IEEE Computer Society, IEEE
Computer Society Press.
- BZ93
-
David Barret and Benjamin Zorn.
Using Lifetime Predictors to Improve Memory Allocation Performance.
In Proceedings of the SIGPLAN Conference on Programming Language
Design and Implementation, pages 187-196. SIGPLAN, ACM, June 1993.
- CKP91
-
David Callahan, Ken Kennedy, and Allan Porterfield.
Software Prefetching.
In Proceedings of the Fourth International Conference on the
Architectural Support for Programming Languages and Operating Systems, pages
40-52. ACM, April 1991.
- CUL91
-
Craig Chambers, David Ungar, and Elgin Lee.
An Efficient Implementation of SELF, a Dynamically-Typed
Object-Oriented Language Based on Prototypes.
LISP and Symbolic Computation, 4(3), 1991.
- DeH93
-
Andre DeHon.
Smart Compiler Advantages.
Transit Note 99, MIT Artificial Intelligence Laboratory, December
1993.
- DEK93
-
Andre DeHon, Ian Eslick, and Thomas F. Knight Jr.
Toward 21st Century Computing: Transit Perspective.
Transit Note 93, MIT Artificial Intelligence Laboratory, September
1993.
- DEMK93
-
Andre DeHon, Ian Eslick, John Mallery, and Thomas F. Knight Jr.
Prospects for a Smart Compiler.
Transit Note 87, MIT Artificial Intelligence Laboratory, June 1993.
- FERN94
-
Joseph A. Fisher, John R. Ellis, John C. Ruttenberg, and Alexandru Nicolau.
Parallel Processing: A Smart Compiler and a Dumb Machine.
ACM SIGPLAN Notices, 19(6):37-47, June 1984.
- Fis81
-
Joseph Fisher.
Trace Scheduling: A Technique for Global Microcode Compaction.
IEEE Transactions on Computers, 30(7):478-490, July 1981.
- FST91
-
Alastair Fyfe, Ivan Soleimanipour, and Vijay Tatkar.
Compiling from Saved State: Fast Incremental Compilation with
Traditional UNIX Compilers.
In Proceedings of the Winter 1991 USENIX Conference, pages
161-170. USENIX Association, 1991.
- GZ92
-
Dick Grunwald and Benjamin Zorn.
CustoMalloc: Efficient Synthesized Memory Allocators.
CU-CS-602-92, Department of Computer Science, University of Colorado,
Boulder, Colorado, July 1992.
- JGS93
-
Neil D. Jones, Carsten K. Gomard, and Peter Sestoft.
Partial Evaluation and Automatic Program Generation.
Prentice Hall International Series in Computer Science. Prentice
Hall, 1993.
- KEH93
-
David Keppel, Susan Eggers, and Robert Henry.
Evaluating Runtime-Compiled Value-Specific Optimizations.
Technical Report 93-11-02, Department of Computer Science and
Engineering, University of Washington, Seattle, WA 98195, November 1993.
- Lam84
-
Butler W. Lampson.
Hints for Computer System Design.
IEEE Software, 1(1):11-28, January 1984.
- Lar93
-
James R. Larus.
Efficient Program Tracing.
Computer, 26(5):52-60, May 1993.
- LL93
-
Mark Leone and Peter Lee.
Deferred Compilation: The Automation of Run-Time Code Generation.
CMU-CS-93-225, Carnegie Mellon University, December 1993.
- Low78
-
James Low.
Automatic Data Structure Selection: An Example and Overview.
Communications of the ACM, 21(5):376-385, May 1978.
- MP89
-
Henry Massalin and Calton Pu.
Threads and Input/Output in the Synthesis Kernel.
In Proceedings of the 12th ACM Symposium on Operating
Systems Principles, pages 191-201. ACM, December 1989.
- RL92
-
Anne Rogers and Kai Li.
Software Support for Speculative Loads.
In Proceedings of the Fifth International Conference on the
Architectural Support for Programming Languages and Operating Systems, pages
38-50. ACM, October 1992.
- Sam91
-
Alan Dain Samples.
Profile-driven Compilation.
PhD thesis, U.C. Berkeley, April 1991.
U.C. Berkeley CSD-91-627.
- SLH90
-
Michael Smith, Monica Lam, and Mark Horowitz.
Boosting Beyond Static Scheduling in a Superscalar Processor.
In Proceedings of the 17th International Symposium on Computer
Architecture, pages 344-354. ACM, May 1990.
- Wal91
-
David Wall.
Predicting Program Behavior Using Real or Estimated Profiles.
In Proceedings of the ACM SIGPLAN 1991 Conference on Programming
Language Design and Implementation, pages 59-70. SIGPLAN, ACM, 1991.
- WLG93
-
Robert Wahbe, Steven Lucco, and Susan Graham.
Practical Data Breakpoints: Design and Implementation.
In Proceedings of the ACM SIGPLAN 1993 Conference on Programming
Language Design and Implementation. SIGPLAN, ACM, 1993.
Sigplan Notices, Volume 28, Number 6.