Incremental computation is an optimization technique that takes advantage of the reusability of successive, iterative computations that vary only slightly from one another. For instance, if you wanted to compute the sum of each m-element window in an n-element array (with m < n), you might use the following code:
    for (int i = 0; i <= n-m; i++) {
        for (int j = i; j < i+m; j++) {
            sum[i] += ary[j];
        }
    }

which is an O(nm) algorithm. However, using incremental computation, we can optimize the code in the following way:
    for (int j = 0; j < m; j++) {
        sum[0] += ary[j];
    }
    for (int i = 1; i <= n-m; i++) {
        sum[i] = sum[i-1] - ary[i-1] + ary[i+m-1];
    }

taking advantage of the fact that we've already computed the previous window, which differs from the current one in only two values. The new algorithm is O(n).
I'd like to explore optimization algorithms that identify opportunities for incremental computation, as well as some of its more interesting applications.
Two notes
There is an entire area of study devoted to creating incremental algorithms;
that is, algorithms that are explicitly designed to efficiently accommodate incremental
changes to their inputs, or that incrementally compute a solution so that if
computation is terminated part-way, a useful result can still be obtained. I
do not deal with this area; instead, the papers I cite below deal with the automated
incrementalization of existing code at compile-time. (...although at
least one of the papers bridges the gap between the two areas by proposing an
automated system that can introduce incrementalization on a function-call level.)
Also, considerable research has been done on developing an incremental model of computation, in which code is made incremental at run-time via function caching (storing the results of previous calls to the same function) and other methods. Because this approach relies on a runtime mechanism, it never explicitly constructs incremental code that can be run by conventional means.
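As a rough sketch of what function caching means (my own toy example, not taken from any of the papers below), here is a memoized Fibonacci in C; each call first consults a table of previous results, so every subproblem is computed at most once:

```c
#include <string.h>

#define MAXN 64

/* Cache of previously computed results; -1 marks "not yet computed". */
static long cache[MAXN];
static int cache_ready = 0;

long memo_fib(int n) {
    if (!cache_ready) {                    /* one-time cache initialization */
        memset(cache, -1, sizeof cache);
        cache_ready = 1;
    }
    if (n <= 1)
        return n;
    if (cache[n] < 0)                      /* cache miss: compute and store */
        cache[n] = memo_fib(n - 1) + memo_fib(n - 2);
    return cache[n];                       /* cache hit: reuse the old result */
}
```

Note that the caching lives entirely in the runtime behavior of memo_fib; the code for fib itself was never transformed into an explicitly incremental form, which is exactly the distinction drawn above.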
Papers
J. Earley, "High Level Iterators and a Method of Automatically Designing Data Structure Representation," ERL-M416, Computer Science Division, University of California, Berkeley, February 1974.

Robert Paige, J. T. Schwartz, "Expression Continuity and the Formal Differentiation of Algorithms," Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, p. 58-71, January 17-19, 1977, Los Angeles, California.

Robert Paige, Shaye Koenig, "Finite Differencing of Computable Expressions," ACM Transactions on Programming Languages and Systems (TOPLAS), v. 4 n. 3, p. 402-454, July 1982.

Yanhong Liu, Tim Teitelbaum, "Deriving Incremental Programs," Technical Report TR 93-1384, Department of Computer Science, Cornell University, Ithaca, New York, September 1993.

Yanhong Liu, Tim Teitelbaum, "Systematic Derivation of Incremental Programs," Science of Computer Programming, v. 24 n. 1, p. 1-39, February 1995.

Yanhong Liu, Tim Teitelbaum, "Caching Intermediate Results for Program Improvement," Proceedings of the Symposium on Partial Evaluation and Semantics-Based Program Manipulation, June 21-23, 1995, La Jolla, California.

Yanhong Liu, Scott Stoller, Tim Teitelbaum, "Discovering Auxiliary Information for Incremental Computation," Proceedings of POPL'96, January 21-24, 1996, St. Petersburg Beach, Florida.

Yanhong Liu, "Efficient Computation via Incremental Computation," Pacific-Asia Conference on Knowledge Discovery and Data Mining, December 1999.

Yanhong Liu, Scott Stoller, "From Recursion to Iteration: What Are the Optimizations?," Proceedings of PEPM'00, January 22-23, 2000, Boston, Massachusetts.

Yanhong Liu, Scott Stoller, Ning Li, Tom Rothamel, "Optimizing Aggregate Array Computations in Loops," Technical Report TR 02-xxxx, Computer Science Department, State University of New York at Stony Brook, Stony Brook, New York, July 2002.
The original paper that introduces the idea; couldn't find it online.
(14 pages) This paper explores the idea of incremental computation at an
extremely high level, with respect to sets. It introduces some preliminary heuristics
to determine if an expression can be optimized. It also discusses the feasibility
of a system that would automatically employ those heuristics and the
symbolic transformations presented in the paper -- and reaches the conclusion
that while full automation may be too difficult, even semi-automation would
be beneficial.
(53 pages) A more comprehensive look at the ideas presented in the previous
paper. It goes through several step-by-step examples of pseudo-mechanical transformation
to incremental code; however, the paper still only deals with a very high-level
language.
(41 pages) The first systematic approach to incrementalizing programs written
in common functional languages. This paper defines many of the terms used in
incremental computation, formalizes the notion of partial and complete incremental
functions, and provides a step-by-step automated approach to deriving incremental
functions. It also introduces a time model by which one can determine when to
use a cached result, and when it might be faster to recompute the whole function.
Finally, it provides a proof of correctness for the derivations it suggests,
as well as some potential future improvements.
(30 pages) The published version of the above technical report. More formal,
less readable =). I skimmed it, so I might be missing something...
(12 pages) Extends the basic "cached-result" framework with a second
phase, one which exploits intermediate results computed in the original computation
of f(x). Describes the "cache-and-prune" algorithm for identifying
which intermediate results are useful, and employing them in the incrementalized
code.
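As a toy illustration of the idea (mine, not the paper's): suppose f computes the mean of an array. The running total is an intermediate result of f; if we cache it alongside the final answer, the "one element appended" update runs in O(1) instead of re-summing the whole array.

```c
/* f(x): mean of x[0..n-1], computed from scratch.  The running total s is an
 * intermediate result; we cache it in *sum_out for later incremental updates. */
double mean(const double *x, int n, double *sum_out) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    *sum_out = s;
    return s / n;
}

/* Incremental update: given the cached sum for the first n-1 elements, extend
 * the result to cover one appended element in O(1) and refresh the cached sum. */
double mean_after_append(double cached_sum, double new_elem, int n, double *sum_out) {
    *sum_out = cached_sum + new_elem;
    return *sum_out / n;
}
```

The "prune" half of cache-and-prune is the part this sketch glosses over: deciding automatically which intermediate results (here, the sum but not, say, the loop counter) are worth keeping.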
(14 pages) This paper describes phase three of the overall incremental
computation problem: discovering, computing, maintaining, and exploiting auxiliary
information about the input value. (Note that this information is by definition
not computed by f(x) and therefore is not considered an intermediate result.)
The usual step-by-step analysis follows.
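To make "auxiliary information" concrete (my example, not drawn from the paper): unlike the window sums above, the minimum of each window cannot be updated from the previous window's minimum alone. An incremental version has to maintain extra data -- here, a deque of candidate indices -- that the original computation never produces:

```c
/* Sliding-window minimum over ary[0..n-1], window size m (this sketch assumes
 * n <= 64).  The deque of indices, kept in increasing order of value, is
 * auxiliary information: it appears in no window's result, yet it is what
 * makes an O(n) incremental update possible. */
void window_mins(const int *ary, int n, int m, int *out) {
    int deque[64];                /* candidate indices; front holds current min */
    int head = 0, tail = 0;       /* live contents are deque[head..tail-1] */
    for (int i = 0; i < n; i++) {
        if (head < tail && deque[head] <= i - m)
            head++;                               /* min slid out of the window */
        while (head < tail && ary[deque[tail-1]] >= ary[i])
            tail--;                               /* dominated by new element */
        deque[tail++] = i;
        if (i >= m - 1)
            out[i - m + 1] = ary[deque[head]];    /* record this window's min */
    }
}
```

Discovering that such a structure is needed, and maintaining it automatically, is the hard problem the paper addresses.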
(24 pages) This summary paper provides an excellent overview of the basic ideas
behind incremental computation, and also covers some of the additional research
done to date, including the discovery and use of intermediate results and auxiliary
information. It also steps through several simple examples and describes the
author's first go at an actual implementation, CACHET. This paper is a good
place to start.
(19 pages) Liu again expands the scope of incremental computation to tackle
recursive functions. Describes an algorithm to convert recursion to iteration that
handles multiple base cases and multiple recursive cases. Also provides some
interesting stats on how tail recursion can actually result in slower
code than normal recursion; the proposed algorithm avoids these slowdowns.
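The flavor of the transformation, on the simplest possible case (my illustration; the paper's algorithm handles far more general recursion patterns, including multiple base and recursive cases):

```c
/* Recursive definition: the pending multiplications pile up on the call stack. */
long fact_rec(int n) {
    return n <= 1 ? 1 : n * fact_rec(n - 1);
}

/* Iterative version: an explicit accumulator replaces the call stack, turning
 * the recursion into a loop that computes the same product bottom-up. */
long fact_iter(int n) {
    long acc = 1;
    for (int i = 2; i <= n; i++)
        acc *= i;
    return acc;
}
```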
(34 pages) I only had time to skim this paper. It tackles the specific task
of applying incremental computation techniques to aggregate array computations
in loops. Unlike previous papers, it includes comprehensive performance measurements
(running time, cache misses, etc.) taken from a variety of real-world algorithms.