Compiler Construction 1990-1999

> Literature references and annotations by Dick Grune,
Last update: Sat Nov 19 18:14:54 2022.

These references and annotations were originally intended for personal use and are presented here only in the hope that they may be useful to others. There is no claim to completeness or even correctness. Each annotation represents my understanding of the text at the moment I wrote the annotation.
> No guarantees given; comments and content criticism welcome.

* Robin Hunter, The Essence of Compilers, Prentice Hall, London, 1999, pp. 237.
Compiler construction with lex and yacc, pleasantly presented.

* Eelco Visser, Strategic pattern matching, in 10th International Conference, RTA-99, Lecture Notes in Computer Science #1631, ed. by Paliath Narendran and Michael Rusinowitch, Springer-Verlag, New York, 1999, pp. 30-44.
The rewrite rules in the nameless tree rewriting language from Visser, ICFP '98 (called "System S" here), is extended with a condition: $ l → r $ where s, meaning "l can be rewritten to r, provided the rewrite s succeeds". The resulting language is called Stratego. A compiler for it is available, which translates Stratego to C, roughly in Prolog fashion. Needless to say, the compiler itself is in Stratego.
     Three programming idioms in Stratego are explained and examined. The use of two of them is simplified by introducing new syntax structures, syntactic sugar; none of them requires an extension of System S. The first is contextual patterns, although it is not the patterns themselves that are contextual: they allow context to be transferred over long distances. A context pattern is a pattern that is allowed to match at any depth in the tree to which it is applied. It then binds the variables in it. These variables can then be used in other context patterns, both for context checking and for building new context-dependent trees.
     The second is recursive patterns. These are actually strategies, used as patterns of unspecified depth. They can be used to check trees for conjunctive and disjunctive normal form, for example.
     The third is overlays. Since the patterns in trees are often repetitive and verbose, macro-like abbreviations for them are introduced, which allow, for example, a tree like (Sym("App"), [(Sym("App"), [Lit("+"), l]), r]) to be written as Plus(l, r). I have no idea why they are called "overlays".
     These new idioms are used in the specification of a small type checker, in which type-correct expressions reduce to nothing and type-incorrect expressions reduce to a kind of minimal unmatchable core.

* Per Brinch Hansen, Java's insecure parallelism, ACM SIGPLAN Notices, vol. 34, #4, April 1999, pp. 38-45.
Claims that Java does not support monitors, but the claim seems to be based on the fact that the Java defaults 'unsynchronized' for methods and 'public' for fields do not lead to a safe monitor when one is programmed naively. This means that the programmer loses on compiler-supported error detection.

* C. Clark, Build a tree -- Save a parse, ACM SIGPLAN Notices, vol. 34, #4, April 1999, pp. 19-24.
Explains the difference between processing the nodes recognized during parsing on the fly and storing them as a tree. Obvious, but experience has shown that this has to be explained repeatedly.

* Eelco Visser, Zine-el-Abidine Benaissa, Andrew Tolmach, Building program optimizers with rewriting strategies, in ACM SIGPLAN International Conference on Functional Programming ICFP '98, ACM SIGPLAN Notices, vol. 34, #1, Jan. 1999, pp. 13-26.
A tree rewriting language is introduced. Its basic ingredients are (named) rewrite rules pattern → replacement, similar to the ones used in functional languages; they rewrite a tree to a tree. A rewrite rule applied to a node succeeds when its pattern matches the node, and then yields the replacement.
     The rewrite rules can be combined into (named) strategies, using sequential composition, non-deterministic choice, etc. An example is the rewrite rule (r1 + r2) <+ r3, which means "apply rule r1 or r2, and if both fail, apply rule r3". The rules can be parameters to the strategy; so the above strategy could be named preferably_1_or_2_else_3(r1, r2, r3). Strategies cannot be generally recursive, but repetition and recursion can be expressed using $mu$ notation: $mu$x(...x...) creates a subtree named x in which x can be used.
     The strategies can contain operators that describe the distribution of the rules over the children of the node the strategy is applied to. For example, the monadic strategy operator all applies its operand to all children of the node it is applied to, and succeeds if all applications succeed. So the traditional top-down tree visit using strategy s can be described by topdown(s) = $mu$x(s; all(x)): apply s to the node at hand and then apply the top-down visit to all children. All traditional tree visits (and many others) can be described succinctly this way.
     The above format allows "context-free" rewriting; a stronger mechanism is required for transferring information to arbitrary places in the tree. To this end, the rewriting mechanism is decomposed into its two parts: match, which matches the left-hand side and binds variables in it; and build, which constructs the replacement, using the bound variables. In addition, contexts are defined in which the variables can be transferred. This allows context-sensitivity.
     Using these mechanisms, the authors define an optimizing rewriter for RML (Reduced ML) that does, among other things, code hoisting, dead code removal, constant propagation and inlining. The rewriter covers one page, one third of which is type declarations.

* Robert Morgan, Building an Optimizing Compiler, Digital Press / Butterworth-Heinemann, Bedford / Boston, Mass., pp. 450. 1998,
Very practical and easily accessible account of building an optimizing compiler, to a reasonable depth. Discusses static single assignment methods and partial redundancy methods for code optimization. Provides a generalization of register allocation techniques.

* Anders Kristensen, Template resolution in XML/HTML, Computer Networks and ISDN Systems, vol. 30, #1-7, 1998, pp. 239-249.
Describes a macro processor which rewrites XML trees to HTML text, using templates. It does macro variable substitution, URL resolution, variable name substitution, and conditional inclusion. It also features template node handlers, which can convert a node (or the entire tree) any way they want; they are expressed in and XML-like syntax. All substitution and rewriting is dependent on the context in which it happens.
     [ Pica Summary: ] This paper describes a framework for applying templates to applications and documents on the Web. The primary motivation is the need of Web application developers to separate program logic from presentation logic. A template is a prototypical document or part thereof. It consists of content in the target language, HTML, XML, or plain text, plus markup specifying variable parts of the document. The Template Markup Language (TML) is an application of XML which defines a generic and flexible set of template markup elements. (Template Resolution in XML/HTML) is a framework for processing TML. It excels in being highly extensible -- both in the types of values variables can take, variables being URLs, and in the set of template elements recognized.

* Vineeth Kumar Paleri, Y.N. Srikant, Priti Shankar, A simple algorithm for partial redundancy elimination, ACM SIGPLAN Notices, vol. 33, #12, Dec. 1998, pp. 35-43.
[[ preliminary abstract ]] Improvement over Morel and Renvoise, and Dhamdhere.

* Christof Keßler, Scheduling expression DAGs for minimal register need, Computer Languages, vol. 24, #1, Sept. 1998, pp. 33-53.
Predecessor to Keßler and Bednarski, "A dynamic programming approach to optimal integrated code generation", ACM SIGPLAN Notices, Aug. 2001, 165-174.

* Thomas Reps, Maximal-munch tokenization in linear time, TOPLAS, vol. 20, #2, March 1998, pp. 259-273.
[[ preliminary abstract ]] The paper recognizes the quadratic nature of finding all tokens in the input stream, due to rescanning the overshoot. The problem is solved by tabulating the results of the overshoot and reusing them during the rescan. Methods to reduce the table size are discussed.

* Eric DuJardin, Eric Amiel, Eric Simon, Fast algorithms for compressed multimethod dispatch tables, ACM TOPLAS, vol. 20, #1, Jan. 1998, pp. 116-165.
[[ preliminary abstract ]] A multimethod of a class is a set of methods with the same name that differs only in the actual types of the dynamically-typed (polymorphic) parameters. Since the object type itself can be viewed as one such parameter, there is no fundamental problem, but there is a practical one: with n such parameters, the dispatch tables are now $n+1$-dimensional, and since on each dimension there are usually quite a number of types, each table gets gigantic. Gigabyte or even terabyte sized tables would easily arise.
     In principle, the authors apply fairly usual table compression techniques; the trick is to combine them with table generation so we don't end up first generating huge tables. Also, more than 98 percent of the naive table entries would correspond to calls that could never occur. Their technique reduces terabytes to kilobytes.

* Derek Beng Kee Kiong, Compiler Technology -- Tools, Translators, and Language Implementation, Kluwer, Boston, 1997, pp. 210.
Typically Singapore-style well-behaved no-risk approach to compiler design. Broad insightful coverage with relatively little detail. Good, smooth introduction to compiler design, but the lack of detail might force the student to take a lot on faith.

* Augusto Sampaio, An Algebraic Approach to Compiler Design, World Scientific, 1997, pp. 188.
Introduces a process algebra and shows how it can be used to transform a source program into an object program in a provably correct way. Examples using the OBJ3 rewriting system given.

* Steven Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann Publishers, San Francisco, Ca 94104, 1997, pp. 856.
Massive, encyclopedic work on advanced issues in compiler construction, with more than 60 percent of the text concerning target code optimization, in 21 chapters and 3 appendices. The algorithms are written in ICAN, Informal Compiler Algorithm Notation, a stylized Modula-2, extended with serious set operations and quantifiers. Chapters 3 to 6 cover the preliminaries: symbol tables, intermediate codes, run-time support, and automatic generation of code generators. Chapters 7 to 10 are concerned with analysis: control flow, data flow, dependence and aliasing. Chapters 11 to 15 discuss optimization: introduction, early optimizations, redundancy, loop, procedure call. Chapters 16 and 17 treat register allocation and code scheduling. Chapters 18 to 20 discuss advanced optimizations: control flow, interprocedural and multi-level memory. The last chapter covers case studies and future trends. Appendix B supplies advice on implementing the data structures used in compiler construction.

* Kenneth C. Louden, Compiler Construction -- Principles and Practice, PWS Publishing, 1997, pp. 582.
Modern treatment of the subjects of the Red Dragon Book, more or less in the same framework. Perhaps a good modern replacement of Aho, Sethi and Ullman. Uses a running demo language TINY.

* P.D. Terry, Compilers and Compiler Generators -- An Introduction Using C++, Int. Thomson Comp. Press, London, 1997, pp. 579.
Very student-friendly, very soft introduction to compiler construction, using Coco/R, an LL(1) recursive descent parser generator (by Mössenböck). Compiler construction made as simple as possible, but rather limited in nature. Full of case studies.

* Ting Yu, Owen Kaser, A note on "On the conversion of indirect to direct recursion", ACM TOPLAS, vol. 19, #6, Nov. 1997, pp. 1085-1087.
Algorithmic correction to Kaser, Ramakrishnan, Pawagi, "On the conversion of indirect to direct recursion", ACM Letters on Programming Languages and Systems, Vol. 2, No. 1-4, March-Dec. 1993, pp. 151-164.

* Deborah L. Whitfield, Mary Lou Soffa, An approach for exploring code-improving transformations, ACM TOPLAS, vol. 19, #6, Nov. 1997, pp. 1053-1084.
Final version of Whitfield & Soffa, "Automatic generation of global optimizers", ACM SIGPLAN '91 Conference Programming Language Design and Implementation, June 1991, pp. 120-129.
     A test bed for global optimizations is constructed as follows. A fairly extensive language, called "Gospel" for "General Optimization Specification Language", is used for the specification of global optimizations. A specification consists of a set of precondition-action pairs, where the precondition specifies code patterns and dependencies, and the action specifies modifications of the code. The specification is translated into a flexible global optimizer by a program called, yep, Genesis.
     To operate the system, the source program is translated into an intermediate code + a set of data dependencies of various kinds. This representation can then be compiled using the global optimizer; the user can decide which optimizations to run in which order, to experiment with their applicability.
     The authors selected 8 global optimizations (constant propagation, loop fusion, etc.) for their present experiment, and analysed their possible interference. One result is that in about half of the combinations interference is theoretically possible.
     Next, the system was applied to two large sets of Fortran programs and applications of the 8 optimizations was tallied. The result was that for this mix constant propagation is by far the most applicable global optimization, followed by loop unrolling and dead code elimination. The only serious interference found was positive, between constant propagation and dead code elimination. Experimentation with the order of optimizations showed that there is no one best order, unsurprisingly.

* Henry G. Baker, The COMFY 6502 compiler, ACM SIGPLAN Notices, vol. 32, #11, Nov. 1997, pp. 25-30.
Report and analysis of a Lisp-based macro processor and set of macros written in 1976 and together forming the compiler for a high-level assembler for the 6502, called COMFY 6502. The idea is to show that minimal means can sometimes yield superb results. A rewrite to GNU Emacs Lisp is supplied.

* Jens Ernst, William Evans, Christopher W. Fraser, Steven Lucco, Todd A. Proebsting, Code compression, ACM SIGPLAN Notices, vol. 32, #5, June 1997, pp. 358-365.
Two forms of code compression are distinguished:
     1. wire code, which is optimized for transmitting over a "wire"; the entire program is compressed, transmitted, decompressed and the run;
     2. interpretable code, in which the compressed code remains compressed during execution, and is only decompressed segment-wise when run.
     The best technique the authors found for wire code compression consists of:
     1. Convert the whole program into a sequence of small trees, each consisting of an operator and some operands; some of the operands may again be small subtrees.
     2a. Make an inventory of the occurring tree patterns, where small trees that differ in the values of their operands (leaves) only are considered equal, and number the patterns. (Operands of the same type but different size are considered different.)
     2b. Make an inventory of the occurring operands (constants), grouped by size, and number them independently in each group. Say there are N groups.
     2c. Convert the program to 1+N streams of numbers, one with tree pattern numbers and N with operand numbers; since the tree patterns indicate the number and sizes of the operands they require, the streams can be read back simultaneously without synchronization problems.
     3. Renumber each sequence using move-to-front: renumber the item just read to 0 and rearrange the rest. This keeps the numbers in the sequences low, for further compression.
     4. Huffman-compress the streams.
     5. Combine with the initial number tables (called index tables here) and compress with gzip.
     This is what the authors call "very simple". They obtain compression ratios of 100 : 24 : 20.4 and 100 : 27 : 20.8 (uncompressed : just gzip : the above scheme), which is a 20% gain over gzip.
     Then they describe the interpretable code compression scheme, in which they do many non-simple things. It achieves about the same compression as gzip, but the result can be decompressed and interpreted piecemeal, on the fly.

* Norman Ramsey, Mary F. Fernández, Specifying representations of machine instructions, ACM TOPLAS, vol. 19, #3, May 1997, pp. 492-524.
In spite of the title, the paper describes the automatic generation of a generator for assemblers in library form. The instructions and the bit patterns and other paraphernalia are specified in a number of tables that can refer to each other. The tables have more or less an Ada look and feel.
     Much, argued, detail is given, supported by numerous examples.

* Pedro Diniz, Martin Rinard, Dynamic feedback: An effective technique for adaptive computing, in 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 32, #5, May 1997, pp. 71-84.
[ tries pieces of compiled code to see what is best; only slightly less crazy than Collberg]

* Christian S. Collberg, Reverse interpretation + mutation analysis = automatic retargeting, in 1997 ACM SIGPLAN Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 32, #5, May 1997, pp. 57-70.
See Collberg, "Automatic derivation of compiler machine descriptions", TOPLAS, 24, 4, July 2002.

* G. Ramalingam, Harini Srinivasan, A member lookup algorithm for C++, in 1997 ACM SIGPLAN Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 32, #5, May 1997, pp. 18-30.
[ it's more complicated than expected ]

* Henk Alblas, Albert Nymeyer, Practice and Principles of Compiler Building with C, Prentice Hall, London, 1996, pp. 427.
The first 300 pages contain a step by step account of the construction of a specific compiler from SLANG (Simple LANGuage) to VIM (VIrtual Machine). Constructing the front end is done twice, once using manual techniques, with a handwritten lexical analyser and a recursive descent syntax analyser combined with identification, and once using attribute grammars. The back end does on-the-fly code generation for the VIM; the VIM code is then translated to C, compiled and run. Actual timing measurements are supplied.
     The next 100 pages treat three subject more theoretically: scanning, parsing and attribute evaluation. However, LR parser table construction is not discussed, and nor is multi-visit partitioning of attributes.

* Niklaus Wirth, Compiler Construction, Addison-Wesley, Harlow, Eng., 1996, pp. 176.
Full but still short description of a compiler in Oberon which translates a subset of Oberon, Oberon-0, to some abstract RISC machine. Almost no information outside that path. Literature references are minimal.

* Jan Bosch, Delegating compiler objects, in Compiler Construction: 6th International Conference, CC'96, Lecture Notes in Computer Science #1060, ed. by Tibor Gyimóthy, New York, Springer-Verlag, 1996, pp. 326-340.
Very high-level, rather vague description of parse tree nodes as objects, plus a tool PHEST to experiment with them. The objects also do the parsing.

* Jan Vitek, R. Nigel Horspool, Compact dispatch tables for dynamically typed object-oriented languages, in Compiler Construction: 6th International Conference, CC'96, Lecture Notes in Computer Science #1060, ed. by Tibor Gyimóthy, New York, Springer-Verlag, 1996, pp. 309-325.
Excellent explanation of the implementations of dynamic binding in dynamically typed object-oriented languages. First the basic algorithms of dispatch table search, selector-indexed dispatch tables and virtual-function tables are explained. Next several traditional table compression algorithms are applied to these tables. The actual research reported consists of fine-tuning the dispatch table algorithm by partitioning the table(s).

* Todd A. Proebsting, Benjamin R. Whaley, One-pass optimal tree parsing -- with or without trees, in Compiler Construction: 6th International Conference, CC'96, Lecture Notes in Computer Science #1060, ed. by Tibor Gyimóthy, New York, Springer-Verlag, 1996, pp. 294-308.
The original BURG algorithm requires two passes: one for the labelling and one for the reduction phase. For a large class of grammars, this can be reduced to one pass. The algorithm to do so is sketched but not given explicitly. The appendix contains an example.

* Dominique Boucher, Marc Feeley, Abstract compilation: a new implementation paradigm for static analysis, in Compiler Construction: 6th International Conference, CC'96, Lecture Notes in Computer Science #1060, ed. by Tibor Gyimóthy, New York, Springer-Verlag, 1996, pp. 192-207.
Static analysis often requires repeated traversal of the syntax tree, both because more than aspect may have to be analysed and because some analyses are iterative, to find a fixed point.
     The idea is to generate from a program P an "abstract compilation": a new program $AC(P)$ which can be compiled normally and be executed with descriptions of several analyses. The results of the calls of $AC()$ can then be used to optimize P.
     The paper gives lots of theory and shows that this technique is profitable for many functional programs.

* Mikael Pettersson, A compiler for natural semantics, in Compiler Construction: 6th International Conference, CC'96, Lecture Notes in Computer Science #1060, ed. by Tibor Gyimóthy, New York, Springer-Verlag, 1996, pp. 177-191.
[[ preliminary abstract ]] The "natural semantics" from the title is Relational Meta-Language RML, a Prolog-like language for specifying a type system with attached semantics. The compiler generates C, using continuations. The paper gives a clever implementation of continuations in C, called "dispatching switches".

* Albert Nymeyer, Joost-Pieter Katoen, Ymte Westra, Henk Alblas, Code generation = A* + BURS, in Compiler Construction: 6th International Conference, CC'96, Lecture Notes in Computer Science #1060, ed. by Tibor Gyimóthy, New York, Springer-Verlag, 1996, pp. 160-176.
A bottom-up rewriting system describing machine instructions is combined with the A* search algorithm and a cost function to generate optimal code.

* Christina Cifuentes, Structuring decompiled graphs, in Compiler Construction: 6th International Conference, CC'96, Lecture Notes in Computer Science #1060, ed. by Tibor Gyimóthy, New York, Springer-Verlag, 1996, pp. 91-105.
One of the few papers on decompilation. An intuitively reasonable set of patterns, including those characteristic of short-circuit operators, is used in a predescribed order to match to control flow graph as it resulted from decompilation. This allows decompilation into a structured language, as far as the original permits. The paper gives 29 literature references on automatic structuring.

* Clark Verbrugge, Phong Co, Laurie Hendren, Generalized constant propagation -- a study in C, in Compiler Construction: 6th International Conference, CC'96, Lecture Notes in Computer Science #1060, ed. by Tibor Gyimóthy, New York, Springer-Verlag, 1996, pp. 74-90.
A very clear description of numeric range propagation in C, both intraprocedural and interprocedural, with special attention to convergence. Recommended!

* Jian Wang, Guang R. Gao, Pipelining-dovetailing: a transformation to enhance software pipelining for nested loops, in Compiler Construction: 6th International Conference, CC'96, Lecture Notes in Computer Science #1060, ed. by Tibor Gyimóthy, New York, Springer-Verlag, 1996, pp. 1-17.
The nested loop is partially unrolled into a start-up phase, a repeating phase and a close-down phase. The tail of one cycle in the repeating phase and the head of the next are then pipelined together. This is all done in software, which means that the result is another nested loop which will allow better pipelining.

* M. Anton Ertl, Andreas Krall, Removing anti-dependencies by repairing, in Compiler Construction: 6th International Conference, CC'96, Lecture Notes in Computer Science #1060, ed. by Tibor Gyimóthy, New York, Springer-Verlag, 1996, pp. 33-43.
During instruction rescheduling, a statement that assigns to a variable x cannot be moved up past a statement reading from x, which limits the rescheduling. Two ways have been published to circumvent the problem: renaming, which renames x in its second live range to say $ x prime $; and rematerialization, which recalculates the value of x when it is needed again. The first increases register pressure and is impossible when the live range is a loop; the second is inefficient, except when the value is a constant.
     The authors propose and analyse a variant of rematerialization: repairing. Repairing is applicable when the instruction that spoiled x is invertible: generate the inverting instruction before x is used again. In many case the CPU unit for it is free anyway.

* Olivier Danvy, Karoline Malmkjær, Jens Palsberg, Eta-expansion does The Trick, ACM TOPLAS, vol. 18, #6, Nov. 1996, pp. 730-751.
[[ preliminary abstract ]] On the relationship between eta-expansion and partial evaluation.

* Todd A. Proebsting, Charles N. Fischer, Demand-driven register allocation, ACM TOPLAS, vol. 18, #6, Nov. 1996, pp. 683-710.
First local interference graphs are constructed; next global information is used to assign to each node an estimate of the gain that would result if the corresponding variable would be given a register. This estimate is then used to guide the register colouring algorithm. This technique is intended to avoid ad-hoc register spilling.

* Vugranam C. Sreedhar, Guang R. Gao, Yong-Fong Lee, Identifying loops using DJ graphs, ACM TOPLAS, vol. 18, #6, Nov. 1996, pp. 649-658.
Dominator arcs are drawn in a flow-graph, in such a way that they replace the original arcs if they exist. The remaining arcs in the flow-graph are called Join arcs, hence the name DJ graphs. Several extensions of the usual dominator graph algorithms are given, in particular those that efficiently identify irreducible loops.

* Stan Liao, Srinivas Devadas, Kurt Keutzer, Steven Tjiang, Albert Wang, Storage assignment to decrease code size, ACM TOPLAS, vol. 18, #3, May 1996, pp. 235-253.
Code size is an important factor in programs for digital signal processing. It can be reduced by judicious memory lay-out, so that autoincrement and autodecrement instructions which are usually found on these machines, can be used to access data successively. The paper shows how to express the data access pattern of a program in a graph and how to derive the optimal use of the autoincrement and autodecrement instructions from this. Even the simplest version of the problem is proven to be NP-complete, but clever and efficient heuristic algorithms are given. The average code size reduction achieved is about 25 percent.

* Lal George, Andrew W. Appel, Iterated register coalescing, ACM TOPLAS, vol. 18, #3, May 1996, pp. 300-324.
Chaitin's and Briggs' coalescing heuristics are combined and refined, resulting in far fewer move instructions.

* David Evans, Static detection of dynamic memory errors, in 1996 ACM SIGPLAN Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 31, #5, May 1996, pp. 44-53.
LCLint is [apparently] an aggressive lint-like C program checker. The paper describes the dynamic memory allocation checking facilities of LCLint as it is used by the user. C provides too little information for this purpose and the program has to be annotated with pseudo-comments, of the form /*@keyword@*/. Several of these keywords and their concepts are described. For example, the annotation /*@owner@*/ to a pointer variable V means that V is the owner of the memory chunk it points to, and that allocating and deallocation it can only be performed through V. LCLint assembles the statuses of all pointer variables by iterating over the program graph (interprocedurally as well). Inconsistencies are then reported. Several, very "sneaky" bugs in real-world programs were found with this technique.

* Mark W. Bailey, Jack W. Davidson, Target-sensitive construction of diagnostic programs for procedure calling sequence generators, in 1996 ACM SIGPLAN Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 31, #5, May 1996, pp. 249-257.
Procedure calling sequences are specified, implicitly or explicitly, as a list of rules. A simple example is: the first three parameters are passed in registers if they fit, the rest goes on the stack. Such rules tend to be too informal (as the example shows), which is why the authors have created a language, CCL (for Calling Conventions Language), to specify the rules precisely.
     A procedure calling sequence generator, which is a part of any compiler, can be viewed as a finite-state transduction automaton. The input tokens of this automaton are the types of the parameters, its output tokens are the code segments generated, and its state holds enough information, collected from the placing of earlier parameters, to decide where the next parameter is going to be placed, for all of its possible types. Such a state could contain a list of registers still available, the most recent position on the stack, with alignment, etc.
     The present paper is concerned with automatically generating test suites for these procedure calling sequences. To this end, a sufficient set of paths through the FSA must be traced to cover "all" the possibilities, and test programs for each of them must be generated.
     Test path set generation. As a heuristic, the "sufficient set" is defined as containing at least one occurrence of a passage through each state from each of its incoming arcs to each of its outgoing arcs; this makes sure that each possible junction of two code segments is tested. Such a set can be found by depth-first search on the FSA.
     There are two important considerations here. 1. The test paths do not need to include a cycle, so the depth-first search terminates. 2. Any path starting from the start and stopping anywhere is in principle acceptable. This means that the algorithm can emit a path and start backtracking at any moment; it does not require to see a leaf or a cycle.
     To guide the search, each transition is marked followed or not, and each state as visited or not. The visiting algorithm obeys two rules: 1. when a node marked visited is reached, the path is still extended with one more step along all outgoing transitions, regardless of their markings; 2. otherwise, a transition marked followed is not followed again. If we meet a node marked visited, we did so through a non-marked arc (rule 2); we stop the recursion but take one more step, to get the last combination. If we meet a transition marked followed, we do not follow it again, because we have already collected all combinations that lie behind it.
     Results. Using the procedure calling sequences specifications for 7 mature compilers, test sets were generated for them and were run. The results are shocking: not one of them generates correct code for all test calls! For one thing, gcc 2.4.5 failed on a call with three parameters, each of type struct(char). In total, 22 bugs were discovered.

* Michael Wolfe, High Performance Compilers for Parallel Computing, Addison-Wesley, Redwood City, Ca., 1996, pp. 570.
Summary: The book has a classical bottom-up structure, consisting of three parts: basics, optimization preliminaries, and parallelism. The basics cover high performance parallel language features; graph concepts; and linear algebra, in particular integer problems. The optimization preliminaries cover data dependence for scalars, arrays, pointers and procedures; loop restructuring; and locality. The parallelism part (we are at page 385 now!) consist of two subparts, one on techniques and one on machine architecture. Techniques include concurrency detection and code, and vector code. The machine architecture section discusses message passing machines and scalable shared-memory machines.
     Annotation: The book clearly builds up towards the final goal: high-performance compilers for parallel computing, and each chapter is a stepping stone for the rest of the book. The first two parts could well be used for an advanced course on (traditional) compiler optimization. The text is very readable and never difficult. The author often uses set theory notation where I think text would do, and the formalism is not always tight; e.g. on page 170 quantifiers for Y\d\s-21\s+2\u and Y\d\s-22\s+2\u are missing. Many algorithms are given in full, but some techniques have intuitive ideas behind them which remain unexpressed in formalism and algorithms.

* J.P. Bennett, Introduction to Compiling Techniques -- A First Course Using ANSI C, Lex and Yacc, McGraw Hill, 1996, pp. 283.
Densely packed 200-page treatment of of compiling for imperative languages, including optimization; based on his VSL (yep, Very Simple Language). The other 80 pages contain code for VSL.

* Christopher W. Fraser, David R. Hanson, A Retargetable C Compiler -- Design and Implementation, Benjamin/Cummings, Redwood City, Ca., 1995, pp. 564.
lcc, a retargetable C compiler, is described in "literary program" style: the book is about 30 percent code and 70 percent text, interspersed on a paragraph by paragraph basis. Explanation of basic principles is minimal, for example bottom-up tree rewriting for code generation is explained in 1 page, followed by some 30 pages showing how to do it.
     The front end consists of a handwritten lexical analyser and a handwritten predictive parser (possible since the grammar is very stable). Context checking (called "semantics" here) is handled in ad-hoc fashion, as far as I can see. Code selection is done by bottom-up tree rewriting and register allocation by (handwritten) register tracking. The three back ends covered generate assembly code (for the MIPS R3000, the SPARC and the Intel 80X86).
     Very detailed description of a real-life compiler, with surprisingly (shockingly?) little use of tools. Not for the novice, but still has exercises as if a text book. See also Fraser & Hanson,"A retargetable compiler for ANSI C", ACM SIGPLAN Notices, Vol. 26, No. 10, Oct. 1991, pp. 29-43.

* Reinhard Wilhelm, Dieter Maurer, Compiler Design, Addison-Wesley, Wokingham, England, 1995, pp. 606.
One of the few compiler construction books that cover other paradigms besides the imperative one. The book has a remarkable structure: the first half covers code to be generated for the various paradigms, and the second half describes front end techniques.
     Although the material covered is definitely adequate, the book suffers from a total lack of motivation: all is said but nothing is explained. Also, the translation (from German) is sometimes regrettable: abbreviations are not translated and continue to abbreviate their German originals, and in some places the text becomes clearer when translated back to German.

* Jim Holmes, Object-Oriented Compiler Construction, Prentice-Hall International, Englewood Cliffs, NJ, 1995, pp. 483.
Reports the construction of a Pascal compiler using object-oriented techniques systematically wherever possible. Bottom-up guidelines are given to find the top classes for the several class hierarchies for the nodes in the parse tree. To this end, strings of precedence relations between non-terminals are derived from the grammar, for example "Ident < Variable < Factor < Term < Simple_expr < Expr". The rightmost members of the longest such strings are nominated for top classes, which results in our example in "An Ident is a special kind of Variable, which is a special kind of Factor, which is a special kind of Term, which is a special kind of Simple_expr, which is a special kind of Expr".
     The parse tree for Pascal requires 55 different classes, most of them based eventually on ExprCls and StatementCls. The parse tree is constructed using yacc (no other parsing technique is explained). Context checks are done in the object constructors. Each object has at least the operations execute()/evaluate() for interpretation and emit() for code generation, in addition to an auxiliary operation print(). Each object in the supplied code also has the virtual operations optimize() and peephole(), but these remain virtual throughout the book. The emit() operation generates code for the Sparc. The author admits that attempts to generate code directly from the parse tree carry to little context to be very successful, and hints at a postprocessing peephole optimizer.
     The book covers only that part of compiler construction that will be useful in object orientation, and is not a general compiler construction book. Lots and lots of code.

* Luiz Carlos Castro Guedes, Edward Hermann Haeusler, José Lucas Rangel, Object oriented semantics directed compiler generation: A prototype, in Theory and Practice of Software Development TAPSOFT '95, Lecture Notes in Computer Science #915, 1995, pp. 807-808.
Based on denotational semantics. Accepts a syntactic and a denotationally semantic description of a language and generates from them an AST-building parser and a "Denotational Compiler". The latter contains types for denotational specifications and classes for implementations. Not much more info is given in this two-page abstract. The produced code is about ten times slower than that of Turbo Pascal.

* Karel Driesen, Urs Hölzle, Minimizing row displacement dispatch tables, in OOPSLA Conference '95, ACM SIGPLAN Notices, vol. 30, #10, Oct. 1995, pp. 141-155.
The dispatch table (singular!) in dynamically-typed object-oriented language is actually a large matrix: method[selector*type]. This matrix is very sparse, and is an invitation to compression algorithms. So, although the subject of this paper seems to be dispatch tables, it is actually the compression algorithm.
     Excellent description of row displacement table compression, applied to method dispatching in dynamically-typed object-oriented languages. The main ideas are
     1. Be careful in which direction you slice your matrix. Here slicing up along selectors works best.
     2. Pack the slices starting with the fullest first.
     Several algorithm improvements to reduce the packing time (which can be considerable!) are discussed and experiemental results are given.

* A.M. Sloane, An evaluation of an automatically generated compiler, ACM Trans. Programming Languages and Systems, vol. 17, Sept. 1995, pp. 691-703.
A rigorously equivalent version of the Icont compiler (written in C) is written using Eli, and the code sizes, run times and memory requirements are compared; both compilers are only lightly optimized, and generate exactly the same code. Code size for EliIcont is about half of that of the Icont compiler, with Icont being about 10 percent declarative and EliIcont about 80 percent. Run times were about 5 ± 5 percent faster for the EliIcont compiler. Memory requirements were about 7 times higher for EliIcont. Possibilities are suggested to reduce the latter.

* Jukka Paakki, Attribute grammar paradigms -- A high-level methodology in language implementation, Computing Surveys, vol. 27, #2, June 1995, pp. 196-256.
Very clear exposition of the state of the art in language implementation by attribute grammars.
     First, the traditional Knuth attribute grammars and their implications are explained, using examples rather than math, algorithms and proofs. It is then shown that this scheme leaves much to be desired, both in user friendliness (unstructured and flat) and in expressive power (no circularities allowed, no lazy evaluation, etc.).
     The second part of the paper addresses the organizational paradigms invented to remedy the clumsiness. Several structuring options are examined; the main combinations are considering a {non-terminal | production rule} as a { procedure | module | class }. Examples of existing systems are discussed and shown in some detail.
     The third part discusses various improved evaluation paradigms: equating attribute evaluation rules with Horn clauses, equating attributes with logical variables, equating synthesized attributes with functional functions, allowing lazy evaluation, incorporating fixed-point, incremental and or parallel evaluation, etc.
     The fourth section sums up the differences between the various techniques using a very simple 4-rule attribute grammar, and gives balanced advice on choosing a paradigm or an attribute evaluator. More than 100 literature references conclude the paper.

* Mary F. Fernández, Simple and effective link-time optimization of Modula-3 programs, in 1995 ACM SIGPLAN Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 30, #6, pp. 103-115. June 1995,
Opaque types and method dispatching are efficiency sinks, whose bad effects can only be removed when all modules, including those of the libraries, are available, that is, at link time. The naive solution would be to compile everything from source code every time, but that is inefficient in compile time. A compromise is found in generating tree-formed object code for each module, in mill code (Modula 3 Intermediate Linker Language?), and to do the actual code generation, including all the optimization you want, on the final collection of modules, using an iburg BURS tree rewriter. So, rather than classic linking, we get link-time post-compiling.
     A system is described that does this for Modula 3 and that could be applied to C++. Link/post-compilation times are much higher than when just using UNIX ld, but the total compilation time is lower, since the UNIX compiler generates assembly code and the BURS rewriter generates binary immediately. The number of instructions goes down by 3-17%; run-time speed-up is less, due to instruction cache effects(?).

* T.A. Proebsting, BURS automata generation, ACM Trans. Programming Languages and Systems, vol. 17, #3, pp. 461-486. May 1995,
The paper describes a code generator generator called burg, which is essentially a simple bottom-up tree matcher with dynamic programming for instruction costs. The point is, however, that with great rigour all decisions that can be taken at code generator generator time are indeed taken; also, the CGG algorithm is heavily optimized using a variety of techniques. The pattern matching and dynamic programming decisions are combined in a single FSA table that will, from the states of the children, yield the the new state of the node in a single table look-up. A top-down scan will then decide what code to generate at each node, again using a table. The dynamic programming possibilities are less extensive, however, than those in systems that do dynamic programming at code generation time. Intensive state trimming algorithms are used to keep the tables small.
     The system has the same purpose and uses the same interface as the iburg system of Fraser, Hanson and Proebsting, "Engineering a simple, efficient code-generator generator", ACM Letters on Programming Languages and Systems, Vol. 1, No. 3, Sept. 1992, pp. 213-226. It is about 30 times faster and produces code that is about 6-12 times faster.

* Peter A. Buhr, Are safe concurrency libraries possible?, Commun. ACM, vol. 38, #2, pp. 117-120. Feb. 1995,
Shows that no optimization is safe in the presence of concurrency, unless all concurrent actions are marked in the text and the compiler is aware of these markings. (This is about existing languages extended with some concurrency primitives.)

* J.R. Ullmann, Compiling in Modula-2 -- A First Introduction to Classical Recursive Descent Compiling, McGraw-Hill, ???, 1994, pp. 400.
[ Not in the Dutch university libraries. ]

* John Elder, Compiler Construction -- A Recursive Descent Model, Prentice-Hall, New York, 1994, pp. 437.
Traditional compiler construction principles explained using the Model Modula-2 compiler as a case study, in a "literary program" fashion. The book is 80 percent text and 20 percent code; both are very clear and very traditional. The last 120 pages contain the code of the entire compiler, in Modula-2. The compiler generates naive code for the Apple Macintosh. The notion of "optimization" is mentioned in a section "Further reading", and register allocation is not mentioned at all. Only 2½ page of resulting target code is shown, but the principles of absolute and relocatable code are explained.
     Smooth and streamlined introduction to compiler construction for the Pascal / Modula-2 language series.

Patricia Pineo and Mary Lou Soffa A Practical Approach to the Symbolic Debugging of Parallelized Code Lecture Notes in Computer Science #786 * Terence J. Parr, Russell W. Quong, Adding semantic and syntactic predicates to LL(k): pred-LL(k), in Compiler Construction: 5th International Conference, CC'94, ed. by Peter A. Fritzson, Lecture Notes in Computer Science #786, New York, Springer-Verlag, 1994, pp. 263-277.
Describes the principles of ANTLR, a modern version of LLgen. Like LLgen it is an LL parser generator, featuring actions in all places and conflict resolvers, called "semantic predicates". The main differences are:
     -- ANTLR is LL(k), rather than LL(1).
     -- Semantic predicates are "hoisted": an alternative is chosen only if the input matches the FIRST of the alternative or any of its sub-alternatives and, if there is a conflict, the semantic predicate of the alternative returns true AND all semantic predicates found on the way to the alternative that will eventually match the input return true. Also, semantic predicates have to be side-effect-free; it is not clear if (and how) this is tested.
     -- Syntactic predicates test for the presence of a complete syntactic construct, thus allowing backtracking:

P: (A)? A B C | D
takes the first alternative only if a full A can be parsed. The test is done by recursively calling the parser with the actions switched off; A is in essence parsed twice. Of course this can interfere with semantic predicates, and it is stated that "during syntactic predicate evaluation, the semantic predicates that are evaluated must be functions of values computed when actions were enabled". It is found that backtracking takes negligible time.

* James S. Uhl, R. Nigel Horspool, Flow grammars -- a flow analysis methodology, in Compiler Construction: 5th International Conference, CC'94, ed. by Peter A. Fritzson, Lecture Notes in Computer Science #786, New York, Springer-Verlag, 1994, pp. 203-217.
Positions in the program are named (=continuations) and grammar rules are constructed in which these position names are the nonterminals and the intervening code fragments are the terminals. Productions of this grammar are possible flow paths through the program.
     The grammar of the flow paths inside one procedure is regular; if procedure calls are involved, the grammar is context-free. Moreover, a state can be defined for each position in the grammar, and the terminals are functions that transform the state. This setup can be used to calculate all kinds of properties of the program and its variables.
     As an example, the state is defined as a set of booleans, one for each variable in a procedure, and the system is used to do liveness analysis. The equations are solved by iteration. Next, the state is extended to liveness values for all variables in the program, and the same system now does global liveness analysis.
     Although it seems to me that all this is isomorphic to the usual in/gen/kill/out analysis, the framework is clean, well-known, and extends easily to interprocedural analysis.
     The paper is based on the PhD work of the first author.

* Lal George, Florent Guillame, John H. Reppy, A portable and optimizing back end for the SML/NJ compiler, in Compiler Construction: 5th International Conference, CC'94, ed. by Peter A. Fritzson, Lecture Notes in Computer Science #786, New York, Springer-Verlag, 1994, pp. 83-97.
[Interesting application of continuations].

* Andrew W. Appel, Axiomatic bootstrapping: a guide for compiler hackers, ACM TOPLAS, vol. 16, #6, pp. 1699-1718. Nov. 1994,
In traditional T-diagrams, left, right and bottom are marked with only one parameter: language / machine. The author uses 6 parameters: architecture, calling conventions, data type layout, environment representation, initial environment and primitive environment. Each can apply to running, generation and environment. Of the 18 combinations 13 are used. This allows the creation of T-blocks for booting, retargeting, booting by reparsing (elaboration) and stable versions.

* Dawson R. Engler, Todd A. Proebsting, An efficient retargetable dynamic code generation system, in Sixth International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS-VI, ACM SIGPLAN Notices, vol. 29, #11, Nov. 1994, pp. 263-272.
Dynamic (i.e. run-time) code generation is effected by supplying a library routine dcg_gen(), which accepts a pointer to a forest of ASTs, each of which describes a function. Dcg_gen() generates code for the ASTs using burg; binary code is produced immediately and register allocation is done on the fly; forward jumps are back-patched. The machine dependency is fully contained in the burg tables, which makes retargeting easy.
     The paper gives several examples of the applicability of the technique.
     One is matrix multiplication of large sparse matrices: ad-hoc routines are compiled dynamically for each column of the first matrix. These routines contain code for non-zero elements only. For large matrices, speed-ups of one to two orders of magnitude are achieved, depending on the actual processor.
     Another is an interpreter, which, under circumstances, can decide to compile certain routines. Again a factor of 50 in speed was gained.
     Some problems with flushing the instruction cache after code modification on the SPARC are mentioned.

* David B. Whalley, Automatic isolation of compiler errors, ACM TOPLAS, vol. 16, #5, Sept. 1994, pp. 1648-1659.
Describes the program vpoiso which automatically isolates errors in the compiler vpo.
     To find the one evil optimization in a compiler that contains thousands of them and applies each of them many times in a program, each application is counted in a counter and each piece of optimizing code is controlled by a test which compares the counter to a preset maximum, max. If the maximum is exceeded the optimization is not done.
     When a problem arises, the program is translated with a compiler with max set to 0 and then run; next the procedure is repeated with max set to infinity. If the results of the runs differ, the problem is in the optimizations. Now the system uses binary search to find the minimum value of max that will just trigger the error; this binary search typically takes some 10 minutes. The system then brings the (human) debugger to a viewer positioned at the offending compiler code.
     The paper also proposes a scheme under which non-optimization errors can be pinpointed automatically. It assumes the presence of another (native) compiler, which uses the same calling conventions; it assumes that neither compiler does interprocedural optimizations (it seems to me that they have to agree on more things than these, for example on data layout, character string representation, etc.) The test program is translated with vpoiso and with the native compiler; these translations are considered as lists of translated routines. Next the linker is rigged so as to include translated routines from vpoiso to a preset maximum max, and take the translations of the rest from the native compiler. For max set to zero we get the native translation, for max set to infinity we get the vpoiso version. The linker then uses binary search to find the minimum value of max that will just trigger the error. This identifies one routine that is translated incorrectly by vpoiso; it is up to the compiler constructor to take it from there.

* Timothy P. Justice, Rajeev K. Pandey, Timothy A. Budd, A multi-paradigm approach to compiler construction, ACM SIGPLAN Notices, vol. 29, #9, Sept. 1994, pp. 29-37.
Demonstrates how scanning can be done using object-orientation, parsing using logic programming and code generation using the functional approach. Unsurprisingly, parsing works best, it being essentially Definite Clause Grammars. But the point is not the novelty of the approaches, but the fact that they can work together; details of this are not shown, unfortunately.

* Damian Conway, Parsing with C++ deferred expressions, ACM SIGPLAN Notices, vol. 29, #9, Sept. 1994, pp. 9-16.
Sequel to Conway, "Parsing with C++ classes", ACM SIGPLAN Notices, Vol. 29, No. 1, pp. 46-52, Jan 1994. Expressions in actions are deferred by redefining all operators to construct an expression tree rather than to evaluate immediately. These expressions are then carried around and evaluated when needed, in lazy evaluation style. Examples are given.

* Christian Ferdinand, Helmut Seidl, Reinhard Wilhelm, Tree automata for code selection, Acta Informatica, vol. 31, #8, pp. 741-760. Aug. 1994,
Systematic mathematical derivation of tree parsing algorithms, including the subset algorithm for bottom-up tree-rewriting systems; the compression of the tables is integrated into the algorithms. The table compression reduces the original requirement of about 53M entries for a code generator for the NSC32000 to about 32k entries and then to about 3800 entries.
     Can be read as a sequel to Hoffmann & O'Donnell, "Pattern matching in trees", J. ACM, Vol. 29, No. 1, pp. 68-95, Jan. 1982.

* Todd M. Austin, Scott E. Breach, Gurindar S. Sohi, Efficient detection of all pointer and array access errors, in ACM SIGPLAN '94 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 29, #6, June 1994, pp. 290-312.
All pointers are replaced by so called safe pointers, records which contain the original pointer plus additional info: base and size, to detect spatial access errors; storage class: heap / local / global, to prevent freeing of local and global variables; and a capability. These capabilities can be looked up in a database: if the capability is there, the pointer is still valid, otherwise it is stale. This catches temporal access errors.
     The method catches all "normal" access errors; it is still possible to subvert the system by overlaying pointer by other types in unions and thus mess up the pointer and the additional info.
     A preprocessor is used to perform the substitution on normal C programs. Execution overheads range from 130 percent to 540 percent.

* Jens Knoop, Oliver Rüthing, Bernhard Steffen, Partial dead code elimination, in ACM SIGPLAN '94 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 29, #6, June 1994, pp. 147-158.
Code is partially dead if the effect of it is killed along some but not all paths further on. The remedy is to move the code forwards, splitting it over branches, until it, or actually all its copies, reach places where they are either fully dead or fully live. The paper gives algorithms to detect such code and to effect the code motion. The latter is quite complicated.

* Tim A. Wagner, Vance Maverick, Susan L. Graham, Michael A. Harrison, Accurate static estimators for program optimization, in ACM SIGPLAN '94 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 29, #6, June 1994, pp. 85-97.
Techniques for careful (but fairly unsurprising) static analysis of programs, using a combination of heuristics and a Markov model. The graphs in the paper must have been in colour but are now printed in black and white, regrettably.

* Torbjörn Granlund, Peter L. Montgomery, Division by invariant integers by multiplication, in ACM SIGPLAN '94 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 29, #6, June 1994, pp. 61-72.
The old CDC code revived, without mentioning it. The results are derived with much mathematical rigour, for various hardware division properties.

* Scott L. Burson, Continuations without copying, ACM SIGPLAN Notices, vol. 29, #5, pp. 27-30. May 1994,
Intermediate between continuations by copying and everything on the heap: when a stack frame is about to be removed from the stack but is part of a continuation, link it to the heap instead. This creates holes in the stack, which require extra pointers. See also Bobrow & Wegbreit, CACM, Oct 1973.

* Preston Briggs, Keith D. Cooper, Linda Torczon, Improvements to graph coloring register allocation, ACM TOPLAS, vol. 16, #3, May 1994, pp. 428-455.
Describes two optimizations to Chaitin's 1981 paper: optimistic colouring, and improved rematerialization.
     Chaitin's technique for handling register spills was to pass the number of available registers to the colouring algorithm and to generate spill code on the fly whenever the algorithm got into trouble. Optimistic colouring does the colouring under the assumption that there are enough registers, and adds spill code afterwards, if necessary. It turns out that optimistic colouring can avoid spilling in cases where there are just enough registers. In tests, it reduced the number of spills by between -2 and +42 percent, with a rough average of 20 percent.
     Rematerialization is the technique to avoid register spills by recalculating the result in a register rather than storing it to memory and retrieving it afterwards. The paper describes several detailed techniques to identify values that are cheaper to recalculate than to read from memory. In tests, it reduced the number of spills by between -26 and +33 percent, with a rough average of 20 percent.
     The paper gives much advice on the implementation of the above techniques and the actual register allocation. Recommended for anybody who writes a register allocator.

* Rajiv Gupta, Mary Lou Soffa, Denise Ombers, Efficient register allocation via coloring using clique separators, ACM TOPLAS, vol. 16, #3, May 1994, pp. 370-386.
The graph colouring process for very large graphs can be sped up by using a graph decomposition technique by Tarjan. The authors show how the decomposition can be effected without construction the entire graph first.

* S. Kannan, T.A. Proebsting, Correction to 'Producing good code for the case statement', SP&E, vol. 24, #2, Feb. 1994, pp. 233.
A recurrency relation is derived fro $ minCluster sub i $, the minimum number of clusters for the case items from 1 to i. It can be computed in $ O ( n sup 2 ) $ for all values from 1 to n. The clustering follows from there.

* Damian Conway, Parsing with C++ classes, ACM SIGPLAN Notices, vol. 29, #1, pp. 46-52. Jan. 1994,
Unlike Hall, "Parsing with C++ constructors", ACM SIGPLAN Notices, Vol. 28, No. 4, April 1993, pp. 67-68, the author does not use the constructors to do the parsing but rather redefines the | and the + as the disjunction and concatenation operators in grammars. This allows the programmer to write grammar rules as expressions in C++; evaluating such an expression will parse the corresponding part of the input stream and produce the parse tree.
     The are several frills. A lazy lexical analyzer is described, which allows part of the input stream to be reanalyzed if needed. The parser does naive backtracking (if one member in an alternative fails, the alternative is rejected). It is suggested (in one sentence) to include grammar rules in the lazy evaluation; this would reduce the parser from exponential to $ O ( n sup 3 ) $, but this is not pointed out.

* William M. Waite, Lynn Robert Carter, An Introduction to Compiler Construction, Harper Collins College Publ., New York, 1993, pp. 438.
The book seems to be written for an unusual audience: students with a minimum of experience; the authors have taken great pains never to be difficult. Large sections of the book consist of Pascal code plus explanations; the choice of Pascal as the implementation language seems to be induced by curriculum requirements and the authors themselves admit that Pascal is in fact unsuitable for compiler construction. Code generation is geared to a real machine, the DEC VAX, which is unusual, to say the least, for a book appearing in 1993. The book is unusual in other respects too: the authors deemphasize the use of tools! The book derives probably from lecture notes of long standing: the not infrequent inclusion of references to MS(!) theses from the University of Colorado, Boulder Co., in the bibliographies and the virtual absence of recent literature references reflect this.
     Although the book has 438 pages, the actual text covers only 342 of them; the last 96 pages contain Sample Project Documentation for a workshop, and 53(!) pages worth of index. Although the text uses attributes extensively, the word "attribute" does not occur in the index, and is mentioned in the text only on page 192.

* David A. Watt, Programming Language Processors, Prentice Hall, New York, 1993, pp. 452.
Part three of Watt's Magnum Opus. Surprisingly, the book is disappointing; as it says in the preface (p. xvi): "... depth has been sacrificed for breadth", but in some cases breadth also seems to have been sacrificed. Subjects that are conspicuously absent are LR(1) parsing, conversions and coercions, operator precedence and register allocation (the latter except saying that "Several factors make code generation for a register machine rather complicated"(p. 317); End of story). The text on type checking is minimal. Another sore point is the use of non-standard terminology: `tombstone' diagram for T-diagram, `starters' for FIRST set, and the use of x instead of == for is-defined-as.
     This is unfortunately not a valid book on compiler construction. (But maybe it wasn't meant to be?)

* GNAT Team, The GNAT project -- A GNU-Ada9X Compiler, 1993, pp. 17.
The purpose of the GNAT (GNU NYU Ada Translator) project is to quickly produce a user-friendly Ada compiler. GNAT turns the Ada text into an AST, adorns it with attributes (which also replace the usual symbol table), applies intermediate code generation to it (called "expansion" here), cuts up the AST and feeds it piecemeal to the GCC backend, which has been slightly extended for the purpose. This transforms one source .adb file into one object .o file. This procedure conforms to the traditional and convenient make model of C program compilation: as in C, the specification files (.h files) are analysed each time they are needed, which is unusual for Ada compilers. The paper gives salient details of each of these phases.
     The above procedure is not strong enough to support the Ada notion of separate compilation. To meet its needs, the object files are collected by a binder, which checks the consistency of the object files offered, and which composes a pre-main routine which elaborates the various packages in the proper order. The resulting set of object files is then fed to a normal linker, and an executable results.
     All front-end code (around 110,000 lines) is in Ada 9X, properly bootstrapped. The full compiler compiles at about 8000 lines/min on a 66MHz machine.

* Michael D. Tiemann, An exception-handling implementation for C++, in The Evolution of C++, ed. by Jim Waldo, MIT Press, Cambridge, Mass., 1993, pp. 174-185.
After wandering through many designs, the author settles on a construction

try {
except ep {
    EX {
        // code for exception EX, value available as ep
    EY {
        // code for exception EY, value available as ep
    default {
        // code for all other exceptions, value available as ep
        raise ep;
in which 'ep' has different types in the different branches.
     Rather than testing return codes inside the try part, exceptions start an exception interpreter, which looks at the instruction address at which the exception occurred (or to where the routine passing on an exception returned) and looks it up in a map prepared by the compiler, to find the error handler. This avoids overhead for non-exceptional execution.

* Jurgen Heyman, A 100% portable inline debugger, ACM SIGPLAN Notices, vol. 28, #9, Sept. 1993, pp. 39-.
Simscript II.5 is translated to C, and rather than relying on the C debugger, the generated C code is heavily instrumented for debugging. Much technical detail.

* Thomas Ball, James R. Larus, Branch prediction for free, in ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 28, #6, June 1993, pp. 300-313.
Many heuristics are combined to do compile-time branch prediction. "For free" in this case means: without going through a compile-profile-compile loop to gather statistics. The average miss rate is about 20 percent.

* Daniel Weise, Roger Crew, Programmable syntax macros, in ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 28, #6, June 1993, pp. 156-165.
Programmable syntax macros have almost the same form as the existing syntax structures, but are executed at compile time. Many examples from Lisp and C are given. Reminiscent of the old PL/I preprocessor.

* Mickey R. Boyd, David B. Whalley, Isolation and analysis of optimization errors, in ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 28, #6, June 1993, pp. 26-25.
Preliminary version of Whalley, "Automatic isolation of compiler errors", ACM TOPLAS, Vol. 16, No. 5, Sept. 1994, pp. 1648-1659.

* Robert Wahbe, Steven Lucco, Susan L. Graham, Practical data breakpoints: design and implementation, in ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 28, #6, June 1993, pp. 1-12.
Extensive analysis is described to reduce the number of instructions that must be replaced by breakpoint code. As a result, 75 percent of the naive checks can be suppressed.

* Philip W. Hall IV, Parsing with C++ constructors, ACM SIGPLAN Notices, vol. 28, #4, April 1993, pp. 67-68.
Raises the idea to rig the constructors of an OO language to do the parsing but fails to tell what to do about choices and failure. The setup and example the author gives work for LL(0) grammars only; lexical token choices are handled behind the screens and no test is made to see if they succeed.

* J. Templ, A systematic approach to multiple inheritance, ACM SIGPLAN Notices, vol. 28, #4, April 1993, pp. 61-66.
Covers independent multiple inheritance only. When C inherits from both A and B, simulate this by having C inherit from A only using single inheritance, create an empty object inheriting from B and put in pointer to this B-based object in C: twin objects. Translate all accesses accordingly. The paper shows generalizations to n-fold multiple inheritance, plus optimizations.
     One needs the empty B-based object (rather than B itself) to allow overriding of the operations on B in C.

* Amitabh Srivastava, David W. Wall, A practical system for intermodule code optimization at link-time, J. Program. Lang., vol. 1, #1, March 1993, pp. 1-18.
Describes the OM system, an advanced linker. It decompiles the code in all the modules to be linked into a symbolic register transfer language, links them, does extensive optimization on the result and compiles it back to object code. Note that the modules may originate from source code in more than one language.
     Although much of the high-level language structure is lost, much low-level structure becomes visible, for code motion, liveness analysis, etc. The total gain is about 5 percent. Running the linker on some selected programs cost between 5 and 90 seconds.

* Robert Metzger, Sean Stroud, Interprocedural constant propagation: an empirical study, ACM Letters on Programming Languages and Systems, pp. 213-232. vol. 2, #1-4, March-Dec. 1993,
Constant are progagated through procedure calls. For each call, this creates a new procedure, with one parameter less, which is now fixed. This new procedure can now be inlined, in which case probably many new optimizations will be possible, or it can be cloned, in which case the cloned version may give opportunity to fewer optimization but can be called from more than one place.
     Pages and pages of tables with results are given, detailing the different effects, but no general picture emerges, except that it helps.

* Owen Kaser, C.R. Ramakrishnan, Shaunak Pawagi, On the conversion of indirect to direct recursion, ACM Letters on Programming Languages and Systems, vol. 2, #1-4, March-Dec. 1993, pp. 151-164.
Attempts the conversion of indirect to direct recursion by inlining. Proves that inlining can remove all indirect recursion only if no strongly connected component in the call graph contains two or more disjunct loops (in other words: if each strongly connected component in the call graph contains at least one node which is on all loops in the SCC; this node is then inlined, etc.) Also shows that if inlining is interleaved with tail-recursion removal, much more recursion can be eliminated.

* Peter Bumbulis, Donald D. Cowan, RE2C: A more versatile scanner generator, ACM Letters on Programming Languages and Systems, vol. 2, #1-4, March-Dec. 1993, pp. 70-84.
RE2C is appropriately named: the only thing it does is converting regular expressions to C; but it does it well. RE2C is a preprocessor for C that will replace indicated lists of regular-expression / action pairs occurring in the C code with their equivalents in C; each such a list processes one token as defined in the list. Allowing several lists easily implements the user-maintained state kludge in lex. RE2C leaves buffering to the user; it requires three entities to be defined: a pointer to the next character in the buffer, a pointer to the limit of the buffer, and a routine to reset these two pointers so as to ensure at least n characters in the buffer.
     All this helps to keep the generated code small and fast. In addition, two important optimizations is applied.
     The first is to limit testing for end of buffer to a few states in the automaton and to ensure there that there will be enough characters in the buffer to reach any next checking state. (Rather than finding the minimum number of nodes to remove to make the graph acyclic, the authors just test at any node that is an element of a non-trivial strongly connected component, which they admit is overkill.) The second is the judicious replacement of switch statements in the generated code by sequences of if-statements or binary if-trees. The authors claim this helps: RE2C does a better job of switch statements than most compilers. Furthermore, some more minor optimizations are introduced, for example not testing against the end of a range if the value is known to be in range.
     It is 2 to 3 times as fast as flex and much smaller, and about 10 percent faster and smaller than GLA (Gray, 1988).

* Mike Beaven, Ryan Stansifer, Explaining type errors in polymorphic languages, ACM Letters on Programming Languages and Systems, vol. 2, #1-4, March-Dec. 1993, pp. 17-30.
The polymorphic language is ML. It is extended with two functions, Why and How, which accept as operands parts of the parse tree, and which the user can use interactively to elicit information about type errors. Many examples.

* Timothy P. Justice, An object-oriented compiler construction toolkit, MS Thesis, Oregon State University, Corvallis, Or., March 1993, pp. 44.
The "important" syntactic notions (statement, expression, type, etc.) are implemented as classes which supply their own code generation and optimization. Generic classes are used for lists of these. This simplifies the compiler.

* W.M. Waite, An executable language definition, ACM SIGPLAN Notices, vol. 28, #2, Feb. 1993, pp. 21-40.
The paper claims to be "an executable language definition", but does nothing to substantiate that claim, except for referring to an ftp site where the corresponding software can be found. It defines a subset of C using a literate-programming version of an attribute grammar. I don't know what to think of it; as it stands, it's not enough.

* Erik Meijer, Calculating Compilers, (PhD Thesis), Katholieke Univ. Nijmegen, Nijmegen, Feb. 1992, pp. 129.
Formal derivation of a compiler, starting from the commutativity of some semantic and transformation operators.

* David Saloman, Assemblers and Loaders, Ellis Horwood, Chichester, 1992, pp. 294.
Full description of many facets of assemblers, linkers, loaders, macro processors, conditional assembly, high-level assemblers, dynamic linking, and how to implement them. Mentions disassembly, but does not delve deeply into it; no closure algorithm to distinguish instructions and data. Implementations are described in flow charts. I find the book a bit short on examples. The bibliography is extensive with over 100 references, but a bit chaotic.
     Also describes the Microsoft Macro Assembler, the Borland Turbo Assembler, the VAX Macro assembler and the Macintosh MPW assembler. An appendix contains a description of addressing modes.

* Thomas W. Parsons, Introduction to Compiler Construction, Computer Science Press, New York, 1992, pp. 359.
Does exactly what it purports to do: being a stepping stone to Aho, Sethi and Ullman; and does it very well. Clear and pleasant style, beautifully done. Recommended for anybody who wants an easy but serious introduction to compiler construction. The literature references are a bit old, though.

* John R. Levine, Tony Mason, Doug Brown, Lex and Yacc, (2nd Edn.), Sebastopol, O'Reilly, 1992, pp. 366.
The book is a very gradual introduction to the full intricacies of Lex and Yacc. The book starts with a general overview, followed by and informal description of Lex and Yacc programs. This knowledge is then applied in two examples, a compiler for a menu generation language, and an SQL to C converter. Following are a hundred pages of precise specs of the Lex and Yacc languages; error handling is covered in another 10 pages. The book closes with an extensive description of the different Lexes and Yaccs in existence. The internal workings of Lex and Yacc are not touched upon.

* Thomas Pittman, James Peters, The Art of Compiler Design, Simon & Schuster, Hemel Hempstead, England, 1992, pp. 500.
Very pleasant and educationally smooth book, based consistently on a grammatical approach. Small but adequate bibliographies. Contains a bit too much code and too many type fonts to my taste. Friendlier than but not so sharp as Aho, Sethi and Ullman (1986).

* Andrew W. Appel, Compiling with Continuations, Cambridge University Press, Cambridge UK, 1992, pp. 262.
Describes the compiler for "Standard ML of New Jersey", which translates to various machine codes including the SPARC, and which uses continuations heavily.
     First all ML syntactic sugar is removed from the program, to leave a very simple abstract syntax tree, shown in a Lisp/ML-like notation, mini-ML; this phase is not described in the book. Next, continuations are collected in each position in the syntax tree, make all flow of control explicit. This continuation-passing style (CSP) syntax tree is now modified repeatedly under the influence of many optimizing transformations. Next, closures are made explicit and are converted.
     In principle, each variable is to be kept in a register, in a very simple correspondence. Since there is only a limited number of registers, the expressions inside the functions are manipulated symbolically so as to contain no more than k variables for a k-register machine. Spill code is produced to this end. Code generation is then straightforward.
     Each of these phases is described in detail; as it says on the cover, all the nitty-gritty details of compiling to really good machine code are covered. The performance evaluation chapter evaluates the compiler (more than half the compilation time is spent in the assembler!), and the generated programs. The benchmark is a set of well-known programs (life, lex, yacc, etc.) rewritten in ML. The most-important optimizations turn out to be: inline expansion of functions called only once, elimination of unused variables and constant folding of selections of known records. Common sub-expression elimination helped nothing.
     The book concludes with chapters on garbage collection, parallel programming and future directions. The book is written in an easy-going and sometimes self-critical style, and is full of examples. It is not a textbook though, in that there are no exercises. The literature list is extensive, and remarkably full of references about garbage collection.

* Karen A. Lemone, Fundamentals of Compilers -- An Introduction to Computer Language Translation, CRC Press, Boca Raton, Fl., 1992, pp. 184.
The author starts with an interesting equation: the design and implementation of a good compiler : a good compiler = the design and implementation of a good compiler text book : this book. And in the next paragraph we find "These two texts, ..., are modularized to deal with the pedagogical complexity of compiling." It shows. For one thing, there is a recurrent diagram of the structure of a compiler and each chapter starts with this diagram with the areas covered in that chapter highlighted. For another, textual explanations are kept very short --"succinct" would be a better word-- and are soon followed (or occasionally preceded) by diagrams and examples. There are also tons of exercises, and a running compiler project, often with several options in the instructions.
     The contents are those of the average introductory compiler construction text, except that attributes are emphasized from the beginning. Front-ends are covered in 107 pages, symbol tables in 10, and back-ends in 30.

* Karen A. Lemone, Design of Compilers -- Techniques of Computer Language Translation, CRC Press, Boca Raton, Fl., 1992, pp. 310.
Packs a surprising amount of information covering the entire compilation process in relatively few pages, but is unfortunately much less pedagogical than Part I. It even contains quite a number of features that are far from pedagogical: p. 20: the explanation of late binding is hard to understand and probably wrong; p. 26: the example of polymorphism is actually an example of a (probably) overridden method; p. 79 & 83: the notion of a "closure" is used without being introduced; p. 226: register allocation by graph colouring is explained, but the graph colouring algorithm itself is delegated to Exercise 9, which reads: "Write a greedy algorithm to perform register allocation by coloring". Also, garbage collection is covered in 12 lines.
     On the other hand, semantic analysis is done in terms of attribute grammars, and there are three extensive chapters on optimization, one on product-quality compilers and one on compilers for special architectures, which supply much unusual material. And there is a wealth of exercises here. One running project is the compilation of PSOOL, a rather verbose object-oriented language designed by the author, another concerns the compilation of SubAda, that subset of Ada which is common to most algorithmic languages. The literature references are by chapter rather than accumulated at the end as in Part I, and repeat each other and those in Part I.

* Amitabh Srivastava, Unreachable procedures in object-oriented programming, ACM Letters on Programming Languages and Systems, vol. 1, #4, Dec. 1992, pp. 355-364.
Summary: plea for intelligent loaders.
     The author points out that while C and FORTRAN programs contain very few unused routines (average 3 percent, maximum found 5 percent), C++ and other object-oriented programs often contain many more of them (average 15 percent, maximum found 26 percent). The reason is that classes define an interface, which has to be implemented, regardless of whether all routines in it are used. Also, debugging routines in a non-OO environment reside in accessible files and can be removed at will, where in an OO system they are part of the class, which one might prefer to leave undisturbed. In both cases, about half of the unreachables is indirectly unreachable, which means that is is only called by an already unreachable routine. So a simple call check is not sufficient.
     A similar problem exists with library routines: if one routine from an class translation is used, all routines from the class translation are brought in.
     Traditionally, unreachable routines are identified by constructing the call graph; indirectly called routines are incorporated by assuming that all routines whose addresses are taken are called in all indirect calls. Given the abundance of virtual methods (whose address is taken, but which need never be called) this is too conservative. Also, it does not solve the greedy library load problem.
     On the basis of these measurements and analysis the author suggests to introduce an intelligent loader (which is not described further).

* Deryck F. Brown, Hermano Moura, David A. Watt, ACTRESS: An action semantics directed compiler generator, in Compiler Construction, 4th International Conference, CC'92, ed. by U. Kastens and P. Pfahler, Lecture Notes in Computer Science #641, Springer-Verlag, Oct. 1992, pp. 95-109.
[ Action semantics in action; [ read Watt "Programming Language Syntax and Semantics" 1991 first

* Werner Assmann, Another solution of scoping problems in symbol tables, in Compiler Construction, 4th International Conference, CC'92, ed. by U. Kastens and P. Pfahler, Lecture Notes in Computer Science #641, Springer-Verlag, Oct. 1992, pp. 66-72.
First a very general solution to the scoping problem is proposed: a matrix with a row for each identifier and a column for each scope; the entries denote the visibility and/or permissions. Next, regularities in this matrix are located and exploited for table compression by structuring. Very efficient data structures for the scope tables result. Full algorithms in Modula-2 are given, which is good because the text occasionally verges on the incomprehensible.

* Josef Grosch, Transformation of attributed trees using pattern matching, in Compiler Construction, 4th International Conference, CC'92, ed. by U. Kastens and P. Pfahler, Lecture Notes in Computer Science #641, Springer-Verlag, Oct. 1992, pp. 1-15.
[[ preliminary abstract ]] Describes a language puma, for "pattern matching and unification" (?!?), the basic element of which is the transformation of a matched tree section. In fact, it is a general programming language for the "tree transformation paradigm", and is discussed as such, but its main application is intended to be tree transformations in compiler. Examples are given.

* Kenneth Walker, Ralph E. Griswold, The maintenance of intermediate values in goal-directed evaluation, ACM Letters on Programming Languages and Systems, vol. 1, #3, Sept. 1992, pp. 284-298.
Goal-directed evaluation in Icon introduces extensive backtracking flow of control that is not immediately connected to syntax structures. This backtrack flow of control can be added naively to the control flow graph, but this yields control flow graphs that allow little further optimization under the traditional techniques. All this results in a lot of superfluous recomputation of values.
     By incorporating the backtracking carefully in the control flow graph, many instances of values can be found that can be reused after backtracking, since none of the values they depend on can have changed since last time.
     First the process is visualized as follows. The control flow graph is depicted as the usual syntax tree with additional arcs. The operands of an operation are combined in a horizontal bar that stretches as wide as the tree below it. The effects of backtracking can be shown relatively easily in such a representation.
     Next, an attribute grammar is discussed that will allow the detection of opportunities for reuse of values. It endows each node with four additional attributes, "can generate multiple results", "where to go on resume", "where to go on fail", and "lifetime". Precise calculations for various control structures are given.
     No indication is given of how much this helps.

* Christopher W. Fraser, David R. Hanson, Todd A. Proebsting, Engineering a simple, efficient code-generator generator, ACM Letters on Programming Languages and Systems, vol. 1, #3, Sept. 1992, pp. 213-226.
Detailed description of a straightforward bottom-up rewrite system with dynamic programming. This version does not do any precomputing: most of the work is done at compile (code generation) time, rather than at code generator generation time. This decision greatly simplifies the code generator generator. An advantage is that costs need not be constants, but can be functions of the attributes of the nodes, etc. Another advantage lies in debugging: when something goes wrong (bad code is produced), it is easy to produce debugging output showing all possible matches with their costs. A clear disadvantage is that the resulting code generator is slower than that of burg, but since there is more to a compiler than code generation alone, compilation times were only about 20 percent lower for iburg. A timing comparison with Twig (here written with a capital T) (another precomputing burs code generator generator, see Aho, Ganapathi & Tjiang, "Code generation using tree pattern matching and dynamic programming", ACM Trans. Programming Languages and Systems, Vol. 11, No. 4, Oct. 1989, pp. 491-516) failed because of resilient errors in Twig.
     The code generator generator is written in Icon and takes 642 lines.

* Torbjörn Granlund, Richard Kenner, Eliminating branches using a super-optimizer and the GNU C compiler, in ACM SIGPLAN '92 Conference on Prog. Lang. Design and Implementation, ACM SIGPLAN Notices, vol. 27, July 1992, pp. 341-352.
The next generation in super-optimization, for the IBM RS/6000. The search is done by recursive descent iterative deepening, on a tree the nodes of which are instructions with their results. Different pruning criteria are used for the first instruction, those in the middle, and the last instruction. Constants in instructions are restricted to a few basic values. The fan-out is between 100 and 1000; three-instruction sequences are found in a few seconds, 5-instruction sequences may require an hour.
     The program found a number of sequences that were unknown to the architects of the IBM RS/6000.

* Todd A. Proebsting, Simple and efficient BURS table generation, in ACM SIGPLAN '92 Conference on Prog. Lang. Design and Implementation, ACM SIGPLAN Notices, vol. 27, July 1992, pp. 331-340.
Preliminary version of T.A. Proebsting, "BURS automata generation", ACM Trans. Programming Languages and Systems, 17, 3, 461-486, May 1995.

* Todd A. Proebsting, Charles N. Fischer, Probabilistic register allocation, in ACM SIGPLAN '92 Conference on Prog. Lang. Design and Implementation, ACM SIGPLAN Notices, vol. 27, July 1992, pp. 300-310.
A preliminary version of Proebsting & Fischer, Demand-driven register allocation, TOPLAS, 18, 6, Nov 1996, 683-710.

* Steven W.K. Tjiang, John L. Hennessy, Sharlit -- A tool for building optimizers, in ACM SIGPLAN '92 Conference on Prog. Lang. Design and Implementation, ACM SIGPLAN Notices, vol. 27, July 1992, pp. 82-93.
Densely written and highly technical description of a tool for generating global optimizers. The general idea is that the control-flow graph has already been constructed; now this graph must be traversed in various ways to determine various properties of entities in the graph. To this end the user (= compiler writer) specifies a small set of routines for each such calculation at a single node; the system will then orchestrate the efficient activation of these routines at the appropriate nodes. In this way, the administration of traversing the graph iteratively is separated well from the various administrations needed for the calculations, and the latter are also separated from each other.
     The system is applicable both to context-free grammars in which the nodes are basic blocks and those in which the nodes are single instructions, thus allowing the user to cut one level out of the code generation process. Special measures are taken to avoid having the full administration at every node in the latter case, since that would result in prohibitive memory requirements.
     The path simplifier ...
     It is not clear to me how this system can do liveness analysis efficiently: all information propagation seems to be forward.

* Michael Weiss, The transitive closure of control dependence: the iterated join, ACM Letters on Programming Languages and Systems, vol. 1, #2, June 1992, pp. 178-190.
Much theory about control and data dependence.

* Pohua P. Chang, Scott A. Mahlke, William Y. Chen, Wen-Mei Hwu, Profile-guided automatic inline expansion for C programs, SP&E, vol. 22, #5, May 1992, pp. 349-369.
A weighted call graph of the entire program including parts of the libraries is constructed using profiling. Indirect calls are pessimistically supposed to lead to all procedures whose addresses have been computed. Vararg procedures are not inlined, as are recursive procedures with inordinately large stack requirements (which are known statically in C). Recursion cycles in the call graph are broken by removing the lightest arc. Rules for the order of inlining are given (based on the weights); these update the weights, but no explicit algorithm is supplied.
     All this results in a systematically positive speedup: 2 to 46 percent, with an average of 11 percent.

* Christopher W. Fraser, Robert R. Henry, Todd A. Proebsting, BURG -- fast optimal instruction selection and tree parsing, ACM SIGPLAN Notices, vol. 27, #4, April 1992, pp. 68-76.
BURG generates fast BURS tree parsers in the BURM language, which is actually C (+libraries, I assume). The decisions of a two-pass tree parser using weight functions and dynamic programming are precomputed during compile-compile time. This creates states (in the LR sense) for each node. The paper is mainly a user's manual, and explains many details. The syntax of the BURG language is similar to yacc.

* Keith D. Cooper, Mary W. Hall, Linda Torczon, Unexpected side effects of inline substitution: A case study, ACM Letters on Programming Languages and Systems, vol. 1, #1, March 1992, pp. 22-32.
Inlining in the large FORTRAN program linpackd resulted in the elimination of about 98 percent of all function calls, and yet the code ran 8.5 percent slower on the MIPS R2000. The cause was found to be the following. According to the FORTRAN standard, the actual parameters in a call may not alias each other; compilers do not usually check this, but do rely on it. This allows the elimination of many floating point interlock instructions for indexed elements on the MIPS R2000, since the compiler can assume that elements do not alias each other. After inlining this information is gone, the compiler is not strong enough to recover it and has to assume the worst.
     Several remedies are proposed: the authors favour doing stronger analysis.

* Preston Briggs, Keith D. Cooper, Linda Torczon, Coloring register pairs, ACM Letters on Programming Languages and Systems, vol. 1, #1, March 1992, pp. 3-13.
Considers nodes for register pairs as single nodes that require two colours and have two arcs to interfering nodes. Shows that optimistic graph colouring does much better on such graphs than pessimistic graph colouring.

* T.J. Parr, H.G. Dietz, W.E. Cohen, PCCTS reference manual version 1.00, ACM SIGPLAN Notices, vol. 27, #2, Feb. 1992, pp. 88-165.
Full 77-pages long manual of PCCTS (Purdue Compiler-Construction Tool Set); most of the text is about how to use PCCTS rather than on how it works. PCCTS is an LL(k) parser generator, in which k is determined as follows. For each rule that cannot be parsed with LL(1), a separate parser is attempted with successively higher values for k, "until either all ambiguities are resolved or the used-defined maximum k has been reached". Obviously, it will not handle left-recursive grammars. PCCTS uses a very simple variant of Wirth's follow set error recovery mechanism.
     Further differences with LLgen are: 1. PCCTS includes a lexical analyser generator. 2. PCCTS rules can have result values. 3. Actions in PCCTS are demarcated with << and >>. 4. Rule parameters in PCCTS are marked with a $ dollar $; position-dependent yacc-like notations $ dollar 1$, $ dollar 2 $, ... are also possible.
     Also, in addition to the possibility of executing arbitrary code at arbitrary points in the grammar, PCCTS can create abstract syntax trees automatically. A library is provided to manipulate these ASTs.

* Robert W. Gray, Vincent P. Heuring, Steven P. Levi, Anthony M. Sloane, William M. Waite, Eli: a complete, flexible compiler construction system, Commun. ACM, vol. 35, #2, pp. 121-130. Feb. 1992,
The paper is too vague to make a specific abstract about it; it seems to describe a system that manages various phases of compiler construction, using a database of tools.

* Peter Lee, Topics in Advanced Language Implementation, MIT Press, Cambridge, Mass., 1991, pp. 402.
Collection of advanced papers edited by Peter Lee. Part I: advanced implementation techniques, about register allocation, garbage collection, etc. Part II: experience with advanced implementations: various implementations of advanced languages. Part III: languages for parallel and distributed systems: futures, Connection Machine Lisp, etc. Part IV: new and unconventional languages and techniques: higher-order logic, combinator graph reduction, etc.

* Peter Steenkiste, Advanced register allocation, in Topics in Advanced Language Implementation, ed. by Peter Lee, Cambridge, Mass., MIT Press, 1991, pp. 25-45.
Starts with a very short introduction to the subject and then examines on various techniques for interprocedural register allocation, especially in the presence of recursive procedures. No explicit algorithms, few examples but much explanation and many literature references.

* James R. Cordy, Charles D. Halpern-Hamu, Eric Promislow, TXL: A rapid prototyping system for programming language dialects, Computer Languages, vol. 16, #1, pp. 97-107. 1991,
TXL (Turing Extender Language) is a little language for extending the syntax and semantics of the Turing programming language. An extension consists of a replacement syntax rule which replaces (but hopefully extends) an existing syntax rule in the language, plus mappings of syntax trees that contain such extensions onto syntax trees that do not contain these extensions. The transformations can replace or rearrange elements, but no attributes are passed around as far as I can see. In fact TXL is a context-free macro processor designed for rapid prototyping of language dialects. Examples are given.
     As with all macro processors, error reporting is a problem. Also, if the base language fundamentally lacks a concept, TXL may not be able to introduce it;an example of such a concept is self-reference. Furthermore, the efficiency of the new concepts is difficult to predict.

* Helmut Emmelmann, Code selection by regularly controlled term rewriting, in Code Generation -- Concepts, Tools, Techniques, ed. by Robert Giegerich and Susan L. Graham, International Workshop on Code Generation, Springer-Verlag, Berlin, May 1991, pp. 3-29.
Describes theory and (nameless) implementation of a BURS using a finite-state tree automaton. The gist is that the theory is fine, but a naive implementation breaks all software and severe optimizations and size reductions are required. Unfortunately these optimizations are only hinted at, not explained, or else the reader is referred to other papers and technical reports.

* Josef Grosch, Tool support for data structures, Structured Programming, vol. 12, 1991, pp. 31-38.
Drawing on the correspondence between structured types and graphs, a graph structures handling package is described, in which the node type definitions are expressed as context-free attributed grammar rules. Given the attribute grammar, C or Modula-2 code can be generated, containing access and manipulation routines for the graph structure.
     Parallels between this system and the following subjects (each of which could in principle be used to achieve the same effects) are examined: variant records; type extensions; context-free grammars; attribute grammars; the interface description language IDL; object-oriented languages; and tree grammars.
     Exceptional clear paper, and interesting in that it draws together notions from many somewhat disjunct subjects.

* Irina Athanasiu, Limbaje formale si translatoare, (in Rumanian: Formal Languages and Translators), Institutul Politehnic Bucuresti, Bucarest, Rumania, 1991, pp. 229.
Traditional and adequate textbook treatment of formal language theory (grammars, automata, Turing machines), lexical analysers and parsers.

* Ronald Mak, Writing Compilers and Interpreters -- An Applied Approach, John Wiley, New York, 1991, pp. 516.
A guided account of how to write a Pascal interpreter in Turbo C, and a Pascal compiler which generates assembler (IBM-PC). For advanced hobbyists and modest programmers. At least half of it is code. No theory, but does not pretend otherwise.

* Christopher W. Fraser, David R. Hanson, A retargetable compiler for ANSI C, ACM SIGPLAN Notices, vol. 26, #10, Oct. 1991, pp. 29-43.
The front end is entirely hand-written (on the assumption that ANSI C is fixed). The importance of fast character handling and fast symbol table handing is emphasized; the symbol table handling has been optimized for fast scope entry and exit.
     The front end does some preliminary code optimizing, much in the same way the ACK compiler does: jumps are generated sparingly, very well tuned algorithms are generated for switch statements, and register declarations are generated automatically. All this makes the back ends simpler. Register declarations are generated as follows: all variables that do not have their address taken are candidates. Then a count is made for each variable, determining how often it is used: use in the main loop counts for 1, use in a loop counts for 10, in the then or else part of an if-statement for 0.5, etc. The variables are then sorted with the highest usage count first, and their declarations are reordered to reflect this. The back end can then just hand out registers as the declarations come in, until it runs out of them. The count information can be supplemented by profiling data.
     The paper is a survey only. For details the reader is referred repeatedly to other publications; particularly, the code generators are not described.
     Benchmarks show that the compiler is twice as fast as gcc and produces code that is 30 percent slower.

* Thomas P. Murtagh, An improved storage management scheme for block structured languages, ACM Trans. Programming Languages and Systems, vol. 13, #3, pp. 372-398. July 1991,
[[ preliminary abstract ]] [combines activation records] [ is this the final version of Murtagh A less dynamic memory allocation scheme for Algol-like languages, Eleventh ACM Symposium on Principles of Programming Languages, pp. 283-289, 1984? ]

* Jürgen Börstler, Ulrich Möncke, Reinhard Wilhelm, Table compression for tree automata, ACM TOPLAS, vol. 13, #3, July 1991, pp. 295-314.
[[ preliminary abstract ]] [Many problem-dependent tricks to compress the tables.]

* Keith D. Cooper, Mary W. Hall, Linda Torczon, An experiment with inline substitution, SP&E, vol. 21, #6, June 1991, pp. 581-601.
Calls are inlined bottom-up ("in reverse topological order"), which is possible since in FORTRAN procedures cannot be recursive. Only calls to procedures of limited size are inlined: for example, any call to a maximum of 25 lines and a call from inside a loop to a maximum of 175 lines. This avoided 99.5 percent of all calls at run time.
     Detailed analysis of the effects is provided. The object code size increased between -1 and 137 percent. The execution times varied between -20 and 20 percent. Several causes of this phenomenon are examined: increased register pressure, increased floating point interlock, array aliasing, break-down of data-flow analysis in the compiler.
     In addition, five causes are described that prevented some calls from being inlined; most of them are related to questionable use of FORTRAN.

* David Callahan, Brian Koblenz, Register allocation via hierarchical graph coloring, in ACM SIGPLAN '91 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 26, #6, June 1991, pp. 192-203.
The control graph is considered as a tree of nesting sets of basic blocks; such a set of basic blocks is called a "tile". The boundaries between the sets are preferred as places to dump spill code; this tends to put the spill code in less active places. Algorithms are given for tiling the control graph, for colouring the graphs and for issuing spil code. An additional advantage is that rather than one interference graph, one now has a set of them, each much smaller than the total. The graph colouring inside the basic blocks is done using a standard algorithm.

* Deborah Whitfield, Mary Lou Soffa, Automatic generation of global optimizers, in ACM SIGPLAN '91 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 26, #6, June 1991, pp. 120-129.
Preliminary version of Whitfield & Soffa, "An approach for exploring code-improving transformations", ACM TOPLAS, Vol. 19, No. 6, Nov. 1997, pp. 1053-1084, which see.

* W.G. Morris, CCG -- A prototype coagulating code generator, in ACM SIGPLAN '91 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 26, #6, June 1991, pp. 45-.
Implementation of Karr's code generation by coagulation. Lots of practical rules for inconsistency resolution. Obtains a speed-up of about 20 percent over gcc, mainly due to a balanced combination of inlining and register allocation.
     An interesting experiment is that in which the arcs are compiled in most-frequent-first, random and least-frequent-first order. This yields ratios of 0.79, 0.90 and 1.26 resp. with respect to gcc. This means that even with random arc ordering the author's compiler generates code that is 10 percent faster than gcc.

* Jos Warmer, Hans van Vliet, Processing SGML documents, Electronic Publishing, vol. 4, #1, March 1991, pp. 3-26.
Explanation of the set-up and workings of the Amsterdam SGML parser, plus reasons why to use it.
     The Appendix is actually a separate paper describing an application of the SGML and the Amsterdam SGML parser in software engineering. A DTD is constructed for software specifications. The software specification using this DTD can then be used to produce three things: a formatted printout using nroff, definition modules and empty implementation modules for Modula-2, and header files and empty code files for C. Once the decision about the programming language chas been taken, actual code can be added to the SGML file, thus merging the design, printing and running of the project.

* Janice C. Shepherd, Why a two-pass front end?, ACM SIGPLAN Notices, vol. 26, #3, March 1991, pp. 88-94.
Lists reasons why they had to convert their FORTRAN compiler from one-pass to two-pass. The general reason was avoiding backpatching.
     A more specific reason was the desire to compact the symbol table, which was often inflated by declarations of names that were never used. These names stemmed from include files or other library definitions. It was profitable to have a small symbol table, since during code optimization many data structures were declared with sizes proportional to the size of the symbol table and because the symbol table was included in the object code.
     Five other reasons had to do with specific FORTRAN issues, for example generating return code for a function with multiple entry points.
     All these problems were solved by a two-pass setup. The conversion increased the average compiler time by about 4 percent and greatly simplified the compiler. The author makes a strong case against introducing restrictions in a programming language with the sole purpose of allowing compilation in one pass.

* Wolfgang Keller, Automated generation of code using backtracking parsers for attribute grammars, ACM SIGPLAN Notices, vol. 26, #2, Feb. 1991, pp. 109-117.
Describes a Graham-Glanville code generator based on an attributed LALR(1) parser that does backtracking when stuck either on a missing template or an attribute inconsistency. Several degrees of backtracking can be specified. A fast implementation of deleting and copying of attribute trees in backtracking is given.

* Carine Fédèle, Construction automatisée des compilateurs: le système CIGALE, PhD Thesis, Univeristé de Nice -- Sophia Antipolis, Nice, France, Jan. 1991, pp. 483.
Detailed (and pleasantly illustrated) description of a compiler writing system based on attribute grammars. The system is written in Pascal and produces EM code, which is then further processed by the Amsterdam Compiler Kit. The system is tested on a real-world example: a compiler for Tim Budd's Leda; the Leda compiler description is 29 pages long. The resulting compiler is then run on three programs in the Pascal subset of Leda, and compile and run times are compared with those of the Berkeley and ACK Pascal compilers. The three run times are quite comparable; the compile times of the Leda-CIGALE compiler are about 40% higher than those of the other two.
The last picture in the book shows a smurf (shtroumpf) lazying on the beach with a long drink.

* Allen I. Holub, Compiler Design in C, Prentice-Hall, Englewood Cliffs, N.J., 1990, pp. 924.
Massive work, full of pictures, tables, code fragments, notes in the margin, spelling out everything. Tells you how to call lex, how to represent a grammar as a data structure, 6½ page on how to process a function header, etc. You get the point. As the author says, "this book is really an in-depth presentation of several well-documented programs".
     It reminds me of a master shriner who teaches shrine making to his apprentices by showing in detail how he does it. Very valuable and very time-consuming.
     The net complains about lots of errors, though.

* Dick Grune, Ceriel J.H. Jacobs, Parsing Techniques: a Practical Guide, Ellis Horwood, Chichester, 1990, pp. 322.
Full, informal coverage of the field of parsing, plus an informal introduction to grammars, Type 0 to Type 4. Bibliography of over 400 classified annotated entries.

* Kai Koskimies, J. Paakki, Automatic Language Implementation -- A Pragmatic Approach, Ellis Horwood, Chichester, England, 1990, pp. ????.
[ Not in the Dutch university libraries. ]

* Mads Tofte, Compiler Generators, EATCS Monographs on Theoretical Computer Science, #19, Springer Verlag, Berlin, 1990, pp. 146.
Subtitle: What they can do, what they might do and what they will probably never do. Considers a compiler as a semantical transformer and explores mathematical and philosophical properties of such semantical transformers and the languages they can be written in. The text is readable on the narrative level but requires considerable mathematical expertise to be read line by line. The author supplies much personal experience and background.

* Mason, Brown, lex & yacc, O'Reilly, 1990,
See Levine, Mason, Brown, Lex and Yacc, 2nd Ed., 1992.

* Donnely, Stallman, The Bison Manual, ~1990,
John Levine writes: Part of the on-line distribution of the FSF's Bison, a reimplementation of yacc. As with everything else from the FSF, full source code is included.

* D.M. Dhamdhere, A usually linear algorithm for register assignment using edge placement of load and store instructions, Computer Languages, vol. 15, #2, pp. 83-94. 1990,
Sequel to Dhamdhere (Comput. Lang. 1988); refinements of same.

* J.R. Nawrocky, C.H.A. Koster, On display optimization for Algol-like languages, Computer Languages, vol. 15, #1, pp. 27-39. 1990,
Considers compressing / combining the displays of the procedures in a program to reduce update time. Establishes an algebra for procedure calling and proves some variants of the problem to be NP-complete. For programs withour formal procedures, the problem can be solved in polynomial time. Lots of complicated theory.

* James R. Larus, Abstract execution: A technique for efficiently tracing programs, Software -- Practice and Experience, vol. 20, #12, Dec. 1990, pp. 1241-1258.
A program trace can be considered an optimization: one wants to answer a question about the behavior of the program, and rather than running the program to find out, one analyses a trace from a previous run. Viewing both extremes (full trace, no additional runs / no trace, all additional runs) as end points of a spectrum allows us to consider intermediate solutions: excerpt-like traces that require partial reruns of the program to answer questions. This is called abstract execution (why "abstract"?).
     The AE system accepts a program I in C and produces two programs, P and $ P prime $. When run, P, runs fast and produces a short "Significant Events" trace. The answer questions about I, $ P prime $ is run and fed the SE trace; $ P prime $ runs fast most of the time, and produces the answers. This reduces the disk space needed for the trace by a factor of 10 to 38 (or several hundreds if compressed).

* Henk Alblas, Joos Schaap-Kruseman, An attributed ELL(1)-parser generator, in Compiler Compilers, Third International Workshop, CC'90, ed. by D. Hammer, Lecture Notes in Computer Science #477, Schwerin, Springer-Verlag, Oct. 1990, pp. 208-209.
Describes TCGS (for Twente Compiler Generating System), an L-attributed system based on a recursive-descent parser, in Pascal. The attribute rules are given as Pascal procedures, with declared input and output parameters; the compiler generator checks that attributes have obtained a value before being passed to a Pascal procedure. Although there is some emphasis in the paper on the combined processing of syntax and semantics, it is too short to tell if semantic information can influence parsing.

* J. Grosch, H. Emmelmann, A tool box for compiler construction, in Compiler Compilers, Third International Workshop, CC'90, ed. by D. Hammer, Lecture Notes in Computer Science #477, Schwerin, Springer-Verlag, Oct. 1990, pp. 106-116.
Impressive set of programs designed and written by Josef Grosch and his students. The toolbox, which seems to be nameless, contains

Rex	scanner generator (see SP&E, Nov. 1989, pp. 1089-1103)
Lalr	LALR(1) parser generator (see Lecture Notes in Compter Science 371, pp 81-92)
Ell	LL(1) parser generator (ditto)
Ast	generator for abstract syntax trees (see Struct. Progr., 1991, pp. 31-88)
Ag	ordered attribute evaluator generator, including higher-order AGs
Estra	cost-controlled transformation of attributed syntax trees
BEG	back end generator (see SIGPLAN Notices, July 1989, pp. 227-237)
Reuse	library of reusable modules
For most of these programs, the user can specify the precise form of the input and the output, for easier retargeting. The system is available in C and in Modula-2.
     The attribute evaluator is pure: all information is in the attributes, including the symbol table. The implementation uses pointer semantics and data sharing. Although the system yields efficient compilers (comparable to hand-written ones) the lack of a code optimization tool is felt.

* H. Mössenböck, A generator for production quality compilers, in Compiler Compilers, Third International Workshop, CC'90, ed. by D. Hammer, Lecture Notes in Computer Science #477, Schwerin, Springer-Verlag, Oct. 1990, pp. 42-55.
Description of and flyer for Coco/R, a Coco variant which produces Oberon. Emphasizes the use of simple tools. Remarkably, although throughout the paper the author speaks of compiler generation, only the lexical, syntactic and semantic procesing are treated; no word about code generation.
     Coco/R generates a recursive-descent left-to-right single-pass parser and attribute evaluator in one. The input is LLgen / yacc like, with the attributes and the code in Oberon. Syntax error recovery uses synchronization points indicated by the compiler writer.
     It is conceivable that code generation could be programmed using the attributes, but the paper shows no sign of it.

* Fred C. Chow, John L. Hennessy, The priority-based coloring approach to register allocation, ACM TOPLAS, vol. 12, #4, Oct. 1990, pp. 501-536.
Covers many other techniques for register allocation besides graph colouring. Improves upon Chaitin on almost every count: optimizes the entire program rather than a basic block; uses priorities of the live ranges (nodes) in the graph; uses a sophisticated spill algorithm; and decides for each procedure whether the registers are to be saved by the caller or the callee.
     Given r available registers, the colouring algorithm will bother only with nodes with r or more neighbours; the others are coloured in a final step. The basic colouring step is as follows. Find the node with the highest priority and colour it (of course with a colour different from its neighbours). If this makes another node uncolourable, split that node, using a heuristic algorithm specified in the paper.
     Results: Good register allocation can reduce the total number of loads and stores by 50 to 77 percent, and reduce the total number of machine cycles by 16 to 40 percent.

* Christian Iseli, A microcode compiler for the Watch-Oriented RISC Processor, Software -- Practice and Experience, vol. 20, #7, July 1990, pp. 729-748.
The Watch-Oriented RISC Processor WORP is an 8-bit microprocessor with horizontal microcode. The compiler schedules the micro-instructions so as to exploit the parallelism. Scheduling is on a basic-block basis, using a topological sort on the dependency graph; the rest of the compiler is traditional. The optimizer gains 19% on the sieve of Eratosthenes and 23% on a more typical watch program. A simple watch implementation is provided.

* Susan Horwitz, Identifying the semantic and textual differences between two versions of a program, in ACM SIGPLAN '90 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 25, #6, June 1990, pp. 234-245.
[[ preliminary abstract ]] A special graph representation of of each program is constructed, combining data flow and control flow, and this graph is normalized using the minimization algorithm for finite automata. Next the graphs are compared. I am not certain that the author's definition of "semantic changes" is what I would want the program to report: if one statement is changed that changes the semantics of N other statements, all these N statements are reported.

* William Pugh, Grant Wendell, Two-directional record layout for multiple inheritance, in ACM SIGPLAN '90 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 25, #6, June 1990, pp. 85-91.
Suppose a type A is extended to type B with a field b and to type C with a field c. The authors observe that the entire mapping mechanism needed to provide the three views can be avoided if all three types had the same layout in memory. This can be achieved by having $ . A b - $ for B and $ . A - c $ for C (with . the addressing point and - empty space) but this obviously wastes space. Their solution is to have $ b . A $ for B and $ . A c $ for C; this implies negative offsets, but what the hell. Several algorithms for finding the densest packing are presented, heavily rooted in graph theory. For the Lisp Flavors class system, a simple but not trivial one-directional algorithms wasted 47 percent space, their best two-directional algorithm 6 percent.

* Brian R. Nickerson, Graph coloring register allocation for processors with multi-register operands, in ACM SIGPLAN '90 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 25, #6, June 1990, pp. 40-52.
[[ preliminary abstract ]] [ construct different interference graphs for the different "mates" in [ the multi-registers and to try and color them simultaneously

* Alan L. Wendt, Fast code generation using automatically generated decision trees, in ACM SIGPLAN '90 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 25, #6, June 1990, pp. 9-15.
Actually code generation by peephole optimization: the instructions of the intermediate code are considered especially expensive machine instructions (more expensive than any machine instruction), and a peephole optimizer is used to optimize them away. This combines code generation and peephole optimization and works since any rewrite of an intermediate instructions to machine instructions is already an improvement.
     Both the intermediate instructions and the machine instructions are described by small trees and a code generator/peephole optimizer is generated by the method of Davidson & Fraser, June 1984. Most of the generated rewrite rules rewrite 1, 2 or 3 intermediate instructions into 1 VAX-11 instruction; occasionally 1 intermediate instruction turns into more than one VAX-11 instruction, for example to implement the modulo operation.
     Curiously, the system is called "the new system" throughout the paper, as if with capital letters.

* James R. Cordy, Richard C. Holt, Code generation using an orthogonal model, Software -- Practice and Experience, vol. 20, #3, March 1990, pp. 301-320.
The code generation process for an expression (including assignments) is split into two processes; one handles the "abstract operation" (for example :=+ in a[i] := b + 1); the second handles the "abstract operands" (a[i], b, 1) when not handled by the first process.
     A fixed number of abstract operations is defined; all else is an abstract operand. For each abstract operation, an implementation strategy tree is constructed during compiler construction time. The tree branches on conditions on the operands and the nodes contain code templates for the abstract operation; the nearer to the leaves one gets, the more specific the code templates get. When meeting an abstract operation in the input, the operation mapper in the compiler takes the unprocessed operands, and follows the tree guided by the conditions on the operands. The node at which no further progress (= specialization) is possible contains the code template for the abstract operation.
     Fifteen different abstract operand modes are recognized, representing all combinations of base, direct, indirect / double indirect, and indexed. Again strategy trees are constructed for each of the 15 abstract operand modes. Code generation then consists of plugging the code sequence(s) found for the operands into the code template found for the operation. A peephole optimizer does the rest.
     The advantage of this scheme is its teachability and its relative machine-independence. Constructing the strategy trees is very simple and safe. The scheme was tested in the Euclid compiler, where it lost less than 4% over the native compiler on PDP-11-like machines, and about 20% on the Intel 8086.

* C. Hemerik, J.P. Katoen, Bottom-up tree acceptors, Science of Computer Programming, vol. 13, pp. 51-72. Jan. 1990,
The paper starts with a not too forbidding account of the theory of bottom-up tree acceptors for code generation. It is followed by a formal treatment of a method based on Chase [1987] for the reduction of the match tables before they are generated. This is achieved by defining equivalence classes.

* Vance E. Waddle, Production trees: a compact representation of parsed programs, ACM Trans. Prog. Lang. Syst., vol. 12, #1, Jan 1990, pp. 61-83.
Redundant items are removed from a traditional parse tree through a number of techniques: unit productions are contracted, terminals symbols are removed, structure information in links is replaced by a rule number, etc. Each node in the resulting parse tree corresponds to one right-hand side and contains the rule number and a list of pointer to the nodes for the non-terminals in the right-hand side. A space saving of a factor 20 is achieved on the average. A grammar form that corresponds more closely to this representation is defined.

* Jacqueline M. Caron, Peter A. Darnell, Bugfind -- A tool for debugging optimizing compilers, ACM SIGPLAN Notices, vol. 25, #1, Jan. 1990, pp. 17-22.
Given the source files of a program, a correct sample run and a (reasonably well-organized) makefile, bugfind performs a sequence of compilations and tests to find the highest level of optimization (in terms of -ON) at which each file can be compiled and still give correct results. This can be used to localize (pinpoint is too strong a word) errors in the optimizer.