Compiler Construction from 2000 on

Literature references and annotations by Dick Grune, dick@dickgrune.com. Last update: Tue Sep 15 10:26:50 2009.
These references and annotations were originally intended for personal use and are presented here only in the hope that they may be useful to others. There is no claim to completeness or even correctness. Each annotation represents my understanding of the text at the moment I wrote the annotation.
No guarantees given; comments and content criticism welcome.

* Alexander Meduna, "Elements of Compiler Design", Auerbach, 2008, pp. 304.
Compiler design on a theoretical, formal basis, in a pleasant way.

* Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles, Techniques, and Tools", 2nd Ed. , Addison Wesley, Aug 31, 2006, pp. 1000.

* David Galles, "Modern Compiler Design", Addison Wesley, 2004, pp. 361.

* Keith D. Cooper, Linda Torczon, "Engineering a Compiler", Morgan Kaufmann, pp. 801. 2004,
Solid, straightforward account of compilation. No special attention to different paradigms. Based on ILOC (Intermediate Language for an Optimizing Compiler).

* Andrew W. Appel, "Modern Compiler Implementation in C/ML/Java", Cambridge University Press, Cambridge, 2004, pp. 554.
Consists of 2 parts: I. General compiler implementation in 250 pages; II. Advanced topics, among which garbage collection, compiling for different paradigms, and optimization.

* Dibyendu Das, "Function inlining versus function cloning", ACM SIGPLAN Notices, 38, #4, April 2003, pp. 18-24.
The problem is formulated as a multiple-choice knapsack problem, and a heuristic (greedy) algorithm for solving that problem is presented. (The readability of the paper is reduced considerably by the fact that the ligature "fi" did not print...)

* Bjorn De Sutter, Koen De Bosschere, "Software techniques for program compaction", Commun. ACM, 46, #8, Aug. 2003, pp. 32-66.
Introduction plus 5 papers on various forms of program compaction: 1. removing redundant library routines and name compression (in Java programs); 2. post-compilation optimization on all of the assembler code, eliminating duplicate code; 3. mixed-width instruction sets, switching between 32 and 16 bits; 4. Huffman compression of less frequently used segments, with subsequent run-time decompression; 5. grammar-based compression, using the fact that the code obeys a grammar, which at any point severely reduces the number of possibilities, which in turn allows a very efficient representation.

* Larry Carter, Jeanne Ferrante, Clark Thomborson, "Folklore confirmed: Reducible flow graphs are exponentially larger", in 2003 ACM Symposium on Principles of Programming Languages, 2003, pp. 106-114.
Proves that there are flow graphs with n nodes (in particular the complete graph over n nodes) that have no equivalent reducible graphs of size < 2 sup n-1 .

* O.G. Kakde, "Algorithms for Compiler Design", Charles River Media, Higham, Mass., 2002, pp. 334.
Traditional book on compiler design, in a clear and easy style. The first 194 pages are on traditional parsing, in reasonable depth. The rest is on syntax-directed translation, storage management, code generation, etc., in broad coverage, with short simple examples.

* Christian S. Collberg, "Automatic derivation of compiler machine descriptions", TOPLAS, 24, #4, July 2002, pp. 369-408.
[ uses local C compiler to analyse machine architecture; crazy plan; seems to work ]

* Dick Grune, Henri E. Bal, Ceriel J.H. Jacobs, Koen G. Langendoen, "Compilateurs -- Cours et exercises corrigés", in French: Compilers and corrected exercises, Collection Sciences Sup, Dunod, Paris, 2002, pp. 774.
Translation by Carine Fédele and Olivier Lecarme, of Grune et al., Modern Compiler Design.

* Craig Chambers, "Staged compilation", ACM SIGPLAN Notices, 37, #3, March 2002, pp. 1-8.

* Y.N. Srikant, Priti Shankar, "The Compiler Design Handbook: Optimizations and Machine Code Generation", CRC Press, Boca Raton, Fl., 2002, pp. 1088.
Compilation of 22 papers/chapters, on subjects like Dataflow analysis, Profile-guided compiler optimizations, Optimizations for object-oriented languages, Compilation for distributed memory architectures, Dynamic compilation, Type systems in programming, etc.

* Yumin Zhang, Xiaobo Hu, Danny Z. Chan, "Efficient global register allocation for minimal energy consumption", ACM SIGPLAN Notices, 37, #4, April 2002, pp. 42-53.
The basic blocks are sorted topologically. Then equations for the energy consumption, using register allocation as the "variable", are derived and solved for the first basic block. The results are then transformed cleverly into the input conditions for the next basic block, etc. Detailed analyses of this transformation is given for branches, merges and loops.
Experiments show an energy consumption reduction of between 68 and 90 percent. In the introduction, the authors claim that they optimize both for energy saving and for speed, but this is not elaborated in the body of the paper. Since, however, the main effect of the algorithm is to reduce the number of memory accesses since they consume a lot of energy, we can expect speed to benefit as well, since memory accesses are slow.

* J. Aycock, R. Nigel Horspool, "Schrödinger's token", SP&E, 31, 2001, pp. 803-814.
A Schrödinger's token is a token which has several types (classes), for example Keyword_If and Identifier, the correct one of which is only established at the moment of truth. This arrangement is usual in linguistics, where general CF parsing techniques are used to resolve the ambiguity, but here (in CS) we use generalized LR. The advantages of Schrödinger's tokens are: ease of use, robustness, and increased separation of lexical analyser and parser.

* Randy Allen, Ken Kennedy, "Optimizing Compilers for Modern Architectures: A Dependence-based Approach", Morgan Kaufmann, 2001, pp. 816.
In-depth, state of the art study of all the nitty-gritty details of high-quality code generation for scalar pipelined architectures, vector-pipelined, multiple-issue, parallel, you name it, based on dependency graphs. Other topics covered are advanced register allocation, cache management, HPF, etc.; all with case studies and historical comments.

* David Grove, Craig Chambers, "A framework for call graph construction algorithms", TOPLAS, 23, #6, Nov. 2001, pp. 685-746.

* Ralf L\("ammel, Chris Verhoef, "Cracking the 500-language problem", IEEE Software, 18, #6, pp. 78-88. Nov./Dec. 2001,

* R. Lämmel, C. Verhoef, "Semi-automatic grammar recovery", SP&E, 31, #15, Dec. 2001, pp. 1395-1438.
There is much legacy code around for which there is not even a usable grammar; this impedes upgrading. Still, almost always, some sources of grammar information can be found: paper manuals, on-line manuals, compilers, pretty-printers, test-set generation tools, etc.
The authors outline the general techniques for recovering a grammar semi-automatically from such sources, and then present an extensive case study of the recovery of the grammar of IBM's VS COBOL II, a language at the end of its life time. The main source of grammar information was the electronic manual, which shows the grammar as syntax diagrams. A "level 1" grammar was extracted by parsing the syntax diagrams; this grammar was then tested and refined against several millions of lines of VS COBOL II, using a scannerless generalized (Tomita) LR parser. The refinement process is described in great detail, interspersed with many juicy stories. Convergence was much more rapid than expected, strongly helped by the fact that the parser could work with any CF grammar; the traditional LL and LR parsers are too weak for this work. The recovered "level 5" grammar was then used to generate a new on-line manual.

* Lunjin Lu, "On Dart-Zobel algorithm for testing regular type inclusion", ACM SIGPLAN Notices, 36, #9, Sept. 2001, pp. 81-85.
Regular-type inclusion could be tested using standard algorithms for testing emptiness of regular languages: A subset B is equivalent to A andsign "\(no" B = empty , but Dart-Zobel is much more efficient.
The author proves formally that the Dart-Zobel algorithm is complete (if A is really included in B, it will find A subset B ), and shows by counterexample that it is incorrect: there are examples where it finds A subset B and A is not included in B. No correction to the algorithm is given.
The counterexample looks vaguely like the occur check problem in Prolog, and it would be interesting to see a formal comparison between the two. The paper is quite well-written, in spite of the clumsy title.

* Chris Clark, "ASTs for optimizing compilers", ACM SIGPLAN Notices, 36, #9, Sept. 2001, pp. 25-30.
Although the author starts off with ASTs and even claims that "it makes a nice tidy world" and "it is possible to build optimizing compilers around it", he soon switches over to the traditional directed graph of basic blocks.

* Sheayun Lee, Andreas Ermedahl, Sang Lyul Min, "An accurate instruction-level energy consumption model for embedded RISC processors", ACM SIGPLAN Notices, 36, #8, Aug. 2001, pp. 1-10.

* Christof Keßler, Andrzej Bednarski, "A dynamic programming approach to optimal integrated code generation", ACM SIGPLAN Notices, 36, #8, Aug. 2001, pp. 165-174.
The authors aim at optimal code generation which integrates instruction selection, instruction scheduling and register allocation for basic blocks (given in the form of dags) for processors (especially digital signal processors) with irregular features like pipelining, arbitrary latency, non-homogeneous register sets, etc., for use in their planned retargetable code generation system OPTIMIST. In this paper, however, they integrate instruction selection and instruction scheduling only; it would seem that their algorithm could handle different classes of registers, but the paper is not clear on that. Register allocation is done on the finished target schedule; spilling registers is under "future work".
The basic idea is to do exhaustive enumeration of the possible schedules and selections of the operation nodes in the basic block. Naively done, this creates an exponentially-sized tree, but using dynamic programming (nodes in this tree that lead to the same situation are combined and the cheapest is kept) reduces this to a (still exponential) dag of much smaller size. This algorithm is then extended to incorporate pipelining effects (instruction overlap (=input restrictions) and latency (=output restrictions)).
The present algorithm creates optimal code for basic blocks of up to 30 to 50 nodes, if one is willing to accept compilation times of several hours. No figures are given for the gain in run-time speed (or power consumption) compared to "normal" compilation. (Such a comparison is perhaps meaningless if you generate absolutely 100% proof optimal code?)

* William S. Evans, Christopher W. Fraser, "Bytecode compression via profiled grammar rewriting", ACM SIGPLAN Notices, 36, #5, May 2001, pp. 148-155.
The paper concerns the situation in which compressed bytecode is interpreted by on-the-fly decompression. The bytecode compression/decompression technique is based on the following observations.
1. Bytecode is repetitive and conforms to a grammar, so it can be represented advantageously as a parse tree in prefix form. Whenever the interpreter reaches a node representation, it knows the non-terminal (N) the node conforms to, exactly as with expressions in prefix form. The first byte of the node representation serves as a guiding byte and indicates which of the alternatives of the grammar rule N applies. This allows the interpreter again to know which non-terminal the next node conforms to, as required above.
2. Since non-terminals usually have few alternatives, most of the bits in the guiding bytes are wasted, and it would be better if all non-terminals had exactly 256 alternatives. One way to achieve this is to substitute some alternatives of some non-terminals in the alternatives of other non-terminals, thereby creating alternatives of alternatives, etc. This increases the number of alternatives per non-terminals and allows a more efficient representation of those subtrees of the parse tree that contain these alternatives of alternatives.
3. By choosing the substitutions so that the most frequent alternatives of alternatives are present in the grammar, a --heuristic-- optimal compression can be achieved. The heuristic algorithm is simple: repeatedly substitute the most frequent non-terminal pair, unless the target non-terminal would get more than 256 alternatives in the process.
A few minor problems still have to be solved. The resulting grammar (expanded specifically for a given program) is ambiguous; an Earley parser is used to obtain the simplest --and most compact-- parsing. Labels are dealt with as follows. All non-terminals that are ever a destination of a jump are made alternatives of the start non-terminal (what if there are more than 256 of these? DG), and parsing starts anew at each label. Special arrangements are made for linked-in code.
In one sample, the bytecode size was reduced from 199kB to 58kB, whereas the interpreter grew by 11kB, due to a larger grammar.

* Yukong Zhang, Young-Jun Kwon, Hyuk Jae Lee, "A systematic generation of initial register-reuse chains for dependency minimization", ACM SIGPLAN Notices, 36, #2, Feb. 2001, pp. 47-54.
Read Berson, Gupta and Soffa first.

* "International Symposium on Memory Management ISIMM '00", ACM SIGPLAN Notices, 36, #1, pp. 1-178. Jan. 2001,

* Dick Grune, Henri E. Bal, Ceriel J.H. Jacobs, Koen G. Langendoen, "Modern Compiler Design", John Wiley, Chichester, 2000, pp. 736.
Covers all of compiler construction (imperative, object-oriented, functional, logic, parallel and distributed, + assemblers and interpreters) to a carefully graded depth (principles rather than detailed coding).
The book consists of three parts: general compiler construction, following a program through a compiler from character sequence (text) to parse tree to annotated syntax tree to binary object file, in 4 chapters; memory management for compilers and compiled programs, in 1 chapter; and 4 chapters on specific techniques for imperative / object-oriented, functional, logic, and parallel / distributed programs.
The style is conversational rather than formal. Many examples, little hard code. Definitely not a cookbook.

* David A. Watt, Deryck F. Brown, "Programming Language Processors in Java", Prentice Hall, Harlow, England, 2000, pp. 436.
Part three of Watt's Magnum Opus, now in Java. Much of the criticism on the original version (David A. Watt, "Programming Language Processors", Prentice Hall, 1993) still holds: little coverage of LR(1) parsing, no register allocation, non-standard terminology, etc. On up side, within its limits it is a well-structured, pleasantly readable book, with lots of exercises, with answers; it is probably quite student-friendly. And it has a 30 page chapter on interpreters, both recursive and iterative. Like its parent version, it is based on the mini-language Triangle.

* David Basanta Gutiérrez, et al., "Improving the quality of compiler construction with OO techniques", ACM SIGPLAN Notices, 35, #12, pp. 41-50. Dec. 2000,

* Maya Madhavan, Priti Shankar, Siddharta Rai, U. Ramakrishna, "Extending Graham-Glanville techniques for optimal code generation", TOPLAS, 22, #6, Nov. 2000, pp. 973-1001.
This summary addresses the parsing part only. Classical Graham-Glanville is riddled by ambiguities that have to be resolved too early, resulting in sub-optimal code. This paper describes a parsing method, auxiliary LR(0), which allows ambiguities in an LR-like parser to remain unresolved arbitrarily long. The method is actually a linearized version of tree parsing.
The method is applicable to grammars that have the following property; no technical name for such grammars is given. All rules are either "unit rules" in which the right-hand side consists of exactly one non-terminal, or "operator rules" in which the right-hand side consists of N (>= 0) non-terminals followed by a terminal, the "operator". As usual with operators, the operator has an arity, which has to be equal to N.
In such a grammar, each shift of a terminal is immediately followed by a reduce, and the arity of the terminal shifted determines the number of items on the stack that are replaced by one non-terminal. This allows us to do the reduction, even if multiple reductions are possible, without keeping multiple stacks as is done in a Tomita parser: all reduces take away the same number of stack items. Note that all the items on the stack are non-terminals. Such a reduction results in a set of non-terminals to be pushed on the stack, each with a different, possibly ambiguous parse tree attached to them. This set may then be extended by other non-terminals, introduced by unit reductions using unit rules; only when no further reduces are possible is the next terminal (= operator) shifted in.
A new automaton is constructed from the existing LR(0) automaton, based on the above parsing algorithm; the unit reductions have been incorporated completely into the automaton. The algorithm to do so is described extensively.
The parser is used to parse the intermediate code stream in a compiler and to isolate in it operator trees that correspond to machine instructions, the grammar rules. A cost is attached to each rule, and the costs are used to disambiguate the parse tree and so decide on the machine code to be generated.

* Uwe Assmann, "Graph rewrite systems for program optimization", TOPLAS, 22, #4, July 2000, pp. 583-637.

* K. Wilken, J. Liu, M. Heffernan, "Optimal instruction scheduling using integer programming", in 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation, ACM SIGPLAN Notices, 35, #5, May 2000, pp. 121-133.

* Mayur Naik, Rajeev Kumar, "Efficient message dispatch in object-oriented systems", ACM SIGPLAN Notices, 35, #3, March 2000, pp. 49-58.
Investigates three methods of message dispatch: array lookup, binary search and decision trees. Gives criteria when to use which and when and how to combine them for best space and time efficiency.

* Matthew S.Davis, "An object oriented approach to constructing recursive descent parsers", ACM SIGPLAN Notices, 35, #2, Feb. 2000, pp. 29-35.
Weird paper that seems to miss the point at several turns. The author seems unaware of LL(1) and non-predictive recursive descent, requires the grammar in Greibach Normal Form (!) and refers to Hopcroft and Ullman on how to do so (by hand!). The author seems unaware of the work of Koskimies (1991) on AND/OR grammars. The author claims that his goal is "to create a parser whose implementation is optimized for programmer understanding", but his example grammars in GNF are incomprehensible. Also, a statement like "Automata and language theory shows us that grammars represent languages correctly (Hopcroft and Ullman, 1979)." makes one fear the worst. In summary, I do not think that the paper solves any problem that Koskimies does not.

* Chris Clark, "Uniform abstract syntax trees", ACM SIGPLAN Notices, 35, #2, Feb. 2000, pp. 11-16.
Extensive description of ASTs with nodes that are all of the same type. Different node types are distinguished by markers. Pros (simplicity, editability) and cons (weak typing, no OO techniques possible, space inefficiency) are described.

* Emin Gün Sirer, Brian N. Bershad, "Using production grammars in software testing", ACM SIGPLAN Notices, in 2nd Conference on Domain-Specific Languages DSL'99, 35, #1, Jan. 2000, pp. 1-13.
Test cases for a Java byte interpreter are generated from what amounts to an affix grammar; the term "affix grammar" is not used; the language the affix grammar is written in is called lava. In addition to this production grammar, a behavioral description in Scheme is given, which specifies the semantics of each rule in lambda notation. When a test case is generated from the affix grammar an expression in Scheme is also generated, which, when evaluated, yields the result of the test case. Then the test case is run on the interpreter to be tested, the Scheme expression evaluated, and the results are compared.
Actually, the implementation of interpreter is tested by comparing it to the implementation in Scheme. The method that links the affixes in the grammar and the values in the Scheme expression is not described.

* Thomas Reps, "Undecidability of context-sensitive data dependence analysis", TOPLAS, 22, #1, Jan. 2000, pp. 162-186.
If I understand this correctly, it says the following. Normal data dependency analysis follows in principle all possible paths in the graph. Sometimes one wants to restrict this set of paths, by attaching symbols to arcs and requiring that the words spelled out by the paths belong to a language. One such language is the language of matching parentheses. The paper proves that even for this context-free language, the data dependency analysis problem is undecidable.

* Zhong Shao, Andrew W. Appel, "Efficient and safe-for-space closure conversion", TOPLAS, 22, #1, Jan. 2000, pp. 129-161.
All function calls are implemented as closures; these are then combined in clever ways, using traditional and novel techniques, to reduce the number and size of the allocations.

* Wan Fokkink, Jasper Kamperman, Pum Walters, "Lazy rewriting on eager machinery", TOPLAS, 22, #1, Jan. 2000, pp. 45-86.
The "machinery" in the title is the reduction engin, not some hardware. Just as strictness analysis in a lazy system tries to find out what nodes are strict, to improve the speed of execution, lazy rewriting tries to find out what nodes could use lazyness in a strict system, to improve the semantics.
Lots of math about abstract reduction systems.

* J. R. Levine, "Linkers and Loaders", Morgan-Kaufmann, 2000, pp. 256.
The one and only, and excellent to boot.