Compiler Construction 1980-1989

> Literature references and annotations by Dick Grune,
Last update: Fri Nov 12 14:11:46 2021.

These references and annotations were originally intended for personal use and are presented here only in the hope that they may be useful to others. There is no claim to completeness or even correctness. Each annotation represents my understanding of the text at the moment I wrote the annotation.
> No guarantees given; comments and content criticism welcome.

* J.J.F.M Schlichting, An Early Implementation of Revised ALGOL 68, (PhD Thesis), Technical University Delft, Delft, Nov. 1989, pp. ~200.
Describes the CDC Algol 68 implementation, which was produced under the supervision of the author. The compiler and the run-time system are treated equally extensively. Some special attention is given to the garbage collector, which uses self-describing references (pointers).

* Des Watson, High-Level Languages and Their Compilers, International Computer Science Series, Addison-Wesley, Wokingham, England, 1989, pp. 337.
Very good introduction to the title subject. Covers in three parts programming languages, the anatomy of a compiler and software-engineering a compiler. Programming languages are covered in 93 pages and it shows: modules and ADTs are hardly mentioned. But the book is favourable to Algol 68, so all is forgiven:-); explains two-level grammars. Compiler construction is treated in the simplest possible terms, but most subjects are touched upon. Interpreters are not forgotten.

* Peter Lee, Realistic Compiler Generation, MIT Press, Cambridge, Mass., 1989, pp. 246.
Denotational semantics is modified in a number of ways to adapt it to the needs of the compiler writer and is provided with a Standard ML-like notation. An interpreter is then used to derive real-world (= efficient, comparable to hand-written compilers) compilers for it. Half the book is code. The author likes puns.

* Peter Rechenberg, Hanspeter Mössenböck, A Compiler Generator for Microcomputers, (translated from German), Prentice-Hall, Hemel Hempstead, UK, 1989, pp. 378.
Detailed write-up of the LL(1) L-attributed compiler generation system Coco. The system is Modula-2-based, features named attributes and is intended for the PC. The first 200 pages describe the theory behind, design of, methods used in and applications of Coco; the rest is manual and code (all of it?). A comparison with four other compiler generators (yacc, HLP84, GAG, MUG) is given and summarized in a table.
     The text is divided in a large number of small (usually less than one page) section, each making a clear point. Sometimes the view of the authors on things is enlightening: "An LL(1) grammar is already a parsing program. To make a parser it is only necessary to code an interpreter for it" (Section 2.5). And then they set out to do so.
     Easy to read, and worth reading, if it were only for its thoroughly un-American style.

* Josef Grosch, Generators for high-speed front-ends, in Compiler Compilers and High-Speed Compilation, Lecture Notes in Computer Science #371, ed. by D. Hammer, Springer-Verlag, Berlin, 1989, pp. 81-92.
A coherent system of lexical scanner generator, LALR(1) parser generator and LL(1) parser generator, using a uniform input syntax, is presented. The scanner beats UNIX lex by a factor of 5, the LALR parser beats yacc by a factor of 2.

* Andrew S. Tanenbaum, M. Frans Kaashoek, Koen G. Langendoen, Ceriel J.H. Jacobs, The design of very fast portable compilers, ACM SIGPLAN Notices, vol. 24, #11, pp. 125-131. Nov. 1989,
A special kind of back-end generator for the Amsterdam Compiler Kit is described, which will produce very fast back ends that produce relocatable binary directly. A fast back end for a machine is generated from two tables, one that maps EM instructions to assembler instructions of the machine, and one that maps the assembler instructions to relocatable binary. (Note that since the assembly instructions are used only as a communication medium between the two tables, its format can be chosen freely by the back-end writer.) The back-end generator integrates these two tables into one set of routines, one for each EM instruction; this set of routines forms a library that can be linked with any front end to produce a fast compiler. The back-end generator does limited push-pop optimization.
     The resulting compilers are about 3 to 4 times faster than the native compilers. The resulting code is about 3 times slower.

* J. Grosch, Efficient generation of lexical analysers, SP&E, vol. 19, #11, Nov. 1989, pp. 1089-1103.
[[ preliminary abstract ]] Automaton generation and table compression are combined. Question: are the tunnelling state changes described just $epsilon$-transitions, or is there more to it?

* Alfred V. Aho, Mahadevan Ganapathi, Steven W.K. Tjiang, Code generation using tree pattern matching and dynamic programming, ACM Trans. Programming Languages and Systems, vol. 11, #4, Oct. 1989, pp. 491-516.
The Aho-Corasick multi-string search algorithm is adapted to bottom-up subtree recognition, as follows. Each template (subtree) to be recognized is characterized by the sets of paths from the root to each of the leaves; the paths include at each step information showing which branch in the template is taken. The Aho-Corasick automaton for this set of paths is constructed. Next, a top-down scan uses this automaton to recognize the possible leaves of templates. A subsequent bottom-up scan sets for each node a bit matrix, the entries of which indicate whether the node can be a j-level node of template i; since the left-right information is part of the path, this information is sufficient to conclude that we have found a full template when we find a level 0 node. This process determines for each node in the tree which templates match there, and does so in linear time.
     Each template comes with a cost function and an action. The cost function can check the values of leaves if needed, and can return an infinitely high cost if the context turns out to disallow the template. Another bottom-up scan is used to find the minimum cost at each node and the corresponding template, using dynamic programming: for each matching template determine the cost of leaves + cost of template, and take the minimum. Again this is done in linear time.
     A mixed top-down/bottom-up scan then performs the actions, to generate code and/or to do constant folding, tree reordering, etc.
     Register allocation is not addressed; nor is completeness of the templates. Also, the algorithm is restricted to expressions that form pure trees; this is not mentioned in the paper.
     The algorithm is retargetable, small, fast and produces good quality code.
     Summary. The first part of the paper is a general explanation of the principles in a clear style. The second part concerns the combined implementation of the tree matching and the dynamic programming, and soon confuses explanation and implementation. The third part concerns the code generator writing language twig used in testing the algorithms. It is a yacc-like language with sometimes unexpected nomenclature: the keyword ABORT means `return infinity', and TOPDOWN means that the default post-order visit sequence is replaced by one specified by the user.

* R. Dixon, T. McKee, P. Schweizer, M. Vaughan, A fast method dispatcher for compiled languages with multiple inheritance, in OOPSLA '89 Conference, ACM SIGPLAN Notices, vol. 24, #10, Oct. 1989, pp. 211-214.
The dispatch matrix is implemented as a matrix for speed, but it is compressed by making selector rows coincide. A close to optimal packing is achieved by a simple graph colouring algorithm.
     Additionally, scheme is given to postpone the graph colouring until link time, when all classes with their selectors are known.

* K. Bothe, B. Hohberg, Ch. Horn, O. Wikarsky, A portable high-speed Pascal to C translator, ACM SIGPLAN Notices, vol. 24, #9, Sept 1989, pp. 60-65.
Very summary description of same; mostly about the reasons for writing and using this convertor: 1. using Pascal where there is no Pascal compiler; 2: support for projects that combine C and Pascal.
     Several problems in the translation are mentioned (non-local procedures, initialization of structured values) but no solutions are given. Care is taken to produce readable C, to be improved by hand afterwards.

* Preston Briggs, Keith D. Cooper, Ken Kennedy, Linda Torczon, Coloring heuristics for register allocation, in ACM SIGPLAN '89 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 24, #7, July 1989, pp. 275-284.
Explains easiest-last graph colouring and gives a linear-time implementation, as follows. First, an array of sets of nodes S (implemented as linked lists or so) is constructed, such that all nodes with i neighbours are in $ S sub i $; this can be done in $O(n)$ time. Now a node with the fewest neighbours can be found by scanning S from $ S sub 0 $ for the first non-empty set, say $ S sub k $; any node from $ S sub k $ will then do, say N. Next, N is removed from the graph and put on the colouring stack; it is also removed from $ S sub k $. When N is removed from the graph, the degrees of its neighbours are all decreased by one, which means that they all have to moved down one entry in the array S. (In addition, to find the first non-empty entry in S one need not start searching from 0, but one can start at $k-1$ after having removed a node from $ S sub k $: $ S sub { k - 2 } $ was empty at that moment, but it can only be filled from $ S sub { k - 1 } $, which, however, was also empty at that moment, so $ S sub { k - 2 } $ will still be empty after the above step.)
     The authors report speed-ups of about 10 percent compared to Chaitin.

* David Bernstein, D. Goldin, M. Golumbic, H. Krawczyk, Y. Mansour, I. Nahshon, R. Pinter, Spill code minimization techniques for optimizing compilers, in ACM SIGPLAN '89 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 24, #7, July 1989, pp. 258-263.
Seven authors, all from the Technion in Haifa. Describes a combination of and improvements over Chaitin, Chow, Brélaz, etc., resulting in about a 10 percent reduction in spills and a 3 percent speed-up. The main item is the cost function, of which they have three. The colouring algorithm is run once for each cost function and the best (least spilling) colouring is chosen.
     The basic graph decomposition step of their colouring algorithm is, for a given number of registers r: if there are one or more nodes with fewer than r neighbours, remove the one with the largest number of neighbours; otherwise, spill the node with the lowest cost. The authors claim this works better than the traditional greedy or lazy algorithms on typical interference graphs.
     A representative graph size of 1000 nodes and 20000 edges is reported for the "average" complete program; the ratio 20:1 for edges:nodes is also representative.

* Wen-Mei Hwu, Pohua P. Chang, Inline function expansion for compiling C programs, in ACM SIGPLAN '89 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 24, #7, July 1989, pp. 246-257.
Very clear detailed exposition of the problems that can be encountered doing inlining, and their solutions. Uses profiling of the program to find the most likely candidates. One conclusion is that system calls are bad news.

* Christopher W. Fraser, A language for writing code generators, in ACM SIGPLAN '89 Conference Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 24, #7, July 1989, pp. 238-245.
Hand-written code generators are repetitive and full of case statements. The author describes a condensed notation for this. The user specifies a naive code rule for each type of node in the DAG, and a set of optimization rules. The rules look like parameter lists of printf(), with the provision that formats may occur as parameters to formats. Basically, the user can imagine that they are turned into a long if-then-chain, but the actual generated code is much more efficient. The reduction achieved (or expansion) is quite considerable: the specification for the VAX is 126 lines, which expand into about 4000 lines of C. Prominent detail: the rule compiler is written in Icon.

* H. Emmelmann, F.W. Schroer, R. Landwehr, BEG - a generator for efficient back ends, ACM SIGPLAN Notices, vol. 24, #7, July 1989, pp. 227-237.
A bottom-up cost-based pattern recognizer, similar to Aho & Ganapathi's, in which the dynamic programming tables are replaced by compiled routines, and which includes register allocation; it actually features two different register allocators. One is naive and tries to hand out registers while recognizing the patterns; if it runs out of registers it spills another register; it is reported to work well enough for the MC68020. The second is a separate pass that calculates the cost of keeping the top of each subtree in a register, using some heuristic on the costs of the the children; if the cost is infinite, the top of the subtree is kept in memory; this is reported to work for the IBM/360 and the 8086.

* Jack W. Davidson, David B. Whalley, Quick compilers using peephole optimizations, SP&E, vol. 19, #1, Jan. 1989, pp. 79-97.
The "very portable C compiler" vpcc has CVM (C Virtual Machine) as its intermediate code; CVM features separate calling and expression stacks and has been designed specifically for C. However, naive code generation from CVM (to the VAX-11) looses about a factor of 2 in speed; hence the peephole optimization. Peephole optimization for CVM is special in the sense that the input sequences are known; a result is that a small number of peephole optimization rules suffice.
     The VAX machine instructions generated by naive code generation are kept in a linked list. Rules are matched to the input starting from the end of the input backwards, to allow matches over the results of matches. The matching process is repeated after each addition to the list of instructions. Register allocation is done from a fixed stack of registers; if an expression exhausts this stack, the compiler gives up.
     The set of replacement rules is transformed into a very detailed and low-level C routine. It does an ad-hoc form of Aho-Corasick bibliographic search, and is about 9 times faster than an interpretative pattern matcher.
     Using about 40 peephole optimization rules, the system generates code that is between -10 and +20 percent faster than the code compiled by the native C compiler.
     The authors make two interesting observations: 1. Using a fixed set of 4 registers on the VAX-11 has not yet caused problems after one year of operation. 2. Searching for clever peephole optimizations is entertaining but of doubtful use: the most useful optimizations are generally obvious.

* B.L. Meek, K.K. Siu, The effectiveness of error seeding, ACM SIGPLAN Notices, vol. 24, #6, June 1989, pp. 81-89.
The differences between manual and automatic error seeding for compiler testing are examined for two very different languages, Pascal and Prolog. Two conclusions:
     1. Automatic error seeding is more effective in languages with strong syntactic structure than in more "loose" languages. The latter require stronger manual steering of the error seeding process.
     2. All methods need some measure of manual intervention.

* J. Angel Velázquez Iturbide, Formalization of the control stack, ACM SIGPLAN Notices, vol. 24, #3, pp. 46-54. March 1989,
The author points out that the control stack is not a stack at all but rather a variable-size array of which only a small and bounded number of elements are accessible at any time (bounded by the maximum lexical depth of the program). Therefore the usual axioms for stack do not apply. The activation record is considered as an ADT and a machine-independent implementation is given. There are also useful and clear pictures of control stacks with accessible and inaccessible activation records depicted as toy blocks shifted front and back.

* Mahadevan Ganapathi, Geoffrey O. Mendal, Issues in Ada compiler technology, IEEE Computer, vol. 22, #2, pp. 52-60. Feb. 1989,
Some of the issues are (there are many more in the paper): 1. Constraint check optimization is a must. 2. Code reordering is highly desirable, but "a myriad of special cases" in the Reference Manual effectively prevents it. 3. Exception handlers hinder optimizations, since values must be kept for use in exception handlers that could otherwise be discarded. 4. The rules for parameter passing cause a lot of bothersome aliasing (and passing everything by copy-restore is not an option either).
     The authors also complain about the vagueness of the semantics of tasking and "its intrinsic requirements of substantial processing time", but their complaints are equally vague.

* P. Steenkiste, J.L. Hennessey, A simple interprocedural register allocation algorithm and its effectiveness for LISP, ACM TOPLAS, vol. 11, #1, Jan. 1989, pp. 1-32.
The paper starts with a very clear exposition of interprocedural register allocation, both in general, historical and modern, and more in particular in a system with many small and often recursive procedures and lots of registers. The unfeasability of constructing an colouring a complete interference graph for the entire program is emphasized.
     The algorithm works as follows. First, traditional register allocation by graph colouring is done on all procedures; it is assumed that no spills will occur in this phase (or we will not have enough registers to do interprocedural register allocation anyway). This tells us how many registers each procedure needs.
     Next the call graph is constructed and strongly connected components (consisting of mutually recursive procedures) are identified. Each strongly connected component is assigned the maximum number of registers any of its members needs. This number is sufficient if we impose on each of the members the obligation to save its own registers when it calls any of the other members.
     Now we have a DAG containing both procedures and strongly connected components, but in both cases we know how many registers the element needs. Registers are now handed out bottom-up in this DAG. Say an element i uses $ t sub i $ registers directly or indirectly; it will then use the registers $ R sub 1 ... R sub t sub i $. This $ t sub i $ is the sum of $ n sub i $, the number of registers i itself uses, and the maximum of the $ t sub j $ over all children (callees) of i. This means that there may not be enough registers for the top level procedures, but they are active only a small part of the time anyway.
     The result is an average speed-up of about 10 percent.

* Arthur B. Pyster, Compiler Design and Construction: Tools and Techniques with C and Pascal, (2nd ed.), Van Nostrand Reinhold, New York, 1988, pp. 267.
[How does it compare to the 1st edition?] [Not available in Dutch university libraries.] Gabriela O. de Vivo <> writes: The book covers the general principles of compiler design and presents a good number of examples focusing on the building of pieceparts of compilers for C and Pascal. The implementation (construction) language is C. Note that this edition (in contrast with the previous one) is very related to the Unix world, including the use of tools like Lex, Yacc, and standard utilities. (Out of print.)

* Charles N. Fischer, Richard J. LeBlanc, Crafting A Compiler, Benjamin Cummings Publishing, Menlo Park, CA, 1988,
Highly readable though somewhat repetitive book by two experts in the field.

* D.M. Dhamdhere, Register assignment using code placement techniques, Comput. Lang., vol. 13, #2, pp. 75-93. 1988,
As an alternative to register allocation oriented code generation, the author starts from unoptimized code in which the registers for each operation are loaded on the spot and the result is stored right away. This code is optimized by moving the loads and stores up and down the control path until they meet with and annihilate other stores and loads. Lots of theory, and there is probably a real algorithm in there soemwhere.

* J.S. Yates, R.A. Schwatz, Dynamic programming and industrial-strength instruction selection: code generation by tiring but not exhaustive search, ACM SIGPLAN Notices, vol. 23, #10, Oct. 1988, pp. 131-140.
In the dynamic programming code generation as described by Aho, Sethi and Ullman, the code for each node returns the result of the expression (in a register or in memory). This is extended in the sense that the code can also return the result except for a deferred operation; for example, the result may be in register 1 but still has to be shifted left over 2 positions or so. An attempt is made to curb combinatorial explosion by sending down inherited attributes telling which operations might be useful.
     Excellent bibliography on Graham-Glanville code generation.

* Peter F. Lemkin, PSAIL: a portable SAIL to C compiler -- description and tutorial, ACM SIGPLAN Notices, vol. 23, #10, Oct. 1988, pp. 149-171.
Example of fairly massive compilation with C as a target language.

* C. Genillard, A. Strohmeier, GRAMOL -- an grammar description language for lexical and syntactic parsers, ACM SIGPLAN Notices, vol. 23, #10, Oct. 1988, pp. 103-122.
A manual rather than a description or rationale. The lexical analyser is restricted to programming language tokens. Remarkably, the description of the lexical analyser takes 7 pages and that of the syntactic analyser 2.5 pages. The paper is sometimes singularly uninformative: the form of the resulting parsing is not explained nor how it can be obtained; and the paper explicitly refuses to reveal whether a top-down or bottom-up parser is generated (either, depending on the grammar perhaps?).

* Peter Schnorf, Dynamic instantiation and configuration of functionally extended efficient lexical analyser, ACM SIGPLAN Notices, vol. 23, #10, Oct. 1988, pp. 93-102.
LEXXO, a fixed lexical analyser, parameterized with such things as the set of characters that start an identifier, etc.

* Jack W. Davidson, Anne M. Holler, A study of a C function inliner, SP&E, vol. 18, #8, Aug. 1988, pp. 775-790.
Describes INLINER, a source-to-source inliner for C. Several inlining problems were encountered: actual-formal mismatch, varargs, etc. Many details are given in narrative form, for example concerning the renaming of labels; no (pseudo-)algorithms. A call graph is constructed as far as possible; recursion cycles are broken arbitrarily. The routines are inlined according to a backward topological sort.
     On the average, a 12 percent speed-up was obtained at the expense of a 24 percent increase in code size. The conflicts between inlining and debugging / symbolic debuggers are noted.

* H. Mössenböck, A convenient way to incorporate semantic actions in two-pass compiling schemes, Software -- Practice & Experience, vol. 18, #7, July 1988, pp. 691-700.
First a parse tree of a program is constructed using a strong parser (LR(1)) and then convenient LL(1)-based semantic analysis is done guided by that parse tree.

* Christopher W. Fraser, Alan L. Wendt, Automatic generation of fast optimizing code generators, in ACM SIGPLAN '88 Conference on Programming Language Design and Implementation, ACM SIGPLAN Notices, vol. 23, #7, July 1988, pp. 79-84.
First, the compiler writer prepares two tables, one describing a naive code generator (for a given intermediate language) and one describing a peephole optimizer for it. These are then combined into a slow, effective optimizing code generator $CG sub 1$, which does optimization through classical techniques. Now, $CG sub 1$ is used to compile and optimize program segments from a test bed and all optimizations found are recorded, including the conditions that led to their discovery. These results, together with the naive code generation table are used to construct a second code generator $CG sub 2$, which contains all handling of the discovered optimizations as hard code: the code generator has "learned" the optimizations by heart.

* Paul E. Cohen, An abundance of registers, ACM SIGPLAN Notices, vol. 23, #6, pp. 24-34. June 1988,
Points out that even if there are registers enough, it is not wise to just use then liberally, since having many live registers at procedure calls still causes trouble: either they have to be saved and restored upon return or the callee (and its sub-callees!) have to be careful not to touch them. The NEC V60 and V70 microprocessors have 32 registers of 32 bits, and subroutine call and return instructions that accept dynamic masks indicating which registers are in use at the call point and which are going to be used by the callee; ANDing these masks gives the registers that have to be saved, which the hardware then duly does. The author then analyses this situation statistically, for various assumptions, and runs tests on a real-world program. One conclusion is that it is probably wise to save all registers before calling a routine that is called only once, to allow as many registers as possible to the calling tree under that routine.

* Kai Koskimies, Otto Nurmi, Jukka Paakki, Seppo Sippu, The design of a language processor generator, Softw. Pract. Exper., vol. 18, #2, pp. 107-135. Feb. 1988,
For generality, the authors prefer "language processor generator" over "compiler writing system". The paper describes HLP84 (Helsinki Language Processor 84), the design of which was for a large part driven by the shortcomings of HLP78: 1. LALR(1) is not convenient enough; 2. pure attribute grammar is too primitive; 3. most practical attribute grammars are one-pass, but this is not exploited in HLP78.
     HLP84 is an open system, i.e. it is a set of interfaces: parsers / code generators / preprocessors / etc. can be added at will. Lots of tools are supplied. One essential level in the interfaces is the block (as in a structured programming language). Processing is one-pass. Although emission of semantic errors is largely automatic, measures are taken to prevent semantic errors from propagating.
     Attempts to design a general scope-managing system led to difficulties, but some tools are available.
     The system produces language processors in Pascal; it consists of ~ 4400 lines of HLP84 code, yielding ~ 6400 lines of Pascal. The code is described in some detail.

* M. Ganapathi, C.N. Fischer, Integrating code generation and peephole optimization, Acta Inf., vol. 25, Jan. 1988, pp. 85-109.
The paper describes attribute-based code generation and attribute-based peephole optimization, but in spite of the title the integration of both is not explained, as far as I can see. The attribute-based code generation rules record the semantics of the instructions issued in attributes rather than to generate code. The attribute-based peephole optimization rules specify code for specific combinations of nodes in the parse tree, subject to conditions on the attributes of these nodes. Apparently there is some bottom-up tree recognizer active that signals the applicability of a peephole optimization rule and then tests the attributes to see if the peephole optimization rule really applies.
     Lots of (58) literature references.

* Dick Grune, Ceriel J.H. Jacobs, A programmer-friendly LL(1) parser generator, Software -- Practice and Experience, vol. 18, #1, pp. 29-38. Jan. 1988,
Presents a practical ELL(1) parser generator, called LLgen, that generates fast error correcting recursive descent parsers. In addition to the error correction, LLgen features static as well as dynamic conflict resolvers and a separate compilation facility. The grammar can be viewed as a program, allowing for a natural positioning of semantic actions.

* Christopher Warwick Fraser, Automatic Generation of Code Generators, PhD Thesis, Yale University, Dec. 1987, pp. 87.
Describes an automatic code generator generator which is instantiated with a machine- and language-independent code generation knowledge base XGEN and an ISP machine description of machine M to create a code generator for M. This code generator converts the intermediate language XL into code for M.
     XGEN contains general rules such as "if the XL form contains indirect addressing of the form such and such and the machine has such addressing, then generate the indirect addressing code, else replace the XL form by such and so". The replacement(s) will then again be subject to analysis with XGEN rules.
     ISP is a machine description language similar to the one in th PDP-11 manual.
     XL is a fairly high-level intermediate code; a too low level would preempt valuable optimizations. Its main ingredient is three-address code in which the addresses can be relatively complicated and can feature indirect addressing. Also, it has a fixed sequence of pseudo-instruction to represent a loop.
     The thesis is remarkably uninformative in that it does not describe the matching algorithm that interprets XGEN; only examples are given, and they are none too clear either.

* J.W. Davidson, C.W. Fraser, Automatic inference and fast interpretation of peephole optimization rules, Softw. Pract. Exper., vol. 17, #11, Nov. 1987, pp. 801-812.
See also Davidson & Fraser, "Automatic generation of peephole optimizations", ACM SIGPLAN Notices, Vol. 19, No. 6, June 1984, pp. 111-116. The present paper tells us more on the peephole optimizer PO, for which see ACM TOPLAS April 1980, by the same authors.
     Rather than including all instruction pairs that could conceivably be replaced by an optimizing replacement, only those combinations are considered that occur in actual programs. To this end, a test set of programs is assembled, translated naively and the result analysed for instruction pairs.
     The compile-time pattern matching is done by renormalizing two instructions in the input stream and looking the result up by hashing; details are given.
     The inference mechanism is the same as in their 1984 paper mentioned above.
     An evaluation of PO, Robert Kessler's Peep, and Peter Kessler's XXXX???? shows that all three systems yield roughly the same quality of code. It appears that combining three instructions is enough, combinations of four or higher are not useful. A comparison with EM, the abstract machine of the Amsterdam Compiler Kit, showed that the automatically generated set of peephole optimizations rules performed about 85 percent of the optimizations that the hand-written EM peephole optimizations rules did.

* Michael Sonnenschein, Graph translation schemes to generate compiler parts, ACM TOPLAS, vol. 9, #4, Oct. 1987, pp. 473-490.
During attribute evaluation in an AST, it is sometimes useful to have attributes which are pointers to syntax tree nodes. This is barely possible inside the AG framework, and if these pointers are treated on par with the syntax tree pointers resulting from parsing, the AST is no longer a tree but a graph, and possibly cyclic at that.
     The "graph" in the title is an ASG, an Abstract Syntax Graph, and the described language GTS is an attribute language for ASGs. GTS is also capable of modifying the ASG, which is essential for optimization. To facilitate the use of some iterative algorithms, it includes iteration in addition to the usual attribute evaluation rules. It is translated to a set of recursive procedures, using Katayama's scheme.
     The paper is quite theoretical and heavy reading. Its germanisms do not help either (for example p. 747-11: "for the destination of different information on ..." = "für die Bestimmung unterschiedlicher Information über ..." = "for the determination of different data on ...").

* N. Wirth, Hardware architectures for programming languages and programming languages for hardware architectures, ACM SIGPLAN Notices, vol. 22, #10, Oct. 1987, pp. 2-8.
Compares the 1960-1970 software crisis to the 1980 hardware crisis. Suggests a calculus for hardware correctness.

* H. Massalin, Superoptimizer -- A look at the smallest program, ACM SIGPLAN Notices, vol. 22, #10, Oct. 1987, pp. 122-126.
The absolutely optimal code sequence for some very simple but useful functions is found by exhaustive search. Functions include sign(x), max(a,b), min(a,b) and obtaining boolean values for a == 0 and a == b. The universe of the exhaustive search is a set of hand-picked instructions that do arithmetic and logical operations on registers; jumps are explicitly excluded. The search tree is pruned by recognizing repeated or zero-result instruction combinations. Each prospective solution is tested by two sets of spot tests for the argument values; if the solution survives these tests, it is checked manually and in practice the solution is always found to be OK. The superoptimizer can test 50000 solutions per second, and finds the optimum code in a few days. It tends to find astounding and very "clever" code sequences.

* R.C. Holt, Data descriptors -- A compile-time model of data and addressing, ACM TOPLAS, vol. 9, #3, July 1987, pp. 367-389.
A uniform descriptor for data (in memory, in a register, constant) is proposed for use in a compiler. The form is $ @ sup k b dot d dot i $, in which $ b $ is the base: a register $ R sub n $, a scope level $ L sub n $ or $ null $ for constants; $ d $ is an signed integer displacement; $ i $ is an index register $ I sub m $ or $ null $. The sum of the above forms the numeral of the data descriptor. Its value is the numeral, k times dereferenced, with $ k >= 0 $.
     The use of these data descriptors for expressing values and addresses in expressions in various programming languages are described, as are mappings onto many machine addressing schemes. One example is that the C & pseudo-operator corresponds to $ @ sup -1 $. Next, rules for adding, subtracting and multiplication with a constant are given. It is shown that these can for example be used to generate good code for compound array indexing, for example for a.b[i].c[j] := false.

* Jeanne Ferrante, Karl J. Ottenstein, Joe D. Warren, The program dependency graph and its use in optimization, ACM TOPLAS, vol. 9, #3, July 1987, pp. 319-349.
[ Heavy graph theory ]

* F. Warren Burton, Dieter Maurer, Hans-Georg Oberhauser, Reinhard Wilhelm, A space-efficient optimization of call-by-need, IEEE Trans. Softw. Eng., vol. SE-13, #6, pp. 636-642. June 1987,
Call-by-need is lazy evaluation. The paper explains strictness analysis, and gives many examples where it does a lot of good, and a few where it is not optimal.

* Steven P. Reiss, Automatic compiler production -- The front end, IEEE Trans. Softw. Eng., vol. SE-13, #6, pp. 609-629. June 1987,
[ about ACORN ]

* Andrew W. Appel, Kenneth J. Supowit, Generalizations of the Sethi-Ullman algorithm for register allocation, Softw. Pract. Exper., vol. 17, #6, June 1987, pp. 417-421.
Extends the Sethi-Ullman algorithm for register allocation (Sethi & Ullman, The generation of optimal code for arithmetic expressions, J. ACM, Vol. 17. No. 4, pp. 715-728, April 1970) for operations that have more than 2 inputs and/or more than 1 output, by sorting the subtrees in decreasing order for $ a sub n - b sub n $, where $ a sub n $ is the number of registers required to evaluate node n and $ b sub n $ is the number of registers required by the result of node n.

* R. Nigel Horspool, An alternative to the Graham-Glanville code generation scheme, IEEE Software, vol. 4, #3, pp. 33-39. May 1987,
A simple bottom-up dynamic programming scan assigns the best storage class to each node (numeric register, address register, even-numbered register, etc.). That done, a second bottom-up scan matches the patterns to instructions. Accidental mismatch is solved by emitting extra code. Comparisons with Graham-Glanville (can get stuck and the author's cannot), and with Aho-Johnson (J. ACM, July 1976) (simpler and can handle even/odd registers) are made.

* Dick Grune, How to compare the incomparable, Information Processing Letters, vol. 24, pp. 177-181. 1987,
Suppose you have to choose the larger of ($x sub 1$, $y sub 1$) and ($x sub 2$, $y sub 2$), when you know nothing about x and y. Choose the one for which $ x * y $ is larger. The paper shows that this is the only criterion that is invariant under a variety of transformations.

* Jim Welsh, Atholl Hay, A Model Implementation of Standard Pascal, Prentice-Hall, 1986, pp. ???.
Joe Snyder <> writes: This book is only really useful if you need to implement a COMPLETE version of a platform-independent Pascal, but I find it interesting because the 483 pages consist entirely of the source code listing for the compiler and p-code interpreter (both written in Pascal itself), including copious {comments} to explain the code. The code eagerly delves into the horrible minutiae necessary when implementing a complete language, and proves that no language designer should be allowed to present his design until AFTER being forced to write a complete compiler for the language.

* John R. Ellis, Bulldog -- A Compiler for VLIW Architectures, MIT Press, Cambridge, Ma., 1986, pp. 320.
The Bulldog compiler is a very sophisticated compiler for a very specific problem: exploiting the parallelism in a VLIW machine for scientific programs in Fortran. One of the premises is that the VLIW machine can be designed hand in hand with the compiler. The compiler generates code for an imaginary machine, ELI, which stands for "Enormously Long Instructions". A simulator for ELI exists.
     The compiler is based on three techniques: trace scheduling, memory-reference disambiguation, and memory bank disambiguation.
     The VLIW machine has a number of specific properties (aside from having Very Long Instruction Words); the most important ones follow. All binary operations are symmetrical: if, for example, there is a machine instruction to compute $a-b$, there is another one that computes $b-a$. Instructions like incr and dec are replaced by a constant generator unit, which, when activated, will produce the desired constant operand. There are no "hot spots" (latches, hidden registers): all operations read from registers and write to registers; and there are enough of these. Memory is interlaced in memory banks; a memory location can be addressed efficiently as location N in memory bank M, and less efficiently just by its address. Operations can be multi-cycle (they deliver their results only after more than one cycle); the resources such an operation needs must be tied up explicitly in the following instructions, until the result can be written to a register.
     A "trace" is a list of basic blocks that are connected by a possible flow of control path in the original flow graph. The original flow graph is divided up into traces; each basic block is part of exactly one trace. Traces are ordered for "importance". There can be jumps from somewhere in the middle of a trace to the start of less important trace and from the end of a trace to somewhere in the middle of a more important trace. No other jumps are allowed; more specifically, there cannot be a jump from a trace to itself.
     It is the task of the trace scheduler to create and order the traces in such a way that a more important is likely to be executed more often in a run of the program than a less important trace. To this end, the trace scheduler uses heuristics, user information and a profiler to estimate the importance of each basic block. The most important basic block is then selected as the "seed" for the most important trace. This trace seed is then extended on both ends, using the estimated importance of other basic blocks, until the process stops either at one of the ends of the function or whenever an illegal jump would be created. This process is then repeated to find the one but most important trace, etc.
     The most important trace is compiled first, and optimized at the expense of less important traces; if code inside the trace is moved past jumps off the trace or into the trace, this code is copied to the other (lesser) traces, as needed.
     Memory references are disambiguated by constructing a symbolic expression for each memory reference in a given trace. The expressions are AND-OR formulas of actual reference expressions; the latter can be composed of additions and multiplications only, and multiplication can be by a constant only. To see if two memory references $ e sub 1 $ and $ e sub 2 $ could ever refer to the same location, the equation $ e sub 1 - e sub 2 = 0 $ is constructed. This equation is resolved into a number of Diophantine equations, which can be solved easily. There are three possible answers: $ e sub 1 $ and $ e sub 2 $ always refer to the same location, $ e sub 1 $ and $ e sub 2 $ never refer to the same location, and "don't know". If an equation cannot be constructed (division, multiplication by a variable, etc.), the compiler assumes the worst ("don't know"). All this is effective because index expressions are usually linear.
     Similar techniques are used to do memory bank disambiguation. Many optimizations are described for each of the three techniques.
     The second half of the book concerns the code generation proper.
     Code generation starts with live analysis on the variables in the basic blocks in a function; this analysis determines which variables are live at the beginning and end of each basic block.
     Next the trace scheduler groups the basic blocks into traces, with priorities. The live information of the blocks in a trace is merged to give live information about the trace itself. At this moment the live information does not yet include location information, except perhaps for the input and output parameters.
     Starting with the most important one, code is then generated for each trace. If this trace connects to a trace that was already processed, the location requirements of that trace are obeyed, otherwise the code generation process determines the locations in which live input variables are expected from other traces and live output variables are to be expected by other traces. These data are then passed on the traces to which the present trace is connected and which have not yet been processed. The information translates there into DEF and USE nodes at the beginning and the end, respectively. In this way allocation decisions in more important traces are given absolute priority over those in less important traces.
     The code generator applies many of the usual Aho-Sethi-Ullman optimizations to each trace, including loop unrolling. If these result in code motion and if this motion is past jumps off the trace and into the trace, code is copied to the other traces. This makes the other traces less efficient, but that's OK since they are less important anyway.
     The next task is assigning functional units and filling the instruction words. This task is performed by a recursive top-down scan that propagates information from the roots of the DAG downwards; when the scan reaches the bottom, it collects the information found there and on this basis assigns functional units and fills instruction words, using a greedy algorithm. This phase is called BUG, for Bottom-Up Greedy.
     Register allocation is trivial by design: the ELI machine has enough registers for all functional units.
     The Bulldog compiler is written in ELISP, the LISP dialect for the DEC-20. The source program, in Fortran, is converted to Lisp syntax, called Tinylisp, using a variant of the UNIX Fortran structuring program struct. The output is object code, to be run on the ELI interpreter. The compiler achieves speed-ups between 0.9 and 7.6 over the sequential case, over a mix of scientific Fortran programs.
     A recurring conclusion in the book is that simple heuristics help, but that once these are in place, more complicated heuristics help less.
     The book is written in an easy-going style, features many diagrams and hardly any math, and is much less difficult than it could have been, given the subject. The many personal and entertaining comments add to the book's attractiveness.

* Ralph E. Griswold, Madge T. Griswold, The Implementation of the Icon Programming Language, Princeton University Press, Princeton, N.J., 1986, pp. 336.
Detailed and possibly exhaustive explanation of the internal workings of the Icon interpreter, with many examples, diagrams and occasional but not too many chunks of C code. On the one hand it reads as an extensive technical report on a non-trivial interpreter, on the other it features exercises, which suggest textbook use. It seems quite usable as a textbook in a course on interpreter writing; its restriction to Icon does not prevent a host of interpreter problems from appearing and being solved.
     The entire backtracking mechanism, including generators and co-expression is explained by example and by C code, but a few catchy invariants would have been helpful. The implementation of generators is unexpected, if I understand it correctly. The scope of a generator is bounded by the expression it occurs in; what is exactly a bounded expression is defined in the language. The beginning of the bounded expression is marked on the stack; the mark includes a failure label. When a generator occurs, it is called in the usual way; it is implemented as a C routine. When it suspends, the C routine does not return, since that would involve loss of its local variables, but calls the interpreter recursively instead, with special parameters. At that moment the stack contains (counting from the top) some data for calls and expressions C and an expression bound marker M. The recursive call stacks a generator marker G, then stacks a copy of the section C and stacks one value of the generator on top. This simulates the situation in which there is no generator at all and just one value is supplied. The code in the stacked copy C then runs its course (possibly involving nesting generators), and the recursive call to the interpreter returns with success or failure. If it fails the generator is resumed, which may of course again fail, but which normally results in a repetition of the scenario described above with the next value of the generator. If it succeeds, the bounded expression terminates; both the interpreter stack and the C stack have to be cleared of any stuff pertaining to it. Clearing the interpreter stack is easy: M is unstacked, and together with it everything on top of it. But if the discarded part of the interpreter stack contained G markers for non-exhausted generators, the C stack will contain activation records for these and they will never be resumed. To solve this, the recursive depth of the interpreter is recorded and any C routine that finds itself on a depth from which the interpreter has already returned, itself returns immediately.
     The interpreter features a mark-scan-compactify garbage collector, with a number of peculiarities. The mark phase is recursive and hopes that the C stack will suffice; it usually does. Since Icon is not a typed language, there can be no templates and pointers have to be identified by being implemented as self-identifying. Problems with hidden and invisible pointer are avoided by moving all memory requests, and thus all potential calls to the garbage collector, to safe places, by "predictive need". There are some problems with pointers to possibly overlapping substrings; the compaction phase does the right thing and keeps only the accessible "clumps" as they are called here, at the expense of some sorting of pointers.
     Some additional technical points:
     -- Totally different data structures have to be prepared for lvalues and rvalues (called here "occurrences in assignment context" and "occurrences in dereferencing context") that look the same in Icon, for example for table elements: t[12] may result in a value or in a pointer to an assignable position. If we need an rvalue and happen to have an lvalue, for example as a function argument, it can be dereferenced in place.
     -- The interpreter has an interpret stack on which virtual Icon machine instructions operate; these instructions are programmed as C functions and are issued in principle by a fetch-and-interpret loop, except that the suspend operation and any other operation that does suspend operations internally (for example toby) calls the interpreter recursively, to aid in backtracking.
     -- The problem with co-expressions is that they require a new interpreter stack, which is easy, and a new C stack, which is hard. This is a consequence of having the interpreter call itself to do backtracking.

* William A. Barrett, Rodney M. Bates, David A. Gustafson, John D. Couch, Compiler Construction: Theory and Practice, Science Research Associates, Chicago, 1986, pp. 504.
A considerable part (about 50 percent) of the book is concerned with parsing; formal language theory, finite-state automata, top-down en bottom-up parsing and error recovery are covered in very readable chapters. Only those theorems are treated that relate directly to actual parsing; proofs are quite understandable. The book ends with an annotated bibliography of almost 200 entries, on parsing and other aspects of compiler construction.

* Patrick D. Terry, Programming Language Translation -- A Practical Approach, Addison-Wesley, Wokingham, England, 1986, pp. 443.
Essentially a series of case studies with introduction, concerning an interpreter, an assembler, a parser (LL(1) only), a toy compiler, block structure, semaphores, modules and monitors. The language is Pascal. Covers the bare essentials of a wide range of subjects, in an easy-going style.

* A.V. Aho, R. Sethi, J.D. Ullman, Compilers: Principles, Techniques and Tools, Addison-Wesley, Reading, Mass., 1986, pp. 796.
The "Red Dragon Book". Excellent, UNIX-oriented treatment of compiler construction. Even treatment of the various aspects.

* David A. Padua, Michael J. Wolfe, Advanced compiler optimization for supercomputers, Commun. ACM, vol. 29, #12, Dec. 1986, pp. 1184-1201.
For parallel supercomputers.

* Yuh-Huei Shyu, From semi-syntactic analyzer to a new compiler model, ACM SIGPLAN Notices, vol. 21, #12, Dec. 1986, pp. 149-157.
Many so-called context-free syntax rules are actually regular expressions over (regular) tokens, and thus themselves regular. In this none too clear paper, the author examines the consequences of considering all non-recursive leaves of the grammar as tokens.

* H. Mössenböck, Alex -- a simple and efficient scanner generator, ACM SIGPLAN Notices, vol. 21, #12, Dec. 1986, pp. 139-148.
Lexical scanner to the Coco system, generating Modula-2 code. The input language Alexis consists of marked sections of token classes, for example CHARACTER SETS, KEYWORDS, COMMENTS, etc., of which Alex understands the implications. Within each section, tokens can be defined in a section-determined format. Usually the definition consists of the token number to be returned, followed by =, followed by an EBNF expression. There is no action code. Nesting definitions are allowed in the COMMENTS section.
     The calling mechanism seems awkward: to obtain a token the generated analyser has to be called twice. The first call returns the token number as defined, and the second call returns an index in a hash table of an entry that contains the text of the token.
     The implementation is not described in detail and leaves questions. 1. The implementation of the nesting tokens is not explained. 2. It is stated that Alex disambiguates by taking the longest match. which requires unbounded look-ahead, but it is also stated that there is no or only very limited look-ahead.

* Dick Grune, Generic packages in C, ACM SIGPLAN Notices, vol. 21, #9, Aug. 1986, pp. 31-39.
Shows how to fake generics in C, using the preprocessor.

* Valentin F. Turchin, The concept of a supercompiler, ACM TOPLAS, vol. 8, #3, July 1986, pp. 292-325.
[[ preliminary abstract ]] [A combination of partial evaluation, lazy evaluation, generalization, etc.]

* Walter F. Tichy, Smart recompilation, ACM Trans. Programming Languages and Systems, vol. 8, #6, pp. 273-291. July 1986,
Editing a source file A produces a change set, a specification of the modifications in terms of the language (rather than in terms of the text); example: the declaration of X has changed. Compiling a source file B produces a reference set, a specification of the items B used from its environment, in terms of the language. If B depends on A, B is recompiled only if the intersection of the change set of A and the reference set of B is non-empty.

* David W. Wall, Global register allocation at link time, in ACM SIGPLAN '86 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 21, #7, July 1986, pp. 264-275.
The arguments for doing global register allocation at link time are 1. that that is the first moment that all information is in, including that from libraries and 2. that it allows global variables to be put in registers.
     The naive approach of delaying almost all actions to link time was found to be too slow and too inflexible. In the improved approach, the code generator generates code that works correctly and that can be used without further register allocation at link time. The only registers it uses (out of the 64 registers the target machine has) are 8 registers for temporaries in expressions. In addition to the working code, the code generator supplies one or more annotations with each instruction; an example of such an annotation is "remove this instruction if variable y is in a register". The code generator also supplies partial call graphs with call frequencies.
     When link time optimization is desired, the call graphs of all modules involved are combined into an over-all call graph. Recursive procedures are found and removed from the graph as "difficult cases". This leaves an acyclic graph, which is augmented with usage frequency estimates. One observes that none of the leaf procedures can be active simultaneously, so they can all share the same registers for their variables. Once these have been handed out, the same argument applies to the next level, etc. If there are not enough registers, the frequency counts are used to make the tough decisions. A more complicated scheme has been followed for recursive procedures, in which one procedure is a cycle gets the obligation to save ("archive") all the registers on the cycle.
     Once the register assignment have been performed, the object code can be edited and cleaned up; this involves address recalculations.
     The author reports that global link-time register allocation obtains larger speed improvements than global compile-time register allocation, with or without graph colouring, the ratio being about 15 percent to 11 percent to 10 percent (all percentages compared to local compile-time register allocation).

* James R. Larus, Paul N. Hilfinger, Register allocation in the SPUR Lisp compiler, in ACM SIGPLAN '86 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 21, #7, July 1986, pp. 255-263.
Very clear and reasonably detailed report on the implementation of Chow's priority-based graph colouring register allocator in a Lisp compiler for a RISC machine. The resulting impact on the compilation time is reported as "moderate for most programs, though particular, complex functions may require considerable time". Required reading for anyone undertaking such an enterprise, I think.

* Peter B. Kessler, Discovering machine-specific code improvements, in ACM SIGPLAN '86 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 21, #7, July 1986, pp. 249-254.
[[ preliminary abstract ]] The instruction set is partitioned in general-purpose and special-purpose instruction. The process starts from a Lisp-like (= tree-like) description of all machine instructions, and a list of general-purpose instructions to be used by the code generator. For each special-purpose instruction, an attempt is made to decompose its tree into trees of general-purpose instructions. This seems to be done through unification, although this is not described in detail in the text; the unification yields conditions under which the rewriting (= the equivalence) holds.
     This yields "idioms" for each special-purpose instruction. Three types of idioms are distinguished:
     1. A "set idiom" is a special-purpose instruction that is equivalent to a general-purpose instruction except that one or more of its operands are restricted to a very limited set; an example is INCR R1 for ADD R1,1.
     2. A "binding idiom" is a special-purpose instruction that is equivalent to a general-purpose instruction except that two or more of its operands are the same; this is used to model addressing modes, but it is not explained how.
     3. A "composite idiom" is a special-purpose instruction that is equivalent to a sequence of general-purpose instruction; an example is JGE R1,R2,LABEL for CMP R1,R2; JPOS LABEL.
It is not explained how these idioms and their context conditions are used by the code generator or the code generator writer or generator.

* Christopher W. Fraser, Alan L. Wendt, Integrating code generation and optimization, in ACM SIGPLAN '86 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 21, #7, July 1986, pp. 242-248.
Recognizing that both code generators and optimizers are pattern matches leads to an integrated rewriting system. Being part of PO, for which see Davidson & Fraser, ACM TOPLAS, April 1980, the rewriting system is not tree-based but used hashing to determine the appropriate rewrite quickly.

* J. Davidson, A retargetable instruction reorganizer, in ACM SIGPLAN '86 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 21, #7, July 1986, pp. 234-241.
About the "order" algorithm of PO, for which see Davidson & Fraser, ACM TOPLAS, April 1980. The basic algorithm is the Sethi-Ullman ordering algorithm for expressions, applied here to target code that has already been generated by previous phases; late instruction reordering is advantageous. Its retargetability is not explained but seems to stem from its being integrated in PO.

* Phillip B. Gibbons, Steven S. Muchnick, Efficient instruction scheduling for a pipelined architecture, in ACM SIGPLAN '86 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 21, #7, July 1986, pp. 11-16.
Compilation is for a HP Precision, an (experimental?) pipelined architecture. The constraints imposed by the pipelining are inserted as dependencies in the DAG of the basic block. Next two rules and a set of heuristics are given to create a topological sort. The resulting algorithm is $ O ( n sup 2 ) $ for a basic block size of n, rather than $ O ( n sup 4 ) $ for previous algorithms.

* Henri E. Bal, Andrew S. Tanenbaum, Language- and machine-independent global optimization on intermediate code, Comput. Lang., vol. 11, #2, pp. 105-121. April 1986,
Does the following optimizations on the intermediate code: inline substitution, common subexpression elimination, strength reduction in loops, stack pollution, tail merging, copy propagation, constant propagation, dead code elimination, register allocation. Gives detailed considerations that went into the implementation of each of them. The total of them causes a speed-up of about 16 percent; the paper does not break this down over the various optimizations.

* A.M.M. Al-Hussaini, R.G. Stone, Yet another storage technique for LR parsing tables, Softw. Pract. Exper., vol. 16, #4, April 1986, pp. 389-401.
The paper describes compression techniques for LR parser tables; somewhat confusingly the LR parser itself is used for doing file compression. With this out of the way, an introduction to standard LR table compression follows. The new submatrix technique is described next; it partitions the rows into a number of submatrices, such that the rows of each are similar enough in some sense to allow drastic compressing. The access cost is $O (1)$.
     One heuristic partitioning algorithm is the following. For each row we choose the default value it contains (majority vote or so) and list the column which contain non-default values, the "significant columns". We then put all rows with the same default value and the same set of significant columns in the same partition, and reassemble the rows into matrices, the submatrices. These submatrices resulting from this process allow various obvious compressed implementations. At the top level there is a large case clause (access cost $O (1)$?) over the row number to get to the right matrix.

* T.G. Lang, J.T. O'Quin(n??), R.O. Simpson, Threaded code interpreter for object code, IBM Technical Disclosure Bulletin, vol. 28, #10, March 1986, pp. 4238-4241.
Incremental compilation is effected using a form of threaded code, in the following setting (which is not explained in but can be inferred from the paper). Machine code of "the old machine" must be run on "the new machine"; neither machine is specified in the paper, but both are byte-oriented and the new machine is a pipe-lined architecture with a large virtual memory. The code of the old machine is allowed to be self-modifying; this precludes static object-to-object conversion.
     The machine code of the old machine is loaded into an array O on the new machine, and a new array N is allocated that is 8 times as large; the factor 8 derives from expedience, not from principles. The idea is to store in the $8m$ bytes $ N [ 8 p .. 8 p + 8 m ] $ the new machine code that performs the same operation as the old m-byte long machine instruction in $ O [ p .. p + m ] $. In fact, each segment of 8 bytes in N corresponds to one byte in O; such an 8-byte segment in N is called a cell. At Initial Program Loading (of the old machine, simulated on the new machine) , each 8-byte cell is initialzed to two instructions, one that loads the address of the corresponding byte in O into a fixed register $oldIP$ and one that jumps to the interpreter / incremental compiler. Rather than emulating the old machine instruction at $ O [ oldIP ] $, the interpreter / incremental compiler inserts new machine code equivalent to that instruction into the cells corresponding to the bytes of the old instruction, fills the unused part of these cells with NOOPs, and jumps to the beginning of the first of thes cells. The (new) machine now executes the machine instructions just assemled there and at the end runs into the first cell corrsponding to the next instruction in the old code. This cell will again hold a "load oldIP; jump to interpreter" sequence, etc., until it runs into a cell that has already been filled by this process; then program execution continues without interpreter intervention. This greatly speeds up the interpretation process.
     This scheme is improved / extended in three ways. 1. The "load oldIP; jump to interpreter" sequence is replaced by a BX instruction, "branch while executing next instruction", apparently featured by the new machine. 2. If the cells in the new memory corresponding to the old instruction are not large enough to contain the corresponding new code, the remaining code is spilled to an overflow area, with jumps there and back again. (This is where the factor 8 as a compromise comes in.) 3. The old memory array O is write-protected (a feature of the new machine). A write is caught by the hardware; the interrupt routine sets the cell(s) corresponding to the affected byte(s) to the initial BX instruction (thus redirecting the flow of control to the interpreter for each of the bytes) and then performs the write.
     There are no literature references.

* F. Chow, M. Himelstein, E. Killian, L. Weber, Engineering a RISC compiler system, in Thirtyfirst IEEE Computer Society International Conference, IEEE, Feb. 1986, pp. 132-137.
An optimizing compiler can reduce the register saving/restoring (RSR) traffic during function calls by changing the allocation of variables to registers for leaf functions. ...

* Simon L. Peyton Jones, Parsing distfix operators, Communications of the ACM, vol. 29, Feb. 1986, #2, pp. 118-122.
A distfix operator is an operator which is distributed over its operands; examples are if _ the _ else _ fi and rewrite _ as _ using _ end. It is useful to allow users to declare such operators, especially in functional languages.
     Such distfix operators are introduced in a functional language using two devices. First the keywords of a distfix operator are given different representations depending on their positions: prefix keywords are written with a trailing dot, infix ones with a leading and a trailing dot, and postfix ones with a leading dot; so the user is required to write rewrite. x .as. y .using. z .end. These forms are recognized by the lexical analyzer, and given the token classes PRE_TOKEN, IN_TOKEN, and END_TOKEN. Second, generic rules are written in yacc to parse such structures.

* Axel T. Schreiner, H. George Friedman Jr., Introduction to Compiler Construction with UNIX, Prentice-Hall, Englewood Cliffs, 1985, pp. 194.
This album-sized book shows the construction of a very simple compiler for a subset of C called `sampleC', in eight chapters: language definition, word recognition, language recognition, error recovery, semantic restrictions, memory allocation, code generation, and in-core code generation. The resulting compiler generates code for a simple stack machine; registers are delegated to an exercise (8.2). The tools used are Lex and Yacc; neither of these is introduced in its own right and their usage is explained as far as required by the main story line. The book seems a direct printout of lecture notes and often reads as a workbook for a compiler construction lab. No theory.

* Per Brinch Hansen, Brinch Hansen on Pascal Compilers, Prentice-Hall, 1985, pp. ???.
Joe Snyder <> writes: Another light-on-theory heavy-on-pragmatics here's-how-to-code-it book. The author presents the design, implementation, and complete source code for a compiler and p-code interpreter for Pascal- (Pascal "minus"), a Pascal subset with boolean and integer types (but no characters, reals, subranged or enumerated types), constant and variable definitions and array and record types (but no packed, variant, set, pointer, nameless, renamed, or file types), expressions, assignment statements, nested procedure definitions with value and variable parameters, if statements, while statements, and begin-end blocks (but no function definitions, procedural parameters, goto statements and labels, case statements, repeat statements, for statements, and with statements).
     The compiler and interpreter are written in Pascal* (Pascal "star"), a Pascal subset extended with some Edison-style features for creating software development systems. A Pascal* compiler for the IBM PC is sold by the author, but it's easy to port the book's Pascal- compiler to any convenient Pascal platform.
     This book makes the design and implementation of a compiler look easy. I particularly like the way the author is concerned with quality, reliability, and testing. The compiler and interpreter can easily be used as the basis for a more involved language or compiler project, especially if you're pressed to quickly get something up and running.

* Robin Hunter, Compilers: Their Design and Construction Using Pascal, John Wiley, Chichester, 1985, pp. 273.
A solid, down-to-earth book of limited coverage. Excellent as introductory reading. Has a good chapter on Storage Allocation. Successor to (+ Pascal rewrite of) Hunter, The Design and Construction of Compilers, 1981.

* J.P. Tremblay, P.G. Sorenson, The Theory and Practice of Compiler Writing, McGraw-Hill, Singapore, 1985, pp. 796.
Extensive and detailed. Heavy reading. To be consulted when other sources fail.

* Robert E. Noonan, An algorithm for generating abstract syntax trees, Computer Languages, vol. 10, #3/4, 1985, pp. 225-236.
Gives a list of seven heuristic rules to derive a grammar for abstract syntax trees from a CF grammar. Simple rules remove constant predictable leaves (THEN in an if-statement, etc.); more complicated rules transform, for example, right recursion into lists. The rules have to be applied in order; whether they constitute an algorithm is a matter of opinion. They have been implemented in a system called TREEGEN.

* David R. Hanson, Compact recursive-descent parsing of expressions, SP&E, vol. 15, #12, Dec. 1985, pp. 1205-1212.
Naive parsing of expressions with n levels of precedence by a top-down parser requires n routines. The paper gives a very clear exposition of the technique of telescoping these into one routine and a table. Essentially, the routine contains a while-loop with recursive calls to itself for higher precedences; the while-loop gobbles up all subexpressions with the same precedence.

* Robert L. Bernstein, Producing good code for the case statement, SP&E, vol. 15, #10, Oct. 1985, pp. 1021-1024.
There are two subjects in the paper, not well separated. 1. Tricky code for the case statement for the PL.8 compiler for the IBM 801 RISC machine. 2. A heuristic implementation of cluster separation: enter the gaps into a sorted heap, and separate at the maximum gaps as provided by the sorted heap until the case density is sufficiently high.

* Mahadevan Ganapathi, Charles N. Fischer, Affix grammar driven code generation, ACM TOPLAS, vol. 7, #4, Oct. 1985, pp. 560-599.
Extension of the Graham-Glanville code generator, by steering the parsing using affixes. The description is extensive and thorough, supported by many examples.
     In addition to the usual instruction and addressing mode selection, the code generator also performs constant folding, identification of fancy instructions, subsuming code within addressing modes, suppressing redundant code, instruction buffering and tracking, and register assignment based on register tracking.
     Special attention is paid to loop suppression (by arc removal), syntactic and semantic blocking prevention (by requiring umbrella productions), and shift-reduce conflict resolution (by always shifting and falling back on the umbrella productions when in trouble). There are also facilities for grammar factoring.
     Their loop suppression algorithm may have potential for automatic coercion system design derivation / checking.

* Lawrence A. Harris, Yannich Wets, Resetting displays, SIGPLAN Notices, vol. 20, #8, pp. 73-77. Aug. 1985,
After examining a number of implementations of access to non-local variables, the authors points out that the display of a called routine and that of its calling routine have the bottom $n-1$ elements in common, where n is the level of the called routine. This is utilized to update the (global) display at procedure entry and exit by moving only one pointer.
     This trick requires each procedure to exit properly; jumps out of procedure will have to be led through the exit code. In principle the trick does not work for calling procedures that come in from a different environment (procedure parameters, procedure variables), since in that case the displays of the called and calling procedures may have less in common. But the authors show that by leading the control through the proper entry and exit code(s), it can be made to work. Code for the PRIME 750 is given.

* David Gelernter, Generative communication in Linda, ACM TOPLAS, vol. 7, #1, Jan. 1985, pp. 80-112.
The "generative communication" in the title is communication between parallel and/or distributed processes by generating objects, otherwise known as the Linda tuple model. The model and some of its implementation problems are described.

* P. Branquart, A high-level intermediate code, in Methods and Tools for Compiler Construction, ed. by B. Lorho, Cambridge University Press, Cambridge, 1984, pp. 317-343.
Describes H-code. H-code makes explicit all flow of control and all data allocation (but not register allocation!), and leaves everything else untouched. It looks roughly like Fortran-II. The paper shows how code can be generated from it for a variety of machines. Very dry exposition; hardly any example.

* E. Morel, Data flow analysis and global optimization, in Methods and Tools for Compiler Construction, ed. by B. Lorho, Cambridge University Press, Cambridge, 1984, pp. 289-315.
A fundamental approach to the problem, awkward in style and not extensive, but certainly worthwhile. Maps all global optimization onto a single set of boolean equations, which are then applied in some case studies, for example the suppression of useless checks in Ada programs.
     Global (= interprocedural) optimization information problems are concerned with events in the code; they are classified as follows. "All"-problems ask whether an event occurs on all paths from or to a given point, "one"-problems ask whether an event occurs on at least one such path. "Backward"-problems are those in which only events before the given point are considered, "forward"-problems consider only events on path leaving the given point. This gives four combinations. For example, live variable analysis is a "one-forward"-problem. Each combination corresponds to a specific form of the set of boolean equations.

* Pierre Boullier, Syntax analysis and error recovery, in Methods and Tools for Compiler Construction, ed. by B. Lorho, Cambridge University Press, Cambridge, 1984, pp. 7-44.
[[ preliminary abstract ]] The description of the syntax analysis part is conventional. Unconventional is the treatment of lexical analysis: it is explained in terms of LR-items, which makes LR error recovery techniques applicable. The discussion of error recovery is very clear, simple and effective.

* W.M. Waite, G. Goos, Compiler Construction, Springer-Verlag, New York, NY, 1984,
A theoretical approach to compiler construction. Refreshing in that it gives a completely new view of many subjects. Heavy reading, high information density.

* Thomas P. Murtagh, A less dynamic memory allocation scheme for Algol-like languages, Eleventh ACM Symposium on Principles of Programming Languages, pp. 283-289. 1984,
The call graph is divided up into intervals. Since there is no recursion within an interval, the intervals can be considered as single procedures, even if you don't do inlining. This means that only one activation record needs to be allocated for each of them, and only one display pointer is needed. Both effects are analysed in full, and algorithms given.
     Also, when the call graph shows that a procedure is not recursive at all, its data can be allocated statically.

* Dick Grune, How to produce all sentences from a two-level grammar, Inform. Process. Lett., vol. 19, Nov. 1984, pp. 181-185.
All terminal productions are derived systematically in breadth-first order. The author identifies pitfalls in this process and describes remedies. A parser is used to identify the hyperrules involved in a given sentential form. This parser is a general CF recursive descent parser to which a consistency check for the metanotions has been added; it is not described in detail.

* M.V.S. Ramanath, Marvin Solomon, Jump minimalization in linear time, ACM TOPLAS, vol. 6, #4, Oct. 1984, pp. 527-545.
The authors point out that unconditional jumps do no useful work: they are an artefact of the code sequencing; the problem is to order the straight code segments so that the number of unconditional jumps is minimal.
     A dissection of a graph into k paths is a set of k paths such that each node occurs exactly once on exactly one path. The code segments on one path can be compiled adjacently; going from one path to another requires a jump. The problem of finding a dissection of minimal k is NP-complete in general. Polynomial solutions exist for acyclic graphs and for reducible graphs. This paper shows a linear time solution for structured flow graphs, graphs resulting from programs containing only loops, if-then-else and multilevel exit statements. The algorithm does dynamic programming over a carefuly selelcted set of subgraphs; its explanation and proof covers some ten pages.
     A small drawback of the algorithm is that the entry point of the produced code will not normally be at the beginning, so a jump is required to enter the code.

* J.W. Davidson, C.W. Fraser, Code selection through object code optimization, ACM TOPLAS, vol. 6, #4, Oct. 1984, pp. 505-526.
Sequel to Davidson & Fraser, "The design and application of a retargetable peephole optimizer", ACM Trans. Prog. Lang. Syst., Vol. 2, No. 2, 1980, pp. 191-202. The source code is translated naively into code for what is called here an "intersection machine", a machine that has only the intersection of the machine instructions that can be expected in practice. It corresponds to the register transfers in the cited paper; each register transfer sets a new pseudo-register, of which there are an unbounded number. This code is then processed by a sophisticated peephole optimizer, or rather post-optimizer, which consists of three passes, called Catcher, Combiner and Assigner.
     Catcher eliminates common subexpressions in the register transfers, and attaches last-use info to registers and nearest-use info to results. The algorithm is not described; probably classic.
     Combiner is the real peephole optimizer. In principle it combines the register transfers of two and/or three consecutive instructions into one formula and tries to find a machine instruction with that pattern, as in the cited paper. When it finds one, it replaces the register transfers by the formula (rather than by the machine instruction). In practice it follows precomputed tables. What they contain and how they are precomputed is not revealed; probably some finite state automaton.
     Assigner assigns two things: machine instructions to the register transfers and real registers to the pseudo-registers. It introduces spill code where necessary. The register assignment algorithm is not described.
     There are two examples, one in which 16 register transfers are reduced to 3 machine instructions for the VAX-11, and a larger one in which 52 register transfers yield 17 machine instructions. The instruction quality looks quite good.

* Jack W. Davidson, Christopher W. Fraser, Register allocation and exhaustive peephole optimization, Softw. Pract. Exper., vol. 14, #9, Sept. 1984, pp. 857-865.
In addition to the CSE elimination and peephole optimization on the intermediate code, the authors also do CSE elimination and peephole optimization on the object code, to catch machine-specific CSEs. This necessitates a register allocator that works on optimized object code.
     The CSE catcher performs symbolic execution of the instructions, and tries to match the symbolic expressions obtained this way. If a match if found, the instructions casing the newer value are edited away. This works by virtue of the fact that an infinite number of pseudo-registers is assumed to exist, so all old expressions (in a basic block) are saved. In addition the CSE catcher eliminates redundant loads and dead variables, and dynamically estimates a window for the peephole optimizer. Its CSE algorithm is described in detail.
     The register assigner then finds real registers for those pseudo-registers that are really required; the algorithm is not described in detail. I do not understand why the title has the word "exhaustive" in it.

* A.S. Tanenbaum, E.G. Keizer, H. van Staveren, Does anybody out there want to write HALF of a compiler?, ACM SIGPLAN Notices, vol. 19, #8, pp. 106-108. Aug. 1984,
Short description of ACK, for prospective compiler writers.

* Michael L. Powell, A portable optimizing compiler for Modula-2, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, June 1984, pp. 310-318.
Straightforward compiler, based on "best simple" design: take the best of the simple solutions for each problem. This results in a competitive compiler, but does not always provide a solution: the author reports that a "best simple" approach to inlining is still missing.
     The various design decisions are each explained briefly.

* David C. Robbins, Engineering a high-capacity Pascal compiler for high performance, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, June 1984, pp. 300-309.
Design and implementation of a Pascal compiler for programs of say 1000 modules of 1000 lines each, with minimal recompilation requirements. Organized around a data base of the source code, plus a very general graph manipulation package. Efficiency is achieved by tuning using powerful profiling tools.

* Frederick Chow, John Hennessy, Register allocation by priority-based coloring, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, June 1984, pp. 222-232.
Starting from a global (i.e. procedure-wide) interference graph in which the nodes are live ranges, registers are assigned to the ranges as follows. We assume r registers. Note that the ranges consist of basic blocks.
     First all nodes that are connected to less than r other nodes are set aside; these can be coloured trivially. This leaves the harder to allocate ranges; for the moment they are all still uncoloured. For each of the uncoloured ranges, the gain of putting it in a register is calculated, using a store-and-load cost function for each block in the range, weighted by the loop depth of the range. The range with the highest gain is now coloured. If this results in the existence of a node that is connected to exactly r coloured nodes, that node is now uncolourable, and its range will be split, as follows. Its (or one of its) starting blocks is taken as a new range, and blocks connected to it in the old range are moved to the new range; this process is continued as long as the new range is connected to less than r ranges. Essentially, we now start over again (but many data are still usable).
     Measurements show that register allocation reduces running times by 20-25 percent.

* Christopher W. Fraser, Eugene W. Myers, Alan L. Wendt, Analysing and compressing assembly code, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, June 1984, pp. 117-121.
The space-saving optimizations `procedural abstraction' (in which stretches of code that occur more than once are replaced by subroutine calls) and cross-jumping (in which the merge point of two code sequences is moved backwards if the two code sequences end in the same instruction(s)), are generalized into a general assembly code compression algorithm. The actual algorithm used is a variant of McCreight's linear suffix tree construction algorithm.
     Since many conditions have to be fulfilled before a duplicate piece of code can be turned into a subroutine (may not permanently alter the stack level, may not contain non-local labels or goto's, etc.), a priority queue of segments is constructed, from which segments are either accepted or divided up and reentered.
     Compression ratios range from 0-30 percent; compressed code took 1-5 percent more CPU time but 11 percent less real time. An additional option is to set the compressor for turning every non-jump instruction into a subroutine, thus effectively generating threaded code.

* Jack W. Davidson, Christopher W. Fraser, Automatic generation of peephole optimizations, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, June 1984, pp. 111-116.
Preliminary version of Davidson & Fraser, "Automatic inference and fast interpretation of peephole optimization rules", SPE, Vol. 17, No. 11, 1987, pp. 801-812.
     Patterns for a peephole optimizer are, at least in principle, generated automatically, as follows. For each machine instruction, a semantic description in the form of a few simple formulas is given, describing its actions on registers, condition code and memory. For each possible combination of two and three of instructions, the effect is calculated using symbolic formula manipulation, and the result is compared to those of the existing instructions: an equivalence signals a peephole optimization pattern. (It seems to me that this will not (easily) derive patterns like: [reg := reg * 4] → [reg := reg << 2].)
     Since this compiler generation time process is very slow and generates many patterns that will never be used, an approach is described in which interesting code sequences are derived from code actually generated, and symbolic manipulation is used to derive patterns for these. The learning curve levelled off after about 17000 proposed pattern with 627 useful patterns for the VAX.
     In an appendix, the authors provide an example in which 4 instructions are reduced to one. It is remarkable that the final code generated is the same as would be generated by Aho, Ganapathi, Tjiang, "Code generation using tree pattern matching and dynamic programming", TOPLAS, 11, 4, Oct. 1989, 491-516, and that the code is derived in essentially the same way, with tree pattern matching taking the place of symbolic manipulation.
     The paper probably derives from the first author's PhD thesis (Univ. of Arizona, Dec. 1981).

* Robert R. Kessler, Peep -- an architectural description driven peephole optimizer, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, June 1984, pp. 106-110.
Each machine instruction is described by an equation, a space cost and a time cost. The basic algorithm is as follows. At compiler generation time, for each pair of instructions an attempt is made to combine them: the equation of the first is substituted in the equation of the second, if possible, and the data base is searched for an instruction that has a matching equation. If substitution is possible, if a matching instruction can be found, and if its space/time cost is less than that of the combination, it is registered in a table as the peephole replacement of the original pair. This table is used at compile time to replace instructions quickly.
     Several other optimizations are discussed. 1. If substitution is not possible, there still may be an instruction that combines the effects of both, for example a move-double-word instruction. 2. At compile time, Peep combines instruction pairs even if there are non-interfering instructions between them, using the notion of "logical adjacency". 3. At compile time, Peep does a modest global scan to combine instruction pairs between basic blocks.

* Thomas Johnsson, Efficient compilation of lazy evaluation, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, June 1984, pp. 58-69.
Detailed description of the G machine, which is both an intermediate code to be translated to machine instructions (examples for the VAX given) and to be implemented in hardware. It features a number of short-cuts to avoid many intermediate graph rewritings.

* Joseph A. Fisher, John R. Ellis, John C. Ruttenberg, Alexandru Nicolau, Parallel processing: a smart compiler and a dumb machine, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, June 1984, pp. 37-47.
Preliminary partial text of Ellis's Bulldog book. 1. Combine basic blocks into much longer traces. 2. Find parallelism in these traces. 3. Distribute over processors, CDC-like.

* Thomas W. Christopher, Philip J. Hatcher, Ronald C. Kukuk, Using dynamic programming to generate optimized code in a Graham-Glanville style code generator, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, pp. 25-36. June 1984,
To better handle ambiguity, looping, syntactic blocking and semantic blocking, the LR parsing of Graham-Glanville is replaced by Earley parsing. The best code sequence is selected from among the multiple parsings by dynamic programming, similar to Lyon's minimum cost error recovery. Two implementations, one by Kukuk and one by Hatcher, are described.
     In tests, the system achieved slightly better code than "compiling on the stack", at the expense of a loss of a factor 15 to 20 in compilation speed; after all, Earley is $O ( n sup 3 )$, especially for very ambiguous grammars.

* Philippe Aigrain, Susan L. Graham, Robert R. Henry, Marshall Kirk McKusick, Eduardo Pelegrí-LLopart, Experience with a Graham-Glanville style code generator, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, June 1984, pp. 13-36.
About the incorporation of a Graham-Glanville style code generator in the UNIX Portable C Compiler pcc. The authors have chosen to keep strictly to the syntactic approach: all semantic conditions have been incorporated in the syntax by enumeration over finite set attributes. The keep the machine description small, a preprocessor is used to do the enumeration. Many techniques and tricks are described to keep the resulting tables small in the face of a considerable actual machine description syntax.
     All this results for the VAX-11 in a machine description of 246 grammar rules; the speed of the generated code is the same as that generated by the original pcc for the VAX-11.

* Michael Karr, Code generation by coagulation, in ACM SIGPLAN '84 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 19, #6, June 1984, pp. 1-12.
A flow graph of preferably the entire program is assumed, with frequency data about each arc; the nodes are assumed to be of a size for which optimal code can be generated. The basic loop of the code generator is now to find the most frequently used uncompiled arc, compile its entry node if still uncompiled, compile its exit node if still uncompiled, and compile the arc; this coagulates the code starting from centers of importance. "Compile" in the last three actions must be understood as "compile optimally under the conditions imposed by the compiled neighbours". Optimality is defined under an explicit cost metric.
     The bulk of the paper considers techniques for this type of compilation. No experimental results are given since the technique has not been implemented, regrettably.

* Timothy A. Budd, An APL compiler for a vector processor, ACM TOPLAS, vol. 6, #3, July 1984, pp. 297-313.
Fairly straightforward compiler, which matches APL vector operations to machine vector operations. It cuts up the program vectors if they do not fit in the machine vectors.
     Like Budd's sequential APL compiler (for UNIX), it features "dragthrough". This technique combines several APL operators into one operator. For example, R←A+B*C is converted to something like R = do_plus_and_times(A,B,C). The function do_plus_and_times then runs down the three arrays, doing the computations on the elements. This saves on intermediate results and on data motion. The effect can be increased by doing inline substitution, to increase the number of operators in a line.

* A.S. Tanenbaum, H. van Staveren, E.G. Keizer, J.W. Stevenson, Description of a Machine Architecture for use with Block Structured Languages, IR-81, Aug. 1983, Vrije Universiteit, Amsterdam,
The EM Manual.

* Ravi Sethi, Control flow aspects of semantics-directed compiling, ACM TOPLAS, vol. 5, #4, Oct. 1983, pp. 553-595.
A very civilized version of denotational semantics is integrated with yacc to produce control flow graphs for C programs in a very clear way. More in particular, the grammar and the denotational semantics are combined in a specification in a language called Plumb (since it considers functions to be pipes); the specification is then translated into a yacc program.
     The system can handle many forms of flow control, including "gotos that can go anywhere".

* Andrew S. Tanenbaum, Hans van Staveren, E.G. Keizer, Johan W. Stevenson, A practical toolkit for making portable compilers, Commun. ACM, vol. 26, #9, pp. 654-660. Sept. 1983,
The UNCOL idea is restricted to algebraic languages (languages of the Algol / Pascal lineage) and to 8-bit byte-addressable machines. The resulting UNCOL is called EM (Encoding Machine), an assembly code for a simple stack (pseudo-)machine. Actually a family of EM-machines is defined, for various word lengths and alignment requirements.
     The Amsterdam Compiler Kit consists of a set of programs to generate programs that create, manipulate and process EM code, based on machine descriptions. The generated programs can be chained in pipelines in various ways, to obtain interpreters, compilers, optimizing compilers, etc. The compilers include assemblers and even a linker.

* David R. Hanson, Simple code optimizations, SP&E, vol. 13, #8, Aug. 1983, pp. 745-763.
An optimization is defined as "simple" if it 1. requires no major changes to the compiler; 2. causes no more than 10 percent increase in run time and memory requirements of the compiler; 3. can be performed locally ( next-use info required).
     The author reports three simple optimizations applied in the Y compiler (Y is a kind of C with a kind of Ratfor syntax).
     Expression rearrangement applies a set of transformations to the DAG that results from an expression. These patterns do constant folding, simple simplifications (e.g. $ ( + ~ x ~ 0 ) → x $), rewrites to canonical form (e.g. $ ( + ~ C ~ x ) → ( + ~ x ~ C ) $), and addressing patterns (e.g. $ ( @+ ~ (&x) ~ 0 ) → (&x) $), where $@+$ is address addition (written $++$ in the paper!).
     Lazy instruction selection. The DAG is processed bottom-up (and probably fromm the end on). If the code to be generated is of a form that could possibly be an operand of another instruction (e.g. $ R sub 2 + 1 $), no code is generated and the node is replaced by a summary of its value. When finally a real machine instruction must be generated, its operands may be of acceptable form for that instruction, in which case they are incorporated in the instruction, or they may not, in which case adapting code is generated.
     Resource caching. Register tracking, common subexpression elimination etc., are combined in a general resource cache. This improves code considerably, but causes problems with some language constructs which invalidate the cache (e.g. if-statements). Ad-hoc solutions better than discarding the entire cache contents are given.

* D.W. Jones, Assembly language as object code, SP&E, vol. 13, #8, Aug. 1983, pp. 715-725.
[ about an assembly language that is so simple you can do linkage on it ]

* Andrew S. Tanenbaum, Hans van Staveren, E.G. Keizer, A Unix toolkit for making portable compilers, USENIX Toronto 1983 Summer Conference, Toronto, Canada, July 1983, pp. 255-261.
Views the Amsterdam Compiler Kit (ACK) as taking the UNCOL idea and "work out the details and make a practical implementation that runs on UNIX". Describes the phases of ACK in short paragraphs. No literature references.

* Norman H. Cohen, Eliminating redundant recursive calls, ACM Trans. Programming Languages and Systems, vol. 5, #3, July 1983, pp. 265-299.
Many recursive program schemata consist of two alternatives: a recursive call with transformed arguments and an escape hatch, which terminates the recursion. A family of DAGs is constructed for such schemata; such DAGs are often periodic and the periodicity can be exploited to eliminate recursive calls. Several transformations are given and treated extensively to obtain and exploit this periodicity.

* Steven P. Reiss, Generation of compiler symbol processing mechanisms from specifications, ACM TOPLAS, vol. 5, #2, April 1983, pp. 127-163.
A set of library routines is described to answer the question "In a given SCOPE, to what OBJECT does a given NAME refer?" concerning a variety of programming languages. The answer is

      Visibility(SCOPE, NAME),
in which LOOK_UP() and RESTRICT() yield object sets. I did not get the role of the outermost RESOLVE() upon cursory reading.
     Many more manipulation routines exist. A special language is shown to generate complicated calls to these routines. Also, a higher-lvel language is described for formulating the scope and identification rules. The system, which is part of the ACORN project,is powerful enough to create symbol processors for languages like Modula-2, Ada, Algol 68, the Red language, etc.
     A formal description of the operations provided, and a complete symbol processing specification for Ada (7 pages) conclude the paper.

* J.R. Allen, Ken Kennedy, Carrie Porterfield, Joe Warren, Conversion of control dependence to data dependence, in Tenth ACM Symposium on Principles of Programming Languages, ACM, New York, Jan. 1983, pp. 177-189.
About data flow in vector machines

* Stefan M. Freudenberg, Jacob T. Schwartz, Micha Sharir, Experience with the SETL optimizer, ACM TOPLAS, vol. 5, #1, Jan. 1983, pp. 26-45.
The annotations described in Dewar et al., "Programming by refinement, as exemplified by the SETL representation sublanguage", ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, July 1979, pp. 27-49, and the automatic technique described in Schonberg et al., "An automatic technique for selection of data representations in SETL programs", ACM Trans. Program. Lang. Syst., Vol. 3, No. 2, Apr. 1981, pp. 126-143, are combined into one optimizer.
     Many of its techniques and optimizations are described. The great gains lie in choosing simpler data representations and better operator identification; the first gains obvious efficiency and both avoid a lot of run-time tests. They result in a speed-up of a factor of 2 to 8, although for small inputs other effects dominate, including input/output. At present the optimizer is too slow to be part of the standard compilation pipeline.

* Patricia Anklam, David Cutler, Roger Heinen Jr., M. Donald MacLaren, Engineering a compiler; VAX-11 code generation and optimization, Digital Press, Bedford, Mass., 1982, pp. 269.
Bootstrapping a PL/I compiler running under MULTICS, to a VAX-11. The back-end was then isolated to serve other languages. A partly narrative account of how to redo a code-generator, including optimization. No bibliography.

* J. Lewi, K. de Vlaminck, J. Steegmans, A programming methodology in compiler construction, Part 1/2, North-Holland, Amsterdam, 1982/1979, pp. 569/308.
Extensive and in-depth analysis of the Language Implementation Laboratory LILA, both a software-writing tool to construct compilers and a pedagogical aid in a course on compiler construction. Part 1 explains the principles of the compiler generator, an ELL(1) attributed push-down transducer; part 2 describes the implementation and gives examples of its use.

* Uwe Kastens, Brigitte Hutt, Erich Zimmermann, GAG: A Practical Compiler Generator, Lecture Notes in Computer Science #141, Springer Verlag, Heidelberg, 1982, pp. 156.
GAG (Generator based on Attribute Grammars, these people don't speak English or they would have chosen a different acronym) is based on ordered attribute LALR(1) grammars. It reads a description in Aladin, and turns out a processor which parses the input, constructs an attributed tree and performs a number of visit sequences to compute the attribute vales. Two to four visit sequences were required for common programming languages, for example Pascal (4), Ada (3) and Lisp (2). The processor incorporates various optimizations. The system is written in Pascal and generates Pascal.
     About 10% of the book is introduction, 40% is manual plus advice and 50% is the Aladin code for a Pascal compiler.

* M. Ganapathi, C.N. Fischer, J.L. Hennessy, Retargetable compiler code generation, Computing Surveys, vol. 14, #4, Dec. 1982, pp. 573-592.
More properly called "The history of retargetable compiler code generation". Much historical detail, little technical information.

* John L. Hennessy, Noah Mendelsohn, Compilation of the Pascal case statement, SP&E, vol. 12, #9, Sept. 1982, pp. 879-882.
Identifies clusters in the range of the case values and generates a binary decision tree over the clusters; the clusters are then implemented as jump tables. The clusters are identified by repeatedly finding a split point between them, where a split point is defined as a point where the two parts plus the overhead are cheaper than the entire range.

* R. Paige, S. Koenig, Finite differencing of computable expressions, ACM TOPLAS, vol. 4, #3, July 1982, pp. 402-452.
The theory of generalized operator strength reduction, applied to many algorithms, in an almost book-size paper. Contains tables of transformations.

* Andres Rudmik, Barbara G. Moore, An efficient separate compilation strategy for very large programs, in ACM SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 17, #6, June 1982, pp. 301-307.
[ software-engineering aspects of separate compilation of Chill ]

* Ravi Sethi, Control flow aspects of semantics-directed compiling, in ACM SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 17, #6, June 1982, pp. 245-260.
Preliminary version of Sethi, "Control flow aspects of semantics-directed compiling", ACM TOPLAS, Vol. 5, No. 4, Oct. 1983, pp. 553-595.

* Thomas M. Morgan, Laurence A. Rowe, Analyzing exotic instructions for a retargetable code generator, in ACM SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 17, #6, June 1982, pp. 197-204.
The system EXTRA is a program transformation system that can assist in proving that certain language features (for example string copy) can be translated into certain exotic machine instructions (for example block move), and if they cannot, how to modify either to obtain a fit.

* David W. Krumme, A practical method for code generation based on exhaustive search, in ACM SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 17, #6, June 1982, pp. 185-196.
Expression trees are translated using exhaustive search. This is feasable since there are usually not many ways to translate a given form. The system does evaluation order, instruction selection and register allocation in an integrated way. Its speed is moderate, the code it generates is good, but not excellent since it does not exploit associativity and common subexpressions.

* Harald Ganzinger, Robert Giegerich, Ulrich Möncke, Reinhard Wilhelm, A truly generative semantics-directed compiler generator, in ACM SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 17, #6, June 1982, pp. 172-184.
Details of the MUG2 system.

* Rodney Farrow, Linguist-86 -- Yet another translator writing system based on attribute grammars, in ACM SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 17, #6, June 1982, pp. 160-171.
Has its own built-in paging system (sign of the times), in step with the evaluation order (alternating passes). Some considerable effort is spent on reducing the amount of data copied to the paging disk. As a further optimization, all static attributes with the same name are mapped onto the same global variable, to eliminate attribute copying, which was diagnosed to form 40 to 60 percent of all attribute activity. Several other features are introduced to create a workable attribute-oriented compiler generator. Has no built-in programming language; anything not recognized is code.
     Is written in itself, in the best of compiler construction traditions.

* Kai Koskimies, Kari-Jouko Räihä, Matti Sarjakoski, Compiler construction using attribute grammars, in ACM SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 17, #6, June 1982, pp. 153-159.
Since the symbol table is more or less a global variable, it does not fit well in the attribute framework. The idea is to give each non-terminal an inherited attribute $ env sub i $ which describes the environment prior to the non-terminal and a synthesized attribute $ env sub s $ which describes the environment post the non-terminal, so to speak. Examples are given and a systematic approach is suggested. Memory cost is also addressed; the preliminary verdict is: not good but not bad either.

* G.J. Chaitin, Register allocation and spilling via graph coloring, in ACM SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 17, #6, June 1982, pp. 98-105.
1. The inference graph is implemented both as a bit matrix and as a set of adjacency lists. 2. In deciding which node to spill when stuck, frequency estimates are used. 3. The time required to rebuild the inference graph after node spilling is found to be no problem; the time is dominated by the initial building of the graph, all following graphs are a lot smaller.
     Explicit code in SETL is given.

* Rudolf Landwehr, Hans-Stephan Jansohn, Gerhard Goos, Experience with an automatic code generator generator, in ACM SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 17, #6, June 1982, pp. 56-66.
Graham-Glanville for the Siemens 7000, compiling Pascal.

* Susan L. Graham, Robert R. Henry, Robert A. Schulman, An experiment in table-driven code generation, in ACM SIGPLAN '82 Symposium on Compiler Construction, ACM SIGPLAN Notices, vol. 17, #6, June 1982, pp. 32-43.
Quite explicit description of the details of a Graham-Glanville code generator for the VAX-11, in four phases each subdivided into three subphases. For example, in such a subphase, the branches of the expressions are ordered in advance using Sethi-Ullman numbering, and trees that require more registers than available are isolated and spill code is introduced. Several improvements are suggested.

* The-Thanh Nguyen, E. Walter Raschner, Indirect threaded code used to emulate a virtual machine, ACM SIGPLAN Notices, vol. 17, #5, May 1982, pp. 80-89.
A general-purpose virtual machine is implemented using indirect threaded code. The storage efficiency of this machine on a sample program (105 words) is compared with the same machine implemented with threaded code (171 words) and with the UCSD-p machine for Pascal (60 words). The UCSD-p implementation wins, since it is special-purpose.

* Butler W. Lampson, Fast procedure calls, in Symposium on Architectural Support for Programming Languages and Operating Systems, ACM SIGPLAN Notices, vol. 17, #4, pp. 66-76. April 1982,
In order to unify procedure call and return, coroutine transfers, exceptions, process switches, parameter access, etc., in a portable way, a data type context is introduced, with one operation on values of that type: XFER. A context describes anything that control can be transferred to; examples are activation records and routine preludes and postludes. It is up to the context to decide where control is transferred when it is finished; it may transfer to caller, or do something else. The operation XFER() has three parameters, the destination context, a return context, and a parameter list. (The paper gives XFER only one parameter, the destination context, and passes the other parameters through global variables, no doubt in imitation of the Mesa implementation.) Contexts are allocated in the heap, unless a better scheme is possible.
     The ADT context allows implementation of all desired transfers. For example, a call to a procedure P is implemented as a transfer to a context creation, which prepares the activation record for the new incarnation of P, obtains the pointer to the code of P from the parameter list, and XFERs to P; note that creation does not modify the return context.
     The paper gives a simple implementation, based on the usual stack, program counter, frame pointer, and return address. A very space-efficient implementation tuned to the needs of Mesa programs is given next. Mesa interfaces are just records holding pointers to available entities, and can be passed as parameters.
     Ideas are given for speeding up the XFER operation by using the "Jump To Subroutine" machine instruction whenever feasible, and for speeding up access to local variables. No measurements are supplied.

* R.C. Holt, J.R. Cordy, D.B. Wortman, An introduction to S/SL: Syntax/Semantic Language, ACM TOPLAS, vol. 4, #2, April 1982, pp. 149-178.
The syntax part of S/SL is a top-down parser generator; the actions in the parser are restricted to parameterless calls to operations on objects defined in the semantic part. This provides a clear separation of syntax and semantics. The objects are programmed in Pascal.
     The communication between the parser and the objects and between the objects themselves is remarkably meager. Objects cannot communicate with each other. Grammar rules can provide information to objects, but cannot normally get answers in return (but see below). Grammar rules can optionally leave one result on a global result stack; a value on this result stack can be used by a special choice action in the syntax part which bases the choice between a number of alternatives on the value obtained from the return stack rather than on an input symbol. This same choice action can also base its choice on a value obtained from an object, the only way an object can return a value.
     Code generation is done, either on the fly or at the end, by interrogating the objects and using the answers as directives on what code to generate.
     The authors "prove" that S/SL is equivalent to a Turing machine. They also make it plausible that the top-down mechanism + choice action is strong enough to recognize the deterministic languages, and conclude from this that S/SL is equivalent to LR(k). That exploiting this equivalence may entail an amount of work exponential in k is not mentioned (the authors also imply that yacc is LR(k)).

* Andrew S. Tanenbaum, Hans van Staveren, Johan W. Stevenson, Using peephole optimization on intermediate code, ACM TOPLAS, vol. 4, #1, Jan. 1982, pp. 21-36.
Makes the point that since the front end is language-dependent and the back end is machine-dependent, the proper place to do peephole optimization is on the intermediate code, since it is fixed: any improvement on it is pure profit.
     The intermediate code in question is EM; it is described briefly. The 123 peephole optimization patterns used are shown and annotated. The pattern matching algorithm is not described, but the text suggests that the patterns are organized in a tree, which is searched from the top for current and following instructions.
     The optimizer processes about 1000 EM instructions per second on a PDP-11/70. The result is a 16 percent reduction in EM instructions and a 14 percent reduction in subsequent PDP-11 instructions.
     There is a letter by Steven Pemberton, ACM TOPLAS, Vol 5, No. 3, p. 499, pointing out that the proposed optimizations that reorder the operands of arithmetic operations are unsafe as to integer and floating point overflow. The answer by the first author supplies corrected patterns.

* R. Giegerich, Automatic generation of machine specific code optimizers, in Ninth ACM Symposium on Principles of Programming Languages, Jan. 1982, Albuquerque, New Mexico, pp. 75-81.
Author's abstract: While compiler generating systems are gradually moving towards production quality, the compiler subtask of machine code optimization has so far withstood automation (except for [Fra79], see below). The necessity for such a phase comes from the fact that code generators produce code in a piecemeal fashion, in order to limit the otherwise combinatorially growing amount of case analysis to be done. The emitted code sequences may be optimal when considered by themselves, but allow (if not require) optimization when placed in the context of others ([PQC80], [Fra79]). This effect is especially severe where, for reasons of retargetability, code for various processors is generated from a common low-level intermediate form by some kind of macro expansion.

* Robin Hunter, The Design and Construction of Compilers, John Wiley, Chichester, 1981, pp. 272.
Likeable introductory book in Algol 68 notation, with lots of sound advice. Old-fashioned in some sense, but still quite worth reading.
     Predecessor to Hunter, Compilers: Their Design and Construction Using Pascal, 1985.

* R.E. Berry, Program Language Translation, Ellis Horwood, Chichester, 1981, pp. 175.
As it says on the inside cover: "It is an overview, rather than an exhaustive study, and will satisfy the curiosity of the rapidly increasing number of computer users about the compilers or assemblers which process the programs." The book is somewhat reminiscent of Hopgood [1971], and is based on the Pascal S compiler. No literature references. Not a serious book (unlike Hopgood!)

* Steven S. Muchnick, Neil D. Jones, Program Flow Analysis, Prentice-Hall, Englewood Cliffs, N.J., 1981, pp. 418.
Ten chapters/papers by different authors, arranged in four parts: introduction, applications to program optimization, applications to software engineering, and theory. The bibliography is impressive.

* F.E. Allen, J. Cocke, K. Kennedy, Reduction of operator strength, in Program Flow Analysis, ed. by Steven S. Muchnick and Neil D. Jones, Prentice-Hall, Englewood Cliffs, N.J., 1981, pp. 79-101.
The text starts by treating simple operator strength reduction. A variable i is an induction variable if its value is only changed in two kinds of assignments: $ i := j ± k $ and $ i := ± j $ in which j and k are either region constants or induction variables. The region is then searched for other occurrences of i and if such an occurrence is sufficiently simple, strength reduction can be applied. Ten such cases are identified and replacements are given. Next the technique is generalized to any kind of expression, but limitations of that generalization are shown.
     Returning to the simple case and using use-definition chains, an algorithm is then derived and shown in full that does the job.

* Ken Kennedy, A survey of data flow analysis techniques, in Program Flow Analysis, ed. by Steven S. Muchnick and Neil D. Jones, Prentice-Hall, Englewood Cliffs, N.J., 1981, pp. 5-54.
Global data flow analysis gives rise to sets of forward-flow and backward-flow equations; constructing these equations from the code is fairly straightforward (covered in 13 pages), but solving them is not. Nine algorithms for solving data flow equations are explained, examined and compared:
     1. Iterative: the equations are scanned and values updated until nothing changes any more; works always but is $ O ( n sup 2 ) $.
     2. Utilizing nested strongly connected regions: works only for graphs that derive from simple loop structures.
     3. Interval analysis: an interval is a cycle-free natural loop. The intervals are analysed bottom-up, from smaller intervals to larger. Again $ O ( n sup 2 ) $, but also finds structure.
     4. T1-T2 analysis: transformation T1 collapses a loop into a single node, and T2 contracts node $ N sub 1 $ and $ N sub 2 $ if $ N sub 1 $ is the only parent of $ N sub 2 $. Repeated application of T1 and T2 reduces a reducible graph to a single node. Meanwhile a tree is constructed, which allows the data flow equations to be solved in time $ O ( n ~ log ~ n ) $.
     5. Node listings: A rooted graph defines a finite set of non-circular paths, which can be represented as lists of nodes. A node listing of a graph is a listing of all the nodes (duplicates allowed) which has all non-circular paths as subsequences. It can be proved (although the paper does not do so) that flow equations can be solved completely by solving then partially in the order of a node listing, much like Warshall's algorithm. Node listings of size $ O ( n ~ log ~ n ) $ can be constructed in time $ O ( n ~ log ~ n ) $.
     6. Path compression: A variant of T1-T2 analysis, using three transformations. It is faster than the other methods and almost always linear in the number of nodes.
     7. Balanced path compression: Tarjan's own very fast and very complicated version of path compression, based on a special kind of balanced trees.
     8. Graph grammars: A hopefully general enough graph grammar is decreed, and the graph is parsed with this grammar. If the parsing succeeds, the method works and the flow equations can be solved in linear time by following the parsing; otherwise the method just fails. The advantage is the linear time requirement, the disadvantage is finding proper graph grammars. Also the state of the art in graph parsing was a problem in 1981.
     9. High-level data flow analysis: data flow analysis on parse trees. Using attributes as in an attribute grammars.
Algorithms and examples for most of these are given. Additionally, algorithms for use-def chains and constant propagation are given.
     The next section is concerned with symbolic interpretation. It starts off with having full symbolic expressions, but soon restricts itself to Wegbreit's well-founded property sets. (See Wegbreit, "Property extraction in well-founded property sets", IEEE Trans. Software Eng., Vol. SE-1, No. 3, 1975, pp. 270-285.) This simplified the treatment of path joins considerably.
     The chapter closes with optimization in SETL and global type inference.

* George Logothetis, Prateek Mishra, Compiling short-circuit Boolean expressions in one pass, SP&E, vol. 11, #11, Nov. 1981, pp. 1197-1214.
A Boolean value is represented in the compiler as a record with state indicating among other things if it is encoded (in a variable) or decoded (in the flow of control). Five different code generating routines are given to be called in different contexts; each generates the optimal code for that state of the Boolean in that context and updates the state of the Boolean.

* P. Klint, Interpretation techniques, SP&E, vol. 11, #9, Sept. 1981, pp. 963-973.
Discussion and comparison of three interpreter construction techniques: classical iterative (the program is a list of pseudo-instructions with a routine for each instruction), direct threaded code (the program is a list of addresses of routines), and indirect threaded code (the program is a list of addresses of lists of addresses of routines and data). The latter technique allows variables to be considered as objects: the list of addresses of routines and data is an object. A variable A is implemented as a (generated) list $@push$, $@store$, $value$, in which $@push$ and $@store$ are addresses of fixed routines. To store, for example, the top of the stack in A, the program contains the address A+1.
     The comparison was performed on a PDP11 and a Cyber-73. Direct threaded code was found to be 10 to 30 percent faster than the others. It is important to keep the program counter in a register. The classical iterative method may result in the smallest interpreter. No word about recursive interpreters.

* Arthur Sale, The implementation of case statements in Pascal, SP&E, vol. 11, #9, Sept. 1981, pp. 929-942.
Implementations by jump table, binary search tree and linear table search are compared in closed form and experimentally on a Burroughs B7700. Conclusion: never use linear table search, and use the smallest of the others.

* David Alex Lamb, Construction of a peephole optimizer, Softw. Pract. Exper., vol. 11, 1981, pp. 638-647.
Straightforward programmable peephole optimizer, used to generate VAX-11 code from intermediate code from "Preliminary Ada", to pep up a prototype compiler, "Charrette" at CMU. Properties:
     1. Both the pattern and the replacement can contain an arbitrary mix of intermediate and target code.
     2. The replacement can contain, in addition to the usual arguments from the pattern, applications of built-in functions on such arguments, both as replacements and as conditions on the applicability of the pattern; examples are IsOdd(&1) which returns true if argument 1 is odd, and $dollar$negate(&2) which will substitute the negation of argument 2.
     3. Pattern matching proceeds backwards, starting at the end, to allow matches over matches in a natural way (can it replace a pattern by a longer pattern?).
     4. The peephole optimizer generator is written in SNOBOL4. It says in the text that it "compiles the patterns into subroutines and data declarations in the implementation language". Does it indeed generate a peephole optimizer in VAX-11 assembly code?
The system achieves a code size reduction of about 40 to 50 percent. It took about 20 man-days to write.

* William A. Wulf, Compilers and computer architecture, IEEE Comput., vol. 14, #7, July 1981, pp. 41-47.
Desiderata for a machine instruction from a compiler writer point of view. The general gist is: 1. Provide primitives, not solution. 2. Keep it simple (and orthogonal). Lots of examples are given, pointedly naming no machine in particular.

* Tomasz Kowaltowski, Parameter passing mechanisms and run-time data structures, S-P&E, vol. 11, #7, pp. 757-765. 1981,
General survey of the implementation of parameter passing mechanisms in a language with statically nested scopes using displays. Explains the optimization of having each procedure being responsible "for saving and restoring the previous contents of the position of the display which it needs to point to its own activation record". Next explains the old Elliott 503 trick of letting all formal and local objects have absolute addresses and requiring the procedures to save and restore the previous contents on a stack. This does not need a display at all, but cannot handle procedures (or labels) passed as parameters.

* Arthur B. Pyster, Compiler Design and Construction, Van Nostrand Reinhold, New York, 1981, pp. 357.
Compiler course based on a Pascal compiler for the IBM/370. Many explicit examples. Good bibliography.

* A.J.T. Davie, R. Morrison, Recursive descent compiling, Ellis Horwood Ltd., Chichester, 1981, pp. 195.
Well-balanced description of the design considerations that go into a recursive descent compiler; uses the St. Andrews University S-algol compiler as a running example.

* Gregory J. Chaitin, Marc A. Auslander, Ashkok K. Chandra, John Cocke, Martin E. Hopkins, Peter W. Markstein, Register allocation via coloring, Computer Languages, vol. 6, #1, 1981, pp. 45-57.
Very readable and enthusiastic account of how the experimental code generator of the PL/I compiler for the IBM/370 works. Although register allocation via colouring is the main ingredient, many additional tricks are explained. The compiler compiles to intermediate code, in which almost everything is kept in a pseudo-register, of which there infinitely many. Then aggressive register allocation maps then to real registers.
     The following techniques and tricks are employed. 1. In constructing the interference graph, rather than adding $ k * ( k + 1 ) / 2 $ arcs at any place where k pseudo-registers interfere, arcs are added at the definition point of each pseudo-register to the k pseudo-registers that are live at that point. 2. In constructing the interference graph, any assignment of the form y:=x which is the end of the live range of x and the start of that of y, is optimized away by coalescing the nodes for x and y ("subsumption"). 3. The compiler employs two graph colouring algorithms. Both start with a known number of registers (the number is even a fixed constant, 17, deriving from the architecture of the IBM/370). The first algorithm repeatedly removes all nodes with less than 17 connections from the graph, until the empty graph remains, and then puts them back in inverse order. If this fails and if the user has requested a very high level of optimization, the second algorithm is brought in. The second algorithm sorts the nodes by "urgency", the urgency of a node N being defined at any moment as the number of uncoloured neighbours divided by the number of available colours, and colours them in that order. If both fail, spilling is in order. 4. For fast access during all this, the graph is implemented both as half a bit matrix and as a list of nodes describing all the pseudo-registers, with for each pseudo-register a list of its adjacent registers. 5. Register coalescing is applied repeatedly, to see if more pseudo-registers can be combined. 6. Several tricks are used to incorporate machine idiosyncrasies. 7. If we run out of registers, three approaches are tried. First, we try to recompute results just before they are used; if this is possible, the live range can be cut in two chunks that together are smaller than the original. Second, we try to do the same thing through rematerialization: constructing a value from one or more of the live values at a given point. Third, if nothing helps, we spill a register. Register spilling code is entered in the program graph, and the entire algorithm is run again, including dead-code elimination, which will eliminate possible dead register-spilling code.
     A discussion of the "ultimate" notion of interference and a proof that any graph can arise in the above construction end the paper.

* A. Mycroft, The theory and practice of transforming call-by-need into call-by-value, in International Symposium on Programming, Lecture Notes in Compter Science #83, Springer-Verlag, Berlin, pp. 269-281. 1980,
More theory than practice about transforming lazy parameter evaluation into call by value.

* Terry Ritter, Gregory Walker, Varieties of threaded code for language implementation, BYTE, vol. 5, #9, Sept. 1980, pp. 206-227.
Four varieties of threaded code are explained:
     1. subroutine-threaded (executable code: JSR A; JSR B; ... -- with A, B, ... subroutine-threaded lists, or machine code subroutines);
     2. direct-threaded (address list: A; B; ... -- with A, B, ... direct-threaded address lists packaged with call setup and return code, or addresses of machine code subroutines);
     3. indirect-threaded (address list: A; B; ... -- with A, B, ... indirect-threaded address lists surrounded by indirect addresses to machine code routines for call setup and return code, or indirect addresses of machine code subroutines);
     4. token-threaded (token list: A; B; ... -- with A, B, ... indexes in a table containing the addresses that occur in the indirect-threaded code described under 3.).
     The explanations are supplied in English, in pictures, in (pseudo-)Pascal and in MC6809 assembly code. There is no indication of accessing parameters, nor of incremental compilation.

* Peter Kornerup, Bent Bruun Kristensen, Ole Lehrmann Madsen, Interpretation and code generation based on intermediate languages, SP&E, vol. 10, #8, Aug. 1980, pp. 635-658.
The authors note three problems with P-code: 1. some info is lost and has to be retrieved by a pre-scan; 2. the control structure is lost; 3. the memory structure is lost.
     Effectively arguing that P-code is too low-level, they propose an intermediate form closer to Pascal, called PIF for Pascal Intermediate Form. A detailed proposal follows. There is a front-end producing PIF, but no further processors.

* A.V. Aho, Translator writing systems: where do they now stand?, IEEE Computer, vol. 13, #8, pp. 9-14. Aug. 1980,
Very general paper on compiler construction using tools.

* W.A. Wulf, B.W. Leverett, R.G.G. Cattell, S.O. Hobbs, J.M. Newcomer, A.H. Reiner, B.R. Schatz, An overview of the production-quality compiler compiler project, IEEE Computer, vol. 13, #8, pp. 38-49. Aug. 1980,
Much overview, not much technical detail. The system consists of a large number of small transformations, each "intellectually manageable". Each transformation derives from a formal description and is translated into a table. The compiler then is a driver that applies these tables to the program to be translated. Much of the code transformations is based on TCOL (Tree UNCOL).

* Susan L. Graham, Table-driven code generation, IEEE Computer, vol. 13, #8, pp. 25-34. Aug. 1980,
Extensive argumentation concerning the design of the Graham-Glanville code generator, with examples. Little hard information.

* Stephen C. Johnson, Language development tools on the UNIX system, IEEE Computer, vol. 13, #8, Aug. 1980, pp. 16-21.
Tools are described as general input-processing devices. The section on the philosophy of tool design ('Tooling up') is very interesting.

* Bruce Leverett, Thomas G. Szymanski, Chaining span-dependent jump instructions, ACM Trans. Prog. Lang. Syst., vol. 2, #3, July 1980, pp. 274-289.
A long jump to L can sometimes be replace by a short jump to M where M already holds a jump to L. This can be utilized to reduce the number of long jumps. Several variants of the problem are considered. Some are NP-complete; for the others, dynamic programming is the name of the game. The effectiveness of the technique is not discussed.

* F.E.J. Kruseman Aretz, Jan L.A. van de Snepscheut, H. Grasdijk, J.M.H. Smeets, SATHE: Some aspects of an ALGOL implementation, Software -- Practice and Experience, vol. 10, #7, pp. 563-573. July 1980,
Mainly about syntactic error recovery.

* Jack W. Davidson, Chistopher W. Fraser, The design and application of a retargetable peephole optimizer, ACM Trans. Prog. Lang. Syst., vol. 2, #2, April 1980, pp. 191-202.
To use the peephole optimizer PO for a new machine, its instructions must be described as register and memory transfers. The input to PO may be assembler language or register transfer language, but the input to the PO algorithm is register transfers: assembly language is converted upon input.
     The PO considers pairs of instructions, retrieves the register transfers for them, substitutes the effect of the first in the register transfers for the second and tries to find an instruction which does the combination. The paper does not explain how it is determined if an instruction (with its addressing modes and other flags) does what a set of register transfers describe; the examples suggest that the mechanism used is stronger than simple tree matching.
     If a replacement instruction is found, it replaces the instruction pair and the process continues with the new instruction as the first instruction of the next instruction pair.
     The authors point out that assembly code often overspecifies: many instructions set condition codes which are not used afterwards. Such overspecification can prevent replacement by cheaper instructions which have the same effect except that they not set the (after all unused) condition codes properly. They recommend for the code generator to generate register transfers rather than actual assembly instructions, to avoid such problems.
     The system is slow, about 1 to 10 instructions per second, but allows the code generator to be very simple, for example to generate naive code for a stack machine, and have the PO convert the output into clever code. Examples are given.

* R.G.G. Cattell, Automatic derivation of code generators from machine descriptions, ACM Trans. Programming Languages and Systems, vol. 2, #2, pp. 173-190. April 1980,
The entire program (or at least the entire subroutine) is given in tree form in TCOL, and so are all machine instructions, including jumps, increment-and-skip-on-zero, etc. No completeness is required of the instruction set. The pattern matcher performs in principle an exhaustive search, limited in simple ways.
     Since the instruction set may be incomplete, the search may get stuck. In that case, a "primary operator" is found in the unmatchable region according to certain criteria, an instruction is found that has the same primary operator, and the instruction is forcibly matched. This requires repairs to the tree that can be derived automatically. The search is then restarted to find instructions to match the new tree. This need not terminate in principle, but in practice it does.
     An example is shown in which the search algorithm manages to find the proper code for 'ACC ← Mem[Addr]' on a PDP8-like machine: 'ACC ← 0; ADD Mem[Addr]'.
     To increase efficiency, the initial set of instructions is extended at compiler-generation time with sets of canned instruction sequences, according to a set of heuristics. Most importantly, canned instruction sequences are created (by calling the search algorithm at compiler-generation time) for those TCOL operators for which no direct machine instruction exists. So the PDP8 idiom for 'ACC ← Mem[Addr]' is found at compiler-generation time. In addition, patterns are generated for all addressing modes.
     The scheme generates good code and is fast, in spite of the exhaustive search. Quote: "It appears that code comparable to a good handcrafted compiler can be generated with proper care."