Compiler Construction 1980-1989
Literature references and annotations by Dick Grune, dick@dickgrune.com.
Last update: Fri Nov 12 14:11:46 2021.
These references and annotations were originally intended
for personal use and are presented here only in the hope
that they may be useful to others.
There is no claim to completeness or even correctness.
Each annotation represents my understanding of the text
at the moment I wrote the annotation.
No guarantees given; comments and content criticism welcome.
J.J.F.M. Schlichting,
An Early Implementation of Revised ALGOL 68,
(PhD Thesis),
Technical University Delft,
Delft,
Nov. 1989,
pp. ~200.
Describes the CDC Algol 68 implementation, which was produced under the
supervision of the author.
The compiler and the run-time system are treated equally extensively.
Some special attention is given to the garbage collector, which uses
self-describing references (pointers).
Des Watson,
High-Level Languages and Their Compilers,
International Computer Science Series,
Addison-Wesley,
Wokingham, England,
1989,
pp. 337.
Very good introduction to the title subject.
Covers in three parts programming languages, the anatomy of a compiler
and software-engineering a compiler.
Programming languages are covered in 93 pages and it shows: modules and
ADTs are hardly mentioned.
But the book is favourable to Algol 68, so all is forgiven:-); explains
two-level grammars.
Compiler construction is treated in the simplest possible terms, but
most subjects are touched upon. Interpreters are not forgotten.
Peter Lee,
Realistic Compiler Generation,
MIT Press,
Cambridge, Mass.,
1989,
pp. 246.
Denotational semantics is modified in a number of ways to adapt it to
the needs of the compiler writer and is provided with a Standard ML-like
notation.
An interpreter is then used to derive real-world (= efficient, comparable
to hand-written compilers) compilers for it.
Half the book is code.
The author likes puns.
Peter Rechenberg,
Hanspeter Mössenböck,
A Compiler Generator for Microcomputers,
(translated from German),
Prentice-Hall,
Hemel Hempstead, UK,
1989,
pp. 378.
Detailed write-up of the LL(1) L-attributed compiler generation system
Coco.
The system is Modula-2-based, features named attributes and is intended
for the PC.
The first 200 pages describe the theory behind, design of, methods used
in and applications of Coco; the rest is manual and code (all of it?).
A comparison with four other compiler generators (yacc, HLP84, GAG, MUG)
is given and summarized in a table.
The text is divided into a large number of small sections (usually less
than one page), each making a clear point.
Sometimes the view of the authors on things is enlightening:
"An LL(1) grammar is already a parsing program. To make a parser it is
only necessary to code an interpreter for it" (Section 2.5).
And then they set out to do so.
Easy to read, and worth reading, if only for its thoroughly
un-American style.
Josef Grosch,
Generators for high-speed front-ends,
in Compiler Compilers and High-Speed Compilation,
Lecture Notes in Computer Science #371,
ed. by D. Hammer,
Springer-Verlag,
Berlin,
1989,
pp. 81-92.
A coherent system of lexical scanner generator, LALR(1) parser generator
and LL(1) parser generator, using a uniform input syntax, is presented.
The scanner beats UNIX lex by a factor of 5, the LALR parser beats yacc
by a factor of 2.
Andrew S. Tanenbaum,
M. Frans Kaashoek,
Koen G. Langendoen,
Ceriel J.H. Jacobs,
The design of very fast portable compilers,
ACM SIGPLAN Notices,
vol. 24,
#11,
pp. 125-131.
Nov. 1989,
A special kind of back-end generator for the Amsterdam Compiler Kit is
described, which will produce very fast back ends that produce
relocatable binary directly.
A fast back end for a machine is generated from two tables, one that
maps EM instructions to assembler instructions of the machine, and one
that maps the assembler instructions to relocatable binary.
(Note that since the assembly instructions are used only as a
communication medium between the two tables, their format can be chosen
freely by the back-end writer.)
The back-end generator integrates these two tables into one set of
routines, one for each EM instruction; this set of routines forms a
library that can be linked with any front end to produce a fast
compiler.
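By way of illustration, a sketch (with a made-up EM instruction and an
imaginary target encoding, not the actual ACK tables) of what one such
fused routine might look like in C: the intermediate assembler text never
appears at run time, the routine emits relocatable bytes directly.

    #include <stdio.h>
    #include <stdint.h>

    /* Hypothetical output buffer for the relocatable binary. */
    static uint8_t code[1024];
    static int pc = 0;

    static void emit(uint8_t b) { code[pc++] = b; }

    /* Generated routine for the EM instruction "loc n" (load constant n).
       The two tables (EM -> assembler, assembler -> binary) have been
       fused into this single routine, which emits the opcode and operand
       bytes of an imaginary target machine directly. */
    void em_loc(int n) {
        emit(0x68);                    /* hypothetical "push immediate" opcode */
        emit(n & 0xff);
        emit((n >> 8) & 0xff);
    }

    int main(void) {
        em_loc(42);                    /* a front end would call this directly */
        for (int i = 0; i < pc; i++) printf("%02x ", code[i]);
        printf("\n");
        return 0;
    }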
The back-end generator does limited push-pop optimization.
The resulting compilers are about 3 to 4 times faster than the native
compilers.
The resulting code is about 3 times slower.
J. Grosch,
Efficient generation of lexical analysers,
SP&E,
vol. 19,
#11,
Nov. 1989,
pp. 1089-1103.
[[ preliminary abstract ]]
Automaton generation and table compression are combined.
Question: are the tunnelling state changes described just
$epsilon$-transitions, or is there more to it?
Alfred V. Aho,
Mahadevan Ganapathi,
Steven W.K. Tjiang,
Code generation using tree pattern matching and dynamic programming,
ACM Trans. Programming Languages and Systems,
vol. 11,
#4,
Oct. 1989,
pp. 491-516.
The Aho-Corasick multi-string search algorithm is adapted to bottom-up
subtree recognition, as follows.
Each template (subtree) to be recognized is characterized by the sets of
paths from the root to each of the leaves; the paths include at each
step information showing which branch in the template is taken.
The Aho-Corasick automaton for this set of paths is constructed.
Next, a top-down scan uses this automaton to recognize the possible
leaves of templates.
A subsequent bottom-up scan sets for each node a bit matrix, the entries
of which indicate whether the node can be a j-level node of template
i; since the left-right information is part of the path, this
information is sufficient to conclude that we have found a full template
when we find a level 0 node.
This process determines for each node in the tree which templates match
there, and does so in linear time.
Each template comes with a cost function and an action.
The cost function can check the values of leaves if needed, and can
return an infinitely high cost if the context turns out to disallow the
template.
Another bottom-up scan is used to find the minimum cost at each node
and the corresponding template, using dynamic programming: for each
matching template determine the cost of leaves + cost of template, and
take the minimum.
Again this is done in linear time.
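A toy sketch of this cost pass in C (hypothetical templates with unit
costs, not the paper's notation): for every node the minimum over the
matching templates of (template cost + cost of the template's leaves) is
computed, one visit per node.

    #include <stdio.h>
    #include <limits.h>

    typedef struct Node { char op; struct Node *left, *right; } Node;

    /* Minimum cost of producing the value of n in a register, trying each
       hypothetical template that matches at n and adding the costs of the
       template's leaves (the dynamic programming step). */
    int cost(Node *n) {
        int best = INT_MAX;
        if (n->op == 'v' || n->op == 'c')             /* template: load into register */
            best = 1;
        if (n->op == '+') {
            int lc = cost(n->left), rc = cost(n->right);  /* each child costed once */
            int c = 1 + lc + rc;                          /* template: add reg,reg */
            if (c < best) best = c;
            if (n->right->op == 'c') {                    /* template: add reg,#const */
                c = 1 + lc;                   /* the constant is a leaf of this template */
                if (c < best) best = c;
            }
        }
        return best;
    }

    int main(void) {
        Node c1 = {'c', 0, 0}, v = {'v', 0, 0}, sum = {'+', &v, &c1};
        printf("min cost = %d\n", cost(&sum));   /* 2: load v; add reg,#const */
        return 0;
    }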
A mixed top-down/bottom-up scan then performs the actions, to generate
code and/or to do constant folding, tree reordering, etc.
Register allocation is not addressed; nor is completeness of the
templates.
Also, the algorithm is restricted to expressions that form pure trees;
this is not mentioned in the paper.
The algorithm is retargetable, small, fast and produces good quality
code.
Summary.
The first part of the paper is a general explanation of the principles
in a clear style.
The second part concerns the combined implementation of the tree
matching and the dynamic programming, and soon confuses explanation and
implementation.
The third part concerns the code generator writing language twig
used in testing the algorithms.
It is a yacc-like language with sometimes unexpected nomenclature: the
keyword ABORT means `return infinity', and TOPDOWN means that the
default post-order visit sequence is replaced by one specified by the
user.
R. Dixon,
T. McKee,
P. Schweizer,
M. Vaughan,
A fast method dispatcher for compiled languages with multiple inheritance,
in OOPSLA '89 Conference,
ACM SIGPLAN Notices,
vol. 24,
#10,
Oct. 1989,
pp. 211-214.
The dispatch matrix is implemented as a matrix for speed, but it is
compressed by making selector rows coincide.
A close to optimal packing is achieved by a simple graph colouring
algorithm.
Additionally, a scheme is given to postpone the graph colouring until link
time, when all classes with their selectors are known.
K. Bothe,
B. Hohberg,
Ch. Horn,
O. Wikarsky,
A portable high-speed Pascal to C translator,
ACM SIGPLAN Notices,
vol. 24,
#9,
Sept 1989,
pp. 60-65.
Very summary description of same; mostly about the reasons for writing
and using this convertor: 1. using Pascal where there is no Pascal
compiler; 2. support for projects that combine C and Pascal.
Several problems in the translation are mentioned (non-local procedures,
initialization of structured values) but no solutions are given.
Care is taken to produce readable C, to be improved by hand afterwards.
Preston Briggs,
Keith D. Cooper,
Ken Kennedy,
Linda Torczon,
Coloring heuristics for register allocation,
in ACM SIGPLAN '89 Conference Programming Language Design and Implementation,
ACM SIGPLAN Notices,
vol. 24,
#7,
July 1989,
pp. 275-284.
Explains easiest-last graph colouring and gives a linear-time
implementation, as follows.
First, an array of sets of nodes S (implemented as linked lists or so)
is constructed, such that all nodes with i neighbours are in
$ S sub i $; this can be done in $O(n)$ time.
Now a node with the fewest neighbours can be found by scanning S from
$ S sub 0 $ for the first non-empty set, say $ S sub k $; any node from
$ S sub k $ will then do, say N.
Next, N is removed from the graph and put on the colouring stack; it
is also removed from $ S sub k $.
When N is removed from the graph, the degrees of its neighbours are
all decreased by one, which means that they all have to be moved down one
entry in the array S.
(In addition, to find the first non-empty entry in S one need not
start searching from 0, but one can start at $k-1$ after having removed
a node from $ S sub k $: $ S sub { k - 2 } $ was empty at that moment,
but it can only be filled from $ S sub { k - 1 } $, which, however, was
also empty at that moment, so $ S sub { k - 2 } $ will still be empty
after the above step.)
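A compact sketch of the bucket administration in C (small fixed graph,
arrays instead of proper lists; an illustration, not the authors' code):
nodes sit in doubly linked lists indexed by their current degree, so the
next easiest node is always found by scanning the array of list heads.

    #include <stdio.h>

    #define N 5                       /* toy interference graph on 5 nodes */

    int adj[N][N] = {                 /* symmetric adjacency matrix */
        {0,1,1,0,0},
        {1,0,1,1,0},
        {1,1,0,1,1},
        {0,1,1,0,1},
        {0,0,1,1,0},
    };

    int deg[N], removed[N];
    int head[N];                      /* head[d] = first node of degree d, or -1 */
    int nxt[N], prv[N];               /* doubly linked bucket lists */

    void unlink_node(int n) {
        if (prv[n] >= 0) nxt[prv[n]] = nxt[n]; else head[deg[n]] = nxt[n];
        if (nxt[n] >= 0) prv[nxt[n]] = prv[n];
    }

    void link_node(int n) {           /* push n on the list for its degree */
        prv[n] = -1; nxt[n] = head[deg[n]];
        if (nxt[n] >= 0) prv[nxt[n]] = n;
        head[deg[n]] = n;
    }

    int main(void) {
        int stack[N], top = 0, k = 0;
        for (int d = 0; d < N; d++) head[d] = -1;
        for (int n = 0; n < N; n++) {
            for (int m = 0; m < N; m++) deg[n] += adj[n][m];
            link_node(n);
        }
        for (int i = 0; i < N; i++) {
            while (head[k] < 0) k++;              /* first non-empty bucket */
            int n = head[k];
            unlink_node(n); removed[n] = 1;
            stack[top++] = n;                     /* easiest node goes on the stack */
            for (int m = 0; m < N; m++)           /* neighbours drop one bucket */
                if (adj[n][m] && !removed[m]) {
                    unlink_node(m); deg[m]--; link_node(m);
                }
            if (k > 0) k--;                       /* the note above: restart at k-1 */
        }
        printf("colouring order (pop): ");
        while (top > 0) printf("%d ", stack[--top]);
        printf("\n");
        return 0;
    }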
The authors report speed-ups of about 10 percent compared to Chaitin.
David Bernstein,
D. Goldin,
M. Golumbic,
H. Krawczyk,
Y. Mansour,
I. Nahshon,
R. Pinter,
Spill code minimization techniques for optimizing compilers,
in ACM SIGPLAN '89 Conference Programming Language Design and Implementation,
ACM SIGPLAN Notices,
vol. 24,
#7,
July 1989,
pp. 258-263.
Seven authors, all from the Technion in Haifa.
Describes a combination of and improvements over Chaitin, Chow,
Brélaz, etc., resulting in about a 10 percent reduction in spills and
a 3 percent speed-up.
The main item is the cost function, of which they have three.
The colouring algorithm is run once for each cost function and the best
(least spilling) colouring is chosen.
The basic graph decomposition step of their colouring algorithm is, for
a given number of registers r:
if there are one or more nodes with fewer than r neighbours, remove
the one with the largest number of neighbours;
otherwise, spill the node with the lowest cost.
The authors claim this works better than the traditional greedy or lazy
algorithms on typical interference graphs.
A representative graph size of 1000 nodes and 20000 edges is reported
for the "average" complete program; the ratio 20:1 for edges:nodes is
also representative.
Wen-Mei Hwu,
Pohua P. Chang,
Inline function expansion for compiling C programs,
in ACM SIGPLAN '89 Conference Programming Language Design and Implementation,
ACM SIGPLAN Notices,
vol. 24,
#7,
July 1989,
pp. 246-257.
Very clear detailed exposition of the problems that can be encountered
doing inlining, and their solutions.
Uses profiling of the program to find the most likely candidates.
One conclusion is that system calls are bad news.
Christopher W. Fraser,
A language for writing code generators,
in ACM SIGPLAN '89 Conference Programming Language Design and Implementation,
ACM SIGPLAN Notices,
vol. 24,
#7,
July 1989,
pp. 238-245.
Hand-written code generators are repetitive and full of case statements.
The author describes a condensed notation for this.
The user specifies a naive code rule for each type of node in the DAG,
and a set of optimization rules.
The rules look like parameter lists of printf(), with the
provision that formats may occur as parameters to formats.
Basically, the user can imagine that they are turned into a long
if-then-chain, but the actual generated code is much more efficient.
The reduction achieved (or expansion) is quite considerable: the
specification for the VAX is 126 lines, which expand into about 4000
lines of C.
Prominent detail: the rule compiler is written in Icon.
H. Emmelmann,
F.W. Schroer,
R. Landwehr,
BEG - a generator for efficient back ends,
ACM SIGPLAN Notices,
vol. 24,
#7,
July 1989,
pp. 227-237.
A bottom-up cost-based pattern recognizer, similar to Aho & Ganapathi's,
in which the dynamic programming tables are replaced by compiled
routines, and which includes register allocation; it actually features
two different register allocators.
One is naive and tries to hand out registers while recognizing the
patterns; if it runs out of registers it spills another register; it is
reported to work well enough for the MC68020.
The second is a separate pass that calculates the cost of keeping the
top of each subtree in a register, using some heuristic on the costs of
the the children; if the cost is infinite, the top of the subtree is
kept in memory; this is reported to work for the IBM/360 and the 8086.
Jack W. Davidson,
David B. Whalley,
Quick compilers using peephole optimizations,
SP&E,
vol. 19,
#1,
Jan. 1989,
pp. 79-97.
The "very portable C compiler" vpcc has CVM (C Virtual Machine) as its
intermediate code; CVM features separate calling and expression stacks
and has been designed specifically for C.
However, naive code generation from CVM (to the VAX-11) loses about a
factor of 2 in speed; hence the peephole optimization.
Peephole optimization for CVM is special in the sense that the input
sequences are known; a result is that a small number of peephole
optimization rules suffice.
The VAX machine instructions generated by naive code generation are kept
in a linked list.
Rules are matched to the input starting from the end of the input
backwards, to allow matches over the results of matches.
The matching process is repeated after each addition to the list of
instructions.
Register allocation is done from a fixed stack of registers; if an
expression exhausts this stack, the compiler gives up.
The set of replacement rules is transformed into a very detailed and
low-level C routine.
It does an ad-hoc form of Aho-Corasick bibliographic search, and is
about 9 times faster than an interpretative pattern matcher.
Using about 40 peephole optimization rules, the system generates code
that is between -10 and +20 percent faster than the code compiled by the
native C compiler.
The authors make two interesting observations:
1. Using a fixed set of 4 registers on the VAX-11 has not yet caused
problems after one year of operation.
2. Searching for clever peephole optimizations is entertaining but of
doubtful use: the most useful optimizations are generally obvious.
B.L. Meek,
K.K. Siu,
The effectiveness of error seeding,
ACM SIGPLAN Notices,
vol. 24,
#6,
June 1989,
pp. 81-89.
The differences between manual and automatic error seeding for compiler
testing are examined for two very different languages, Pascal and
Prolog.
Two conclusions:
1.
Automatic error seeding is more effective in languages
with strong syntactic structure than in more "loose" languages.
The latter require stronger manual steering of the error seeding
process.
2.
All methods need some measure of manual intervention.
J. Angel Velázquez Iturbide,
Formalization of the control stack,
ACM SIGPLAN Notices,
vol. 24,
#3,
pp. 46-54.
March 1989,
The author points out that the control stack is not a stack at all but
rather a variable-size array of which only a small and bounded number of
elements are accessible at any time (bounded by the maximum lexical
depth of the program).
Therefore the usual axioms for stacks do not apply.
The activation record is considered as an ADT and a machine-independent
implementation is given.
There are also useful and clear pictures of control stacks with
accessible and inaccessible activation records depicted as toy blocks
shifted front and back.
Mahadevan Ganapathi,
Geoffrey O. Mendal,
Issues in Ada compiler technology,
IEEE Computer,
vol. 22,
#2,
pp. 52-60.
Feb. 1989,
Some of the issues are (there are many more in the paper):
1. Constraint check optimization is a must.
2. Code reordering is highly desirable, but "a myriad of special cases"
in the Reference Manual effectively prevents it.
3. Exception handlers hinder optimizations, since values must be kept
for use in exception handlers that could otherwise be discarded.
4. The rules for parameter passing cause a lot of bothersome aliasing
(and passing everything by copy-restore is not an option either).
The authors also complain about the vagueness of the semantics of
tasking and "its intrinsic requirements of substantial processing time",
but their complaints are equally vague.
P. Steenkiste,
J.L. Hennessy,
A simple interprocedural register allocation algorithm and its effectiveness for LISP,
ACM TOPLAS,
vol. 11,
#1,
Jan. 1989,
pp. 1-32.
The paper starts with a very clear exposition of interprocedural
register allocation, both in general, historical and modern, and more in
particular in a system with many small and often recursive procedures
and lots of registers.
The infeasibility of constructing and colouring a complete interference
graph for the entire program is emphasized.
The algorithm works as follows.
First, traditional register allocation by graph colouring is done on all
procedures; it is assumed that no spills will occur in this phase (or we
will not have enough registers to do interprocedural register allocation
anyway).
This tells us how many registers each procedure needs.
Next the call graph is constructed and strongly connected components
(consisting of mutually recursive procedures) are identified.
Each strongly connected component is assigned the maximum number of
registers any of its members needs.
This number is sufficient if we impose on each of the members the
obligation to save its own registers when it calls any of the other
members.
Now we have a DAG containing both procedures and strongly connected
components, but in both cases we know how many registers the element
needs.
Registers are now handed out bottom-up in this DAG.
Say an element i uses $ t sub i $ registers directly or indirectly; it
will then use the registers $ R sub 1 ... R sub t sub i $.
This $ t sub i $ is the sum of $ n sub i $, the number of registers i
itself uses, and the maximum of the $ t sub j $ over all children
(callees) of i.
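A small sketch in C of this bottom-up numbering over the call DAG
(made-up data, strongly connected components assumed already collapsed):
t_i is computed as described, and procedure i itself then presumably
occupies the registers just above those of its callees.

    #include <stdio.h>

    #define NPROC 4

    /* Hypothetical call DAG: proc 0 calls 1 and 2, proc 1 calls 2,
       proc 2 calls 3, proc 3 is a leaf.  n[i] is the number of registers
       procedure i needs itself (found by ordinary per-procedure colouring). */
    int n[NPROC] = { 3, 2, 4, 1 };
    int callees[NPROC][NPROC] = {
        { 1, 2, -1 },
        { 2, -1 },
        { 3, -1 },
        { -1 },
    };
    int t[NPROC];

    int total(int i) {                 /* t_i = n_i + max over callees of t_j */
        if (t[i]) return t[i];
        int maxchild = 0;
        for (int k = 0; callees[i][k] >= 0; k++) {
            int tj = total(callees[i][k]);
            if (tj > maxchild) maxchild = tj;
        }
        return t[i] = n[i] + maxchild;
    }

    int main(void) {
        for (int i = 0; i < NPROC; i++) {
            int ti = total(i);
            printf("proc %d: own registers R%d..R%d (t=%d)\n",
                   i, ti - n[i] + 1, ti, ti);
        }
        return 0;
    }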
This means that there may not be enough registers for the top level
procedures, but they are active only a small part of the time anyway.
The result is an average speed-up of about 10 percent.
Arthur B. Pyster,
Compiler Design and Construction: Tools and Techniques with C and Pascal,
(2nd ed.),
Van Nostrand Reinhold,
New York,
1988,
pp. 267.
[How does it compare to the 1st edition?]
[Not available in Dutch university libraries.]
Gabriela O. de Vivo <gdevivo@dino.conicit.ve> writes: The book covers the
general principles of compiler design and presents a good number of
examples focusing on the building of pieceparts of compilers for C and
Pascal. The implementation (construction) language is C. Note that this
edition (in contrast with the previous one) is very related to the Unix
world, including the use of tools like Lex, Yacc, and standard utilities.
(Out of print.)
Charles N. Fischer,
Richard J. LeBlanc,
Crafting A Compiler,
Benjamin Cummings Publishing,
Menlo Park, CA,
1988,
Highly readable though somewhat repetitive book by two experts in the
field.
D.M. Dhamdhere,
Register assignment using code placement techniques,
Comput. Lang.,
vol. 13,
#2,
pp. 75-93.
1988,
As an alternative to register allocation oriented code generation,
the author starts from unoptimized code in which the registers for each
operation are loaded on the spot and the result is stored right away.
This code is optimized by moving the loads and stores up and down the
control path until they meet with and annihilate other stores and loads.
Lots of theory, and there is probably a real algorithm in there
somewhere.
J.S. Yates,
R.A. Schwatz,
Dynamic programming and industrial-strength instruction selection: code generation by tiring but not exhaustive search,
ACM SIGPLAN Notices,
vol. 23,
#10,
Oct. 1988,
pp. 131-140.
In the dynamic programming code generation as described by Aho, Sethi
and Ullman, the code for each node returns the result of the expression
(in a register or in memory).
This is extended in the sense that the code can also return the result
except for a deferred operation; for example, the result may be in
register 1 but still has to be shifted left over 2 positions or so.
An attempt is made to curb combinatorial explosion by sending down
inherited attributes telling which operations might be useful.
Excellent bibliography on Graham-Glanville code generation.
Peter F. Lemkin,
PSAIL: a portable SAIL to C compiler -- description and tutorial,
ACM SIGPLAN Notices,
vol. 23,
#10,
Oct. 1988,
pp. 149-171.
Example of fairly massive compilation with C as a target language.
C. Genillard,
A. Strohmeier,
GRAMOL -- a grammar description language for lexical and syntactic parsers,
ACM SIGPLAN Notices,
vol. 23,
#10,
Oct. 1988,
pp. 103-122.
A manual rather than a description or rationale.
The lexical analyser is restricted to programming language tokens.
Remarkably, the description of the lexical analyser takes 7 pages and
that of the syntactic analyser 2.5 pages.
The paper is sometimes singularly uninformative: the form of the
resulting parsing is not explained nor how it can be obtained; and the
paper explicitly refuses to reveal whether a top-down or bottom-up
parser is generated (either, depending on the grammar perhaps?).
Peter Schnorf,
Dynamic instantiation and configuration of functionally extended efficient lexical analyser,
ACM SIGPLAN Notices,
vol. 23,
#10,
Oct. 1988,
pp. 93-102.
LEXXO, a fixed lexical analyser, parameterized with such things as the
set of characters that start an identifier, etc.
Jack W. Davidson,
Anne M. Holler,
A study of a C function inliner,
SP&E,
vol. 18,
#8,
Aug. 1988,
pp. 775-790.
Describes INLINER, a source-to-source inliner for C.
Several inlining problems were encountered: actual-formal mismatch,
varargs, etc.
Many details are given in narrative form, for example concerning the
renaming of labels; no (pseudo-)algorithms.
A call graph is constructed as far as possible; recursion cycles are
broken arbitrarily.
The routines are inlined according to a backward topological sort.
On the average, a 12 percent speed-up was obtained at the expense of a
24 percent increase in code size.
The conflicts between inlining and debugging / symbolic debuggers are
noted.
H. Mössenböck,
A convenient way to incorporate semantic actions in two-pass compiling schemes,
Software -- Practice & Experience,
vol. 18,
#7,
July 1988,
pp. 691-700.
First a parse tree of a program is constructed using a strong parser
(LR(1)) and then convenient LL(1)-based semantic analysis is done guided
by that parse tree.
Christopher W. Fraser,
Alan L. Wendt,
Automatic generation of fast optimizing code generators,
in ACM SIGPLAN '88 Conference on Programming Language Design and Implementation,
ACM SIGPLAN Notices,
vol. 23,
#7,
July 1988,
pp. 79-84.
First, the compiler writer prepares two tables, one describing a naive
code generator (for a given intermediate language) and one describing a
peephole optimizer for it.
These are then combined into a slow, effective optimizing code generator
$CG sub 1$, which does optimization through classical
techniques.
Now, $CG sub 1$ is used to compile and optimize program segments from a
test bed and all optimizations found are recorded, including the
conditions that led to their discovery.
These results, together with the naive code generation table are used to
construct a second code generator $CG sub 2$, which contains all handling
of the discovered optimizations as hard code: the code generator has
"learned" the optimizations by heart.
Paul E. Cohen,
An abundance of registers,
ACM SIGPLAN Notices,
vol. 23,
#6,
pp. 24-34.
June 1988,
Points out that even if there are registers enough, it is not wise to
just use them liberally, since having many live registers at procedure
calls still causes trouble: either they have to be saved and restored
upon return or the callee (and its sub-callees!) have to be careful not
to touch them.
The NEC V60 and V70 microprocessors have 32 registers of 32 bits, and
subroutine call and return instructions that accept dynamic masks
indicating which registers are in use at the call point and which are
going to be used by the callee; ANDing these masks gives the registers
that have to be saved, which the hardware then duly does.
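The mask arithmetic itself is trivial; a sketch in C with made-up masks
(on the V60/V70 the hardware does this as part of the call instruction):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t in_use_at_call = 0x0000F0FF;  /* registers live in the caller */
        uint32_t used_by_callee = 0x0000FF00;  /* registers the callee will touch */
        uint32_t must_save = in_use_at_call & used_by_callee;
        printf("registers to save across the call: %08x\n", (unsigned) must_save);
        return 0;
    }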
The author then analyses this situation statistically, for various
assumptions, and runs tests on a real-world program.
One conclusion is that it is probably wise to save all registers before
calling a routine that is called only once, to allow as many registers
as possible to the calling tree under that routine.
Kai Koskimies,
Otto Nurmi,
Jukka Paakki,
Seppo Sippu,
The design of a language processor generator,
Softw. Pract. Exper.,
vol. 18,
#2,
pp. 107-135.
Feb. 1988,
For generality, the authors prefer "language processor generator" over
"compiler writing system".
The paper describes HLP84 (Helsinki Language Processor 84), the design
of which was for a large part driven by the shortcomings of HLP78:
1. LALR(1) is not convenient enough;
2. pure attribute grammar is too primitive;
3. most practical attribute grammars are one-pass, but this is not
exploited in HLP78.
HLP84 is an open system, i.e. it is a set of interfaces: parsers / code
generators / preprocessors / etc. can be added at will.
Lots of tools are supplied.
One essential level in the interfaces is the block (as in a structured
programming language).
Processing is one-pass.
Although emission of semantic errors is largely automatic, measures are
taken to prevent semantic errors from propagating.
Attempts to design a general scope-managing system led to difficulties,
but some tools are available.
The system produces language processors in Pascal; it consists of ~ 4400
lines of HLP84 code, yielding ~ 6400 lines of Pascal.
The code is described in some detail.
M. Ganapathi,
C.N. Fischer,
Integrating code generation and peephole optimization,
Acta Inf.,
vol. 25,
Jan. 1988,
pp. 85-109.
The paper describes attribute-based code generation and attribute-based
peephole optimization, but in spite of the title the integration of both
is not explained, as far as I can see.
The attribute-based code generation rules record the semantics of the
instructions issued in attributes rather than generating code.
The attribute-based peephole optimization rules specify code for
specific combinations of nodes in the parse tree, subject to conditions
on the attributes of these nodes.
Apparently there is some bottom-up tree recognizer active that signals
the applicability of a peephole optimization rule and then tests the
attributes to see if the peephole optimization rule really applies.
Lots of (58) literature references.
Dick Grune,
Ceriel J.H. Jacobs,
A programmer-friendly LL(1) parser generator,
Software -- Practice and Experience,
vol. 18,
#1,
pp. 29-38.
Jan. 1988,
Presents a practical ELL(1) parser generator, called LLgen,
that generates fast error correcting recursive descent parsers.
In addition to the error correction,
LLgen features static as well as dynamic conflict resolvers
and a separate compilation facility.
The grammar can be viewed as a program, allowing for a natural
positioning of semantic actions.
Christopher Warwick Fraser,
Automatic Generation of Code Generators,
PhD Thesis,
Yale University,
Dec. 1987,
pp. 87.
Describes an automatic code generator generator which is instantiated
with a machine- and language-independent code generation knowledge base
XGEN and an ISP machine description of machine M to create a code
generator for M.
This code generator converts the intermediate language XL into code for
M.
XGEN contains general rules such as "if the XL form contains indirect
addressing of the form such and such and the machine has such
addressing, then generate the indirect addressing code, else replace the
XL form by such and so".
The replacement(s) will then again be subject to analysis with XGEN
rules.
ISP is a machine description language similar to the one in the PDP-11
manual.
XL is a fairly high-level intermediate code; too low a level would
preempt valuable optimizations.
Its main ingredient is three-address code in which the addresses can be
relatively complicated and can feature indirect addressing.
Also, it has a fixed sequence of pseudo-instructions to represent a loop.
The thesis is remarkably uninformative in that it does not describe the
matching algorithm that interprets XGEN; only examples are given, and
they are none too clear either.
J.W. Davidson,
C.W. Fraser,
Automatic inference and fast interpretation of peephole optimization rules,
Softw. Pract. Exper.,
vol. 17,
#11,
Nov. 1987,
pp. 801-812.
See also
Davidson & Fraser,
"Automatic generation of peephole optimizations",
ACM SIGPLAN Notices, Vol. 19, No. 6, June 1984, pp. 111-116.
The present paper tells us more on the peephole optimizer PO, for which
see ACM TOPLAS April 1980, by the same authors.
Rather than including all instruction pairs that could conceivably be
replaced by an optimizing replacement, only those combinations are
considered that occur in actual programs.
To this end, a test set of programs is assembled, translated naively and
the result analysed for instruction pairs.
The compile-time pattern matching is done by renormalizing two
instructions in the input stream and looking the result up by hashing;
details are given.
The inference mechanism is the same as in their 1984 paper mentioned
above.
An evaluation of PO, Robert Kessler's Peep, and Peter Kessler's XXXX????
shows that all three systems yield roughly the same quality of code.
It appears that combining three instructions is enough; combinations of
four or more are not useful.
A comparison with EM, the abstract machine of the Amsterdam Compiler
Kit, showed that the automatically generated set of peephole
optimizations rules performed about 85 percent of the optimizations that
the hand-written EM peephole optimizations rules did.
Michael Sonnenschein,
Graph translation schemes to generate compiler parts,
ACM TOPLAS,
vol. 9,
#4,
Oct. 1987,
pp. 473-490.
During attribute evaluation in an AST, it is sometimes useful to have
attributes which are pointers to syntax tree nodes.
This is barely possible inside the AG framework, and if these pointers
are treated on a par with the syntax tree pointers resulting from parsing,
the AST is no longer a tree but a graph, and possibly cyclic at that.
The "graph" in the title is an ASG, an Abstract Syntax Graph, and the
described language GTS is an attribute language for ASGs.
GTS is also capable of modifying the ASG, which is essential for
optimization.
To facilitate the use of some iterative algorithms, it includes
iteration in addition to the usual attribute evaluation rules.
It is translated to a set of recursive procedures, using Katayama's
scheme.
The paper is quite theoretical and heavy reading.
Its germanisms do not help either (for example p. 747-11:
"for the destination of different information on ..." =
"für die Bestimmung unterschiedlicher Information über ..." =
"for the determination of different data on ...").
N. Wirth,
Hardware architectures for programming languages and programming languages for hardware architectures,
ACM SIGPLAN Notices,
vol. 22,
#10,
Oct. 1987,
pp. 2-8.
Compares the 1960-1970 software crisis to the 1980 hardware crisis.
Suggests a calculus for hardware correctness.
H. Massalin,
Superoptimizer -- A look at the smallest program,
ACM SIGPLAN Notices,
vol. 22,
#10,
Oct. 1987,
pp. 122-126.
The absolutely optimal code sequence for some very simple but useful
functions is found by exhaustive search.
Functions include sign(x), max(a,b), min(a,b) and obtaining boolean
values for a == 0 and a == b.
The universe of the exhaustive search is a set of hand-picked
instructions that do arithmetic and logical operations on registers;
jumps are explicitly excluded.
The search tree is pruned by recognizing repeated or zero-result
instruction combinations.
Each prospective solution is tested by two sets of spot tests for the
argument values; if the solution survives these tests, it is checked
manually and in practice the solution is always found to be OK.
The superoptimizer can test 50000 solutions per second, and finds the
optimum code in a few days.
It tends to find astounding and very "clever" code sequences.
R.C. Holt,
Data descriptors -- A compile-time model of data and addressing,
ACM TOPLAS,
vol. 9,
#3,
July 1987,
pp. 367-389.
A uniform descriptor for data (in memory, in a register, constant) is
proposed for use in a compiler.
The form is $ @ sup k b dot d dot i $, in which
$ b $ is the base: a register $ R sub n $, a scope level $ L sub n $ or
$ null $ for constants;
$ d $ is a signed integer displacement;
$ i $ is an index register $ I sub m $ or $ null $.
The sum of the above forms the numeral of the data descriptor.
Its value is the numeral, k times dereferenced, with $ k >= 0 $.
The use of these data descriptors for expressing values and addresses in
expressions in various programming languages are described, as are
mappings onto many machine addressing schemes.
One example is that the C & pseudo-operator corresponds to
$ @ sup -1 $.
Next, rules for addition, subtraction, and multiplication by a constant
are given.
It is shown that these can be used to generate good code for compound
array indexing, for example for a.b[i].c[j] := false.
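A minimal sketch of such a descriptor as a compile-time record in C
(field names are mine, not Holt's); the rule for the C & operator then
simply lowers k by one.

    #include <stdio.h>

    /* A data descriptor @^k b.d.i: the numeral is base + displacement + index,
       and the value is that numeral dereferenced k times (k may become -1,
       which is how the C address-of operator is modelled). */
    enum base_kind { B_NULL, B_REGISTER, B_LEXICAL_LEVEL };

    struct descriptor {
        int k;                  /* number of dereferences */
        enum base_kind base;
        int base_no;            /* register number or lexical level */
        long displacement;      /* signed */
        int index_reg;          /* index register number, or -1 for null */
    };

    /* The C '&' pseudo-operator: one dereference fewer (@^-1 applied). */
    struct descriptor address_of(struct descriptor d) { d.k -= 1; return d; }

    int main(void) {
        /* a variable at displacement 12 on lexical level 2: @^1 L2.12.null */
        struct descriptor x = { 1, B_LEXICAL_LEVEL, 2, 12, -1 };
        struct descriptor px = address_of(x);      /* @^0 L2.12.null: its address */
        printf("&x: k=%d, L%d.%ld\n", px.k, px.base_no, px.displacement);
        return 0;
    }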
Jeanne Ferrante,
Karl J. Ottenstein,
Joe D. Warren,
The program dependence graph and its use in optimization,
ACM TOPLAS,
vol. 9,
#3,
July 1987,
pp. 319-349.
[ Heavy graph theory ]
F. Warren Burton,
Dieter Maurer,
Hans-Georg Oberhauser,
Reinhard Wilhelm,
A space-efficient optimization of call-by-need,
IEEE Trans. Softw. Eng.,
vol. SE-13,
#6,
pp. 636-642.
June 1987,
Call-by-need is lazy evaluation.
The paper explains strictness analysis, and gives many examples where it
does a lot of good, and a few where it is not optimal.
Steven P. Reiss,
Automatic compiler production -- The front end,
IEEE Trans. Softw. Eng.,
vol. SE-13,
#6,
pp. 609-629.
June 1987,
[ about ACORN ]
Andrew W. Appel,
Kenneth J. Supowit,
Generalizations of the Sethi-Ullman algorithm for register allocation,
Softw. Pract. Exper.,
vol. 17,
#6,
June 1987,
pp. 417-421.
Extends the Sethi-Ullman algorithm for register allocation
(Sethi & Ullman, The generation of optimal code for arithmetic expressions,
J. ACM, Vol. 17, No. 4, pp. 715-728, April 1970) for operations that
have more than 2 inputs and/or more than 1 output, by sorting the
subtrees in decreasing order for $ a sub n - b sub n $, where
$ a sub n $ is the number of registers required to evaluate node n and
$ b sub n $ is the number of registers required by the result of node
n.
R. Nigel Horspool,
An alternative to the Graham-Glanville code generation scheme,
IEEE Software,
vol. 4,
#3,
pp. 33-39.
May 1987,
A simple bottom-up dynamic programming scan assigns the best storage
class to each node (numeric register, address register, even-numbered
register, etc.).
That done, a second bottom-up scan matches the patterns to instructions.
Accidental mismatch is solved by emitting extra code.
Comparisons with Graham-Glanville (can get stuck and the author's
cannot), and with Aho-Johnson (J. ACM, July 1976) (simpler and can
handle even/odd registers) are made.
Dick Grune,
How to compare the incomparable,
Information Processing Letters,
vol. 24,
pp. 177-181.
1987,
Suppose you have to choose the larger of
($x sub 1$, $y sub 1$)
and
($x sub 2$, $y sub 2$),
when you know nothing about x and y.
Choose the one for which $ x * y $ is larger.
The paper shows that this is the only criterion that is invariant under
a variety of transformations.
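In code the criterion is a one-liner (a sketch, assuming positive x and y):

    #include <stdio.h>

    /* Returns 1 if pair (x1,y1) should be considered larger than (x2,y2),
       using the product criterion from the paper (x and y assumed positive). */
    int larger(double x1, double y1, double x2, double y2) {
        return x1 * y1 > x2 * y2;
    }

    int main(void) {
        printf("%d\n", larger(3.0, 2.0, 1.0, 5.0));   /* 6 > 5, so 1 */
        return 0;
    }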
Jim Welsh,
Atholl Hay,
A Model Implementation of Standard Pascal,
Prentice-Hall,
1986,
pp. ???.
Joe Snyder <joe@semaphorecorp.com> writes: This book is only really useful
if you need to implement a COMPLETE version of a platform-independent
Pascal, but I find it interesting because the 483 pages consist entirely
of the source code listing for the compiler and p-code interpreter (both
written in Pascal itself), including copious {comments} to explain the
code. The code eagerly delves into the horrible minutiae necessary when
implementing a complete language, and proves that no language designer
should be allowed to present his design until AFTER being forced to write
a complete compiler for the language.
John R. Ellis,
Bulldog -- A Compiler for VLIW Architectures,
MIT Press,
Cambridge, Ma.,
1986,
pp. 320.
The Bulldog compiler is a very sophisticated compiler for a very
specific problem: exploiting the parallelism in a VLIW machine for
scientific programs in Fortran.
One of the premises is that the VLIW machine can be designed hand in
hand with the compiler.
The compiler generates code for an imaginary machine, ELI, which stands
for "Enormously Long Instructions".
A simulator for ELI exists.
The compiler is based on three techniques:
trace scheduling,
memory-reference disambiguation, and
memory bank disambiguation.
The VLIW machine has a number of specific properties (aside
from having Very Long Instruction Words); the most important ones
follow.
All binary operations are symmetrical: if, for example, there is a
machine instruction to compute $a-b$, there is another one that
computes $b-a$.
Instructions like incr and dec are replaced by a constant
generator unit, which, when activated, will produce the desired constant
operand.
There are no "hot spots" (latches, hidden registers): all operations
read from registers and write to registers; and there are enough of
these.
Memory is interlaced in memory banks; a memory location can be addressed
efficiently as location N in memory bank M, and less efficiently
just by its address.
Operations can be multi-cycle (they deliver their results only after
more than one cycle); the resources such an operation needs must be tied
up explicitly in the following instructions, until the result can be
written to a register.
A "trace" is a list of basic blocks that are connected by a possible
flow of control path in the original flow graph.
The original flow graph is divided up into traces; each basic block is
part of exactly one trace.
Traces are ordered for "importance".
There can be jumps from somewhere in the middle of a trace to the start
of a less important trace and from the end of a trace to somewhere in the
middle of a more important trace.
No other jumps are allowed; more specifically, there cannot be a jump
from a trace to itself.
It is the task of the trace scheduler to create and order the traces in
such a way that a more important trace is likely to be executed more often in
a run of the program than a less important trace.
To this end, the trace scheduler uses heuristics, user information and
a profiler to estimate the importance of each basic block.
The most important basic block is then selected as the "seed" for the
most important trace.
This trace seed is then extended on both ends, using the estimated
importance of other basic blocks, until the process stops either at one
of the ends of the function or whenever an illegal jump would be
created.
This process is then repeated to find the next most important trace,
etc.
The most important trace is compiled first, and optimized at the
expense of less important traces; if code inside the trace is moved past
jumps off the trace or into the trace, this code is copied to the other
(lesser) traces, as needed.
Memory references are disambiguated by constructing a symbolic
expression for each memory reference in a given trace.
The expressions are AND-OR formulas of actual reference expressions; the
latter can be composed of additions and multiplications only, and
multiplication can be by a constant only.
To see if two memory references $ e sub 1 $ and $ e sub 2 $
could ever refer to the same location, the equation
$ e sub 1 - e sub 2 = 0 $ is constructed.
This equation is resolved into a number of Diophantine equations, which
can be solved easily.
There are three possible answers:
$ e sub 1 $ and $ e sub 2 $ always refer to the same location,
$ e sub 1 $ and $ e sub 2 $ never refer to the same location,
and "don't know".
If an equation cannot be constructed (division, multiplication by a
variable, etc.), the compiler assumes the worst ("don't know").
All this is effective because index expressions are usually linear.
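A sketch in C of the simplest case (single loop variables and linear
index expressions of the form a*i + c; these restrictions are mine, the
book handles general linear forms): the references a1*i + c1 and
a2*j + c2 can collide only if the Diophantine equation
a1*i - a2*j = c2 - c1 has an integer solution, i.e. only if gcd(a1,a2)
divides c2 - c1.

    #include <stdio.h>
    #include <stdlib.h>

    static long gcd(long a, long b) {
        a = labs(a); b = labs(b);
        while (b) { long t = a % b; a = b; b = t; }
        return a;
    }

    /* May the references A[a1*i + c1] and A[a2*j + c2] (i, j arbitrary
       integers) ever touch the same element?  0 means provably never;
       1 corresponds to the book's "always" or "don't know". */
    int may_alias(long a1, long c1, long a2, long c2) {
        long g = gcd(a1, a2), rhs = c2 - c1;
        if (g == 0) return rhs == 0;          /* both constant: equal or not */
        return rhs % g == 0;
    }

    int main(void) {
        printf("%d\n", may_alias(2, 0, 2, 1));   /* even vs. odd indices: 0 */
        printf("%d\n", may_alias(4, 1, 6, 3));   /* gcd 2 divides 2: 1 (maybe) */
        return 0;
    }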
Similar techniques are used to do memory bank disambiguation.
Many optimizations are described for each of the three techniques.
The second half of the book concerns the code generation proper.
Code generation starts with live analysis on the variables in the basic
blocks in a function; this analysis determines which variables are live
at the beginning and end of each basic block.
Next the trace scheduler groups the basic blocks into traces, with
priorities.
The live information of the blocks in a trace is merged to give live
information about the trace itself.
At this moment the live information does not yet include location
information, except perhaps for the input and output parameters.
Starting with the most important one, code is then generated for each
trace.
If this trace connects to a trace that was already processed, the
location requirements of that trace are obeyed, otherwise
the code generation process determines the locations in which live input
variables are expected from other traces and live output variables are
to be expected by other traces.
These data are then passed on to the traces to which the present trace is
connected and which have not yet been processed.
The information translates there into DEF and USE nodes at the
beginning and the end, respectively.
In this way allocation decisions in more important traces are given
absolute priority over those in less important traces.
The code generator applies many of the usual Aho-Sethi-Ullman
optimizations to each trace, including loop unrolling.
If these result in code motion and if this motion is past jumps off the
trace and into the trace, code is copied to the other traces.
This makes the other traces less efficient, but that's OK since they are
less important anyway.
The next task is assigning functional units and filling the instruction
words.
This task is performed by a recursive top-down scan that propagates
information from the roots of the DAG downwards; when the scan reaches
the bottom, it collects the information found there and on this basis
assigns functional units and fills instruction words, using a greedy
algorithm.
This phase is called BUG, for Bottom-Up Greedy.
Register allocation is trivial by design: the ELI machine has enough
registers for all functional units.
The Bulldog compiler is written in ELISP, the LISP dialect for the
DEC-20.
The source program, in Fortran, is converted to Lisp syntax, called
Tinylisp, using a variant of the UNIX Fortran structuring program
struct.
The output is object code, to be run on the ELI interpreter.
The compiler achieves speed-ups between 0.9 and 7.6 over the sequential
case, over a mix of scientific Fortran programs.
A recurring conclusion in the book is that simple heuristics help, but
that once these are in place, more complicated heuristics help less.
The book is written in an easy-going style, features many diagrams and
hardly any math, and is much less difficult than it could have been,
given the subject.
The many personal and entertaining comments add to the book's
attractiveness.
Ralph E. Griswold,
Madge T. Griswold,
The Implementation of the Icon Programming Language,
Princeton University Press,
Princeton, N.J.,
1986,
pp. 336.
Detailed and possibly exhaustive explanation of the internal workings of
the Icon interpreter, with many examples, diagrams and occasional but
not too many chunks of C code.
On the one hand it reads as an extensive technical report on a
non-trivial interpreter, on the other it features exercises, which
suggest textbook use.
It seems quite usable as a textbook in a course on interpreter writing;
its restriction to Icon does not prevent a host of interpreter problems
from appearing and being solved.
The entire backtracking mechanism, including generators and
co-expressions, is explained by example and by C code, but a few catchy
invariants would have been helpful.
The implementation of generators is unexpected, if I understand it
correctly.
The scope of a generator is bounded by the expression it occurs in;
exactly what constitutes a bounded expression is defined in the language.
The beginning of the bounded expression is marked on the stack; the mark
includes a failure label.
When a generator occurs, it is called in the usual way; it is
implemented as a C routine.
When it suspends, the C routine does not return, since that would
involve loss of its local variables, but calls the interpreter
recursively instead, with special parameters.
At that moment the stack contains (counting from the top) some data for
calls and expressions C and an expression bound marker M.
The recursive call stacks a generator marker G, then stacks a copy of
the section C and stacks one value of the generator on top.
This simulates the situation in which there is no generator at all and
just one value is supplied.
The code in the stacked copy C then runs its course (possibly involving
nesting generators), and the recursive call to the interpreter returns
with success or failure.
If it fails the generator is resumed, which may of course again fail,
but which normally results in a repetition of the scenario described
above with the next value of the generator.
If it succeeds, the bounded expression terminates; both the interpreter
stack and the C stack have to be cleared of any stuff pertaining to it.
Clearing the interpreter stack is easy: M is unstacked, and together
with it everything on top of it.
But if the discarded part of the interpreter stack contained G markers
for non-exhausted generators, the C stack will contain activation
records for these and they will never be resumed.
To solve this, the recursive depth of the interpreter is recorded and
any C routine that finds itself on a depth from which the interpreter
has already returned, itself returns immediately.
The interpreter features a mark-scan-compactify garbage collector, with
a number of peculiarities.
The mark phase is recursive and hopes that the C stack will suffice; it
usually does.
Since Icon is not a typed language, there can be no templates and
pointers have to be identified by being implemented as self-identifying.
Problems with hidden and invisible pointers are avoided by moving all
memory requests, and thus all potential calls to the garbage collector,
to safe places, by "predictive need".
There are some problems with pointers to possibly overlapping
substrings; the compaction phase does the right thing and keeps only the
accessible "clumps" as they are called here, at the expense of some
sorting of pointers.
Some additional technical points:
--
Totally different data structures have to be prepared for lvalues and
rvalues (called here "occurrences in assignment context" and
"occurrences in dereferencing context") that look the same in Icon, for
example for table elements: t[12] may result in a value or in a
pointer to an assignable position.
If we need an rvalue and happen to have an lvalue, for example as a
function argument, it can be dereferenced in place.
--
The interpreter has an interpret stack on which virtual Icon machine
instructions operate; these instructions are programmed as C functions
and are issued in principle by a fetch-and-interpret loop, except that
the suspend operation and any other operation that does suspend
operations internally (for example toby) calls the interpreter
recursively, to aid in backtracking.
--
The problem with co-expressions is that they require a new interpreter
stack, which is easy, and a new C stack, which is hard.
This is a consequence of having the interpreter call itself to do
backtracking.
William A. Barrett,
Rodney M. Bates,
David A. Gustafson,
John D. Couch,
Compiler Construction: Theory and Practice,
Science Research Associates,
Chicago,
1986,
pp. 504.
A considerable part (about 50 percent) of the book is concerned with
parsing;
formal language theory, finite-state automata, top-down and bottom-up
parsing and error recovery are covered in very readable chapters.
Only those theorems are treated that relate directly to actual parsing;
proofs are quite understandable.
The book ends with an annotated bibliography of almost 200 entries, on
parsing and other aspects of compiler construction.
Patrick D. Terry,
Programming Language Translation -- A Practical Approach,
Addison-Wesley,
Wokingham, England,
1986,
pp. 443.
Essentially a series of case studies with introduction, concerning an
interpreter, an assembler, a parser (LL(1) only), a toy compiler, block
structure, semaphores, modules and monitors. The language is Pascal.
Covers the bare essentials of a wide range of subjects, in an easy-going
style.
A.V. Aho,
R. Sethi,
J.D. Ullman,
Compilers: Principles, Techniques and Tools,
Addison-Wesley,
Reading, Mass.,
1986,
pp. 796.
The "Red Dragon Book".
Excellent, UNIX-oriented treatment of compiler construction.
Even treatment of the various aspects.
David A. Padua,
Michael J. Wolfe,
Advanced compiler optimization for supercomputers,
Commun. ACM,
vol. 29,
#12,
Dec. 1986,
pp. 1184-1201.
For parallel supercomputers.
Yuh-Huei Shyu,
From semi-syntactic analyzer to a new compiler model,
ACM SIGPLAN Notices,
vol. 21,
#12,
Dec. 1986,
pp. 149-157.
Many so-called context-free syntax rules are actually regular
expressions over (regular) tokens, and thus themselves regular.
In this none too clear paper, the author examines the consequences of
considering all non-recursive leaves of the grammar as tokens.
H. Mössenböck,
Alex -- a simple and efficient scanner generator,
ACM SIGPLAN Notices,
vol. 21,
#12,
Dec. 1986,
pp. 139-148.
Lexical scanner to the Coco system, generating Modula-2 code.
The input language Alexis consists of marked sections of token classes,
for example CHARACTER SETS, KEYWORDS, COMMENTS, etc., of
which Alex understands the implications.
Within each section, tokens can be defined in a section-determined
format.
Usually the definition consists of the token number to be
returned, followed by =, followed by an EBNF expression.
There is no action code.
Nesting definitions are allowed in the COMMENTS section.
The calling mechanism seems awkward: to obtain a token the generated
analyser has to be called twice.
The first call returns the token number as defined, and the second call
returns an index in a hash table of an entry that contains the text of
the token.
The implementation is not described in detail and leaves questions.
1. The implementation of the nesting tokens is not explained.
2. It is stated that Alex disambiguates by taking the longest match,
which requires unbounded look-ahead, but it is also stated that there is
no or only very limited look-ahead.
Dick Grune,
Generic packages in C,
ACM SIGPLAN Notices,
vol. 21,
#9,
Aug. 1986,
pp. 31-39.
Shows how to fake generics in C, using the preprocessor.
Valentin F. Turchin,
The concept of a supercompiler,
ACM TOPLAS,
vol. 8,
#3,
July 1986,
pp. 292-325.
[[ preliminary abstract ]]
[A combination of partial evaluation, lazy evaluation, generalization, etc.]
Walter F. Tichy,
Smart recompilation,
ACM Trans. Programming Languages and Systems,
vol. 8,
#3,
pp. 273-291.
July 1986,
Editing a source file A produces a change set, a specification of the
modifications in terms of the language (rather than in terms of the
text); example: the declaration of X has changed.
Compiling a source file B produces a reference set, a specification of
the items B used from its environment, in terms of the language.
If B depends on A, B is recompiled only if the intersection of the
change set of A and the reference set of B is non-empty.
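In essence (a sketch with made-up item names; the paper defines change
and reference sets in terms of the language's declared entities):

    #include <stdio.h>
    #include <string.h>

    /* Items are identified here simply by name; in the paper they are
       language-level entities such as "the declaration of X". */
    static const char *change_set_A[]    = { "X", "max_size", NULL };
    static const char *reference_set_B[] = { "Y", "open_file", "put_line", NULL };

    /* B must be recompiled only if it references something that changed in A. */
    int must_recompile(const char **changed, const char **referenced) {
        for (int i = 0; changed[i]; i++)
            for (int j = 0; referenced[j]; j++)
                if (strcmp(changed[i], referenced[j]) == 0)
                    return 1;
        return 0;
    }

    int main(void) {
        /* B depends on A textually, but touches nothing that changed: 0. */
        printf("recompile B: %d\n", must_recompile(change_set_A, reference_set_B));
        return 0;
    }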
David W. Wall,
Global register allocation at link time,
in ACM SIGPLAN '86 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 21,
#7,
July 1986,
pp. 264-275.
The arguments for doing global register allocation at link time are
1. that that is the first moment that all information is in,
including that from libraries, and
2. that it allows global variables to be put in registers.
The naive approach of delaying almost all actions to link time was found
to be too slow and too inflexible.
In the improved approach, the code generator generates code that works
correctly and that can be used without further register allocation at
link time.
The only registers it uses (out of the 64 registers the target machine
has) are 8 registers for temporaries in expressions.
In addition to the working code, the code generator supplies one or more
annotations with each instruction; an example of such an annotation is
"remove this instruction if variable y is in a register".
The code generator also supplies partial call graphs with call
frequencies.
When link time optimization is desired, the call graphs of all modules
involved are combined into an over-all call graph.
Recursive procedures are found and removed from the graph as "difficult
cases".
This leaves an acyclic graph, which is augmented with usage frequency
estimates.
One observes that none of the leaf procedures can be active
simultaneously, so they can all share the same registers for their
variables.
Once these have been handed out, the same argument applies to the next
level, etc.
If there are not enough registers, the frequency counts are used to make
the tough decisions.
A more complicated scheme has been followed for recursive procedures, in
which one procedure in a cycle gets the obligation to save ("archive")
all the registers on the cycle.
Once the register assignments have been performed, the object code can be
edited and cleaned up; this involves address recalculations.
The author reports that global link-time register allocation obtains
larger speed improvements than global compile-time register allocation,
with or without graph colouring, the ratio being about 15 percent to 11
percent to 10 percent
(all percentages compared to local compile-time register allocation).
James R. Larus,
Paul N. Hilfinger,
Register allocation in the SPUR Lisp compiler,
in ACM SIGPLAN '86 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 21,
#7,
July 1986,
pp. 255-263.
Very clear and reasonably detailed report on the implementation of
Chow's priority-based graph colouring register allocator in a Lisp
compiler for a RISC machine.
The resulting impact on the compilation time is reported as "moderate
for most programs, though particular, complex functions may require
considerable time".
Required reading for anyone undertaking such an enterprise, I think.
Peter B. Kessler,
Discovering machine-specific code improvements,
in ACM SIGPLAN '86 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 21,
#7,
July 1986,
pp. 249-254.
[[ preliminary abstract ]]
The instruction set is partitioned into general-purpose and
special-purpose instructions.
The process starts from a Lisp-like (= tree-like) description of all
machine instructions, and a list of general-purpose instructions to be
used by the code generator.
For each special-purpose instruction, an attempt is made to decompose
its tree into trees of general-purpose instructions.
This seems to be done through unification, although this is not
described in detail in the text; the unification yields conditions under
which the rewriting (= the equivalence) holds.
This yields "idioms" for each special-purpose instruction.
Three types of idioms are distinguished:
1.
A "set idiom" is a special-purpose instruction that is equivalent to a
general-purpose instruction except that one or more of its operands are
restricted to a very limited set;
an example is INCR R1 for ADD R1,1.
2.
A "binding idiom" is a special-purpose instruction that is equivalent to a
general-purpose instruction except that two or more of its operands are
the same;
this is used to model addressing modes, but it is not explained how.
3.
A "composite idiom" is a special-purpose instruction that is equivalent to a
sequence of general-purpose instructions;
an example is JGE R1,R2,LABEL for CMP R1,R2; JPOS LABEL.
It is not explained how these idioms and their context conditions are
used by the code generator or the code generator writer or generator.
Christopher W. Fraser,
Alan L. Wendt,
Integrating code generation and optimization,
in ACM SIGPLAN '86 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 21,
#7,
July 1986,
pp. 242-248.
Recognizing that both code generators and optimizers are pattern matchers
leads to an integrated rewriting system.
Being part of PO, for which see
Davidson & Fraser, ACM TOPLAS, April 1980,
the rewriting system is not tree-based but uses hashing to determine the
appropriate rewrite quickly.
J. Davidson,
A retargetable instruction reorganizer,
in ACM SIGPLAN '86 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 21,
#7,
July 1986,
pp. 234-241.
About the "order" algorithm of PO, for which see
Davidson & Fraser, ACM TOPLAS, April 1980.
The basic algorithm is the Sethi-Ullman ordering algorithm for
expressions, applied here to target code that has already been generated
by previous phases; late instruction reordering is advantageous.
Its retargetability is not explained but seems to stem from its being
integrated in PO.
Phillip B. Gibbons,
Steven S. Muchnick,
Efficient instruction scheduling for a pipelined architecture,
in ACM SIGPLAN '86 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 21,
#7,
July 1986,
pp. 11-16.
Compilation is for an HP Precision, an (experimental?) pipelined
architecture.
The constraints imposed by the pipelining are inserted as dependencies
in the DAG of the basic block.
Next two rules and a set of heuristics are given to create a topological
sort.
The resulting algorithm is $ O ( n sup 2 ) $ for a basic block size of
n, rather than $ O ( n sup 4 ) $ for previous algorithms.
Henri E. Bal,
Andrew S. Tanenbaum,
Language- and machine-independent global optimization on intermediate code,
Comput. Lang.,
vol. 11,
#2,
pp. 105-121.
April 1986,
Does the following optimizations on the intermediate code:
inline substitution,
common subexpression elimination,
strength reduction in loops,
stack pollution,
tail merging,
copy propagation,
constant propagation,
dead code elimination,
register allocation.
Gives detailed considerations that went into the implementation of each
of them.
Together they cause a speed-up of about 16 percent; the paper does
not break this down over the various optimizations.
A.M.M. Al-Hussaini,
R.G. Stone,
Yet another storage technique for LR parsing tables,
Softw. Pract. Exper.,
vol. 16,
#4,
April 1986,
pp. 389-401.
The paper describes compression techniques for LR parser tables;
somewhat confusingly the LR parser itself is used for doing file
compression.
With this out of the way, an introduction to standard LR table
compression follows.
The new submatrix technique is described next; it partitions the rows
into a number of submatrices, such that the rows of each are similar
enough in some sense to allow drastic compressing.
The access cost is $O (1)$.
One heuristic partitioning algorithm is the following.
For each row we choose the default value it contains (majority vote or
so) and list the columns which contain non-default values, the
"significant columns".
We then put all rows with the same default value and the same set of
significant columns in the same partition, and reassemble the rows into
matrices, the submatrices.
The submatrices resulting from this process allow various obvious
compressed implementations.
At the top level there is a large case clause (access cost $O (1)$?)
over the row number to get to the right matrix.
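The partitioning heuristic is easy to sketch; the following Python
fragment is an illustration of my own (the default is chosen by
majority vote, the table entries are made up), not the authors' code.

from collections import Counter, defaultdict

def partition_rows(table):
    """Group the rows of an LR table by (default value, set of significant
    columns); each group can be stored as a dense submatrix over its
    significant columns plus one default entry."""
    groups = defaultdict(list)
    for row_no, row in enumerate(table):
        default, _ = Counter(row).most_common(1)[0]          # majority vote
        significant = tuple(c for c, v in enumerate(row) if v != default)
        groups[(default, significant)].append(row_no)
    return groups


if __name__ == "__main__":
    table = [["s1", "err", "err", "s4"],
             ["s2", "err", "err", "s4"],
             ["err", "r3",  "err", "err"]]
    for (default, columns), rows in partition_rows(table).items():
        print("rows", rows, "default", default, "significant columns", columns)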
T.G. Lang,
J.T. O'Quin(n??),
R.O. Simpson,
Threaded code interpreter for object code,
IBM Technical Disclosure Bulletin,
vol. 28,
#10,
March 1986,
pp. 4238-4241.
Incremental compilation is effected using a form of threaded code, in
the following setting (which is not explained in the paper but can be
inferred from it).
Machine code of "the old machine" must be run on "the new machine";
neither machine is specified in the paper, but both are byte-oriented
and the new machine is a pipe-lined architecture with a large virtual
memory.
The code of the old machine is allowed to be self-modifying; this
precludes static object-to-object conversion.
The machine code of the old machine is loaded into an array O on the
new machine, and a new array N is allocated that is 8 times as large;
the factor 8 derives from expedience, not from principles.
The idea is to store in the $8m$ bytes
$ N [ 8 p .. 8 p + 8 m ] $ the new machine code that performs the same
operation as the old m-byte long machine instruction in
$ O [ p .. p + m ] $.
In fact, each segment of 8 bytes in N corresponds to one byte in O;
such an 8-byte segment in N is called a cell.
At Initial Program Loading (of the old machine, simulated on the new
machine), each 8-byte cell is initialized to two instructions, one that
loads the address of the corresponding byte in O into a fixed register
$oldIP$ and one that jumps to the interpreter / incremental compiler.
Rather than emulating the old machine instruction at $ O [ oldIP ] $, the
interpreter / incremental compiler inserts new machine code equivalent
to that instruction into the cells corresponding to the bytes of the old
instruction, fills the unused part of these cells with NOOPs, and jumps
to the beginning of the first of these cells.
The (new) machine now executes the machine instructions just assembled
there and at the end runs into the first cell corresponding to the next
instruction in the old code.
This cell will again hold a "load oldIP; jump to interpreter" sequence,
etc., until it runs into a cell that has already been filled by this
process; then program execution continues without interpreter
intervention.
This greatly speeds up the interpretation process.
This scheme is improved / extended in three ways.
1. The "load oldIP; jump to interpreter" sequence is replaced by a BX
instruction, "branch while executing next instruction", apparently
featured by the new machine.
2. If the cells in the new memory corresponding to the old instruction
are not large enough to contain the corresponding new code, the
remaining code is spilled to an overflow area, with jumps there and back
again.
(This is where the factor 8 as a compromise comes in.)
3. The old memory array O is write-protected (a feature of the new
machine).
A write is caught by the hardware; the interrupt routine sets the
cell(s) corresponding to the affected byte(s) to the initial BX
instruction (thus redirecting the flow of control to the interpreter for
each of the bytes) and then performs the write.
There are no literature references.
F. Chow,
M. Himelstein,
E. Killian,
L. Weber,
Engineering a RISC compiler system,
in Thirtyfirst IEEE Computer Society International Conference,
IEEE,
Feb. 1986,
pp. 132-137.
An optimizing compiler can reduce the register saving/restoring (RSR)
traffic during function calls by changing the allocation of variables to
registers for leaf functions. ...
Simon L. Peyton Jones,
Parsing distfix operators,
Communications of the ACM,
vol. 29,
Feb. 1986,
#2,
pp. 118-122.
A distfix operator is an operator which is distributed over its operands;
examples are if _ then _ else _ fi and rewrite _ as _ using _ end.
It is useful to allow users to declare such operators, especially in
functional languages.
Such distfix operators are introduced in a functional language using two
devices.
First the keywords of a distfix operator are given different representations
depending on their positions: prefix keywords are written with a trailing dot,
infix ones with a leading and a trailing dot, and postfix ones with a leading
dot; so the user is required to write rewrite. x .as. y .using. z .end.
These forms are recognized by the lexical analyzer, and given the token
classes PRE_TOKEN, IN_TOKEN, and END_TOKEN.
Second, generic rules are written in yacc to parse such structures.
Axel T. Schreiner,
H. George Friedman Jr.,
Introduction to Compiler Construction with UNIX,
Prentice-Hall,
Englewood Cliffs,
1985,
pp. 194.
This album-sized book shows the construction of a very simple compiler
for a subset of C called `sampleC', in eight chapters:
language definition,
word recognition,
language recognition,
error recovery,
semantic restrictions,
memory allocation,
code generation, and
in-core code generation.
The resulting compiler generates code for a simple stack machine;
registers are relegated to an exercise (8.2).
The tools used are Lex and Yacc; neither of these is introduced in its
own right and their usage is explained as far as required by the main
story line.
The book seems a direct printout of lecture notes and often reads as a
workbook for a compiler construction lab.
No theory.
Per Brinch Hansen,
Brinch Hansen on Pascal Compilers,
Prentice-Hall,
1985,
pp. ???.
Joe Snyder <joe@semaphorecorp.com> writes:
Another light-on-theory
heavy-on-pragmatics here's-how-to-code-it book. The author presents the
design, implementation, and complete source code for a compiler and p-code
interpreter for Pascal- (Pascal "minus"), a Pascal subset with boolean and
integer types (but no characters, reals, subranged or enumerated types),
constant and variable definitions and array and record types (but no packed,
variant, set, pointer, nameless, renamed, or file types), expressions,
assignment statements, nested procedure definitions with value and variable
parameters, if statements, while statements, and begin-end blocks (but no
function definitions, procedural parameters, goto statements and labels,
case statements, repeat statements, for statements, and with statements).
The compiler and interpreter are written in Pascal* (Pascal "star"), a
Pascal subset extended with some Edison-style features for creating
software development systems. A Pascal* compiler for the IBM PC is sold by
the author, but it's easy to port the book's Pascal- compiler to any
convenient Pascal platform.
This book makes the design and implementation of a compiler look easy. I
particularly like the way the author is concerned with quality,
reliability, and testing. The compiler and interpreter can easily be used
as the basis for a more involved language or compiler project, especially
if you're pressed to quickly get something up and running.
Robin Hunter,
Compilers: Their Design and Construction Using Pascal,
John Wiley,
Chichester,
1985,
pp. 273.
A solid, down-to-earth book of limited coverage.
Excellent as introductory reading.
Has a good chapter on Storage Allocation.
Successor to (+ Pascal rewrite of)
Hunter, The Design and Construction of Compilers, 1981.
J.P. Tremblay,
P.G. Sorenson,
The Theory and Practice of Compiler Writing,
McGraw-Hill,
Singapore,
1985,
pp. 796.
Extensive and detailed.
Heavy reading.
To be consulted when other sources fail.
Robert E. Noonan,
An algorithm for generating abstract syntax trees,
Computer Languages,
vol. 10,
#3/4,
1985,
pp. 225-236.
Gives a list of seven heuristic rules to derive a grammar for abstract
syntax trees from a CF grammar.
Simple rules remove constant predictable leaves (THEN in an
if-statement, etc.); more complicated rules transform, for example,
right recursion into lists.
The rules have to be applied in order; whether they constitute an
algorithm is a matter of opinion.
They have been implemented in a system called TREEGEN.
David R. Hanson,
Compact recursive-descent parsing of expressions,
SP&E,
vol. 15,
#12,
Dec. 1985,
pp. 1205-1212.
Naive parsing of expressions with n levels of precedence by a top-down
parser requires n routines.
The paper gives a very clear exposition of the technique of telescoping
these into one routine and a table.
Essentially, the routine contains a while-loop with recursive calls to
itself for higher precedences; the while-loop gobbles up all
subexpressions with the same precedence.
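The telescoped routine is essentially the well-known precedence-climbing
parser; a minimal Python sketch of my own (made-up precedence table,
primaries restricted to numbers), not Hanson's code.

# Binding powers of the operators; higher binds tighter.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expr(tokens, min_prec=1):
    """One routine plus a table instead of one routine per precedence level.
    tokens is a list of numbers and operator strings, consumed in place;
    the result is a nested tuple representing the parse tree."""
    left = tokens.pop(0)                     # parse a primary (just a number here)
    # The while-loop gobbles up all operators of at least this precedence;
    # the recursive call handles the higher precedence levels.
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        right = parse_expr(tokens, PREC[op] + 1)
        left = (op, left, right)
    return left


if __name__ == "__main__":
    print(parse_expr([1, "+", 2, "*", 3, "-", 4]))
    # ('-', ('+', 1, ('*', 2, 3)), 4)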
Robert L. Bernstein,
Producing good code for the case statement,
SP&E,
vol. 15,
#10,
Oct. 1985,
pp. 1021-1024.
There are two subjects in the paper, not well separated.
1. Tricky code for the case statement for the PL.8 compiler for the IBM 801
RISC machine.
2. A heuristic implementation of cluster separation: enter the gaps into a
sorted heap, and separate at the maximum gaps as provided by the sorted heap
until the case density is sufficiently high.
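A minimal Python sketch of my own reading of the second subject (the
case values and the density threshold are invented; this is not the
paper's code).

import heapq

def clusters(case_values, min_density=0.5):
    """Split the sorted case values at the largest gaps, largest first,
    until every cluster is dense enough to become a jump table."""
    values = sorted(set(case_values))
    if not values:
        return []

    def density(seg):
        return len(seg) / (seg[-1] - seg[0] + 1)

    segments = [values]
    # max-heap of the gaps between consecutive case values
    gaps = [(-(values[i + 1] - values[i]), i) for i in range(len(values) - 1)]
    heapq.heapify(gaps)

    while gaps and any(density(s) < min_density for s in segments):
        _, i = heapq.heappop(gaps)
        split_after = values[i]          # separate between values[i] and values[i+1]
        new_segments = []
        for seg in segments:
            if seg[0] <= split_after < seg[-1]:
                cut = next(k for k, v in enumerate(seg) if v > split_after)
                new_segments += [seg[:cut], seg[cut:]]
            else:
                new_segments.append(seg)
        segments = new_segments

    return [(seg[0], seg[-1], seg) for seg in segments]


if __name__ == "__main__":
    print(clusters([1, 2, 3, 4, 100, 101, 102, 1000]))
    # [(1, 4, [1, 2, 3, 4]), (100, 102, [100, 101, 102]), (1000, 1000, [1000])]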
Mahadevan Ganapathi,
Charles N. Fischer,
Affix grammar driven code generation,
ACM TOPLAS,
vol. 7,
#4,
Oct. 1985,
pp. 560-599.
Extension of the Graham-Glanville code generator, by steering the
parsing using affixes.
The description is extensive and thorough, supported by many examples.
In addition to the usual instruction and addressing mode selection, the
code generator also performs
constant folding,
identification of fancy instructions,
subsuming code within addressing modes,
suppressing redundant code,
instruction buffering and tracking, and
register assignment based on register tracking.
Special attention is paid to loop suppression (by arc removal),
syntactic and semantic blocking prevention (by requiring umbrella
productions), and shift-reduce conflict resolution (by always shifting
and falling back on the umbrella productions when in trouble).
There are also facilities for grammar factoring.
Their loop suppression algorithm may have potential for automatic
coercion system design derivation / checking.
Lawrence A. Harris,
Yannich Wets,
Resetting displays,
SIGPLAN Notices,
vol. 20,
#8,
pp. 73-77.
Aug. 1985,
After examining a number of implementations of access to non-local
variables, the authors point out that the display of a called routine
and that of its calling routine have the bottom $n-1$ elements in
common, where n is the level of the called routine.
This is utilized to update the (global) display at procedure entry and
exit by moving only one pointer.
This trick requires each procedure to exit properly; jumps out of
procedures will have to be led through the exit code.
In principle the trick does not work for calling procedures that come in
from a different environment (procedure parameters, procedure
variables), since in that case the displays of the called and calling
procedures may have less in common.
But the authors show that by leading the control through the proper
entry and exit code(s), it can be made to work.
Code for the PRIME 750 is given.
David Gelernter,
Generative communication in Linda,
ACM TOPLAS,
vol. 7,
#1,
Jan. 1985,
pp. 80-112.
The "generative communication" in the title is communication between
parallel and/or distributed processes by generating objects, otherwise
known as the Linda tuple model.
The model and some of its implementation problems are described.
P. Branquart,
A high-level intermediate code,
in Methods and Tools for Compiler Construction,
ed. by B. Lorho,
Cambridge University Press,
Cambridge,
1984,
pp. 317-343.
Describes H-code.
H-code makes explicit all flow of control and all data allocation (but
not register allocation!), and leaves everything else untouched.
It looks roughly like Fortran-II.
The paper shows how code can be generated from it for a variety of
machines.
Very dry exposition; hardly any example.
E. Morel,
Data flow analysis and global optimization,
in Methods and Tools for Compiler Construction,
ed. by B. Lorho,
Cambridge University Press,
Cambridge,
1984,
pp. 289-315.
A fundamental approach to the problem, awkward in style and not
extensive, but certainly worthwhile.
Maps all global optimization onto a single set of boolean equations,
which are then applied in some case studies, for example the suppression
of useless checks in Ada programs.
Global (= interprocedural) optimization information problems are
concerned with events in the code; they are classified as follows.
"All"-problems ask whether an event occurs on all paths from or to a
given point, "one"-problems ask whether an event occurs on at least one
such path.
"Backward"-problems are those in which only events before the given
point are considered, "forward"-problems consider only events on paths
leaving the given point.
This gives four combinations.
For example, live variable analysis is a "one-forward"-problem.
Each combination corresponds to a specific form of the set of boolean
equations.
Pierre Boullier,
Syntax analysis and error recovery,
in Methods and Tools for Compiler Construction,
ed. by B. Lorho,
Cambridge University Press,
Cambridge,
1984,
pp. 7-44.
[[ preliminary abstract ]]
The description of the syntax analysis part is conventional.
Unconventional is the treatment of lexical analysis: it is explained in
terms of LR-items, which makes LR error recovery techniques applicable.
The discussion of error recovery is very clear, simple and effective.
W.M. Waite,
G. Goos,
Compiler Construction,
Springer-Verlag,
New York, NY,
1984,
A theoretical approach to compiler construction.
Refreshing in that it gives a completely new view of many subjects.
Heavy reading, high information density.
Thomas P. Murtagh,
A less dynamic memory allocation scheme for Algol-like languages,
Eleventh ACM Symposium on Principles of Programming Languages,
pp. 283-289.
1984,
The call graph is divided up into intervals.
Since there is no recursion within an interval, the intervals can be
considered as single procedures, even if you don't do inlining.
This means that only one activation record needs to be allocated for
each of them, and only one display pointer is needed.
Both effects are analysed in full, and algorithms given.
Also, when the call graph shows that a procedure is not recursive at
all, its data can be allocated statically.
Dick Grune,
How to produce all sentences from a two-level grammar,
Inform. Process. Lett.,
vol. 19,
Nov. 1984,
pp. 181-185.
All terminal productions are derived systematically in breadth-first
order.
The author identifies pitfalls in this process and describes remedies.
A parser is used to identify the hyperrules involved in a given
sentential form.
This parser is a general CF recursive descent parser to which a
consistency check for the metanotions has been added; it is not
described in detail.
M.V.S. Ramanath,
Marvin Solomon,
Jump minimalization in linear time,
ACM TOPLAS,
vol. 6,
#4,
Oct. 1984,
pp. 527-545.
The authors point out that unconditional jumps do no useful work: they
are an artefact of the code sequencing; the problem is to order the
straight code segments so that the number of unconditional jumps is
minimal.
A dissection of a graph into k paths is a set of k paths such that
each node occurs exactly once on exactly one path.
The code segments on one path can be compiled adjacently; going from one
path to another requires a jump.
The problem of finding a dissection of minimal k is NP-complete in
general.
Polynomial solutions exist for acyclic graphs and for reducible graphs.
This paper shows a linear time solution for structured flow graphs,
graphs resulting from programs containing only loops, if-then-else and
multilevel exit statements.
The algorithm does dynamic programming over a carefully selected set of
subgraphs; its explanation and proof cover some ten pages.
A small drawback of the algorithm is that the entry point of the
produced code will not normally be at the beginning, so a jump is
required to enter the code.
J.W. Davidson,
C.W. Fraser,
Code selection through object code optimization,
ACM TOPLAS,
vol. 6,
#4,
Oct. 1984,
pp. 505-526.
Sequel to Davidson & Fraser,
"The design and application of a retargetable peephole optimizer",
ACM Trans. Prog. Lang. Syst., Vol. 2, No. 2, 1980, pp. 191-202.
The source code is translated naively into code for what is called here
an "intersection machine", a machine that has only the intersection of
the machine instructions that can be expected in practice.
It corresponds to the register transfers in the cited paper; each
register transfer sets a new pseudo-register, of which there are an
unbounded number.
This code is then processed by a sophisticated peephole optimizer, or
rather post-optimizer, which consists of three passes, called Catcher,
Combiner and Assigner.
Catcher eliminates common subexpressions in the register transfers, and
attaches last-use info to registers and nearest-use info to results.
The algorithm is not described; probably classic.
Combiner is the real peephole optimizer.
In principle it combines the register transfers of two and/or three
consecutive instructions into one formula and tries to find a machine
instruction with that pattern, as in the cited paper.
When it finds one, it replaces the register transfers by the formula
(rather than by the machine instruction).
In practice it follows precomputed tables.
What they contain and how they are precomputed is not revealed;
probably some finite state automaton.
Assigner assigns two things: machine instructions to the register
transfers and real registers to the pseudo-registers.
It introduces spill code where necessary.
The register assignment algorithm is not described.
There are two examples, one in which 16 register transfers are reduced
to 3 machine instructions for the VAX-11, and a larger one in which 52
register transfers yield 17 machine instructions.
The instruction quality looks quite good.
Jack W. Davidson,
Christopher W. Fraser,
Register allocation and exhaustive peephole optimization,
Softw. Pract. Exper.,
vol. 14,
#9,
Sept. 1984,
pp. 857-865.
In addition to the CSE elimination and peephole optimization on the
intermediate code, the authors also do CSE elimination and peephole
optimization on the object code, to catch machine-specific CSEs.
This necessitates a register allocator that works on optimized object
code.
The CSE catcher performs symbolic execution of the instructions, and
tries to match the symbolic expressions obtained this way.
If a match is found, the instructions causing the newer value are edited
away.
This works by virtue of the fact that an infinite number of
pseudo-registers is assumed to exist, so all old expressions
(in a basic block) are saved.
In addition the CSE catcher eliminates redundant loads and dead
variables, and dynamically estimates a window for the peephole
optimizer.
Its CSE algorithm is described in detail.
The register assigner then finds real registers for those
pseudo-registers that are really required; the algorithm is not
described in detail.
I do not understand why the title has the word "exhaustive" in it.
A.S. Tanenbaum,
E.G. Keizer,
H. van Staveren,
Does anybody out there want to write HALF of a compiler?,
ACM SIGPLAN Notices,
vol. 19,
#8,
pp. 106-108.
Aug. 1984,
Short description of ACK, for prospective compiler writers.
Michael L. Powell,
A portable optimizing compiler for Modula-2,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
June 1984,
pp. 310-318.
Straightforward compiler, based on "best simple" design: take the best
of the simple solutions for each problem.
This results in a competitive compiler, but does not always provide a
solution: the author reports that a "best simple" approach to inlining
is still missing.
The various design decisions are each explained briefly.
David C. Robbins,
Engineering a high-capacity Pascal compiler for high performance,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
June 1984,
pp. 300-309.
Design and implementation of a Pascal compiler for programs of say 1000
modules of 1000 lines each, with minimal recompilation requirements.
Organized around a data base of the source code, plus a very general
graph manipulation package.
Efficiency is achieved by tuning using powerful profiling tools.
Frederick Chow,
John Hennessy,
Register allocation by priority-based coloring,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
June 1984,
pp. 222-232.
Starting from a global (i.e. procedure-wide) interference graph in which
the nodes are live ranges, registers are assigned to the ranges as
follows.
We assume r registers.
Note that the ranges consist of basic blocks.
First all nodes that are connected to less than r other nodes are set
aside; these can be coloured trivially.
This leaves the harder to allocate ranges; for the moment they are all
still uncoloured.
For each of the uncoloured ranges, the gain of putting it in a register
is calculated, using a store-and-load cost function for each block in
the range, weighted by the loop depth of the range.
The range with the highest gain is now coloured.
If this results in the existence of a node that is connected to
exactly r coloured nodes, that node is now uncolourable, and its range
will be split, as follows.
Its (or one of its) starting blocks is taken as a new range, and blocks
connected to it in the old range are moved to the new range; this
process is continued as long as the new range is connected to less than
r ranges.
Essentially, we now start over again (but many data are still usable).
Measurements show that register allocation reduces running times by
20-25 percent.
Christopher W. Fraser,
Eugene W. Myers,
Alan L. Wendt,
Analysing and compressing assembly code,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
June 1984,
pp. 117-121.
The space-saving optimizations `procedural abstraction' (in which
stretches of code that occur more than once are replaced by subroutine
calls) and cross-jumping (in which the merge point of two code sequences
is moved backwards if the two code sequences end in the same
instruction(s)) are generalized into a general assembly code
compression algorithm.
The actual algorithm used is a variant of McCreight's linear suffix tree
construction algorithm.
Since many conditions have to be fulfilled before a duplicate piece of
code can be turned into a subroutine (may not permanently alter the
stack level, may not contain non-local labels or goto's, etc.), a
priority queue of segments is constructed, from which segments are
either accepted or divided up and reentered.
Compression ratios range from 0-30 percent; compressed code took 1-5
percent more CPU time but 11 percent less real time.
An additional option is to set the compressor for turning every non-jump
instruction into a subroutine, thus effectively generating threaded
code.
Jack W. Davidson,
Christopher W. Fraser,
Automatic generation of peephole optimizations,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
June 1984,
pp. 111-116.
Preliminary version of Davidson & Fraser,
"Automatic inference and fast interpretation of peephole optimization
rules",
SPE, Vol. 17, No. 11, 1987, pp. 801-812.
Patterns for a peephole optimizer are, at least in principle, generated
automatically, as follows.
For each machine instruction, a semantic description in the form of a
few simple formulas is given, describing its actions on registers,
condition code and memory.
For each possible combination of two and three of instructions, the
effect is calculated using symbolic formula manipulation, and the
result is compared to those of the existing instructions: an equivalence
signals a peephole optimization pattern.
(It seems to me that this will not (easily) derive patterns like:
[reg := reg * 4] → [reg := reg << 2].)
Since this compiler generation time process is very slow and generates
many patterns that will never be used, an approach is described in which
interesting code sequences are derived from code actually generated, and
symbolic manipulation is used to derive patterns for these.
The learning curve levelled off after about 17000 proposed patterns with
627 useful patterns for the VAX.
In an appendix, the authors provide an example in which 4 instructions
are reduced to one.
It is remarkable that the final code generated is the same as would be
generated by
Aho, Ganapathi, Tjiang,
"Code generation using tree pattern matching and dynamic programming",
TOPLAS, 11, 4, Oct. 1989, 491-516,
and that the code is derived in essentially the same way, with tree
pattern matching taking the place of symbolic manipulation.
The paper probably derives from the first author's PhD thesis (Univ. of
Arizona, Dec. 1981).
Robert R. Kessler,
Peep -- an architectural description driven peephole optimizer,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
June 1984,
pp. 106-110.
Each machine instruction is described by an equation, a space cost and a
time cost.
The basic algorithm is as follows.
At compiler generation time, for each pair of instructions an attempt is
made to combine them: the equation of the first is substituted in the
equation of the second, if possible, and the data base is searched for
an instruction that has a matching equation.
If substitution is possible, if a matching instruction can be found, and
if its space/time cost is less than that of the combination, it is
registered in a table as the peephole replacement of the original pair.
This table is used at compile time to replace instructions quickly.
Several other optimizations are discussed.
1. If substitution is not possible, there still may be an instruction
that combines the effects of both, for example a move-double-word
instruction.
2. At compile time, Peep combines instruction pairs even if there are
non-interfering instructions between them, using the notion of "logical
adjacency".
3. At compile time, Peep does a modest global scan to combine
instruction pairs between basic blocks.
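The basic compiler-generation-time step described above (substitute the
equation of the first instruction into that of the second and search the
data base for a match) can be sketched as follows; the instruction
"equations" are a made-up toy notation of my own and the space/time cost
comparison is omitted, so this is an illustration, not Kessler's system.

# Toy machine description (made up): each instruction computes dest <- expr,
# where expr is a tuple tree over numbered operand slots 0, 1, ...
INSTRUCTIONS = {
    "LOAD": ("mem", 0),                  # r <- mem[op0]
    "ADD":  ("+", 0, 1),                 # r <- op0 + op1
    "ADDM": ("+", 0, ("mem", 1)),        # r <- op0 + mem[op1], the fancy one
}

def shift(expr, offset):
    """Renumber the operand slots of expr, to keep two instructions apart."""
    if isinstance(expr, tuple):
        return tuple(shift(e, offset) for e in expr)
    return expr + offset if isinstance(expr, int) else expr

def substitute(expr, slot, replacement):
    """Replace operand slot `slot` in expr by the tuple tree `replacement`."""
    if isinstance(expr, tuple):
        return tuple(substitute(e, slot, replacement) for e in expr)
    return replacement if expr == slot else expr

def normalize(expr, mapping):
    """Renumber slots in order of first occurrence, so equal shapes compare equal."""
    if isinstance(expr, tuple):
        return tuple(normalize(e, mapping) for e in expr)
    if isinstance(expr, int):
        return mapping.setdefault(expr, len(mapping))
    return expr

def find_replacement(first, second, slot):
    """Substitute the equation of `first` into operand `slot` of `second` and
    search the data base for an instruction with a matching equation."""
    combined = substitute(INSTRUCTIONS[second], slot, shift(INSTRUCTIONS[first], 100))
    combined = normalize(combined, {})
    for name, expr in INSTRUCTIONS.items():
        if normalize(expr, {}) == combined:
            return name
    return None


if __name__ == "__main__":
    # "r1 <- mem[b]; r2 <- a + r1" collapses into the single ADDM instruction.
    print(find_replacement("LOAD", "ADD", slot=1))    # ADDM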
Thomas Johnsson,
Efficient compilation of lazy evaluation,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
June 1984,
pp. 58-69.
Detailed description of the G machine, which is both an intermediate
code to be translated to machine instructions (examples for the VAX
given) and to be implemented in hardware.
It features a number of short-cuts to avoid many intermediate graph
rewritings.
Joseph A. Fisher,
John R. Ellis,
John C. Ruttenberg,
Alexandru Nicolau,
Parallel processing: a smart compiler and a dumb machine,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
June 1984,
pp. 37-47.
Preliminary partial text of Ellis's Bulldog book.
1. Combine basic blocks into much longer traces.
2. Find parallelism in these traces.
3. Distribute over processors, CDC-like.
Thomas W. Christopher,
Philip J. Hatcher,
Ronald C. Kukuk,
Using dynamic programming to generate optimized code in a Graham-Glanville style code generator,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
pp. 25-36.
June 1984,
To better handle ambiguity, looping, syntactic blocking and semantic
blocking, the LR parsing of Graham-Glanville is replaced by Earley
parsing.
The best code sequence is selected from among the multiple parsings by
dynamic programming, similar to Lyon's minimum cost error recovery.
Two implementations, one by Kukuk and one by Hatcher, are described.
In tests, the system achieved slightly better code than "compiling on
the stack", at the expense of a loss of a factor 15 to 20 in compilation
speed; after all, Earley is $O ( n sup 3 )$, especially for very
ambiguous grammars.
Philippe Aigrain,
Susan L. Graham,
Robert R. Henry,
Marshall Kirk McKusick,
Eduardo Pelegrí-LLopart,
Experience with a Graham-Glanville style code generator,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
June 1984,
pp. 13-24.
About the incorporation of a Graham-Glanville style code generator in
the UNIX Portable C Compiler pcc.
The authors have chosen to keep strictly to the syntactic approach: all
semantic conditions have been incorporated in the syntax by enumeration
over finite set attributes.
To keep the machine description small, a preprocessor is used to do the
enumeration.
Many techniques and tricks are described to keep the resulting tables
small in the face of a considerable actual machine description syntax.
All this results for the VAX-11 in a machine description of 246 grammar
rules; the speed of the generated code is the same as that generated by
the original pcc for the VAX-11.
Michael Karr,
Code generation by coagulation,
in ACM SIGPLAN '84 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 19,
#6,
June 1984,
pp. 1-12.
A flow graph of preferably the entire program is assumed, with frequency
data about each arc; the nodes are assumed to be of a size for which
optimal code can be generated.
The basic loop of the code generator is now to find the most frequently
used uncompiled arc,
compile its entry node if still uncompiled,
compile its exit node if still uncompiled,
and compile the arc; this coagulates the code starting from centers of
importance.
"Compile" in the last three actions must be understood as "compile
optimally under the conditions imposed by the compiled neighbours".
Optimality is defined under an explicit cost metric.
The bulk of the paper considers techniques for this type of compilation.
No experimental results are given since the technique has not been
implemented, regrettably.
Timothy A. Budd,
An APL compiler for a vector processor,
ACM TOPLAS,
vol. 6,
#3,
July 1984,
pp. 297-313.
Fairly straightforward compiler, which matches APL vector operations to
machine vector operations.
It cuts up the program vectors if they do not fit in the machine
vectors.
Like Budd's sequential APL compiler (for UNIX), it features
"dragthrough".
This technique combines several APL operators into one operator.
For example, R←A+B*C is converted to something like
R = do_plus_and_times(A,B,C).
The function do_plus_and_times then runs down the three arrays, doing
the computations on the elements.
This saves on intermediate results and on data motion.
The effect can be increased by doing inline substitution, to increase
the number of operators in a line.
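A minimal Python illustration of the dragthrough effect (my own, not
Budd's generated code): the fused routine makes one pass over the
operands and allocates no intermediate array.

def plus_and_times(a, b, c):
    """Fused implementation of R <- A + B*C: one pass, no intermediate
    array for B*C (the dragthrough effect)."""
    return [ai + bi * ci for ai, bi, ci in zip(a, b, c)]


def plus_then_times(a, b, c):
    """Naive operator-by-operator evaluation: materializes B*C first."""
    tmp = [bi * ci for bi, ci in zip(b, c)]          # intermediate result
    return [ai + ti for ai, ti in zip(a, tmp)]


if __name__ == "__main__":
    A, B, C = [1, 2, 3], [4, 5, 6], [7, 8, 9]
    assert plus_and_times(A, B, C) == plus_then_times(A, B, C)
    print(plus_and_times(A, B, C))    # [29, 42, 57]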
A.S. Tanenbaum,
H. van Staveren,
E.G. Keizer,
J.W. Stevenson,
Description of a Machine Architecture for use with Block Structured Languages,
IR-81,
Aug. 1983,
Vrije Universiteit,
Amsterdam,
The EM Manual.
Ravi Sethi,
Control flow aspects of semantics-directed compiling,
ACM TOPLAS,
vol. 5,
#4,
Oct. 1983,
pp. 553-595.
A very civilized version of denotational semantics is integrated with
yacc to produce control flow graphs for C programs in a very clear way.
More in particular, the grammar and the denotational semantics are
combined in a specification in a language called Plumb (since it
considers functions to be pipes); the specification is then translated
into a yacc program.
The system can handle many forms of flow control, including "gotos that
can go anywhere".
Andrew S. Tanenbaum,
Hans van Staveren,
E.G. Keizer,
Johan W. Stevenson,
A practical toolkit for making portable compilers,
Commun. ACM,
vol. 26,
#9,
pp. 654-660.
Sept. 1983,
The UNCOL idea is restricted to algebraic languages (languages of the
Algol / Pascal lineage) and to 8-bit byte-addressable machines.
The resulting UNCOL is called EM (Encoding Machine), an assembly code
for a simple stack (pseudo-)machine.
Actually a family of EM-machines is defined, for various word lengths
and alignment requirements.
The Amsterdam Compiler Kit consists of a set of programs to generate
programs that create, manipulate and process EM code, based on machine
descriptions.
The generated programs can be chained in pipelines in various ways, to
obtain interpreters, compilers, optimizing compilers, etc.
The compilers include assemblers and even a linker.
David R. Hanson,
Simple code optimizations,
SP&E,
vol. 13,
#8,
Aug. 1983,
pp. 745-763.
An optimization is defined as "simple" if it
1. requires no major changes to the compiler;
2. causes no more than 10 percent increase in run time and memory
requirements of the compiler;
3. can be performed locally (e.g. no next-use info required).
The author reports three simple optimizations applied in the Y compiler
(Y is a kind of C with a kind of Ratfor syntax).
Expression rearrangement applies a set of transformations to the DAG that
results from an expression.
These patterns do constant folding,
simple simplifications (e.g. $ ( + ~ x ~ 0 ) → x $),
rewrites to canonical form (e.g. $ ( + ~ C ~ x ) → ( + ~ x ~ C ) $),
and
addressing patterns (e.g. $ ( @+ ~ (&x) ~ 0 ) → (&x) $), where $@+$ is
address addition (written $++$ in the paper!).
Lazy instruction selection.
The DAG is processed bottom-up (and probably from the end on).
If the code to be generated is of a form that could possibly be an
operand of another instruction (e.g. $ R sub 2 + 1 $), no code is
generated and the node is replaced by a summary of its value.
When finally a real machine instruction must be generated, its operands
may be of acceptable form for that instruction, in which case they are
incorporated in the instruction, or they may not, in which case adapting
code is generated.
Resource caching.
Register tracking, common subexpression elimination etc., are combined
in a general resource cache.
This improves code considerably, but causes problems with some language
constructs which invalidate the cache (e.g. if-statements).
Ad-hoc solutions better than discarding the entire cache contents are
given.
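The expression-rearrangement patterns quoted above can be sketched as a
bottom-up rewrite over expression trees; an illustration of my own,
covering only + and *, not the Y compiler's code.

def rearrange(node):
    """Bottom-up rewriting of an expression tree; a node is an int constant,
    a variable name, or a tuple (op, left, right)."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = rearrange(left), rearrange(right)
    # canonical form: put a constant operand of a commutative operator on the right
    if op in ("+", "*") and isinstance(left, int) and not isinstance(right, int):
        left, right = right, left
    # constant folding (only + and * shown)
    if op in ("+", "*") and isinstance(left, int) and isinstance(right, int):
        return left + right if op == "+" else left * right
    # simple simplifications
    if op == "+" and right == 0:
        return left
    if op == "*" and right == 1:
        return left
    return (op, left, right)


if __name__ == "__main__":
    # (0 + x) * (2 + 3)  rewrites to  ('*', 'x', 5)
    print(rearrange(("*", ("+", 0, "x"), ("+", 2, 3))))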
D.W. Jones,
Assembly language as object code,
SP&E,
vol. 13,
#8,
Aug. 1983,
pp. 715-725.
[ about an assembly language that is so simple you can do linkage on it ]
Andrew S. Tanenbaum,
Hans van Staveren,
E.G. Keizer,
A Unix toolkit for making portable compilers,
USENIX Toronto 1983 Summer Conference,
Toronto, Canada,
July 1983,
pp. 255-261.
Views the Amsterdam Compiler Kit (ACK) as taking the UNCOL idea and
"work out the details and make a practical implementation that runs on
UNIX".
Describes the phases of ACK in short paragraphs.
No literature references.
Norman H. Cohen,
Eliminating redundant recursive calls,
ACM Trans. Programming Languages and Systems,
vol. 5,
#3,
July 1983,
pp. 265-299.
Many recursive program schemata consist of two alternatives: a recursive
call with transformed arguments and an escape hatch, which terminates
the recursion.
A family of DAGs is constructed for such schemata; such DAGs are often
periodic and the periodicity can be exploited to eliminate recursive
calls.
Several transformations are given and treated extensively to obtain and
exploit this periodicity.
Steven P. Reiss,
Generation of compiler symbol processing mechanisms from specifications,
ACM TOPLAS,
vol. 5,
#2,
April 1983,
pp. 127-163.
A set of library routines is described to answer the question "In a
given SCOPE, to what OBJECT does a given NAME refer?" concerning
a variety of programming languages.
The answer is
OBJECT :=
    RESOLVE(
        RESTRICT(
            LOOK_UP(NAME),
            Object_Class(NAME),
            Visibility(SCOPE, NAME),
            Signature(NAME)
        )
    );
in which LOOK_UP() and RESTRICT() yield object sets.
I did not get the role of the outermost RESOLVE() upon cursory reading.
Many more manipulation routines exist.
A special language is shown to generate complicated calls to these
routines.
Also, a higher-level language is described for formulating the scope and
identification rules.
The system, which is part of the ACORN project, is powerful enough to
create symbol processors for languages like Modula-2, Ada, Algol 68,
the Red language, etc.
A formal description of the operations provided, and a complete symbol
processing specification for Ada (7 pages) conclude the paper.
J.R. Allen,
Ken Kennedy,
Carrie Porterfield,
Joe Warren,
Conversion of control dependence to data dependence,
in Tenth ACM Symposium on Principles of Programming Languages,
ACM,
New York,
Jan. 1983,
pp. 177-189.
About data flow in vector machines
Stefan M. Freudenberg,
Jacob T. Schwartz,
Micha Sharir,
Experience with the SETL optimizer,
ACM TOPLAS,
vol. 5,
#1,
Jan. 1983,
pp. 26-45.
The annotations described in Dewar et al.,
"Programming by refinement, as exemplified by the SETL representation sublanguage",
ACM Trans. Program. Lang. Syst., Vol. 1, No. 1, July 1979, pp. 27-49,
and the automatic technique described in Schonberg et al.,
"An automatic technique for selection of data representations in SETL programs",
ACM Trans. Program. Lang. Syst., Vol. 3, No. 2, Apr. 1981, pp. 126-143,
are combined into one optimizer.
Many of its techniques and optimizations are described.
The great gains lie in choosing simpler data representations and better
operator identification; the first gains obvious efficiency and both
avoid a lot of run-time tests.
They result in a speed-up of a factor of 2 to 8, although for small
inputs other effects dominate, including input/output.
At present the optimizer is too slow to be part of the standard
compilation pipeline.
Patricia Anklam,
David Cutler,
Roger Heinen Jr.,
M. Donald MacLaren,
Engineering a compiler; VAX-11 code generation and optimization,
Digital Press,
Bedford, Mass.,
1982,
pp. 269.
Bootstrapping a PL/I compiler running under MULTICS, to a VAX-11.
The back-end was then isolated to serve other languages.
A partly narrative account of how to redo a code-generator, including
optimization.
No bibliography.
J. Lewi,
K. de Vlaminck,
J. Steegmans,
A programming methodology in compiler construction, Part 1/2,
North-Holland,
Amsterdam,
1982/1979,
pp. 569/308.
Extensive and in-depth analysis of the Language Implementation
Laboratory LILA, both a software-writing tool to construct compilers and
a pedagogical aid in a course on compiler construction.
Part 1 explains the principles of the compiler generator, an ELL(1)
attributed push-down transducer; part 2 describes the implementation and
gives examples of its use.
Uwe Kastens,
Brigitte Hutt,
Erich Zimmermann,
GAG: A Practical Compiler Generator,
Lecture Notes in Computer Science #141,
Springer Verlag,
Heidelberg,
1982,
pp. 156.
GAG (Generator based on Attribute Grammars, these people don't speak
English or they would have chosen a different acronym) is based on
ordered attribute LALR(1) grammars.
It reads a description in Aladin, and turns out a processor which parses
the input, constructs an attributed tree and performs a number of visit
sequences to compute the attribute values.
Two to four visit sequences were required for common programming
languages, for example Pascal (4), Ada (3) and Lisp (2).
The processor incorporates various optimizations.
The system is written in Pascal and generates Pascal.
About 10% of the book is introduction, 40% is manual plus advice and 50%
is the Aladin code for a Pascal compiler.
M. Ganapathi,
C.N. Fischer,
J.L. Hennessy,
Retargetable compiler code generation,
Computing Surveys,
vol. 14,
#4,
Dec. 1982,
pp. 573-592.
More properly called "The history of retargetable compiler code
generation".
Much historical detail, little technical information.
John L. Hennessy,
Noah Mendelsohn,
Compilation of the Pascal case statement,
SP&E,
vol. 12,
#9,
Sept. 1982,
pp. 879-882.
Identifies clusters in the range of the case values and generates a binary
decision tree over the clusters; the clusters are then implemented as jump
tables.
The clusters are identified by repeatedly finding a split point between them,
where a split point is defined as a point where the two parts plus the
overhead are cheaper than the entire range.
R. Paige,
S. Koenig,
Finite differencing of computable expressions,
ACM TOPLAS,
vol. 4,
#3,
July 1982,
pp. 402-452.
The theory of generalized operator strength reduction, applied to many
algorithms, in an almost book-size paper.
Contains tables of transformations.
Andres Rudmik,
Barbara G. Moore,
An efficient separate compilation strategy for very large programs,
in ACM SIGPLAN '82 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 17,
#6,
June 1982,
pp. 301-307.
[ software-engineering aspects of separate compilation of Chill ]
Ravi Sethi,
Control flow aspects of semantics-directed compiling,
in ACM SIGPLAN '82 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 17,
#6,
June 1982,
pp. 245-260.
Preliminary version of Sethi,
"Control flow aspects of semantics-directed compiling", ACM TOPLAS,
Vol. 5, No. 4, Oct. 1983, pp. 553-595.
Thomas M. Morgan,
Laurence A. Rowe,
Analyzing exotic instructions for a retargetable code generator,
in ACM SIGPLAN '82 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 17,
#6,
June 1982,
pp. 197-204.
The system EXTRA is a program transformation system that can assist in
proving that certain language features (for example string copy) can be
translated into certain exotic machine instructions (for example block
move), and if they cannot, how to modify either to obtain a fit.
David W. Krumme,
A practical method for code generation based on exhaustive search,
in ACM SIGPLAN '82 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 17,
#6,
June 1982,
pp. 185-196.
Expression trees are translated using exhaustive search.
This is feasible since there are usually not many ways to translate a
given form.
The system does evaluation order, instruction selection and register
allocation in an integrated way.
Its speed is moderate, the code it generates is good, but not excellent
since it does not exploit associativity and common subexpressions.
Harald Ganzinger,
Robert Giegerich,
Ulrich Möncke,
Reinhard Wilhelm,
A truly generative semantics-directed compiler generator,
in ACM SIGPLAN '82 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 17,
#6,
June 1982,
pp. 172-184.
Details of the MUG2 system.
Rodney Farrow,
Linguist-86 -- Yet another translator writing system based on attribute grammars,
in ACM SIGPLAN '82 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 17,
#6,
June 1982,
pp. 160-171.
Has its own built-in paging system (sign of the times), in step with the
evaluation order (alternating passes).
Some considerable effort is spent on reducing the amount of data copied
to the paging disk.
As a further optimization, all static attributes with the same name are
mapped onto the same global variable, to eliminate attribute copying,
which was diagnosed to form 40 to 60 percent of all attribute activity.
Several other features are introduced to create a workable
attribute-oriented compiler generator.
Has no built-in programming language; anything not recognized is code.
Is written in itself, in the best of compiler construction traditions.
Kai Koskimies,
Kari-Jouko Räihä,
Matti Sarjakoski,
Compiler construction using attribute grammars,
in ACM SIGPLAN '82 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 17,
#6,
June 1982,
pp. 153-159.
Since the symbol table is more or less a global variable, it does not
fit well in the attribute framework.
The idea is to give each non-terminal an inherited attribute
$ env sub i $ which describes the environment prior to the non-terminal
and a synthesized attribute $ env sub s $ which describes the
environment post the non-terminal, so to speak.
Examples are given and a systematic approach is suggested.
Memory cost is also addressed; the preliminary verdict is: not good but
not bad either.
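A toy illustration of my own of this env-in / env-out threading (not
from the paper): each construct receives the environment valid before
it and yields the environment valid after it.

def evaluate(node, env_i):
    """A node is ('decl', name), ('use', name) or ('block', [nodes]);
    env_i is the environment (a set of names) valid before the node,
    the return value is env_s, the environment valid after it."""
    kind = node[0]
    if kind == "decl":
        return env_i | {node[1]}            # env_s = env_i plus the new name
    if kind == "use":
        if node[1] not in env_i:
            print("undeclared:", node[1])
        return env_i                        # a use leaves the environment alone
    if kind == "block":
        env = env_i
        for child in node[1]:
            env = evaluate(child, env)      # env_s of one child is env_i of the next
        return env_i                        # declarations are local to the block
    raise ValueError(kind)


if __name__ == "__main__":
    program = ("block", [("decl", "x"), ("use", "x"), ("use", "y")])
    evaluate(program, frozenset())          # reports: undeclared: y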
G.J. Chaitin,
Register allocation and spilling via graph coloring,
in ACM SIGPLAN '82 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 17,
#6,
June 1982,
pp. 98-105.
1. The interference graph is implemented both as a bit matrix and as a set
of adjacency lists.
2. In deciding which node to spill when stuck, frequency estimates are
used.
3. The time required to rebuild the interference graph after node spilling
is found to be no problem; the time is dominated by the initial building
of the graph, all following graphs are a lot smaller.
Explicit code in SETL is given.
Rudolf Landwehr,
Hans-Stephan Jansohn,
Gerhard Goos,
Experience with an automatic code generator generator,
in ACM SIGPLAN '82 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 17,
#6,
June 1982,
pp. 56-66.
Graham-Glanville for the Siemens 7000, compiling Pascal.
Susan L. Graham,
Robert R. Henry,
Robert A. Schulman,
An experiment in table-driven code generation,
in ACM SIGPLAN '82 Symposium on Compiler Construction,
ACM SIGPLAN Notices,
vol. 17,
#6,
June 1982,
pp. 32-43.
Quite explicit description of the details of a Graham-Glanville code
generator for the VAX-11, in four phases each subdivided into three
subphases.
For example, in such a subphase, the branches of the expressions are
ordered in advance using Sethi-Ullman numbering, and trees that require
more registers than available are isolated and spill code is introduced.
Several improvements are suggested.
The-Thanh Nguyen,
E. Walter Raschner,
Indirect threaded code used to emulate a virtual machine,
ACM SIGPLAN Notices,
vol. 17,
#5,
May 1982,
pp. 80-89.
A general-purpose virtual machine is implemented using indirect threaded
code.
The storage efficiency of this machine on a sample program (105 words)
is compared with the same machine implemented with threaded code (171
words) and with the UCSD-p machine for Pascal (60 words).
The UCSD-p implementation wins, since it is special-purpose.
Butler W. Lampson,
Fast procedure calls,
in Symposium on Architectural Support for Programming Languages and Operating Systems,
ACM SIGPLAN Notices,
vol. 17,
#4,
pp. 66-76.
April 1982,
In order to unify procedure call and return, coroutine transfers,
exceptions, process switches, parameter access, etc., in a portable way,
a data type context is introduced, with one operation on values of
that type: XFER.
A context describes anything that control can be transferred to;
examples are activation records and routine preludes and postludes.
It is up to the context to decide where control is transferred when it
is finished; it may transfer to caller, or do something else.
The operation XFER() has three parameters, the destination context, a
return context, and a parameter list.
(The paper gives XFER only one parameter, the destination context,
and passes the other parameters through global variables, no doubt in
imitation of the Mesa implementation.)
Contexts are allocated in the heap, unless a better scheme is possible.
The ADT context allows implementation of all desired transfers.
For example, a call to a procedure P is implemented as a transfer to a
context creation, which prepares the activation record for the new
incarnation of P, obtains the pointer to the code of P from the
parameter list, and XFERs to P; note that creation does not
modify the return context.
The paper gives a simple implementation, based on the usual stack,
program counter, frame pointer, and return address.
A very space-efficient implementation tuned to the needs of Mesa
programs is given next.
Mesa interfaces are just records holding pointers to available entities,
and can be passed as parameters.
Ideas are given for speeding up the XFER operation by using the
"Jump To Subroutine" machine instruction whenever feasible, and for
speeding up access to local variables.
No measurements are supplied.
R.C. Holt,
J.R. Cordy,
D.B. Wortman,
An introduction to S/SL: Syntax/Semantic Language,
ACM TOPLAS,
vol. 4,
#2,
April 1982,
pp. 149-178.
The syntax part of S/SL is a top-down parser generator; the actions in
the parser are restricted to parameterless calls to operations on
objects defined in the semantic part.
This provides a clear separation of syntax and semantics.
The objects are programmed in Pascal.
The communication between the parser and the objects and between the
objects themselves is remarkably meager.
Objects cannot communicate with each other.
Grammar rules can provide information to objects, but cannot normally
get answers in return (but see below).
Grammar rules can optionally leave one result on a global result stack;
a value on this result stack can be used by a special choice action in
the syntax part which bases the choice between a number of alternatives
on the value obtained from the result stack rather than on an input
symbol.
This same choice action can also base its choice on a value obtained
from an object, the only way an object can return a value.
Code generation is done, either on the fly or at the end, by
interrogating the objects and using the answers as directives on what
code to generate.
The authors "prove" that S/SL is equivalent to a Turing machine.
They also make it plausible that the top-down mechanism + choice action
is strong enough to recognize the deterministic languages, and conclude
from this that S/SL is equivalent to LR(k).
That exploiting this equivalence may entail an amount of work
exponential in k is not mentioned
(the authors also imply that yacc is LR(k)).
Andrew S. Tanenbaum,
Hans van Staveren,
Johan W. Stevenson,
Using peephole optimization on intermediate code,
ACM TOPLAS,
vol. 4,
#1,
Jan. 1982,
pp. 21-36.
Makes the point that since the front end is language-dependent and the
back end is machine-dependent, the proper place to do peephole
optimization is on the intermediate code, since it is fixed: any
improvement on it is pure profit.
The intermediate code in question is EM; it is described briefly.
The 123 peephole optimization patterns used are shown and annotated.
The pattern matching algorithm is not described, but the text suggests
that the patterns are organized in a tree, which is searched from the
top for current and following instructions.
The optimizer processes about 1000 EM instructions per second on a
PDP-11/70.
The result is a 16 percent reduction in EM instructions and a 14 percent
reduction in subsequent PDP-11 instructions.
There is a letter by Steven Pemberton, ACM TOPLAS, Vol 5, No. 3, p. 499,
pointing out that the proposed optimizations that reorder the operands
of arithmetic operations are unsafe as to integer and floating point
overflow.
The answer by the first author supplies corrected patterns.
R. Giegerich,
Automatic generation of machine specific code optimizers,
in Ninth ACM Symposium on Principles of Programming Languages,
Jan. 1982,
Albuquerque, New Mexico,
pp. 75-81.
Author's abstract:
While compiler generating systems are gradually moving towards production
quality, the compiler subtask of machine code optimization has so far
withstood automation (except for [Fra79], see below). The necessity for such a
phase comes from the fact that code generators produce code in a piecemeal
fashion, in order to limit the otherwise combinatorially growing amount of
case analysis to be done. The emitted code sequences may be optimal when
considered by themselves, but allow (if not require) optimization when placed
in the context of others ([PQC80], [Fra79]). This effect is especially severe
where, for reasons of retargetability, code for various processors is
generated from a common low-level intermediate form by some kind of macro
expansion.
Robin Hunter,
The Design and Construction of Compilers,
John Wiley,
Chichester,
1981,
pp. 272.
Likeable introductory book in Algol 68 notation, with lots of sound
advice.
Old-fashioned in some sense, but still quite worth reading.
Predecessor to
Hunter, Compilers: Their Design and Construction Using Pascal, 1985.
R.E. Berry,
Program Language Translation,
Ellis Horwood,
Chichester,
1981,
pp. 175.
As it says on the inside cover:
"It is an overview, rather than an exhaustive study, and will satisfy
the curiosity of the rapidly increasing number of computer users about
the compilers or assemblers which process the programs."
The book is somewhat reminiscent of Hopgood [1971], and is based on the
Pascal S compiler.
No literature references.
Not a serious book (unlike Hopgood!)
Steven S. Muchnick,
Neil D. Jones,
Program Flow Analysis,
Prentice-Hall,
Englewood Cliffs, N.J.,
1981,
pp. 418.
Ten chapters/papers by different authors, arranged in four parts:
introduction,
applications to program optimization,
applications to software engineering, and
theory.
The bibliography is impressive.
F.E. Allen,
J. Cocke,
K. Kennedy,
Reduction of operator strength,
in Program Flow Analysis,
ed. by Steven S. Muchnick and Neil D. Jones,
Prentice-Hall,
Englewood Cliffs, N.J.,
1981,
pp. 79-101.
The text starts by treating simple operator strength reduction.
A variable i is an induction variable if its value is only changed in
two kinds of assignments: $ i := j ± k $ and $ i := ± j $ in which
j and k are either region constants or induction variables.
The region is then searched for other occurrences of i and if such an
occurrence is sufficiently simple, strength reduction can be applied.
Ten such cases are identified and replacements are given.
Next the technique is generalized to any kind of expression, but
limitations of that generalization are shown.
Returning to the simple case and using use-definition chains, an
algorithm is then derived and shown in full that does the job.
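A minimal worked example of the simple case (mine, not the paper's
algorithm): the occurrence i * width is replaced by a temporary that is
kept up to date by additions.

def addresses_original(base, width, n):
    """Address computation with a multiplication in the loop body."""
    result = []
    i = 0
    while i < n:
        result.append(base + i * width)    # the expensive occurrence: i * width
        i += 1                             # i is an induction variable: i := i + 1
    return result


def addresses_reduced(base, width, n):
    """The same loop after strength reduction: the multiplication is replaced
    by a temporary t = i * width that is updated by additions."""
    result = []
    i, t = 0, 0                            # t mirrors i * width
    while i < n:
        result.append(base + t)
        i += 1
        t += width                         # update t in step with i
    return result


if __name__ == "__main__":
    assert addresses_original(1000, 4, 5) == addresses_reduced(1000, 4, 5)
    print(addresses_reduced(1000, 4, 5))   # [1000, 1004, 1008, 1012, 1016]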
Ken Kennedy,
A survey of data flow analysis techniques,
in Program Flow Analysis,
ed. by Steven S. Muchnick and Neil D. Jones,
Prentice-Hall,
Englewood Cliffs, N.J.,
1981,
pp. 5-54.
Global data flow analysis gives rise to sets of forward-flow and
backward-flow equations; constructing these equations from the code is
fairly straightforward (covered in 13 pages), but solving them is not.
Nine algorithms for solving data flow equations are explained, examined
and compared:
1.
Iterative: the equations are scanned and values updated until nothing
changes any more; works always but is $ O ( n sup 2 ) $.
2.
Utilizing nested strongly connected regions: works only for graphs that
derive from simple loop structures.
3.
Interval analysis: an interval is a cycle-free natural loop.
The intervals are analysed bottom-up, from smaller intervals to larger.
Again $ O ( n sup 2 ) $, but also finds structure.
4.
T1-T2 analysis: transformation T1 collapses a loop into a single node,
and T2 contracts node $ N sub 1 $ and $ N sub 2 $ if $ N sub 1 $ is the
only parent of $ N sub 2 $.
Repeated application of T1 and T2 reduces a reducible graph to a single
node.
Meanwhile a tree is constructed, which allows the data flow equations to
be solved in time $ O ( n ~ log ~ n ) $.
5.
Node listings:
A rooted graph defines a finite set of non-circular paths, which can be
represented as lists of nodes.
A node listing of a graph is a listing of all the nodes (duplicates
allowed) which has all non-circular paths as subsequences.
It can be proved (although the paper does not do so) that flow equations
can be solved completely by solving them partially in the order of a
node listing, much like Warshall's algorithm.
Node listings of size $ O ( n ~ log ~ n ) $ can be constructed in time
$ O ( n ~ log ~ n ) $.
6.
Path compression:
A variant of T1-T2 analysis, using three transformations.
It is faster than the other methods and almost always linear in the
number of nodes.
7.
Balanced path compression:
Tarjan's own very fast and very complicated version of path compression,
based on a special kind of balanced trees.
8.
Graph grammars:
A hopefully general enough graph grammar is decreed, and the graph is
parsed with this grammar.
If the parsing succeeds, the method works and the flow equations can be
solved in linear time by following the parsing; otherwise the method
just fails.
The advantage is the linear time requirement, the disadvantage is
finding proper graph grammars.
Also the state of the art in graph parsing was a problem in 1981.
9.
High-level data flow analysis: data flow analysis on parse trees,
using attributes as in an attribute grammar.
Algorithms and examples for most of these are given.
Additionally, algorithms for use-def chains and constant propagation are
given.
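To make method 1 concrete, here is a minimal sketch in C (mine, not
from the chapter): it solves the live-variables equations, a
backward-flow problem, on a small hand-coded flow graph; the graph
and the gen/kill sets are invented and the sets are bit masks over at
most 32 variables.

    #include <stdio.h>

    #define N 4                       /* number of basic blocks */

    int main(void) {
        /* gen[b]: variables used before being defined in block b;
           kill[b]: variables defined in block b */
        unsigned gen[N]  = { 0x1, 0x2, 0x4, 0x3 };
        unsigned kill[N] = { 0x2, 0x1, 0x3, 0x4 };
        int succ[N][N] = {            /* succ[b][s] != 0: edge b -> s */
            {0,1,1,0}, {0,0,0,1}, {0,0,0,1}, {0,0,0,0}
        };
        unsigned in[N] = {0}, out[N] = {0};

        int changed = 1;
        while (changed) {             /* iterate until a fixed point */
            changed = 0;
            for (int b = 0; b < N; b++) {
                unsigned newout = 0;
                for (int s = 0; s < N; s++)
                    if (succ[b][s]) newout |= in[s];
                unsigned newin = gen[b] | (newout & ~kill[b]);
                if (newin != in[b] || newout != out[b]) {
                    in[b] = newin; out[b] = newout; changed = 1;
                }
            }
        }
        for (int b = 0; b < N; b++)
            printf("block %d: in=%#x out=%#x\n", b, in[b], out[b]);
        return 0;
    }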
The next section is concerned with symbolic interpretation.
It starts off with having full symbolic expressions, but soon restricts
itself to Wegbreit's well-founded property sets.
(See Wegbreit, "Property extraction in well-founded property sets",
IEEE Trans. Software Eng., Vol. SE-1, No. 3, 1975, pp. 270-285.)
This simplifies the treatment of path joins considerably.
The chapter closes with optimization in SETL and global type inference.
George Logothetis,
Prateek Mishra,
Compiling short-circuit Boolean expressions in one pass,
SP&E,
vol. 11,
#11,
Nov. 1981,
pp. 1197-1214.
A Boolean value is represented in the compiler as a record with state
indicating among other things if it is encoded (in a variable) or
decoded (in the flow of control).
Five different code generating routines are given to be called in
different contexts; each generates the optimal code for that state of
the Boolean in that context and updates the state of the Boolean.
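The five routines are not reproduced in the annotation; as a rough
illustration of the decoded ("in the flow of control") representation,
here is a minimal sketch in C (mine, not the authors' scheme) that
emits short-circuit jump code for a small expression tree; the
"instructions" are just printed text.

    #include <stdio.h>

    typedef struct Node {
        char kind;                    /* '&', '|', or 'v' for a variable */
        struct Node *l, *r;
        const char *var;
    } Node;
    static int label = 0;

    static void emit_jump_true(Node *e, int ltrue);

    /* jump to lfalse if e is false, fall through otherwise */
    static void emit_jump_false(Node *e, int lfalse) {
        if (e->kind == '&') {             /* a and b: either test may fail */
            emit_jump_false(e->l, lfalse);
            emit_jump_false(e->r, lfalse);
        } else if (e->kind == '|') {      /* a or b: if a holds, skip b */
            int ltrue = ++label;
            emit_jump_true(e->l, ltrue);
            emit_jump_false(e->r, lfalse);
            printf("L%d:\n", ltrue);
        } else {
            printf("  if not %s goto L%d\n", e->var, lfalse);
        }
    }

    /* jump to ltrue if e is true, fall through otherwise */
    static void emit_jump_true(Node *e, int ltrue) {
        if (e->kind == '|') {
            emit_jump_true(e->l, ltrue);
            emit_jump_true(e->r, ltrue);
        } else if (e->kind == '&') {
            int lfalse = ++label;
            emit_jump_false(e->l, lfalse);
            emit_jump_true(e->r, ltrue);
            printf("L%d:\n", lfalse);
        } else {
            printf("  if %s goto L%d\n", e->var, ltrue);
        }
    }

    int main(void) {
        /* compile "if (a or b) and c then ..." */
        Node a = {'v', 0, 0, "a"}, b = {'v', 0, 0, "b"};
        Node c = {'v', 0, 0, "c"};
        Node ab = {'|', &a, &b, 0}, abc = {'&', &ab, &c, 0};
        int lfalse = ++label;
        emit_jump_false(&abc, lfalse);
        printf("  ... then part ...\n");
        printf("L%d:\n", lfalse);
        return 0;
    }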
P. Klint,
Interpretation techniques,
SP&E,
vol. 11,
#9,
Sept. 1981,
pp. 963-973.
Discussion and comparison of three interpreter construction techniques:
classical iterative (the program is a list of pseudo-instructions with a
routine for each instruction),
direct threaded code (the program is a list of addresses of routines),
and
indirect threaded code (the program is a list of addresses of lists of
addresses of routines and data).
The latter technique allows variables to be considered as objects: the
list of addresses of routines and data is an object.
A variable A is implemented as a (generated) list $@push$, $@store$,
$value$, in which $@push$ and $@store$ are addresses of fixed routines.
To store, for example, the top of the stack in A, the program contains
the address A+1.
The comparison was performed on a PDP11 and a Cyber-73.
Direct threaded code was found to be 10 to 30 percent faster than the
others.
It is important to keep the program counter in a register.
The classical iterative method may result in the smallest interpreter.
No word about recursive interpreters.
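Here is a minimal sketch in C of the first two techniques (mine, not
Klint's code, and it does not show the object trick for variables);
the direct threaded version relies on GCC's "labels as values"
extension, since standard C cannot jump through a stored label
address, and it smuggles small integer operands in as pointers.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int stack[16], *sp = stack;

        /* classical iterative: the program is a list of
           pseudo-instructions, with one switch arm per instruction */
        enum { PUSH, ADD, PRINT, HALT };
        int prog1[] = { PUSH, 2, PUSH, 3, ADD, PRINT, HALT };
        for (int *pc = prog1; ; ) {
            switch (*pc++) {
            case PUSH:  *sp++ = *pc++;            break;
            case ADD:   sp--; sp[-1] += sp[0];    break;
            case PRINT: printf("%d\n", *--sp);    break;
            case HALT:  goto threaded;
            }
        }

    threaded:
        /* direct threaded: the program is a list of addresses of
           routines (label addresses here), so there is no decode step */
        {
            void *prog2[] = { &&push, (void *)2, &&push, (void *)3,
                              &&add, &&print, &&halt };
            void **pc = prog2;
            goto **pc++;
        push:  *sp++ = (int)(intptr_t)*pc++;  goto **pc++;
        add:   sp--; sp[-1] += sp[0];         goto **pc++;
        print: printf("%d\n", *--sp);         goto **pc++;
        halt:  ;
        }
        return 0;
    }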
Arthur Sale,
The implementation of case statements in Pascal,
SP&E,
vol. 11,
#9,
Sept. 1981,
pp. 929-942.
Implementations by jump table, binary search tree and linear table
search are compared in closed form and experimentally on a Burroughs
B7700.
Conclusion: never use linear table search, and use the smallest of the
others.
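A minimal illustration in C of the first two techniques (mine, not
from the paper); function pointers stand in for the jump targets of
the case arms, and the case values are invented.

    #include <stdio.h>

    static void armA(void)  { puts("arm A"); }
    static void armB(void)  { puts("arm B"); }
    static void armC(void)  { puts("arm C"); }
    static void other(void) { puts("otherwise"); }

    int main(void) {
        /* dense selector values 0..2: jump table, one indexed access */
        void (*table[3])(void) = { armA, armB, armC };
        int x = 1;
        if (x >= 0 && x <= 2) table[x](); else other();

        /* sparse selector values: binary search over sorted cases */
        struct { int value; void (*target)(void); } cases[] = {
            { 3, armA }, { 100, armB }, { 5000, armC }
        };
        int y = 100, lo = 0, hi = 2, found = 0;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (cases[mid].value == y) {
                cases[mid].target(); found = 1; break;
            } else if (cases[mid].value < y) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (!found) other();
        return 0;
    }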
David Alex Lamb,
Construction of a peephole optimizer,
Softw. Pract. Exper.,
vol. 11,
1981,
pp. 638-647.
Straightforward programmable peephole optimizer, used to generate VAX-11
code from intermediate code from "Preliminary Ada", to pep up a
prototype compiler, "Charrette" at CMU.
Properties:
1.
Both the pattern and the replacement can contain an arbitrary mix of
intermediate and target code.
2.
The replacement can contain, in addition to the usual arguments from
the pattern, applications of built-in functions on such arguments, both
as replacements and as conditions on the applicability of the pattern;
examples are IsOdd(&1) which returns true if argument 1 is odd,
and $negate(&2) which will substitute the negation of argument 2.
3.
Pattern matching proceeds backwards, starting at the end, to allow
matches over matches in a natural way (can it replace a pattern by a
longer pattern?).
4.
The peephole optimizer generator is written in SNOBOL4.
It says in the text that it "compiles the patterns into subroutines and
data declarations in the implementation language".
Does it indeed generate a peephole optimizer in VAX-11 assembly code?
The system achieves a code size reduction of about 40 to 50 percent.
It took about 20 man-days to write.
William A. Wulf,
Compilers and computer architecture,
IEEE Comput.,
vol. 14,
#7,
July 1981,
pp. 41-47.
Desiderata for a machine instruction from a compiler writer point of
view.
The general gist is:
1. Provide primitives, not solutions.
2. Keep it simple (and orthogonal).
Lots of examples are given, pointedly naming no machine in particular.
Tomasz Kowaltowski,
Parameter passing mechanisms and run-time data structures,
S-P&E,
vol. 11,
#7,
pp. 757-765.
1981,
General survey of the implementation of parameter passing mechanisms in
a language with statically nested scopes using displays.
Explains the optimization of making each procedure responsible
"for saving and restoring the previous contents of the position of
the display which it needs to point to its own activation record".
Next explains the old Elliott 503 trick of letting all formal and local
objects have absolute addresses and requiring the procedures to save and
restore the previous contents on a stack.
This does not need a display at all, but cannot handle procedures (or
labels) passed as parameters.
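A minimal simulation in C of the quoted optimization (mine, not from
the paper; C has no nested procedures, so static nesting levels and
the display are maintained by hand): each procedure saves and
restores only its own display slot instead of copying the whole
display.

    #include <stdio.h>

    #define MAXLEVEL 8

    typedef struct Frame { int local; } Frame;

    /* display[l]: newest activation record at static nesting level l */
    static Frame *display[MAXLEVEL];

    /* procedure at static level 2, nested inside outer */
    static void inner(void) {
        Frame f, *saved = display[2];     /* save own display slot */
        display[2] = &f;
        f.local = display[1]->local + 1;  /* up-level addressing */
        printf("inner sees outer.local = %d, own local = %d\n",
               display[1]->local, f.local);
        display[2] = saved;               /* restore on exit */
    }

    /* procedure at static level 1 */
    static void outer(int x) {
        Frame f, *saved = display[1];
        display[1] = &f;
        f.local = x;
        inner();
        display[1] = saved;
    }

    int main(void) {
        outer(41);
        return 0;
    }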
Arthur B. Pyster,
Compiler Design and Construction,
Van Nostrand Reinhold,
New York,
1981,
pp. 357.
Compiler course based on a Pascal compiler for the IBM/370.
Many explicit examples.
Good bibliography.
A.J.T. Davie,
R. Morrison,
Recursive descent compiling,
Ellis Horwood Ltd.,
Chichester,
1981,
pp. 195.
Well-balanced description of the design considerations that go into a
recursive descent compiler; uses the St. Andrews University S-algol
compiler as a running example.
Gregory J. Chaitin,
Marc A. Auslander,
Ashok K. Chandra,
John Cocke,
Martin E. Hopkins,
Peter W. Markstein,
Register allocation via coloring,
Computer Languages,
vol. 6,
#1,
1981,
pp. 45-57.
Very readable and enthusiastic account of how the experimental code
generator of the PL/I compiler for the IBM/370 works.
Although register allocation via colouring is the main ingredient, many
additional tricks are explained.
The compiler compiles to intermediate code, in which almost everything
is kept in a pseudo-register, of which there are infinitely many.
Then aggressive register allocation maps them to real registers.
The following techniques and tricks are employed.
1. In constructing the interference graph, rather than adding
$ k * ( k + 1 ) / 2 $ arcs at any place where k pseudo-registers
interfere, arcs are added at the definition point of each
pseudo-register to the k pseudo-registers that are live at that point.
2. In constructing the interference graph, any assignment of the form
y:=x which is the end of the live range of x and the start of that
of y, is optimized away by coalescing the nodes for x and y
("subsumption").
3. The compiler employs two graph colouring algorithms.
Both start with a known number of registers (the number is even a fixed
constant, 17, deriving from the architecture of the IBM/370).
The first algorithm repeatedly removes all nodes with less than 17
connections from the graph, until the empty graph remains, and then puts
them back in inverse order.
If this fails and if the user has requested a very high level of
optimization, the second algorithm is brought in.
The second algorithm sorts the nodes by "urgency", the urgency of a node
N being defined at any moment as the number of uncoloured neighbours
divided by the number of available colours, and colours them in that
order.
If both fail, spilling is in order.
4. For fast access during all this, the graph is implemented both as
half a bit matrix and as a list of nodes describing all the
pseudo-registers, with for each pseudo-register a list of its adjacent
registers.
5. Register coalescing is applied repeatedly, to see if more
pseudo-registers can be combined.
6. Several tricks are used to incorporate machine idiosyncrasies.
7. If we run out of registers, three approaches are tried.
First, we try to recompute results just before they are used; if this is
possible, the live range can be cut in two chunks that together are
smaller than the original.
Second, we try to do the same thing through rematerialization:
constructing a value from one or more of the live values at a given
point.
Third, if nothing helps, we spill a register.
Register spilling code is entered in the program graph, and the entire
algorithm is run again, including dead-code elimination, which will
eliminate possible dead register-spilling code.
A discussion of the "ultimate" notion of interference and a proof that
any graph can arise in the above construction end the paper.
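To make the first colouring algorithm concrete, here is a minimal
sketch in C (mine, not the IBM code; K is 3 rather than 17, the
interference graph is invented, and spilling is not implemented):
nodes with fewer than K neighbours are repeatedly removed and
stacked, and are then coloured in reverse order.

    #include <stdio.h>

    #define N 5                          /* number of pseudo-registers */
    #define K 3                          /* number of real registers */

    int main(void) {
        int adj[N][N] = {                /* interference graph */
            {0,1,1,0,0},
            {1,0,1,1,0},
            {1,1,0,1,0},
            {0,1,1,0,1},
            {0,0,0,1,0},
        };
        int removed[N] = {0}, stack[N], top = 0, colour[N];

        /* simplify: keep removing nodes with degree < K */
        for (int pass = 0; pass < N; pass++) {
            for (int v = 0; v < N; v++) {
                if (removed[v]) continue;
                int deg = 0;
                for (int w = 0; w < N; w++)
                    if (!removed[w] && adj[v][w]) deg++;
                if (deg < K) { removed[v] = 1; stack[top++] = v; break; }
            }
        }
        if (top < N) { puts("stuck: would have to spill"); return 0; }

        /* select: pop in reverse order, pick the first free colour */
        while (top > 0) {
            int v = stack[--top], used[K] = {0};
            removed[v] = 0;              /* v re-enters the graph */
            for (int w = 0; w < N; w++)
                if (!removed[w] && adj[v][w]) used[colour[w]] = 1;
            for (int c = 0; c < K; c++)
                if (!used[c]) { colour[v] = c; break; }
        }
        for (int v = 0; v < N; v++)
            printf("pseudo-register %d -> register %d\n", v, colour[v]);
        return 0;
    }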
A. Mycroft,
The theory and practice of transforming call-by-need into call-by-value,
in International Symposium on Programming,
Lecture Notes in Computer Science #83,
Springer-Verlag,
Berlin,
pp. 269-281.
1980,
More theory than practice about transforming lazy parameter evaluation
into call by value.
Terry Ritter,
Gregory Walker,
Varieties of threaded code for language implementation,
BYTE,
vol. 5,
#9,
Sept. 1980,
pp. 206-227.
Four varieties of threaded code are explained:
1.
subroutine-threaded (executable code: JSR A; JSR B; ...
-- with A, B, ... subroutine-threaded lists, or machine code
subroutines);
2.
direct-threaded (address list: A; B; ...
-- with A, B, ... direct-threaded address lists packaged with call setup
and return code, or addresses of machine code subroutines);
3.
indirect-threaded (address list: A; B; ...
-- with A, B, ... indirect-threaded address lists surrounded by indirect
addresses to machine code routines for call setup and return code,
or indirect addresses of machine code subroutines);
4.
token-threaded (token list: A; B; ...
-- with A, B, ... indexes in a table containing the addresses that occur
in the indirect-threaded code described under 3.).
The explanations are supplied in English, in pictures, in
(pseudo-)Pascal and in MC6809 assembly code.
There is no indication of accessing parameters, nor of incremental
compilation.
Peter Kornerup,
Bent Bruun Kristensen,
Ole Lehrmann Madsen,
Interpretation and code generation based on intermediate languages,
SP&E,
vol. 10,
#8,
Aug. 1980,
pp. 635-658.
The authors note three problems with P-code:
1. some info is lost and has to be retrieved by a pre-scan;
2. the control structure is lost;
3. the memory structure is lost.
Effectively arguing that P-code is too low-level, they propose an
intermediate form closer to Pascal, called PIF for Pascal Intermediate
Form.
A detailed proposal follows.
There is a front-end producing PIF, but no further processors.
A.V. Aho,
Translator writing systems: where do they now stand?,
IEEE Computer,
vol. 13,
#8,
pp. 9-14.
Aug. 1980,
Very general paper on compiler construction using tools.
W.A. Wulf,
B.W. Leverett,
R.G.G. Cattell,
S.O. Hobbs,
J.M. Newcomer,
A.H. Reiner,
B.R. Schatz,
An overview of the production-quality compiler compiler project,
IEEE Computer,
vol. 13,
#8,
pp. 38-49.
Aug. 1980,
Much overview, not much technical detail.
The system consists of a large number of small transformations, each
"intellectually manageable".
Each transformation derives from a formal description and is translated
into a table.
The compiler then is a driver that applies these tables to the program
to be translated.
Much of the code transformation is based on TCOL (Tree UNCOL).
Susan L. Graham,
Table-driven code generation,
IEEE Computer,
vol. 13,
#8,
pp. 25-34.
Aug. 1980,
Extensive argumentation concerning the design of the Graham-Glanville
code generator, with examples.
Little hard information.
Stephen C. Johnson,
Language development tools on the UNIX system,
IEEE Computer,
vol. 13,
#8,
Aug. 1980,
pp. 16-21.
Tools are described as general input-processing devices.
The section on the philosophy of tool design ('Tooling up') is very
interesting.
Bruce Leverett,
Thomas G. Szymanski,
Chaining span-dependent jump instructions,
ACM Trans. Prog. Lang. Syst.,
vol. 2,
#3,
July 1980,
pp. 274-289.
A long jump to L can sometimes be replaced by a short jump to M where
M already holds a jump to L.
This can be utilized to reduce the number of long jumps.
Several variants of the problem are considered.
Some are NP-complete; for the others, dynamic programming is the name of
the game.
The effectiveness of the technique is not discussed.
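A minimal sketch in C of the chaining idea (mine, not from the paper;
the addresses and the short-jump range are invented, and the feedback
of jump sizes on the addresses, which is what makes the real problem
hard, is ignored): a long jump to L is turned into a short jump to
another jump that reaches L.

    #include <stdio.h>
    #include <stdlib.h>

    #define SHORT_RANGE 128              /* reach of a short jump */

    int main(void) {
        int L = 1000;                    /* the common target address */
        int jumps[] = { 700, 830, 950 }; /* addresses of jumps to L */
        int n = (int)(sizeof jumps / sizeof jumps[0]);

        for (int i = 0; i < n; i++) {
            if (abs(L - jumps[i]) <= SHORT_RANGE) {
                printf("%4d: short jump to L\n", jumps[i]);
                continue;
            }
            int chained = 0;             /* look for a nearer jump to L */
            for (int j = 0; j < n; j++)
                if (j != i &&
                    abs(jumps[j] - jumps[i]) <= SHORT_RANGE &&
                    abs(L - jumps[j]) <= SHORT_RANGE) {
                    printf("%4d: short jump to the jump at %d\n",
                           jumps[i], jumps[j]);
                    chained = 1;
                    break;
                }
            if (!chained)
                printf("%4d: long jump to L\n", jumps[i]);
        }
        return 0;
    }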
F.E.J. Kruseman Aretz,
Jan L.A. van de Snepscheut,
H. Grasdijk,
J.M.H. Smeets,
SATHE: Some aspects of an ALGOL implementation,
Software -- Practice and Experience,
vol. 10,
#7,
pp. 563-573.
July 1980,
Mainly about syntactic error recovery.
Jack W. Davidson,
Christopher W. Fraser,
The design and application of a retargetable peephole optimizer,
ACM Trans. Prog. Lang. Syst.,
vol. 2,
#2,
April 1980,
pp. 191-202.
To use the peephole optimizer PO for a new machine, its instructions
must be described as register and memory transfers.
The input to PO may be assembler language or register transfer language,
but the input to the PO algorithm is register transfers: assembly
language is converted upon input.
The PO considers pairs of instructions, retrieves the register transfers
for them, substitutes the effect of the first in the register transfers
for the second and tries to find an instruction which does the
combination.
The paper does not explain how it is determined if an instruction (with
its addressing modes and other flags) does what a set of register
transfers describe; the examples suggest that the mechanism used is
stronger than simple tree matching.
If a replacement instruction is found, it replaces the instruction pair
and the process continues with the new instruction as the first
instruction of the next instruction pair.
The authors point out that assembly code often overspecifies: many
instructions set condition codes which are not used afterwards.
Such overspecification can prevent replacement by cheaper instructions
which have the same effect except that they do not set the (after all
unused) condition codes properly.
They recommend for the code generator to generate register transfers
rather than actual assembly instructions, to avoid such problems.
The system is slow, about 1 to 10 instructions per second, but allows
the code generator to be very simple, for example to generate naive code
for a stack machine, and have the PO convert the output into clever
code.
Examples are given.
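As a rough illustration of the central step, here is a minimal sketch
in C (mine; the machine, the instruction table and the register
transfers are invented, and the substitution-and-simplification step,
which the real PO performs mechanically, is done by hand in a
comment): the combined effect of an instruction pair is looked up in
the table of single-instruction effects.

    #include <stdio.h>
    #include <string.h>

    struct Instr { const char *name; const char *effect; };

    /* instruction descriptions: register transfers as canonical text */
    static struct Instr table[] = {
        { "NEG r",   "r = -r"    },
        { "ADD r,x", "r = r + x" },
        { "SUB r,x", "r = r - x" },
        { "RSB r,x", "r = x - r" },      /* reverse subtract */
    };

    int main(void) {
        /* the pair NEG r; ADD r,x: substituting r = -r into r = r + x
           and simplifying gives the combined effect r = x - r */
        const char *combined = "r = x - r";

        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcmp(table[i].effect, combined) == 0) {
                printf("NEG r; ADD r,x  =>  %s\n", table[i].name);
                return 0;
            }
        puts("no single instruction found; keep the pair");
        return 0;
    }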
R.G.G. Cattell,
Automatic derivation of code generators from machine descriptions,
ACM Trans. Programming Languages and Systems,
vol. 2,
#2,
pp. 173-190.
April 1980,
The entire program (or at least the entire subroutine) is given in tree
form in TCOL, and so are all machine instructions, including jumps,
increment-and-skip-on-zero, etc.
No completeness is required of the instruction set.
The pattern matcher performs in principle an exhaustive search, limited
in simple ways.
Since the instruction set may be incomplete, the search may get stuck.
In that case, a "primary operator" is found in the unmatchable region
according to certain criteria, an instruction is found that has the same
primary operator, and the instruction is forcibly matched.
This requires repairs to the tree that can be derived automatically.
The search is then restarted to find instructions to match the new tree.
This need not terminate in principle, but in practice it does.
An example is shown in which the search algorithm manages to find the
proper code for 'ACC ← Mem[Addr]' on a PDP8-like machine: 'ACC ←
0; ADD Mem[Addr]'.
To increase efficiency, the initial set of instructions is extended at
compiler-generation time with sets of canned instruction sequences,
according to a set of heuristics.
Most importantly, canned instruction sequences are created (by calling
the search algorithm at compiler-generation time) for those TCOL
operators for which no direct machine instruction exists.
So the PDP8 idiom for 'ACC ← Mem[Addr]' is found at
compiler-generation time.
In addition, patterns are generated for all addressing modes.
The scheme generates good code and is fast, in spite of the exhaustive
search.
Quote: "It appears that code comparable to a good handcrafted compiler
can be generated with proper care."