Implementation of Functional Languages

> Literature references and annotations by Dick Grune, dick@dickgrune.com.
Last update: Tue Sep 15 10:33:30 2009.

These references and annotations were originally intended for personal use and are presented here only in the hope that they may be useful to others. There is no claim to completeness or even correctness. Each annotation represents my understanding of the text at the moment I wrote the annotation.
> No guarantees given; comments and content criticism welcome.

* Nadia Nedjah, Luiza de Macedo Mourelle, Efficient concise deterministic pattern-matching automata for ambiguous patterns, ACM SIGPLAN Notices, vol. 37, #2, Feb. 2002, pp. 57-67.
Actually pattern matching on trees, in the context of functional languages. Ambiguous patterns are disambiguated using some rules. These rules are applied when a match with multiple patterns is detected (during automaton construction), and results in the rejection of all but one of the reduce items in the recognizing state. This makes some of the items in previous states irrelevant since they can only lead to a reduce item that will be rejected. Removing these irrelevant items can simplify the automaton. Actually they are not removed; they are not generated in the first place, using a modified inference rule in the closure algorithm.

* Urban Boquist, Thomas Johnsson, The GRIN project -- A highly optimizing back end for lazy functional languages, in Implementation of Functional Languages, Lecture Notes in Computer Science #1268, ed. by Werner Kluge, Berlin, Springer-Verlag, 1996, pp. 58-84.
Exactly what the title says.

* Adam Webber, Optimization of functional programs by grammar thinning, ACM TOPLAS, vol. 17, #2, March 1995, pp. 293-330.
[ lots of interesting formal language theory and speculation ]

* Andrei Mantsivoda, Vyacheslav Petukhin, Compiling Flang, in Compiler Construction, 4th International Conference, CC'92, ed. by U. Kastens and P. Pfahler, Lecture Notes in Computer Science #641, Springer-Verlag, Oct. 1992, pp. 297-311.
[ compilation of a functional-logic language ]

* Koen Langendoen, Pieter H. Hartel, FCG: A code generator for lazy functional languages, in Compiler Construction, 4th International Conference, CC'92, ed. by U. Kastens and P. Pfahler, Lecture Notes in Computer Science #641, Springer-Verlag, Oct. 1992, pp. 278-296.
The original FAST (Functional ArrayS on Transputers) compilation pipeline of the Univ. of Southampton comprised the following steps: a high-level functional language (f.e. Haskell) → "intermediate" (a bare-bones functional language) → Functional C (a simple subset of C augmented by a library for lazy evaluation etc.) → machine code (courtesy of gcc). The problem with this is that all pointers turn into C data, in registers, on the stack and who knows where; they are irretrievable and no compacting garbage collector can be written.
     To remedy this, the authors want to obtain a C version in which all data manipulation is explicit; more in particular, the stack should be explicit. At the same time they wish to keep as much as possible of the original pipe line in tact. This is achieved by inserting the code generator FCG between Functional and machine code.
     FCG translates Functional C to KOALA, a simple graph reduction machine code similar to Johnsson's G machine. The resulting code is then translated into a single C routine, in which the "global variables" of the KOALA machine are register variables and all data manipulation is explicit. Various optimizations are applied, which are argued and discussed in detail in the paper.
     The resulting code is often faster than the FAST code and always faster than the code produced by the LML native compiler.

* Mikael Pettersson, A term pattern-match compiler inspired by finite automata theory, ed. by U. Kastens and P. Pfahler, in Compiler Construction, 4th International Conference, CC'92, Lecture Notes in Computer Science #641, Springer-Verlag, Oct. 1992, pp. 258-270.
The patterns in function definitions in functional languages are considered as very simple regular expressions (very simple in that they do not contain repetition and the only disjunction is on the top level), the tokens of which are variable, wildcard (_), and constructor. A non-deterministic automaton is constructed which matches the patterns in parallel, using the proper disambiguation rules. The automaton is then made deterministic.
     Advantages over the usual case-within-case tests are: avoidance of duplicate code, easy detection of incompleteness and redundancy (unreachability). The author reports that the result was meager; this may be due to "difficult" patterns being unnecessary or to programmers avoiding them.

* David R. Tarditi, Peter Lee, Anurag Acharya, No assembly required: compiling Standard ML to C, ACM Letters on Programming Languages and Systems, vol. 1, #2, pp. 161-177. June 1992,
The authors took the publicly available SML/NJ and replaced the code generator by one that produces C, in order to investigste the translation of first-class functions and tail recursion elimination. In addition, SML has dynamically scoped exceptions, garbage collection and higher-order functions. One man-summer was avaialble for the work.
     Basically, SML is converted to continuation-passing form, which is then flattened into C routines, but several optimizations were found necessary.
     C compilers are not good enough at tail recursion elimination, so the convertor has to do the work: rather than calling its last routine L, the routine returns with the address of L as the result. It returns to a driver, which then calls the requested routine, etc. This avoids the use of the C stack, which is good for the garbage collector, since now we do not need to do a machine-dependent scan of the C stack; in fact, the depth of the C stack never exceeds two activation records: the driver and the running routine.
     All SML values are kept on the heap, but for efficiency reasons some may be cached in local variables of C functions, hoping that the C compiler will place them in registers.
     The translation process breaks up the program into numerous small C routines. To avoid much of the overhead involved in routine calling, routines that are dominated by other routines (all of whose calls pass through the other routines) are incorporated into those routines.
     Since SML requires integer overflow checking, attempts are made to simplify the check in the cases where the value of one operand is known at compile time. Considerable simplifications turn out to be possible.
     These optimizations reduced the run times by 30 to 40%, leaving a 60% gap with native SML/NJ code.

* Raymond Turner, Constructive Foundations for Functional Languages, McGraw-Hill, London, 1991, pp. 269.
Implicitly assumes the reader to be familiar with functional programming and its terminology, and to have considerable mathematical sophistication. Yet it is never really obscure, which is quite an achievement. Heavy on the formalism, but the line of reasoning never gets buried in it. Constructivism extended to type theory. Excellent text book.

* Marc Feeley, Guy Lapalme, Using closures for code generation, Computer Languages, vol. 12, #1, 1987, pp. 47-66.
Closures as intermediate code in functional languages. Lambda lifting is done by introducing environment pointers. Sample translations for some syntactic structures are given. Code is compiled that does interpretation on these closures. As a result, many interpretation facilities are still available.

* Luca Cardelli, Compiling a functional language, in 1984 ACM Symposium on Lisp and Functional Programming, ACM, New York, 1984, pp. 208-217(226?).
[[ preliminary abstract ]] Very readable and clear account of the entire trajectory of writing a compiler for ML for the VAX.

* P.J. Landin, The mechanical evaluation of expressions, Computer J., vol. 6, #4, pp. 308-320. April 1964,
Invents functional programming and its implementation, including closures.