Formal Languages, Automata and Computability

Literature references and annotations by Dick Grune, dick@dickgrune.com. Last update: Thu Aug 30 19:46:26 2012.
These references and annotations were originally intended for personal use and are presented here only in the hope that they may be useful to others. There is no claim to completeness or even correctness. Each annotation represents my understanding of the text at the moment I wrote the annotation.
No guarantees given; comments and content criticism welcome.

* Peter Linz, "An Introduction to Formal Languages and Automata", 4nd ed., Jones and Bartlett, Sudbury, Mass., 2006, pp. 377.
Very accessible, but correspondingly limited. Intuitive and often even colloquial explanations, supplemented by convincing but not exhaustive proofs.
In addition to the usual fare -- finite and pushdown automata, regular and context-free languages, Turing machines, etc.-- it contains a chapter on other models of computation --recursive functions, Post systems, rewriting systems, matrix grammars, Markov algorithms, L-systems-- and a brief chapter on computational complexity.
Good, soft introduction, but, given the subject, still requires considerable studying.

* Franz Baader, Tobias Nipkow, "Term Rewriting and All That", Cambridge University Press, Cambridge, 1999, pp. 313.

* Michael Sipser, "Introduction to the Theory of Computation", PWS, Boston, 1997, pp. 396.
Makes a serious attempt to be student/reader-friendly, by giving proof ideas rather than have the reader plod through completely worked-out proofs. Although the attempt is successful to a certain extent, the book is still terse and formal, and it requires a motivated reader.

* Thomas A. Sudkamp, "Languages and Machines -- An Introduction to the Theory of Computer Science", 2nd ed., Addison Wesley, pp. 569. 1997,
Dry-as-dust and probably curriculum-driven text book on the subject, low on motivation. For example, although the author proves that w [ w ] and a sup i b sup j a sup i b sup j are not context-free he nowhere points out the significance of this: the impossibility to express declaration-application checking or parameter number checking in a CF grammar. I also miss the more juicy theorems, for example, "Every recursively enumerable language can be obtained by applying a homomorphism to the intersection of two context-free grammars", or "The stacks in context-free production form a regular language" (although the latter may be in there somewhere, hidden). Also the well-known name "uvwxy theorem" for the pumping lemma is not mentioned.
But there are redeeming factors: the explanations are quite understandable and still rigourous, the coverage is wide and there is a rich set of examples and exercises.
Excellent studying material for the already motivated student.

* Hanne Riis Nielson, Flemming Nielson, "Semantics with Applications -- A Formal Introduction", John Wiley & Sons, Chichester, 1992, pp. 240.
Compares the theories of operational semantics, denotational semantics and axiomatic semantics, mainly by analyzing the while statement in each of them. Lots of math, as can be expected, but well-argued.

* D.Kozen, "A completeness theorem for Kleene algebras and the algebra of regular events", in Proc. LICS 1991 ????, pp. 214 - 225.

* John Carroll, Darrell Long, "Theory of Finite Automata, with an Introduction to Formal Languages", Prentice-Hall Internat. Ed., Englewood Cliffs, NJ 07632, 1989, pp. 438.
Extensive and easily readable on finite automata, limited and easily readable on formal languages and parsing.

* György E. Révész, "Introduction to Formal Languages", McGraw-Hill, Singapore, 1985, pp. 199.
This nifty little book contains many results and elementary proofs of formal languages, without being "difficult". It gives a description of the ins and outs of the Chomsky hierarchy, automata, decidability and complexity of context-free language recognition, including the hardest CF language. Parsing is discussed, with descriptions of the Earley, LL(k) and LR(k) algorithms, each in a few pages.

* V.J. Rayward-Smith, "A first course in formal languages", Blackwell Scientific, Oxford, 1983, pp. 123.
Very useful intermediate between Révész and Hopcroft and Ullman. Quite readable (the subject permitting); simple examples; broad coverage. No treatment of LALR, no bibliography.

* Robert McNaughton, "Elementary Computability, Formal Languages, and Automata", Prentice-Hall, Englewood Cliffs, NJ, 1982, pp. 400.
Actually an introduction to theoretical computer science. Difficultly written, heavy on formalism. Has some interesting unusual material, e.g., theorems about how adding parentheses to a grammar affects possible ambiguity (the author's specialty). No serious treatment of parsing.

* Christoph M. Hoffmann, Michael J. O'Donnell, "Pattern matching in trees", J. ACM, 29, #1, pp. 68-95. Jan. 1982,
Full automata theory of (mainly bottom-up) pattern matching in trees. Algorithms are compared: bottom-up table-driven is usually best. Many literature references.

* John E. Hopcroft, Jeffrey D. Ullman, "Introduction to automata theory, languages, and computation", Addison-Wesley, Reading, Massachussetts, 1979, pp. 418.
Successor to Hopcroft & Ullman, 1969, and mandatory reading material for anybody interested in the subject. Claims no longer to be an exhaustive treatment of the subject, but is still pretty exhausting. Covers grammars, Turing machines, undecidability, LR grammars, computational complexity, NP-completeness. Remarkably, the book does not mention LL grammars.

* Michael A. Harrison, "Introduction to Formal Language Theory", Addison-Wesley, 1978, pp. 594.
Contains a wealth of interesting facts, ideas and theorems on formal languages, with considerable attention to parsing. Informative: puts roughly equal emphasis on ideas and formalisms. Major drawback: several proofs depend on earlier exercises, but no solutions to the exercises is given.
The theorem $L_0 = h(L_2 union L_2) is in Section 10.3 (hard to find from the index).

* R.D. Tennent, "The denotational semantics of programming languages", Commun. ACM, 19, #8, pp. 437-453. Aug. 1976,

* A. Salomaa, "Formal Languages", Academic Press, New York, 1973, pp. 322.
Contains, among much other knowledge, the theorem that any Type 0 language can be obtained as the intersection of two CF languages followed by an erasing homomorphism.

* F. Gécseg, I. Peák, "Algebraic Theory of Automata", Akadémiai Kiadó, Budapest, 1972, pp. 326.

* Richard Y. Kain, "Automata Theory: Machines and Languages", McGraw-Hill Book Comp., New York, 1972, pp. 301.
Th author is fully aware that he is explaining a difficult subject, and that he is explaining it to electrical engineering students. The first serious formalism appears on page 35.
He formulates clearly some principles and bits of wisdom that since have been lost: every actual machine is a finite-state machine; the state of a machine is both a summary of its past and a prediction of its future.
The book covers: finite-state machines; Turing machines; linear-bounded automata; pushdown automata; other machine models (two-way pushdown; many pushdown tapes; counters; and stack automata (which see below)); operations on languages; solvable linguistic questions; unsolvable linguistic questions.
The book has 17 pages on "stack automata", which are pushdown automata that can examine the full contents of the pushdown stack. It is shown that such automata can recognize all context-sensitive languages and that there is at least one non-context-sensitive language that can be recognized by a proper variant of these stack automata.
One misses a key to the exercises.
The author gives clear descriptions of Post's correspondence and normal problems; a summary follows here.
A Post correspondence system is a set of word pairs; the set is used as follows to produce two string simultaneously. Both strings start empty; as often as desired, a pair in the set is considered, its first word is concatenated after the first string and the second word after the the second string. The Post Correspondence Problem is: given a Post correspondence system, is it possible to make two equal strings? Remarkably (but fundamentally) the problem is recursively unsolvable.
This is equivalent to the Post normal system. A Post normal system is again a set of word pairs; the set is used as follows to transform one string into another. If the string starts with the first word in a word pair, that word is removed from the string and the second word of the pair is appended to the end, until all the original text has been replaced. If the process gets stuck the original string is rejected. The Post Normal Problem is: given a Post normal system, is there a string that is transformed into itself?

* J. Doner, "Tree acceptors and some of their applications", J. Comput. Syt. Sci., 4, #??, 1970, pp. 406-451.

* John E. Hopcroft, Jeffrey D. Ullman, "Formal Languages and their Relation to Automata", Addison-Wesley, Reading, Mass., 1969, pp. 242.
Classical text in formal languages, full of solved and unsolved problems.

* Michael A. Arbib, "Theories of Abstract Automata", Prentice-Hall, Englewood Cliffs, N.J., pp. 412. 1969,
A very classical text: excellent and minute explanations for students with much time and patience. Explains everything from the original problems (in engineering, electronics, etc.). Excellent chapter on algebra: sets, graphs, groups and semigroups. The terminology has aged somewhat, it seems: the word "deterministic" is not used at all and the Chomsky hierarchy is not mentioned. Many examples.

* A. Salomaa, "Two complete axiom systems for the algebra of regular events", J. ACM, 13, #1, pp. 158-169. 1966,

* A. Ginzburg, "Algebraic Theory of Automata", ACM Monograph series, Academic Press, New York, 1968, pp. 165.

* Alfred V. Aho, Jeffrey D. Ullman, "The Theory of Languages", Mathematical Systems Theory, 9, #2, pp. 97-126. 1968,