Literature references and annotations by Dick Grune,
Last update: Tue Nov 29 12:14:51 2022.

These references and annotations were originally intended for personal use and are presented here only in the hope that they may be useful to others. There is no claim to completeness or even correctness. Each annotation represents my understanding of the text at the moment I wrote the annotation.
No guarantees given; comments and content criticism welcome.

* Donald E. Knuth, The Art of Computer Programming, Vol 4.0: Introduction to Combinatorial Algorithms and Boolean Functions, Addison-Wesley, Reading, Mass., 2008, pp. 216.
Incredibly rich in ideas and detail; each page provide stuff for a week of thought. Still, the algorithms are presented in a 1950's cross between assembler and Fortran II.

* Stuart Dreyfus, Richard Bellman on the Birth of Dynamic Programming, Oper. Res., vol. 50, #1, pp. 48-51. Jan. 2002,
Very high level and therefore rather ungripping account of same. Worth reading for the explanation of the term "dynamic programming": to impress officials and attract funds.

* Christian Böhm, Stefan Berchtold, Daniel A. Keim, Searching in high-dimensional spaces -- Index structures for improving the performance of multimedia databases, ACM Computing Surveys, vol. 33, #3, Sept. 2001, pp. 322-373.
The problem is basically the same as in its companion paper, Chávez et al., "Searching in metric spaces": given a (possibly very large) set U of points and the distance between each pair of points (as a function); given a point P in U; find all points within a distance r from P. But now the points are given a vectors, which is more information than just their distances. This allows more specialized and more efficient algorithms to be designed. Also, the algorithms allow the index to be modified.
     The paper surveys existing techniques, with emphasis on recent research.

* Edgar Chávez, Gonzalo Navarro, Ricardo Baeza-Yates, José Luis Marroquín, Searching in metric spaces, ACM Computing Surveys, vol. 33, #3, Sept. 2001, pp. 273-321.
Problem: given a (possibly very large) set U of points and the distance between each pair of points (as a function); given a point P in U; find all points within a distance r from P.
     To do better than linear search, an index over U has to be built, to be consulted when asked to solve this problem. Many algorithms exist, all with initial and running costs. The paper surveys all known algorithms and then exposes the basic underlying algorithm, which creates equivalence classes. Solving the problem then involves rapidly rejecting many equivalence classes and searching the remainder. This leads to a taxonomy of algorithms.
     Remarkably readable, many examples, good pictures.

* Jeffrey Scott Vitter, External memory algorithms and data structures -- Dealing with massive data, ACM Computing Surveys, vol. 33, #2, June 2001, pp. 209-271.
Internal memory (e.g. 32k words) was too small in the 1960s; it is still too small (e.g. 256M bytes) in the year 2000; and guess is that it will continue to be so for a very long time to come. The paper surveys both the well-known and the newest results for the usual heavy-duty applications: sorting, text search, indexing, Fast-Fourier, etc.

* Sara Baase, Allen Van Gelder, Computer Algorithms -- Introduction to Design and Analysis, (3rd Edn.), Addison-Wesley, Reading, Mass., 2000, pp. 668.
Excellent as ever!

* S.S. Skiena, The Algorithm Design Manual, Springer, New York, 1998, pp. 486.

* Dorit S. Hochbaum, Approximation Algorithms for NP-hard Problems, PWS, Boston, Mass., 1997, pp. 596.
Thirteen papers on various aspects of the subject. As is usual, the authors restrict themselves to approximation algorithms for which bounds can be proved. No juicy unproven heuristics; more in particular, none of the mainstay graph coloring heuristics is mentioned, although the approximation algorithm that is given is surprising. Rather than coloring the nodes with integers, they are first colored with unit-length vectors of real numbers, such that the vectors of adjacent nodes are almost orthogonal to each other. The whole system is then distorted in N-space by rounding, to the effect that the vectors each reduce to contain one 1 and all 0's elsewhere. Each vector then corresponds to an integer = color.

* L.M.G. Feijs, R.C. van Ommering, Abstract derivation of transitive closure algorithms, Information Processing Letters, vol. 63, #3, Aug. 1997, pp. 159-164.
The authors base their analysis of Warshall's algorithm on relational algebra, which they assume available right from page 1 (page 160 actually). This is dangerous, since with full relational algebra available, calculating the transitive closure T of a relation R becomes trivial: $ T := R sup "+" $.
     The basic algorithm is a four-line loop that is just not completely trivial, uses Kleene star closure freely(!), and which has a subset Z of the elements on which the relation is defined as a free parameter to be chosen at the beginning of each loop. Termination requires Z to be non-empty. Three choices for Z are considered: a singleton, which leads to Warshall's algorithm; a genuine subset, which leads to a generalization of the grid algorithm; and the full set of elements, which leads to the above trivial algorithm.
     Next Algol-like realizations for the first and second choice are given; this supplies the proof that the authors have successfully avoided trivializing the problem away:-).

* K. Sikkel, Parsing Schemata, Springer Verlag, 1996, pp. 398.
Describes the "primordial soup algorithm": the soup initially contains all grammar rules and all rules of the form $ A → t_i $ for all $t_i$ in the input; both are in effect parse tree fragments. During stewing fragments are combined according to obvious rules, until all possible combinations have been formed. Then the complete parse trees float to the surface.
     The rules of this algorithm, which is actually a transitive closure algorithm, are then formalized into sets of inference rules geared to parsing, called parsing schemata. These are then specialized to form many existing parsing methods and some new ones, including predictive head-corner parsing, and a parallel bottom-up GLR parser. All this is supported by great mathematical rigor, but enough diagrams and examples are given to keep it readable.

* Yedidyah Langsam, Moshe J. Augenstein, Aaron M. Tenenbaum, Data structures using C and C++, (2nd ed.), Prentice Hall, Upper Saddle River, N.J., 1996, pp. 672.
ANSI-fied update of the first edition, which see. The use of different style files gives the book a completely different appearance, but much of the text has remained unchanged. The C code has been ANSI-fied, C++ code has been added, and the font used for program code has been upgraded from ugly to peculiar. Still very much worth using.

* Gilles Brassard, Paul Bratley, Fundamentals of Algorithms, Prentice-Hall, Englewood Cliffs, 1996, pp. 524.
Simple and advanced algorithms (primality testing!) and their theoretical foundations and complexity.

* M.-J. Nederhof, Linguistic Parsing and Program Transformation, Katholieke Universiteit Nijmegen, Nijmegen, 1994, pp. 206.
Contains in coherent chapter form versions of the following papers by the author: ``Generalized left-corner parsing''; ``An Optimal Tabular Parsing Algorithm''; ``Increasing the Applicability of LR Parsing''; and ``Top-Down Parsing for Left-Recursive Grammars''. They are preceded by an introduction to parsing, and followed by a chapter on attribute propagation, and one on a grammar workbench.

* Esko Nuutila, An efficient transitive closure algorithm for cyclic digraphs, Information Processing Letters, vol. 52, #4, Nov. 1994, pp. 207-213.
Very careful redesign of the top-down transitive closure algorithm using strongly connected components, named COMP_TC. Extensive experimental analysis, depicted in 3D graphs. The speed ratio to Schmitz's algorithm varies from 1 for very sparse graphs to about 3 for an out-degree of 10 percent.

* Jayadev Misra, Powerlists: a structure for parallel recursion, ACM TOPLAS, vol. 16, #6, pp. 1737-1767. Nov. 1994,
Besides concat, the zip list constructor is introduced. Also, a reverse operator is defined. Powerlists are balanced (actually probably implemented as arrays) and can be decomposed uniquely using deconcat or dezip, as desired. Axioms of concat and zip are proved. Several algorithms are expressed in these operators: permutations, Gray codes, Fourier Transform, and lots, lots more. The point is that components of powerlists can be distributed easily.

* A. van Deudekom, D. Grune, P. Kooiman, Initial experience with non-correcting syntax error handling, IR-339, Vrije Universiteit, Amsterdam, pp. 11. Nov. 1993,
Shows how to implement Richter's non-correcting error recovery using a recombining breadth-first top-down parser based on the LLgen input. Provides somewhat artificial efficacy test results which show that one program modification causes at most one error message.

* Preston Briggs, Linda Torczon, An efficient representation for sparse sets, ACM Letters on Programming Languages and Systems, vol. 2, #1-4, March-Dec. 1993, pp. 59-69.
Assume sets of at most u members, numbered 1..u. For set with n members, the following representation is kept: 1. an integer variable containing n; 2. an array dense[1..u], in which the elements 1..n contain the member numbers; 3. an array sparse[1..u], in which only the elements are used whose indexes correspond to member numbers: the element corresponding to member number m contains the index of the element in dense which contains m. All other elements in both array are not initialized (but see below); this avoids $ O ( u ) $ initialization costs.
     This representation allows $O(1)$ implementations of set membership, clear-set, add-to-set, delete-from-set, and an iterator over the elements. Also, copy, compare, union, intersection and set-difference are $(O(n)$ rather than $O(u)$. This comes at a cost of $ 4 u + 2 $ bytes per set (assuming short unsigned ints for the indexes), as opposed to $ u / 8 $ bytes for a bit vector.
     Note that the set membership test for an absent element a accesses the uninitialized element sparse[a]: the (unsigned) value x found there can be larger than n, in which case the absence of a is established or be between 1 and n inclusive, in which case dense[x] is consulted; but since a is not in the set, dense[x] cannot be equal to a and again the absence of a is established.
     The representation is compared to a bit vector implementation and found to be 2 to 3 times faster for the sparse sets that occur in graph colouring in register allocation. (The speed advantage will be lost completely for non-sparse sets. DG) Several other applications are discussed.

* M.A. Weiss, Data Structures and Algorithm Analysis, Benjamin/Cummings, Redwood City, Ca., 1992, pp. 455.
Coverage of the usual algorithms, each with a complete but not excessively formal analysis. Algorithm subjects include: lists, stacks, queues, trees, hashing, priority queues including heaps, sorting including quicksort but excluding quickersort, union/find, shortest-path, network flow and minimum spanning tree. Algorithmic techniques include: greedy, dynamic programming, randomization, backtracking. Analysis techniques include: simple analysis, amortized analysis.

* Wim Hesselink, Jan Jongejan, Duplicate deletion derived, Commun. ACM, vol. 35, #7, pp. 99-. July 1992,
Addition to Jukka Teuhola, Lutz Wegner, ...

* W.L. Hsu, R.C.T. Lee, ISA '91 Algorithms, Lecture Notes in Computer Science #557, Springer-Verlag, Berlin, 1991, pp. 395.
More than 40 papers on algorithms of various kinds, most of them with explicit algorithms and proofs.

* Arne Anderson, A note on searching in a binary search tree, SP&E, vol. 21, #10, Oct. 1991, pp. 1125-1128.
Replaces the three-way comparison (<, =, >) usual used in binary tree searche algorithms with a two-way comparison (<, >=). This is more realistic, since machines do not have a three-way comparison. Detailed code is given.

* George H. Roberts, Searching in discrete universes, ACM SIGPLAN Notices, vol. 26, #3, March 1991, pp. 19-24.
Very fast parsing (see, e.g. Pennello) requires fast searching in (relatively small) discrete universes. A hybrid binary/sequential search technique is demonstrated.

* L. Davis, Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, pp. 352. 1991,
The first part of the book is an introduction to genetic algorithms. A genetic algorithm for solving a parameter-finding problem works as follows. The parameters of the problem are encoded in chromosomes and an initial generation of chromosomes is constructed (random or based on some heuristics). Each chromosome (solution to the problem) is evaluated and assigned a grade. A number of pairs of chromosomes, selected for their high grade, are combined --by cross-over and mutation-- into new chromosomes; these are the next generation. Repeat until satisfied or out of patience.
     Part two concerns 13 application case studies, ranging from aircraft design to traveling salesman.
     Part three describes the GENESIS and OOGA genetic algorithm software packages.

* Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Introduction to Algorithms, MIT Press, Cambridge, Mass., 1990, pp. 1028.
Both a tutorial and a handbook, encyclopedic in size and in contents. Algorithms are introduced and proven correct using considerable math. A reasonable addition the Knuth's unfinished opus. 205 literature references.

* Aaron M. Tenenbaum, Yedidyah Langsam, Moshe J. Augenstein, Data Structures Using C, Prentice Hall, Englewood Cliffs, 1990, pp. 662.
In spite of its 650+ pages, the book is fast-paced. It covers everything a student should know about data structures, abstract or otherwise. Especially enjoyable are the treatment of hash functions (normal and perfect), that of garbage collection, and the exercises; but almost any other subject would do. Recommended! Two minor gripes: the text is sometimes overly explicit, and the type font used for the program texts was designed by a rank amateur.

* Jeffrey H. Kingston, Algorithms and Data Structures -- Design, Correctness, Analysis, Addison Wesley, Sidney, 1990, pp. 315.
A fairly usual text on advanced data structures, though slightly more theoretical. A small number of algorithms is analysed in depth.

* Robert Sedgewick, Algorithms in C, Addison-Wesley, Reading, Mass., 1990, pp. 657.
Identical to Sedgewick [1988], all algorithms in C. I don't particularly like the style of C coding.

* Manoocher Azmoodeh, Abstract Data Types and Algorithms, (2nd Edn.), MacMillan, London, 1990, pp. 377.
Many readable examples of implementations of ADT's in Pascal. Using ADT's to implement ADT's. Algorithm design using ADT's. Good.

* J. Ullman, M. Yannakakis, The input/output complexity of transitive closure, in ACM SIGMOD 1990 International Conference on Management of Data, ACM, 1990, pp. 44-53.
[[ grid algorithm?? ]]

* John Q. Walker II, A node-positioning algorithm for general trees, Software -- Practice and Experience, vol. 20, #7, July 1990, pp. 685-706.
The tree is conceptually laid out bottom-up. The children (nodes or subtrees) of a node are positioned a distance sibling_distance apart if both are nodes, and such that there is at least a distance subtree_distance between any node of one child and any node of the other child otherwise. Nodes that are free to move within these restrictions are equally spaced in their space. The parent is then positioned centered above the left-most and right-most child nodes. Full and detailed algorithms are given.

* Hessam Khoshnevisan, Efficient memo-table management strategies, Acta Inform., vol. 28, #1, 1990, pp. 43-81.
Uses the term memo-table in a very restricted sense: his memo-tables exist only during the evaluation of a function and serve only to benefit recursive calls. When a function finally delivers its result, its memo-tables are discarded. So there is no long-term memory.
     The author observes that the evaluation tree of many recursive functions has a bounded width. For example, the Fibonacci series for n only uses the elements for $n-1$ and $n-2$. By storing these in a memo-table, a memo-table of two entries suffices to evaluate the Fibonacci function in linear time. Such functions exhibit common generator redundancy.
     Next the theory of functions with common generator redundancy is developed and algorithms for the generation of table managers for them are derived. The latter work as source-to-source converters.

* Roger B. Dannenberg, A structure for efficient update, incremental redisplay and undo in graphical editors, Software -- Practice and Experience, vol. 20, #2, Feb. 1990, pp. 109-132.
The display is considered to contain "items", which have named attributes, which have values. An item is represented as a linked list of (name, value, version_number) triplets, and modifications are prepended at the start of the list without removing the old triplet. Each triplet also contains a MODIFIED bit, which is on when the item in memory has been modified more recently that the corresponding item on screen. The items are kept in a circular list, called the itemList.
     Upon a redraw(), the itemList is scanned, and any triplet marked MODIFIED (easily found at the head of the list) are displayed and unmarked. Upon a newVersion(), the global version number is increased and all lists are scanned to remove any obsolete triplets (those with too low version numbers). Upon an undo(), the triplet lists are scanned for triplets with the previous version number; these are then copied (not moved!) to the head of the list. This allows undoing the undoing.
     An undo is performed by finding the old value and establishing it as the most recent value. So, Put 5, Put 8, Undo, yields three versions, 5, 8, 5.

* Jeffrey D. Smith, Design and Analysis of Algorithms, PWS-Kent Publ. Comp., Boston, 1989, pp. 447.
Thorough, though occasionally somewhat over-emphatic treatment of a very wide range of algorithms.

* Manfred Ruschitzka, John L. Clevenger, Heterogeneous data translations based on environment grammars, IEEE Soft. Eng., vol. 15, #10, pp. 1236-1251. Oct. 1989,
[ use of 2-level grammars in data marshalling ]

* C. Bron, E.J. Dijkstra, A better way to combine efficient string length encoding and zero-termination, ACM SIGPLAN Notices, vol. 24, #6, pp. 11-19. June 1989,
Two methods are proposed, the first of which has been implemented in Modular Pascal. Both require the strings to be located in string containers (areas) of known length and both allow the strings to be zero-terminated. The first method is based on storing the distance between the end of the string and the end of the container on the last byte(s) of the container; if the distance is zero, the last byte is also zero. In the second method, all bytes after the last string byte are kept zero, so the end of the string can be found by binary search; this is faster than the linear search required to find the first zero byte and imposes no further requirements.

* Alan Gibbons, Wojciech Rytter, Efficient Parallel Algorithms, Cambridge Univ. Press, Cambrigde, 1988, pp. 259.
Has a chapter on parallel parsing, in which the authors prove that every context-free language can be recognized in $ O ( log sup 2 n ) $ time using $ n sup 6 $ processors on a P-RAM, and every unambiguous context-free language can be recognized on a P-RAM in $ O ( log ~ n ) $ time using a (large-order) polynomial number of processors.
     Further results are derived for bracket languages and for input-driven languages; a language is "input-driven" if there exists a PDA for it with the property that all movements the PDA can make for a given input symbol s stack or unstack the same number of symbols, $N(s)$.
     The chapter relies heavily on the material explained in earlier chapters and can hardly be read in isolation.

* Robert Sedgewick, Algorithms, Addison-Wesley, Reading, Mass., 1988, pp. 657.
One of the best books on algorithms available; comprehensive, understandable treatment of many algorithms, beautifully done. Some subjects: priority queues, hashing, parsing, convex hull, weighted graphs, curve fitting, FFT, linear programming, you name it.

* Webb Miller, Eugene W. Myers, A simple row-replacement method, SP&E, vol. 18, #7, July 1988, pp. 597-612.
Optimal algorithms for video screen row updating on character-controlled terminals.

* P. Wayner, Error-free fractions, BYTE, vol. 13, #6, June 1988, pp. 289-298.
Factorial-based arithmetic for exact fractions, explained to the intelligent layman.

* I.R. Hentzel, D.J. Pokrass, A practical solution for a large sparse matrix, SP&E, vol. 18, #3, March 1988, pp. 279-284.
Downsizing the data needed for a specific large problem in linear algebra. Mostly math.

* Bing-Chao Huang, Michael A. Langston, Practical in-place merging, Commun. ACM, vol. 31, #3, March 1988, pp. 348-352.
The array, which is supposed to have size $ N sup 2 $ is divided in N chunks of size N. The N largest items are concentrated in one chunk, which is then used as a buffer to manipulate the rest. At the end the buffer is sorted normally. Although all steps are detailed, the correctness of the steps is taken for granted (though not always obvious!).

* Lawrence Davis, Genetic algorithms and simulated annealing, Research Notes in Artificial Intelligence, Pitman, London, pp. 216. 1987,
Collection of 13 papers on same.

* J.H. Hester, D.S. Hirschberg, Self-organizing search lists using probabilistic back-pointers, Commun. ACM, vol. 30, #12, pp. 1074-1079. Dec. 1987,
While traversing a linked list, a second point, "back-pointer" or "finger", is maintained that drags after the search pointer. When the required element is found, it is moved up-list to the position pointed to by the back-pointer. Never moving the back-pointer yields "move-to-front", always moving it along yields "transpose". Several update strategies, called JUMPs, are examined. No clear winner emerges.

* David Alex Lamb, IDL -- Sharing intermediate representations, ACM TOPLAS, vol. 9, #3, July 1987, pp. 297-318.
Although IDL --Interface Description Language-- is intended to describe the interconnections in large systems, it is mainly concerned with getting the data from one program to another. The actual interfaces are very simple and consist of a reader side and a writer side; input and output along other paths does not count.
     The system features type declarations that are converted into marshalling read and write routines; this marshalling includes things like pointers, circular sharing structures, byte exchange and data type upgrade. Often the read routine can handle multiple representations, and the write routine can always produce ASCII (human-readable) output. Although the routines are about 8 times slower than hand-coded routines, they slow down the entire system by no more than about 15 percent.
     This IDL should not be confused with Atkinson's IDL --Intermediate Data Language--, q.v.

* Richard M. Karp, Michael O. Rabin, Efficient randomized pattern-matching algorithms, IBM Journal of Research and Development, vol. 31, #2, pp. 249-260. March 1987,
Author's abstract: We present randomized algorithms to solve the following string-matching problem and some of its generalizations: Given a string X of length n (the pattern) and a string Y (the text), find the first occurrence of X as a consecutive block within Y. The algorithms represent strings of length n by much shorter strings called fingerprints, and achieve their efficiency by manipulating fingerprints instead of longer strings. The algorithms require a constant number of storage locations, and essentially run in real time. They are conceptually simple and easy to implement. The method readily generalizes to higher-dimensional pattern-matching problems.

* Jeff Naylor, Introduction to Programming, Paradigm, London, 1987, pp. .
Very simple, using Pascal and COBOL.

* Wayne Amsbury, Data Structures, From Arrays To Priority Queues, Wadsworth Publ. Comp., Belmont, Ca., ~1986, pp. 512.
Is exactly what the title says. Beautiful pictures, but stops explaining as soon as the going gets tough. Contains too much program code. Sometimes confusing or incomprehensible explanation. Some literature references are not listed (e.g. to Gonnet).

* Johannes J. Martin, Data Types and Data Structures, Prentice Hall, Englewood Cliffs, NJ, 1986, pp. 282.
Solid and convincing explanation of abstract data types. Very readable. In addition to the usual subjects it has a chapter on garbage collection. A novel feature of the book is a running summary in the margin, which allows the book to be read at very high speed :-).

* Jean-Paul Trembley, Paul G. Sorenson, An Introduction to Data Structures with Applications, (2nd ed.), McGraw-Hill, Singapore, 1986, pp. 861.
Some short notes: Mainly PL/I based. Much algorithm, little explanation. Of interest: Markov algorithms, survey of non-linear data structures, biconnectivity algorithm, garbage collection. Low-priced and worth its money.

* Brigitte Jaumard, Michel Minoux, An efficient algorithm for the transitive closure and a linear worst-case complexity result for a class of sparse graphs, Inf. Process. Lett., vol. 22, #4, pp. 163-169. April 1986,
Algorithm of the Purdom-Eve-Kurki-Suonio line. Improves the computation of the transitive closure of the dag by improving the merging of the sets (actually lists) of related nodes associated with each node. Clear explanations with pictures.
     The algorithm is faster than Purdom's on large graphs with fan-in smaller than 2. It is linear for any graph with constant fan-in.

* Daniel F. Stubbs, Neil W. Webre, Data Structures with Abstract Data Types and Modula-2, Brooks/Cole Publ. Comp., Monterey, Ca. 93940, pp. 556. 1985,
Clear, entertaining and beautifully done. Covers arrays, records, linked lists, stack, queues, linear structures, trees, sorting, sets, strings and graphs, all represented tastefully as abstract data types.

* John Bentley, Programming pearls -- selection, Commun. ACM, vol. 28, #11, pp. 1121-1127. Nov. 1985,
How do I find the k-th element in an unordered set of N elements in time $ O ( N ) $? Take an arbitrary element X and construct three sets: {$ x < X $}, {x} and {$ x > X $}. From the size of the first set, decide in which set the k-th element will be found and with what ranking. Repeat the procedure until the element is in the middle set. Cost: $2N$ on the average.

* R.R. Oldehoeft, S.J. Allen, Adaptive exact fit storage management, Commun. ACM, vol. 28, #5, pp. 506-511. May 1985,
Most programs deal with blocks of a limited number of different sizes, the working set. Freed blocks are cached for a while. Requests are first checked against the cache, and if a hit is found, the block can be supplied at minimal cost.

* Kurt Mehlhorn, Data Structures and Algorithms 1: Sorting and Searching, EATCS Monographs on Theoretical Computer Science, Springer-Verlag, Berlin, 1984, pp. 336.
More narrow but much deeper than Knuth's works. Very much worth consulting, though not (printing-)error-free. Printed from typed pages.

* Walter F. Tichy, String-to-string correction problem with block moves, ACM Trans. Comput. Syst., vol. 2, #4, Nov. 1984, pp. 309-321.
We have two more or less similar large strings (actually files) S and T, and we want to exploit the redundancy by storing only S plus a script that can construct T from S when needed. This script can contain two kinds of actions: move(from, length), which copies length characters (actually file lines) from position from in S and appends them to T (note that it copies and appends; S and T are files, not strings); and add(line), which adds the line line to T.
     The problem is how to construct the script when S and T are given; once we have the script, T can be discarded. We want the script to be as small as possible. The add command includes a complete line and is supposed to be larger than any move command; so we want to minimize the number of add commands, and, once that is done, the number of move commands.
     The optimal algorithm is surprisingly simple. We try to find the longest common sequence in S that starts at $T[0]$ in T. If there is such a sequence of length l starting at s in S, we record a move(s, l), discard the first l lines of T and start again. If there is no such sequence, we record an add(T[0)], discard the first line of T and start again. We continue until T has been completely "explained away".
     The algorithm may be simple, but it is not trivial; several equally "obvious" algorithms are presented and shown wanting. The author proves the optimality of the presented algorithm, but the proof is fairly complicated. We can see that the algorithm minimizes the number of adds and then the number of moves, as follows. When there is no common sequence, $T[0]$ does not occur in S and we cannot ever obtain it by a move command, so we have no choice. When there is a common sequence, we have three choices: use all of it, use part of it, and do not use it. If we do not use it, we still have to produce $T[0]$. Using an add is too expensive, so we must get $T[0]$ from another place in S. But we will never find a longer common sequence than we have already got, so there cannot be any advantage in not using this one. If we use part of it, we can use the first part or any other part; if we use any other part we still have to produce $T[0]$, and the previous argument applies. If we use only the first l-x lines of the common sequence, we will still have to produce the other x lines. This may require an extra move or we may be lucky and combine obtaining them with another move required by the lines following $T[l]$ from an appropriate sequence in S. In either case it is not cheaper than incorporating $T[x..l]$ in our present move. So there is no point in using only part of the longest common subsequence.
     The paper explains and proves the algorithm in terms of strings and characters, which is formally correct but hardly helpful, since the algorithm is pretty useless for strings, so the motivation for the work goes out of the window.
     Two improvements of the basic algorithm are introduced, both aiming at speeding up the localization of the longest common sequence. The first enters all lines into a symbol table (called "index" in the paper) (implemented by hashing); the information kept with each line is a list of all positions it occurs in in S. $T[0]$ is looked up in this symbol table. If it is not there, it is not in S and no common sequence exists; if it is there, the symbol table gives us a list of all positions where a common sequence might start. Since lines do not normally repeat very often in files, this list is probably short.
     The second avoids all risks by constructing a suffix tree of S (by McCreight's algorithm). We just use $T[0]$, $T[1]$, $T[2]$, ..., to follow a path in the suffix tree and see how far we get. This brings us to the longest common sequence in a time proportional to the length of it. The suffix method is slower than the index method for small files (<1000 lines) but quickly gets superior for larger files (>5000 lines).
     The drawing of the suffix tree in Figure 1 is misleading because unary nodes are not shown; so there is no place, for example, for a search for wvwa to end.

* Larson, Kajla, File organization, a method guaranteeing retrieval in 1 access, Commun. ACM, vol. 27, #7, July 1984, pp. 670-.
See write-up. Retrieval is cheap, insertion can be very expensive.

* Kannitz, Van Ekert, Audit trail compaction for database recovery, Commun. ACM, vol. 27, #7, July 1984, pp. 678-.
An audit trail consists of update_record(a), update_record(b), ... update_record(a), ...
There are very many records (N = ~10^9) and many updates (~10^6), among which many duplicates. We compress the audit trail efficiently as follows. 1. Allocate a bit map of N bits. 2. Read the audit trail backwards starting from the end, and for each update_record(p), IF bit p is on THEN skip the update ELSE keep the update (writing backwards) and switch on bit p.

* L. Schmitz, An improved transitive closure algorithm, Computing, vol. 30, #4, 1983, pp. 359-371.
Detailed improvement of the algorithm of Eve & Kurki-Suonio, 1977, clearly explained. Compares 4 versions of the algorithm to those of Eve & Kurki-Suonio and of Warshall, 1962. Warshall loses big, and the others are comparable, with the author's always equal or superior.

* Jürgen Peemüller, A correction to Brelaz's modification of Brown's coloring algorithm, Commun. ACM, vol. 26, #8, Aug. 1983, pp. 595-597.
Code for two complete algorithms for exact graph colouring, with proofs.

* K. Mullin, A second look at Bloom filters, Communications of the ACM, vol. 26, #8, pp. 570-571. Aug. 1983,
Short theoretical analysis of the performance of Bloom filters, checked against a simulation. The simulation is worse than the theory predicts and the difference is explained from the imperfection of the hash functions.

* T.C. Hu, Combinatorial Algorithms, Addison-Wesley, Reading, Mass., 1982, pp. 292.
Unusual book; rich in subjects, a minimum of math and formalism, of uneven style. Not always easy to follow, but adequate. Threats shortest path, maximum flow, dynamic programming, backtracking, optimal binary trees, heuristic and near-optimal algorithms for bin packing and job scheduling, and matrix multiplication; P versus NP. Extensive bibliographies by subject. Many examples. Author quotes amply from own publications and rightly so. Lots of advice on designing algorithms.

* M. Herlihy, B. Liskov, A value transmission method for abstract data types, ACM Trans. Programming Languages and Systems, vol. 4, #4, pp. 527-551. Oct. 1982,
In-depth analysis of marshalling data (including graphs) in CLU, both for distributed programming and for persistent data.

* Manfred Broy, Peter Pepper, Combining algebraic and algorithmic reasoning: An approach to the Schorr-Waite algorithm, ACM TOPLAS, vol. 4, #3, July 1982, pp. 362-381.
Formal proof based on the rotating turn-style interpretation. Lots of literature references.

* Paolo Sipala, Compact storage of binary trees, ACM TOPLAS, vol. 4, #3, July 1982, pp. 345-361.
The relationships between the storage requirements for binary and k-ary trees are examined in closed form, using considerable amounts of undergraduate math. Ensembles of trees are characterized by a left branching probability p and a right branching probability q. This results in a map of regions showing for which k k-ary trees are preferable for a given p and q. For Lisp, the goal of the exercise, $k=3$ is best.

* A.C. Kilgour, Generalized non-recursive traversal of binary trees, SP&E, vol. 11, #12, pp. 1299-1308. Dec. 1981,
Ultimate generalization of the Schorr & Waite algorithm, derived systematically. The algorithm is parameterized with the visit order (six possibilities).

* Jürgen Ebert, A sensitive transitive closure algorithm, IPL, vol. 12, #5, pp. 255-258. Oct. 1981,
First does Tarjan's strongly connected components and then Warshall's matrices; tuned to exploit the low number of edges.

* R.S. Bird, Tabulation techniques for recursive programs, ACM Computing Surveys, vol. 12, #4, pp. 403-417. December 1980,
The call of a recursive program defines a call graph the nodes of which are labelled with the arguments of the call; this graph has to be non-cyclic for the call to terminate. If the graph is a DAG (that is, not a tree) some values will be calculated more than one, which we would like to avoid. In the general case this is only possible through brute-force memoizing, but if the recursive function shows some special properties, the call graphs have a systematic structure that can be exploited to simplify the memoization.
     Using graph pebbling as a tool, three types of tabulation strategies are described. The first is based on uniform dependency graphs; these allow an optimal solution. Less uniform graphs can sometimes be extended to (embedded in) uniform graphs; the optimal solution to the uniform graph is then a sub-optimal solution to out original graph. Non-uniform graphs can still be handled, but the solution is equivalent to brute-force memoizing.

* Edsger W. Dijkstra, Some beautiful arguments using mathematical induction, Acta Inform., vol. 13, pp. 1-8. 1980,
The first concerns a simple formula for the number of factors p (p prime) in $n!$, based on the representation of $n!$ in base p. The second is a proof of the theorem that the vertices of any polygon of maximal area lie on a circle. The third concerns an algorithm for finding the length of the longest ascending subsequence (called `upsequence' here) in a given sequence of numbers.

* Peter J. Downey, Ravi Sethi, Robert Endre Tarjan, Variations on the common subexpression problem, J. ACM, vol. 27, #4, April 1980, pp. 758-771.
[[ actually congruence closure on a directed graph ]]

* Robert Endre Tarjan, Andrew Chi-Chih Yao, Storing a sparse table, Commun. ACM, vol. 22, #11, Nov. 1979, pp. 606-611.
First the well-known row displacement technique is considered and it is shown that its implementation using first-fit on rows of decreasing density works well (with provable bounds) if the densities of the rows have a "harmonic decay" property (obey Zipf's Law). Now, if the rows do not have this property, the columns are extended downwards with empty spaces and the actual column contents are slid downwards over a certain distances so as to create a set of rows that does have the desired property. Hence the name double displacement.
     Although we have increased the number of rows by this procedure, they behave better than before, provided the sliding distances have be determined properly. A criterion (definitely non-trivial) is given for these sliding distances and an analysis is sketched to justify it. The size of the resulting table is $ O ( n ~ log ~ log ~ n ) $ and its acces time is constant. By packing some of the data in one machine word, the memory requirements can be lowered to $ O ( n ) $.
     Code is given for the first-fit decreasing row displacement algorithm. Of the code for the much more difficult column displacement algorithm, the authors say "We leave the details to the reader" (pg 610, left column) :-(.

* Daniel Brélaz, New methods to color the vertices of a graph, Commun. ACM, vol. 22, #4, April 1979, pp. 251-256.
Assumes considerable familiarity with graph colouring algorithms, heuristic and exact. Specifies two improvements, one to a heuristic algorithm and one to an exact one.
     The heuristic algorithm use the "saturation degree" of a node, the number of different colours its neighbours have (which is less than or equal to the "degree" of the node, the number of neighbours the node has). First a node with maximal degree is coloured. Next, repeatedly nodes of maximal saturation degree are coloured, until done. The improvement: when a node cannot be coloured without introducing a new colour, we first try to interchange some of the colours already assigned so as to reduce the saturation degree of the node. This is called saturation degree with interchange.
     The exact algorithm (Randall-Brown) seems to me to be just depth first visit. The improvement consists of preceding it by a clique-finding step to obtain an initial unavoidable colouring and a lower bound, and an application of the saturation degree with interchange algorithm to obtain an upper bound.
     Both algorithms are compared to a couple of other algorithms from the literature and found superior.

* C.P. Schnorr, An algorithm for transitive closure with linear expected time, SIAM J. Comput., vol. 7, #2, pp. 127-133. May 1978,
Gives a very sophisticated version of the iterative bottom-up transitive closure algorithm and does a very careful statistical complexity analysis. The author even worries about the time spent in reading the graph, and justifiedly so, since the entire algorithm has no higher time complexity.

* S. Even, M. Rodeh, Economical encoding of commas between strings, Commun. ACM, vol. 21, #4, April 1978, pp. 315-317.
[ recursively prepend the length; this is about single bit strings ]

* Thomas L. Saaty, Paul C. Kainen, The Four-Color Problem -- Assaults and Conquest, Dover Publ., New York, 1977, pp. 211.
The first half is devoted to the four-colour problem and the Appel-Haken solution. The second half treats a number of variations: Hamiltonians, chromatic numbers, etc. Extensive bibliography.

* J. Eve, R. Kurki-Suonio, On computing the transitive closure of a relation, Acta Inform., vol. 8, 1977, pp. 303-314.
[[ excerpt ]] Extremely complicated derivation of a relatively simple way to compute a transitive closure in top-down fashion: determine strongly connected components on the fly and contract them; then a top-down visit does the job. Or am I missing something important here?

* M. Sassa, E. Goto, A hashing method for fast set operations, Inf. Process. Lett., vol. 5, #2, 1976, pp. 31-34.
None to clear exposition of ...

* Niklaus Wirth, Algorithms + Data Structures = Programs, Prentice-Hall, Englewood Cliffs, NJ, 1976, pp. 366.
Solid treatment of the state of the art in computer programming, combining breadth with depth. The division in chapters is surprising to say the least: 1. Fundamental data structures; 2. Sorting (!); 3. Recursive algorithms; 4. Dynamic information structures; 5. Language structures and compilers. But the lack of structure is completely offset by the wealth of good algorithm examples in Pascal. Many algorithms are covered; to name a few: multi-way file merging; polyphase sort; backtracking; stable marriage problem; balanced trees; B-trees; hashing; table-driven parsers. (Note: The publisher goofed: it says "Nicklaus Wirth" on the cover!)
     Still a good read after 25 years.

* M.R. Garey, D.S. Johnson, The complexity of near-optimal graph coloring, J. ACM, vol. 23, #1, Jan. 1976, pp. 43-49.
Extension of the result (by Stone... ??) that if there is a polynomial algorithm that always does graph coloring with less than 4/3 of the minimum number of colours, then P == NP. There is so much math here that it is difficult to tell what the exact implications are; perhaps not severe, as with the previous result.

* J, Cohen, Interpretation of Non-Deterministic Algorithms in Higher-Level Languages, Inf. Process. Lett., vol. 3, #4, pp. 104-109. 1975,
Floyd ("Nondeterministic Algorithms", 1967) shows how non-deterministic algorithms can be executed; this paper fills in the programmatic details by supplying code segments for the operations used by Floyd.

* D.S. Hirschberg, A Linear Space Algorithm for Computing Maximal Common Subsequences, Commun. ACM, vol. 18, #6, June 1975, pp. 341-343.
The maximally possible common subsequence (not substring!) of two strings $ A [ 1 .. m ] $ and $ B [ 1 .. n ] $ can be found in linear space and quadratic time as follows.
     Conceptually, an $ n times m $ matrix L is computed, where each element contains the length of the maximal common subsequence (MCS, to avoid confusion with LCS, longest common substring) that can be found in the top-left submatrix of which it is the bottom-right corner. This matrix can be filled easily in a top-to-bottom left-to-right fashion. This yields the length of the MCS of the entire matrix in its last element, but leaves two problems: the size of the matrix, and what is the MCS.
     At any moment, filling uses only the present row and the previous row, so we need only two rows rather than the entire matrix; this is Algorithm A. Finding an MCS (there may be more than one) itself is more difficult.
     If we knew the MCS, we could trace it in the matrix as a broken diagonal, from somewhere up-left to somewhere down-right. We try to find the point where the MCS crosses the horizontal middle of the matrix. To this end, we compute the bottom-most row $L1$ of the top half of the matrix, using Algorithm A, and the top-most row $L2$ of the bottom half of the matrix, using Algorithm A upside down. Now any point on the border between these two rows at which the sum of the lengths equals the length of MCS, is a point where an MCS crosses the border. This is Algorithm B.
     Applying Algorithm B recursively to the top and bottom halves, we can find succesive points through which the MCS passes. Once the recursion has reduced the height of the matrix to 1, we know which character is part of the MCS, and returning from the recursion we assemble these characters into the required MCS. This is Algorithm C.
     There are several bugs in the code: 1. Algorithm B, line 2: for j ← 1 to n do => for j ← 1 to n do. 2. Algorithm C, line 3.3: $ A sub { n , i+1 } $ => $ A sub { m , i+1 } $. 3. Algorithm C loops when both input strings are empty (I think; implement!).

* M. Rem, On the programming of elastic stores, Information Processing Letters, vol. 3, #6, pp. 184-187. July 1975,
An elastic store is a linear memory in which insertion and deletion operations are about as expensive as read and write operations in "normal" memory. The paper considers the hypothetical situation in which such stores could be built, and asks how such stores could be used in programming, addressing by offset of course being out of the question.
     Accessing data is based on markers, zero-thickness named entities positioned between some pairs of adjacent data cells. The operation "find next (previous) marker" is assumed efficient. Several questions arise: 1. If two different markers are positioned between the same pair of adjacent data cells, do these markers have any order? 2. Do markers point to the data cell before them or after them? 3. How do we use markers to partition the store into partitions (segments), especially when these partitions can be empty? All these questions have a heavy "one-off error" flavor.
     Several alternatives are examined in the paper, and checked for consistency. For example, the sequence "insert x as a data cell at marker m" followed immediately by "delete data cell at marker m" should be a null operation.
     The answers are: 1. adjacent markers have no order; 2. one needs two kinds of markers, one kind that points to the predecessor and one kind that points to the successor; 3. one needs two sentinel data cells per partition, each with a marker pointing away (!) from it. The author shows that this is sufficient (and almost necessary) to implement a FIFO queue.

* J. Dzikiewicz, An algorithm for finding the transitive closure of a digraph, Computing, vol. 15, 1975, pp. 75-79.
This is Algol 60 code only. The algorithm looks a lot like Purdom's, but is structured much better. Comparing them statement by statement is too much work.

* M.M. Sysło, J. Dzikiewicz, Computational experiences with some transitive closure algorithms, Computing, vol. 15, 1975, pp. 33-39.
Four algorithms for transitive closure are compared: Warshall (1962), Purdom (1970), Yen (not published) and Dzikiewicz (1975). Purdom's is better behaved, but Dzikiewicz's is usually faster; Warshall's is OK for very low densities only.

* David A. Fisher, Copying cyclic list structures in linear time using bounded workspace, Commun. ACM, vol. 18, #5, May 1975, pp. 251-252.
This is what happens, as far as I understand it. The "list structure" from the title is a binary graph. Its nodes may be all over the place, but the nodes of its copy have to be allocate contiguously in a known segment of memory, since pointer increments and decrements are used to reach the nodes and pointers to them have to be identifiable.
     In phase 1, the original is copied by repeatedly scanning and copying the spines described by the right pointers. It turns out that in doing so, all pointers in the copy can be set correctly except those right pointers that point to forward (= not yet copied) nodes; these still point to their original right successors and the right pointer of their original points to them.
     In phase 2, the nodes of the copy are scanned in forward direction (using pointer increments) and the affected right pointers are reset to the parent nodes of the original right successors.
     In phase 3, the right pointers of a copied node, its original and the parent of the latter are rotated much as in the Schorr and Waite algorithm, to set all pointers correctly.

* Robert Endre Tarjan, Efficiency of a good but not linear set merging algorithm, J. ACM, vol. 22, #2, April 1975, pp. 215-225.
The original paper on the famous UNION/FIND algorithm, with complexity analysis resulting in the inverse Ackermann function.

* R.A. Wagner, Order-n correction for regular languages, Commun. ACM, vol. 17, #5, May 1974, pp. 265-268.
Presents an O(n) algorithm which, given a string and a finite-state automaton, can correct the string to an acceptable one with a minimum number of edit operations.

@article{DBLP:journals/spe/Plum74, author = {Thomas W.-S. Plum}, title = {Random Search on the 8-Queens Problem}, journal = {Softw., Pract. Exper.}, volume = {4}, number = {3}, year = {1974}, pages = {251-253}, annote = { Subtitled "A Simple Solution for a Simple Problem". Presents a 15-line APL program, which draws random permutations of length 8 (in APL: 8?8; simple indeed), interprets them as a board position and tests them. After 996 tries a solution was found. }

@article{DBLP:journals/spe/Bell74, author = {James R. Bell}, title = {Rapid Calculations of Subscripted Array Addresses}, journal = {Softw., Pract. Exper.}, volume = {4}, number = {3}, year = {1974}, pages = {275-277}, annote = { Since address calculations for a multi-dimensional array involve multiplying indices by a known step size over a known range, it is proposed to precompute the pertinent multiplication table; it is (much) smaller than the array itself. The method is compared to precomputing tables of pointers to sub-rows and sub-matrices, which are (much) bigger. }

* J.E. Hopcroft, J.D. Ullman, Set merging algorithms, SIAM J. Comput., vol. 2, #?, 1973, pp. ??-??.
Very complicated algorithm, probably superseded by Tarjan, "Efficiency of a good but not linear set merging algorithm", 1975.

* J.M. Robson, An improved algorithm for traversing binary trees without an auxiliary stack, Inf. Process. Lett., vol. 2, #1, March 1973, pp. 12-14.
Shows a clever way to get rid of the TAG bit in Siklóssy's algorithm (Siklóssy, "Fast and read-only algorithms for traversing trees without an auxiliary stack", Inf. Process. Lett., Vol. 1, No. 4, June 1972, pp. 149-152), but silently gives up the read-only property.

@article{DBLP:journals/spe/Baker72, author = {John L. Baker}, title = {CDC 6000-Series Register Save/Restore}, journal = {Softw., Pract. Exper.}, volume = {2}, number = {4}, year = {1972}, pages = {377-387}, annote = { Saving all 23 (X[0:7], A[0:7], B[1:7]) registers of the CDC 6000 is somewhat of a Chinese puzzle; it is solved by devious manipulation in about 60 machine instructions. Same for restore. The routine cannot handle concurrent peripheral action on A6 or A7. }

@article{DBLP:journals/spe/Brown72, author = {P. J. Brown}, title = {Re-creation of Source Code from Reverse Polish Form}, journal = {Softw., Pract. Exper.}, volume = {2}, number = {3}, year = {1972}, pages = {275-278}, annote = { The author suggests considering an opening parenthesis as a unary operator on the expression it enclosed, and the closing parenthesis as a postfix operator on the basic operand it follows. Under this regimen reverse Polish is generated which can be converted back to the exact original expression. }

* David W. Matula, George Marble, Joel D. Isaacson, Graph coloring algorithms, in Graph Theory and Computing, ed. by Ronald C. Read, Academic Press, New York, 1972, pp. 109-122.
Practical paper concerned with efficient heuristics to do the job, but still keeps an eye on the theoretical bounds. Restricts itself to "sequential algorithms", algorithms which first order the nodes and then colours them in the indicated order.
     Two basic sequential orderings are explained: largest-first, which orders the nodes by degree, highest degree first, and smallest-last, which recursively dismantles the graph, removing a node of highest degree in all steps, and then reverses the order found this way. Also two ways of colouring are given: naive, in which a new colour is introduced only if none of the colours in use would do, and "colouring-with-interchange".
     Colouring-with-interchange is invoked when a node, V, is to be coloured and its direct neighbours together already have all the colours that are in use. Rather than immediately decking out a fresh colour, we try to reduce the number of neighbouring colours by interchanging two colours in a restricted part of the graph. We pick a neighbour, N, of V, which has already been coloured and which has a colour that occurs only once among the neighbours of V, and which has a neighbour M with that same property. N and M have different colours $ C sub N $ and $ C sub M $, since they are neighbours. We now swap the colours of N and M. The result is that there is now one colour less among the neighbours of V (since $ C sub N $ disappeared and the $ C sub M $ which took its place was already there, since all colours were already there), but both N and M may now be in trouble since they may have neighbours of the colour they have just been given. If so, the offended nodes are recoloured in turn, from $ C sub N $ to $ C sub M $ and vice versa, to solve the problem. This may introduce new conflicts, which are solved the same way. If this process dies out without reaching back to a neighbour of node V, all is well (it will die out eventually for lack of nodes to colour). If, however, it does reach back to a neighbour of V, we lose, since we recolour a neighbour of V back to $ C sub N $, reintroducing the colour we tried to dispel (we cannot return to a neighbour of V and colour it to $ C sub M $, since then its colour had to be $ C sub N $ originally, which is impossible since there was only one occurrence of $ C sub N $ among the neighbours of V).
     The four combinations are analysed theoretically (to a certain depth) and compared practically. Smallest-last with interchange is the winner, which the best colouring most of the time and one colour too many about 30 percent of the time. Even random ordering without interchange usually wastes less than 35 percent of the colours. The theoretical bounds are often 200 to 300 percent too high.

* R. Bayer, E. McCreight, Organization and maintenance of large ordered indexes, Acta Inform., vol. 1, pp. 173-189. 1972,

* Donald E. Knuth, Ancient Babylonian algorithms, Commun. ACM, vol. 15, #7, July 1972, pp. 671-677.
The ancient Babylonians (~ 1800 B.C.) used base 60, without indicating the position of the sexagesimal point; so 30 could mean 30 or 1800 or ½. Addition is simple, multiplication was probably done by using tables, and division by X was done as multiplication by $1/X$ and was only possible for those values of X that have finite reciprocals in base 60. Tables of reciprocals of all numbers with reciprocals of <= 3 sexagesimal digits have been found. There is some evidence that they were also used backwards, for division by numbers that did not have finite reciprocals in base 60. No explicit algorithm descriptions for the simple arithmetic operations have been found; probably everybody concerned knew them.
     Algorithms, almost all of them non-repetitive, were specified as worked-out examples, with sample numbers filled in, but formulated so that they would also work for other numbers. Algorithms described specific domestic or architectural problems, but in fact solved equations. Some of the equation problems for which algorithms have been found are solving $ x - y = d $ and $ x * y = p $ for x and y; finding the lengths of diagonals using Pythagoras (over 1000 years before P.!); computing y under the constraint $ x + y = x * y $ and x given; and compound interest (at a sample interest of 20% per year!).
     Some algorithms are given in stack machine style. Arithmetic operations conceptually destroyed their left operand, i.e., "double the length" destroyed the copy of "length", necessitating instructions for making copies for further computations. Compound interest was done by example macro expansion.
     The Seleucids (~ 300 B.C.) had a stylized form of algorithm description, a "programming language", in which no sample numbers occurred. They also had reciprocal tables of <= 6 sexagesimal digits (sorted as well!).

* Laurent Siklóssy, Fast and read-only algorithms for traversing trees without an auxiliary stack, Inf. Process. Lett., vol. 1, #4, June 1972, pp. 149-152.
The tree nodes are modified thus: a single bit per node, TAG, distinguishes between left and right children (to direct the backing-up step) and the left and right pointers are xor-ed with the address of the parent. Now apply Schorr & Waite, suitably modified for the xor, and it will never have to write a pointer in a node. If the entire tree is fixed, right pointers can be changed to threads (this can be done irrespective of xor, Schorr & Waite, etc. anyway DG).

* V.L. Arlazarov, E.A. Dinic, M.A. Kronrod, I.A. Farad\*(vzev, On economical construction of the transitive closure of an oriented graph, Soviet Math. Dokl., vol. 11, 1970, pp. 1209-1210.
The "Four Russians" algorithm.

* Ian Munro, Efficient determination of the transitive closure of a directed graph, IPL, vol. 1, #2, pp. 56-58. 1970,
Casts the problem in adjacency matrices and then reformulates it into matrix multiplication. Next applies Strassen's matrix multiplication to obtain a bound smaller than $ O ( N sup 3 ) $.

* Paul Purdom Jr., A transitive closure algorithm, BIT, vol. 10, 1970, pp. 76-94.
First a four-step procedure is described: 1. all cycles are contracted; 2. the resulting nodes are sorted topologically; 3. the transitive closure is computed in that order; 4. the contracted nodes are expanded, to produce the entire transitive closure. The term strongly connected component is not used.
     Next the algorithm is analysed in detail, and compared to Warshall's algorithm (which loses, often by a factor of 5 or more). Third, the algorithm is given in full, in Algol 60 (one procedure of 7½ printed pages, no subprocedures, no recursion).

* Burton H. Bloom, Space/time tradeoffs in hash coding with allowable errors, Commun. ACM, vol. 13, #7, July 1970, pp. 422-426.
The addressed problem is the test for non-membership of a large given set of (compound) values. If the test succeeds, non-membership is guaranteed; if it fails, membership is very likely but not guaranteed.
     Two hashing algorithms are given. Method 1 uses open hashing and stores a hash representation of the value rather than the value itself in the hash table. If the look-up fails (no hash entry or no matching representation) non-membership is guaranteed.
     Method 2 uses a hash table H of N single-bit elements and d hash functions $ h sub 1..d $. If the given set contains a value v, all bits with indexes $ h sub 1..d ( v ) $ are on in H. If for a test value t not all bits with indexes $ h sub 1..d ( t ) $ are on, non-membership is guaranteed. There is an optimal value for d, given N and the number of values in the fixed set, which minimizes the chance of a false positive. Method 2 has later become known as Bloom filters.
     It is shown analytically that Method 2 is better than Method 1. In a realistic case study concerning word hyphenation, allowing about 6% false positives allowed the size of the hash table to be reduced by a factor of about 7.

* B.W. Kernighan, S. Lin, An efficient heuristic procedure for partitioning graphs, Bell System Technical J., vol. 49, #2, Feb. 1970, pp. 291-308.
This paper concerns minimum-cost graph partitions with restrictions on the sizes of the resulting subgraphs. Quite a number of heuristics are supplied.
     The first heuristic addresses the problem in which 2 subgraphs should result of equal size. A random partition in two equally sized subgraphs is made, and its cost computed; this is the initial partition. The heuristic now works as follows. We are going to exchange pairs of nodes, one from each of the subgraphs, until all nodes have been swapped; so we end up with the same partition, with left and right exchanged. During such a swap sequence, the cost may go up and down, but the gain starts at 0 and ends at 0. Somewhere on each such swap sequence there is a maximum gain, and among all possible swap sequences there is one that exhibits the optimal gain. The problem is to find this optimum-exhibiting swap sequence. The following heuristic generates promising swap sequences.
     Make a temporary copy C of the initial random partition P. Find the pair of nodes in C, one from each subgraph, that, if these nodes were exchanged, would give the greatest reduction in cost; note that this greatest reduction may still be negative. Record the pair and the reduction so far, swap the pair in C, and mark the pair as "swapped". Repeat this process, restricting the choice to non-swapped nodes, until all nodes have been swapped; note that since the subgraphs change, the greatest reduction need not remain negative after having been negative. Find the point in the swap sequence thus obtained at which the reduction was the highest, replay the swap sequence on the partition P, and clear the "swapped" marks on all nodes. This yields our new initial but no longer rnadom partition P.
     Now new gain is possible, since all nodes are back in the game. We repeat this process until we meet a swap sequence that shows no reduction at all. This yields our required partition. (The paper explains how to do these operations efficiently and avoid the copy.) The algorithm requires between 2 and 4 cycles, and yields the optimal partition in about half the cases for graphs of 30 nodes. The probability goes down exponentially with increased graph size. The time requirements are $ O ( n sup 2 ) $.
     The next heuristic builds on the previous one, as follows. Again start by randomly splitting the graph into two subgraphs, A and B. Apply the above heuristic algorithm to partition A into $ A sub 1 $ and $ A sub 2 $, and B into $ B sub 1 $ and $ B sub 2 $. Now apply the above heuristic algorithm with initial partitions (A, B), ($ A sub 1 $ + $ B sub 1 $, $ A sub 2 $ + $ B sub 2 $) and ($ A sub 1 $ + $ B sub 2 $, $ A sub 2 $ + $ B sub 1 $), and keep the best. This raises the probability of obtaining the optimal partition for a 30 node graph to almost certainty.
     The paper includes many more heuristics for related problems.

* Howard L. Morgan, Spelling correction in systems programs, Communications of the ACM, vol. 13, #2, pp. 90-94. Feb. 1970,
Applies Damerau's (1964) spelling corrector to program texts and extends it to have different dictionaries for different contexts. For example, in the context of the condition in an if-statement, the dictionary contains only words that are allowed syntactically in a condition. So the identifier GOR will be corrected to OR rather than to FOR.
     The algorithm is given in a flowchart and in IBM 360 Assembly Language Code; it relies heavily on the TRT (Translate and Test) instruction.

* D. Michie, "Memo" functions and machine learning, Nature, vol. 218, #5136, April 6, 1968, pp. 19-22. (Dept. of Machine Intelligence and Perception, Univ. of Edinburgh),
Recognizes that a computer function should behave like a mathematical function: the input determines the output, and how the calculation is done is immaterial, or at least behind the screens. This idea frees the way for alternative implementations of a given function, for example by memoizing.
     A function implementation consists of a rote part and a rule part. The rote part contains the input-to-output mappings the function has already learned by rote, and the rule provides the answer if the input is new. New results are added to the rote part and the data is ordered in order of decreasing frequency by using a self-organizing list. This way the function "learns" answers by rote as it is being used. The list is fixed-size and if it overflows, the least popular element is discarded.
     The first example uses memoization simply as a speed-up device to calculate the function factorial (in POP-2).
     The second example is considerably more sophisticated. It concerns a program on one machine (an Elliott 4100) that must learn to control an intentionally unstable program on a second machine (a PDP-7); both machine are connected by a link. The PDP-7 controls what the paper calls a Donaldson system. Its main constituent is a little motor cart which runs on a linear bounded track. The cart balances a pole on its top; the pole is connected to the cart by a hinge, which allows it to fall forward or backward, but not sideways. The goal of the Elliott 4100 is to keep the pole balanced as long as possible by sending commands forward, stop or backward to the PDP-7. It looses when the pole comes down or the carriage crashes into one of the ends of the track.
     What happens next is not entirely clear to me. There are at least two issues here. One is that the result of the action is somehow used to either or not (or negatively?) enter the resulting value in the rote table, so learning is conditional here. The second is that the input to the function is mapped onto equivalence classes, to reduce the size of the rote table; the mapping is programmed by the user.
     The third example concerns interpolation and curve fitting, but there the relation with memoization escapes me completely.

* Robert W. Floyd, Nondeterministic Algorithms, J. ACM, vol. 14, #4, pp. 636-644. Oct. 1967,
A non-deterministic flowchart ("algorithm" is equated with "flowchart" in this paper) is a flowchart that contains a call to a multiple-valued function, a function that can give more than one result. The program is then run for all the results of the function, as follows. Each time the program reaches a point of termination labeled "success", the program result is reported; when it reaches a termination point labeled "failure", it backtracks to the latest call of a multi-valued function and continues from there with the next value.
     Many programs, especially those for combinatorial search problems, can be formulated more easily in a non-deterministic flowchart, but it is impossible to execute such a flowchart on a conventional computer.
     The paper describes a mechanical method for converting a non-deterministic flowchart into a deterministic one. Basically each path in the flowchart is replaced by two paths, one going forward performing the computation, and the other going backward performing the backtracking. The backtracking is implemented using a stack.
     For simple non-deterministic flowchart constituents, two replacing deterministic constituents are given, one for each path. For example, an assignment node $ X := f $ is replaced by a computation node $ push(X); X := f $ and a backtrack node $ X := pop() $.
     More complicated non-deterministic nodes, which can have more than one input and output arc, are replaced by a graph of deterministic nodes, with each non-deterministic input arc augmented by a backtrack output arc and each non-deterministic output arc augmented by a backtrack input arc. This process turns out to be less complicated than it sounds, and the transformations are relatively straightforward. The process is illustrated using the 8-queens problem and by finding cycles in a directed graph.
     The author takes pains to point out that non-deterministic algorithms are not probabilistic, random of Monte Carlo algorithms. He also suggests the interpretation that the value of the multi-valued function is gouverned by the desired result rather than by preceding causes; it is as if the values that did not lead to success were never actually yielded by the function.

* S.W. Golomb, L.D. Baumert, Backtrack programming, J. ACM, vol. 12, pp. 516-524. 1965,
The idea of backtracking is introduced as a method to find one or all global optimal of a function $φ(x sub 1, ..., x sub n)$. The method is set off against hill-climbing, which starts with one set of parameter values and improves the function value by small changes in some of the parameters until no more improvement can be achieved; this method finds one local optimum.
     The method needs a threshold T such that if $φ(x sub 1, ..., x sub k, ----) < T$ it can be concluded that any further solution with parameters $x sub 1, ..., x sub k$ will be suboptimal. At that point all possibilities for $x sub (k+1), ..., x sub n$ can be skipped, and backtracking can be initiated leading to other values of $x sub k$. The process is explained in detail, using the eight queens problem.
     The next problem, comma-free words, uses a different threshold. As words are added to the solution set, a test is made if the words still obey the comma-freeness condition; if not the program backtracks.
     Two optimizations are discussed: preclusion, in which certain values of some $x sub i$ with $i > k$ are temporarily taken out of the game because they are incompatible with $x sub 1, ..., x sub k$; the heuristic of ordering the parameters such that $|X sub 1| <= ... <= |X sub n|$, so smaller sets are tested first. A statistical version for very large problems (recovery from noisy transmissions) is also explained.
     The author emphasizes that the application of backtracking is not automatic, and ingenuity is required. Also, backtracking does not always take away the exponential sting of an algorithm.

* Fred Damerau, A technique for computer detection and correction of spelling errors, Communications of the ACM, vol. 7, #3, pp. 171-176. March 1964,
Observes that 80% of all spelling errors in natural language text fall in one of four classes: wrong character, missing character, extraneous character, and adjacent characters swapped. The correcting spelling checker is based on this, as follows.
     A word (both in the text and in the dictionary) is characterized by its length and its character register, a machine word (bit array) the length of the alphabet, in which bit N is on if the word contains character number N from the alphabet. These characterizations are used to quickly determine if a match is possible (under the proposed error corrections) between a word W in the text and a word D in the dictionary, to speed up the linear dictionary search.
     If the lengths of W and D differ by more than 1, or their character registers differ in more than 1 bit, there can be no match. Otherwise, W and D are compared character by character until a non-matching position is found (if none is found W is in the dictionary). Now, if |W| - |D| = 1, one character in W is skipped and the rest must match; if |W| - |D| = -1, one character in D is skipped and the rest must match; and otherwise (|W| - |D| = 0) the next two characters in W are swapped and the rest must match.

* J. Weizenbaum, Symmetric list processor, Commun. ACM, vol. 6, #9, Sept. 1963, pp. 524-544.
Algorithms for doubly-linked lists.

* Thomas N. Hibbard, A simple sorting algorithm, Journal of the ACM, vol. 10, pp. 142-150. 1963,
Interesting stuff about shortest fixed sort sequences.

* L.R. Ford, Jr., D.R. Fulkerson, Flows in Networks, Princeton Univ. Press, Princeton, New Jersey, 1962, pp. ???.
At p. 11: Max-flow min-cut algortithm.

* Douglas T. Ross, A generalized technique for symbol manipulation and numerical calculation, Comm. ACM, vol. 4, #3, March 1961, pp. 147-150.
Invents the record by observing that many property lists in Lisp (the only dynamic data structure known in 1960) are similar or even equal, thus describing similar or equal-structured elements. By putting the components of such an element consecutively in fixed places, memory space and processor cycles can be saved.
     To access a component of an element (in today's speech "a field in a record") indexing instructions are used "reversed", as follows. The normal index instruction specifies the (constant) address A of the data, usually an array, and an index register R holding the offset into the array; the memory location accessed is mem[A+contents(R)]. In the reversed use, R holds the address of the element (=record), and the constant A is the fixed offset of the element (=field) in the record. Since the instruction still accesses mem[A+contents(R)], it accesses R→A.
     The use of records is demonstrated by a very complicated flow-chart for the propositional calculus, the Wang Algorithm, which involves extensive pointer manipulation. Note: the null pointer is called END.
     The author is deeply impressed by the fact that "the simple operation of changing the index register to refer to the next element ... causes a great number of changes of data to be made essentially instantaneously". Makes you realize how much we take for granted today.

* Charles R. Blair, A program for correcting spelling errors, Information and Control, vol. 3, pp. 60-67. 1960,
The paper actually describes a test for the "similarity" of two words, one from the text (W) and one from the dictionary (D), under the assumption that W is spelling error (not typing error!); given W, D is to be retrieved. So "sieze" should be found similar to "seize", but "sieae" would not.
     Both words are abbreviated to 4 characters, using a Soundex-like system not based on pronunciation. The abbreviation algorithm has two tables, one giving the probability L that a given letter is a mistake (for example, 'A' will result from spelling error more readily than 'Q'), the other giving the probability P that the letter in a given position is a mistake (mistakes are less likely at the start and end of a word than in the middle). Now for each letter in W, compute $ L times P $, the probability that the letter in its position is a mistake, and retain the 4 most certain (least error-likely) letters in their proper order. This is the abbreviation of W.
     In a test on 117 misspelled words, the algorithm corrected 89, rejected 26, and incorrectly accepted 2.

* A. Newell, J.C. Shaw, H.A. Simon, Empirical explorations of the logic theory machine, in 1957 Western Joint Computer Conference, ???, 1957, pp. 218-230.
[ Found as an early reference to dags ]