Program Creation -- Theory, Methodology and Tools

Literature references and annotations by Dick Grune, dick@dickgrune.com.
Last update: Sun Nov 20 22:04:39 2022.

These references and annotations were originally intended for personal use and are presented here only in the hope that they may be useful to others. There is no claim to completeness or even correctness. Each annotation represents my understanding of the text at the moment I wrote the annotation.
No guarantees given; comments and content criticism welcome.

* Nicolas Carriero, David Gelernter, A computational model of everything, Commun. ACM, vol. 44, #11, Nov. 2001, pp. 77-81.
The authors divide computing into two orthogonal components: computation, which includes computers, printers, etc., in short anything that does something useful for computation; and coordination, which includes data, files, printer commands (files to be printed), etc., in short anything that requires or assists in computation. (Is this still orthogonal? DG)
     The authors offer two models for coordination, Tuple Space and Lifestreams, and one for computation, symmetric programming languages. A second model for computation, human thought, is left for the last paragraph (and for the next 20 years).
     Tuple space is like Linda tuple space, except that it is recursive (tuples can contain tuple spaces) and ubiquitous (it is the only way to represent data). Lifestreams are somewhat similar to tuple spaces, but are ordered chronologically. Items can be anonymous, and can be accessed by contents, classification or what not. Both have the property that items can be added and retrieved by any process that has access to the tuple space or lifestream.
     Symmetric programming languages treat space and time symmetrically. For example, just as a block in C can hold three variables x, y, and z, which exist concurrently and occupy space, a block in a symmetric language can hold three processes which exist concurrently and occupy time. (I think I do not understand this part.)

* Cristóbal Pareja-Flores, J. Ángel Velázquez-Iturbide, Calculating encoding and decoding functions for prefix codes, ACM SIGPLAN Notices, vol. 34, #4, April 1999, pp. 54-60.
Uses Partsch's method of program transformation to derive the desired functions systematically.

* Philip Wadler, How to declare an imperative, ACM Computing Surveys, vol. 29, #3, Sept. 1997, pp. 240-263.
Monads are a way to define the semicolon sequencer of imperative languages as a binary operator in a functional language. Not surprisingly, the arguments of the ; operator are quite complicated. The main purpose of the ; operator is to gather and pass on "global" state, of type State. The left operand is a function of type (State → (α, State)), the right operand is a function of type (α → (State → (β, State))), and the result is a function of type (State → (β, State)), as far as I can decipher. Considerable syntactic sugar is applied to hide the gory details. Many examples are given.

* Philip Wadler, Lazy versus strict, ACM Computing Surveys, vol. 28, #2, June 1996, pp. 318-320.
Tutorial on the state of the art in lazy versus strict. First the difference is explained. Next it is stated that monads combined with continuations can be used to introduce strict semantics in a lazy language and lazy semantics in a strict language. Finally, taking Church's λ calculus as a model for the lazy languages and Plotkin's λ_v calculus as a model for the strict languages, and extending them towards each other using the above techniques, the difference can be isolated in one very simple axiom, all the other axioms being the same in both augmented theories.
     No proofs, just a sketch; the paper ends in a visual pun with the words untied and united.

* Branko Soucek, The Iris Group, Dynamic, Genetic and Chaotic Programming -- The Sixth Generation, John Wiley, 1992, pp. 544.
[w.00418.IG]

* Ole-Johan Dahl, Verifiable Programming, Prentice Hall, 1992, pp. 350.
More of the same.

* Peter Fritzson, Nahid Shahmehri, Mariam Kamkar, Tibor Gyimothy, Generalized algorithmic debugging and testing, ACM Letters on Programming Languages and Systems, vol. 1, #4, Dec. 1992, pp. 303-322.
[ see sigplan june 1991 ??? ]

* Ronald E. Prather, The semantics of CAT, ACM SIGPLAN Notices, vol. 27, #6, June 1992, pp. 54-63.
Based on category theory.

* Antoni Diller, Z: An Introduction to Formal Methods, John Wiley, Chichester, 1990, pp. 309.
How to program in Z. Very nice and realistic examples and case studies: from a telephone number data base and a bill of materials manager to the specification of a text editor. Two chapters show how to transform specifications in Z to programs in Miranda and Prolog. Includes the Z Reference Manual. With the possible exception of the somewhat belligerent preface, the author's attempts at humor are reasonably successful. It suffers from 88 special mathematical symbols; why can't these people use keywords?

* Matthew Hennessy, The Semantics of Programming Languages -- An Elementary Introduction using Structural Operational Semantics, John Wiley, Chichester, 1990, pp. 157.
Structural Operational Semantics is a combination of Evaluation semantics and Computation semantics. Evaluation semantics is a calculus for results (values) associated with instances of language constructs, Computation semantics is a calculus for the sets of computations a construct specifies. The book is disappointing in that it is again full of ingenious ways to prove the obvious.

* Carroll Morgan, Programming from Specifications, Prentice Hall, Hemel Hempstead, 1990, pp. 255.
Programming theory expressed with over 100 one-symbol operators, governed by some 80 laws. The written text is relatively easy to follow, but the formalism has a high threshold. Many case studies.

* Edsger W. Dijkstra, Carel S. Scholten, Predicate Calculus and Program Semantics, Springer Verlag, New York, 1990, pp. 220.
Bone dry and very efficient exposition of the subject, giving heuristics for finding proofs. Probably worth the considerable effort of studying it. Does not contain a single literature reference.

* Anne Kaldewaij, Programming: the Derivation of Algorithms, Prentice Hall, 1990, pp. 240.
A large number of small algorithms are derived using a very polished form of calculational programming. No literature references.

* Helmut A. Partsch, Specification and Transformation of Programs -- A Formal Approach to Software Development, Springer Verlag, Berlin, 1990, pp. 493.
Systematic and thorough real-world treatment of the subject in great mathematical detail.

* Daniel Keller, A guide to natural naming, ACM SIGPLAN Notices, vol. 25, #5, May 1990, pp. 95-102.
Much good and usable advice on naming types, procedures, variables, etc. in programs.

* Giulio Iannello, Programming abstract data types, iterators and generic modules in C, Software -- Practice and Experience, vol. 20, #3, March 1990, pp. 243-260.
The values created for the ADT have pointer semantics. Iterators are implemented as four routines: Start(iter_state), Exists_Next(iter_state), Next(iter_state), and Stop(iter_state). Generics are implemented using (void *) pointers and macros. C type checking is maintained.

* H. Ehrig, B. Mahr, Fundamentals of Algebraic Specification 2 -- Module Specifications and Constraints, EATCS monographs on Theoretical Computer Science, #21, Springer-Verlag, Berlin, 1989, pp. 421.
Reasonably understandable set of examples of applications of the formalisms described in part 1. Indispensable for understanding part 1.

* Mike Spivey, The Z Notation, Prentice Hall, Hemel Hempstead, England, 1989, pp. 154.
Z is a specification and programming notation with peculiar quasi-two-dimensional syntax items, called schemas, and is based on (naive) set theory and predicate calculus. It explicitly includes error handling. Remarkably, the primitive and defined notions, predicates, etc., are described in UNIX-like manual pages. A library of definitions is presented, called the mathematical tool-kit.

* J.J. van Amstel, J.A.A.M. Poirters, The Design of Data Structures and Algorithms, Prentice Hall/Academic Service, Schoonhoven, Holland, 1989, pp. 334.
Pre- and postconditions are used systematically to design and sometimes derive algorithms and data structures. Meanwhile the first principles of programming are explained, rapidly advancing to polymorphism, the eight-queens problem and heap sort. Provides interesting insights. For the sophisticated novice.

* Greg Nelson, A generalisation of Dijkstra's calculus, ACM TOPLAS, vol. 11, #4, Oct. 1989, pp. 517-561.
An almost tutorial explanation of axiomatic semantics from first principles, almost in normal language. Very readable, the right material for non-naive beginners who want to be advanced.

* Thomas Pressburger, Maximum sum subsegment derivation, 1988.
Actually an introduction to Squigol, with applications to the derivation of maximum sum subsegment algorithms.

* D. Riley, Structured programming, sixteen years later, J. Pascal, Ada & Mod2, vol. 7, #1, 1988, pp. 42-48.
Two case studies of invariant-driven programming plus some pep talk about its use.

* H. Ehrig, B. Mahr, Fundamentals of Algebraic Specification 1 -- Equations and initial semantics, EATCS monographs on Theoretical Computer Science, #6, Springer-Verlag, Berlin, 1985, pp. 321.
This is Theory with a capital T; requires heavy mathematical background. Gives a definition of a line editor its own mother would not recognize. The average computer scientist is probably better off reading part 2 first.

* Jacques Loeckx, Kurt Sieber, The Foundations of Program Verification, Teubner / Wiley, Stuttgart / Chichester, 1984, pp. 230.
Reasonably palatable. Covers program verification using Floyd's inductive assertions, Hoare calculus and Milner's LCF.

* J.J. van Amstel, Programmeren -- het ontwerpen van algoritmen (met Pascal), (In Dutch: Programming -- The Design of Algorithms (with Pascal)), Academic Service, Den Haag, 1984, pp. 164.
Fairly formal introduction to programming, consistently placing pre- and postconditions first and code second, in Dijkstra style. Most of the work is done on simple types and arrays; records are mentioned in passing, pointers not at all.

* Elaine Weyuker, Assessing test data adequacy through program inference, ACM TOPLAS, vol. 5, #4, Oct. 1983, pp. 641-656.
[ bibliography ]

* Jean Dominique Warnier, Logical Construction of Systems, Van Nostrand Reinhold, New York, 1981, pp. 177.
The methodology of Warnier's Logical Construction of Programs [1974] applied to systems and organizations rather than to output.

* D. Gries, The Science of Programming, Springer-Verlag, New York, 1981, pp. 357.
A tale from a different world. Uses a mixture of proposition calculus, predicate calculus and a natural deduction system. Sometimes trivial (?), sometimes very profound, always readable. Especially part III, the development of programs, is very interesting: it shows the construction of many algorithms, including a way to run (some) programs backwards. One small complaint: some of the cross-references are incorrect.

* N. Suzuki, Analysis of pointer rotation, in Seventh ACM Symposium on Principles of Programming Languages, Jan. 1980, Las Vegas, Nevada, pp. 1-11.
Author's abstract: Pointer manipulation is one of the trickiest operations in programming and is a major source of programming errors. A great deal of research effort has been dedicated to making programs with pointers more reliable. In this paper we will present pointer operations which reduce conceptual difficulties when writing pointer programs, and increase reliability of programs. We will analyze theoretically as well as empirically why these operations are more convenient. Finally, we will define the _safety_ of rotations, and show that it is decidable to check whether the rotation is safe and also show that it is a good measure of the correctness of programs. (That is, if the rotation is safe it is highly probable that the program is correct, and if it is unsafe it is highly probable that the program is incorrect.) Probably the most successful effort to eliminate difficulties associated with pointer manipulation is the invention of abstract data types and the use of implementation modules. Pointers are often used to implement complex data structures, which in turn represent abstract data types. We can textually localize relevant pointer manipulations using abstract data types. Thus, we can easily check correctness of pointer programs. This approach is successful when the structure of abstract data types is simple, or efficiency is not seriously required. However, programs like garbage collection, list marking and copying, and balanced tree manipulations have not been well specified by abstract types.

* Zohar Manna, Adi Shamir, The optimal approach to recursive programs, Commun. ACM, vol. 20, #11, Nov. 1977, pp. 824-831.
Discusses the merits of least, maximal and optimal fixed points. Fascinating account of five slightly different, very simple recursive programs that have completely different fixed points.

* J. Darlington, R.M. Burstall, A system which automatically improves programs, Acta Inf., vol. 6, #1, March 1976, pp. 41-60.
Actually about improving functional-like programs: recursion removal, common subexpression elimination, in-lining, using assignments to overwrite list cells no longer in use (compile-time garbage collection).

* B. Baker, An algorithm for structuring programs, in Third ACM Symposium on Principles of Programming Languages, Jan. 1976, Atlanta, Georgia, pp. 113-126.
Author's abstract: Structured programming emphasizes programming language constructs such as _while_ loops, _until_ loops, and _if then else_ statements. Properly used, these constructs make occurrences of loops and branching of control obvious. They are preferable to _goto_ statements, which tend to obscure the flow of control [DDH, DIJ]. This paper describes an algorithm which transforms a flowgraph into a program written using _repeat (do forever)_ and _if then else_ statements. The goal of the algorithm is to produce readable programs, rather than to avoid the use of _goto_ statements entirely. _goto_ statements are generated when there is no better way to describe the flow of control. A number of techniques for eliminating _goto_ statements from programs have been previously published [AM, BJ, BS, COO, KF, KOS, PKT]. However, these techniques do not necessarily produce clear flow of control [KN]. Misuse of control constructs may mislead the reader into expecting patterns of control flow which do not exist in the algorithm. For example, these techniques may use a _repeat_ statement when the contained code cannot be executed more than once or add numerous control variables to avoid _goto_ statements. They also may avoid _goto_ statements by copying segments of code or creating subroutines. The former method results in longer programs and bugs may be introduced when all the identical segments must be modified. The latter method may result in subroutines which appear unnatural.

* F.L. Bauer, A philosophy of programming -- A course of three lectures, in Proceedings of the International Summer School on Language Hierarchies and Interfaces, Marktoberdorf, FRG, ed. by F.L. Bauer and K. Samelson, LNCS #46, Springer Verlag, 1975, pp. 194-241.
Lecture 1: A philosophy of programming requires a unique conceptual basis, founded on both scientific, economic and educational considerations. This basis is found in composability: the possibility to compose new concepts from a small number of general and powerful concepts by applying simple and well-defined composition rules. [Orthogonality is not yet mentioned. DG]
     Lecture 2: A "structured program" is a sequence of programs, each more detailed than the previous, spanning the gap from problem to machine code, such that each program can immediately be deduced from its predecessor. Various examples are given.
     Lecture 3: Programming and programming languages should treat software and hardware uniformly (especially in the context of parallel processing). Programming languages should have the same neutral stance as to being implemented in software or in hardware as Petri nets have. (The author sees the border line between software and hardware fade.)
     Summary: Scientifically: Complexity is the problem; a unique conceptual basis, well-structured design, and uniformity of software and hardware are the solution. Economically: Short-term focus on employee productivity, reliability and loyalty should make way for emphasis on employee independence and intellectual activity. Educationally: Start young; the old generation (1973!) is lost anyway.
     Note for the modern reader: the "certain nameless manufacturer" vilified in the paper is IBM, not Microsoft.

* Jean Dominique Warnier, Logical Construction of Programs, Stenfert Kroese B.V., Leiden, 1974, pp. 221.
The proposed method is very similar to JPM. The user is led to write a regular expression (in schematic form) for the results to be obtained. The notation allows concatenation, non-empty repetition of names (+), optionality of names (?), and alternatives specifying names (( name1 | name2 | ... )). The error-prone empty set is avoided by decomposing * into +?; a rhs is called "complex" if it contains more than one repetition, option or alternative. Code is then attached to each terminal name and to the begin and end of each nonterminal.
     To accommodate relationships between various parts of the output, attributes are attached to the names. These attributes can only be booleans and designate whether the data belongs to a given class. Efficient implementations are constructed using truth tables as conditions to output statements. Boolean variables are introduced as "links to truth tables".
     The text spells out every detail and summarizes everything in design rules.

* K.V. Nori, V. Srinivas, Structured programming -- An example using decision tables, J. Comp. Soc. India, vol. 4, #1, June 1973, pp. ??-??.
Proposes to accept the "decision table" as an addition to the structured-programming tool "selection statement".
     This somewhat non-linear paper supplies an analysis in terms of structured programming, of a simulation program for a TDC 16 computer. The instruction set of the machine was specified through decision tables; the program was written in FORTRAN, augmented by a program generator for decision tables.
     The paper first describes the simulator, which consists of a driver + code generated from the decision tables. Next the driver is analysed for conformance to structured programming principles (and found wanting). Then the use of a decision table as a heavy-duty alternative to the selection statement is argued, mainly because automatic methods of checking for ambiguities and completeness of decision tables exist. Finally, more or less unrelated to the main point, the authors report their experience that structured programming is no help in how to structure a program; it only supplies the tools, not the instructions for use. This problem is exacerbated by the fact that small mistakes in high-level structuring can have (and did have) large and unfavorable effects on the lower levels and on the final product.

* D.L. Parnas, On the criteria to be used in decomposing systems into modules, Commun. ACM, vol. 15, 1972, pp. 1053-1058.
Shows an example of a mediocre and a good decomposition and compares the criteria that led to both. Likewise for hierarchical ordering between modules.

* I.C. Pyle, R.C.F. McLatchie, B. Grandage, A Second-Order Bug with Delayed Effect, Software -- Practice and Experience, vol. 1, #3, 1971, pp. 231-233.
Horrible story of what occurs when you ignore zero. A badly designed instruction (MVC, which copies 1 character more than the length indication says, and so cannot copy 0 characters) had resulted in a kludgey implementation of empty strings in the input routine EDIT (the empty string was implemented as one space character, which was then further on interpreted as the empty string). When a macro processor was then added, it needed a real zero-length string, but it also needed the input routine. So a flag was added to the input routine, to tell it that it was used by the macro processor, and had to deliver real empty strings. Once this flag got out of sync, the input routine yielded a real empty string where a single space was required, the MVC got a negative length -1, 256 characters were copied, essential data was overwritten, and the OS crashed.
     The problem was fixed by removing the "one space" idea, and checking before using MVC. (Packing the nasty MVC in a subroutine was probably too slow.)

* D. Parnas, Information distribution aspects of design methodology, in Inform. Processing 71, North-Holland, 1971, pp. 339-344.
Parnas' Principles:
     -- One must provide the intended user with all the information needed to use the module correctly and nothing more.
     -- One must provide the implementor with all the information needed to implement the module correctly and nothing more.
Does this have any bearing on the "representation module" question?