.EQ
delim $$
.EN
.P1 "Miscellaneous literature"
%A Noam Chomsky
%T Three models for the description of language
%J IEEE Trans. Inform. Theory
%V 2
%N 3
%D 1956
%P 113-124
%K done
.La
In an attempt to delineate the set of correct English sentences, the author
considers three mechanisms. Finite-state automata
.Xr "finite-state automaton"
are rejected on the
grounds that they cannot cope with arbitrary nesting. Phrase structure
.Xr "phrase structure grammar"
grammars are considered probably applicable but declared impractical due to
their problems in expressing context conditions. Most of these problems can
be solved if we augment PS grammars with
.Df "transformation rules" , "transformation rule"
which specify the rearrangement of parts of the derivation tree.
%T On certain formal properties of grammars
%A Noam Chomsky
%J Inform. Control
%V 2
%N
%P 137-167
%D 1959
%K done
.La
This article discusses what later became known as the Chomsky hierarchy.
.Xr "Chomsky hierarchy"
Chomsky defines type 1 grammars in the \(l"context-sensitive\(r"
.Xr "context-sensitive grammar"
way.
His motivation for this is that it permits the construction of a tree as a
structural description. Type 2 grammars exclude $epsilon$-rules,
.Xr "$epsilon$-rule"
so in Chomsky's system, type 2 grammars are a subset of type 1 grammars.
.Lp
Next, the so-called
.Df "counter languages" "" "counter language"
are discussed.
A counter language is a language recognized by a finite automaton, extended
with a finite number of counters, each of which can assume infinitely many
values. $ L sub 1 = roman "{" a sup n b sup n | n > 0 roman "}" $ is a counter
language,
$ L sub 2 = roman "{" xy | x, y \(mo roman "{" a, b roman "}" sup "*",$
$y$ is the mirror image of $x roman "}"$
is not, so there are type 2 languages that are not counter languages. The
reverse is not investigated.
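.Lp
The one-counter case can be sketched as follows; the recognizer for $ L sub 1 $ below is our own illustration, not a construction from the paper:

```python
def accept_anbn(s: str) -> bool:
    """Recognize L1 = { a^n b^n | n > 0 } with a finite control
    plus a single counter, in the spirit of a counter automaton."""
    count = 0
    state = "A"                      # "A": reading a's; "B": reading b's
    for ch in s:
        if state == "A":
            if ch == "a":
                count += 1           # count the a's
            elif ch == "b" and count > 0:
                count -= 1
                state = "B"          # switch to matching b's
            else:
                return False
        else:
            if ch == "b" and count > 0:
                count -= 1           # match one b against one counted a
            else:
                return False
    return state == "B" and count == 0
```

$ L sub 2 $ defeats this device: recognizing a mirror image requires remembering the order of the symbols read, not just their number.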
.Lp
The Chomsky Normal Form
.Xr "Chomsky Normal Form"
is introduced, but not under that
name, and in a slightly different form: Chomsky calls a type 2 grammar @i[regular]
if production rules have the form $ A -> a $ or $ A -> BC $, with
$ B != C$,
and if
$ A -> alpha A beta $ and $ A -> gamma A eta $ then
$ alpha = gamma $ and $ beta = eta $. A grammar is
.Df "self-embedding"
if there is a derivation $ A "\*(>*" alpha A beta $ with
$ alpha != epsilon $ and $ beta != epsilon $.
The bulk of the paper is dedicated to the theorem that the extra power of
type 2 grammars over type 3 grammars lies in this self-embedding property.
%A J.W. Backus
%A F.L. Bauer
%A J. Green
%A C. Katz
%A J. McCarthy
%A P. Naur\ (Ed.)
%A A.J. Perlis
%A H. Rutishauser
%A K. Samelson
%A B. Vauquois
%A J.H. Wegstein
%A A. van\ Wijngaarden
%A M. Woodger
%T Report on the algorithmic language ALGOL 60
%J Commun. ACM
%V 3
%N 5
%D May 1960
%P 299-314
%K done
.Xr "Algol 60"
.La
First application of a BNF
.Xr "Backus-Naur Form"
grammar (for the definition of a programming
language). Revised report by the same authors: @i[Commun. ACM], vol. 6, no. 1,
p. 1-17, Jan 1963.
%A R.A. Brooker
%A I.R. MacCallum
%A D. Morris
%A J.S. Rohl
%T The compiler-compiler
%J Annual Review in Automatic Programming
%V 3
%N
%D 1960 ????
%P 229-275
%K done
.La
One of the first extensive descriptions of a compiler-compiler. Parsing is
done by a backtracking
.Xr "backtracking"
non-exhaustive top-down parser using a transduction-like grammar.
.Xr "transduction grammar"
This grammar is kept in an integrated form and modifications can be made to
it while parsing.
%A Robert W. Floyd
%T A descriptive language for symbol manipulation
%J J. ACM
%V 8
%N
%D Oct 1961
%P 579-584
%K done
.La
Original paper describing Floyd productions.
.Xr "Floyd production"
See Section 9.3.1.
%A Robert S. Ledley
%A James B. Wilson
%T Automatic-programming-language translation through syntactical analysis
%J Commun. ACM
%V 5
%N 3
%D March 1962
%P 145-155
%K done
.La
An English-to-Korean (!) translation system is described in detail, in
which parts of the Korean translation are stored in attributes
.Xr "attribute"
in the parse
tree, to be reordered and interspersed with Korean syntactic markers on
output. The parser is Irons' [CF 1961].
.Au Irons CF 1961
%A Melvin E. Conway
%T Design of a separable transition-diagram compiler
%J Commun. ACM
%V 6
%N 7
%D July 1963
%P 396-408
%K done
.La
.Xr "transition diagram"
The first to introduce @i[coroutines]
.Xr "coroutine"
and to apply them to structure a compiler. The parser is Irons' [CF 1961],
.Au Irons CF 1961
made deterministic by a No-Loop Condition and a No-Backup Condition.
It follows transition diagrams
.Xr "transition diagram"
rather than grammar rules.
%A Robert W. Floyd
%T The syntax of programming languages\-a survey
%J IEEE Trans. Electronic Comput.
%V EC-13
%N
%D 1964
%P 346-353
%K done
.La
.Xr "syntax"
Early analysis of the advantages of and problems with the use of grammars
for the specification of programming languages. Contains a bibliography of
almost 100 entries.
%A Jerome Feldman
%A David Gries
%T Translator writing systems
%J Commun. ACM
%V 11
%N 2
%D Feb. 1968
%P 77-113
%K done
.La
Grand summary of the work done on parsers (with semantic actions)
.Xr "semantic action"
before 1968.
%A D.J. Cohen
%A C.C. Gotlieb
%T A list structure form of grammars for syntactic analysis
%J Computing Surveys
%V 2
%N 1
%D 1970
%P 65-82
%K done
.La
CF rules are represented as linked lists of alternatives, which themselves are
linked lists of members. The trick is that both lists end in different null
pointers. This representation is very amenable to various backtracking
.Xr "backtracking"
and
non-backtracking top-down and bottom-up parsing methods (by interpreting the
grammar). Several practical parsers are given in flowchart form. An algorithm
is given to \(l"invert\(r" a grammar, i.e. the linked lists, to create a data
structure that will efficiently guide a bottom-up parser.
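.Lp
A minimal sketch of such a representation, interpreted by a backtracking top-down recognizer; the encoding below (nested pairs as linked lists) is our reconstruction, not the authors' exact layout:

```python
END_OF_ALT = ("end", "alt")    # terminates a list of members (success)
END_OF_RULE = ("end", "rule")  # terminates a list of alternatives (failure)

def match_members(grammar, members, s, pos):
    # yield every input position reachable after matching this member list
    if members is END_OF_ALT:
        yield pos
        return
    sym, rest = members
    if sym in grammar:                    # non-terminal: interpret its rule
        for p in match_rule(grammar, grammar[sym], s, pos):
            yield from match_members(grammar, rest, s, p)
    elif pos < len(s) and s[pos] == sym:  # terminal: match one input symbol
        yield from match_members(grammar, rest, s, pos + 1)

def match_rule(grammar, alts, s, pos):
    while alts is not END_OF_RULE:        # walk the list of alternatives
        members, alts = alts
        yield from match_members(grammar, members, s, pos)

def mlist(*syms):                         # build a linked list of members
    lst = END_OF_ALT
    for sym in reversed(syms):
        lst = (sym, lst)
    return lst

# S -> 'a' S 'b' | 'a' 'b', encoded as linked lists
grammar = {"S": (mlist("a", "S", "b"), (mlist("a", "b"), END_OF_RULE))}

def recognize(s):
    return any(p == len(s) for p in match_rule(grammar, grammar["S"], s, 0))
```

The two distinct sentinels do the work of the two different null pointers: running into END_OF_ALT means an alternative has been matched completely; running into END_OF_RULE means all alternatives have failed.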
%A A. Birman
%A J.D. Ullman
%T Parsing algorithms with backtrack
%J Inform. Control
%V 23
%N 1
%D 1973
%P 1-34
%K done
.La
.Xr "backtracking"
Models classes of recursive descent parsers,
.Xr "recursive descent"
capable of recognizing all
deterministic context-free languages
.Xr "deterministic language"
and also some non-context-free
languages.
%A B.W. Kernighan
%A L.L. Cherry
%T A system for typesetting mathematics
%J Commun. ACM
%V 18
%N 3
%D March 1975
%P 151-157
%K done
.Xr "typesetting"
.La
A good example of the use of an ambiguous
.Xr "ambiguity"
grammar to specify the preferred analysis of special cases.
%A A.V. Aho
%A S.C. Johnson
%A J.D. Ullman
%T Deterministic parsing of ambiguous grammars
%J Commun. ACM
%V 18
%N 8
%D 1975
%P 441-452
%K done
.La
Demonstrates how LL and LR parsers can be constructed for certain classes of
ambiguous
.Xr "ambiguity"
grammars, using simple disambiguating rules,
.Xr "disambiguating rule"
such as operator-precedence.
.Xr "operator-precedence"
%A Jacques Cohen
%T Experience with a conversational parser generation system
%J Softw. Pract. Exper.
%V 5
%N
%P 169-180
%D 1975
%K done
.La
Realistic description of the construction of a professional interactive
parser generator.
.Xr "parser generator"
%A Jay Earley
%T Ambiguity and precedence in syntax description
%J Acta Inform.
%V 4
%N
%D 1975
%P 183-192
%K done
.La
.Xr "syntax"
.Xr "ambiguity"
Informal description of how to use precedence information for disambiguation.
%A Michael Marcotty
%A Henry F. Ledgard
%A Gregor V. Bochmann
%T A sampler of formal definitions
%J Computing Surveys
%V 8
%N 2
%D June 1976
%P 191-276
%K done
.La
Describes and compares four semantic definition methods: VW grammars,
production systems and the axiomatic approach, Vienna Definition Language, and
attribute
.Xr "attribute grammar"
grammars. No clear winner emerges.
%A R.M. Wharton
%T Resolution of ambiguity in parsing
%J Acta Inform.
%V 6
%N 4
%D 1976
%P 387-395
%K done
.La
.Xr "ambiguity"
It is proposed that ambiguity be resolved in a bottom-up parser by 1)
reducing upon a shift/reduce conflict, 2) reducing the shorter right-hand
side upon a reduce/reduce conflict, and 3) reducing the textually first
right-hand side upon a reduce/reduce conflict with equal lengths. In a
top-down parser, criteria similar to 2) and 3) are applied to each LL(1)
conflict.
.Xr "LL(1) conflict"
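.Lp
The three bottom-up criteria amount to a total order on the conflicting actions; a sketch, in which the action encoding is assumed, not Wharton's:

```python
def resolve(actions):
    """Choose one action from a conflict set.  Each action is
    ("shift", None) or ("reduce", (rule_index, rhs_length)):
    1) prefer reduce over shift;
    2) among reduces, prefer the shorter right-hand side;
    3) on equal lengths, prefer the textually first rule."""
    reduces = [a for a in actions if a[0] == "reduce"]
    if not reduces:
        return actions[0]          # no reduce present: shift
    # rule 2 supplies the primary key, rule 3 the tie-breaker
    return min(reduces, key=lambda a: (a[1][1], a[1][0]))
```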
%A R.B. Hunter
%A A.D. McGettrick
%A R. Patel
%T LL versus LR parsing with illustrations from Algol 68
%J ACM SIGPLAN Notices
%V 12
%N 6
%D June 1977
%P 49-53
%K done
.La
Syntax-improved LL(1)
.Xr "LL(1)"
(Foster [Transform 1968])
.Xr "transformations on grammars"
.Au Foster Transform 1968
and LR(1)
.Xr "LR(1)"
are equally unsuccessful in handling a CF version of the grammar of
Algol 68.
.Xr "Algol 68"
After hand adaptation
.Xr "modifying a grammar"
LL(1) has the advantage.
%A Niklaus Wirth
%T What can we do about the unnecessary diversity of notation for syntactic definitions?
%J Commun. ACM
%V 20
%N 11
%D Nov 1977
%P 822-823
%K done
.La
Introduces Wirth's notation for extended CF grammars,
.Xr "extended context-free grammar"
using @t[{...}] for
repetition, @t([...]) for optionality, @t[(...)] for grouping and @t["..."]
for quoting.
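.Lp
A small fragment in this notation (our own illustration, not from the paper), showing all four constructs:

```
expression = term { ( "+" | "-" ) term } .
term       = factor { ( "*" | "/" ) factor } .
factor     = [ "-" ] ( number | "(" expression ")" ) .
```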
%A Jacques Cohen
%A Martin S. Roth
%T Analyses of deterministic parsing algorithms
%J Commun. ACM
%V 21
%N 6
%D June 1978
%P 448-458
%K done
.La
Gives methods to calculate the average parsing times and their standard
deviations from the input grammar, for several parsers. The resulting
formulae are finite series, and are sometimes given in closed form.
%A Kuo-Chung Tai
%T On the implementation of parsing tables
%J ACM SIGPLAN Notices
%V 14
%N 1
%D Jan 1979
%P 100-101
%K done
.La
How to implement parsing tables using hashing.
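.Lp
The idea can be sketched in a few lines (the table contents below are assumed, not from the paper): only the non-error entries are stored, keyed by (state, symbol), and absence from the hash table means error, by far the most common entry:

```python
# sparse parse table: only the non-error entries are present
table = {
    (0, "id"): ("shift", 2),
    (0, "E"):  ("goto", 1),
    (2, "+"):  ("reduce", 4),
}

def action(state, symbol):
    # a missed lookup is an error entry
    return table.get((state, symbol), ("error", None))
```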
%A Peter Deussen
%T One abstract accepting algorithm for all kinds of parsers
%B Automata, languages and programming
%S Lecture Notes in Computer Science #71
%E Hermann A. Maurer
%I Springer-Verlag
%C Berlin
%D 1979
%P 203-217
%K done
.La
Parsing is viewed as an abstract search problem, for which a high-level
algorithm is given. The selection predicate involved is narrowed down to
give known linear parsing methods.
%A Robert Endre Tarjan
%A Andrew Chi-Chih Yao
%T Storing a sparse table
%J Commun. ACM
%V 22
%N 11
%D Nov. 1979
%P 606-611
%K done
.La
Two methods of storing sparse tables are presented and analysed:
trie structure and double displacement.
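.Lp
A simplified single-displacement sketch in the spirit of the displacement scheme (the toy rows are assumed; the paper's actual construction combines a trie with double displacement):

```python
rows = {                       # row -> {column: entry}, sparse
    0: {1: "s3"},
    1: {0: "r2", 3: "s5"},
    2: {1: "r7"},
}

base, store = {}, {}
for r, row in rows.items():
    d = 0                      # find an offset at which this row's
    while any((d + c) in store for c in row):   # entries hit only free slots
        d += 1
    base[r] = d
    for c, e in row.items():
        store[d + c] = (r, e)  # remember the owner row to detect misses

def lookup(r, c):
    hit = store.get(base[r] + c)
    return hit[1] if hit and hit[0] == r else "error"
```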
%A Hanan Samet
%T A coroutine approach to parsing
%J ACM Trans. Prog. Lang. Syst.
%V 2
%N 3
%D July 1980
%P 290-306
%K done
.La
Some inputs consist of interleaved chunks of text conforming to different
grammars. An example is programming text interrupted at unpredictable points
by macro-processor directives. This situation can be handled by having
separate parsers for each grammar, cooperating in coroutine fashion.
.Xr "coroutine"
%A Anton Nijholt
%T Parsing strategies: A concise survey
%B Mathematical Foundations of Computer Science
%S Lecture Notes in Computer Science #118
%E J. Gruska & M. Chytil
%I Springer-Verlag
%C Berlin
%D 1981
%P 103-120
%K done
.La
The context-free parser and language field is surveyed in terse prose.
Highly informative to the connoisseur.
%A Esko Ukkonen
%T Lower bounds on the size of deterministic parsers
%J J. Comput. Syst. Sci.
%V 26
%D 1983
%P 153-170
%K done
.Xr "deterministic parser"
.La
Worst-case lower bounds for the parser sizes are given for the various
classes of LL($k$)
.Xr "LL($k$)"
and LR($k$)
.Xr "LR($k$)"
parsers for $k = 0, 1$ and $k >= 2$. All LL($k$)
.Xr "LL($k$)"
lower bounds are polynomial, except the one for full LL($k>1$),
.Xr "full LL($k$)"
which is exponential; all LR($k$)
.Xr "LR($k$)"
bounds are exponential.
%A Fernando C.N. Pereira
%A David H.D. Warren
%T Parsing as deduction
%B Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics
%C Cambridge, Mass.
%D 1983
%P 137-144
%K done
.La
The Prolog
.Xr "Prolog"
deduction mechanism is top-down depth-first. It can be exploited
to do parsing, using Definite Clause
.Xr "definite clause"
grammars. Parsing can be done more
efficiently with Earley's technique.
.Xr "Earley parser"
The corresponding Earley deduction
.Xr "Earley deduction"
mechanism is derived and analysed.
%A Anton Nijholt
%T Deterministic top-down and bottom-up parsing: historical notes and bibliographies
%I Mathematisch Centrum
%C Amsterdam
%D 1983
%P 118
%K done
.La
Over 1000 references about LL($k$),
.Xr "LL($k$)"
LR($k$)
.Xr "LR($k$)"
and precedence parsing, with
short histories and surveys of the three methods.
%A Peter Dencker
%A Karl D\(:urre
%A Johannes Heuft
%T Optimization of parser tables for portable compilers
%J ACM Trans. Prog. Lang. Syst.
%V 6
%N 4
%D Oct 1984
%P 546-572
%K done
.La
Given an $n times m$ parser table, an $n times m$ bit table is used to
indicate which entries are error entries; this table is significantly
smaller than the original table and the remaining table is now sparse
(typically 90-98% don't-care entries).
.Xr "don't-care entry"
The remaining table is compressed
.Xr "table compression"
row-wise (column-wise) by setting up an interference graph in which each
node corresponds to a row (column) and in which there is an edge between any
two nodes whose rows (columns) both have an entry in the same
position. A (pseudo-)optimal partitioning is found by a minimal
graph-colouring heuristic.
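.Lp
The row-wise case can be sketched as follows (the toy table is assumed, with 0 standing for an error entry; a greedy colouring stands in for the minimal graph-colouring heuristic):

```python
table = [
    [0, 3, 0, 0],
    [7, 0, 0, 0],
    [0, 0, 0, 9],
    [0, 5, 0, 0],
]
bit_table = [[int(e != 0) for e in row] for row in table]  # error map

def interfere(r1, r2):
    # two rows interfere when both have a real entry in the same column
    return any(a and b for a, b in zip(bit_table[r1], bit_table[r2]))

colour = {}                            # rows of equal colour are merged
for r in range(len(table)):
    used = {colour[s] for s in range(r) if interfere(r, s)}
    c = 0
    while c in used:                   # smallest colour unused by any
        c += 1                         # neighbour in the interference graph
    colour[r] = c

merged = {}                            # colour -> merged row
for r, c in colour.items():
    row = merged.setdefault(c, [0] * len(table[0]))
    for j, e in enumerate(table[r]):
        if e:
            row[j] = e
# lookup(r, j): if bit_table[r][j] is 0 the entry is an error entry,
# otherwise it is merged[colour[r]][j]
```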
%A W.M. Waite
%A L.R. Carter
%T The cost of a generated parser
%J Softw. Pract. Exper.
%V 15
%N 3
%D 1985
%P 221-237
%K done
.La
Supports with measurements the common belief that compilers employing
generated parsers suffer significant performance degradation with respect to
recursive descent
.Xr "recursive descent"
compilers. Reasons: interpretation of parse tables
.Xr "parse table"
versus
direct execution, attribute
.Xr "attribute"
storage allocation and the mechanism to determine
which action(s) to perform. Then, a parser interface is proposed that
simplifies integration of the parser; implementation of this interface in
assembly language results in generated parsers that cost the same as
recursive descent
.Xr "recursive descent"
ones. The paper does not consider generated recursive
descent parsers.
%A Gerard D. Finn
%T Extended use of null productions in LR(1) parser applications
%J Commun. ACM
%V 28
%N 9
%D Sept 1985
%P 961-972
%K done
.Xr "LR(1)"
.La
Extensive account of how to use an LR parser for conversion purposes.
.Xr "document conversion"
Makes
a strong case for the use of parsers for conversion. Contains a good
introduction to parsing.
%A Robert Gerardy
%T Experimental comparison of some parsing methods
%J ACM SIGPLAN Notices
%V 22
%N 8
%D Aug 1987
%P 79-88
%K done
.La
Experimental time measurements for recursive descent,
.Xr "recursive descent"
operator-precedence
.Xr "operator-precedence"
and SLR(1)
.Xr "SLR(1)"
parsing show a ratio of 1 : 4 : 10, in that order. All parsers were
written in Pascal
.Xr "Pascal"
and parsed a mini-Pascal language.
%A Michael Share
%T Resolving ambiguities in the parsing of translation grammars
%J ACM SIGPLAN Notices
%V 23
%N 8
%D Aug 1988
%P 103-109
%K done
.La
The UNIX LALR parser generator
.Xr "parser generator"
@i[yacc]
.Xr "@i[yacc]"
is extended to accept LALR conflicts
and to produce a parser that requests an interactive user decision when a
conflict occurs while parsing. The system is used in document conversion.
.Xr "document conversion"
%A Josef Grosch
%T Generators for high-speed front-ends
%B Compiler Compilers and High-Speed Compilation
%S Lecture Notes in Computer Science #371
%E D. Hammer
%I Springer-Verlag
%C Berlin
%D 1989
%P 81-92
%K done
.La
A coherent system of lexical scanner generator, LALR(1)
.Xr "LALR(1)"
parser generator
.Xr "parser generator"
and LL(1)
.Xr "LL(1)"
parser generator,
.Xr "parser generator"
using a uniform input syntax, is presented.
The scanner beats UNIX @i[lex]
.Xr "@i[lex]"
by a factor of 5, the LALR parser beats @i[yacc]
.Xr "@i[yacc]"
by a factor of 2.
%A Vance E. Waddle
%T Production trees: a compact representation of parsed programs
%J ACM Trans. Prog. Lang. Syst.
%V 12
%N 1
%D Jan 1990
%P 61-83
%K done
.Xr "production tree"
.La
Redundant items are removed from a traditional parse tree through a number
of techniques: unit productions are contracted, terminal symbols are
removed, structure information in links is replaced by a rule number, etc.
Each node in the resulting parse tree corresponds to one right-hand side and
contains the rule number and a list of pointers to the nodes for the
non-terminals in the right-hand side.
A space saving
.Xr "space requirements"
of a factor of 20 is
achieved on average. A grammar form that corresponds more closely to
this representation is defined.
%A Frank G. Pagan
%T Comparative efficiency of general and residual parsers
%J ACM SIGPLAN Notices
%V 25
%N 4
%D April 1990
%P 59-68
%K done
.La
The switch from table-driven
.Xr "table-driven"
LL(1)
.Xr "LL(1)"
to recursive descent
.Xr "recursive descent"
or from table-driven
.Xr "table-driven"
LR(1)
.Xr "LR(1)"
to recursive ascent
.Xr "recursive ascent"
is viewed as an example of
@i[partial computation].
.Xr "partial computation"
Underlying theory and examples are given.
%A Boris Burshteyn
%T On the modification of the formal grammar at parse time
%J ACM SIGPLAN Notices
%V 25
%N 5
%D May 1990
%P 117-123
%K done
.La
Modifying the grammar under control of and utilizing information obtained by
the parsing process is proposed as a means of handling context-sensitivity.
For example, the recognition of the declaration of an array @t[aaa] could
cause the introduction of a new grammar rule
$ expr -> "@t[aaa]" "@t{[}" expr "@t{]}" $,
(generated from a template), thus allowing forms like @t{aaa[3]} to be used.
With a correction in the same journal, Vol. 25, No. 8, p. 6.
.EQ
delim $$
.EN
.P1 "Unrestricted PS and CS grammars"
This section also covers some other non-context-free grammar types, excluding
VW (two-level) grammars and affix grammars,
.Xr "affix grammar"
for which see Section 13.3.
%A Alfred V. Aho
%T Indexed grammars \- an extension of context-free grammars
%J J. ACM
%V 15
%N 4
%D Oct 1968
%P 647-671
%K done
.La
In an
.Df "indexed grammar" ,
each non-terminal $N$ in a sentential form is followed by zero or more
\(l"indices\(r", which govern which of the alternatives for $N$ are allowed
for this occurrence of $N$. The indices propagate according to specific
rules. $L(CF) ~ "\(sb" ~ L(Indexed) ~ "\(sb" ~ L(CS)$.
%A William A. Woods
%T Context-sensitive parsing
%J Commun. ACM
%V 13
%N 7
%D July 1970
%P 437-445
%X all references are inaccessible!
%K done
.La
The paper presents a canonical form for context-sensitive ($epsilon$-free)
.Xr "$epsilon$-free"
derivation trees. Parsing is then performed by an exhaustive guided search
over these canonical forms exclusively. This guarantees that each possible
parsing will be found exactly once.
%A Jacques Loeckx
%T The parsing for general phrase-structure grammars
%J Inform. Control
%V 16
%N
%D 1970
%P 443-464
%K done
.La
The paper sketches two non-deterministic parsers for PS grammars, one
top-down, which tries to mimic the production process, and one bottom-up, which
tries to find a handle.
The instructions of the parsers are derived by listing
all possible transitions in the grammar and weeding out by hand those that
cannot occur. Trial-and-error methods are discussed to resolve the
non-determinism, but no instruction selection mechanisms are given. Very
readable, nice pictures.
%A Daniel A. Walters
%T Deterministic context-sensitive languages, Parts I & II
%J Inform. Control
%V 17
%N 1
%P 14-61
%D 1970
%K done
.Xr "deterministic language"
.La
The definition of LR($k$)
.Xr "LR($k$)"
grammars is extended to context-sensitive
grammars.
.Xr "context-sensitive grammar"
Emphasis is more on theoretical properties than on obtaining a parser.
%A Z.J. Ghandour
%T Formal systems and analysis of context-sensitive languages
%J Computer J.
%V 15
%N 3
%D 1972
%P 229-237
%K done
.La
Ghandour describes a formal production system that is in some ways similar to,
but far from identical to, a two-level grammar. A hierarchy of 4 classes is
defined on these systems, with Class 1 \(ip Class 2 \(sp Class 3 \(sp Class 4,
Class 1 \(ip CS and Class 4 \(eq CF. A parsing algorithm for Class 1 systems
is given in fairly great detail.
%A N.A. Khabbaz
%T Multipass precedence analysis
%J Acta Inform.
%V 4
%N
%D 1974
%P 77-85
%K done
.La
A hierarchy of CS grammars is given that can be parsed using multipass
precedence parsing. The parser and the table construction algorithm are
given explicitly.
%A Eberhard Bertsch
%T Two thoughts on fast recognition of indexed languages
%J Inform. Control
%V 29
%N
%D 1975
%P 381-384
%K done
.Xr "indexed grammar"
.La
Proves that parsing with (tree-)unambiguous indexed grammars is possible in
$O ( n sup 2 )$ steps.
.Xr "quadratic time dependency"
%A Robert W. Sebesta
%A Neil D. Jones
%T Parsers for indexed grammars
%J Intern. J. Comput. Inform. Sci.
%V 7
%N 4
%D Dec 1978
%P 344-359
%K done
.Xr "indexed grammar"
.La
Very good explanation of indexed grammars. Three classes of indexed grammars
are defined, corresponding to strong-LL,
.Xr "strong-LL($k$)"
.Xr "strong-LL(1)"
LL and LR, respectively. It is
shown that the flag sets generated by indexed grammars are regular sets.
%A C.J.M. Turnbull
%A E.S. Lee
%T Generalized deterministic left to right parsing
%J Acta Inform.
%V 12
%D 1979
%P 187-207
%K done
.La
The LR($k$)
.Xr "LR($k$)"
parsing machine is modified so that the reduce instruction
removes the reduced right-hand side from the stack and pushes an arbitrary
number of symbols back into the input stream. (The traditional LR($k$)
.Xr "LR($k$)"
reduce is a special case of this: it pushes the recognized non-terminal back
into the input and immediately shifts it. The technique is similar to that
put forward by D\(:om\(:olki [CF 1968].)
.Au D\\\\(:om\\\\(:olki CF 1968
The new machine is capable of parsing efficiently a subset of the Type 0
grammars,
.Df "DRP grammars" "" "DRP grammar"
(for Deterministic Regular Parsable).
.Xr "Deterministic Regular Parsable"
Membership of this subset is undecidable, but an approximate algorithm can be
constructed by extending the LR($k$)
.Xr "LR($k$)"
parse table
.Xr "parse table"
algorithm. Details are not given, but can be found in
a technical report by the first author.
%A Kurt Mehlhorn
%T Parsing macro grammars top down
%J Inform. Control
%V 40
%N 2
%D 1979
%P 123-143
%K done
.La
.Df "Macro grammars" "" "macro grammar"
are defined as follows. The non-terminals in a CF grammar are given
parameters, as if they were routines in a programming language. The values
of these parameters are strings of terminals and non-terminals (the latter
with the proper number of parameters). A parameter can be passed on,
possibly concatenated with some terminals and non-terminals, or can be made
part of the sentential form. An algorithm to construct a recursive-descent
.Xr "recursive descent"
parser for a macro grammar (if possible) is given.
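.Lp
The parameter-passing mechanism can be sketched by rendering the non-terminals as routines; the grammar below, which generates the non-context-free language of doubled strings $ww$, is our illustration, not from the paper:

```python
# macro grammar:  S -> C(eps)
#                 C(x) -> a C(x a)  |  b C(x b)  |  x
def C(x, choices):
    """Expand C(x); the string `choices` steers the alternatives."""
    if not choices:
        return x                       # alternative  C(x) -> x
    head, rest = choices[0], choices[1:]
    return head + C(x + head, rest)    # C(x) -> head C(x head)

def S(choices):                        # S -> C(eps)
    return C("", choices)
```

The parameter accumulates a copy of the terminals produced so far; the last alternative releases it into the sentential form, yielding $ww$.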
%A G. Barth
%T Fast recognition of context-sensitive structures
%J Computing
%V 22
%N
%D 1979
%P 243-256
%K done
.La
A
.Df "recording grammar"
(an RG)
.Xs "RG" "recording grammar"
is a CF grammar in which each (numbered) production rule belongs to
one of three classes: normal, recording and directed. During production,
normal rules behave normally and a recording rule records its own occurrence
by appending its number to a string called the $pi$-element. When production
leaves a \(l"recording\(r" stage, the entire $pi$-element is added to a set
called the $OMEGA$-component, which collects all contexts created so far. When
production enters a \(l"directed\(r" stage, an element (a context) is
retrieved from the $OMEGA$-component, transferred through a mapping $I$ and
used to direct the choice of production rules until the element is exhausted.
The expressive power of RGs is equal to that of Type 0 grammars.
.Lp
An LL($k$)
.Xr "LL($k$)"
version of RGs can be defined, based on LL($k$)-ness of the
underlying CF grammar, plus a few simple restrictions on the mapping $I$; the
resulting property is called
.Df "RLL($k$)" .
.Lp
For parsing, an LL($k$)
.Xr "LL($k$)"
parse is performed; during \(l"normal\(r" parsing,
nothing special is done, during \(l"recording\(r" parsing the rule numbers
are recorded and subsequently added to the $OMEGA$-component; during
\(l"directed\(r" parsing, which is actually \(l"checking\(r" parsing, the
rule numbers are checked for consistency with the $OMEGA$-component, using a
simple finite transducer. The parser (+ checker) works in linear time.
.Xr "linear time dependency"
.Lp
It is not clear how convenient RLL($k$) RGs are; neither of the two examples
provided to demonstrate the power of RGs is RLL($k$).
%A G.Sh. Vol'dman
%T A parsing algorithm for context-sensitive grammars
%J Program. Comput. Softw.
%V 7
%N
%D 1981
%P 302-307
%K done
.La
.Xr "context-sensitive grammar"
Extends Earley's algorithm
.Xr "Earley parser"
first to length-increasing phrase structure
.Xr "phrase structure grammar"
grammars and then to non-decreasing PS grammars (= CS grammars).
The resulting algorithm has exponential time requirements
.Xr "exponential time dependency"
but is often much better.
%A Lawrence A. Harris
%T SLR(1) and LALR(1) parsing for unrestricted grammars
%J Acta Inform.
%V 24
%N
%D 1987
%P 191-209
%K done
.Xr "LALR(1)"
.La
The notion of an LR(0) item can easily be defined for unrestricted grammars:
\(l"For each item $ lambda -> mu sub 1 "\*(Dt" X mu sub 2 $ there is a
transition on $X$ to the item $ lambda -> mu sub 1 X "\*(Dt" mu sub 2 $
and an $epsilon$-transition
.Xr "$epsilon$-transition"
to any item $ X delta -> "\*(Dt" eta $\(r",
for any symbol $X$. These items are grouped by subset construction
.Xr "subset construction"
into the
usual states, called here
.Df "preliminary states" , "preliminary state"
since some of them may actually be ineffective. A GOTO
.Xr "GOTO-action"
function is also defined.
If we can, for a given grammar, calculate the FOLLOW sets
.Xr "FOLLOW set"
of all left-hand
sides (undecidable in the general case), we can apply a variant of the usual
SLR(1)
.Xr "SLR(1)"
test and construct a parsing table for a parser as described by
Turnbull and Lee [PSCS 1979].
.Au Turnbull PSCS 1979
.Au Lee PSCS 1979
.Lp
To obtain the LALR(1)
.Xr "LALR(1)"
definition, a look-ahead grammar system is defined that
will, for each item in each state, generate the (unrestricted) language of all
continuations after that item.
If we can, for a given grammar, calculate the FIRST
.Xr "FIRST set"
sets of all these
languages (undecidable in the general case), we can apply a variant of the
usual LALR(1)
.Xr "LALR(1)"
test and construct a parsing table for a similar parser. If one
of the above constructions succeeds, a linear-time
.Xr "linear time dependency"
parser is obtained.
.Lp
The author gives many hand-calculated examples and explores error detection
.Xr "error detection"
properties. More general definitions of SLR(1)
.Xr "SLR(1)"
and LALR(1)
.Xr "LALR(1)"
are possible,
encompassing larger sets of grammars, at the cost of a still further reduced
chance of decidability.
.EQ
delim $$
.EN
.P1 "Van Wijngaarden grammars and affix grammars"
Note that van Wijngaarden grammars and two-level grammars are synonyms; affix
.Xr "affix grammar"
grammars are different.
%A M. Sintzoff
%T Existence of a van Wijngaarden syntax for every recursively enumerable set
%J Annales de la Soci\('et\('e Scientifique de Bruxelles
%V 81
%N II
%D 1967
%P 115-118
%K done
.La
.Xr "syntax"
A relatively simple proof of the theorem that for every semi-Thue system we
can construct a VW grammar that produces the same set.
%A A. van\ Wijngaarden
%A B.J. Mailloux
%A J.E.L. Peck
%A C.H.A. Koster
%A M. Sintzoff
%A C.H. Lindsey
%A L.G.L.T. Meertens
%A R.G. Fisker
%T Revised report on the algorithmic language ALGOL 68
%J Acta Inform.
%V 5
%N
%D 1975
%P 1-236
%K done
.La
VW grammars found their widest application to date in the definition of
ALGOL 68.
.Xr "Algol 68"
Section 1.1.3 of the ALGOL 68 Revised Report contains a very
carefully worded description of the two-level mechanism.
The report contains many interesting applications.
%A C.H.A. Koster
%T Affix grammars
%B ALGOL 68 Implementation
%E J.E.L. Peck
%I North-Holland Publ. Co.
%C Amsterdam
%D 1971
%P 95-109
%K done
.Xr "Algol 68"
.La
Context conditions are expressed inside a context-free grammar by introducing
@i[affixes],
.Xr "affix"
which are divided into @i[derived]
.Xr "derived affix"
and @i[inherited]
.Xr "inherited affix"
and which have to fulfill user-defined @i[primitive predicates].
.Xr "primitive predicate"
If the affix grammar
.Xr "affix grammar"
is @i[well-formed],
.Xr "well-formed affix grammar"
a parser for it can be constructed.
%A David Crowe
%T Generating parsers for affix grammars
%J Commun. ACM
%V 15
%N 8
%D Aug 1972
%P 728-734
%K done
.La
A bounded-context
.Xr "bounded-context"
(Floyd productions)
.Xr "Floyd production"
parser is extended with affix
.Xr "affix"
manipulation.
%A A. van\ Wijngaarden
%T The generative power of two-level grammars
%B Automata, Languages and Programming
%S Lecture Notes in Computer Science #14
%E J. Loeckx
%I Springer-Verlag
%C Berlin
%D 1974
%P 9-16
%K done
.La
.Xr "generative grammar"
The generative power of VW grammars is illustrated by creating a VW grammar
that simulates a Turing machine; the VW grammar uses only one metanotion,
.Xr "metanotion"
thus proving that one metanotion suffices.
%A Sheila A. Greibach
%T Some restrictions on $W$-grammars
%J Intern. J. Comput. Inform. Sci.
%V 3
%N 4
%D 1974
%P 289-327
%K done
.La
The consequences of two easily checkable restrictions on the form of the rules
in a VW grammar are explored in great detail and are found to be surprising.
Although this highly technical paper is not directly concerned with parsing,
it is very instructive in that it shows methods of exploring the field.
%A C.H.A. Koster
%T Two-level grammars
%B Compiler Construction: An Advanced Course
%S Lecture Notes in Computer Science #21
%E F.L. Bauer & J. Eickel
%I Springer-Verlag
%C Berlin
%D 1974
%P 146-156
%K done
.La
Easy introduction to two-level (VW) grammars, starting from one-level VW
grammars. Examples of practical handling of context in a VW grammar.
%A P. Deussen
%T A decidability criterion for van Wijngaarden grammars
%J Acta Inform.
%V 5
%N
%D 1975
%P 353-375
%K done
.La
The criterion, which is given in detail, can be paraphrased very roughly as
follows: the language generated by a VW grammar is decidable if (but not
only if) there are no $epsilon$-rules and either there are no free metanotions
.Xr "metanotion"
(occurring on the right-hand side only) or there are no dummy metanotions
.Xr "metanotion"
(occurring on the left-hand side only).
%A David A. Watt
%T The parsing problem for affix grammars
%J Acta Inform.
%V 8
%N
%D 1977
%P 1-20
%K done
.La
A technique is described to convert an affix
.Xr "affix grammar"
grammar into a CF grammar called
a
.Df "head grammar"
which contains a special kind of non-terminal,
.Df "copy-symbols" . "copy-symbol"
For the head grammar they are $epsilon$-rules,
.Xr "$epsilon$-rule"
but for the affix grammar
.Xr "affix grammar"
they effect affix manipulations on the affix stack. Primitive predicates
.Xr "primitive predicate"
are also
$epsilon$-rules,
.Xr "$epsilon$-rule"
but do checks on the affixes. Parsing is done by any CF
parser, preferably LR(1).
.Xr "LR(1)"
The affixes are not used to control the parsing but
only to declare an input string erroneous: for the technique to work, the
affix grammar
.Xr "affix grammar"
must in effect be an attribute grammar.
.Xr "attribute grammar"
%A J. Craig Cleaveland
%A Robert C. Uzgalis
%T Grammars for Programming Languages
%I Elsevier
%C New York
%D 1977
%P 154
%K done
.La
In spite of its title, the book is a highly readable explanation of two-level
grammars, also known as van Wijngaarden grammars or VW grammars. After an
introductory treatment of formal languages, the Chomsky hierarchy
.Xr "Chomsky hierarchy"
and parse
trees, it is shown to what extent CF grammars can be used to define a
programming language. These are shown to fail to define a language completely
and the inadequacy of CS grammars is demonstrated. VW grammars are then
explained and the remainder of the book consists of increasingly complex and
impressive examples of what a VW grammar can do. These examples include
keeping a name list, doing type checking and handling block structure in the
definition of a programming language. Recommended reading.
%A R. Meersman
%A G. Rozenberg
%T Two-level meta-controlled substitution grammars
%J Acta Inform.
%V 10
%N
%D 1978
%P 323-339
%K done
.La
The authors prove that the uniform substitution rule is essential for
two-level grammars; without it, they would just generate the CF languages.
This highly technical paper examines a number of variants of the mechanisms
involved.
%A Lutz Wegner
%T Bracketed two-level grammars \- a decidable and practical approach to language definition
%B Automata, languages and programming
%S Lecture Notes in Computer Science #71
%E Hermann A. Maurer
%I Springer-Verlag
%C Berlin
%D 1979
%P 668-682
%K done
.Xr "bracketed grammar"
.La
The metanotions
.Xr "metanotion"
of a VW grammar are partitioned into two blocks,
\(l"synthesized\(r"
.Xr "synthesized metanotion"
and \(l"derived\(r";
.Xr "derived metanotion"
they are separated in a hyperrule by special markers, \(l"brackets\(r", and
are treated more or less as attributes.
.Xr "attribute"
Under reasonable conditions parsability can be obtained. The
thus restricted VW grammars are very readable.
%A Lutz Michael Wegner
%T On parsing two-level grammars
%J Acta Inform.
%V 14
%N
%D 1980
%P 175-193
%X almost all references are inaccessible
%K done
.La
The article starts by defining a number of properties a VW grammar may
exhibit; among these are \(l"left(right) bound\(r",
.Xr "left bound"
.Xr "right bound"
\(l"free of hidden empty notions\(r",
.Xr "free of hidden empty notions"
\(l"uniquely assignable\(r"
.Xr "uniquely assignable"
and \(l"locally unambiguous\(r".
.Xr "locally unambiguous"
Most of these properties are undecidable, but sub-optimal tests can be
devised. For each VW grammar
$ G sub VW $, a CF
.Df "skeleton grammar"
$ G sub SK $ is defined by considering all hypernotions
.Xr "hypernotion"
in the VW grammar as
non-terminals of $ G sub SK $ and adding the cross-references of the VW
grammar as production rules to $ G sub SK $. $ G sub SK $ generates a
superset of $ G sub VW $. The cross-reference problem for VW grammars is
unsolvable but again any sub-optimal algorithm (or manual intervention) will
do. Parsing is now done by parsing with $ G sub SK $ and then reconstructing
and testing the metanotions.
.Xr "metanotion"
A long list of conditions necessary for the
above to work is given; these conditions are in terms of the properties
defined at the beginning.
%A Dick Grune
%T How to produce all sentences from a two-level grammar
%J Inform. Process. Lett.
%V 19
%N
%D Nov 1984
%P 181-185
%K done
.La
All terminal productions are derived systematically in breadth-first order.
.Xr "breadth-first production"
The author identifies pitfalls in this process and describes remedies. A
parser is used to identify the hyperrules involved in a given sentential form.
This parser is a general CF recursive descent
.Xr "recursive descent"
parser to which a consistency check
for the metanotions
.Xr "metanotion"
has been added; it is not described in detail.
%A A.J. Fisher
%T Practical LL(1)-based parsing of van Wijngaarden grammars
%J Acta Inform.
%V 21
%N
%D 1985
%P 559-584
%K done
.La
Fisher's parser is based on the idea that the input string was generated using
only a small, finite part of the infinite
.Df "strict grammar"
that can be
generated from the VW grammar. The parser tries to reconstruct this part of
the strict grammar
.Xr "strict grammar"
on the fly while parsing the input. The actual parsing is
done by a top-down interpretative LL(1)
.Xr "LL(1)"
parser, called the
.Df "terminal parser" .
It is driven by a fragment of the strict grammar
.Xr "strict grammar"
and any time the definition
of a non-terminal is found missing by the terminal parser,
.Xr "terminal parser"
the latter asks
another module, the
.Df "strict syntax generator" ,
to try to construct it from the VW grammar. For this technique to work, the VW
grammar has to satisfy three conditions: the defining CF grammar of each
hyperrule is unambiguous, there are no free metanotions,
.Xr "metanotion"
and the skeleton grammar
.Xr "skeleton grammar"
(as defined by Wegner [VW 1980])
.Au Wegner VW 1980
is LL(1).
.Xr "LL(1)"
The parser system is
organized as a set of concurrent processes (written in occam),
.Xr "occam"
with both parsers, all hyperrule matchers and several other modules as separate
processes. The author claims that \(l"this concurrent organization ... is
strictly a property of the algorithm, not of the implementation\(r", but a
sequential, albeit slower, implementation seems quite possible. The paper
gives heuristics for the automatic generation of the cross-reference needed
for the skeleton grammar;
.Xr "skeleton grammar"
gives a method to handle
.Df "general hyperrules" , "general hyperrule"
hyperrules that fit all hypernotions,
.Xr "hypernotion"
efficiently; and pays much attention to the use of angle brackets
.Xr "angle brackets"
in VW grammars.
%A Jacques Cohen
%A Timothy J. Hickey
%T Parsing and compiling using Prolog
%J ACM Trans. Prog. Lang. Syst.
%V 9
%N 2
%D April 1987
%P 125-164
%K done
.La
See same paper [CF 1987].
.EQ
delim $$
.EN
.P1 "General context-free parsers"
%A E.T. Irons
%T A syntax-directed compiler for ALGOL 60
%J Commun. ACM
%V 4
%N 1
%D Jan 1961
%P 51-55
%K done
.Xr "Algol 60"
.La
The first to describe a full parser. It is essentially a full backtracking
.Xr "backtracking"
recursive descent
.Xr "recursive descent"
left-corner parser.
.Xr "left-corner parsing"
The published program is corrected in a Letter to the Editor by
B.H. Mayoh, @i[Commun. ACM], vol. 4, no. 6, p. 284, June 1961.
%A Itiroo Sakai
%T Syntax in universal translation
%B Proceedings 1961 International Conference on Machine Translation of Languages and Applied Language Analysis
%I Her Majesty's Stationery Office
%C London
%D 1962
%P 593-608
%K done
.La
.Xr "syntax"
Using a formalism that seems equivalent to a CF grammar in Chomsky Normal
.Xr "Chomsky Normal Form"
Form and a parser that is essentially a CYK parser,
.Xr "CYK parser"
the author describes a
translation mechanism in which the source language sentence is transformed
into a binary tree (by the CYK parser). Each production rule carries a mark
telling if the order of the two constituents should be reversed in the target
language. The target language sentence is then produced by following this new
order and by replacing words. A simple Japanese-to-English example is
provided.
%A E.T. Irons
%T The structure and use of the syntax directed compiler
%J Annual Review in Automatic Programming
%V 3
%N
%D 1962
%P 207-228
%K done
.La
Extended version of Irons [CF 1961].
.Au Irons CF 1961
%A E.T. Irons
%T An error-correcting parse algorithm
%J Commun. ACM
%V 6
%N 11
%D Nov 1963
%P 669-673
%K done
.La
Contrary to the title, the most interesting part of this paper is the parser
it describes, which is essentially Earley's algorithm
.Xr "Earley parser"
without look-ahead. The
invention of this parser was prompted by the author's dissatisfaction with
the error detection
.Xr "error detection"
properties of backtracking
.Xr "backtracking"
parsers. This one does not
backtrack, it keeps all possible parsings in parallel instead. When the set of
possible parsings becomes exhausted due to an error in the input, the last
non-empty set is analysed for continuations, both terminal and non-terminal,
including all successors and alternatives. Then input symbols are discarded
until one is found that is a terminal in the continuation set or the beginning
of a non-terminal in that set. Symbols are then inserted to bridge any
possible gap thus created, and parsing continues. Note that this is
essentially R\(:ohrich's algorithm. The author points out applications for
this parser as a pattern matcher.
%A Sheila A. Greibach
%T Formal parsing systems
%J Commun. ACM
%V 7
%N 8
%D Aug 1964
%P 499-504
%K done
.La
\(l"A formal parsing system $ G = ( V , mu , T, R ) $ consists of two finite
disjoint vocabularies, $V$ and $T$, a many-to-many map, $ mu $, from $V$
onto $T$, and a recursive set $R$ of strings in $T$ called syntactic
sentence classes\(r" (verbatim). This is intended to solve an additional
problem in parsing, which occurs often in natural languages: a symbol found
in the input does not always uniquely identify a terminal symbol from the
language (for instance, @i[will] (verb) versus @i[will] (noun)). On this
level, the language is given as the entire set $R$, but in practice it is
given through a \(l"context-free phrase structure
.Xr "phrase structure grammar"
generator\(r", i.e. a
grammar. To allow parsing, this grammar is brought into what is now known as
Greibach Normal Form:
.Xr "Greibach Normal Form"
each rule is of the form
$ Z -> aY sub 1 ... Y sub m $. Now a
.Df "directed production analyser"
is defined which consists of an unlimited set of pushdown stores
and an input stream, the entries of which are sets of terminal symbols,
derived through $ mu $ from the lexical symbols. For each consecutive input
entry, the machine scans the stores for a top non-terminal $Z$ for which there
is a rule $ Z -> a Y sub 1 ... Y sub m $ with $a$ in the input set. A new
store is filled with a copy of the old store and the top $Z$ is replaced by
$ Y sub 1 ... Y sub m $; if the resulting store is longer than the input, it
is discarded. Stores will contain non-terminals only. For each store that is
empty when the input is exhausted, a parsing has been found. This is in effect
non-deterministic top-down parsing with a one-symbol look-ahead. This is
probably the first description of a parser that will work for any CF grammar.
.Lp
A large part of the paper is dedicated to undoing the damage done by
converting to Greibach Normal Form.
.Xr "Greibach Normal Form"
%A T.V. Griffiths
%A S.R. Petrick
%T On the relative efficiencies of context-free grammar recognizers
%J Commun. ACM
%V 8
%N 5
%D May 1965
%P 289-300
%K done
.La
To achieve a unified view of the parsing techniques known at that time, the
authors define a non-deterministic two-stack machine whose only type of
instruction is the replacement of two given strings on the tops of both
stacks by two other strings; the machine is started with the input on one
stack and the start symbol on the other and it \(l"recognizes\(r" the input
if both stacks get empty simultaneously. For each parsing technique
considered, a simple mapping from the grammar to the machine instructions is
given; the techniques covered are top-down (called top-down), left-corner
.Xr "left-corner parsing"
(called bottom-up) and bottom-up (called direct-substitution). Next,
look-ahead techniques are incorporated to attempt to make the machine
deterministic. The authors identify left-recursion
.Xr "left-recursion"
as a trouble-spot. All
grammars are required to be $epsilon$-free.
.Xr "$epsilon$-free"
The procedures for the three parsing methods are given in a Letter to the
Editor, @i[Commun. ACM], vol. 8, no. 10, p. 594, Oct 1965.
%A Susumu Kuno
%T The predictive analyzer and a path elimination technique
%J Commun. ACM
%V 8
%N 7
%D July 1965
%P 453-462
%K done
.La
The author extends his
.Df "predictive analyser"
(in modern terminology: an exhaustive top-down parser for grammars in Greibach
Normal Form)
.Xr "Greibach Normal Form"
(see Kuno and Oettinger, reprinted by Grosz, Sparck\ Jones and
Webber [NatLang 1986])
.Au Grosz NatLang 1986
.Au "Sparck\\\\ Jones" NatLang 1986
.Au Webber NatLang 1986
with a table of well-formed substrings.
.Xr "well-formed substring table"
Through ingenious bit manipulation
the table is made to fit in a small memory. Time gains are considerable (as
expected).
%A Susumu Kuno
%T An augmented predictive analyzer for context-free languages \- its relative efficiency
%J Commun. ACM
%V 9
%N 11
%D Nov 1966
%P 810-823
%K done
.La
Converting a CF grammar to Greibach Normal Form
.Xr "Greibach Normal Form"
often greatly distorts its
structure. To keep track of the structure, the right-hand side of each rule in
the CF grammar is prefixed with a marker, a special non-terminal which
produces $epsilon$. A conversion algorithm is described that results in rules
of the form $ A -> M sup "+" a B C ... $, where $ M sup "+" $ is a
non-empty sequence of markers.
The Kuno predictive analyser
.Xr "predictive analyser"
(see Kuno [CF 1965])
.Au Kuno CF 1965
is extended with a second stack on which the marker parts of the rules are
kept. When a parsing is found, the marker stack allows easy reconstruction
of the parsing according to the original CF grammar. The parser is compared to
two other parsers, using a large number of criteria.
%A D.H. Younger
%T Recognition and parsing of context-free languages in time $ n sup 3 $
%J Inform. Control
%V 10
%N 2
%D Feb 1967
%P 189-208
%K done
.Xr "cubic time dependency"
.La
A Boolean recognition matrix $R$ is constructed in a bottom-up fashion, in
which $ R[i, l, p] $ indicates that the segment of the input string starting
at position $i$ with length $l$ is a production of non-terminal $p$.
This matrix can be filled in $ O ( n sup 3 ) $ actions, where $n$ is the
length of the input string. If $ R[0, n, 0] $ is set, the whole string is a
production of non-terminal 0. Many of the bits in the matrix can never be
used in any actual parsing; these can be removed by doing a top-down scan
starting from $ R[0, n, 0] $ and removing all bits not reached this way. If
the matrix contains integer rather than Boolean elements, it is easy to fill
it with the number of ways a given segment can be produced by a given
non-terminal; this yields the ambiguity rate.
.Xr "ambiguity rate"
%A S.H. Unger
%T A global parser for context-free phrase structure grammars
%J Commun. ACM
%V 11
%N 4
%D April 1968
%P 240-247
%K done
.La
The Unger parser
.Xr "Unger parser"
(as described in Section 4.1) is extended with a series of tests to avoid
partitionings that could never lead to success.
For instance, a section of the input is never matched against a non-terminal
if it begins with a token no production of the non-terminal could begin with.
Several such tests are described and ways are given to statically derive the
necessary information (FIRST sets,
.Xr "FIRST set"
LAST sets,
.Xr "LAST set"
EXCLUDE sets)
.Xr "EXCLUDE set"
from the grammar.
Although none of this changes the exponential character of the algorithm, the
tests do result in a considerable speed-up in practice.
(There is an essential correction to one of the flowcharts given
in @i[Commun. ACM], vol. 11, no. 6, p. 427, June 1968.)
%A B.A. Chartres
%A J.J. Florentin
%T A universal syntax-directed top-down analyzer
%J J. ACM
%V 15
%N 3
%D July 1968
%P 447-464
%K done
.La
The non-deterministic two-stack top-down parser of
Griffiths and Petrick [CF 1965]
.Au Griffiths CF 1965
.Au Petrick CF 1965
is extended with a third stack and a status variable. One stack holds
the rest of the input, the second holds the prediction that should match that
input and the third holds a tracing of the outline of the production tree
.Xr "production tree"
constructed so far; when input and prediction stack
.Xr "prediction stack"
are empty, the third stack holds
the completed parse tree. This three-stack mechanism can be run both forward
and backward; the status variable keeps track of the direction. By properly
reacting to the values on the tops of the stacks and the direction variable,
it is possible to make the mechanism perform a full backtracking
.Xr "backtracking"
exhaustive search.
Much work is spent on handling left recursion and $epsilon$-rules.
.Xr "$epsilon$-rule"
%A B\('alint D\(:om\(:olki
%T A universal compiler system based on production rules
%J BIT
%V 8
%N 4
%P 262-275
%D Oct 1968
%K done
.La
The heart of the compiler system described here is a production system
consisting of an ordered set of production rules, which are the inverses
of the grammar rules; note that the notions \(l"left-hand side\(r" (lhs) and
\(l"right-hand side\(r" (rhs) are reversed from their normal meanings in this
abstract. The system attempts to derive the start symbol, by
always applying the first applicable production rule (first in two respects:
from the left in the string processed, and in the ordered set of production
rules). This resolves shift/reduce conflicts in favour of
reduce, and reduce/reduce conflicts by length and by the order of
the production rules. When a reduction is found, the lhs of the reducing rule
is offered for semantic processing and the rhs is pushed back into the input
stream, to be reread. Since the length of the rhs is not restricted, the
method can handle non-CF grammars.
.Lp
The so-called \(l"Syntactic Filter\(r" uses a bitvector
.Xr "bitvector"
technique to determine if, and if so which, production
rule is applicable: for every symbol $i$ in the alphabet,
.Xr "alphabet"
there is a bitvector $B[i]$, with one bit for each of the positions in each
lhs; this bit is set to 1 if the position contains symbol $i$.
There is also a bitvector $U$ marking the first symbol of each lhs, and a
bitvector $V$ marking the last symbol of each lhs. Now, a stack of bitvectors
$Q sub t$ is maintained, with $Q sub 0~=~0$ and
$Q sub t~=~( ( Q sub t-1 ">>" 1 )$ \(OR $U )$ \(AN $B [i sub t ]$, where
$i sub t$ is the $t$-th input symbol. $Q sub t$ contains the answer to the
question whether the last $j$ symbols received are the first $j$ symbols of
some lhs, for any lhs and $j$. A 1 \(l"walks\(r" through an lhs part of the
$Q$ vector, as this lhs is recognized.
An occurrence of an lhs is found if $Q sub t$ \(AN $V~!=~0$.
After doing a replacement, $t$ is set back $k$ places, where $k$ is the length
of the applied lhs, so a stack of $Q sub t$-s must be maintained. If some
$Q sub t~=~0$, we have an error.
An interesting implementation of the D\(:om\(:olki algorithm is given by Hext
and Roberts [CF 1970].
.Au Hext CF 1970
.Au "Roberts, P.S." CF 1970
%A T. Kasami
%A K. Torii
%T A syntax-analysis procedure for unambiguous context-free grammars
%J J. ACM
%V 16
%N 3
%D July 1969
%P 423-431
%K done
.La
A rather complicated presentation of a variant of the CYK algorithm,
.Xr "CYK parser"
including
the derivation of a $ O ( n sup 2 log ~ n ) $
.Xr "quadratic time dependency"
time bound for unambiguous
Chomsky Normal Form grammars.
.Xr "Chomsky Normal Form"
%A J. Earley
%T An efficient context-free parsing algorithm
%J Commun. ACM
%V 13
%N 2
%D Feb 1970
%P 94-102
%K done
.La
This famous paper gives an informal description of the Earley algorithm.
.Xr "Earley parser"
The
algorithm is compared both theoretically and experimentally with some
general search techniques and with the CYK algorithm.
.Xr "CYK parser"
It easily beats the
general search techniques. Although the CYK algorithm has the same worst-case
efficiency as Earley's,
.Xr "Earley parser"
it requires $ O ( n sup 3 ) $
.Xr "cubic time dependency"
on any grammar, whereas
Earley's requires $ O ( n sup 2 ) $
.Xr "quadratic time dependency"
on unambiguous grammars and $ O ( n ) $ on
bounded-state grammars. The algorithm is easily modified to handle Extended
CF grammars.
.Xr "extended context-free grammar"
(Also reprinted by Grosz, Sparck\ Jones and Webber [NatLang 1986])
.Au Grosz NatLang 1986
.Au "Sparck\\\\ Jones" NatLang 1986
.Au Webber NatLang 1986
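The three moves of the algorithm, Predictor, Scanner and Completer, fit in a short recognizer; the grammar below is an illustrative sketch (no $epsilon$-rules, so every completed item spans at least one token), not an example from the paper.

```python
# Sketch of an Earley recognizer without look-ahead: item set i holds
# items (lhs, rhs, dot, origin), meaning rhs[:dot] derives the input
# from position `origin` to position i.  Grammar is illustrative only.

GRAMMAR = {                       # E -> E + T | T ;  T -> a
    "E": [["E", "+", "T"], ["T"]],
    "T": [["a"]],
}

def earley(tokens, start="E"):
    n = len(tokens)
    sets = [set() for _ in range(n + 1)]
    for rhs in GRAMMAR[start]:
        sets[0].add((start, tuple(rhs), 0, 0))
    for i in range(n + 1):
        agenda = list(sets[i])
        while agenda:
            lhs, rhs, dot, org = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in GRAMMAR:                  # Predictor
                    for alt in GRAMMAR[sym]:
                        new = (sym, tuple(alt), 0, i)
                        if new not in sets[i]:
                            sets[i].add(new); agenda.append(new)
                elif i < n and tokens[i] == sym:    # Scanner
                    sets[i + 1].add((lhs, rhs, dot + 1, org))
            else:                                   # Completer
                for l2, r2, d2, o2 in list(sets[org]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in sets[i]:
                            sets[i].add(new); agenda.append(new)
    return any(l == start and d == len(r) and o == 0
               for l, r, d, o in sets[n])

print(earley(["a", "+", "a"]))   # True
print(earley(["a", "+"]))        # False
```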
%A J.B. Hext
%A P.S. Roberts
%T Syntax analysis by Dom\(:olki's algorithm
%J Computer J.
%V 13
%N 3
%D Aug 1970
%P 263-271
%K done
.La
D\(:om\(:olki's algorithm is a bottom-up parser in which the item sets are
represented as bitvectors.
.Xr "bitvector"
A backtracking version is presented which can
handle any grammar. To reduce the need for backtracking
.Xr "backtracking"
a 1-character
look-ahead is introduced and an algorithm for determining the actions on the
look-ahead is given.
Since the internal state is recalculated by vector operations for each input
character, the parse table
.Xr "parse table"
is much smaller than usual and its entries are one bit each.
This, and the fact that it is all bitvector operations, makes the algorithm
suitable for implementation in hardware.
%A Bernard Lang
%T Parallel non-deterministic bottom-up parsing
%J ACM SIGPLAN Notices
%V 6
%N 12
%D Dec 1971
%P 56-57
%K done
.La
The full breadth-first search
.Xr "breadth-first search"
of an Earley parser
.Xr "Earley parser"
is limited through the
use of weak-precedence
.Xr "weak precedence"
relations,
.Xr "precedence relation"
in so far as these are unique. Abstract of a
larger technical report.
%A F.P. Kaminger
%T Generation, recognition and parsing of context-free languages by means of recursive graphs
%J Computing
%V 11
%N 1
%D 1973
%P 87-96
%K done
.La
Formal description of the use of recursive graphs instead of CF grammars to
describe, generate and parse context-free languages.
%A Bernard Lang
%T Deterministic techniques for efficient non-deterministic parsers
%B Automata, languages and programming
%S Lecture Notes in Computer Science #14
%E J. Loeckx
%I Springer-Verlag
%C Berlin
%D 1974
%P 255-269
%K done
.Xr "deterministic parser"
.La
Explores the theoretical properties of doing breadth-first search
.Xr "breadth-first search"
to resolve
the non-determinism in a bottom-up automaton with conflicts. See Tomita
.Xr "Tomita parser"
[CF 1986]
.Au Tomita CF 1986
for a practical realization.
%A M. Bouckaert
%A A. Pirotte
%A M. Snelling
%T Efficient parsing algorithms for general context-free parsers
%J Inform. Sci.
%V 8
%N 1
%D Jan 1975
%P 1-26
%K done
.La
The authors observe that the Predictor in an Earley parser
.Xr "Earley parser"
will often predict
items that start with symbols that can never match the first few symbols of
the present input; such items will never bear fruit and might as well be left
out. To accomplish this, they extend the $k$-symbol reduction look-ahead
.Xr "reduction look-ahead"
Earley parser
.Xr "Earley parser"
with a $t$-symbol prediction mechanism; this results in very
general $ M sub k sup t $ parsing machines, the properties of which are
studied in much formal detail. Three important conclusions can be drawn.
Values of $k$ or $t$ larger than one lose much more on processing than
they normally gain from better prediction and sharper reduction; such
parsers pay off only for asymptotically long input strings. The Earley
parser without look-ahead ($ M sub 0 sup 0 $) performs better than the parser
with 1-symbol look-ahead; Earley's recommendation to always use 1 symbol of
look-ahead is unsound. The best parser is $ M sub 0 sup 1 $, i.e. one-symbol
predictive look-ahead and no reduction look-ahead.
.Xr "reduction look-ahead"
%A L. Valiant
%T General context-free recognition in less than cubic time
%J J. Comput. Syst. Sci.
%V 10
%D 1975
%P 308-315
%K done
.Xr "Valiant parser"
.Xr "cubic time dependency"
.La
.Xr "time requirements"
Reduces CYK
.Xr "CYK parser"
to bit matrix multiplication and then applies
.Fs Strassen's
Volker Strassen, \(l"Gaussian elimination is not optimal\(r",
@i[Numerische Mathematik], vol. 13, p. 354-356, 1969. Shows how to multiply
two 2\(mu2 matrices using 7 multiplications rather than 8 and extends the
principle to larger matrices.
.Fe
algorithm.
%A C.H.A. Koster
%T A technique for parsing ambiguous grammars
%B GI-4. Jahrestagung
%S Lecture Notes in Computer Science #26
%E D. Siefkes
%I Springer-Verlag
%C New York
%D 1975
%P 233-246
%K done
.La
.Xr "ambiguity"
Three recursive-descent
.Xr "recursive descent"
parsing techniques are described: no backtrack,
.Xr "backtracking"
partial backtrack and full backtrack.
%A B. Sheil
%T Observations on context-free parsing
%J Statistical Methods in Linguistics
%D 1976
%P 71-109
%K done
.La
The author proves that any CF backtracking
.Xr "backtracking"
parser will have polynomial
.Xr "polynomial time dependency"
time requirements if provided with a @i[well-formed substring table]
.Xr "well-formed substring table"
(WFST), which holds the well-formed substrings recognized so far and which
is consulted before each attempt to recognize a substring.
The time requirements
.Xr "time requirements"
of the parser are $ O ( n sup { c + 1 } ) $, where $c$ is the maximum number
of non-terminals in any right-hand side. A
.Df "2-form grammar"
is a CF grammar such that no production rule in the grammar has more than two
non-terminals on the right-hand side; nearly all practical grammars are
already 2-form. 2-form grammars, of which Chomsky Normal Form
.Xr "Chomsky Normal Form"
grammars are a subset, can be parsed in $ O ( n sup 3 ) $.
.Xr "cubic time dependency"
An algorithm for a dividing
top-down parser with a WFST is supplied. Required reading for anybody who
wants to write or use a general CF grammar. Many practical hints and opinions
(controversial and otherwise) are given.
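Sheil's WFST is, in modern terms, memoization of the recognition function on (non-terminal, position) pairs; a minimal sketch under that reading, with an illustrative grammar (left-recursive rules would need extra care):

```python
# Sketch of a backtracking recognizer with a well-formed substring table
# (WFST): results for (symbol, position) pairs are remembered, so no
# substring is ever re-analysed; time is polynomial, O(n^(c+1)).
from functools import lru_cache

GRAMMAR = {                 # S -> a S b | a b   (the language a^n b^n)
    "S": [["a", "S", "b"], ["a", "b"]],
}

def recognize(text, start="S"):
    @lru_cache(maxsize=None)          # this cache is the WFST
    def ends(sym, i):
        """All positions j such that sym derives text[i:j]."""
        if sym not in GRAMMAR:        # terminal: match one input symbol
            return frozenset({i + 1} if text[i:i + 1] == sym else ())
        result = set()
        for rhs in GRAMMAR[sym]:      # try every alternative...
            starts = {i}
            for s in rhs:             # ...matching its members in sequence
                starts = {j for k in starts for j in ends(s, k)}
            result |= starts
        return frozenset(result)
    return len(text) in ends(start, 0)

print(recognize("aaabbb"))   # True
print(recognize("aaabb"))    # False
```

Without the cache this is exactly the exponential backtracking parser; the table alone accounts for the polynomial bound.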
%A Susan L. Graham
%A Michael A. Harrison
%T Parsing of general context-free languages
%B Advances in Computing, Vol. 14
%I Academic Press
%C New York
%D 1976
%P 77-185
%K done
.La
This 109-page article describes three algorithms in a more or less unified
manner: CYK,
.Xr "CYK parser"
Earley's
.Xr "Earley parser"
and Valiant's.
.Xr "Valiant parser"
The main body of the paper is concerned with bounds for time and space
requirements.
.Xr "space requirements"
.Xr "time requirements"
Sharper bounds than usual are derived for special grammars, for instance,
for linear grammars.
.Xr "linear grammar"
%A Jaroslav Kr\('al
%T A top-down no backtracking parsing of general context-free languages
%B Mathematical Foundations of Computer Science
%S Lecture Notes in Computer Science #53
%E J. Gruska
%I Springer-Verlag
%C Berlin
%D 1977
%P 333-341
%K done
.La
.Xr "backtracking"
The states of a top-down breadth-first general CF parser are combined
whenever possible, resulting in an Earley-like parser
.Xr "Earley parser"
without the bottom-up
component.
%A G.K. Manacher
%T An improved version of the Cocke-Younger-Kasami algorithm
%J Comput. Lang. (Elmsford, NY)
%V 3
%D 1978
%P 127-133
%K done
.La
This paper discusses some modifications to the CYK algorithm
.Xr "CYK parser"
that make it
more efficient. First, the \(l"length induction\(r" iteration of CYK is
replaced by an iteration that combines sets of non-terminals that derive
strings of length $ j - 1 $ with sets of non-terminals that derive strings
of length $ k <= j - 1 $.
Then, the recognition table
.Xr "recognition table"
of CYK
.Xr "CYK parser"
is replaced by three tables of lists,
where each table has a list for each non-terminal/number pair.
The first table maps a non-terminal/length pair to a list of positions,
indicating where substrings of this length start that are derived by this
non-terminal.
The second table maps a non-terminal/position pair to a list of lengths,
indicating the lengths of the substrings starting at this position that are
derived by this non-terminal.
The third table maps a non-terminal/position pair to a list of lengths,
indicating the lengths of the substrings ending at this position that are
derived by this non-terminal.
With these modifications a time bound $ O ( s ( n ) ) $ is
established for unambiguous grammars, where $ s ( n ) $ is the number of
triplets $ ( A , i , j ) $ for which the non-terminal $A$ derives the
substring starting at position $i$, with length $j$.
This is at worst $ O ( n sup 2 ) $.
.Xr "quadratic time dependency"
%A W.L. Ruzzo
%T On the complexity of general context-free language parsing and recognition
%B Automata, Languages and Programming
%S Lecture Notes in Computer Science #71
%E Hermann A. Maurer
%I Springer-Verlag
%C Berlin
%D 1979
%P 489-497
%K done
.La
This is an extended abstract, summarizing some time requirement
.Xr "time requirements"
results:
it is shown that parsing strings of length $n$ is only $ O ( log ~ n ) $
harder than just recognizing them. Also, the time to multiply
$ { sqrt n } * { sqrt n } $ Boolean matrices is a lower bound on the time
needed to recognize all prefixes of a string, and this, in turn, is a lower
bound on the time needed to generate a convenient representation of all
parses of a string (basically the CYK recognition table,
.Xr "CYK parser"
.Xr "recognition table"
but reduced so that a non-terminal only is present in the recognition table
.Xr "recognition table"
if it can be used to derive the sentence).
%A S.L. Graham
%A M.A. Harrison
%A W.L. Ruzzo
%T An improved context-free recognizer
%J ACM Trans. Prog. Lang. Syst.
%V 2
%N 3
%D July 1980
%P 415-462
%K done
.La
The well-formed substring table
.Xr "well-formed substring table"
of the CYK parser
.Xr "CYK parser"
is filled with dotted items
as in an LR parser rather than with the usual non-terminals. This allows the
number of objects in each table entry to be reduced considerably. Special
operators are defined to handle $epsilon$- and unit rules.
.Lp
The authors do not employ any look-ahead in their parser; they claim that
constructing the recognition triangle
.Xr "recognition table"
is pretty fast already and that probably
more time will be spent in enumerating and analysing the resulting parse
trees. They speed up the latter process by removing all useless entries before
starting to generate parse trees. To this end, a top-down sweep through the
triangle is performed, similar to the scheme to find all parse trees, which
just marks all reachable entries without following up any of them twice. The
non-marked entries are then removed (p. 443).
.Lp
Much attention is paid to efficient implementation, using ingenious data
structures.
%A A. Bossi
%A N. Cocco
%A L. Colussi
%T A divide-and-conquer approach to general context-free parsing
%J Inform. Process. Lett.
%V 16
%N 4
%D May 1983
%P 203-208
%K done
.La
The proposed parsing method yields for a string $T$ two sets: a set of
partial parse trees that may be incomplete at their left edge (which then
coincides with the left end of $T$), called $L$, and a similar right-edge set
called $R$. To parse a string, it is cut in two pieces, each is parsed and
the $R$ set of the left-hand piece is combined with the $L$ set of the
right-hand piece.
%A Masaru Tomita
%T Efficient parsing for natural language
%I Kluwer Academic Publishers
%C Boston
%D 1986
%P 201
%K done
.La
Tomita
.Xr "Tomita parser"
describes an efficient parsing algorithm to be used in a
\(l"natural-language setting\(r": input strings of some tens of words and
considerable but not pathological ambiguity.
.Xr "ambiguity"
The algorithm is essentially LR,
starting parallel parses when an ambiguity is found in the LR-table.
.Xr "parse table"
Full examples are given of handling ambiguities, lexical elements with
multiple meanings and unknown lexical elements.
.Lp
The algorithm is compared extensively to Earley's algorithm
.Xr "Earley parser"
by measurement and
it is found to be consistently five to ten times faster than the latter, in
the domain for which it is intended. Earley's algorithm is better in
pathological cases; Tomita's fails on unbounded ambiguity.
.Xr "ambiguity"
No time bounds are
given explicitly, but graphs show a behaviour better than $ O ( n sup 3 ) $.
.Xr "cubic time dependency"
Bouckaert's algorithm (Bouckaert, Pirotte and Snelling [CF 1975])
.Au Bouckaert CF 1975
.Au Pirotte CF 1975
.Au Snelling CF 1975
is shown to be between Earley's
.Xr "Earley parser"
and Tomita's in speed.
.Lp
MacLisp programs of the various algorithms are given and the application in
the Nishida and Doshita Machine Translation System is described.
%A Eiichi Tanaka
%A Mitsuru Ikeda
%A Kimio Ezure
%T Direct parsing
%J Patt. Recog.
%V 19
%N 4
%D 1986
%P 315-323
%K done
.La
Variants of Unger's
.Xr "Unger parser"
and Earley's parser
.Xr "Earley parser"
are compared in a chromosome recognition
.Xr "chromosome recognition"
situation.
The possibility of stopping the Unger parser after the first parsing has been
found is exploited.
%A Jacques Cohen
%A Timothy J. Hickey
%T Parsing and compiling using Prolog
%J ACM Trans. Prog. Lang. Syst.
%V 9
%N 2
%D April 1987
%P 125-164
%K done
.Xr "Prolog"
.La
Several methods are given to convert grammar rules into Prolog clauses.
.Xr "Prolog clause"
In the bottom-up method, a rule $ E -> E + T $ corresponds to a clause
$ reduce([n(t),t(+),n(e)|X], [n(e)|X]) $ where the parameters represent the
stack before and after the reduction. In the top-down method, a rule
$ T prime -> * F T prime $ corresponds to a clause
$ rule(n(tprime), [t(*),n(f),n(tprime)]). $ A recursive descent
.Xr "recursive descent"
parser is obtained by representing a rule $ S -> a S b $ by the clause
$ s(ASB) $ $:-$ $ append(A,SB,ASB), append(S,B,SB), a(A), s(S), b(B). $ which
attempts to cut the input list $ASB$ into three pieces $A$, $S$ and $B$, which
can each be recognized as an $a$, an $s$ and a $b$, respectively. A fourth
type of parser results if ranges in the input list are used as parameters:
$ s(X1,X4) $ $:-$ $ link(X1,a,X2), s(X2,X3), link(X3,b,X4) $ in which
$ link(P,x,Q) $ describes that the input contains the token $x$ between
positions $P$ and $Q$. For each of these methods, ways are given to limit
non-determinism and backtracking,
.Xr "backtracking"
resulting among others in LL(1)
.Xr "LL(1)"
parsers.
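The fourth, position-based method translates directly into any language with
backtracking; the following Python sketch (an illustration, not from the
paper) mimics the $link$ relation and a non-terminal predicate with
generators, for the hypothetical grammar $S -> a S b$ | $epsilon$:

```python
# Illustration (not from the paper): the position-parameter method,
# for the hypothetical grammar S -> a S b | epsilon.
# link(tokens, i, x) plays the role of the Prolog link(P, x, Q) relation.
def link(tokens, i, x):
    """Yield the position after token x if x occurs at position i."""
    if i < len(tokens) and tokens[i] == x:
        yield i + 1

def s(tokens, i):
    """Yield every position j such that tokens[i:j] is an S."""
    yield i                                    # S -> epsilon
    for j in link(tokens, i, 'a'):             # S -> a S b
        for k in s(tokens, j):
            yield from link(tokens, k, 'b')

# The whole input is an S iff some derivation spans positions 0..length.
print(len('aabb') in set(s(list('aabb'), 0)))  # prints True
```

As in the Prolog version, the generators enumerate all ways of connecting
the two positions, so an ambiguous input yields the same end position more
than once.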
.Lp
By supplying additional parameters to clauses, context conditions can be
constructed and carried around, much as in a VW grammar (although this term is
not used).
It should be noted that the resulting Prolog
.Xr "Prolog"
programs are actually
not parsers at all: they are just logic systems that connect input strings to
parsings. Consequently they can be driven both ways: supply a string and it
will produce the parsing; supply a parsing and it will produce the string;
supply nothing and it will produce all strings in the language, together
with their parsings.
.Lp
As a separate topic, it is shown that Prolog
.Xr "Prolog"
is an effective language to do
grammar manipulation in: calculation of FIRST
.Xr "FIRST set"
and FOLLOW sets,
.Xr "FOLLOW set"
etc.
As an equally unrelated topic, examples of code generation in Prolog
.Xr "Prolog"
are shown.
%A Masaru Tomita
%T An efficient augmented-context-free parsing algorithm
%J Am. J. Comput. Linguist.
%V 13
%N 1-2
%D Jan-June 1987
%P 31-46
%K done
.La
Tomita's parser
.Xr "Tomita parser"
[CF 1986]
.Au Tomita CF 1986
is extended with Boolean functions for the non-terminals that decide if a
proposed reduce is applicable given the context. A method for deriving these
functions in Lisp from more abstract specifications is given.
.EQ
delim $$
.EN
.P1 "LL parsing"
%A R. Kurki-Suonio
%T On top-to-bottom recognition and left recursion
%J Commun. ACM
%V 9
%N 7
%D July 1966
%P 527-528
%K done
.La
Gives a good account of Greibach's algorithm for the removal of
left-recursion
.Xr "left-recursion"
from a grammar. The resulting distortion of the parsing
process is countered by leaving ($epsilon$-producing) markers in the grammar
at the original ends of the right-hand sides in a left-recursive rule. This
2-page paper also gives an algorithm for removing $epsilon$-rules.
.Xr "$epsilon$-rule"
Again, these leave markers behind, which can interfere with the markers from
a possibly subsequent removal of left-recursion.
.Xr "left-recursion"
Rules for solving this interference are given.
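The underlying transformation, stripped of the markers, can be sketched as
follows; the Python below (an illustration covering only the simplest case,
immediate left-recursion, without the paper's $epsilon$-producing markers)
rewrites a single rule:

```python
# Illustration: removing immediate left-recursion from a single rule.
# A -> A alpha1 | ... | beta1 | ...    becomes
# A -> beta1 A' | ...    A' -> alpha1 A' | ... | epsilon
def remove_left_recursion(nt, alternatives):
    alphas = [a[1:] for a in alternatives if a and a[0] == nt]
    betas = [a for a in alternatives if not a or a[0] != nt]
    if not alphas:
        return {nt: alternatives}          # nothing to do
    new = nt + "'"
    return {
        nt: [b + [new] for b in betas],
        new: [a + [new] for a in alphas] + [[]],   # [] is an epsilon-rule
    }

print(remove_left_recursion('E', [['E', '+', 'T'], ['T']]))
```

For $E -> E + T$ | $T$ this yields $E -> T E prime$ and
$E prime -> + T E prime$ | $epsilon$; the paper's markers would sit at the
original ends of the right-hand sides to undo the distortion of the parse
tree.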
%A K. \*(vCulik\ II
%T Contribution to deterministic top-down analysis of context-free languages
%J Kybernetica
%V 5
%N 4
%P 422-431
%D 1968
%K done
.La
This paper introduces
.Df LL(f) "" "LL($f$)"
grammars where $f$ is a function mapping strings of terminals to an
arbitrary range, always uniquely determining a
right-hand side. $f$ is called a
@i[distinctive function].
%A P.M. Lewis\ II
%A R.E. Stearns
%T Syntax-directed transduction
%J J. ACM
%V 15
%N 3
%D 1968
%P 465-488
%K done
.La
.Xr "transduction grammar"
Although this article is about transduction, it is often given as
a reference for LL($k$),
.Xr "LL($k$)"
because it is one of the first articles discussing
the LL($k$)
.Xr "LL($k$)"
property, and it has an appendix on the recognition of LL($k$)
languages.
%A D. Wood
%T The theory of left factored languages, Part I
%J Computer J.
%V 12
%N 4
%D 1969
%P 349-356
%K done
.La
A description of a variant of LL(1)
.Xr "LL(1)"
grammars and parsing.
%A R. Kurki-Suonio
%T Notes on top-down languages
%J BIT
%V 9
%N
%D 1969
%P 225-238
%K done
.La
Gives several variants of the LL($k$)
.Xr "LL($k$)"
condition. Also demonstrates the
existence of an LL($k$) language which is not LL($k - 1$).
%A D. Wood
%T The theory of left factored languages, Part II
%J Computer J.
%V 13
%N 1
%D 1970
%P 55-62
%K done
.La
More results about LL(1)
.Xr "LL(1)"
and LL($k$)
.Xr "LL($k$)"
grammars, including a recursive-descent
.Xr "recursive descent"
parser in pseudo-Algol 60.
.Xr "Algol 60"
%A D.J. Rosenkrantz
%A R.E. Stearns
%T Properties of deterministic top-down grammars
%J Inform. Control
%V 17
%N
%D 1970
%P 226-256
%K done
.La
Many formal properties of LL($k$)
.Xr "LL($k$)"
grammars are derived and tests for LL($k$) and strong-LL($k$)
.Xr "strong-LL($k$)"
are given.
%A Donald E. Knuth
%T Top-down syntax analysis
%J Acta Inform.
%V 1
%N
%D 1971
%P 79-110
%K done
.La
A
@i[Parsing Machine]
(PM) is defined, which is effectively a set of mutually
recursive Boolean functions which absorb input if they succeed and absorb
nothing if they fail. Properties of the languages accepted by PMs are
examined. This leads to CF grammars, dependency graphs, the null string
problem, back-up, LL($k$),
.Xr "LL($k$)"
follow-function, LL(1),
.Xr "LL(1)"
s-languages
and a comparison of top-down versus bottom-up parsing. The author is one of
the few scientists who provide insight into their thinking process.
%A Paul W. Abrahams
%T A syntax-directed parser for recalcitrant grammars
%J Intern. J. Comput. Math.
%V A3
%N
%D 1972
%P 105-115
%K done
.La
LL(1)
.Xr "LL(1)"
parsing with conflict resolvers,
.Xr "conflict resolver"
called
@i[oracles].
%A M. Griffiths
%T LL(1) grammars and analyzers
%B Compiler Construction: an advanced course
%S Lecture Notes in Computer Science #21
%E F.L. Bauer & J. Eickel
%I Springer-Verlag
%C New York
%D 1974
%P 57-84
%K done
.La
A discussion of the LL(1)
.Xr "LL(1)"
property, including a decision algorithm and the
production of an analyser in the form of executable text. These lecture notes
also discuss some grammar transformations,
.Xr "transformations on grammars"
including elimination of left-recursion,
.Xr "eliminating left recursion"
factorization, and substitution.
Semantic insertions (or hooks for semantic actions)
.Xr "semantic action"
are also given some attention.
%A T. Komor
%T A note on left-factored languages
%J Computer J.
%V 17
%N 3
%D 1974
%P 242-244
%K done
.La
Points out an error in a paper by Wood on left-factored languages [LL 1970],
.Au Wood LL 1970
and suggests an extension to Foster's SID [Transform 1968]
.Xr "transformations on grammars"
.Au Foster Transform 1968
involving $epsilon$-rules.
.Xr "$epsilon$-rule"
%A S. Jarzabek
%A T. Krawczyk
%T LL-regular grammars
%J Inform. Process. Lett.
%V 4
%N 2
%D 1975
%P 31-37
%K done
.La
Introduces LL-regular
.Xr "LL-regular"
(LLR)
.Xs "LLR" "LL-regular"
grammars: for every rule
$A -> alpha sub 1 | ... | alpha sub n$,
a partition ($R sub 1 , ... , R sub n$) of disjoint regular sets must be
given such that the rest of the input sentence is a member of exactly one of
these sets. A parser can then be constructed by creating finite-state
.Xr "finite-state automaton"
automata for these sets, and letting these finite state automata
.Xr "finite-state automaton"
determine the
next prediction.
%A A. Nijholt
%T On the parsing of LL-regular grammars
%B Mathematical Foundations of Computer Science
%S Lecture Notes in Computer Science #45
%E A. Mazurkiewicz
%I Springer-Verlag
%C Berlin
%D 1976
%P 446-452
%K done
.La
Derives a parsing algorithm for LL-regular
.Xr "LL-regular"
grammars with a regular pre-scan from
right to left that leaves markers, and a subsequent scan which consists of
an LL(1)-like
.Xr "LL(1)"
parser.
%A D. Wood
%T A bibliography of top-down parsing
%J ACM SIGPLAN Notices
%V 13
%N 2
%D Feb 1978
%P 71-76
%K done
.La
Contains some 90 literature references up to 1978 on deterministic top-down
parsing and related issues.
%A J. Lewi
%A K. de\ Vlaminck
%A J. Huens
%A M. Huybrechts
%T The ELL(1) parser generator and the error-recovery mechanism
%J Acta Inform.
%V 10
%P 209-228
%D 1978
%K done
.La
See same paper [ErrHandl 1978].
%A V.W. Setzer
%T Non-recursive top-down syntax analysis
%J Softw. Pract. Exper.
%V 9
%N 1
%D 1979
%P 237-245
%K done
.La
Compares recursive and non-recursive implementations of table-driven
.Xr "table-driven"
top-down parsers. The introduction of actions is facilitated by implementing
the driver and the tables as a loop over a case statement on the state, each
branch of which contains a case statement on the input token.
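The structure of such a non-recursive table-driven top-down parser can be
sketched as follows (an illustration, not Setzer's code, for a toy LL(1)
grammar):

```python
# Illustration (not Setzer's code): a non-recursive predictive parser
# driven by a table, for the toy LL(1) grammar
#   S -> a S b | c
TABLE = {
    ('S', 'a'): ['a', 'S', 'b'],   # prediction for S on look-ahead 'a'
    ('S', 'c'): ['c'],
}

def parse(tokens):
    stack = ['S']                   # prediction stack, top at the end
    pos = 0
    while stack:
        top = stack.pop()
        if top.islower():           # terminal: must match the input
            if pos < len(tokens) and tokens[pos] == top:
                pos += 1
            else:
                return False
        else:                       # non-terminal: consult the table
            look = tokens[pos] if pos < len(tokens) else None
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False
            stack.extend(reversed(rhs))
    return pos == len(tokens)

print(parse(list('aacbb')))   # prints True
```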
%A Stephan Heilbrunner
%T On the definition of ELR($k$) and ELL($k$) grammars
%J Acta Inform.
%V 11
%N
%D 1979
%P 169-176
%K done
.La
Comparison and analysis of various definitions of extended LL($k$)
.Xr "extended LL($k$)"
and extended LR($k$),
.Xr "extended LR($k$)"
based on the transformations involved.
%A D.R. Milton
%A L.W. Kirchhoff
%A B.R. Rowland
%T An ALL(1) compiler generator
%J ACM SIGPLAN Notices
%V 14
%N 8
%D Aug 1979
%P 152-157
%K done
.La
Presents an LL(1)
.Xr "LL(1)"
parser generator
.Xr "parser generator"
and attribute
.Xr "attribute"
evaluator which allows LL(1)
conflicts to be solved by examining attribute values;
the generated parsers use the error correction
.Xr "error correction"
algorithm of Fischer, Milton
and Quiring [ErrHandl 1980].
.Au Fischer ErrHandl 1980
.Au Milton ErrHandl 1980
.Au Quiring ErrHandl 1980
%A D.A. Poplawski
%T On LL-regular grammars
%J J. Comput. Syst. Sci.
%V 18
%N
%D 1979
%P 218-227
%K done
.La
Presents proof that, given a regular partition, it is decidable whether a
grammar is LL-regular
.Xr "LL-regular"
with respect to this partition; it is
undecidable whether or not such a regular partition exists.
The paper then discusses a two-pass parser; the first pass works from right
to left, marking each terminal with an indication of the partition that the
rest of the sentence belongs to. The second pass then uses these indications
for its predictions.
%A V.N. Glushkova
%T Lexical analysis of LL($k$) languages
%J Program. Comput. Softw.
%V 5
%N
%D 1979
%P 166-172
%K done
.La
Examines the reduction of LL($k$)
.Xr "LL($k$)"
grammars to simple-LL(1)
.Xr "simple LL(1)"
grammars by combining terminal symbols into new terminal symbols.
%A J. Cohen
%A R. Sitver
%A D. Auty
%T Evaluating and improving recursive descent parsers
%J IEEE Trans. Softw. Eng.
%V SE-5
%N 5
%D Sept 1979
%P 472-480
%K done
.La
Derives formulas which express the execution time of systematically
generated recursive descent parsers,
.Xr "recursive descent"
and uses these formulas to estimate
the gain of various optimizations, such as the elimination of some
routine calls and merging of common code.
%A S. Sippu
%A E. Soisalon-Soininen
%T On constructing LL($k$) parsers
%B Automata, Languages and Programming
%S Lecture Notes in Computer Science #71
%E H.A. Maurer
%I Springer-Verlag
%C Berlin
%D 1979
%P 585-595
%K done
.La
Presents a method for constructing canonical LL($k$)
.Xr "LL($k$)"
parsers that can be
regarded as the dual to the LR($k$)
.Xr "LR($k$)"
technique of items and viable prefixes.
In the LL($k$)
.Xr "LL($k$)"
method we have LL($k$) items
.Xr "LL($k$) item"
and viable suffixes.
.Xr "viable suffix"
As in the LR case, the LL($k$) method also has LA($p$)LL($k$) and SLL($k$)
.Xr "SLL(1) (Simple LL(1))"
variants; the SLL($k$) variant coincides with the strong-LL($k$)
.Xr "strong-LL($k$)"
grammars. Note that, although the S of SLL stands for Simple, this is not
the same Simple LL as the simple LL discussed in chapter 8.
%A A. Nijholt
%T LL-regular grammars
%J Intern. J. Comput. Math.
%V A8
%N
%D 1980
%P 303-318
%K done
.La
This paper discusses strong-LL-regular grammars, which are a subset of the
LL-regular
.Xr "LL-regular"
grammars, exactly as the strong-LL($k$)
.Xr "strong-LL($k$)"
grammars are a subset
of the LL($k$)
.Xr "LL($k$)"
grammars, and derives some properties.
%A Seppo Sippu
%A Eljas Soisalon-Soininen
%T On LL($k$) parsing
%J Inform. Control
%V 53
%N 3
%D June 1982
%P 141-164
%K done
.Xr "LL($k$)"
.La
Theoretical background to Sippu and Soisalon-Soininen [LL 1979].
.Au Sippu LL 1979
.Au Soisalon-Soininen LL 1979
%A K. John Gough
%T A new method of generating LL(1) look-ahead sets
%J ACM SIGPLAN Notices
%V 20
%N 6
%D June 1985
%P 16-19
%K done
.La
Presents an efficient method for computing the FIRST
.Xr "FIRST set"
and FOLLOW sets,
.Xr "FOLLOW set"
using \(l"begun-by\(r", \(l"precedes\(r", and
\(l"ends\(r" relations.
%A Thomas J. Sager
%T A technique for creating small fast compiler front ends
%J ACM SIGPLAN Notices
%V 20
%N 10
%D Oct 1985
%P 87-94
%K done
.La
Presents a predictive parser that has its tables compacted
.Xr "table compression"
.Xs "table compaction" "table compression"
through the use of a minimal perfect hash function, thus making them very
small.
An example is given for the Pascal
.Xr "Pascal"
language.
%A Barry Dwyer
%T Improving Gough's LL(1) look-ahead generator
%J ACM SIGPLAN Notices
%V 20
%N 11
%D Nov 1985
%P 27-29
%K done
.La
Refer to Gough [LL 1985].
.Au Gough LL 1985
Improves on Gough's algorithm by not computing those FIRST
.Xr "FIRST set"
and FOLLOW sets
.Xr "FOLLOW set"
that are not needed for the LL(1)
.Xr "LL(1)"
parser generation.
%A David R. Hanson
%T Compact recursive-descent parsing of expressions
%J Softw. Pract. Exper.
%V 15
%N 12
%D Dec 1985
%P 1205-1212
%K done
.La
Discusses recursive descent
.Xr "recursive descent"
parsing of expressions by using a precedence table
.Xr "precedence table"
for the operators instead of a parsing routine for each precedence
level.
There is for instance only one routine for expressions involving binary
operators; the precedence of the expression to be parsed is a parameter.
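This scheme, nowadays often called precedence climbing, can be sketched as
follows (an illustration, not Hanson's code; the operator table is
hypothetical):

```python
# Illustration (not Hanson's code): one routine parses all binary
# operators; the minimum precedence it may consume is a parameter.
PREC = {'+': 1, '-': 1, '*': 2, '/': 2}    # hypothetical operator table

def expr(tokens, pos=0, min_prec=1):
    """Return (value, next position), evaluating as it parses."""
    value = int(tokens[pos])               # primary: a number
    pos += 1
    while pos < len(tokens) and PREC.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        # PREC[op] + 1 makes operators of equal precedence left-associative
        right, pos = expr(tokens, pos + 1, PREC[op] + 1)
        if op == '+': value += right
        elif op == '-': value -= right
        elif op == '*': value *= right
        else: value //= right
    return value, pos

print(expr(['2', '+', '3', '*', '4'])[0])   # prints 14
```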
%A Reinhold Heckmann
%T An efficient ELL(1)-parser generator
%J Acta Inform.
%V 23
%N
%D 1986
%P 127-148
%K done
.La
The problem of parsing with an ELL(1)
.Xr "extended LL(1)"
grammar is reduced to finding various
FIRST
.Xr "FIRST set"
and FOLLOW sets.
.Xr "FOLLOW set"
Theorems about these sets are derived and very
efficient algorithms for their calculation are supplied.
%A Dick Grune
%A Ceriel J.H. Jacobs
%T A programmer-friendly LL(1) parser generator
%J Softw. Pract. Exper.
%V 18
%N 1
%D Jan 1988
%P 29-38
%K done
.La
Presents a practical ELL(1)
.Xr "extended LL(1)"
parser generator,
.Xr "parser generator"
called
@i[LLgen],
that generates fast error correcting recursive descent parsers.
.Xr "recursive descent"
In addition to the error correction,
.Xr "error correction"
@i[LLgen] features static as well as dynamic conflict resolvers
.Xr "conflict resolver"
and a separate compilation facility. The grammar can be viewed as a
program, allowing for a natural positioning of semantic actions.
.Xr "semantic action"
%A Keiichi Yoshida
%A Yoshiko Takeuchi
%T Some properties of an algorithm for constructing LL(1) parsing tables using production indices
%J J. Inform. Process.
%V 11
%N 4
%D 1988
%P 258-262
%K done
.La
Presents an LL(1)
.Xr "LL(1)"
parse table
.Xr "parse table"
algorithm that, rather than first computing
FIRST
.Xr "FIRST set"
and FOLLOW sets,
.Xr "FOLLOW set"
computes a so-called FIRST-table and FOLLOW-table,
which are indexed by a (non-terminal, symbol) pair, and deliver a grammar
rule number.
%A H. Dobler
%A K. Pirklbauer
%T Coco-2, \- A new compiler compiler
%J ACM SIGPLAN Notices
%V 25
%N 5
%D May 1990
%P 82-90
%K done
.La
The authors present an integrated system consisting of a lexical phase using
a heavily reduced FS automaton,
.Xr "finite-state automaton"
and a syntactic phase which uses a table-driven
.Xr "table-driven"
LL(1)
.Xr "LL(1)"
parser. Semantic actions
.Xr "semantic action"
are interspersed in the syntactic phase.
.EQ
delim $$
.EN
.P1 "LR parsing"
%A D.E. Knuth
%T On the translation of languages from left to right
%J Inform. Control
%V 8
%D 1965
%P 607-639
%K done
.La
This is the original paper on LR($k$).
.Xr "LR($k$)"
It defines the notion as an
abstract property of a grammar and gives two tests for LR($k$).
.Xr "LR($k$)"
The first
works by constructing for the grammar a regular grammar which generates all
possible already reduced parts (= stacks) plus their look-aheads; if this
grammar has the property that none of its words is a prefix to another of its
words, the original grammar was LR($k$).
.Xr "LR($k$)"
The second consists of implicitly
constructing all possible item sets (= states) and testing for conflicts.
Since none of this is intended to yield a reasonable parsing algorithm,
notation and terminology differ from those in later papers on the subject.
Several theorems concerning LR($k$) grammars are given and proved.
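The second test's item sets are, in later terminology, the familiar LR(0)
closure and GOTO construction; a minimal Python sketch (an illustration in
that later terminology, not Knuth's notation, for a toy grammar):

```python
# Illustration (modern terminology, not Knuth's notation): LR(0) item
# sets for the toy grammar S -> a S b | c.
# An item is (lhs, rhs, dot); rhs is a tuple of symbols.
GRAMMAR = {'S': [('a', 'S', 'b'), ('c',)]}

def closure(items):
    """Add an initial item for every non-terminal right after a dot."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    if (rhs[dot], prod, 0) not in items:
                        items.add((rhs[dot], prod, 0))
                        changed = True
    return frozenset(items)

def goto(items, sym):
    """The state reached by shifting sym: move the dot over sym."""
    return closure({(l, r, d + 1) for (l, r, d) in items
                    if d < len(r) and r[d] == sym})

state0 = closure({('S', rhs, 0) for rhs in GRAMMAR['S']})
print(len(state0), len(goto(state0, 'a')))   # prints 2 3
```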
%A A.J. Korenjak
%T A practical method for constructing LR($k$) processors
%J Commun. ACM
%V 12
%N 11
%D Nov 1969
%P 613-623
%K done
.Xr "LR($k$)"
.La
The huge LR(1)
.Xr "LR(1)"
parsing table is partitioned as follows. A non-terminal $Z$
is chosen judiciously from the grammar, and two grammars are constructed,
$ G sub 0 $, in which $Z$ is considered to be a terminal symbol, and
$ G sub 1 $, which is the grammar for $Z$ (i.e. which has $Z$ as the start
symbol). If both grammars are LR(1) and moreover a master LR(1) parser can
be constructed that controls the switching back and forth between
$ G sub 0 $ and $ G sub 1 $, the parser construction succeeds (and the
original grammar was LR(1) too). The three resulting tables together are much
smaller than the LR(1) table for the original grammar.
It is also possible to choose a set of non-terminals $ Z sub 1 ... Z sub n $
and apply a variant of the above technique.
%A David Pager
%T A solution to an open problem by Knuth
%J Inform. Control
%V 17
%N
%D 1970
%P 462-473
%K done
.La
Highly mathematical paper concerning the properties of certain partitions of
the states of an LR(1)
.Xr "LR(1)"
parser with a view to reducing the size of the
LR automaton.
%A H. Langmaack
%T Application of regular canonical systems to grammars translatable from left to right
%J Acta Inform.
%V 1
%N
%D 1971
%P 111-114
%K done
.La
Different proof of the decidability of LR($k$).
.Xr "LR($k$)"
%A Franklin L. DeRemer
%T Simple LR($k$) grammars
%J Commun. ACM
%V 14
%N 7
%D July 1971
%P 453-460
%K done
.La
SLR($k$)
.Xr "SLR($k$)"
explained by its inventor. Several suggestions are made on how to
modify the method; use a possibly different $k$ for each state; use possibly
different lengths for each look-ahead string. The relation to Korenjak's
approach [LR 1969]
.Au Korenjak LR 1969
is also discussed.
%A A.V. Aho
%A J.D. Ullman
%T Optimization of LR($k$) parsers
%J J. Comput. Syst. Sci.
%V 6
%N
%D 1972
%P 573-602
%K done
.La
An algorithm is given to determine which entries in an LR($k$)
.Xr "LR($k$)"
table can never be accessed; the values of these entries are immaterial
(so-called
.Df "don't-care entries" ) "don't-care entry"
and can be merged with other values.
A second algorithm is given to determine which error entries could be merged
with which reduce entry, with the only result that error detection
.Xr "error detection"
is postponed.
Both algorithms and a merging technique are used to reduce table size. It is
proved that using these algorithms, one can produce SLR(1)
.Xr "SLR(1)"
and LALR(1)
.Xr "LALR(1)"
tables.
It is also proved that SLR(1)
.Xr "SLR(1)"
is identical to Korenjak's method [LR 1969]
.Au Korenjak LR 1969
with all non-terminals selected.
See also Soisalon-Soininen [LR 1982].
.Au Soisalon-Soininen LR 1982
%A David S. Wise
%T Generalized overlap resolvable grammars and their parsers
%J J. Comput. Syst. Sci.
%V 6
%N
%D Dec 1972
%P 538-572
%K done
.La
See same paper [Precedence 1972].
%A T. Anderson
%A J. Eve
%A J.J. Horning
%T Efficient LR(1) parsers
%J Acta Inform.
%V 2
%N
%D 1973
%P 12-39
%K done
.La
Coherent explanation of SLR(1),
.Xr "SLR(1)"
LALR(1),
.Xr "LALR(1)"
elimination of unit rules
.Xr "unit rule"
and table compression,
.Xr "table compression"
with good advice.
%A Karel \*(vCulik\ II
%A Rina Cohen
%T LR-regular grammars \- an extension of LR($k$) grammars
%J J. Comput. Syst. Sci.
%V 7
%N
%D 1973
%P 66-96
%K done
.La
The input is scanned from right to left by a FS automaton
.Xr "finite-state automaton"
which records its
state at each position. Next this sequence of states is parsed from left
to right using an LR(0)
.Xr "LR(0)"
parser. If such a FS automaton
.Xr "finite-state automaton"
and LR(0) parser
exist, the grammar is LR-regular.
.Xr "LR-regular"
The authors conjecture, however, that the construction of this automaton
and parser is unsolvable in the general case. Examples are given of
cases in which the problem can be solved.
%A A.V. Aho
%A J.D. Ullman
%T A technique for speeding up LR($k$) parsers
%J SIAM J. Computing
%V 2
%N 2
%D June 1973
%P 106-127
%K done
.Xr "LR($k$)"
.La
Describes two detailed techniques to eliminate unit rules,
.Xr "unit rule"
one by recognizing particular stack configurations and one by merging shifts
on non-terminals (GOTO's).
.Xr "GOTO-action"
%A Shoji Sekimoto
%A Kuniaki Mukai
%A Masaru Sudo
%T A method of minimizing LR($k$) parsers
%J Systems, Computers and Control
%V 4
%N 5
%D 1973
%P 73-80
%K done
.Xr "LR($k$)"
.La
The states of an LR(1)
.Xr "LR(1)"
parser are grouped into classes by one of several
equivalence relations. The parser records only in which class it is, not in
which state. When a reduction is called for, additional computation is
required to determine which reduction. The tables for the class transitions
are much smaller than those for the state transitions.
.Xr "state transition"
%A Jaroslav Kr\('al
%A Ji\*(vr\('i Demner
%T A note on the number of states of the DeRemer's recognizer
%J Inform. Process. Lett.
%V 2
%N
%D 1973
%P 22-23
%K done
.La
Gives a formula for the number of states of an SLR(1)
.Xr "SLR(1)"
parser for an LL(1)
.Xr "LL(1)"
grammar.
%A A.V. Aho
%A S.C. Johnson
%T LR parsing
%J Computing Surveys
%V 6
%N 2
%D 1974
%P 99-124
%K done
.La
LR parsing explained in a readable fashion, by the experts. Required reading.
%A Matthew M. Geller
%A Susan L. Graham
%A Michael A. Harrison
%T Production prefix parsing
%B Automata, languages and programming
%S Lecture Notes in Computer Science #14
%E J. Loeckx
%I Springer-Verlag
%C Berlin
%D 1974
%P 232-241
%K done
.La
The items in a non-deterministic LR(0|1) automaton
.Xr "non-deterministic automaton"
are simplified in that, rather than the full item $A -> beta "\*(Dt" gamma$,
only $beta$ (the
.Df "production prefix" )
is recorded.
If the corresponding deterministic automaton
.Xr "deterministic automaton"
is free of
conflicts and has no compound states (that is, each state contains only one
production prefix)
.Xr "production prefix"
the grammar is a @i[production prefix grammar]. Table size
is proportional to grammar size. Production prefix(1)
.Xr "production prefix"
is between simple precedence
.Xr "simple precedence"
and SLR(1)
.Xr "SLR(1)"
in power.
%A David Pager
%T On eliminating unit productions from LR($k$) parsers
%B Automata, languages and programming
%S Lecture Notes in Computer Science #14
%E J. Loeckx
%I Springer-Verlag
%C Berlin
%D 1974
%P 242-254
%K done
.Xr "LR($k$)"
.La
The unit rules (and only the unit rules)
.Xr "unit rule"
of the grammar are collected in a
directed graph, which is a set of multi-rooted trees (no cycles allowed).
For each leaf, the states of all its predecessors are contracted.
%A J.J. Horning
%T LR grammars and analyzers
%B Compiler Construction, an Advanced Course
%S Lecture Notes in Computer Science #21
%E F.L. Bauer & J. Eickel
%I Springer-Verlag
%C New York
%D 1974
%P 85-108
%K done
.La
These lecture notes present a concise discussion of LR($k$)
.Xr "LR($k$)"
grammars and
LR(0),
.Xr "LR(0)"
SLR(1)
.Xr "SLR(1)"
(more restrictive adding of reduce entries by using FOLLOW
sets),
.Xr "FOLLOW set"
LALR(1)
.Xr "LALR(1)"
(using shift entries to determine state after reduce),
and LR(1)
.Xr "LR(1)"
(adding look-ahead to items) constructor algorithms.
Also some attention is given to the representation of LR tables, including
some compactification techniques.
.Xr "table compression"
%A Paul Purdom
%T The size of LALR(1) parsers
%J BIT
%V 14
%N
%D 1974
%P 326-337
%K done
.Xr "LALR(1)"
.La
Experimental size analysis for LALR(1) parsers. Although parser size can be
exponential in the grammar size, it is found in practice to be linear in the
grammar size.
%A Hans H. Kron
%A Hans-J\(:urgen Hoffman
%A Gerhard Winkler
%T On a SLR($k$)-based parser system which accepts non-LR($k$) grammars
%B GI-4. Jahrestagung
%S Lecture Notes in Computer Science #26
%E D. Siefkes
%I Springer-Verlag
%C New York
%D 1975
%P 214-223
%K done
.Xr "SLR($k$)"
.Xr "LR($k$)"
.La
For each inadequate state
.Xr "inadequate state"
in an LR(0) automaton,
.Xr "LR(0) automaton"
a resolution tree is
constructed of maximum depth $k$. If this construction succeeds, the grammar
is of type
.Df FSLR($k$) \&.
If it fails, a parser is generated that performs breadth-first search
.Xr "breadth-first search"
to
resolve the remaining inadequacies.
.Xr "inadequate state"
Detailed algorithms are given.
%A A.J. Demers
%T Elimination of single productions and merging of non-terminal symbols in LR(1) grammars
%J Comput. Lang. (Elmsford, NY)
%V 1
%N 2
%D April 1975
%P 105-119
%K done
.Xr "LR(1)"
.La
The unit rules
.Xr "unit rule"
are used to define subsets of the non-terminals, the
members of which can be treated as equivalent, similar to
Aho and Ullman [LR 1973].
.Au Aho LR 1973
.Au Ullman LR 1973
Explicit proofs are given.
%A Harry B. Hunt\ III
%A Thomas G. Szymanski
%A Jeffrey D. Ullman
%T On the complexity of LR($k$) testing
%J Commun. ACM
%V 18
%N 12
%D Dec 1975
%P 707-716
%K done
.Xr "LR($k$)"
.La
.Xr "time requirements"
Time bounds as a function of the grammar size are derived for testing many
properties of grammars. A practical result is that both the LL($k$)
.Xr "LL($k$)"
and the LR($k$)
.Xr "LR($k$)"
properties can be tested in $ O ( n sup { k + 2 } ) $. These
and other bounds given in the paper are upper bounds, and actual testing is
often much faster.
%A O.L. Madsen
%A B.B. Kristensen
%T LR-parsing of extended context-free grammars
%J Acta Inform.
%V 7
%N 1
%D 1976
%P 61-73
%K done
.La
The right parts are allowed to contain choices
{$omega sub 1$|$...$|$omega sub n$} and repetitions
$roman "{" omega roman "}" sup "*" $.
In addition to the dotted items in the LR sets,
there are also @i[marked] items, which have a $#$ rather than a \*(Dt.
.Xr "dot"
The
$#$ means one of three things: here starts a repetition, one element of a
repetition has just been recognized or one member of a choice has just been
recognized. Upon reduction, these marked items will tell how to unstack the
entire right-hand side.
%A R.C. Backhouse
%T An alternative approach to the improvement of LR($k$) parsers
%J Acta Inform.
%V 6
%N 3
%D 1976
%P 277-296
%K done
.Xr "LR($k$)"
.La
Traditionally, the field of bottom-up parsing is described in terms of
handle-finding
.Xr "handle"
automata.
The author describes it in terms of left-contexts, in which a
.Df "left-context"
is a set of stack configurations of the LR($k$)
.Xr "LR($k$)"
parser. Other bottom-up
techniques are explained as approximations to these sets.
%A Thomas G. Szymanski
%A John H. Williams
%T Non-canonical extensions of bottom-up parsing techniques
%J SIAM J. Computing
%V 5
%N 2
%D June 1976
%P 231-250
%K done
.La
Theory of non-canonical
.Xr "non-canonical parsing"
versions of several bottom-up parsing techniques,
with good informal introduction.
%A Marc L. Joliat
%T A simple technique for partial elimination of unit productions from LR($k$) parsers
%J IEEE Trans. Comput.
%V C-25
%N 7
%D July 1976
%P 763-764
%K done
.Xr "LR($k$)"
.La
A very simple algorithm is given that alters some of the transitions in an
LR parse table
.Xr "parse table"
to bypass unit rules.
.Xr "unit rule"
%A M.M. Geller
%A M.A. Harrison
%T Characteristic parsing: a framework for producing compact deterministic parsers
%J J. Comput. Syst. Sci.
%V 14
%N 3
%D June 1977
%P 265-317
%K done
.Xr "characteristic parsing"
.La
Given a deterministic LR(1) automaton,
.Xr "LR(1) automaton"
.Xr "deterministic automaton"
suppose we add some (arbitrary) items
to some states. This will have two effects: the discriminatory power of the
automaton will weaken and its minimum size will decrease (since now some
states will coincide).
For a large number of grammars there is a @i[characteristic] item addition
technique that will minimize automaton size while preserving just enough
power.
This requires a heavy mathematical apparatus.
%A Matthew M. Geller
%A Michael A. Harrison
%T On LR($k$) grammars and languages
%J Theoret. Comput. Sci.
%V 4
%N
%D 1977
%P 245-276
%K done
.Xr "LR($k$)"
.La
Theoretical groundwork for the \(l"characteristic parsing technique\(r"
.Xr "characteristic parsing"
of Geller and Harrison [LR June 1977].
.Au Geller LR June 1977
.Au Harrison LR June 1977
%A D. Pager
%T The lane-tracing algorithm for constructing LR($k$) parsers and ways of enhancing its efficiency
%J Inform. Sci.
%V 12
%N
%D 1977
%P 19-42
%K done
.Xr "LR($k$)"
.La
An item $A -> beta "\*(Dt" X gamma$
.Xr "dot"
in an LR parser (called a \(l"configuration\(r" here) has in general two
kinds of successors:
.Xr "successor"
a set of \(l"immediate successors\(r"
.Xs "immediate successor" "successor"
$X -> "\*(Dt" xi sub n$ and the \(l"transition successor\(r"
.Xs "transition successor" "successor"
$A -> beta X "\*(Dt" gamma$.
An item together with a sequence of its successive successors is called a
.Df "lane" \&.
Lanes are used 1) to collect enough look-ahead context to convert an LR(0)
automaton
.Xr "LR(0) automaton"
to LALR(1);
.Xr "LALR(1)"
2) to determine which LALR(1) states should be split
to resolve remaining LALR(1) conflicts. The required algorithms are of
considerable complexity.
%A Wilf R. LaLonde
%T Regular right part grammars and their parsers
%J Commun. ACM
%V 20
%N 10
%D Oct 1977
%P 731-741
%K done
.La
The notion of regular right part
.Xr "regular right part grammar"
grammars and its advantages are described in detail.
A parser is proposed that does LR($k$)
.Xr "LR($k$)"
parsing to find the right end of the handle
.Xr "handle"
and then, using different parts of the same table, scans the stack backwards
using a look-ahead (to the left!) of $m$ symbols to find the left end; this
is called
.Df "LR($m$, $k$)" "."
The corresponding parse table
.Xr "parse table"
construction algorithm is given by LaLonde [LR 1979].
.Au LaLonde LR 1979
%A David Pager
%T A practical general method for constructing LR($k$) parsers
%J Acta Inform.
%V 7
%N 3
%D 1977
%P 249-268
%K done
.Xr "LR($k$)"
.La
When during the construction of an LR(1)
.Xr "LR(1)"
parser a state has to be added, one
can consider merging it with an already existing state, if no conflict can
arise from this. The problem is that it is not easy to tell whether
conflicts may arise from a certain merge. To this end, the notions
.Df "weak compatibility"
and
.Df "strong compatibility"
are defined. Algorithms for the efficient construction of conflict-free
small full LR(1) parse tables
.Xr "parse table"
are given.
%A D. Pager
%T Eliminating unit productions from LR($k$) parsers
%J Acta Inform.
%V 9
%N
%D 1977
%P 31-59
%K done
.Xr "LR($k$)"
.La
Very detailed description of a unit rule
.Xr "unit rule"
elimination algorithm.
%A A. Celentano
%T Incremental LR parsers
%J Acta Inform.
%V 10
%N
%D 1978
%P 307-321
%K done
.Xr "incremental parsing"
.La
Very clear exposition of how the Ghezzi and Mandrioli algorithm [LR 1979]
.Au Ghezzi LR 1979
.Au Mandrioli LR 1979
can be made to work on parse sequences rather than on parse trees, thus
improving efficiency.
%A Stephen C. Johnson
%T YACC: yet another compiler-compiler
%I Bell Laboratories
%C Murray Hill, New Jersey 07974
%D 1978
%P 34
%K done
.La
In spite of its title, @i[yacc]
.Xr "@i[yacc]"
is one of the most widely used parser
generators.
.Xr "parser generator"
It generates LALR(1)
.Xr "LALR(1)"
parsers from a grammar with embedded
semantic actions
.Xr "semantic action"
and features a number of disambiguating
.Xr "disambiguating rule"
and conflict-resolving mechanisms. The generated parser is in C.
.Xr "C"
%A Akifumi Makinouchi
%T On single production elimination in simple LR($k$) environment
%J J. Inform. Process.
%V 1
%N 2
%D 1978
%P 76-80
%K done
.Xr "LR($k$)"
.La
An SLR(1)
.Xr "SLR(1)"
parser is extended with the possibility of specifying grammar
rules of the form
$ \(no roman "{" C sub l roman "}" A \(no roman "{" C sub r roman "}" -> ... $,
which can only be applied when the symbol before the $A$ cannot produce a
member of $ roman "{" C sub l roman "}" $ as its last token, and the token
after $A$ is not in $ roman "{" C sub r roman "}" $.
Such rules allow some convenient ambiguities to be
resolved without losing the generative power of the system.
%A W.R. LaLonde
%T Constructing LR parsers for regular right part grammars
%J Acta Inform.
%V 11
%N
%D 1979
%P 177-193
%K done
.La
Describes the algorithms for the regular right part
.Xr "regular right part grammar"
parsing technique
explained by LaLonde [LR 1977]. The back scan is performed using so-called
.Df "read-back tables" \&.
Compression techniques
.Xr "table compression"
for these tables are given.
%A Stephan Heilbrunner
%T On the definition of ELR($k$) and ELL($k$) grammars
%J Acta Inform.
%V 11
%N
%D 1979
%P 169-176
%K done
.La
See same paper [LL 1979].
%A Otto Mayer
%T On deterministic canonical bottom-up parsing
%J Inform. Control
%V 43
%N
%D 1979
%P 280-303
%K done
.La
A general framework is presented for deterministic canonical bottom-up
parsers, from which well-known parsers arise as special cases.
%A Carlo Ghezzi
%A Dino Mandrioli
%T Incremental parsing
%J ACM Trans. Prog. Lang. Syst.
%V 1
%N 1
%D July 1979
%P 58-70
%K done
.Xr "incremental parsing"
.La
The authors observe that when a grammar allows bottom-up parsing using some
technique $T$ and is at the same time RL($k$) for any $k$, then any
modification to the input text can only affect nodes that produce the
modified part. By keeping the entire parse tree in a both left-most and
right-most threaded form, these nodes can be located and updated quickly.
The case LR(1) \(AN RL(1)
.Xr "RL(1)"
is treated in full.
%A Kai Koskimies
%A Eljas Soisalon-Soininen
%T On a method for optimizing LR parsers
%J Intern. J. Comput. Math.
%V A7
%N
%D 1979
%P 287-295
%K done
.La
Defines criteria under which Pager's algorithm for the elimination of unit
rules [LR 1977]
.Au Pager LR 1977
.Xr "unit rule"
can be safely applied to SLR(1)
.Xr "SLR(1)"
parsers.
%A Kuo-Chung Tai
%T Noncanonical SLR(1) grammars
%J ACM Trans. Prog. Lang. Syst.
%V 1
%N 2
%D Oct 1979
%P 295-320
%K done
.La
A survey of non-canonical parsing methods
.Xr "non-canonical parsing"
is given and two non-canonical
variants of SLR(1)
.Xr "SLR(1)"
parsing are described.
%A Gerald A. Fisher\ Jr.
%A Manfred Weber
%T LALR(1) parsing for languages without reserved words
%J ACM SIGPLAN Notices
%V 14
%N 11
%D Nov 1979
%P 26-30
%K done
.Xr "reserved word"
.La
A heuristic is given for designing an LALR(1)
.Xr "LALR(1)"
programming language without reserved words.
.Xr "reserved word"
First design the LALR(1) language @i[with] reserved
words, using a non-terminal @t[identifier] for the identifiers. Now allow
@t[identifier] to also produce all reserved words and modify the grammar (or
the language) until the grammar is LALR(1) again, using feedback from an
LALR(1) parser generator.
.Xr "parser generator"
%A Eljas Soisalon-Soininen
%T On the space-optimizing effect of eliminating single productions from LR parsers
%J Acta Inform.
%V 14
%N
%D 1980
%P 157-174
%K done
.La
Improvement of Pager's unit rule
.Xr "unit rule"
elimination algorithm [LR 1977].
.Au Pager LR 1977
%A Carlo Ghezzi
%A Dino Mandrioli
%T Augmenting parsers to support incrementality
%J J. ACM
%V 27
%N 3
%D 1980
%P 564-579
%K done
.Xr "incremental parsing"
.La
The algorithm of Ghezzi and Mandrioli [LR 1979]
.Au Ghezzi LR 1979
.Au Mandrioli LR 1979
is extended to all LR($k$)
.Xr "LR($k$)"
grammars.
%A Jacek Witaszek
%T The LR($k$) parser
%B Mathematical Foundations of Computer Science
%S Lecture Notes in Computer Science #88
%E P. Dembi\*('nski
%I Springer-Verlag
%C New York
%D 1980
%P 686-697
%K done
.La
Three size-reducing transformations on LR($k$)
.Xr "LR($k$)"
tables are defined that leave the LR($k$) property undisturbed.
One is similar to minimising a FS automaton,
.Xr "finite-state automaton"
one removes unused look-ahead and one allows delaying error detection.
.Xr "error detection"
No full algorithms are given, but see Witaszek [LR 1988].
.Au Witaszek LR 1988
%A Bent Bruun Kristensen
%A Ole Lehrmann Madsen
%T Methods for computing LALR($k$) lookahead
%J ACM Trans. Prog. Lang. Syst.
%V 3
%N 1
%D Jan 1981
%P 60-82
%K done
.La
The LALR($k$)
.Xr "LALR($k$)"
look-ahead sets are seen as the solution to a set of equations,
which are solved by recursive traversal of the LR(0) automaton.
.Xr "LR(0) automaton"
Full algorithms
plus proofs are given.
%A R. Kemp
%T LR(0) grammars generated by LR(0) parsers
%J Acta Inform.
%V 15
%N
%D 1981
%P 265-280
%K done
.Xr "LR(0)"
.La
Theoretical analysis of the set of LR(0) grammars that produce a given
LR(0) parser.
%A Theodore P. Baker
%T Extending look-ahead for LR parsers
%J J. Comput. Syst. Sci.
%V 22
%N 2
%D 1981
%P 243-259
%K done
.La
A FS automaton
.Xr "finite-state automaton"
is derived from the LR automaton as follows: upon a reduce to
$A$ the automaton moves to all states that have an incoming arc marked $A$.
This automaton is used for analysing the look-ahead as in an LR-regular
.Xr "LR-regular"
parser (\*(vCulik and Cohen [LR 1973]).
%A Stephan Heilbrunner
%T A parsing automata approach to LR theory
%J Theoret. Comput. Sci.
%V 15
%N
%D 1981
%P 117-157
%K done
.La
Parsing is explained in terms of
.Df "item grammars" , "item grammar"
which describe the stack configurations of the parser.
The theory is first developed for LR and then applied uniformly to LL and LC.
%A Wilf R. LaLonde
%T The construction of stack-controlling LR parsers for regular right part grammars
%J ACM Trans. Prog. Lang. Syst.
%V 3
%N 2
%D April 1981
%P 168-206
%K done
.La
Traditional LR parsers shift each input token onto the stack; often, this
shift could be replaced by a state transition,
.Xr "state transition"
indicating that the shift has
taken place. Such a parser is called a
.Df "stack-controlling LR parser" ,
and will do finite-state recognition without stack manipulation whenever
possible. Algorithms for the construction of stack-controlling LR parse tables
.Xr "parse table"
are given. The paper is complicated by the fact that the new feature
is introduced not in a traditional LR parser, but in an LR parser for
regular right parts
.Xr "regular right part grammar"
(for which see LaLonde [LR 1977]).
.Au LaLonde LR 1977
%A Charles Wetherell
%A A. Shannon
%T LR \- automatic parser generator and LR(1) parser
%J IEEE Trans. Softw. Eng.
%V SE-7
%N 3
%D May 1981
%P 274-278
%K done
.La
This short paper discusses a full LR(1)
.Xr "LR(1)"
parser generator
.Xr "parser generator"
and parser,
written in ANSI 66 Fortran for portability, and using an algorithm
by Pager [LR 1977].
.Au Pager LR 1977
%A Augusto Celentano
%T An LR parsing technique for extended context-free grammars
%J Comput. Lang. (Elmsford, NY)
%V 6
%N 2
%D 1981
%P 95-107
%K done
.La
The results of repetitions or selections are popped off the parsing stack
before the entire right-hand side has been recognized. Remarkably, this can
be done for any extended LR(1)
.Xr "LR(1)"
grammar. Explicit algorithms are given.
%A Paul W. Purdom
%A Cynthia A. Brown
%T Parsing extended LR($k$) grammars
%J Acta Inform.
%V 15
%N
%D 1981
%P 115-127
%K done
.La
An LR state is stacked only at the beginning of a right-hand side; all other
work is done on a global state.
At a reduce, the reduced non-terminal is already on the top of the stack and
needs only to be unstacked.
This does not work for all extended LR($k$)
.Xr "extended LR($k$)"
grammars, but any extended LR($k$) can be converted into one for which the
method works.
%A Takehiro Tokuda
%T Eliminating unit reductions from LR($k$) parsers using minimum contexts
%J Acta Inform.
%V 15
%N
%D 1981
%P 447-470
%K done
.Xr "LR($k$)"
.La
Very densely written analysis of algorithms for the elimination of unit rules
.Xr "unit rule"
from a special class of LR($k$) parsers.
%A Colin Burgess
%A Laurence James
%T An indexed bibliography for LR grammars and parsers
%J ACM SIGPLAN Notices
%V 16
%N 8
%D Aug 1981
%P 14-26
%K done
.La
Useful, detailed and structured bibliography containing around 115 entries.
%A David Spector
%T Full LR(1) parser generation
%J ACM SIGPLAN Notices
%V 16
%N 8
%D Aug 1981
%P 58-66
%K done
.Xr "LR(1)"
.La
A heuristic algorithm for enlarging an LR(0)
.Xr "LR(0)"
table to full LR(1) is given and demonstrated on two examples.
With letter of correction (vol. 16, no. 11, Nov 1981, p. 2).
See also Ancona, Dodero and Gianuzzi [LR 1982]
.Au Ancona LR 1982
.Au Dodero LR 1982
.Au Gianuzzi LR 1982
and Spector [LR 1988].
.Au Spector LR 1988
%A M. Ancona
%A V. Gianuzzi
%T A new method for implementing LR($k$) tables
%J Inform. Process. Lett.
%V 13
%N 4/5
%D 1981
%P 171-176
%K done
.Xr "LR($k$)"
.La
For each inadequate state
.Xr "inadequate state"
there is a separate automaton handling that
inadequacy by doing a look-ahead of one token. If this automaton has
inadequate states the process is repeated. A tables construction algorithm
is given.
%A Eljas Soisalon-Soininen
%T Inessential error entries and their use in LR parser optimization
%J ACM Trans. Prog. Lang. Syst.
%V 4
%N 2
%D April 1982
%P 179-195
%K done
.La
More sophisticated and general algorithms are given for the techniques
described by Aho and Ullman [LR 1972].
.Au Aho LR 1972
.Au Ullman LR 1972
%A M. Ancona
%A G. Dodero
%A V. Gianuzzi
%T Building collections of LR($k$) items with partial expansion of lookahead strings
%J ACM SIGPLAN Notices
%V 17
%N 5
%D May 1982
%P 24-28
%K done
.Xr "LR($k$)"
.La
In addition to the usual terminals, non-terminals are allowed in the
look-ahead sets, leading to very substantial savings in the number of
states. Only if an inadequate state
.Xr "inadequate state"
turns up are the non-terminals developed
far enough to resolve the inadequacy. The algorithm will also work
reasonably for $k>1$.
%A J.C.H. Park
%T A new LALR formalism
%J ACM SIGPLAN Notices
%V 17
%N 7
%D July 1982
%P 47-61
%K done
.La
Simplified operators corresponding to Predict and Accept are defined
precisely and applied to LR and LALR parser generation. Difficult to read.
%A Frank DeRemer
%A Thomas J. Pennello
%T Efficient computation of LALR(1) look-ahead sets
%J ACM Trans. Prog. Lang. Syst.
%V 4
%N 4
%D Oct 1982
%P 615-649
%K done
.La
1. The LALR(1)
.Xr "LALR(1)"
look-ahead sets are calculated by four linear sweeps over the
LR(0) automaton,
.Xr "LR(0) automaton"
calculating the sets Direct Read, Read, Follow and
Look-Ahead, respectively.
2. An obvious simplification leads to \(l"Not Quite LALR(1)\(r",
.Df "NQLALR(1)" ,
and is shown to be inadequate.
3. The debugging of non-LALR(1) grammars is treated.
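The heart of these sweeps is a single relational computation: given initial sets $F sub 0$ and a relation $R$, find the smallest $F$ with $F ( x ) supset= F sub 0 ( x )$ and $F ( x ) supset= F ( y )$ whenever $x R y$. A minimal sketch of that core (the node names, relation and sets below are illustrative assumptions, not taken from the paper):

```python
# Sketch of the relational core of DeRemer & Pennello's LALR(1) look-ahead
# computation: each sweep solves F(x) = F0(x) U union{ F(y) | x R y } with one
# depth-first traversal that also collapses strongly connected components.

def digraph(nodes, rel, f0):
    """Smallest sets F with F(x) >= f0(x) and F(x) >= F(y) whenever x rel y."""
    n = {x: 0 for x in nodes}            # 0 = unvisited, inf = done
    f = {x: set(f0.get(x, ())) for x in nodes}
    stack = []

    def traverse(x):
        stack.append(x)
        depth = len(stack)
        n[x] = depth
        for y in rel.get(x, ()):
            if n[y] == 0:
                traverse(y)
            n[x] = min(n[x], n[y])       # low-link as in Tarjan's SCC algorithm
            f[x] |= f[y]
        if n[x] == depth:                # x is the root of an SCC: pop it
            while True:
                top = stack.pop()
                n[top] = float('inf')
                f[top] = set(f[x])       # every member of the SCC gets F(root)
                if top == x:
                    break

    for x in nodes:
        if n[x] == 0:
            traverse(x)
    return f

# Example: A and B form a cycle (as mutually recursive non-terminals would),
# and B also reads from C.
F = digraph(['A', 'B', 'C'],
            {'A': ['B'], 'B': ['A', 'C']},
            {'A': {'a'}, 'C': {'c'}})    # F['A'] == F['B'] == {'a', 'c'}
```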
%A Colin Burgess
%A Laurence James
%T A revised index bibliography for LR grammars and parsers
%J ACM SIGPLAN Notices
%V 17
%N 12
%D Dec 1982
%P 18-26
%K nieuw
.La
A revision of Burgess and James [LR 1981],
.Au Burgess LR 1981
.Au James LR 1981
extending the number of entries to about 160.
%A Jorma Tarhio
%T LR parsing of some ambiguous grammars
%J Inform. Process. Lett.
%V 14
%N 3
%D 1982
%P 101-103
%K done
.La
.Xr "ambiguity"
The reduction items in all inadequate states
.Xr "inadequate state"
are collected. The rules in
them are extended at the end with \(l"synchronization symbols\(r", to make
the shift/reduce
.Xr "shift/reduce conflict"
and reduce/reduce conflicts
.Xr "reduce/reduce conflict"
go away. These synchronization
symbols are context-dependent; for instance each identifier could be followed
by a token indicating its type. The synchronization symbols are inserted in
the input stream by the lexical analyser while parsing.
%A Rakesh Agrawal
%A Keith D. Detro
%T An efficient incremental LR parser for grammars with epsilon productions
%J Acta Inform.
%V 19
%N 4
%D 1983
%P 369-376
%K done
.La
A linear time and space
.Xr "space requirements"
implementation of Celentano's algorithm [LR 1978]
.Au Celentano LR 1978
is described, which can also handle $epsilon$-rules.
.Xr "$epsilon$-rule"
%A Takehiro Tokuda
%T A fixed-length approach to the design and construction of bypassed LR($k$) parsers
%J J. Inform. Process.
%V 6
%N 1
%D 1983
%P 23-30
%K done
.La
The idea of removing unit reductions is extended to removing @i[all]
reductions that do not involve semantic actions;
.Xr "semantic action"
this leads to
.Df "bypassed LR($k$) parsers" \&. "bypassed LR($k$) parser"
Full algorithms are given. Some of the literature on removing unit rules
.Xr "unit rule"
is analysed critically.
%A Dashing Yeh
%T On incremental shift-reduce parsing
%J BIT
%V 23
%N 1
%D 1983
%P 36-48
%K done
.Xr "incremental parsing"
.La
The input tokens to an LR parser are stored in a linked list; each node in
this list also holds a pointer to a stack pertinent for the token in the
node. These stacks can be merged and are in fact also stored in the nodes.
This arrangement greatly simplifies incremental parsing.
Very clear explanation.
%A Kenzo Inoue
%A Fukumi Fujiwara
%T On LLC($k$) parsing method of LR($k$) grammars
%J J. Inform. Process.
%V 6
%N 4
%D 1983
%P 206-217
%K done
.La
Assume an LR($k$)
.Xr "LR($k$)"
grammar. Start parsing using the (full) LL($k$)
.Xr "full LL($k$)"
method,
until an LL($k$) conflict
.Xr "LL(1) conflict"
is encountered, say on non-terminal $A$.
$A$ is then parsed with the LR($k$)
.Xr "LR($k$)"
method, using the proper predicted look-ahead set.
If during the LR (sub)parsing the number of items narrows down to one,
an LL($k$) (sub-sub)parsing is started; etc.
Full algorithms for all tables are given.
LLC means \(l"Least Left Corner\(r".
.Xr "LLC"
.Xr "Least Left Corner"
%A Lothar Schmitz
%T On the correct elimination of chain productions from LR parsers
%J Intern. J. Comput. Math.
%V 15
%N 2
%D 1984
%P 99-116
%K done
.La
Rigorous proofs of some claims about unit-free LR($k$)
.Xr "LR($k$)"
parsers.
%A N.P. Chapman
%T LALR(1,1) parser generation for regular right part grammars
%J Acta Inform.
%V 21
%N
%D 1984
%P 29-45
%K done
.Xr "regular right part grammar"
.La
Efficient construction algorithm for LALR(1,1)
.Xr "LALR(1,1)"
parse tables,
.Xr "parse table"
which find the
right end of the handle
.Xr "handle"
by traditional LALR(1)
.Xr "LALR(1)"
parsing and then scan the stack backwards using a look-ahead of 1 symbol to
find the left end.
%A Joseph C.H. Park
%A K.M. Choe
%A C.H. Chang
%T A new analysis of LALR formalisms
%J ACM Trans. Prog. Lang. Syst.
%V 7
%N 1
%D Jan 1985
%P 159-175
%K done
.La
The recursive closure
.Xr "closure"
operator $CLOSURE$ of Kristensen and Madsen [LR 1981]
.Au Kristensen LR 1981
.Au Madsen LR 1981
is abstracted to an iterative $delta$-operator such that
$ CLOSURE == delta sup "*" $. This operator allows the formal derivation of
four algorithms for the construction of LALR look-ahead sets.
%A Esko Ukkonen
%T Upper bounds on the size of LR($k$) parsers
%J Inform. Process. Lett.
%V 20
%N 2
%D Feb 1985
%P 99-105
%K done
.La
Upper bounds for the number of states of an LR($k$)
.Xr "LR($k$)"
parser are given for several types of grammars.
%A S. Heilbrunner
%T Truly prefix-correct chain-free LR(1) parsers
%J Acta Inform.
%V 22
%N 5
%D 1985
%P 499-536
%K done
.La
A unit-free LR(1)
.Xr "LR(1)"
parser generator
.Xr "parser generator"
algorithm, rigorously proven correct.
%A Fred Ives
%T Unifying view of recent LALR(1) lookahead set algorithms
%J ACM SIGPLAN Notices
%V 21
%N 7
%D July 1986
%P 131-135
%K done
.La
A common formalism is given in which the LALR(1)
.Xr "LALR(1)"
look-ahead set construction
algorithms of DeRemer and Pennello [LR 1982],
.Au DeRemer LR 1982
.Au Pennello LR 1982
Park, Choe and Chang [LR 1985]
.Au Park LR 1985
.Au Choe LR 1985
.Au Chang LR 1985
and the author can be expressed. See also Park and Choe [LR 1987].
.Au Park LR 1987
.Au Choe LR 1987
%A Manuel E. Bermudez
%A Karl M. Schimpf
%T A practical arbitrary look-ahead LR parsing technique
%J ACM SIGPLAN Notices
%V 21
%N 7
%D July 1986
%P 136-144
%K done
.La
To resolve LR(0)
.Xr "LR(0)"
conflicts at run time, for each conflict state a FS
automaton
.Xr "finite-state automaton"
is developed that will do arbitrary look-ahead.
Grammars for which
parsers can be constructed by this technique are called
.Df "LAM($m$)"
where $m$ in some way limits the size of the look-ahead FS automata.
.Xr "finite-state automaton"
It can handle some non-LR($k$) grammars. See also Baker [LR 1981].
.Au Baker LR 1981
%A Thomas J. Pennello
%T Very fast LR parsing
%J ACM SIGPLAN Notices
%V 21
%N 7
%D July 1986
%P 145-151
%K done
.La
The tables and driver of a traditional LALR(1)
.Xr "LALR(1)"
parser are replaced by
assembler code performing linear search for small fan-out, binary search for
medium and a calculated jump for large fan-out. This modification gained a
factor of 6 in speed at the expense of a factor of 2 in size.
%A Ikuo Nakata
%A Masataka Sassa
%T Generation of efficient LALR parsers for regular right part grammars
%J Acta Inform.
%V 23
%N
%D 1986
%P 149-162
%K done
.Xr "regular right part grammar"
.La
The stack of an LALR(1)
.Xr "LALR(1)"
parser is augmented with a set of special markers that indicate the start of
a right-hand side; adding such a marker during the shift is called a
.Df "stack-shift" "."
Consequently there can now be a shift/stack-shift conflict, abbreviated to
.Df "stacking conflict" "."
The stack-shift is given preference and any superfluous markers are
eliminated during the reduction.
Full algorithms are given.
%A A.M.M. Al-Hussaini
%A R.G. Stone
%T Yet another storage technique for LR parsing tables
%J Softw. Pract. Exper.
%V 16
%N 4
%D 1986
%P 389-401
%K done
.La
Excellent introduction to LR table compression
.Xr "table compression"
in general. The
.Df "submatrix technique"
introduced in this paper partitions the rows into a number of submatrices,
the rows of each of which are similar enough to allow drastic compressing.
The access cost is $O (1)$. A heuristic partitioning algorithm is given.
%A Masataka Sassa
%A Ikuo Nakata
%T A simple realization of LR-parsers for regular right part grammars
%J Inform. Process. Lett.
%V 24
%N 2
%D Jan 1987
%P 113-120
%K done
.Xr "regular right part grammar"
.La
For each item in each state on the parse stack of an LR parser, a counter is
kept indicating how many preceding symbols on the stack are covered by the
recognized part in the item. Upon reduction, the counter of the reducing
item tells us how many symbols to unstack. The manipulation rules for the
counters are simple. The counters are stored in short arrays, one array for
each state on the stack.
%A Joseph C.H. Park
%A Kwang-Moo Choe
%T Remarks on recent algorithms for LALR lookahead sets
%J ACM SIGPLAN Notices
%V 22
%N 4
%D April 1987
%P 30-32
%K done
.La
Careful analysis of the differences between the algorithms of Park, Choe and
Chang [LR 1985]
.Au Park LR 1985
.Au Choe LR 1985
.Au Chang LR 1985
and Ives [LR 1986].
.Au Ives LR 1986
See also Ives [LR 1987].
.Au Ives LR 1987
%A Fred Ives
%T Response to remarks on recent algorithms for LALR lookahead sets
%J ACM SIGPLAN Notices
%V 22
%N 8
%D 1987
%P 99-104
%K done
.La
Remarks by Park and Choe [LR 1987]
.Au Park LR 1987
.Au Choe LR 1987
are refuted and a new algorithm is presented that is significantly better
than that of Park, Choe and Chang [LR 1985]
.Au Park LR 1985
.Au Choe LR 1985
.Au Chang LR 1985
and that previously presented by Ives [LR 1986].
.Au Ives LR 1986
%A Nigel P. Chapman
%T LR Parsing: Theory and Practice
%I Cambridge University Press
%C New York, NY
%D 1987
%P 228
%K done
.La
Detailed treatment of the title subject. Highly recommended for anybody who
wants to acquire in-depth knowledge about LR parsing. Good on size of parse
tables
.Xr "parse table"
and attribute grammars.
.Xr "attribute grammar"
%A Eljas Soisalon-Soininen
%A Jorma Tarhio
%T Looping LR parsers
%J Inform. Process. Lett.
%V 26
%N 5
%D Jan 1988
%P 251-253
%K done
.La
For some (non-LR) grammars it is true that there are ways to resolve the
conflicts in an LR parser for them that will make the parser loop on some
inputs (executing an endless sequence of reduces). A test is given to detect
such grammars.
%A Jacek Witaszek
%T A practical method for finding the optimum postponement transformation for LR($k$) parsers
%J Inform. Process. Lett.
%V 27
%N 2
%D Feb 1988
%P 63-67
%K done
.La
By allowing the LR($k$)
.Xr "LR($k$)"
automaton to postpone error checking, the size of the automaton can be
reduced dramatically.
Finding the optimum postponement transformation is, however, a large
combinatorial problem.
A good heuristic algorithm for finding a (sub)optimal transformation is
given.
%A Dashing Yeh
%A Uwe Kastens
%T Automatic construction of incremental LR(1) parsers
%J ACM SIGPLAN Notices
%V 23
%N 3
%D March 1988
%P 33-42
%K done
.Xr "incremental parsing"
.La
Detailed algorithms for an incremental LR(1)
.Xr "LR(1)"
parser that allows multiple modifications and $epsilon$-rules.
.Xr "$epsilon$-rule"
%A Manuel E. Bermudez
%A Karl M. Schimpf
%T On the (non-)relationship between SLR(1) and NQLALR(1) grammars
%J ACM Trans. Prog. Lang. Syst.
%V 10
%N 2
%D April 1988
%P 338-342
%K done
.La
Shows a grammar that is SLR(1)
.Xr "SLR(1)"
but not NQLALR(1).
.Xr "NQLALR(1)"
%A Pierpaolo Degano
%A Stefano Mannucci
%A Bruno Mojana
%T Efficient incremental LR parsing for syntax-directed editors
%J ACM Trans. Prog. Lang. Syst.
%V 10
%N 3
%D July 1988
%P 345-373
%K done
.Xr "incremental parsing"
.La
The non-terminals of a grammar are partitioned by hand into sets of
\(l"incrementally compatible\(r" non-terminals, meaning that replacement of
one non-terminal by an incrementally compatible one is considered a minor
structural change. As in Korenjak's method [LR 1969],
.Au Korenjak LR 1969
for a partitioning
in $n$ sets $n+1$ parse tables
.Xr "parse table"
are constructed, one for each set and one
for the grammar that represents the connection between the sets. The user
of the parser can interactively move or copy the string produced by a given
non-terminal to a position where an incrementally compatible one is
required. This approach keeps the text (i.e. the program text) reasonably
correct most of the time and uses rather small tables.
%A George H. Roberts
%T Recursive ascent: an LR analog to recursive descent
%J ACM SIGPLAN Notices
%V 23
%N 8
%D Aug 1988
%P 23-29
%K done
.Xr "recursive ascent"
.La
Each LR state is represented by a subroutine.
The shift is implemented as a subroutine call, the reduction is followed by a
subroutine return possibly preceded by a return stack adjustment.
The latter prevents the generation of genuine subroutines since it requires
explicit return stack manipulation.
A small and more or less readable LR(0)
.Xr "LR(0)"
parser is shown, in
which conflicts are resolved by means of the order in which certain tests
are done, like in a recursive descent parser.
.Xr "recursive descent"
%A F.E.J. Kruseman\ Aretz
%T On a recursive ascent parser
%J Inform. Process. Lett.
%V 29
%N 4
%D Nov 1988
%P 201-206
%K done
.Xr "recursive ascent"
.La
Each state in an LR automaton is implemented as a subroutine. A shift calls
that subroutine. A reduce to $X$ is effected as follows. $X$ and its length
$n$ are stored in global variables; all subroutines are rigged to decrement
$n$ and return as long as $n > 0$, and to call the proper GOTO
.Xr "GOTO-action"
state of $X$
when $n$ hits $0$. This avoids the explicit stack manipulation of Roberts
[LR 1988].
.Au "Roberts, G.H." LR 1988
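The scheme is compact enough to sketch in full; the toy grammar $ S -> roman "a" S | roman "b" $ and all names below are illustrative assumptions, not the paper's code:

```python
# Sketch (an illustrative reconstruction, not the paper's code) of
# Kruseman Aretz's recursive ascent technique for the toy LR(0) grammar
#     S -> a S | b
# Each LR state is a subroutine; a shift is a call.  A reduce to X with a
# right-hand side of length n stores X and n in globals; every state then
# decrements n and returns while n > 0, and performs its GOTO on X when
# n reaches 0 -- no explicit stack manipulation is needed.

lhs, n = None, 0      # non-terminal just reduced, symbols left to unstack
inp, pos = "", 0      # input and cursor

class Accept(Exception):
    pass

def tok():
    return inp[pos] if pos < len(inp) else '$'

def shift():
    global pos
    pos += 1

def state0():                     # items: S'->.S   S->.aS   S->.b
    global n
    if tok() == 'a': shift(); state1()
    elif tok() == 'b': shift(); state2()
    else: raise SyntaxError(tok())
    while True:                   # unwinding phase
        n -= 1
        if n > 0:
            return                # still unstacking: return to caller
        state3()                  # GOTO(state0, S)

def state1():                     # items: S->a.S   S->.aS   S->.b
    global n
    if tok() == 'a': shift(); state1()
    elif tok() == 'b': shift(); state2()
    else: raise SyntaxError(tok())
    while True:
        n -= 1
        if n > 0:
            return
        state4()                  # GOTO(state1, S)

def state2():                     # item: S->b.    reduce S -> b
    global lhs, n
    lhs, n = 'S', 1

def state3():                     # item: S'->S.   accept at end of input
    if tok() == '$':
        raise Accept
    raise SyntaxError(tok())

def state4():                     # item: S->aS.   reduce S -> a S
    global lhs, n
    lhs, n = 'S', 2

def parse(s):
    global inp, pos
    inp, pos = s, 0
    try:
        state0()
    except Accept:
        return True
    except SyntaxError:
        return False
    return False
```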
%A David Spector
%T Efficient full LR(1) parser generation
%J ACM SIGPLAN Notices
%V 23
%N 12
%D Dec 1988
%P 143-150
%K done
.La
A relatively simple method is given for extending an LR(0)
.Xr "LR(0)"
table to full
LR(1).
.Xr "LR(1)"
The method isolates the inadequate states,
.Xr "inadequate state"
constructs the full look-ahead sets for them and then splits them
(and possible predecessor states). The algorithm is described informally.
%A Manuel E. Bermudez
%A George Logothetis
%T Simple computation of LALR(1) look-ahead sets
%J Inform. Process. Lett.
%V 31
%N 5
%D 1989
%P 233-238
%K done
.La
The original LALR(1)
.Xr "LALR(1)"
grammar is replaced by a not much bigger grammar that has been made to
incorporate the necessary state splitting through a simple transformation.
.Xr "transformations on grammars"
The SLR(1)
.Xr "SLR(1)"
automaton of this grammar is the LALR(1)
automaton
.Xr "LALR(1) automaton"
of the original grammar.
%A George H. Roberts
%T Another note on recursive ascent
%J Inform. Process. Lett.
%V 32
%N 5
%D 1989
%P 263-266
%K done
.Xr "recursive ascent"
.La
The fast parsing methods of Pennello [LR 1986],
.Au Pennello LR 1986
Kruseman\ Aretz [LR 1988]
.Au "Kruseman\\\\ Aretz" LR 1988
and Roberts
.Au "Roberts, G.H." LR 1988
are compared. A special-purpose optimizing compiler can select the appropriate
technique for each state.
%A James Kanze
%T Handling ambiguous tokens in LR parsers
%J ACM SIGPLAN Notices
%V 24
%N 6
%D June 1989
%P 49-54
%K done
.La
.Xr "ambiguity"
It may not always be possible to infer from the appearance of an input
symbol the terminal symbol it corresponds to in the parser. In that case a
default assumption can be made and the error recovery
.Xr "error recovery"
mechanism of the
parser can be rigged to try alternatives. A disadvantage is that an LALR
parser may already have made reductions (or a strong-LL
.Xr "strong-LL($k$)"
.Xr "strong-LL(1)"
parser may have made
$epsilon$-moves)
.Xr "$epsilon$-move"
that have ruined the context.
An implementation in UNIX's @i[yacc]
.Xr "@i[yacc]"
is given.
%A Daniel J. Salomon
%A Gordon V. Cormack
%T Scannerless NSLR(1) parsing of programming languages
%J ACM SIGPLAN Notices
%V 24
%N 7
%D July 1989
%P 170-178
%K done
.La
The traditional CF syntax
.Xr "syntax"
is extended with two rule types:
$A "\*(!>" B$,
.Xr "\*(!>"
which means that any sentential form in which $A$ generates a terminal
production of $B$ (with $B$ regular) is illegal, and $A "\*(!-" B$,
.Xr "\*(!-"
which means that any sentential form in which terminal productions of $A$ and
$B$ are adjacent, is illegal.
The authors show that the addition of these two types of rules allows one to
incorporate the lexical phase of a compiler into the parser.
The system uses a non-canonical
.Xr "non-canonical parsing"
SLR(1)
.Xr "SLR(1)"
parser.
%A J. Heering
%A P. Klint
%A J. Rekers
%T Incremental generation of parsers
%J ACM SIGPLAN Notices
%V 24
%N 7
%D July 1989
%P 179-191
%K done
.Xr "incremental parser generation"
.La
In a very unconventional approach to parser generation, the initial
information for an LR(0)
.Xr "LR(0)"
parser consists of the grammar only. As parsing
progresses, more and more entries of the LR(0) table (actually a graph)
are needed; they are constructed on the fly. LR(0) inadequacies
.Xr "inadequate state"
are resolved using Tomita's method.
.Xr "Tomita parser"
All this greatly facilitates handling
(dynamic) changes to the grammar.
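The lazy flavour of this idea can be sketched with memoization standing in for the incrementally filled table; the grammar and names below are illustrative assumptions:

```python
# Sketch of lazily constructed LR(0) states: a state's transitions are
# computed only when the parser first needs them, and lru_cache plays the
# role of the incrementally filled table (grammar and names are assumptions).
from functools import lru_cache

grammar = {
    'S': (('E',),),                       # augmented start rule
    'E': (('E', '+', 'a'), ('a',)),
}

def closure(kernel):
    """Expand a kernel of dotted items (lhs, rhs, dot) to a full LR(0) state."""
    items = set(kernel)
    work = list(kernel)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for prod in grammar[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

@lru_cache(maxsize=None)
def goto(state, sym):
    """Computed on first use only, then remembered, like the on-the-fly table."""
    kernel = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == sym}
    return closure(frozenset(kernel)) if kernel else None

start = closure(frozenset({('S', ('E',), 0)}))
```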
%A R. Nigel Horspool
%T ILALR: an incremental generator of LALR(1) parsers
%B Compiler Compilers and High-Speed Compilation
%S Lecture Notes in Computer Science #371
%E D. Hammer
%I Springer-Verlag
%C Berlin
%D 1989
%P 128-136
%K done
.Xr "incremental parser generation"
.La
Grammar rules are checked as they are typed in. To this end, LALR(1)
.Xr "LALR(1)"
parse tables
.Xr "parse table"
are kept and continually updated.
When the user interactively adds a new rule, the sets FIRST
.Xr "FIRST set"
and NULLABLE
.Xr "nullable"
are recalculated and algorithms are given to distribute the consequences of
possible changes over the LR(0)
.Xr "LR(0)"
and look-ahead sets.
Some serious problems are reported and practical solutions are given.
%A Daniel J. Salomon
%A Gordon V. Cormack
%T Corrections to the paper: Scannerless NSLR(1) parsing of programming languages
%J ACM SIGPLAN Notices
%V 24
%N 11
%D Nov 1989
%P 80-83
%K done
.La
More accurate time measurements and corrections to the algorithms are
supplied. See same authors [LR July 1989].
%A Stylianos D. Pezaris
%T Shift-reduce conflicts in LR parsers
%J ACM SIGPLAN Notices
%V 24
%N 11
%D Nov 1989
%P 94-95
%K done
.La
It is shown that if an LR(1)
.Xr "LR(1)"
parser either has no shift/reduce conflicts
.Xr "shift/reduce conflict"
or has shift/reduce conflicts that have been decided to be solved by shifting,
the same parsing behaviour can be obtained from the corresponding LR(0)
.Xr "LR(0)"
parser (which will have no reduce/reduce conflicts)
.Xr "reduce/reduce conflict"
in which all
shift/reduce conflicts
.Xr "shift/reduce conflict"
are resolved in favour of the shift. With this
resolution principle, for instance the programming language C
.Xr "C"
can be parsed with an LR(0) parser.
%A Gregor Snelting
%T How to build LR parsers which accept incomplete input
%J ACM SIGPLAN Notices
%V 25
%N 4
%D April 1990
%P 51-58
%K done
.La
When an LR parser finds a premature end-of-file, the incomplete parse tree
is completed using some heuristics on the top state of the stack. The
heuristics mainly favour reduction over shift and their application is
repeated until the parse tree is complete or further completion would
involve too much guessing. The technique is explained in the setting of a
language-based editor.
%A George H. Roberts
%T From recursive ascent to recursive descent: via compiler optimizations
%J ACM SIGPLAN Notices
%V 25
%N 4
%D April 1990
%P 83-89
%K done
.La
Shows a number of code transformations that will turn an LR(1)
.Xr "LR(1)"
recursive ascent
.Xr "recursive ascent"
parser (see Roberts [LR 1988] and [LR 1989])
.Au "Roberts, G.H." LR 1988
.Au "Roberts, G.H." LR 1989
for an LL(1)
.Xr "LL(1)"
grammar into a recursive descent parser.
.Xr "recursive descent"
.EQ
delim $$
.EN
.P1 "Left-corner parsing"
.Xr "left-corner parsing"
This section also covers a number of related techniques: production-chain,
.Xr "production chain"
LLP($k$),
.Xr "LLP($k$)"
PLR($k$),
.Xr "PLR($k$)"
etc.
%A D.J. Rosenkrantz
%A P.M. Lewis\ II
%T Deterministic left-corner parsing
%B IEEE Conference Record 11th Annual Symposium on Switching and Automata Theory
%V 11
%D 1970
%P 139-152
%K done
.La
An LC($k$)
.Xr "LC($k$)"
parser decides the applicability of a rule when it has seen the
initial non-terminal of the rule if it has one, plus a look-ahead of $k$
symbols. Identifying the initial non-terminal is done by bottom-up parsing,
the rest of the rule is recognized top-down.
A canonical LC pushdown machine
.Xr "pushdown automaton"
can be constructed in which the essential entries on the pushdown stack are
pairs of non-terminals, one telling what non-terminal has been recognized
bottom-up and the other what non-terminal is predicted top-down. As with LL,
there is a difference between LC and strong-LC. There is a simple algorithm to
convert an LC($k$)
.Xr "LC($k$)"
grammar into LL($k$)
.Xr "LL($k$)"
form; the resulting grammar may
be large, though.
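The strategy (though not the deterministic LC($k$) machine of the paper) can be sketched as a backtracking recognizer; the grammar and names below are illustrative assumptions:

```python
# Backtracking sketch of the left-corner strategy: a terminal is read
# bottom-up, then rules whose right-hand side starts with the recognized
# category are tried, the rest of each rule being matched top-down.
# Note the left-recursive grammar, which plain recursive descent cannot handle.
grammar = {'S': [['S', '+', 'a'], ['a']]}

def lc(goal, toks, i):
    """Yield every end position j such that toks[i:j] derives from goal."""
    if i < len(toks):
        yield from ascend(toks[i], goal, toks, i + 1)

def ascend(cat, goal, toks, i):
    # cat has been recognized bottom-up; it may be the goal itself, or the
    # left corner of a rule whose left-hand side is then recognized in turn
    if cat == goal:
        yield i
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            if rhs[0] == cat:
                yield from rest(rhs[1:], lhs, goal, toks, i)

def rest(symbols, lhs, goal, toks, i):
    # match the remainder of a rule top-down, then ascend with its lhs
    if not symbols:
        yield from ascend(lhs, goal, toks, i)
    elif symbols[0] in grammar:
        for j in lc(symbols[0], toks, i):
            yield from rest(symbols[1:], lhs, goal, toks, j)
    elif i < len(toks) and toks[i] == symbols[0]:
        yield from rest(symbols[1:], lhs, goal, toks, i + 1)

# set(lc('S', "a+a+a", 0)) == {1, 3, 5}: prefixes "a", "a+a" and "a+a+a"
```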
%A Y. Eric Cho
%T Simple left-corner grammars
%B Proc. Seventh Princeton Conference on Information Sciences and Systems
%E
%C Princeton
%D 1973
%P 557
%K done
.La
LC parsing is simplified by requiring that each right-hand side be
recognizable (after LC reduction) by its first two symbols and by handling
left recursion as a special case. The required tables are extremely small.
%A David B. Lomet
%T Automatic generation of multiple exit parsing subroutines
%B Automata, languages and programming
%S Lecture Notes in Computer Science #14
%E J. Loeckx
%I Springer-Verlag
%C Berlin
%D 1974
%P 214-231
%K done
.La
A
.Df "production chain"
is a chain of production steps
.Xr "production step"
$X sub 0 -> X sub 1 alpha sub 1$,
$X sub 1 -> X sub 2 alpha sub 2$, $ ... $
$X sub n-1 -> "@t[t]" alpha sub n$, with
$X sub 0 , ... , X sub n-1$ non-terminals and @t[t] a terminal.
If the input is known to derive from $X sub 0$ and starts with @t[t], each
production chain
.Xr "production chain"
from $X sub 0$ to @t[t] is a possible explanation of how
@t[t] was produced.
The set of all production chains
.Xr "production chain"
connecting $X sub 0$ to @t[t] is called a
.Df "production expression" \&.
An efficient algorithm for the construction and compression
.Xr "table compression"
of production expressions
.Xr "production expression"
is given. Each production expression
.Xr "production expression"
is then implemented as a
subroutine which contains the production expression
.Xr "production expression"
as a FS automaton.
.Xr "finite-state automaton"
%A Michael Hammer
%T A new grammatical transformation into LL($k$) form
%B Proceedings Sixth Annual ACM Symposium on Theory of Computing
%D 1974
%P 266-275
%K done
.La
Each left corner in a left-corner parser
.Xr "left-corner parsing"
is described as a FS automaton
.Xr "finite-state automaton"
and implemented as a subroutine.
Parsing is then performed by recursive descent
.Xr "recursive descent"
using these subroutines.
The FS automata can be incorporated into the grammar to yield an LL($k$)
.Xr "LL($k$)"
grammar.
%A J. Kr\('al
%A J. Demner
%T Parsing as a subtask of compiling
%B Mathematical Foundations of Computer Science
%S Lecture Notes in Computer Science #32
%E J. Be\*(vcv\('a\*(vr
%I Springer-Verlag
%C Berlin
%D 1975
%P 61-74
%K done
.La
Various considerations that went into the design of a variant of
left-corner parsing,
.Xr "left-corner parsing"
called
.Df "semi-top-down" \&.
%A E. Soisalon-Soininen
%A E. Ukkonen
%T A characterization of LL($k$) grammars
%B Automata, Languages and Programming
%E S. Michaelson & R. Milner
%I Edinburgh University Press
%C Edinburgh
%D 1976
%P 20-30
%K done
.La
Introduces a subclass of the LR($k$)
.Xr "LR($k$)"
grammars called predictive LR($k$)
(PLR($k$)).
.Xr "PLR($k$)"
The deterministic LC($k$)
.Xr "LC($k$)"
grammars are strictly included in this class, and a grammatical transformation
.Xr "transformations on grammars"
is presented to transform a PLR($k$)
.Xr "PLR($k$)"
into an LL($k$)
.Xr "LL($k$)"
grammar. PLR($k$)
.Xr "PLR($k$)"
grammars can therefore be parsed with the LL($k$) parser of the transformed
grammar.
.Xr "transformations on grammars"
A consequence is that the classes of LL($k$), LC($k$),
.Xr "LC($k$)"
and PLR($k$)
.Xr "PLR($k$)"
languages are identical.
%A A. Nijholt
%T Simple chain grammars
%B Automata, Languages and Programming
%S Lecture Notes in Computer Science #52
%E A. Salomaa & M. Steinby
%I Springer-Verlag
%C Berlin
%D 1977
%P 352-364
%K done
.La
A non-terminal $X$ is said to be
.Df "chain-independent"
if all production chains
.Xr "production chain"
(see Lomet [LC 1974])
.Au Lomet LC 1974
of $X$ end in a different terminal symbol.
Two symbols $X$ and $Y$ are \(l"mutually chain-independent\(r" if different
chains,
.Xr "production chain"
one starting with $X$ and the other with $Y$, end with different
symbols.
A CF grammar is a
.Df "simple chain grammar"
if it satisfies the following conditions:
(1) all its symbols are chain-independent,
(2) if $A -> alpha X beta$ and $A -> alpha Y gamma$, then $X$ and $Y$ are
mutually chain-independent, and
(3) if $A -> alpha$ and $A -> alpha beta$ then $beta = epsilon$.
.Lp
This class of grammars contains the LL(1)
.Xr "LL(1)"
grammars without $epsilon$-rules, and is a subset of the LR(0)
.Xr "LR(0)"
grammars. A simple parser for these grammars is presented.
%A Jaroslav Kr\('al
%T Almost top-down analysis for generalized LR($k$) grammars
%B Methods of algorithmic language implementation
%S Lecture Notes in Computer Science #47
%E A.P. Ershov and C.H.A. Koster
%I Springer-Verlag
%C Berlin
%D 1977
%P 149-172
%K done
.Xr "LR($k$)"
.La
Very well-argued introduction to semi-top-down
.Xr "semi-top-down"
parsing; see Kr\('al [LC 1975].
.Au Kr\\\\('al LC 1975
%A Jan Pittl
%T Exponential optimization for the LLP($k$) parsing method
%B Mathematical Foundations of Computer Science
%S Lecture Notes in Computer Science #53
%E J. Gruska
%I Springer-Verlag
%C Berlin
%D 1977
%P 435-442
%K done
.Xr "LLP($k$)"
.La
The automata by Lomet [LC 1974]
.Au Lomet LC 1974
are reduced using the \(l"characteristic parsing\(r"
.Xr "characteristic parsing"
technique of Geller and Harrison [LR 1977].
.Au Geller LR 1977
.Au Harrison LR 1977
%A Alan J. Demers
%T Generalized left corner parsing
%B Fourth ACM Symposium on Principles of Programming Languages
%E
%I
%C
%D 1977
%P 170-182
%K done
.La
The right-hand side of each rule is required to contain a marker. The part
on the left of the marker is the left corner;
.Xr "left-corner parsing"
it is recognized by SLR(1)
.Xr "SLR(1)"
techniques, the rest by LL(1)
.Xr "LL(1)"
techniques. An algorithm is given to determine
the first admissible position in each right-hand side for the marker.
%A Eljas Soisalon-Soininen
%A Esko Ukkonen
%T A method for transforming grammars into LL($k$) form
%J Acta Inform.
%V 12
%N
%D 1979
%P 339-369
%K done
.La
A restricted class of LR($k$)
.Xr "LR($k$)"
grammars is defined, the predictive LR($k$)
.Xr "LR($k$)"
or PLR($k$)
.Xr "PLR($k$)"
grammars, which can be handled by left-corner techniques.
.Xr "left-corner parsing"
Like LC($k$) grammars, they can be transformed
.Xr "transformations on grammars"
into LL($k$)
.Xr "LL($k$)"
grammars.
%A Esko Ukkonen
%T A modification of the LR($k$) method for constructing compact bottom-up parsers
%B Automata, Languages and Programming
%S Lecture Notes in Computer Science #71
%E Hermann A. Maurer
%I Springer-Verlag
%C Berlin
%D 1979
%P 646-658
%K done
.La
An LR($k$)
.Xr "LR($k$)"
parser is extended to do left-corner parsing
.Xr "left-corner parsing"
simultaneously by compounding the states on the stack.
This can be done for
.Df "weak-PLR($k$) grammars" "" "weak-PLR($k$)"
only, which, however, include almost all LR($k$)
.Xr "LR($k$)"
grammars.
The resulting table is gigantic but highly structured, and can be condensed
considerably.
%A Daniel Chester
%T A parsing algorithm that extends phrases
%J Am. J. Comput. Linguist.
%V 6
%N 2
%D April-June 1980
%P 87-96
%K done
.La
See same paper [NatLang 1980].
%A Jan Pittl
%T On LLP($k$) grammars and languages
%J Theoret. Comput. Sci.
%V 16
%N
%D 1981
%P 149-175
%K done
.La
See Pittl [LC 1982].
.Au Pittl LC 1982
All LR($k$)
.Xr "LR($k$)"
languages have an LLP($k$)
.Xr "LLP($k$)"
grammar.
LLP($k$)
.Xr "LLP($k$)"
lies somewhere between LL($k$)
.Xr "LL($k$)"
and LR($k$).
%A Jan Pittl
%T On LLP($k$) parsers
%J J. Comput. Syst. Sci.
%V 24
%N
%D 1982
%P 36-68
%K done
.La
This paper first presents a non-deterministic parser using a mixed
top-down-bottom-up strategy, and then examines the circumstances under
which these parsers are deterministic,
.Xr "deterministic parser"
resulting in the class of LLP($k$)
.Xr "LLP($k$)"
grammars. The parser does not have the correct-prefix property,
.Xr "correct-prefix property"
as the LL($k$)
.Xr "LL($k$)"
and LR($k$)
.Xr "LR($k$)"
parsers have.
%A Yuji Matsumoto
%A Hozumi Tanaka
%A Hideki Hirakawa
%A Hideo Miyoshi
%A Hideki Yasukawa
%T BUP: a bottom-up parser embedded in Prolog
%J New Generation Computing
%V 1
%N
%D 1983
%P 145-158
%K done
.La
A bottom-up parser for natural language text embedded in Prolog
.Xr "Prolog"
is described,
in which each grammar rule corresponds to a Prolog clause.
.Xr "Prolog clause"
The parser, which
is in fact left-corner,
.Xr "left-corner parsing"
can deal with any cycle-free grammar with no $epsilon$-rules.
The dictionary is handled separately.
Explicit rules are given for converting a grammar into Prolog clauses.
.Xr "Prolog clause"
A facility for remembering previous successes and failures is included.
A tracing facility is also described.
%A Kenzo Inoue
%A Fukumi Fujiwara
%T On LLC($k$) parsing method of LR($k$) grammars
%J J. Inform. Process.
%V 6
%N 4
%D 1983
%P 206-217
%K done
.La
See same paper [LR 1983].
%A Susan Hirsh
%T P-PATR: a compiler for unification-based grammars
%B Natural Language Understanding and Logic Programming, II
%E V. Dahl & P. Saint-Dizier
%I Elsevier Science Publ.
%C Amsterdam
%D 1988
%P 63-78
%K done
.La
Left-corner parsing
.Xr "left-corner parsing"
in Prolog.
.Xr "Prolog"
How to handle $epsilon$-rules
.Xr "$epsilon$-rule"
that hide left recursion
.Xr "left-recursion"
(remove them by duplicating the rule).
.EQ
delim $$
.EN
.P1 "Precedence and bounded-context parsing"
.Xr "bounded-context"
%A Harold Wolpe
%T Algorithm for analyzing logical statements to produce a truth function table
%J Commun. ACM
%V 1
%N 3
%D March 1958
%P 4-13
%K done
.La
The paper describes an algorithm to convert a Boolean expression into a
decision table. The expression is first fully parenthesized
.Xr "fully parenthesized expression"
through a number of substitution rules that represent the priorities of the
operators. Parsing is then done by counting parentheses.
Further steps construct a decision table.
%A J.H. Wegstein
%T From formulas to computer-oriented language
%J Commun. ACM
%V 2
%N 3
%D March 1959
%P 6-8
%K done
.La
A program that converts from arithmetic expressions to three-address code is
given as a one-page flowchart. The parser is basically operator-precedence,
.Xr "operator-precedence"
with built-in precedences.
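A translation in the spirit of this paper can be sketched as follows; the
precedence table, the token conventions and the temporary names are assumptions
of this example, not a rendering of Wegstein's flowchart.

```python
# Sketch: operator-precedence translation of an arithmetic expression
# into three-address code, with built-in precedences (assumed values).
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_three_address(tokens):
    operands, operators, code = [], [], []

    def reduce_top():
        op = operators.pop()
        right, left = operands.pop(), operands.pop()
        temp = f"t{len(code) + 1}"
        code.append((temp, left, op, right))     # temp := left op right
        operands.append(temp)

    for tok in tokens:
        if tok in PREC:
            # An operator of higher or equal precedence on the stack is
            # reduced first (yielding left associativity).
            while operators and PREC[operators[-1]] >= PREC[tok]:
                reduce_top()
            operators.append(tok)
        else:
            operands.append(tok)                 # variable or constant
    while operators:
        reduce_top()
    return code
```

For example, `a + b * c` yields the triples for `b * c` first, then the
addition, as the precedences dictate.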
%A Robert W. Floyd
%T Syntactic analysis and operator precedence
%J J. ACM
%V 10
%N 3
%D July 1963
%P 316-333
%K done
.La
Operator-precedence
.Xr "operator-precedence"
explained and applied to an Algol 60 compiler.
.Xr "Algol 60"
%A J. Eickel
%A M. Paul
%A F.L. Bauer
%A K. Samelson
%T A syntax-controlled generator of formal language processors
%J Commun. ACM
%V 6
%N 8
%D Aug 1963
%P 451-455
%K done
.La
In this early paper, the authors develop and describe what is basically a
(2,1) bounded-context
.Xr "bounded-context"
parser. Reduction rules must have the
form $ U ~ <- ~ V $ or $ R ~ <- ~ ST $. Such a grammar is called an
.Df "R-language" ;
it is \(l"unique\(r" if the parse tables
.Xr "parse table"
can be constructed without
conflict. The terminology in the paper differs considerably from today's.
%A Robert W. Floyd
%T Bounded context syntax analysis
%J Commun. ACM
%V 7
%N 2
%D Feb 1964
%P 62-67
%K done
.Xr "bounded-context"
.La
For each right-hand side $R$ in the grammar, enough context is constructed
(by hand) so that when $R$ is found in a sentential form in the right
context in a bottom-up parser, it can safely be assumed to be the handle.
.Xr "handle"
%A Niklaus Wirth
%A Helmut Weber
%T EULER \- A generalization of ALGOL and its formal definition, Part 1/2
%J Commun. ACM
%V 9
%N 1/2
%D Jan/Feb 1966
%P 13-25/89-99
%K done
.Xr "Algol 60"
.La
Detailed description of simple
.Xr "simple precedence"
and extended precedence.
.Xr "extended precedence"
A table generation algorithm is given.
Part 2 contains the complete precedence table plus functions
.Xr "precedence functions"
for the language EULER.
.Xr "EULER"
%A David F. Martin
%T Boolean matrix methods for the detection of simple precedence grammars
%J Commun. ACM
%V 11
%N 10
%D Oct 1968
%P 685-687
%K done
.La
Finding the simple-precedence relations
.Xr "precedence relation"
is explained as matrix operations on matrices derived trivially from the
grammar.
%A James R. Bell
%T A new method for determining linear precedence functions for precedence grammars
%J Commun. ACM
%V 12
%N 10
%D Oct 1969
%P 567-569
%K done
.La
.Xr "precedence functions"
The precedence relations
.Xr "precedence relation"
are used to set up a connectivity matrix. Take the transitive closure
.Xr "closure"
and count 1's in each row. Check for correctness of the result.
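The method as summarized above can be sketched as follows; the encoding of the
precedence relations as a dictionary is an assumption of this example.

```python
# Sketch of Bell's method: set up a connectivity matrix over the nodes
# f_a and g_b, take the transitive closure, count 1's in each row, and
# check the result for correctness.
def precedence_functions(symbols, rel):
    """rel maps (a, b) to '<', '=' or '>'; returns (f, g) or None."""
    nodes = [("f", s) for s in symbols] + [("g", s) for s in symbols]
    idx = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    m = [[i == j for j in range(n)] for i in range(n)]   # reflexive
    for (a, b), r in rel.items():
        fa, gb = idx[("f", a)], idx[("g", b)]
        if r in (">", "="):
            m[fa][gb] = True          # f(a) >= g(b)
        if r in ("<", "="):
            m[gb][fa] = True          # g(b) >= f(a)
    for k in range(n):                # Warshall's transitive closure
        for i in range(n):
            if m[i][k]:
                for j in range(n):
                    m[i][j] = m[i][j] or m[k][j]
    count = [sum(row) for row in m]   # count 1's in each row
    f = {s: count[idx[("f", s)]] for s in symbols}
    g = {s: count[idx[("g", s)]] for s in symbols}
    for (a, b), r in rel.items():     # check for correctness
        ok = {"<": f[a] < g[b], "=": f[a] == g[b], ">": f[a] > g[b]}[r]
        if not ok:
            return None
    return f, g
```

If the relations contain a cycle that forces a strict relation within it, the
final check fails and no functions exist.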
%A Alain Colmerauer
%T Total precedence relations
%J J. ACM
%V 17
%N 1
%D Jan 1970
%P 14-30
%K done
.La
.Xr "precedence relation"
The non-terminal resulting from a reduction is not put on the stack but
pushed back into the input stream; this leaves room for more reductions on
the stack. This causes precedence relations
.Xr "precedence relation"
that differ considerably from simple precedence.
.Xr "simple precedence"
%A A. Learner
%A A.L. Lim
%T A note on transforming grammars to Wirth-Weber precedence form
%J Computer J.
%V 13
%N
%D 1970
%P 142-144
%K done
.La
An algorithm is given to transform
.Xr "transformations on grammars"
any CF grammar to simple precedence
.Xr "simple precedence"
form
(with possible duplicate right-hand sides).
%A Jacques Loeckx
%T An algorithm for the construction of bounded-context parsers
%J Commun. ACM
%V 13
%N 5
%D May 1970
%P 297-307
%K done
.La
.Xr "bounded-context"
By systematically generating all BC states the parser may encounter.
%A J. Ichbiah
%A S. Morse
%T A technique for generating almost optimal Floyd-Evans productions of precedence grammars
%J Commun. ACM
%V 13
%N 8
%D Aug 1970
%P 501-508
%K done
.La
The notion of \(l"weak precedence\(r"
.Xr "weak precedence"
is defined in the introduction.
The body of the article is concerned with efficiently producing good
Floyd-Evans productions
.Xr "Floyd production"
from a given weak precedence grammar.
.Xr "weak precedence"
%A A.V. Aho
%A P.J. Denning
%A J.D. Ullman
%T Weak and mixed strategy precedence parsing
%J J. ACM
%V 19
%N 2
%D April 1972
%P 225-243
%K done
.Xr "weak precedence"
.Xr "mixed-strategy precedence"
.La
The theory behind and a comparison of various bottom-up (shift/reduce)
parsing algorithms.
%A Shoji Sekimoto
%T Extended right precedence grammars and analyzing technique for them
%J Inform. Process. Japan
%V 12
%N
%D 1972
%P 21-25
%K done
.La
In the presence of two rules $A -> alpha X beta$ and $B -> beta$, weak
precedence
.Xr "weak precedence"
requires that there be no precedence relation
.Xr "precedence relation"
between $X$ and
$B$. This requirement is replaced by a more lenient (but more complicated)
one, resulting in
.Df "right precedence"
which is further relaxed to
.Df "extended right precedence" "."
%A David F. Martin
%T A Boolean matrix method for the computation of linear precedence functions
%J Commun. ACM
%V 15
%N 6
%D June 1972
%P 448-454
%K done
.La
.Xr "precedence functions"
Detailed description of a variant of Bell's method [Precedence 1969].
.Au Bell Precedence 1969
%A A.V. Aho
%A J.D. Ullman
%T Linear precedence functions for weak precedence grammars
%J Intern. J. Comput. Math.
%V A3
%N
%D 1972
%P 149-155
%K done
.Xr "weak precedence"
.La
.Xr "precedence functions"
The entries in a precedence table have four values: \*(d<, \*(d=, \*(d> and
blank. Since precedence functions can only represent three relations: $<$,
$=$ and $>$,
.Xr "precedence relation"
the blank is sacrificed, to the detriment of error detection.
.Xr "error detection"
A weak precedence
.Xr "weak precedence"
table holds only three kinds of entries: \*(D<, \*(d> and
blank, which can be mapped onto $<$, $>$ and $=$. The resulting matrix will
normally not allow precedence functions,
.Xr "precedence functions"
but it will if a number of the $=$'s are sacrificed.
An algorithm is given to (heuristically) determine the minimal set of $=$'s
to sacrifice; unfortunately this is done by calling upon a heuristic
algorithm for partitioning graphs.
%A J. McAfee
%A L. Presser
%T An algorithm for the design of simple precedence grammars
%J J. ACM
%V 19
%N 3
%D July 1972
%P 385-395
%K done
.La
An algorithm to construct for any CF grammar a grammar with conflict-free
simple-precedence relations
.Xr "precedence relation"
that generates the same language (with possible duplicate right-hand sides,
though).
%A David Crowe
%T Generating parsers for affix grammars
%J Commun. ACM
%V 15
%N 8
%D Aug 1972
%P 728-734
%K done
.La
See same paper [VW 1972].
%A David S. Wise
%T Generalized overlap resolvable grammars and their parsers
%J J. Comput. Syst. Sci.
%V 6
%N
%D Dec 1972
%P 538-572
%K done
.La
A CF grammar is
.Df "Generalized Overlap-Resolvable"
(GOR)
.Xs "GOR" "Generalized Overlap-Resolvable"
if the handle
.Xr "handle"
in a bottom-up parser can be found deterministically by
identifying the right-hand side on the top of the stack, preceded on the
stack by a token from a set of admissible left-context
.Xr "left-context"
tokens and by requiring that the next input token belong to a set of
admissible right-context tokens.
A grammar is
.Df "Overlap-Resolvable"
(OR)
.Xs "OR" "Overlap-Resolvable"
if it is GOR and $epsilon$-free.
.Xr "$epsilon$-free"
These grammars are between mixed-strategy precedence
.Xr "mixed-strategy precedence"
and SLR(1)
.Xr "SLR(1)"
in power. A
very efficient and flexible implementation using D\(:om\(:olki's technique is
described.
%A Rainer Zimmer
%T Soft precedence
%J Inform. Process. Lett.
%V 1
%N
%D 1972
%P 108-110
%K done
.La
A grammar with a conflict-free precedence table in which not all right-hand
sides are different, causes reduce conflicts. For each reduce conflict a
simple pattern is constructed which resolves the conflict by checking the
parse stack. If for each reduce conflict such a pattern exists, the grammar is
.Df "soft precedence" \&.
A matrix algorithm to find the patterns if they exist is given.
%A A.V. Aho
%A J.D. Ullman
%T Error detection in precedence parsers
%J Math. Syst. Theory
%V 7
%N 2
%D 1973
%P 97-113
%K done
.La
The full precedence matrix is split into two copies, one used to decide
between shifts and reduces, which contains \*(D<, \*(d> and blank, and the
other to determine the left end of the handle
.Xr "handle"
which contains \*(d<, \*(d= and blank.
The techniques of Aho and Ullman [Precedence 1972]
.Au Aho Precedence 1972
.Au Ullman Precedence 1972
are now applied to both matrices.
%A James N. Gray
%A Michael A. Harrison
%T Canonical precedence schemes
%J J. ACM
%V 20
%N 2
%D April 1973
%P 214-234
%K done read
.La
The theory behind precedence parsing.
%A G. Terrine
%T Coordinate grammars and parsers
%J Computer J.
%V 16
%N
%D 1973
%P 232-244
%K done
.La
A bounded-context
.Xr "bounded-context"
parser is made to stack dotted items rather than terminals
and non-terminals. This makes it stronger than bounded-context
.Xr "bounded-context"
but still weaker than LR.
%A M.D. Mickunas
%A V.B. Schneider
%T A parser-generating system for constructing compressed compilers
%J Commun. ACM
%V 16
%N 11
%D Nov 1973
%P 669-676
%K done
.La
Describes a bounded-context
.Xr "bounded-context"
parser with transduction facilities.
Includes a compression algorithm
.Xr "table compression"
for BC tables.
%A Susan L. Graham
%T On bounded right context languages and grammars
%J SIAM J. Computing
%V 3
%N 3
%D Sept 1974
%P 224-254
%K done
.Xr "bounded-context"
.La
Theory of same.
%A J.H. Williams
%T Bounded-context parsable grammars
%J Inform. Control
%V 28
%N 4
%D Aug 1975
%P 314-334
%K done
.La
A more general non-canonical
.Xr "non-canonical parsing"
form of bounded-context, called
.Df "bounded-context parsable" ,
is defined, which allows, among other things, the parsing in linear time
.Xr "linear time dependency"
of some
non-deterministic languages.
.Xr "deterministic language"
Although a parser could be constructed, it
would not be practical.
%A M.R. Levy
%T Complete operator precedence
%J Inform. Process. Lett.
%V 4
%N 2
%D Nov 1975
%P 38-40
%K done
.La
Establishes conditions under which operator-precedence
.Xr "operator-precedence"
works properly.
%A D.S. Henderson
%A M.R. Levy
%T An extended operator precedence parsing algorithm
%J Computer J.
%V 19
%N 3
%D 1976
%P 229-233
%K done
.La
The relation
.Xr "precedence relation"
\*(d< is split into $ "\*(d<" sub 1$ and $"\*(d<" sub 2$.
$a "\*(d<" sub 1 b $ means that $a$ may occur next to $b$,
$a "\*(d<" sub 2 b $ means that a non-terminal has to occur between them.
Likewise for \*(d= and \*(d>. This is
.Df "extended operator-precedence" \&.
%A M.S. Krishnamurthy
%A H.R. Ramesha\ Chandra
%T A note on precedence functions
%J Inform. Process. Lett.
%V 4
%N 4
%D Jan 1976
%P 99-100
%K done
.La
.Xr "precedence functions"
Proves for some simple-precedence
.Xr "simple precedence"
tables that no grammars for them exist.
%A R.K. Shyamasundar
%T A note on linear precedence functions
%J Inform. Process. Lett.
%V 5
%N 3
%D 1976
%P 81
%K done
.La
.Xr "precedence functions"
Comments on Krishnamurthy and Ramesha\ Chandra [Precedence 1976].
.Au Krishnamurthy Precedence 1976
.Au "Ramesha\\\\ Chandra" Precedence 1976
%A M.H. Williams
%T Complete operator precedence conditions
%J Inform. Process. Lett.
%V 6
%N 2
%D April 1977
%P 60-62
%K done
.Xr "operator-precedence"
.La
Revision of the criteria of Levy [Precedence 1975].
.Au Levy Precedence 1975
%A Eberhard Bertsch
%T The storage requirement in precedence parsing
%J Commun. ACM
%V 20
%N 3
%D March 1977
%P 192-194
%K done
.La
Suppose for a given grammar there exists a precedence matrix but the
precedence functions
.Xr "precedence functions"
$f$ and $g$ do not exist. There always exist sets of
precedence functions
.Xr "precedence functions"
$ f sub i $ and $ g sub j $ such that for two symbols
$a$ and $b$, comparison of $ f sub { c(b) } ( a ) $ and
$ g sub { d(a) } ( b ) $ yields the precedence relation
.Xr "precedence relation"
between $a$ and $b$,
where $c$ and $d$ are selection functions which select the $ f sub i $ and
$ g sub j $ to be compared. An algorithm is given to construct such a system
of functions.
%A R.K. Shyamasundar
%T Precedence parsing using D\(:om\(:olki's algorithm
%J Intern. J. Comput. Math.
%V A6
%N
%D 1977
%P 105-114
%K done
.La
D\(:om\(:olki's algorithm can find a reducible right-hand side efficiently
but cannot know if it is a handle.
.Xr "handle"
Precedence parsing can find the handle
easily but has trouble determining which right-hand side it is. Together
they are a perfect match.
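A much simplified sketch of the bit-parallel idea follows; the encoding
(consecutive bit positions per right-hand side, one shift and one mask per
stacked symbol) is an assumption of this example, not D\(:om\(:olki's original
formulation.

```python
# Simplified bit-parallel matching of right-hand sides at the stack top:
# each bit position stands for a position in some right-hand side; a set
# bit at the last position of a rhs means that rhs ends at the stack top.
def build_tables(rules):
    """rules: list of (lhs, rhs-tuple). Returns (mask, first, last)."""
    mask, first, last, bit = {}, 0, {}, 0
    for i, (lhs, rhs) in enumerate(rules):
        for j, sym in enumerate(rhs):
            if j == 0:
                first |= 1 << bit                # rhs-initial positions
            mask[sym] = mask.get(sym, 0) | (1 << bit)
            if j == len(rhs) - 1:
                last[bit] = i                    # rhs-final position -> rule
            bit += 1
    return mask, first, last

def matches(rules, stack):
    """Indices of rules whose rhs is the top of `stack`."""
    mask, first, last = build_tables(rules)
    v = 0
    for sym in stack:
        # One shift-and-mask step per symbol keeps track of all partially
        # matched right-hand sides simultaneously.
        v = ((v << 1) | first) & mask.get(sym, 0)
    return sorted(last[b] for b in last if v & (1 << b))
```

Note that a carry across a rule boundary is harmless: it always lands on an
initial position, which is set through `first` anyway.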
%A I.H. Sudborough
%T A note on weak operator precedence grammars
%J Inform. Process. Lett.
%V 7
%N 5
%D 1978
%P 213-218
%K done
.Xr "weak precedence"
.La
Introduces
.Df "weak operator-precedence"
and states that $L(SP) = L(WP)$ and $L(SP) \(sp L(WOP) \(sp L(OP)$, where
$SP$ is simple precedence,
.Xr "simple precedence"
$WP$ is weak precedence,
.Xr "weak precedence"
$WOP$ is weak
operator-precedence and $OP$ is operator-precedence,
.Xr "operator-precedence"
and $L(X)$ is the set of
languages generatable by $X$ grammars.
%A R.K. Shyamasundar
%T Precedence-regular grammars
%J Intern. J. Comput. Math.
%V A7
%N
%D 1979
%P 173-186
%K done
.La
Characterization of the class of grammars for which the
Shyamasundar/D\(:om\(:olki technique (Shyamasundar [Precedence 1977])
.Au Shyamasundar Precedence 1977
works.
Note that whereas in LL-
.Xr "LL-regular"
and LR-regular
.Xr "LR-regular"
it is the rest of the input that is
analysed by a FS automaton
.Xr "finite-state automaton"
to resolve a conflict, in
precedence-regular it is the stack that is analysed by a D\(:om\(:olki-like
automaton.
%A Peter Ru\*(vzi\*(vcka
%T Validity test for Floyd's operator precedence parsing algorithms
%B Mathematical Foundations of Computer Science
%S Lecture Notes in Computer Science #74
%E J. Be\*(vcv\('a\*(vr
%I Springer-Verlag
%C Berlin
%D 1979
%P 415-424
%K done
.Xr "operator-precedence"
.La
Additions to the criteria by Levy [Precedence 1975].
.Au Levy Precedence 1975
%A M.H. Williams
%T Conditions for extended operator precedence parsing
%J Computer J.
%V 22
%N 2
%D 1979
%P 164-168
%K done
.La
Tighter analysis of extended operator-precedence
.Xr "extended operator-precedence"
than Henderson and Levy
[Precedence 1976].
.Au Henderson Precedence 1976
.Au Levy Precedence 1976
%A Amiram Yehudai
%T A new definition for simple precedence grammars
%J BIT
%V 19
%N
%D 1979
%P 282-284
%K done
.La
A weaker definition of simple precedence
.Xr "simple precedence"
is given, which is then shown to define the same class.
%A K.R. Moll
%T Left context precedence grammars
%J Acta Inform.
%V 14
%N
%D 1980
%P 317-335
%K done
.La
Elaborate and definitely non-trivial refinement of the notion of precedence,
to achieve the viable-prefix property.
%A Wilf R. LaLonde
%A Jim des Rivieres
%T Handling operator precedence in arithmetic expressions with tree transformations
%J ACM Trans. Prog. Lang. Syst.
%V 3
%N 1
%D Jan 1981
%P 83-103
%K done
.Xr "operator-precedence"
.La
Algorithms are given that restructure the parse tree when the operator
precedences are modified. They are also used for parsing: first
produce a parse tree in standard form, then add the precedence information.
%A David A. Workman
%T SR($s,k$) parsers: A class of shift-reduce bounded-context parsers
%J J. Comput. Syst. Sci.
%V 22
%N 1
%D 1981
%P 178-197
%K done
.La
.Xr "bounded-context"
The look-back over all combinations of $m$ symbols on the stack in
BC($m$,$n$) parsers is replaced by an LR($m$)-like automaton, resulting in
an SR($m,n$) parser, if possible. The paper is mainly concerned with
theoretical properties of SR grammars and parsers.
%A M.H. Williams
%T A systematic test for extended operator precedence
%J Inform. Process. Lett.
%V 13
%N 4-5
%D End 1981
%P 187-190
%K done
.Xr "extended operator-precedence"
.La
The criteria of Williams [Precedence 1979] in algorithmic form.
.Au "Williams, M.H." Precedence 1979
%A M.C. Er
%T A note on computing precedence functions
%J Computer J.
%V 25
%N 3
%D 1982
%P 397-398
%K done
.La
.Xr "precedence functions"
By determining longest paths in a digraph.
%A Junichi Aoe
%A Yoneo Yamamoto
%A Ryosaku Shimada
%T A practical method for reducing weak precedence parsers
%J IEEE Trans. Softw. Eng.
%V SE-9
%N 1
%D Jan 1983
%P 25-30
%K done
.La
When a weak-precedence
.Xr "weak precedence"
parser finds a \*(d> relation
.Xr "precedence relation"
and starts a reduce
sequence, the sequence stops when a \*(D< is met; all intermediate relations
.Xr "precedence relation"
are required to be \*(d>, to continue the sequence. The authors modify the
parser to continue the sequence anyway, until a \*(D< is found; the
intermediate relations
.Xr "precedence relation"
are never tested and their values are immaterial.
This is exploited to reduce the parse table.
.Xr "parse table"
%A Piotr Wyrostek
%T On the `correct prefix property' in precedence parsers
%J Inform. Process. Lett.
%V 17
%N 3
%D Oct 1983
%P 161-165
%K done
.La
Extremely complicated transformation
.Xr "transformations on grammars"
of precedence grammars to mixed-strategy
.Xr "mixed-strategy precedence"
grammars which have, for some parsers, the correct-prefix
property.
.Xr "correct-prefix property"
With an erratum in @i[Inform. Process. Lett.], vol. 19, no. 2, p.
111, Aug 1984.
%A Piotr Wyrostek
%T Precedence technique is not worse than SLR(1)
%J Acta Inform.
%V 23
%N
%D 1986
%P 361-392
%K done
.La
The thesis in the title is proved by giving an algorithm that transforms
.Xr "transformations on grammars"
an SLR(1)
.Xr "SLR(1)"
grammar into a (1,1)-mixed-strategy precedence
.Xr "mixed-strategy precedence"
grammar with the
viable-prefix property (see also Graham [Precedence 1974]).
.Au Graham Precedence 1974
The resulting precedence table is often smaller than the SLR(1)
.Xr "SLR(1)"
table.
%A R. Nigel Horspool
%A Michael R. Levy
%T Correctness of an extended operator-precedence parsing algorithm
%J Inform. Process. Lett.
%V 24
%N 4
%D March 1987
%P 265-273
%K done
.La
Establishes conditions under which extended operator-precedence
.Xr "extended operator-precedence"
(see Henderson and Levy [Precedence 1976])
.Au Henderson Precedence 1976
.Au Levy Precedence 1976
works properly.
.EQ
delim $$
.EN
.P1 "Finite-state automata"
.Xr "finite-state automaton"
%A M.O. Rabin
%A D. Scott
%T Finite automata and their decision problems
%J IBM J. Research and Development
%V 3
%N
%D April 1959
%P 114-125
%K done
.La
A finite-state automaton
.Xr "finite-state automaton"
is considered as the definition of the set of
strings it accepts. Many fundamental properties of FS automata
.Xr "finite-state automaton"
are exposed
and proved. The very useful subset construction
.Xr "subset construction"
algorithm can be found in
Definition 11.
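The subset construction itself can be sketched as follows; the encoding of the
NFA (without $epsilon$-transitions) as a dictionary of transition sets is an
assumption of this example.

```python
# Sketch of the subset construction: each DFA state is a set of NFA states.
def subset_construction(nfa, start, accepting):
    """nfa: {(state, symbol): set_of_states}; returns (transitions, accepts)."""
    start_set = frozenset([start])
    dfa_states = {start_set: 0}       # set of NFA states -> DFA state number
    dfa_trans, dfa_accept = {}, set()
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        if current & accepting:       # any accepting NFA state makes it accept
            dfa_accept.add(dfa_states[current])
        symbols = {sym for (st, sym) in nfa if st in current}
        for sym in symbols:
            target = frozenset().union(
                *(nfa.get((st, sym), set()) for st in current))
            if target not in dfa_states:
                dfa_states[target] = len(dfa_states)
                worklist.append(target)
            dfa_trans[(dfa_states[current], sym)] = dfa_states[target]
    return dfa_trans, dfa_accept

def run(dfa, s):
    """Run the constructed DFA (start state 0) on a string."""
    trans, accept = dfa
    state = 0
    for ch in s:
        if (state, ch) not in trans:
            return False
        state = trans[(state, ch)]
    return state in accept
```

For example, the non-deterministic automaton for $(a|b) sup * ab$ becomes a
small deterministic one.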
%A Ken Thompson
%T Regular expression search algorithm
%J Commun. ACM
%V 11
%N 6
%D June 1968
%P 419-422
%K done
.La
The regular expression
.Xr "regular expression"
is turned into a transition diagram,
.Xr "transition diagram"
which is then
interpreted in parallel. Remarkably, each step generates (IBM 7094) machine
code to execute the next step.
%A Walter L. Johnson
%A James S. Porter
%A Stephanie I. Ackley
%A Douglas T. Ross
%T Automatic generation of efficient lexical processors using finite state techniques
%J Commun. ACM
%V 11
%N 12
%D Dec 1968
%P 805-813
%K done
.La
Semantic actions
.Xr "semantic action"
are attached to some rules of a FS grammar. A variant of
the subset construction
.Xr "subset construction"
is described that requires the unique determination
of the states in which a semantic action is required.
%A Franklin L. DeRemer
%T Lexical analysis
%B Compiler Construction: An Advanced Course
%S Lecture Notes in Computer Science #21
%E F.L. Bauer & J. Eickel
%I Springer-Verlag
%C Berlin
%D 1974
%P 109-120
%K done
.La
1. General introduction to lexical analysers, hand-written and generated.
2. Simplification of the LR parser generator algorithm
.Xr "parser generator"
for the case of non-self-embedding
.Xr "self-embedding"
CF grammars
(which is possible since the latter in fact generate a regular language).
.Xr "regular language"
%A Alfred V. Aho
%A Margaret J. Corasick
%T Efficient string matching: an aid to bibliographic search
%J Commun. ACM
%V 18
%N 6
%D June 1975
%P 333-340
%K done
.La
.Xr "Aho and Corasick bibliographic search algorithm"
A given string embedded in a longer text is found by a very efficient FS
automaton
.Xr "finite-state automaton"
derived from that string.
.Xr "fast string search"
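For the single-string case described here, the automaton amounts to a failure-
function machine, essentially the Knuth-Morris-Pratt construction; the Python
rendering below is a sketch of that special case, not of the multi-keyword
machine of the paper.

```python
# Sketch: derive a failure-link automaton from a single pattern and use
# it to scan the text in one pass (single-string special case).
def build_failure(pattern):
    """fail[i] = length of the longest proper border of pattern[:i+1]."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]          # follow failure links
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def find(pattern, text):
    """Return the index of the first occurrence of pattern, or -1."""
    fail, k = build_failure(pattern), 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

Each text character is inspected once; the failure links bound the total work
to a linear number of steps.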
%A M.E. Lesk
%A E. Schmidt
%T Lex \- a lexical analyzer generator
%B UNIX Manuals
%I Bell Laboratories
%C Murray Hill, New Jersey
%D 1975
%P 13
%K done
.La
The regular grammar is specified as a list of regular expressions,
.Xr "regular expression"
each
associated with a semantic action,
.Xr "semantic action"
which can access the segment of the input
that matches the expression. Substantial look-ahead is performed if necessary.
@i[lex]
.Xr "@i[lex]"
is a well-known and often-used lexical-analyser generator.
.Xr "parser generator"
%A D. Langendoen
%T Finite-state parsing of phrase-structure languages
%J Linguistic Inquiry
%V 6
%N 4
%D 1975
%P 533-554
%K done
.La
See same author [NatLang 1975].
%A Roman Krzemie\*('n
%A Andrzej \(/Lukasiewicz
%T Automatic generation of lexical analyzers in a compiler-compiler
%J Inform. Process. Lett.
%V 4
%N 6
%D March 1976
%P 165-168
%K done
.La
A grammar is
.Df "quasi-regular" "" "quasi-regular grammar"
if it does not feature nested recursion; consequently it generates a regular
language.
.Xr "regular language"
An algorithm is given that identifies all quasi-regular subgrammars in a CF
grammar, thus identifying the \(l"lexical part\(r" of the grammar.
%A Thomas J. Ostrand
%A Marvin C. Paull
%A Elaine J. Weyuker
%T Parsing regular grammars with finite lookahead
%J Acta Inform.
%V 16
%N
%D 1981
%P 125-138
%K done
.La
Every regular (Type 3) language
.Xr "regular language"
can be recognized by a finite-state
automaton
.Xr "finite-state automaton"
without look-ahead, but such a device is not sufficient to do
parsing. For parsing, look-ahead is needed; if a regular grammar needs a
look-ahead of $k$ tokens, it is called
.Df "FL($k$)" .
FS grammars are either FL($k$), FL($inf$) or ambiguous;
.Xr "ambiguity"
a decision algorithm is described, which also determines the value of $k$, if
appropriate.
.Lp
A simple parsing algorithm is a FS automaton
.Xr "finite-state automaton"
governed by a look-up table for
each state, mapping look-aheads to new states. A second algorithm avoids these
large tables by constructing the relevant look-ahead sets on the fly.
%A V.P. Heuring
%T The automatic generation of fast lexical analysers
%J Softw. Pract. Exper.
%V 16
%N 9
%D 1986
%P 801-808
%K done
.La
The lexical analyser is not based directly on a FS automaton
.Xr "finite-state automaton"
but has a
number of built-in analysers for, e.g., @i[identifier], @i[integer],
@i[string], which can be parametrized. The lexical analyser is about 6 times
faster than UNIX @i[lex].
.Xr "@i[lex]"
%A Douglas W. Jones
%T How (not) to code a finite-state machine
%J ACM SIGPLAN Notices
%V 23
%N 8
%D Aug 1988
%P 19-22
%K done
.La
Small, well-structured and efficient code can be generated for a FS machine
by deriving a single deterministic regular expression
.Xr "regular expression"
from the FS machine
and implementing this expression directly using @b[while] and @b[repeat]
constructions.
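The idea can be sketched on a hypothetical machine (not one from the paper): instead of a state-transition table, the deterministic regular expression @i[letter] (@i[letter] | @i[digit])* is coded directly with structured loops.

```python
# Sketch of coding a FS machine as structured control flow.  The machine
# recognizes identifiers of the form letter (letter|digit)*; each piece of
# the regular expression maps onto one control construct.

def is_identifier(s: str) -> bool:
    """Recognize letter (letter|digit)* without a transition table."""
    i = 0
    # the leading 'letter' of the expression:
    if i >= len(s) or not s[i].isalpha():
        return False
    i += 1
    # the (letter|digit)* part maps directly onto a while loop:
    while i < len(s) and (s[i].isalpha() or s[i].isdigit()):
        i += 1
    return i == len(s)
```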
%A Duane Szafron
%A Randy Ng
%T LexAGen: an interactive incremental scanner generator
%J Softw. Pract. Exper.
%V 20
%N 5
%D May 1990
%P 459-483
%K done
.Xr "incremental parsing"
.La
Extensive description of an interactive generator
.Xr "parser generator"
for lexical analysers, in
Smalltalk-80.
.EQ
delim $$
.EN
.P1 "Natural language handling"
%A Hamish P. Dewar
%A Paul Bratley
%A James P. Thorne
%T A program for the syntactic analysis of English sentences
%J Commun. ACM
%V 12
%N 8
%D 1969
%P 476-479
%K done
.La
The authors argue that the English language can be described by a regular
grammar: most rules are regular already and the others describe
concatenations of regular sublanguages.
.Xr "regular language"
The finite-state parser used constructs the state
subsets on the fly, to avoid large tables. Features (attributes) are used to
check consistency and to weed out the state subsets.
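The on-the-fly subset construction can be sketched as follows; the NFA below (for the expression (a|b)*abb) is an illustrative stand-in, not the authors' grammar, and the feature checks are omitted.

```python
# Sketch of on-the-fly subset construction: the set of reachable NFA
# states is recomputed per input token instead of being precompiled into
# a (possibly huge) DFA table.

# Hypothetical NFA for (a|b)*abb; states 0..3, accepting state 3.
# delta maps (state, token) -> set of successor states.
delta = {
    (0, 'a'): {0, 1}, (0, 'b'): {0},
    (1, 'b'): {2},
    (2, 'b'): {3},
}

def accepts(tokens):
    states = {0}                      # current subset of NFA states
    for t in tokens:
        # successor subset, built on the fly for this token only
        states = set().union(*(delta.get((s, t), set()) for s in states))
        if not states:
            return False              # no state survives: reject
    return 3 in states                # accept iff an accepting state remains
```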
%A W.A. Woods
%T Transition networks for natural languages
%J Commun. ACM
%V 13
%N 10
%D Oct 1970
%P 591-606
%K done
.La
A recursive-descent
.Xr "recursive descent"
parser guided by transition networks rather than by
grammar rules.
%A D. Langendoen
%T Finite-state parsing of phrase-structure languages
%J Linguistic Inquiry
%V 6
%N 4
%D 1975
%P 533-554
%K done
.La
A subset of the CF grammars that produces regular (FS) languages
.Xr "regular language"
is analysed
and an algorithm is given to produce a FS parser for any grammar belonging to
this subset. Much attention is paid to the linguistic applicability of such
grammars. We advise the reader of this paper to make a list of the
abbreviations used in it, to assist in reading.
%A William A. Woods
%T Cascaded ATN grammars
%J Am. J. Comput. Linguist.
%V 6
%N 1
%D Jan-March 1980
%P 1-12
%K done
.La
The grammar (of a natural language) is decomposed into a number of grammars,
which are then
.Df "cascaded" , "cascaded grammar"
that is, the parser for grammar $G sub n$ obtains
as input the linearized parse tree
.Xr "linearized parse tree"
produced by the parser for $G sub n-1$.
Each grammar can then represent a linguistic hypothesis. An efficient
implementation is given.
%A Daniel Chester
%T A parsing algorithm that extends phrases
%J Am. J. Comput. Linguist.
%V 6
%N 2
%D April-June 1980
%P 87-96
%K done
.La
A variant of a backtracking
.Xr "backtracking"
left-corner parser
.Xr "left-corner parsing"
is described that is particularly convenient for handling continuing phrases
like: \(l"the cat that caught the rat that stole the cheese\(r".
%A Harry Tennant
%T Natural language processing
%I Petrocelli Books, Inc.
%C Princeton, N.J.
%D 1981
%P 276
%K done
.La
Easy-going introduction to natural language processing; covers syntax,
.Xr "syntax"
semantics, knowledge representation and dialog with many amusing examples.
With glossary.
%A Philips J. Hayes
%A George V. Mouradian
%T Flexible parsing
%J Am. J. Comput. Linguist.
%V 7
%N 4
%D Oct-Dec 1981
%P 232-242
%K done
.La
A directional breadth-first bottom-up parser yields some sets of partial
parse trees for segments of the input text. Then several heuristics are used
to combine these into a \(l"top-level hypothesis\(r". The purpose is to be
able to parse fragmented or ungrammatical natural language input.
%A Ursula Klenk
%T Microcomputers in linguistic data processing: Context-free parsing
%J Microprocess. Microprogram.
%V 9
%N 5
%D May 1982
%P 281-284
%K done
.La
Shows the feasibility of the implementation of four general CF parsers on a
very small (48 kbytes) PC: breadth-first top-down, backtracking
.Xr "backtracking"
top-down, bottom-up and Earley's algorithm.
.Xr "Earley parser"
%A K. Sparck\ Jones
%A Y. Wilks
%T Automatic natural language parsing
%I Ellis Horwood Ltd.
%C Chichester
%D 1983
%P 208
%K done
.La
Eighteen short chapters on the application of parsing in NL processing, using
CF grammars, Augmented Transition Networks, transducers,
Generalized Phrase Structure Grammars
.Xr "Generalized Phrase Structure Grammar"
and otherwise. Many literature references.
%A Margaret King\ (Ed.)
%T Parsing Natural Language
%I Academic Press
%C London/New York
%D 1983
%P 308
%K done
.La
A compilation of twelve tutorials on aspects of parsing in a linguistic
setting. Very readable.
%A Stuart M. Shieber
%T Direct parsing of ID/LP grammars
%J Linguistics and Philosophy
%V 7
%N
%D 1984
%P 135-154
%K done
.La
In this very readable paper, the Earley parsing technique
.Xr "Earley parser"
is extended in a straightforward way to ID/LP grammars
.Xr "Immediate Dominance/Linear Precedence grammar"
(Gazdar et al. [NatLang 1985]).
.Au Gazdar NatLang 1985
Practical algorithms are given.
%A Gerald Gazdar
%A Ewan Klein
%A Geoffrey Pullum
%A Ivan Sag
%T Generalized phrase structure grammar
%I Basil Blackwell Publisher, Ltd.
%C Oxford, UK
%D 1985
%P 276
%K done
.La
The phrase structure
.Xr "phrase structure grammar"
of natural languages is more easily and compactly described using
.Df "Generalized Phrase Structure Grammars" "" "Generalized Phrase Structure Grammar"
(GPSGs)
.Xs "GPSG" "Generalized Phrase Structure Grammar"
or
.Df "Immediate Dominance/Linear Precedence grammars" "" "Immediate Dominance/Linear Precedence grammar"
(ID/LP grammars)
.Xs "ID/LP grammar" "Immediate Dominance/Linear Precedence grammar"
than using conventional CF grammars.
Theoretical foundations of these grammars are given and the results are used
extensively in linguistic syntactic theory.
GPSGs are not to be confused with general phrase structure
.Xr "phrase structure grammar"
grammars, aka Chomsky Type 0 grammars, which are called \(l"unrestricted\(r"
phrase structure grammars
.Xr "phrase structure grammar"
in this book.
.Lp
The difference between GPSGs, ID/LP grammars and CF grammars is explained
clearly. A GPSG is a CF grammar, the non-terminals of which are not
unstructured names but sets of
.Df features "" "feature (GPSG)"
with their values; such compound non-terminals are called
.Df categories \&. "category (GPSG)"
An example of a feature is @t[NOUN], which can have the values @t[+] or @t[-];
the feature @t[NOUN] with value @t[+] will be a constituent of the categories
\(l"noun phrase\(r", \(l"noun\(r", \(l"noun subject\(r", etc.
.Lp
ID/LP grammars differ from GPSGs in that the right-hand sides of production
rules consist of multisets of categories rather than of ordered sequences.
Thus, production rules (Immediate Dominance rules) define vertical order in
the production tree
.Xr "production tree"
only. Horizontal order in each node is restricted
through (but not necessarily completely defined by) Linear Precedence rules.
Each LP rule is considered to apply to every node; this is called the
.Df "Exhaustive Constant Partial Ordering property" \&.
%A Mary Dee Harris
%T Natural Language Processing
%I Reston Publ. Comp, Prentice Hall
%C Reston, Virg.
%D 1985
%P 368
%K done
.La
A good and slow-paced introduction to natural language processing, with a
clear algorithmic view.
Lexical analysis including look-up algorithms, phrase structure grammars
.Xr "phrase structure grammar"
(actually context-free) and semantic networks are explained
and much attention is paid to attaching semantics to the structures obtained.
%A Veronica Dahl
%A Patrick Saint-Dizier
%T Natural language understanding and logic programming
%I Elsevier Science Publ.
%C Amsterdam
%D 1985
%P 243
%K done
.La
Seventeen papers on the application of various grammar types to natural
languages.
%A Glenn Blank
%T A new kind of finite-state automaton: Register vector grammar
%B Ninth International Conference on Artificial Intelligence
%I UCLA
%D Aug 1985
%P 749-756
%K done
.La
In FS grammars, emphasis is on the states: for each state it is specified
which tokens it accepts and to which new state each token leads. In
.Df "Register-Vector grammars" "" "register-vector grammar"
(RV grammars)
.Xs "RV grammar" "register-vector grammar"
emphasis is on the tokens: for each token it is
specified which state it maps onto which new state(s). The mapping is done
through a special kind of function, as follows. The state is a (global)
vector (array) of registers (features, attributes). Each register can be
@i[on] or @i[off]. For each token there is a condition vector with elements
which can be @i[on], @i[off] or @i[mask] (= @i[ignore]); if the condition
matches the state, the token is allowed. For each token there is a result
vector with elements which can be @i[on], @i[off] or @i[mask] (= @i[copy]);
if the token is applied, the result-vector elements specify how to
construct the new state. $epsilon$-moves
.Xr "$epsilon$-move"
are incorporated by having tokens
(called @i[labels]) which have $epsilon$ for their representation.
Termination has to be programmed as a separate register.
.Lp
RV grammars are claimed to be compact and efficient for describing the FS
component of natural languages. Examples are given. Embedding is handled by
having a finite number of levels inside the state.
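The register-vector mechanism can be sketched with bit masks; the 4-register state and the encoding of the three-valued vectors as (care, value) mask pairs are illustrative choices, not Blank's notation.

```python
# Sketch of the RV-grammar state transition.  A vector over
# {on, off, mask} is encoded as two bit masks: bits set in `care` are
# tested (condition) or assigned (result); bits clear in `care` are
# ignored (condition) or copied from the old state (result).

def matches(state, cond_care, cond_value):
    """Token is allowed iff the state agrees with the condition
    on all cared-about registers."""
    return (state & cond_care) == (cond_value & cond_care)

def apply(state, res_care, res_value):
    """New state: cared-about registers come from the result vector,
    the remaining registers are copied from the old state."""
    return (state & ~res_care) | (res_value & res_care)
```

For example, a condition requiring register 0 @i[off] and register 3 @i[on] is encoded as care=0b1001, value=0b1000; a result turning register 0 @i[on] and masking the rest is care=0b0001, value=0b0001.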
%A Barbara J. Grosz
%A Karen Sparck\ Jones
%A Bonnie Lynn Webber
%T Readings in natural language processing
%I Morgan Kaufmann Publishers, Inc.
%C Los Altos, Ca. 94022
%D 1986
%P 664
%K done
.La
Selected papers on NL processing, covering syntactic models, semantic
interpretation, discourse interpretation, language action and intention, NL
generation and actual systems.
%A Walter Goshawke
%A Ian D.K. Kelly
%A J. David Wigg
%T Computer translation of natural language
%I Sigma Press
%C Wilmslow, UK
%D 1987
%P 275
%K done
.La
The book consists of three parts. 1) Overview of progress in Machine
Translation. 2) Description of the intermediate code SLUNT
.Xr "SLUNT"
(Spoken Languages Universal Numeric Translation), a stylized numeric
language-independent vehicle for semantics. 3) The International Communicator
System, a set of programs to manipulate SLUNT structures.
%A Leonard Bolc\ (Ed.)
%T Natural language parsing systems
%I Springer-Verlag
%C Berlin
%D 1987
%P 367
%K done
.La
A collection of recent papers on parsing in a natural language environment.
Among the subjects are Earley
.Xr "Earley parser"
and CYK parsers,
.Xr "CYK parser"
assigning probabilities to
ambiguous
.Xr "ambiguity"
parsings, error recovery
.Xr "error recovery"
and, of course, attaching semantics to parsings.
%A Jonathan H. Reed
%T An efficient context-free parsing algorithm based on register vector grammars
%B Third Annual IEEE Conference on Expert Systems in Government
%D 1987
%P 34-40
%K done
.La
The principles of RV grammars
.Xr "RV grammar"
(Blank [NatLang 1985])
.Au Blank NatLang 1985
are applied to CF
grammars by having a separate RV grammar for each syntactic category, each
allowing the names of syntactic categories as tokens. The Earley parsing
algorithm
.Xr "Earley parser"
is then adapted to handle these grammars. Measurements indicate
that the parser is 1 to 3 times faster on small grammars and 5 to 10 times
faster on large grammars.
%A V. Dahl
%A P. Saint-Dizier
%T Natural language understanding and logic programming, II
%I Elsevier Science Publ.
%C Amsterdam
%D 1988
%P 345
%K done
.La
Eighteen papers and two panel sessions on programs for natural language
understanding, mostly in Prolog.
.Xr "Prolog"
%A Glenn D. Blank
%T A finite and real-time processor for natural language
%J Commun. ACM
%V 32
%N 10
%D Oct 1989
%P 1174-1189
%K done
.Xr "real-time parser"
.La
Several aspects of the register-vector grammars
.Xr "register-vector grammar"
of Blank [NatLang 1985]
.Au Blank NatLang 1985
are treated and extended: notation, center-embedding (3 levels),
non-determinism through boundary-backtracking,
.Xr "backtracking"
efficient implementation.
.EQ
delim $$
.EN
.P1 "Error handling"
%A W.B. Smith
%T Error detection in formal languages
%J J. Comput. Syst. Sci.
%V 4
%N
%D Oct 1970
%P 385-405
%K done
.La
A formal paper that examines properties of recognizers that determine whether
the number of substitution errors that has occurred is bounded by some
function.
Different language classes and different levels of numbers of errors are
examined. It appears that there is little difference between languages
under a constant maximum number of errors and under a constant maximum
number of errors per block.
%A J.E. LaFrance
%T Optimization of error-recovery in syntax-directed parsing algorithms
%J ACM SIGPLAN Notices
%V 5
%N 12
%D Dec 1970
%P 2-17
%K done
.La
Floyd productions
.Xr "Floyd production"
are divided into groups, and each production in a group is tried in order.
If all productions of a group fail, error recovery
.Xr "error recovery"
takes place, depending on the type(s) of the rules in the group. Apart
from local corrections, in some cases all possible productions are
traced three symbols ahead. The result is compared with the next four input
symbols, using a set of twenty patterns, each pattern modeling a particular
kind of error.
If this fails, a FOLLOW-set recovery
.Xr "error recovery"
technique is applied.
The implications of implementing this error recovery
.Xr "error recovery"
technique in a
backtracking recursive descent
.Xr "recursive descent"
parser are discussed.
%A A.V. Aho
%A T.G. Peterson
%T A minimum-distance error-correcting parser for context-free languages
%J SIAM J. Computing
%V 1
%N 4
%D 1972
%P 305-312
%K done
.Xr "minimum distance error handling"
.La
A CF grammar is extended with error productions
.Xr "error production"
so that it will produce
$ SIGMA sup "*" $; this is effected by replacing each occurrence of a terminal
in a rule by a non-terminal that produces said terminal \(l"with 0 errors\(r"
and any amount of garbage, including $epsilon$, \(l"with 1 or more
errors\(r". The items
.Xr "Earley item"
in an Earley parser
.Xr "Earley parser"
are extended with a count,
indicating how many errors were needed to create the item. An item with error
count $k$ is added only if no similar item with a lower error count is
present already.
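A simplified sketch of the grammar extension, for a hypothetical two-letter alphabet: each terminal $t$ is replaced by a non-terminal that derives $t$ with 0 errors, or some other terminal or $epsilon$ with 1 error (replacement, deletion). The full construction also covers insertion errors, omitted here.

```python
# Simplified sketch of error productions in the style of Aho & Peterson.
# For terminal t, build the weighted right-hand sides of the non-terminal
# E_t that replaces each occurrence of t in the grammar.

ALPHABET = ['a', 'b']          # illustrative alphabet

def error_productions(t):
    """Return (right-hand side, error count) pairs for E_t."""
    prods = [((t,), 0)]                                # t itself: 0 errors
    prods += [((u,), 1) for u in ALPHABET if u != t]   # replacement: 1 error
    prods.append(((), 1))                              # epsilon: deletion
    return prods
```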
%A C.J. Burgess
%T Compile-time error diagnostics in syntax-directed compilers
%J Computer J.
%V 15
%N 4
%D 1972
%P 302-307
%K done
.La
This paper attempts to define error diagnostics formally by incorporating them
as error productions
.Xr "error production"
in the grammar, and examines the extent to which the
positioning of these productions and messages in the grammar can be done
automatically. For left-factored grammars
.Xr "left-factoring"
it appears to be easy.
%A E.G. James
%A D.P. Partridge
%T Adaptive correction of program statements
%J Commun. ACM
%V 16
%N 1
%D Jan 1973
%P 27-37
%K done
.La
Discusses an error correction
.Xr "error correction"
technique that uses artificial intelligence and approximate pattern matching
techniques, basing corrections on built-in statistics, which are adapted
continuously.
%A R.W. Conway
%A T.R. Wilcox
%T Design and implementation of a diagnostic compiler for PL/I
%J Commun. ACM
%V 16
%N 3
%D 1973
%P 169-179
%K done
.La
Describes a diagnostic PL/C
.Xr "PL/C"
compiler, using a systematic method for finding
places where repair is required, but the repair strategy for each of these
places is chosen by the implementor. The parser uses a separable transition
diagram
.Xr "transition diagram"
technique (see Conway [Misc 1963]).
.Au "Conway, M.E." Misc 1963
The error messages detail the error found and the repair chosen.
%A G. Lyon
%T Syntax-directed least-errors analysis for context-free languages: a practical approach
%J Commun. ACM
%V 17
%N 1
%D Jan 1974
%P 3-14
%K done
.La
Discusses a least-errors analyser,
.Xr "least-error correction"
based on Earley's parser
.Xr "Earley parser"
without look-ahead.
The Earley items
.Xr "Earley item"
are extended with an error count, and the parser is started
with items for the start of each rule, in each state set. Earley's scanner is
extended as follows: for all items with the dot
.Xr "dot"
in front of a terminal,
the item is added to the same state set with an incremented error count and
the dot after the terminal (this represents an insertion of the terminal);
if the terminal is not equal to the input symbol associated with the state
set, add the item to the next state set with an incremented error count and
the dot after the terminal (this represents a replacement); add the item as it
is to the next state set, with an incremented error count (this represents a
deletion). The completer does its work as in the Earley parser, but also
updates error counts.
Items with the lowest error counts are processed first, and
when a state set contains an item, the same item is only added if it has
a lower error count.
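The extended scanner step can be sketched for a single item; the tuple representation of items is an assumption for illustration, not Lyon's.

```python
# Sketch of Lyon's extended scanner for one Earley item whose dot stands
# before terminal t.  An item is represented as (rule, dot, errors); sym
# is the input symbol associated with the current state set.  Returns the
# items to add to the same state set and to the next state set.

def scan(item, t, sym):
    rule, dot, errs = item
    same_set = [(rule, dot + 1, errs + 1)]            # insertion of t
    next_set = []
    if t == sym:
        next_set.append((rule, dot + 1, errs))        # ordinary match
    else:
        next_set.append((rule, dot + 1, errs + 1))    # replacement
    next_set.append((rule, dot, errs + 1))            # deletion of sym
    return same_set, next_set
```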
%A R.A. Wagner
%T Order-$n$ correction for regular languages
%J Commun. ACM
%V 17
%N 5
%D May 1974
%P 265-268
%K done
.Xr "regular language"
.La
Presents an O($n$) algorithm which, given a string and a finite-state
automaton,
.Xr "finite-state automaton"
can correct the string to an acceptable one with a minimum number
of edit operations.
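Such a corrector can be sketched as dynamic programming over the automaton's states; the state names, transition table and cost bookkeeping below are an illustrative reconstruction, not Wagner's own formulation.

```python
# Sketch of minimum-edit-distance correction against a DFA.
# dist[q] = fewest edit operations so that some string reaching state q
# matches the input consumed so far.  Each input symbol costs
# O(|states| * |transitions|), i.e. O(n) overall in the input length.

def min_corrections(trans, start, accepting, inp):
    INF = float('inf')
    states = {q for q, _ in trans} | set(trans.values()) | {start}

    def relax_insertions(dist):
        # insertion: follow any transition without consuming input, cost 1
        changed = True
        while changed:
            changed = False
            for (q, a), q2 in trans.items():
                if dist.get(q, INF) + 1 < dist.get(q2, INF):
                    dist[q2] = dist[q] + 1
                    changed = True
        return dist

    dist = relax_insertions({q: (0 if q == start else INF) for q in states})
    for c in inp:
        new = {q: dist[q] + 1 for q in states}        # deletion of c
        for (q, a), q2 in trans.items():              # match / substitution
            cost = dist[q] + (0 if a == c else 1)
            new[q2] = min(new[q2], cost)
        dist = relax_insertions(new)
    return min(dist[q] for q in accepting)
```

For the two-transition DFA accepting only "ab", the input "b" costs one correction (insert "a"), and the empty input costs two.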
%A C. Ghezzi
%T LL(1) grammars supporting an efficient error handling
%J Inform. Process. Lett.
%V 3
%N 6
%D July 1975
%P 174-176
%K done
.La
Faced with an erroneous token in an environment where empty productions can
occur, a strong-LL(1)
.Xr "strong-LL(1)"
parser will often do some $epsilon$-moves
.Xr "$epsilon$-move"
before reporting the error; this makes subsequent error recovery
.Xr "error recovery"
more difficult.
This undesirable behaviour can be avoided by splitting each rule into a
number of copies, one for each set of tokens it may be followed by.
An efficient algorithm for this transformation
.Xr "transformations on grammars"
on the grammar is supplied. The
resulting grammar is of type
.Df "CRLL(1)" .
%A Susan L. Graham
%A Steven P. Rhodes
%T Practical syntactic error recovery
%J Commun. ACM
%V 18
%N 11
%D Nov 1975
%P 639-650
%K done
.La
See Section 10.6.1 for a discussion of this error recovery
.Xr "error recovery"
method.
%A J.-P. L\('evy
%T Automatic correction of syntax errors in programming languages
%J Acta Inform.
%V 4
%N
%D 1975
%P 271-292
%K done
.La
.Xr "syntax error"
When a bottom-up parser encounters an error, part of the stack is pushed
back into the input stream (for instance, until a
.Df "beacon token"
is on the top of the stack). Starting from the new state now uncovered on
the stack, all possible parsings of the input allowing at most $n$ errors
are constructed, using breadth-first search
.Xr "breadth-first search"
and
Lyon's scheme [ErrHandl 1974],
.Au Lyon ErrHandl 1974
until all parsers are in the same state or all parsers need to assume
an $n+1$-st error. In the latter case the input is rejected, otherwise one
parse is chosen and parsing continues.
%A S. Feyock
%A P. Lazarus
%T Syntax-directed correction of syntax errors
%J Softw. Pract. Exper.
%V 6
%N 2
%D 1976
%P 207-219
%K done
.La
.Xr "syntax error"
When an error is detected,
.Xr "error detection"
the following error correction
.Xr "error correction"
strategy is applied:
.Ip 1.
A set of correction strings is generated (delete current symbol, insert
symbol, replace symbol, interchange with next symbol).
.Ip 2.
This set is filtered (correction syntactically and semantically acceptable?).
.Ip 3.
If there is more than one element left, use a heuristic to determine the
"best" one. If only one is left, this is the one. If none are left, back-up
one input symbol, and go back to step 1.
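Step 1 can be sketched as follows for a hypothetical token stream; the filtering and heuristics of steps 2 and 3 are left abstract.

```python
# Sketch of correction-string generation in the style of Feyock & Lazarus:
# given the tokens and the error position i, produce the candidates
# obtained by deleting, inserting, replacing, or transposing.

def candidates(tokens, i, vocabulary):
    toks = list(tokens)
    out = [toks[:i] + toks[i+1:]]                            # delete current
    out += [toks[:i] + [v] + toks[i:] for v in vocabulary]   # insert before
    out += [toks[:i] + [v] + toks[i+1:]                      # replace
            for v in vocabulary if v != toks[i]]
    if i + 1 < len(toks):                                    # interchange
        out.append(toks[:i] + [toks[i+1], toks[i]] + toks[i+2:])
    return out
```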
%A David Gries
%T Error recovery and correction
%B Compiler Construction, an Advanced Course, Second Edition
%E F.L. Bauer & J. Eickel
%I Springer-Verlag
%C New York
%D 1976
%P 627-638
%K done
.La
Mostly an annotated bibliography containing some 35 entries, not all on
error handling.
%A J. Ciesinger
%T Generating error recovery in a compiler generating system
%B GI-4 Fachtagung \(:uber Programmiersprachen
%S Lecture Notes in Computer Science #34
%E H.-J. Schneider & M. Nagl
%I Springer-Verlag
%C New York
%D 1976
%P 185-193
%K done
.La
Proposes an error recovery
.Xr "error recovery"
method using pairs of elements of the alphabet,
.Xr "alphabet"
called \(l"braces\(r", which are used to select part of the input that
contains the error and select a goal (non-terminal) to which this part must
be reduced.
Some conditions are derived which must be fulfilled by the braces, and it is
shown that the braces can be computed automatically, at parser generation
time.
%A K.S. Fu
%T Error-correcting parsing for syntactic pattern recognition
%B Data Structure, Computer Graphics and Pattern Recognition
%E A. Klinger et al.
%I Academic Press
%C New York
%D 1977
%P 449-492
%K done
.La
Discusses the least-errors analyser
.Xr "least-error correction"
of Aho and Peterson [ErrHandl 1972]
.Au Aho ErrHandl 1972
.Au Peterson ErrHandl 1972
in the context of stochastic grammars.
.Xr "stochastic grammar"
Least-errors then becomes maximum likelihood. Many examples are given.
%A S.Y. Lu
%A K.S. Fu
%T Stochastic error-correcting syntax analysis for recognition of noisy patterns
%J IEEE Trans. Comput.
%V 26
%N 12
%D 1977
%P 1268-1276
%K done
.La
This paper models deletion, insertion, and replacement errors into a
stochastic deformation model: each error has a probability associated
with it. Then, the model is incorporated into the stochastic context-free
grammar,
.Xr "stochastic grammar"
and an Earley parser
.Xr "Earley parser"
is modified to look for the most likely error correction.
.Xr "error correction"
This proves to be inefficient, so a sequential
classification algorithm (SCA) is used. This SCA uses a stopping rule
that tells when it has seen enough terminals to make a decision.
The authors are interested in pattern recognition rather than in parse trees.
%A George Poonen
%T Error recovery for LR($k$) parsers
%B Inf. Process. 77
%E Bruce Gilchrist
%I IFIP, North Holland Publ. Co.
%C Amsterdam
%D Aug 1977
%P 529-533
%K done
.Xr "LR($k$)"
.La
A special token,
.Xr "error token"
ERRORMARK, is added to the grammar, to represent any
incorrect stretch of input. When encountering an error in an LR(1)
.Xr "LR(1)"
parser, scan the stack for states having a shift on ERRORMARK, collect all
shift tokens of these states into an acceptable-set,
.Xr "acceptable-set"
skip the input until an acceptable token is found and unstack until the
corresponding accepting state
.Xr "accepting state"
is uncovered.
%A Jean E. Musinski
%T Lookahead recall error recovery for LALR parsers
%J ACM SIGPLAN Notices
%V 12
%N 10
%D Oct 1977
%P 48-60
%K done
.La
Shows how the error recovery
.Xr "error recovery"
of a specific LALR(1)
.Xr "LALR(1)"
parser can be improved by
what amounts to the restricted decomposition of symbols on the stack, to
increase the acceptable set.
%A E.-W. Dieterich
%T Parsing and syntactic error recovery for context-free grammars by means of coarse structures
%B Automata, Languages and Programming
%S Lecture Notes in Computer Science #52
%E A. Salomaa & M. Steinby
%I Springer-Verlag
%C Berlin
%D 1977
%P 492-503
%K done
.La
Proposes a two-level parsing process that separates the coarse structures from
the rest of the grammar.
These coarse structures consist of characteristic brackets, for instance
\fBbegin\fP and \fBend\fP.
Error recovery
.Xr "error recovery"
can then also be applied to these two levels.
%A S. Sippu
%A E. Soisalon-Soininen
%T On defining error recovery in context-free parsing
%B Automata, Languages and Programming
%S Lecture Notes in Computer Science #52
%E A. Salomaa & M. Steinby
%I Springer-Verlag
%C Berlin
%D 1977
%P 492-503
%K done
.La
Uses a grammatical transformation
.Xr "transformations on grammars"
that leads to an LR grammar that incorporates
certain replacement, deletion, or insertion errors.
%A Charles Wetherell
%T Why automatic error correctors fail
%J Comput. Lang. (Elmsford, NY)
%V 2
%N
%D 1977
%P 179-186
%K done
.La
Shows that there is no hope of building efficient automatic syntactic
error correctors
.Xr "error correction"
which can handle large classes of errors perfectly.
The author argues that parser writers should instead study the error
patterns and work for efficient correction of common errors. Language
designers must concentrate on ways to make languages less susceptible to
common errors.
%A D.A. Turner
%T Error diagnosis and recovery in one pass compilers
%J Inform. Process. Lett.
%V 6
%N
%D 1977
%P 113-115
%K done
.La
Proposes an extremely simple(minded) error recovery
.Xr "error recovery"
method for recursive descent
.Xr "recursive descent"
parsers: when an error occurs, the parser enters a recovering state.
While in this recovering state, error messages are inhibited. Apart from
that, the parser proceeds until it requires a definite symbol. Then, symbols
are skipped until this symbol is found or the end of the input is reached.
Because this method can result in a lot of skipping, some fine-tuning can
be applied.
%A Thomas J. Pennello
%A Frank DeRemer
%T A forward move algorithm for LR error recovery
%B Fifth Annual ACM Symposium on Principles of Programming Languages
%D Jan 1978
%P 241-254
%K done
.La
Refer to Graham and Rhodes [ErrHandl 1975].
.Au Graham ErrHandl 1975
.Au Rhodes ErrHandl 1975
Backward moves
.Xr "backward move"
are found to be
detrimental to error recovery.
.Xr "error recovery"
The extent of the forward move is determined
as follows. At the error, an LALR(1)
.Xr "LALR(1)"
parser is started in a state including
@i[all] possible items. The thus extended automaton is run until it wants to
reduce past the error detection point.
.Xr "error detection point"
The resulting right context is used in error correction.
.Xr "error correction"
An algorithm for the construction of a reasonably sized extended LALR(1)
.Xr "LALR(1)"
table is given.
%A Kuo-Chung Tai
%T Syntactic error correction in programming languages
%J IEEE Trans. Softw. Eng.
%V SE-4
%N 5
%D 1978
%P 414-425
%K done
.La
Presents a technique for syntactic error correction
.Xr "error correction"
called
.Df "pattern mapping" .
Patterns model the editing of the input string at the error detection point.
.Xr "error detection point"
These patterns are constructed by the parser developer.
The patterns are sorted by a criterion called the
@i[minimum distance correction with k correct look-ahead symbols],
.Xr "minimum distance error handling"
and whenever correction is required, the first matching pattern is used.
If no such pattern is found, error correction
.Xr "error correction"
fails and another error recovery
.Xr "error recovery"
method must be applied.
%A M. Dennis Mickunas
%A John A. Modry
%T Automatic error recovery for LR parsers
%J Commun. ACM
%V 21
%N 6
%D June 1978
%P 459-465
%K done
.La
When an error is encountered, a set of provisional parsings of the beginning
of the rest of the input (so-called @i[condensations])
.Xr "condensation phase"
are constructed:
for each state a parsing is attempted and those that survive according to
certain criteria are accepted. This yields a set of target states. Now the
stack is \(l"frayed\(r" by partly or completely undoing any reduces; this
yields a set of source states. Attempts are made to connect a source state
to a target state by inserting or deleting tokens. Careful rules are given.
%A J. Lewi
%A K. de\ Vlaminck
%A J. Huens
%A M. Huybrechts
%T The ELL(1) parser generator and the error-recovery mechanism
%J Acta Inform.
%V 10
%P 209-228
%D 1978
%K done
.La
Presents a detailed recursive descent
.Xr "recursive descent"
parser generation scheme for ELL(1)
.Xr "extended LL(1)"
grammars, and also presents an error recovery
.Xr "error recovery"
method based on so-called
.Df "synchronization triplets" "" "synchronization triplet"
($a$,$b$,$A$). $a$ is a terminal from FIRST($A$),
.Xr "FIRST set"
$b$ is a terminal from
LAST($A$).
The parser operates either in parsing mode or in error mode.
It starts in parsing mode, and proceeds until an error occurs. Then, in error
mode, symbols are skipped until either an end-marker $b$ is found where $a$
is the last encountered corresponding begin-marker, in which case parsing mode
resumes, or a begin-marker $a$ is found, in which case $A$ is invoked in
parsing mode. As soon as $A$ is accepted, error-mode is resumed.
The success of the method depends on careful selection of synchronization
triplets.
.Xr "synchronization triplet"
%A G. David Ripley
%T A simple recovery-only procedure for simple precedence parsers
%J Commun. ACM
%V 21
%N 11
%D Nov 1978
%P 928-930
%K done
.Xr "simple precedence"
.La
When an error (character-pair,
.Xr "character-pair error"
reduction
.Xr "reduction error"
or stackability)
.Xr "stackability error"
is encountered, the error is reported and the contents of the stack are
replaced by the one error symbol
.Xr "error symbol"
@t[??], which has the relation \*(d<
.Xr "precedence relation"
to all other symbols.
Then the parser is restarted.
Subsequent attempts to reduce across the error symbol
.Xr "error symbol"
just result in a reduction to the error symbol;
.Xr "error symbol"
no semantic routine is called.
%A Joachim Ciesinger
%T A bibliography of error-handling
%J ACM SIGPLAN Notices
%V 14
%N 1
%D Jan 1979
%P 16-26
%K done
.La
Around 90 literature references from 1963-1978.
%A C.N. Fischer
%A K.-C. Tai
%A D.R. Milton
%T Immediate error correction in strong LL(1) parsers
%J Inform. Process. Lett.
%V 8
%N 5
%D June 1979
%P 261-266
%K done
.Xr "immediate error detection property"
.La
A strong-LL(1)
.Xr "strong-LL(1)"
parser will sometimes perform some
incorrect parsing actions, connected with $epsilon$-matches, when confronted
with an erroneous input symbol, before signalling an error; this impedes
subsequent error correction.
.Xr "error correction"
A subset of the LL(1)
.Xr "LL(1)"
grammars is defined, the
.Df "nullable LL(1) grammars" , "nullable"
in which rules can only produce $epsilon$ directly, not indirectly. A special
routine, called before an $epsilon$-match is done, hunts down the stack to see
if the input symbol will be matched or predicted by something deeper on the
stack; if not, an error is signalled immediately. An algorithm to convert any
strong-LL(1)
.Xr "strong-LL(1)"
grammar into a non-nullable strong-LL(1) grammar is given. (See
also Mauney and Fischer [ErrHandl 1981]).
.Au Mauney ErrHandl 1981
.Au Fischer ErrHandl 1981
%A Susan L. Graham
%A Charles B. Haley
%A William N. Joy
%T Practical LR error recovery
%J ACM SIGPLAN Notices
%V 14
%N 8
%D Aug 1979
%P 168-175
%K done
.La
A considerable number of techniques is integrated. First-level error recovery
.Xr "error recovery"
does forward-move,
.Xr "forward move"
restricting the possibilities to one correction
only, using a cost function. The backward move
.Xr "backward move"
is controlled by error tokens
.Xr "error token"
in the grammar. The second level does panic mode
.Xr "panic mode"
error recovery
.Xr "error recovery"
using \(l"beacon tokens\(r";
.Xr "beacon token"
disaster is prevented by dividing the grammar into
sections (like \(l"declarations\(r" or \(l"statement\(r"), which the error
recovery
.Xr "error recovery"
will not leave.
%A Ajit B. Pai
%A Richard B. Kieburtz
%T Global context recovery: a new strategy for syntactic error recovery by table-driven parsers
%J ACM Trans. Prog. Lang. Syst.
%V 2
%N 1
%D Jan 1980
%P 18-41
%K done
.Xr "global error handling"
.La
.Xr "table-driven"
A
.Df "fiducial" "" "fiducial symbol"
symbol is a terminal symbol that has the property that if it occurs on the
top of the stack of an LL(1)
.Xr "LL(1)"
parser, it will to a large degree determine the
rest of the stack. Two more explicit definitions are given, the most
practical being: a terminal symbol that occurs only once in the grammar, in
a rule for a non-terminal that occurs only once in the grammar, etc. Now, if
an error occurs that cannot be repaired locally, the input is discarded
until a fiducial symbol
.Xr "fiducial symbol"
$z$ appears. Then the stack is popped until $z$, or
a non-terminal $N$ that produces $z$, appears. In the latter case $N$ is
\(l"developed\(r" until $z$ appears. Parsing can now continue. If the stack
gets empty in this process, the start symbol is pushed anew; it will produce
$z$.
.Lp
The paper starts with a very readable introduction to error recovery
.Xr "error recovery"
and a
good local error correction
.Xr "local error handling"
algorithm.
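.Lp
The recovery loop described above can be sketched as follows; all names are hypothetical, the @t[produces] map lists which terminals each non-terminal can derive, and \(l"developing\(r" $N$ until $z$ appears is abbreviated to replacing $N$ by $z$ directly.

```python
# Hypothetical sketch of fiducial-symbol recovery: skip input to a fiducial
# symbol z, then pop the stack until z (or a nonterminal producing z) appears.

def recover(stack, tokens, fiducial, produces, start):
    while tokens and tokens[0] not in fiducial:
        tokens.pop(0)                      # discard input up to a fiducial z
    if not tokens:
        return stack, tokens               # no fiducial symbol left: give up
    z = tokens[0]
    while stack and stack[-1] != z and z not in produces.get(stack[-1], ()):
        stack.pop()                        # pop until z or an N producing z
    if not stack:
        stack.append(start)                # empty stack: restart; the start
                                           # symbol produces z by assumption
    if stack[-1] != z:
        stack[-1] = z                      # "develop" N until z appears
    return stack, tokens                   # parsing can now continue

stack, rest = recover(['$', 'S', 'x'], ['?', '?', ';', 'a'],
                      fiducial={';'}, produces={'S': {';'}}, start='S')
```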
%A T. Krawczyk
%T Error correction by mutational grammars
%J Inform. Process. Lett.
%V 11
%N 1
%P 9-15
%D 1980
%K done
.La
Discusses an error correction
.Xr "error correction"
method that automatically extends a grammar by
adding certain mutations of grammar rules, so that input with separator and
parenthesis errors can be corrected, while retaining the LR($k$)
.Xr "LR($k$)"
grammar class.
The parser delivers the parsing in the form of a list of grammar rules used;
the mutated rules in this list are replaced by their originals.
%A Steven Pemberton
%T Comments on an error-recovery scheme by Hartmann
%J Softw. Pract. Exper.
%V 10
%N 3
%D 1980
%P 231-240
%K done
.La
Error recovery
.Xr "error recovery"
in a recursive descent
.Xr "recursive descent"
parser is done by passing to each
parsing routine a set of \(l"acceptable\(r" symbols. Upon encountering an
error, the parsing routine will insert any directly required terminals and
then skip input until an acceptable symbol is found. Rules are given and
refined on what should be in the acceptable set for certain constructs in
the grammar.
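.Lp
A minimal sketch of the scheme, for the hypothetical toy rule @t[stmt \*(-> 'if' expr 'then' stmt | id] with @t[expr \*(-> id]; the acceptable sets passed around are illustrative, not Pemberton's rules.

```python
# Each parsing routine receives a set of "acceptable" symbols; on an error
# the required terminal is inserted, then input is skipped until acceptable.

class Parser:
    def __init__(self, tokens):
        self.toks = tokens + ['$']
        self.pos = 0
        self.log = []

    def peek(self):
        return self.toks[self.pos]

    def expect(self, sym, acceptable):
        if self.peek() == sym:
            self.pos += 1
            return
        self.log.append('inserted ' + sym)        # insert required terminal
        while self.peek() not in acceptable | {sym, '$'}:
            self.log.append('skipped ' + self.peek())
            self.pos += 1                         # skip to acceptable symbol
        if self.peek() == sym:
            self.pos += 1

    def stmt(self, acceptable):
        if self.peek() == 'if':
            self.expect('if', acceptable | {'id', 'then'})
            self.expect('id', acceptable | {'then'})      # expr -> id
            self.expect('then', acceptable | {'id', 'if'})
            self.stmt(acceptable)
        else:
            self.expect('id', acceptable)

p = Parser(['if', 'id', 'id'])     # the 'then' is missing
p.stmt({'$'})
print(p.log)
```

On the erroneous stream above the parser inserts the missing @t[then] and continues without skipping anything.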
%A Johannes R\(:ohrich
%T Methods for the automatic construction of error correcting parsers
%J Acta Inform.
%V 13
%N 2
%D Feb 1980
%P 115-139
%K done
.La
See Section 10.7.3 for a discussion of this error recovery method.
.Xr "error recovery"
The paper also discusses implementation of this method in LL($k$)
.Xr "LL($k$)"
and
LR($k$)
.Xr "LR($k$)"
parsers, using so-called
@i[deterministic continuable stack automata].
%A Seppo Sippu
%A Eljas Soisalon-Soininen
%T A scheme for LR($k$) parsing with error recovery, Part I: LR($k$) parsing/Part II: Error recovery/Part III: Error correction
%J Intern. J. Comput. Math.
%V A8
%N
%D 1980
%P 27-42/107-119/189-206
%K done
.La
A thorough mathematical theory of non-deterministic and deterministic
LR($k$)-like
.Xr "LR($k$)"
parsers
.Xr "deterministic parser"
(which subsumes SLR($k$)
.Xr "SLR($k$)"
and LALR($k$))
.Xr "LALR($k$)"
is given. These parsers
are then extended with error productions
.Xr "error production"
such that all errors that are at
least $k$ tokens apart are corrected. It should be noted that the resulting
parsers are almost certainly non-deterministic.
%A C.N. Fischer
%A D.R. Milton
%A S.B. Quiring
%T Efficient LL(1) error correction and recovery using only insertions
%J Acta Inform.
%V 13
%N 2
%D 1980
%P 141-154
%K done
.La
See Section 10.7.4 for a discussion of this error recovery
.Xr "error recovery"
method.
%A Kuo-Chung Tai
%T Predictors of context-free grammars
%J SIAM J. Computing
%V 9
%N 3
%D Aug 1980
%P 653-664
%K done
.La
Author's abstract: \(l"A
.Df predictor
of a context-free grammar $G$ is a substring of a sentence in $L(G)$ which
determines unambiguously the contents of the parse stack immediately before
(in top-down parsing) or after (in bottom-up parsing) symbols of the
predictor are processed. Two types of predictors are defined, one for
bottom-up parsers, one for top-down parsers. Algorithms for finding
predictors are given and the possible applications of predictors are
discussed.\(r" Predictors are a great help in error recovery.
.Xr "error recovery"
%A C.N. Fischer
%A J. Mauney
%T On the role of error productions in syntactic error correction
%J Comput. Lang. (Elmsford, NY)
%V 5
%N
%D 1980
%P 131-139
%K done
.La
Presents a number of examples in a Pascal
.Xr "Pascal"
parser
illustrating the use of error productions
.Xr "error production"
in
cases where an automatic error corrector
.Xr "error correction"
would not find the right continuation.
.Xr "continuation"
Error productions
.Xr "error production"
can be added to the grammar regardless of the error corrector.
%A Jon Mauney
%A Charles N. Fischer
%T An improvement to immediate error detection in strong LL(1) parsers
%J Inform. Process. Lett.
%V 12
%N 5
%D 1981
%P 211-212
%K done
.Xr "immediate error detection property"
.La
.Xr "strong-LL(1)"
The technique of Fischer, Tai and Milton [ErrHandl 1979]
.Au Fischer ErrHandl 1979
.Au Tai ErrHandl 1979
.Au Milton ErrHandl 1979
is extended to all LL(1)
.Xr "LL(1)"
grammars by having the special routine which is called before an
$epsilon$-match is done do conversion to non-nullable
.Xr "nullable grammar"
on the fly. Linear time
.Xr "linear time dependency"
dependency is preserved by setting a flag when the test succeeds, clearing it
when a symbol is matched and by not performing the test if the flag is set:
this way the test will be done at most once for each symbol.
%A Stuart O. Anderson
%A Roland C. Backhouse
%T Locally least-cost error recovery in Earley's algorithm
%J ACM Trans. Prog. Lang. Syst.
%V 3
%N 3
%D July 1981
%P 318-347
%K done
.Xr "Earley parser"
.Xr "locally least-cost error recovery"
.La
Parsing and error recovery
.Xr "error recovery"
are unified so that error-free parsing is
zero-cost error recovery.
.Xr "error recovery"
The information already present in the Earley items
.Xr "Earley item"
is utilized cleverly to determine possible continuations.
.Xr "continuation"
From these and from the input, the locally least-cost error recovery
.Xr "locally least-cost error recovery"
can be calculated,
albeit at considerable expense. Detailed algorithms are given.
%A Rodney W. Topor
%T A note on error recovery in recursive descent parsers
%J ACM SIGPLAN Notices
%V 17
%N 2
%D Feb 1982
%P 37-40
%K done
.La
Followset error recovery
.Xr "error recovery"
is implemented in a recursive-descent
.Xr "recursive descent"
parser by
having one parse-and-error-recovery routine which is passed the actual
routine for a rule, its FIRST set
.Xr "FIRST set"
and its FOLLOW set. This reduces the size
of the parser considerably and prevents clerical errors in hand-written
parsers.
Also see subsequent letter by C.B. Boyd, vol. 17, no. 8, p. 101-103.
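.Lp
A minimal sketch of the arrangement (all names are mine, not Topor's): one shared routine does the followset recovery and is handed the hand-written routine for a rule together with that rule's FIRST and FOLLOW sets.

```python
# The shared parse-and-error-recovery routine: skip to a plausible symbol,
# then call the real parsing routine only if it can succeed.

def parse(rule, first, follow, state):
    tokens, errors = state
    if tokens[0] not in first:
        errors.append('unexpected ' + tokens[0])
        while tokens[0] not in first | follow | {'$'}:
            tokens.pop(0)                  # skip until a plausible symbol
    if tokens[0] in first:
        rule(state)                        # now safe to call the real routine

def number(state):                         # the hand-written routine for one rule
    state[0].pop(0)                        # match a digit token

state = (['?', '7', '+', '$'], [])
parse(number, {'7', '8'}, {'+'}, state)
```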
%A Michael G. Burke
%A Gerald A. Fisher
%T A practical method for syntactic error diagnosis and repair
%J ACM SIGPLAN Notices
%V 17
%N 6
%D June 1982
%P 67-78
%K done
.La
See Burke and Fisher [ErrHandl 1987].
.Au Burke ErrHandl 1987
.Au Fisher ErrHandl 1987
%A Jon Mauney
%A Charles N. Fischer
%T A forward-move algorithm for LL and LR parsers
%J ACM SIGPLAN Notices
%V 17
%N 6
%D June 1982
%P 79-87
%K done
.La
Upon finding an error, a Graham, Harrison and Ruzzo general CF parser
[CF 1980]
.Au Graham CF 1980
.Au Harrison CF 1980
.Au Ruzzo CF 1980
is started to do a forward move
.Xr "forward move"
analysis using cost functions. The
general CF parser is run over a restricted piece of the input, allowing
.Df "regional least-cost error correction" \&.
%A F. Jalili
%A J.H. Gallier
%T Building friendly parsers
%B 9th Annual ACM Symposium on Principles of Programming Languages
%I ACM
%C New York
%D 1982
%P 196-206
%K done
.La
An interactive LALR(1)
.Xr "LALR(1)"
parser is described that uses forward move
.Xr "forward move"
error recovery
.Xr "error recovery"
to better prompt the user with possible corrections. The
interactions of the interactive parsing and the forward move
.Xr "forward move"
algorithm are
described in fairly great detail.
%A S.O. Anderson
%A R.C. Backhouse
%T An alternative implementation of an insertion-only recovery technique
%J Acta Inform.
%V 18
%D 1982
%P 289-298
%K done
.La
Argues that the FMQ error corrector
.Xr "FMQ error correction"
of Fischer, Milton and Quiring
[ErrHandl 1980]
.Au Fischer ErrHandl 1980
.Au Milton ErrHandl 1980
.Au Quiring ErrHandl 1980
does not have to compute a complete insertion. It is sufficient to compute the
first symbol. If $w ~ = ~ w sub 1 w sub 2 ... w sub n$ is an optimal
insertion for the error $a$ following prefix $u$, then $w sub 2 ... w sub n$
is an optimal insertion for the error $a$ following prefix $u w sub 1$.
Also, immediate error detection
.Xr "immediate error detection property"
is not necessary. Instead, the error corrector
.Xr "error correction"
is called for every symbol, and returns an empty insertion if the symbol
is correct.
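.Lp
The prefix property can be illustrated with a toy terminals-only prediction stack (a simplification of this sketch, not of the paper; the error symbol is assumed to occur on the stack): computing just the first insertion symbol and repeating yields the whole optimal insertion.

```python
# If w1 w2 ... wn is optimal after prefix u, then w2 ... wn is optimal
# after u w1, so only the first symbol ever needs to be computed.

def first_insertion_symbol(stack, a):
    """The next symbol to insert so that a becomes acceptable; None if done."""
    return None if stack[-1] == a else stack[-1]

def correct(stack, a):
    inserted = []
    while (w := first_insertion_symbol(stack, a)) is not None:
        inserted.append(w)                 # insert w1 ...
        stack.pop()                        # ... the parser matches it, and the
    return inserted                        # rest of the insertion follows suit

# ')' and 'end' are still open when ';' arrives: the insertion ') end'
# emerges one symbol at a time.
print(correct(['$', ';', 'end', ')'], ';'))
```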
%A S.O. Anderson
%A R.C. Backhouse
%A E.H. Bugge
%A C.P. Stirling
%T An assessment of locally least-cost error recovery
%J Computer J.
%V 26
%N 1
%D 1983
%P 15-24
%K done
.La
Locally least-cost error recovery
.Xr "locally least-cost error recovery"
consists of a mechanism for editing the
next input symbol at least cost, where the cost of each edit operation is
determined by the parser developer.
The method is measured against Wirth's followset method
(see Stirling [ErrHandl 1985]) and compares favorably.
.Au Stirling ErrHandl 1985
%A Seppo Sippu
%A Eljas Soisalon-Soininen
%T A syntax-error-handling technique and its experimental analysis
%J ACM Trans. Prog. Lang. Syst.
%V 5
%N 4
%D Oct 1983
%P 656-679
%K done
.La
.Xr "syntax error"
Phrase level error recovery
.Xr "phrase level error handling"
replaces the top $m$ elements from the stack and
the next $n$ input tokens by a single non-terminal such that parsing can
continue. The authors explore various search sequences to determine the
values of $m$ and $n$. Local error recovery
.Xr "local error handling"
can be incorporated by introducing for each terminal @t[t] a new production
rule @t[Term_t \*(-> Empty t], and having a production rule
@t[Empty \*(-> $epsilon$].
This allows both the correction of a phrase $( n = 0, m = 0 )$ to @t[Term_t]
(i.e. insertion of @t[t]) and of a phrase $(n,m)$ to @t[Empty] (i.e. deletion
of the phrase).
Experimental results are given.
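.Lp
The grammar extension can be sketched directly; here rules are a dict mapping each non-terminal to a list of right-hand sides (lists of symbols), with the empty list standing for $epsilon$ \(em a representation chosen for this sketch.

```python
# For every terminal t add  Term_t -> Empty t , plus one rule
# Empty -> epsilon , so phrase-level recovery can express both the
# insertion of t and the deletion of a phrase.

def augment(rules, terminals):
    rules = dict(rules)
    rules['Empty'] = [[]]                       # Empty -> epsilon
    for t in terminals:
        rules['Term_' + t] = [['Empty', t]]     # Term_t -> Empty t
    return rules

grammar = {'S': [['a', 'S', 'b'], []]}
augmented = augment(grammar, {'a', 'b'})
```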
%A K. Hammond
%A V.J. Rayward-Smith
%T A survey on syntactic error recovery and repair
%J Comput. Lang. (Elmsford, NY)
%V 9
%N 1
%D 1984
%P 51-68
%K done
.La
Divides the error recovery
.Xr "error recovery"
schemes into three classes:
.Ip 1.
local recovery
.Xr "error recovery"
schemes, such as \(l"panic mode\(r",
.Xr "panic mode"
the followset method, the FMQ
.Xr "FMQ error correction"
method (see Fischer, Milton and Quiring [ErrHandl 1980]),
.Au Fischer ErrHandl 1980
.Au Milton ErrHandl 1980
.Au Quiring ErrHandl 1980
LaFrance's pattern matching method (see LaFrance [ErrHandl 1970]),
.Au LaFrance ErrHandl 1970
and Backhouse's locally least-cost method
.Xr "locally least-cost error recovery"
(see Backhouse et al. [ErrHandl 1983]);
.Au Backhouse ErrHandl 1983
.Ip 2.
regional error recovery
.Xr "regional error handling"
schemes, such as forward/backward move
.Xr "forward move"
.Xr "backward move"
(see for instance Graham and Rhodes [ErrHandl 1975]); and
.Au Graham ErrHandl 1975
.Au Rhodes ErrHandl 1975
.Ip 3.
global error recovery
.Xr "global error handling"
.Xr "error recovery"
schemes, such as global minimum distance error recovery
.Xr "minimum distance error handling"
(see for instance Aho and Peterson [ErrHandl 1972]
.Au Aho ErrHandl 1972
.Au Peterson ErrHandl 1972
and Lyon [ErrHandl 1974]),
.Au Lyon ErrHandl 1974
and mutational grammars (see for instance Krawczyk [ErrHandl 1980]).
.Au Krawczyk ErrHandl 1980
.Lp
The paper summarizes the advantages and disadvantages of each method.
%A Michael Spenke
%A Heinz M\(:uhlenbein
%A Monika Mevenkamp
%A Friedemann Mattern
%A Christian Beilken
%T A language-independent error recovery method for LL(1) parsers
%J Softw. Pract. Exper.
%V 14
%N 11
%D Nov 1984
%P 1095-1107
%K done
.La
Presents an error recovery
.Xr "error recovery"
method using deletions and insertions. The choice
between different possible corrections is made by comparing the
cost of the insertion with the reliability of the symbol. A correction
is plausible if the reliability of the first non-skipped symbol
is larger than the insert-cost of the insertion. The correction is selected
among the plausible corrections, such that the fewest symbols are skipped.
Reliability and insert-cost of each symbol are tunable.
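.Lp
The selection rule can be sketched as follows; the representation of a candidate correction as a triple (symbols skipped, insertion, first non-skipped symbol) is an assumption of this sketch.

```python
# A correction is plausible when the reliability of its first non-skipped
# symbol exceeds the insert-cost of its insertion; among plausible
# corrections the one skipping the fewest symbols is selected.

def choose(corrections, reliability, insert_cost):
    plausible = [(skipped, ins) for skipped, ins, anchor in corrections
                 if reliability[anchor] > sum(insert_cost[s] for s in ins)]
    return min(plausible, key=lambda c: c[0], default=None)

reliability = {'end': 10, 'id': 2}              # tunable per symbol
insert_cost = {'then': 3, ';': 1}               # tunable per symbol
corrections = [(0, ['then', ';'], 'id'),        # insert 'then ;' before id
               (2, [';'], 'end')]               # skip 2 symbols, insert ';'
print(choose(corrections, reliability, insert_cost))
```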
%A Colin P. Stirling
%T Follow set error recovery
%J Softw. Pract. Exper.
%V 15
%N 3
%D March 1985
%P 239-257
%K done
.La
Describes the followset technique for error recovery:
.Xr "error recovery"
at all times there
is a set of symbols that depends on the parse stack and that will not be
skipped, called the @i[followset].
When an error occurs, symbols are skipped until one is found that is a member
of this set. Then, symbols are inserted and/or the parser state is adapted
until this symbol is legal.
In fact there is a family of error recovery
.Xr "error recovery"
(correction)
.Xr "error correction"
methods that differ
in the way the followset is determined. The paper compares several of these
methods.
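.Lp
One simple member of the family can be sketched as follows; real variants also take nullability and parser-state adaptation into account, which this sketch ignores.

```python
# The followset is computed from the parse stack as the union of the FIRST
# sets of all stack entries; a terminal contributes itself. Symbols outside
# this set are skipped on an error.

def followset(stack, first):
    acceptable = set()
    for x in stack:
        acceptable |= first.get(x, {x})
    return acceptable

first = {'S': {'a', 'b'}, 'E': {'('}}
print(followset(['$', 'S', ')', 'E'], first))
```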
%A Pyda Srisuresh
%A Michael J. Eager
%T A portable syntactic error recovery scheme for LR(1) parsers
%B Proc. 1985 ACM Comput. Sc. Conf.
%E W.D. Dominick
%I ACM
%C New Orleans
%D March 1985
%P 390-399
%K done
.Xr "LR(1)"
.La
Presents a detailed account of the implementation of an error recovery
.Xr "error recovery"
scheme
that works at four levels, each more global than the previous. The first and
the second level are local, attempting to recover from the error by editing
the symbol in front of the error detection point
.Xr "error detection point"
and the error symbol
.Xr "error symbol"
itself.
The third level uses error tokens,
.Xr "error token"
and the last level is panic mode.
.Xr "panic mode"
%A Helmut Richter
%T Noncorrecting syntax error recovery
%J ACM Trans. Prog. Lang. Syst.
%V 7
%N 3
%D July 1985
%P 478-489
%K done
.La
.Xr "syntax error"
See Section 10.8 for a discussion of this method.
The errors can be pinpointed better
by parsing backwards from the error detection point,
.Xr "error detection point"
using a reverse grammar until
again an error is found. The actual error must be in the indicated interval.
Bounded-context
.Xr "bounded-context"
grammars are conjectured to yield deterministic suffix-grammars.
.Xr "suffix-grammar"
%A Kwang-Moo Choe
%A Chun-Hyon Chang
%T Efficient computation of the locally least-cost insertion string for the LR error repair
%J Inform. Process. Lett.
%V 23
%N 6
%D 1986
%P 311-316
%K done
.La
Refer to Anderson, Backhouse, Bugge and Stirling [ErrHandl 1983]
.Au "Anderson, S.O." ErrHandl 1983
.Au Backhouse ErrHandl 1983
.Au Bugge ErrHandl 1983
.Au Stirling ErrHandl 1983
for locally least-cost error correction.
.Xr "locally least-cost error recovery"
The paper presents an efficient
implementation in LR parsers, using a formalism described by
Park, Choe and Chang [LR 1985].
.Au Park LR 1985
.Au Choe LR 1985
.Au Chang LR 1985
%A Tudor B\*(ual\*(uanescu
%A Serban Gavril\*(ua
%A Marian Gheorghe
%A Radu Nicolescu
%A Liviu Sofonea
%T On Hartmann's error recovery scheme
%J ACM SIGPLAN Notices
%V 21
%N 12
%D Dec 1986
%P 80-86
%K done
.La
More and tighter acceptable-sets
.Xr "acceptable-set"
for more grammar constructions; see Pemberton [ErrHandl 1980].
.Au Pemberton ErrHandl 1980
%A Michael G. Burke
%A Gerald A. Fisher
%T A practical method for LL and LR syntactic error diagnosis and recovery
%J ACM Trans. Prog. Lang. Syst.
%V 9
%N 2
%D April 1987
%P 164-197
%K done
.La
Traditional error recovery
.Xr "error recovery"
assumes that all tokens up to the error symbol
.Xr "error symbol"
are correct.
The article investigates the option of allowing earlier tokens to be modified.
To this end, parsing is done with two parsers, one of which is a
number of tokens ahead of the other. The first parser does no actions and
keeps enough administration to be rolled back, and the second performs the
semantic actions;
.Xr "semantic action"
the first parser will modify the input stream or stack so
that the second parser will never see an error. This device is combined with
three error repair
.Xr "error repair"
strategies: single token recovery, scope recovery and
secondary recovery. In single token recovery, the parser is rolled back and
single tokens are deleted, inserted or replaced by tokens specified by the
parser writer. In scope recovery, closers as specified by the parser
writer are inserted before the error symbol.
.Xr "error symbol"
In secondary recovery, sequences
of tokens around the error symbol
.Xr "error symbol"
are discarded. In each case, a recovery is
accepted if it allows the parser to advance a specified number of tokens
beyond the error symbol.
.Xr "error symbol"
It is reported that this technique corrects three
quarters of the normal errors in Pascal
.Xr "Pascal"
programs in the same way as a
knowledgeable human would. The effects of fine-tuning are discussed.
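.Lp
The two-parser pipeline can be caricatured as follows; a trivial \(l"letters only\(r" check stands in for real parsing and rollback, and the single-token replacement rule is a toy, not the authors' repair strategy.

```python
# The leading parser checks and repairs the token stream; the trailing
# parser performs the semantic actions and therefore never sees an error.

def leading(tokens):
    for t in tokens:
        yield t if t.isalpha() else 'id'   # repair: replace bad token by 'id'

def trailing(stream, actions):
    for t in stream:
        actions.append('act(' + t + ')')   # semantic action, done only once

acts = []
trailing(leading(['a', '?', 'b']), acts)
print(acts)
```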
%A Jon Mauney
%A Charles N. Fischer
%T Determining the extent of lookahead in syntactic error repair
%J ACM Trans. Prog. Lang. Syst.
%V 10
%N 3
%D July 1988
%P 456-469
%K done
.La
A correction of an error can be validated by trying it and parsing on until
a symbol is found with the so-called Moderate Phrase Level Uniqueness.
.Xr "phrase level error handling"
Once such a symbol is found, all minimal corrections of the error are
equivalent in the sense that after this MPLU symbol, the acceptable suffixes
will be identical.
Measurements indicate that in Pascal
.Xr "Pascal"
the distance between two such symbols is fairly short, for the most part.
%A Gordon V. Cormack
%T An LR substring parser for noncorrecting syntax error recovery
%J ACM SIGPLAN Notices
%V 24
%N 7
%D July 1989
%P 161-169
%K done
.La
.Xr "syntax error"
Presents a method to produce an LR parser for the substrings of a language
described by a bounded-context(1,1)
.Xr "bounded-context"
grammar, thereby confirming
Richter's [ErrHandl 1985]
.Au Richter ErrHandl 1985
conjecture that this can be done for BC grammars.
The resulting parser is about twice as large as an ordinary LR parser.
.EQ
delim $$
.EN
.P1 "Transformations on grammars"
.Xr "transformations on grammars"
%A J.M. Foster
%T A syntax-improving program
%J Computer J.
%V 11
%N 1
%D May 1968
%P 31-34
%K done
.La
.Xr "syntax"
The parser generator
.Xr "parser generator"
SID (Syntax Improving Device) attempts to remove LL(1)
conflicts
.Xr "LL(1) conflict"
by eliminating left-recursion,
.Xr "eliminating left recursion"
and then left-factoring,
.Xr "left-factoring"
combined with inline substitution.
If this succeeds, SID generates a parser in machine language.
%A Kenichi Taniguchi
%A Tadao Kasami
%T Reduction of context-free grammars
%J Inform. Control
%V 17
%N
%D 1970
%P 92-108
%K done
.La
Considers algorithms to reduce or minimize the number of non-terminals in a
grammar.
%A M.D. Mickunas
%A R.L. Lancaster
%A V.B. Schneider
%T Transforming LR($k$) grammars to LR(1), SLR(1) and (1,1) bounded right-context grammars
%J J. ACM
%V 23
%N 3
%D July 1976
%P 511-533
%K done
.Xr "SLR(1)"
.Xr "LR(1)"
.Xr "bounded-context"
.La
The required look-ahead of $k$ tokens is reduced to $k-1$ by incorporating
the first token of the look-ahead into the non-terminal; this requires
considerable care. The process can be repeated until $k=1$ for all LR($k$)
.Xr "LR($k$)"
grammars and even until $k=0$ for some grammars.
%A D.J. Rosenkrantz
%A H.B. Hunt
%T Efficient algorithms for automatic construction and compactification of parsing grammars
%J ACM Trans. Prog. Lang. Syst.
%V 9
%N 4
%D Oct 1987
%P 543-566
%K done
.La
Many grammar types are defined by the absence of certain conflicts: LL(1),
.Xr "LL(1) conflict"
LR(1),
.Xr "LR(1)"
operator-precedence,
.Xr "operator-precedence"
etc.
A simple algorithm is given to modify a given grammar
.Xr "modifying a grammar"
to avoid such conflicts.
Modification is restricted to the merging of non-terminals and possibly the
merging of terminals; semantic ambiguity
.Xr "ambiguity"
thus introduced will have to be cleared up by later inspection.
Proofs of correctness and applicability of the algorithm are given. The
maximal merging of terminals while avoiding conflicts is also used to reduce
grammar size.
.EQ
delim $$
.EN
.P1 "General books on parsing"
%A Peter Zilany Ingerman
%T A Syntax-Oriented Translator
%I Academic Press
%C New York
%D 1966
%P 132
%K done
.La
Readable and realistic (for that time) advice for DIY compiler construction,
in archaic terminology. Uses a backtracking LC parser improved by FIRST sets.
.Xr "FIRST set"
%A William M. McKeeman
%A James J. Horning
%A David B. Wortman
%T A Compiler Generator
%I Prentice Hall
%C Englewood Cliffs, N.J.
%D 1970
%P 527
%K done
.La
Good explanation of precedence and mixed-strategy parsing.
.Xr "mixed-strategy precedence"
Full application
to the XPL compiler.
%A Alfred V. Aho
%A Jeffrey D. Ullman
%T The Theory of Parsing, Translation and Compiling
%T Volume I: Parsing
%I Prentice Hall
%C Englewood Cliffs, N.J.
%D 1972
%P 542
%K nieuw
.La
The book describes the parts of formal languages and automata theory relevant
to parsing in a strict mathematical fashion. Since much of the pertinent
theory of parsing had already been developed in 1972, the book is still
reasonably up to date and is a veritable trove of definitions, theorems,
lemmata and proofs.
.Lp
The required mathematical apparatus is first introduced, followed by a survey
of compiler construction and by properties of formal languages. The rest of
the book confines itself to CF and regular languages.
.Xr "regular language"
.Lp
General parsing methods are treated in full: backtracking
.Xr "backtracking"
top-down and bottom-up, CYK
.Xr "CYK parser"
and Earley.
.Xr "Earley parser"
Directional non-backtracking methods are explained in detail, including
general LL($k$),
.Xr "LL($k$)"
LC($k$)
.Xr "LC($k$)"
and LR($k$),
.Xr "LR($k$)"
precedence parsing and various other approaches.
A last chapter treats several non-grammatical methods for language
specification and parsing.
.Lp
Many practical matters concerning parser construction are treated in volume II,
where the theoretical aspects of practical parser construction are covered;
recursive descent is not mentioned, though.
%A Frederick W. Weingarten
%T Translation of Computer Languages
%I Holden-Day
%C San Francisco, Calif.
%D 1973
%P 180
%K done
.La
Describes some parsing techniques in a clear and easy style. The coverage of
subjects is rather eclectic. A full backtracking
.Xr "backtracking"
top-down parser for
$epsilon$-free
.Xr "$epsilon$-free"
non-left-recursive grammars and a full backtracking bottom-up
parser for $epsilon$-free
.Xr "$epsilon$-free"
grammars are described. The author does not
explicitly forbid $epsilon$-rules,
.Xr "$epsilon$-rule"
but his internal representation of grammar
rules cannot represent them. The Earley parser
.Xr "Earley parser"
is described well, with an
elaborate example. For linear-time parsers,
.Xr "linear time dependency"
bounded-context
.Xr "bounded-context"
and precedence
are treated; a table-construction algorithm is given for precedence
.Xr "precedence table"
but not
for bounded-context.
.Xr "bounded-context"
LR($k$)
.Xr "LR($k$)"
is vaguely mentioned, LL($k$)
.Xr "LL($k$)"
not at all.
Good additional reading. Contains many algorithms and flowcharts similar to
Cohen and Gotlieb [Misc 1970].
.Au "Cohen, D.J." Misc 1970
.Au Gotlieb Misc 1970
%A R.C. Gonzales
%A M.G. Thomason
%T Syntactic Pattern Recognition
%I Addison-Wesley
%C Reading, Mass.
%D 1978
%P 283
%K done
.La
This book provides numerous examples of syntactic descriptions of objects not
normally considered subject to a syntax.
.Xr "syntax"
Examples range from simple segmented
closed curves, trees and shapes of letters, via bubble chamber events,
electronic networks, and structural formulas of rubber molecules to snow
flakes, ECGs, and fingerprints. Special attention is paid to grammars for
non-linear objects, for instance web grammars, plex grammars and shape
grammars. A considerable amount of formal language theory is covered. All
serious parsing is done using the CYK algorithm;
.Xr "CYK parser"
Earley,
.Xr "Earley parser"
LL($k$)
.Xr "LL($k$)"
and
LR($k$)
.Xr "LR($k$)"
are not mentioned. Operator-precedence,
.Xr "operator-precedence"
simple precedence
.Xr "simple precedence"
and finite automata are occasionally used. The authors are wrong in claiming
that an all-empty row in the CYK recognition matrix signals an error in the
input.
.br
Interesting chapters about
.Df "stochastic grammars" , "stochastic grammar"
i.e. grammars with probabilities attached to the production rules, and about
.Df "grammatical inference" ,
i.e. methods to derive a reasonable grammar that will produce all
sentences in a representative set $ R sup "\(pl" $ and will not produce the
sentences in a counterexample set $ R sup "\(mi" $.
%A John E. Hopcroft
%A Jeffrey D. Ullman
%T Introduction to Automata Theory, Languages, and Computation
%I Addison-Wesley
%C Reading, Mass.
%D 1979
%P 418
%K done
.La
A must for readers interested in formal language theory
and computational (im)possibilities.
%A Roland C. Backhouse
%T Syntax of Programming Languages
%I Prentice Hall
%C London
%D 1979
%P 290
%K done
.La
.Xr "syntax"
Grammars are considered in depth, as far as they are relevant to programming
languages. FS automata
.Xr "finite-state automaton"
and the parsing techniques LL and LR are treated in
detail, and supported by lots of well-explained math. Often complete and
efficient algorithms are given in Pascal.
.Xr "Pascal"
Much attention is paid to error recovery
.Xr "error recovery"
and repair,
.Xr "error repair"
especially to least-cost repairs
.Xr "locally least-cost error recovery"
and locally optimal repairs. Definitely recommended for further reading.
%A A.J.T. Davie
%A R. Morrison
%T Recursive Descent Compiling
%I Ellis Horwood Ltd.
%C Chichester
%D 1981
%P 195
%K done
.La
Well-balanced description of the design considerations that go into a
recursive descent
.Xr "recursive descent"
compiler; uses the St. Andrews University S-algol compiler
as a running example.
%A V.J. Rayward-Smith
%T A First Course in Formal Languages
%I Blackwell Scientific
%C Oxford
%D 1983
%P 123
%K done
.La
Very useful intermediate between R\('ev\('esz [Books 1985]
.Au R\\\\('ev\\\\('esz Books 1985
and Hopcroft and Ullman [Books 1979].
.Au Hopcroft Books 1979
.Au Ullman Books 1979
Quite readable (the subject permitting); simple examples; broad coverage. No
treatment of LALR, no bibliography.
%A Gy\(:orgy E. R\('ev\('esz
%T Introduction to Formal Languages
%I McGraw-Hill
%C Singapore
%D 1985
%P 199
%K done
.La
This nifty little book contains many results and elementary proofs of formal
languages, without being \(l"difficult\(r". It gives a description of the ins
and outs of the Chomsky hierarchy,
.Xr "Chomsky hierarchy"
automata, decidability and complexity of context-free language recognition,
including the
.Df "hardest CF language" .
Parsing is discussed, with descriptions of the Earley,
.Xr "Earley parser"
LL($k$)
.Xr "LL($k$)"
and LR($k$)
.Xr "LR($k$)"
algorithms, each in a few pages.
%A William A. Barrett
%A Rodney M. Bates
%A David A. Gustafson
%A John D. Couch
%T Compiler Construction: Theory and Practice
%I Science Research Associates
%C Chicago
%D 1986
%P 504
%K done
.La
A considerable part (about 50%) of the book is concerned with parsing;
formal language theory, finite-state automata, top-down and bottom-up parsing
and error recovery are covered in very readable chapters.
Only those theorems are treated that relate directly to actual parsing;
proofs are quite understandable.
The book ends with an annotated bibliography of almost 200 entries, on
parsing and other aspects of compiler construction.
%A A.V. Aho
%A R. Sethi
%A J.D. Ullman
%T Compilers: Principles, Techniques and Tools
%I Addison-Wesley
%C Reading, Mass.
%D 1986
%P 796
%K done
.La
The \(l"Red Dragon Book\(r". Excellent, UNIX-oriented treatment of compiler
construction. Even treatment of the various aspects.
%A Anton Nijholt
%T Computers and Languages: Theory and Practice
%S Studies in Computer Science and Artificial Intelligence, 4
%I North-Holland
%C Amsterdam
%D 1988
%P 482
%K done
.La
Treats in narrative form computers, natural and computer languages, and
artificial intelligence, their essentials, history and interrelationships; for
the sophisticated layperson. The account is interspersed with highly critical
assessments of the influence of the military on computers and artificial
intelligence. Much global information, little technical detail; treats parsing
in breadth but not in depth.
.EQ
delim $$
.EN
.P1 "Some books on computer science"
%A David Harel
%T Algorithms: The Spirit of Computing
%I Addison-Wesley
%C Reading, Mass.
%D 1987
%P 425
%K done
.La
Excellent introduction to the fundamentals of computer science for the
sophisticated reader.
%A Robert Sedgewick
%T Algorithms
%I Addison-Wesley
%C Reading, Mass.
%D 1988
%P 657
%K done
.La
Comprehensive, understandable treatment of many algorithms, beautifully done.
%A Jeffrey D. Smith
%T Design and Analysis of Algorithms
%I PWS-Kent Publ. Comp.
%C Boston
%D 1989
%P 447
%K done
.La
Good introductory book, treating list handling, searching, breadth-first
.Xr "breadth-first search"
and depth-first search,
.Xr "depth-first search"
dynamic programming, etc., etc.