Data Compression
Literature references and annotations by Dick Grune, dick@dickgrune.com.
Last update: Thu Dec 10 18:56:34 2009.
These references and annotations were originally intended
for personal use and are presented here only in the hope
that they may be useful to others.
There is no claim to completeness or even correctness.
Each annotation represents my understanding of the text
at the moment I wrote the annotation.
No guarantees given; comments and content criticism welcome.
D. Salomon,
"Data Compression",
Springer,
New York,
1998,
pp. 448.
Alistair Moffat,
Andrew Turpin,
"On the Implementation of MinimumRedundancy Prefix Codes",
in Data Compression Conference,
1996,
pp. 170179.
Describes in detail a modification to Huffman compression, which can be
implemented very efficiently.
Full of nifty and complicated algorithms.
R.N. Horspool,
G.V. Cormack,
"Constructing wordbased text compression algorithms",
in 2nd IEEE Data Compression Conference, Snowbird,
April 1992,
Gregory K. Wallace,
"The JPEG still picture compression standard",
Commun. ACM,
34,
#4,
April 1991,
pp. 3144.
JPEG stands for "Joint Photographic Experts Group".
The picture is divided into blocks of 8 µ 8 pixels.
Each such block is subjected to a Forward Discrete Cosine Transform, a
Fourier Transform derivative, which yields in principle 64 coefficients:
the DC component and 63 spatial harmonics, the AC components.
the blocks are treated in "reading order", from top left to bottom right.
Three forms of compression are done:
1. the DC component is replaced by its difference from the previous one;
this is lossless.
2. depending on the desired compression, the AC components are quantized
either by reduction to one of the values in a given table or by truncating
to their N most significant bits; this is lossy.
3. the resulting stream undergoes Huffman coding.
There is a completely different scheme for doing lossless JPEG compression.
It works by defining a predictor (e.g. the average of the pixel above and the
one to the left) and then recording the difference between this prediction and
the actual value, followed by the usual Huffman coding.
Didier Le Gall,
"MPEG: a video compression standard for multimedia applications",
Commun. ACM,
34,
#4,
April 1991,
pp. 4758.
In principle MPEG consists of a sequence of JPEG pictures; in practice
there is a lot of additional trickery, which has to do with interpicture
compression.
The pictures in and MPEG stream are of three types:
I pictures (I for intra), which are new, independent pictures, used to start
a new set of related pictures;
P pictures (P for predicted), which follow a given I picture and which are
specified as differences with the I picture; and
B pictures (B for Bidirectional), which come in between the I and P pictures
and which are specified as the weighted average of the surrounding I and P
pictures.
A possible sequence could be: I B B P B B P B B I ...
MPEG allows but does not require much more sophisticated motion detection
methods.
Ming Lou,
"Overview of the px64 kbit/s video coding standard",
Commun. ACM,
34,
#4,
April 1991,
pp. 6063.
Very brief description.
The official name of the coding is "CCITT Recommendation H.261 Video Format".
It consists of a hierarchical data structure, with each field consisting
of a header and a sequence of subfields, each of which has again a similar
format.
The lowest level contains the data, compressed in various ways, or may even
be omitted, if there was no change.
Timothy Bell,
Ian H. Witten,
John G. Cleary,
"Modeling for Text Compression",
ACM Computing Surveys,
21,
#4,
pp. 557591.
Dec. 1989,
G.V. Cormack,
"Data Compression",
Encyclopedia of Science and Technology Yearbook,
McGrawHill,
1988,
D.W. Jones,
"Application of splay trees to data compression",
Commun. ACM,
31,
#8,
Aug. 1988,
pp. 9961007.
G.V. Cormack,
R.N. Horspool,
"Data compression using dynamic Markov modelling",
Computer J.,
30,
#6,
Dec. 1987,
pp. 541550.
R.N. Horspool,
G.V. Cormack,
"Comments on "A locally adaptive data compression scheme"",
Commun. ACM,
30,
#9,
Sept. 1987,
pp. 792793.
I.H. Witten,
R.M. Neal,
J.G. Cleary,
"Arithmetic coding for data compression",
Commun. ACM,
30,
#6,
June 1987,
pp. 520540.
H. Tanaka,
"Data structure of Huffman codes and its application to efficient coding and decoding",
IEEE Trans. Inform. Theory,
33,
#1,
Jan. 1987,
pp. 154156.
J.L. Bentley,
D.D. Sleator,
R.E. Tarjan,
V.K. Wei,
"A locally adaptive data compression scheme",
Commun. ACM,
29,
#4,
April 1986,
pp. 320330.
R.N. Horspool,
G.V. Cormack,
"Comments on "Data compression using static Huffman codedecode tables"",
Commun. ACM,
29,
#2,
Feb. 1986,
pp. 150152.
Gordon V. Cormack,
"Data compression on a database system",
Commun. ACM,
28,
#12,
Dec. 1985,
pp. 13361342.
The exit and entry hooks for record retrieval are used to compress and
decompress the records.
Since the records are relatively small, little is known about their
contents, speed is essential, and space is at a premium, the following
drastic scheme was conceived.
Basically, each character is compressed using a Huffman table determined
by the previous character; the Huffman tables are determined offline by
sampling the data base.
Two problems had to be solved, bot relating to the size of the fixed
tables inside the routine.
The first was that the usual Huffman tables have some entries of
excessive length; these are used very infrequently but still take up
space inside the routine.
This was solved by a modification to the Huffman table generation
algorithm to limit the length of any encoding to 15 bits.
This degrades the tables a little, but not much.
The second was that 256 Huffman tables just occupy too much space,
period.
This was solved as follows.
For each pair of tables, the degradation caused by combining the tables
was computed, and the tables with the lowest degradation penalty were
combined, until the algorthm fitted.
The algorithm achieved a 65% size reduction on the records, and a 42%
reduction in database size; the difference is caused by nonrecord
material in the database.
The compression slowed down the computation by 17 %, but sped up I/O by
33%, resulting in a net gain.
R.N. Horspool,
G.V. Cormack,
"A practical method for data compression with computer applications",
in CIPS Session 84,
Calgary,
1984,
G.V. Cormack,
R.N. Horspool,
"Algorithms for adaptive Huffman codes",
Inf. Proc. Letters,
18,
#3,
1984,
pp. 159165.
John G. Cleary,
Ian H. Witten,
"Data Compression Using Adaptive Coding and Partial String Matching",
IEEE Transactions on Communications,
1984,
32,
pp. 396402.
Improves LempelZiv on two counts: records the frequency of the words in the
dictionary and adapts the code accordingly; and uses arithemtic encoding.
T.A. Welch,
"A technique for highperformance data compression",
IEEE Comput.,
17,
#6,
June 1984,
pp. 819.
A.M.M. AlHussaini,
"File compression using probabilistic grammars and LR parsing",
Ph.D. Thesis,
Loughborough,
1983,
pp. ???.
D. Severance,
"A practitioner's guide to data base compression",
Inf. Syst.,
8,
#1,
1983,
pp. 5162.
J.A. Storer,
T.G. Szymanski,
"Data compression via textual substitution",
J. ACM,
29,
#4,
Oct. 1982,
pp. 928951.
H. Reghbati,
"An overview of data compression tecniques",
Computer,
14,
#4,
May 1981,
pp. 7175.
M. Rodeh,
V.R. Pratt,
S. Even,
"Linear algorithm for data compression via string matching",
J. ACM,
28,
#1,
Jan. 1981,
pp. 1624.
J. Ziv,
A. Lempel,
"A universal algorithm for sequential data compression",
IEEE Trans. Inf. Theory,
23,
#3,
May 1977,
pp. 337343.
E.J. Schuegraf,
"A survey of data compression methods for nonnumeric records",
Can. J. Inf. Sci.,
2,
#1,
May 1977,
pp. 93105.
P. Elias,
"Universal codeword sets and representations of the integers",
IEEE Trans. Inform. Theory,
21,
#2,
March 1975,
pp. 194203.
S.E. Hutchins,
"Data compression on contextfree languages",
in IFIP '71,
NorthHolland Publ.,
1972,
pp. 104109.
D. Huffman,
"A method for the construction of minimum redundancy codes",
Proc. I.R.E.,
40,
#9,
Sept. 1952,
pp. 10981101.
