Hashing
Literature references and annotations by Dick Grune, dick@dickgrune.com.
Last update: Sat Sep 19 09:19:12 2009.
These references and annotations were originally intended
for personal use and are presented here only in the hope
that they may be useful to others.
There is no claim to completeness or even correctness.
Each annotation represents my understanding of the text
at the moment I wrote the annotation.
No guarantees given; comments and content criticism welcome.
Maurizio Panti,
Salvatore Valenti,
"A modulusoriented hash function for the construction of minimal perfect tables",
ACM SIGPLAN Notices,
27,
#11,
pp. 3338.
Nov. 1992,
ChinChen Chang,
TzongChen Wu,
"A letteroriented perfect hash scheme based upon sparse table compression",
Softw. Pract. Exper.,
21,
#1,
Jan. 1991,
pp. 3549.
F.J. Burkowski,
G.V. Cormack,
"Use of perfect hashing in a paged memory management unit",
in 1990 International Conference on Parallel Processing, Illinois,
1990,
pp. 96100.
Peter K. Pearson,
"Fast hashing of variablelength text strings",
Commun. ACM,
33,
#6,
June 1990,
pp. 677680.
Technical correspondence in CACM, Nov 1991.
Vijay Kumar,
"Concurrent operations on extendible hashing and its performance",
Commun. ACM,
33,
#6,
June 1990,
pp. 681694.
B.J. McKenzie,
R. Harries,
T. Bell,
"Selecting a hashing algorithm",
Software  Practice and Experience,
20,
#2,
Feb. 1990,
pp. 209224.
Statistical tests are applied to 8 hash functions extracted from 8 actual
compilers.
Five are linear congruence functions and these are the ones that perform best.
The very best is
H = H sub n bold mod 2 sup 30 bold mod 1008 , with
h sub i = 613 h sub { i  1 } + c sub i .
M.V. Ramakrishna,
P.Å. Larson,
"File organization using composite perfect hashing",
ACM Trans. Database Syst.,
14,
#2,
June 1989,
pp. 231263.
Authors' abstract:
Perfect hashing refers to hashing with no overflows.
We propose and analyze a composite perfect hashing
scheme for large external files. The scheme guarantees
retrieval of any record in a single disk access.
Insertions and deletions are simple, and the file size
may vary considerably without adversely affecting the
performance. A simple variant of the scheme supports
efficient range searches in addition to being a
completely dynamic file organization scheme. These
advantages are achieved at the cost of a small amount
of additional internal storage and increased cost of
insertions.
R.J. Enbody,
H.C. Du,
"Dynamic hashing schemes",
ACM Computing Surveys,
20,
#2,
June 1988,
pp. 85115.
PerÅke Larson,
"Dynamic hash tables",
Commun. ACM,
31,
#4,
April 1988,
pp. 446457.
R.N. Horspool,
G.V. Cormack,
"Hashing as a compaction technique for LR parser tables",
Software: Practice and Experience,
17,
#6,
June 1987,
pp. 413416.
Remarkably, shows that hashing is at most a bearable alternative for
graph colouring and other compression techniques, its most attractive
feature being its ease of implementation.
The possibility of applying perfect hashing is mentioned only in
passing.
E. Lodi,
F. Luccio,
"Split sequence hash search",
Inform. Proc. Letters,
20,
pp. 131136.
1985,
G.V. Cormack,
R.N. Horspool,
M. Kaiserswerth,
"Practical perfect hashing",
Computer J.,
28,
#1,
1985,
pp. 5458.
Describes open hashing in which the second level is implemented as hash
tables with a possibly different hash function for each secondlevel
hash table.
For each entry, the primary hash table records the position of the
secondary hash table plus the function used.
The secondary tables are kept free from hash duplicates; this ensures
retrieval in constant time, which is the authors' interpretation of
the term "perfect hashing".
If a hash duplicate occurs during insertion, the old (secondary) hash
function is abandoned and the contents of the entire secondary hash table
are rehashed to a larger array, using another hash function, one that
does not create duplicates.
This function is found by exhaustive search.
As an implementation detail, the secondary hash tables are kept in a
single array; this is important because the algorithm is designed to use
secondary storage for the secondary tables.
Two variants are discussed, both based on the heuristic that if the
primary hash table is large enough, the secondary tables are small; here
"large enough" means having roughly one entry for each element.
The first variant uses perfect hash functions for the secondary tables,
thus achieving optimal packing.
The functions are again found by exhaustive search, during insertion.
It is proved that insertion can be done in constant time, but some
insertions can be hundreds of times more expensive than others.
In the second variant, a secondary set of r elements is stored in a
hash table of size r sup 2 .
Under this regimen, the variation in the insertion time is no more than
a factor of 10.
Detailed statistics are given.
J.K. Mullin,
"Spiral storage: efficient dynamic hashing with constant performance",
Comput. J.,
28,
#3,
1985,
pp. 330334.
T.J. Sager,
"A polynomial time generator for minimal perfect hash functions",
Commun. ACM,
28,
#5,
May 1985,
pp. 523532.
E. Veklerov,
"Analysis of dynamic hashing with deferred splitting",
ACM Trans. Database Syst.,
10,
#1,
March 1985,
pp. 9096.
P.Å. Larson,
"Linear hashing with overflowhandling by linear probing",
ACM Trans. Database Syst.,
10,
#1,
March 1985,
pp. 7589.
W.C. Chen,
J.S. Vitter,
"Analysis of new variants of coalesced hashing",
ACM Trans. On Database Sys.,
9,
#4,
pp. ??????.
December 1984,
See also 10 (1), March 1985
P. Pritchard,
"The study of an ordered minimal perfect hashing scheme",
Letter to the Editor,
Commun. ACM,
27,
#11,
pp. ??????.
Nov. 1984,
K. Ramamohanarao,
R. SacksDavis,
"Recursive linear hashing",
ACM Trans. Database Syst.,
9,
#3,
Sept. 1984,
pp. 369391.
U. Bechtold,
K. Kuspert,
"On the use of extendible hashing without hashing",
Info. Proc. Lett.,
19,
#1,
pp. ??????.
July 1984,
C.C. Chang,
"The study of an ordered minimal perfect hashing scheme",
Commun. ACM,
27,
#4,
pp. ??????.
Apr. 1984,
W.C. Chen,
J.S. Vitter,
"Analysis of earlyinsertion standard coalesced hashing",
SIAM J. Computing,
12,
#4,
1983,
pp. 667676.
S. Nishihara,
K. Ikeda,
"Reducing the retrieval time of hashing method by using predictors",
Commun. ACM,
26,
#12,
pp. ??????.
Dec. 1983,
R.C. Bell,
B. Floyd,
"A Monte Carlo study of Cichelli hashfunction solvability",
Commun. ACM,
26,
#11,
pp. ??????.
Nov. 1983,
P.A. Larson,
"Analysis of uniform hashing",
J. ACM,
30,
#4,
pp. 805819.
Oct. 1983,
M.W. Du,
T.M. Hsieh,
K.F. Jea,
D.W. Shieh,
"The study of a new perfect hash scheme",
IEEE Trans. Software Eng.,
SE9,
#3,
pp. ??????.
May 1983,
J.S. Vitter,
"Analysis of the search performance of coalesced hashing",
J. ACM,
30,
#2,
April 1983,
pp. 231258.
P. Flajolet,
"On the performance evaluation of extendable hashing and trie searching",
Acta Inform.,
20,
1983,
pp. 345369.
D.B. Lomet,
"Bounded index exponential hashing",
ACM Trans. Database Syst.,
8,
#1,
March 1983,
pp. 136165.
K. Ramamohanarao,
J.W. Lloyd,
"Dynamic hashing schemes",
Comput. J.,
25,
#4,
1982,
pp. 478485.
J.S. Vitter,
"Implementations for coalesced hashing",
Commun. ACM,
25,
#12,
Dec. 1982,
pp. 911926.
P.Å. Larson,
"Performance analysis of linear hashing with partial extensions",
ACM Trans. Database Syst.,
7,
#4,
Dec. 1982,
pp. 556587.
C.R. Cook,
R.R. Oldehoeft,
"A letteroriented minimal perfect hash function",
ACM SIGPLAN Notices,
17,
#9,
pp. 1827.
Sept. 1982,
P.A. Larson,
"A singlefile version of linear hashing with partial expansions",
Proc. 8th Conf. Very Large Data bases,
pp. 300309.
ACM,
New York,
Sept. 1982,
J.S. Vitter,
"Deletion algorithms for hashing that preserve randomness",
J. Algorithms,
3,
Sept. 1982,
pp. 261275.
J.K. Mullin,
"Tightly controlled linear hashing without separate overflow storage",
BIT,
21,
pp. 390400.
1981,
G. Jaeschke,
"Reciprocal hashing: a method for generating minimal perfect hashing functions",
Commun. ACM,
24,
#12,
pp. 829833.
Dec. 1981,
J.S. Vitter,
"A sharedmemory implementation of coalesced hashing",
Inform. Process. Lett.,
13,
#2,
Nov. 1981,
pp. 7779.
M. Scholl,
"New file organizations based on dynamic hashing",
ACM Trans. Database Syst.,
6,
#1,
March 1981,
pp. 194211.
Witold Litwin,
"Linear hashing: A new tool for file and table addressing",
in Sixth International Conference on Very Large Data Bases,
ed. by Frederick H. Lochovsky and ?. Taylor,
IEEE,
1980,
pp. 212223.
P.A. Larson,
"Linear hashing with partial expansions",
in Proc. 6th Conf. Very Large Data bases,
pp. 224232.
ACM,
New York,
1980,
G. Jaeschke,
G. Osterburg,
"On Cichelli's minimal perfect hash function method",
Commun. ACM,
23,
#??,
pp. 728729.
1980,
D.U. Sarwate,
"A note on universal classes of hash functions",
Info. Proc. Lett.,
10,
#1,
pp. 4145.
Feb. 1980,
R.J. Cichelli,
"Minimal perfect hash functions made simple",
Commun. ACM,
23,
#1,
pp. 1719.
Jan. 1980,
C.H. Papadimitriou,
P.A. Bernstein,
"On the performance of balanced hashing functions when the keys are not equiprobable",
ACM Trans. Prog. Lang. and Sys.,
2,
#1,
pp. ??????.
Jan. 1980,
K. Raiha,
S.H. Zweben,
"An optimal insertion algorithm for onesided heightbalanced binary search trees",
Commun. ACM,
22,
#9,
pp. ??????.
Sept. 1979,
Ronald Fagin,
Jurg Nievergelt,
Nicholas Pippenger,
H. Raymond Strong,
"Extendible hashing  a fast access method for dynamic files",
ACM Trans. Database Syst.,
4,
#3,
Sept. 1979,
pp. 315344.
The hash table has 2 sup d entries pointing to leaves. Each leaf has a
local depth d prime , with d prime <= d . Entry k points to a leaf
which contains between min and max records; for each record in this leaf
we have
h mod 2 sup d prime = k mod 2 sup d prime ,
where h is the record's hash value.
This arrangement means that if for a leaf d prime < d ,
2 sup { d  d prime } entries in the hash table will point to that leaf.
When a record with hash value h is to be inserted, it is put in the leaf
pointed to by entry h mod 2 sup d prime .
If as a result this leaf is now too full there are two possibilities:
d prime < d and d prime = d .
If d prime < d the leaf is pointed to by more than one entry; the leaf
is split into two leaves.
If d prime = d the hash table is doubled by duplicating it; now
d prime < d and see above.
G.H. Gonnet,
J.I. Munro,
"Efficient ordering of hash tables",
SIAM J. Comp.,
8,
#3,
pp. ??????.
Aug. 1979,
J.L. Carter,
M.N. Wegman,
"Universal classes of hash functions",
J. Comput. Syst. Sci.,
18,
#2,
April 1979,
pp. 143154.
M.R. Anderson,
M.G. Anderson,
"Comments on perfect hashing functions  A single probe retrieving method for static sets",
Commun. ACM,
22,
#2,
pp. ??????.
Feb. 1979,
R.L. Rivest,
"Optimal arrangement of keys in a hash table",
Journal of the ACM,
25,
pp. 200209.
1978,
P.Å. Larson,
"Dynamic hashing",
BIT,
18,
#2,
1978,
pp. 184201.
L.J. Guibas,
"The analysis of hashing techniques that exhibit kary clustering",
Journal of the ACM,
25,
pp. 54455.
1978,
L.J. Guibas,
E. Szemeredi,
"The analysis of double hashing",
J. Comp. Sys. Sci.,
16,
pp. 22674.
1978,
C. Halatsis,
G. Philokyprou,
"Pseudochaining in hash tables",
Commun. ACM,
21,
#7,
pp. ??????.
July 1978,
Renzo Sprugnoli,
"Perfect hashing functions: a single probe retrieving method for static sets",
Commun. ACM,
20,
#11,
Nov. 1977,
pp. 841850.
[ Pages and pages of number theory ]
I.F. Blake,
A.G. Konhein,
"Big buckets are (are not) better",
J. ACM,
24,
#4,
Oct. 1977,
pp. 591606??.
A.L. Rosenberg,
L.J. Stockmayer(sp?),
"Hashing schemes for extendible arrays",
J. ACM,
24,
#2,
April 1977,
pp. 199221.
E. Goto,
T. Ida,
"Parallel hashing algorithms",
Inform. Process. Letters,
6,
#1,
pp. 813.
Feb. 1977,
D. Severance,
R. Duhne,
"A practitioner's guide to addressing algorithms",
Commun. ACM,
19,
#6,
June 1976,
pp. 314326.
O. Amble,
D.E. Knuth,
"Ordered hash tables",
Computer J.,
18,
pp. 13542.
1975,
S.P. Ghosh,
V.Y. Lum,
"Analysis of collisions when hashing by division",
Inf. Syst.,
1,
pp. 1522.
1975,
G.O. Kno??t,
"Hashing functions",
Computer Journal,
18,
pp. ??????.
Aug. 1975,
V. Batagelj,
"The quadratic hash method when the table size is not a prime number",
Commun. ACM,
18,
#4,
pp. ??????.
Apr. 1975,
W.D. Maurer,
T.G. Lewis,
"Hash table methods",
Computing Surveys,
7,
#1,
pp. 519.
1975,
A.E. Ackerman,
"Quadratic search for hash tables of size p sup n ",
Commun. ACM,
17,
#3,
pp. ??????.
Mar. 1974,
R. Loeser,
"Some performance tests of 'quicksort' and descendants",
Commun. ACM,
17,
#3,
pp. ??????.
Mar. 1974,
V.Y. Lum,
"General performance analysis of key to address transformation methods",
Commun. ACM,
16,
#10,
Oct. 1973,
pp. ??????.
R.P. Brent,
"Reducing the retrieval time of scatter storage techniques",
Commun. ACM,
16,
#2,
Feb. 1973,
pp. 105109.
B. Carter,
"The reallocation of hashcoded tables",
Commun. ACM,
16,
#1,
Jan. 1973,
pp. 1114.
F.R.A. Hopgood,
J. Davenport,
"The quadratic hash method where the table size is a power of 2",
Comptr. J.,
15,
#4,
pp. ??????.
1972,
M.C. Harrison,
"Implementation of the substring test by hashing",
Commun. ACM,
14,
#12,
Dec. 1971,
pp. 777779.
J.R. Bell,
C.H. Kaman,
"The linear quotient hash code",
Commun. ACM,
13,
#11,
pp. 675677.
Nov. 1970,
J.R. Bell,
"The quadratic quotient method  A hash code eliminating secondary clustering",
Commun. ACM,
13,
#2,
pp. ??????.
Feb. 1970,
C.E. Radke,
"The use of quadratic residue research",
Commun. ACM,
13,
#2,
pp. ??????.
Feb. 1970,
F.R.A. Hopgood,
"A solution to the table overflow problem for hash tables",
Comp. Bull. (???),
11,
#??,
1968,
pp. 297???.
R. Morris,
"Scatter storage techniques",
Commun. ACM,
11,
#1,
pp. 3844.
1968,
W.D. Maurer,
"An improved hash code for scatter storage",
Commun. ACM,
11,
#1,
pp. 3537.
1968,
G. Schay,
W.G. Spruth,
"Analysis of a file addressing method",
Commun. ACM,
5,
#8,
Aug. 1962,
pp. 459452.
The authors consider open hashing of records in a sequential file on
disk and the point is to reduce (sequential) search time (this is 1962).
(They agree that chained hashing is preferable for random access
memory.)
Suppose a new record R hashes (the authors don't use the word "hash")
to address A.
Rather than storing R in the first empty bin after A (allowing
buffer circularity), which can be far away, they store it in the first
bin that holds a record S that hashes to a higher address than A (or
of course in an empty bin if they meet that first).
They store the evicted record S in the next bin, etcetera, until an
empty bin is found and the eviction process ends.
This constructs a sorted list of records with hash values >= A
starting from each address A in the file.
This makes sure that the records with hash value A are never far away
from address A.
An exemplary analysis of the algorithm is given, resulting in a function
describing the expected displacement of records as a function of the
load factor.
L.R. Johnson,
"An indirect method for addressing on secondary keys",
Commun. ACM,
4,
#??,
??? 1961,
pp. 218222.
[First publication on chained hashing?]
W.W. Peterson,
"Addressing for randomaccess storage",
IBM Jour. Res. Dev.,
1,
#2,
pp. 249253 (130146???, or both?).
1957,
