Mathematics, Physics, etc.
Literature references and annotations by Dick Grune, dick@dickgrune.com.
Last update: Thu Apr 27 15:47:33 2023.
These references and annotations were originally intended
for personal use and are presented here only in the hope
that they may be useful to others.
There is no claim to completeness or even correctness.
Each annotation represents my understanding of the text
at the moment I wrote the annotation.
No guarantees given; comments and content criticism welcome.
Colignatus, Thomas,
A Measure of Association (Correlation) in Nominal Data (Contingency Tables), Using Determinants,
2007,
pp. 27.
Exploratory paper on the possibility of obtaining a methodologically sound
correlation coefficient for contingency tables indexed by nominal values.
[DG: My last look at statistics was in 1966, hence this small refresher.]
A contingency table is a matrix M the columns of which are labeled with
attributes A_{1}..A_{n} from one category A and the rows with
attributes B_{1}..B_{m} from another category B.
Each element M_{i,j} of the matrix contains the number of samples
(the "frequency") that have been observed to have both attribute B_{i}
and attribute A_{j}. A standard example has
"gender" as category A, with attributes "male" and "female", and
"political affiliation" as category B, with attributes "Democrat" and
"Republican". The correlation coefficient should then express the degree of
correlation between gender and political affiliation.
(In the original design the correlation coefficient was supposed to express
in how far political affiliation was "contingent" upon gender, but this
interpretation implies causation, which the table cannot provide.)
Correlation coefficients are traditionally defined for sets of pairs of
numeric values (x_{i}, y_{i}); x could be soil acidity and y
crop yield per square meter.
A correlation coefficient of +1 means that x_{i} and y_{i} grow and
shrink in lockstep, 0 means that they are independent, −1 means
that if one is larger, the other is smaller, etc. No values outside [−1..+1]
occur.
This does not apply directly to contingency tables: there is no set of pairs
of numerical values but rather a frequency table indexed by nominal values.
Still, intuitively both types of data seem to contain similar information: the
degree of interdependence of two sets of data, numeric for vectors (x_{i} and
y_{i}), and nominal for contingency tables (the labels on the rows and columns).
A major obstacle to grafting the techniques for vectors to the tables is that
the numeric values in the vectors can be manipulated algebraically, allowing
ordering, taking the average, etc., whereas all that is impossible for nominal
values.
An inroad can be made by considering the basic idea on which correlation
coefficient calculation for vectors is based: comparing the observed situation
with what would be expected if no correlation existed. This comparison leads to
the x_{i} − xbar forms in statistics (xbar is the expected value
of the stochastic variable x). For lack of better information the expected
value is estimated as the average of all x_{i}. Summing the
squares of these deviations from average yields, after some adjustments, the
χ² measure. Unlike the correlation coefficient above, a χ² value
is always >= 0; if it is 0, the observed values coincide exactly with the
expected ones, which means complete independence, and the higher the χ²
value the stronger the association between the data.
Along the same lines the expected frequencies in a contingency table can be
computed from the averages of the rows and columns in which each frequency
resides, and a χ² measure can be constructed. Its normalized form
(scaled to a value between 0 and 1) is known as "Cramér's V". It is, however,
methodologically unsound because it uses the numerical average of sets of
frequencies, but these sets (rows and columns of the matrix) have no
probability distribution. It is this problem that this paper tries to
solve, or at least evade.
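The computation just described can be sketched in Python (a minimal
illustration of the standard formulas; the function name and the
list-of-rows data layout are my own choices, not the paper's):

```python
import math

def cramers_v(table):
    """Cramér's V: normalized chi-squared for a contingency table,
    given as a list of rows of frequencies."""
    n = sum(sum(row) for row in table)                # total number of samples
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # frequency expected if rows and columns were independent
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))                # smaller table dimension
    return math.sqrt(chi2 / (n * (k - 1)))
```

For example, the diagonal table [[10, 0], [0, 10]] gives V = 1 and the
uniform table [[5, 5], [5, 5]] gives V = 0.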
It is a small step to view a contingency table as an m×n
matrix. If the matrix is square (n×n), it has a determinant.
Each row in the matrix defines a point in an n-dimensional space, plus
a vector from the origin of that space to the point.
These n vectors span a geometric solid in the shape of an
n-dimensional rhomboid (parallelepiped).
This rhomboid lives in an n-dimensional box the sides of which
have lengths equal to the sums of the n columns of the matrix, and the
volume of the rhomboid is equal to the absolute value of the determinant
of the matrix.
So far the vector algebra; now for the statistics.
If the rows in the matrix are proportional to each other the vectors they
define all point in roughly the same direction, and the volume of their
rhomboid is small; but when the rows differ strongly, their vectors are more
nearly perpendicular to each other, and the volume they span fills almost
the entire n-dimensional box.
Now proportional rows mean that the distribution over the column attributes
is the same for every row attribute, i.e. that the two categories are
independent, whereas strongly differing rows mean strong association.
So the fraction of the box volume that is occupied by the
rhomboid spanned by the vectors is indicative of the degree of association
of the data in the contingency matrix!
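The Volume Ratio itself is easy to compute for a square table; here is a
minimal Python sketch (my own reconstruction from the description above,
not code from the paper), dividing |det M| by the volume of the box whose
sides are the column sums:

```python
from math import prod

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]          # work on a copy
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))   # pivot row
        if a[p][i] == 0:
            return 0.0                 # singular matrix
        if p != i:
            a[i], a[p] = a[p], a[i]    # row swap flips the sign
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):      # eliminate below the pivot
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

def volume_ratio(table):
    """|det| of a square contingency table divided by the volume of the
    box whose sides are the column sums."""
    col_sums = [sum(col) for col in zip(*table)]
    return abs(det(table)) / prod(col_sums)
```

A diagonal table such as [[10, 0], [0, 10]] gives ratio 1, and a table
with equal rows such as [[5, 5], [5, 5]] gives 0.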
Although this Volume Ratio as a measure of association in contingency tables is
not based on any statistical model, it has several properties that suggest that
it is not a completely crazy choice:
• The Volume Ratio is automatically normalized to lie between 0 (full
independence) and 1 (full association).
Cramér's V requires a somewhat heuristic normalization.
• The Volume Ratio for (2×2) tables is identical to Cramér's V.
• If there is only one non-zero frequency in each row and column, each row
is specifically associated with one column, and vice versa. The rows of such
a matrix are mutually orthogonal, so the Volume Ratio is 1 (as with Cramér's
V): total association.
• If all frequencies are equal, no row differs from any other, so knowing
the row attribute tells nothing about the column attribute. The Volume Ratio
method and Cramér's V both yield 0, confirming total independence.
• The Volume Ratio does not change when two rows or columns are swapped (as
with Cramér's V).
• The Volume Ratio is independent of scale: it does not change when all
table entries are multiplied by the same factor (as with Cramér's V).
But other phenomena are harder to explain:
• For a table filled with random numbers from the range [1..999], the
Volume Ratio approaches 0
(0.00018... for 6×6, 0.000000... for 12×12), implying total independence,
whereas Cramér's V still shows considerable association
(0.26... for 6×6, 0.16.. for 12×12). [DG: own observation.]
Problems arise when the contingency table is not square; in that case it does
not have a determinant, and the above method cannot be applied directly.
This is solved by considering square submatrices and their relationships, as
explained in the rest of the text of the paper. The text proper is followed by 8
appendices with notes, detailed examples, explanations, and considerations.
[DG: I applied both methods to the gender-class correlation tables from two
Papuan languages, Burmeso (non-TNG), described by M. Donohue, and Mian (TNG),
described by S. Fedden and G.G. Corbett.
Background: When natural languages classify nouns into groups they usually do
this on the basis of gender: French le soleil, la lune.
Other languages classify according to size, shape, etc.: Swahili mtoto
'child', kitoto 'little baby'.
Very few languages use both systems simultaneously, with each noun having both
a gender and a class.
Using gender to label the columns and class for the rows, the number of nouns
having gender i and class j can be put in element [i, j] of
the gender-class correlation table. The correlation coefficient would then
indicate how independent the notions of gender and class are in the language.
For the languages Burmeso and Mian the obtained values are
Burmeso (6×6): Cramér's V = 0.488549; value_free = 0.600000; Volume Ratio = 0.000040
Mian (4×6): Cramér's V = 0.726651; value_free = 0.833333; Volume Ratio = 
Indeed the Mian table looks more orthogonal than the one for Burmeso, in
accordance with the above values. The measure used in those papers, however,
is the ratio of the number of empty entries to the maximum number of empty
entries a non-trivial table of the given dimensions can have. This is a
value-free measure, because it does not depend on the actual values in the
table, only on their presence.
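This value-free measure can be sketched in Python, under my assumption that
"non-trivial" means that no row or column is entirely empty, so that a table
with m rows and n columns has at most m·n − max(m, n) empty entries:

```python
def value_free_measure(table):
    """Ratio of the number of empty (zero) entries to the maximum number
    of empty entries a non-trivial table of the same dimensions can have.
    Assumption: non-trivial = no all-zero row or column."""
    m, n = len(table), len(table[0])
    empties = sum(1 for row in table for freq in row if freq == 0)
    max_empties = m * n - max(m, n)   # at least max(m, n) entries must be non-zero
    return empties / max_empties
```

For example, a 4×6 table with 15 empty entries yields 15/18 = 0.8333...,
which agrees with the Mian figure quoted above.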
The values from the value-free method are also given above and are seen to
agree reasonably well with Cramér's V, thereby validating the value-free
method somewhat. Also, its computation does not require a computer.]
Mark Ronan,
Symmetry and the Monster,
Oxford University Press,
Oxford,
2006,
pp. 255.
This is a history book about the history of research into group theory and the
discovery of the "Monster", not a book about that Monster.
The math has been simplified beyond recognition, and even after reading up
on the subject on Wikipedia, and with a PhD in computer science, I could not
make head or tail of it.
The first problem is that the author does not make clear what he means by "a
symmetry".
We learn that the "zillions of symmetries" of the Rubik cube are
"generated by 90 degree turns", which in the lines above are compared to
"symmetry operators".
This suggests that the 24 turns (4 on each of the 6 sides) are the operators
and that the positions that can be achieved are the symmetries.
But operators in a (mathematical) group have the property that the combination
of two operators is again an operator in that group, so any configuration can
be achieved with a single (compound) operator.
So are all these operators "symmetries"? I find it confusing.
Symmetries are also explained as permutations, but the relationship remains
vague.
A second problem is that the level of explanation is very uneven: the root
sign is explained, but the j-function is written out without any explanation.
We learn a lot about the people around the Monster but next to nothing about
the Monster itself, except that it is 196,884-dimensional, but that's already
on the cover.
Does it have a geometric representation, like a cube? Or is it just a network
of symbols? (Does a network of symbols have symmetries?)
If it can be geometric, it must have sides.
Are all sides the same length like in a cube or a dodecahedron?
How big is it if the length of the shortest side is 1 unit?
Answers to such questions would have made the Monster much more accessible.
Perhaps the subject is too complicated to allow a popularized treatment, in
which case sticking to just the history is OK.
But it would have been nice to see an example or two of representatives of the
simpler symmetry groups.
Some examples are given, but they are not assigned to groups.
And it would have been nice to be told what position in the periodic table
of symmetries Rubik's cube occupies, probably the most complicated symmetric
object any of us can relate to.
Marcus Du Sautoy,
The Music of the Primes: Why an Unsolved Problem in Mathematics Matters,
HarperCollins,
2003,
pp. 335.
Mostly about the people involved in attacks on the Riemann hypothesis, and
indeed supplying interesting biographies of them.
The application of primes in cryptography is emphasized, justifying the second
half of the title.
The math is exceptionally shallow; modular arithmetic is called "clock
arithmetic".
Julian Havil,
Gamma: Exploring Euler's Constant,
Princeton Science Library,
Princeton,
2003,
pp. 266.
"Fun with Series" would probably be a better title, but within that realm the
book indeed focuses on γ, the Gamma function, the harmonic series, etc., in
14 chapters.
The book closes with two chapters on the distribution of primes and the
Riemann zeta function.
Two appendices, about Taylor expansions and Complex Function Theory, provide
handy refresher courses on the subjects.
Most chapters start in low gear but soon speed up; not all explanations are as
clear as I'd hoped.
The material is covered in quite reasonable depth, with the most difficult
results only sketched.
Irving M. Copi,
Carl Cohen,
Introduction to Logic,
Prentice Hall,
Upper Saddle River, NJ,
1998,
pp. 714.
Thorough, interesting, readable, good.
Samuel D. Guttenplan,
The Language of Logic,
Basil Blackwell,
Oxford, UK,
1987,
pp. 336.
Pleasant introduction.
William H. Press,
Brian P. Flannery,
Saul A. Teukolsky,
William T. Vetterling,
Numerical Recipes: The Art of Scientific Computing,
Cambridge Univ. Press,
Cambridge, England,
1986,
pp. 818.
A much more amusing and easy-going account than one would expect, given the
subject. Chapters on: linear algebraic equations, interpolation and
extrapolation, integration of functions, evaluation of functions, special
functions (Gamma, Bessel, Jacobi, etc.), random numbers, sorting(!), root
finding and nonlinear sets of equations, minimization or maximization of
functions, eigensystems, Fourier transform spectral functions, statistical
description of data, modeling of data, integration of ordinary differential
equations, two-point boundary-value problems, and partial differential
equations.
With programs and program diskettes in Fortran and Pascal.
H. M. Edwards,
Riemann's Zeta Function,
Dover,
Mineola, NY,
1974,
pp. 315.
Of considerable depth.
The first chapter explains Riemann's famous 1859 paper
"On the Number of Primes Less Than a Given Magnitude", and the subsequent
11 chapters
cover many famous papers and theorems based on Riemann's paper.
Requires serious study.
