There are arithmetic errors in the paper that created Student’s T-test.
William Sealy Gosset, writing as Student, famously published "The Probable Error of a Mean" in 1908, deriving the T-distribution and suggesting the T-test.
While revisiting the paper in preparing my lectures for this week, I noticed some discrepancies in the Illustration II section. Seeking out the data sources used by Student, I could verify the data used and that the discrepancy is isolated to two arithmetic errors on Student’s part.
Illustration II uses data from growing soft and hard wheat in heavy and light soil, with straw and corn yields measured for each combination over 3 years. Student gives us:
From the yield measurements by Voelcker, Student extracts differences and then computes mean difference, standard deviation of the difference, and his \(z\) measure (that seems to be \(\overline{x}/s\)).
Looking up (scanned copies of) the Journal of the Royal Agricultural Society, I could find the data tables from which Student had sourced this data set, spread over 3 volumes of the Journal:
We can verify the numbers as reported by Student in these tables from Voelcker, and could digitize the data set by copying all the numbers into a data frame or tibble:
year | soil | yieldtype | seedtype | yield |
---|---|---|---|---|
1899 | light | corn | soft | 7.85 |
1899 | heavy | corn | soft | 8.89 |
1900 | light | corn | soft | 14.81 |
1900 | heavy | corn | soft | 13.55 |
1901 | light | corn | soft | 7.48 |
1901 | heavy | corn | soft | 15.39 |
1899 | light | corn | hard | 7.27 |
1899 | heavy | corn | hard | 8.32 |
1900 | light | corn | hard | 13.81 |
1900 | heavy | corn | hard | 13.36 |
1901 | light | corn | hard | 7.97 |
1901 | heavy | corn | hard | 13.13 |
1899 | light | straw | soft | 12.81 |
1899 | heavy | straw | soft | 12.87 |
1900 | light | straw | soft | 22.22 |
1900 | heavy | straw | soft | 20.21 |
1901 | light | straw | soft | 13.97 |
1901 | heavy | straw | soft | 22.57 |
1899 | light | straw | hard | 10.71 |
1899 | heavy | straw | hard | 12.48 |
1900 | light | straw | hard | 21.64 |
1900 | heavy | straw | hard | 20.26 |
1901 | light | straw | hard | 11.71 |
1901 | heavy | straw | hard | 18.96 |
For your convenience, here is this table in a downloadable format, as well as the differences that Student focuses on, both in CSV format.
The discrepancy lies in the computation of the differences. Consider the straw yields in light soil in 1900 and 1901. Student states for us:
Year | Soft | Hard | Increase |
---|---|---|---|
1900 | 22.22 | 21.64 | 0.78 |
1901 | 13.97 | 11.71 | 2.66 |
But these subtractions are not accurate! 22.22-21.64=0.58 and 13.97-11.71=2.26. These miscalculations then contribute to throwing off Student’s subsequent computations.
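The two subtractions can be re-checked mechanically. The following is a minimal sketch, not part of the original analysis: the dictionary layout and variable names are illustrative assumptions, and `Decimal` is used only so that the two-decimal yields are subtracted exactly.

```python
from decimal import Decimal

# Straw yields in light soil (soft, hard) and the increase as printed by Student.
# Values are entered as strings so that Decimal keeps them exact.
reported = {
    1900: ("22.22", "21.64", "0.78"),
    1901: ("13.97", "11.71", "2.66"),
}

recomputed = {}
for year, (soft, hard, printed) in reported.items():
    diff = Decimal(soft) - Decimal(hard)
    recomputed[year] = diff
    # Show the printed value next to the recomputed one and the gap between them.
    print(year, "printed:", printed, "recomputed:", diff,
          "gap:", Decimal(printed) - diff)
```

In both rows the printed increase differs from the recomputed one only in the tenths digit, by 0.20 and 0.40 respectively.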
Student also computes standard deviations as \(\sqrt{\frac{1}{n}\sum(x_i-\overline{x})^2}\) and not \(\sqrt{\frac{1}{n-1}\sum(x_i-\overline{x})^2}\), throwing the values off by a factor \(\sqrt{\frac{n-1}{n}}=\sqrt{\frac{5}{6}}\).
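To make the size of this factor concrete, here is a short worked step. The labels \(s_n\) and \(s_{n-1}\) for the two versions of the standard deviation are introduced here only for illustration, and \(n=6\) is the number of differences per yield type:

\[
s_n = \sqrt{\frac{1}{n}\sum(x_i-\overline{x})^2} = \sqrt{\frac{n-1}{n}}\; s_{n-1},
\qquad
n = 6:\quad \sqrt{\frac{5}{6}} \approx 0.913 .
\]

So a standard deviation computed with the \(1/n\) formula is a little under 9% smaller than the \(n-1\) version, and, if \(z\) is indeed \(\overline{x}/s\), the corresponding \(z\) is about 9.5% larger.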
Student reports:
Type | Soft Average | Hard Average | Increase Average | Increase Std Dev | \(z\) |
---|---|---|---|---|---|
Corn | 11.328 | 10.643 | 0.685 | 0.778 | 0.88 |
Straw | 17.442 | 15.927 | 1.515 | 1.261 | 1.20 |
Computing directly from the provided data instead yields the following, where the standard deviations and \(z\)-values following Student’s definitions are also included:
Type | Soft Average | Hard Average | Increase Average | Increase Std Dev | Increase Std Dev (Student) | \(z\) | \(z\) (Student) |
---|---|---|---|---|---|---|---|
corn | 11.32833 | 10.64333 | 0.685000 | 0.9197554 | 0.705000 | 0.7447632 | 0.9716312 |
straw | 17.44167 | 15.96000 | 1.481667 | 1.4048974 | 1.644833 | 1.0546440 | 0.9008005 |
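
As a cross-check, the summary statistics can be recomputed from the digitized yields with the Python standard library. This is a hedged sketch rather than the code behind the table above: the `yields` layout and the `summarize` helper are illustrative assumptions, and it assumes that \(z\) is \(\overline{x}/s\). `statistics.stdev` divides by \(n-1\) and `statistics.pstdev` divides by \(n\), which is the formula given in Student (1908).

```python
from statistics import mean, pstdev, stdev

# (soft, hard) yields per plot, copied from the digitized table.
# Order: 1899 light, 1899 heavy, 1900 light, 1900 heavy, 1901 light, 1901 heavy.
yields = {
    "corn": [(7.85, 7.27), (8.89, 8.32), (14.81, 13.81),
             (13.55, 13.36), (7.48, 7.97), (15.39, 13.13)],
    "straw": [(12.81, 10.71), (12.87, 12.48), (22.22, 21.64),
              (20.21, 20.26), (13.97, 11.71), (22.57, 18.96)],
}

def summarize(pairs):
    """Means, increase, and both standard deviations for one yield type."""
    diffs = [soft - hard for soft, hard in pairs]
    d_mean = mean(diffs)
    sd_n1 = stdev(diffs)   # divides by n - 1
    sd_n = pstdev(diffs)   # divides by n, as in Student (1908)
    return {
        "soft_mean": mean(soft for soft, _ in pairs),
        "hard_mean": mean(hard for _, hard in pairs),
        "diff_mean": d_mean,
        "sd_n1": sd_n1,
        "sd_n": sd_n,
        "z_n1": d_mean / sd_n1,  # assumes z = mean / sd
        "z_n": d_mean / sd_n,
    }

summary = {kind: summarize(pairs) for kind, pairs in yields.items()}
for kind, row in summary.items():
    print(kind, {name: round(value, 4) for name, value in row.items()})
```

The means and the \(n-1\) standard deviations agree with the table. The \(1/n\) standard deviations from this sketch come out at about 0.840 (corn) and 1.282 (straw), giving \(z\) of about 0.816 and 1.155. The entries in the "Increase Std Dev (Student)" column (0.705 and 1.645) are close to the squares of those two figures, so it may be worth checking whether that column shows the mean squared deviation before the square root is taken.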
I am unable to fully reconstruct Student’s stated values: not only the summary statistics for the straw yield type (where the computation of the differences is already erroneous), but also the standard deviation and \(z\)-value for either type. Whether or not I use the formulas given in Student (1908), I still get different values for the standard deviation.
These resulting \(z\)-values (computed by Student to be 0.88 and 1.20; reconstructed as either 0.745 and 1.05, or 0.972 and 0.901) are then looked up in tables compiled by Student. These lookup tables correspond to computing \(CDF_{T(n-1)}(z\sqrt{n-1})\), as can be seen from Student reporting \(p=0.9465\) for \(z=0.88\) and \(p=0.9782\) for \(z=1.20\). A modern approach would instead compute \(CDF_{T(n-1)}(z\sqrt{n})\). Student then uses these probabilities to compute odds \(p/(1-p)\).
\(z\) | \(CDF_{T(n-1)}(z\sqrt{n-1})\) | Odds | \(CDF_{T(n-1)}(z\sqrt{n})\) | Odds |
---|---|---|---|---|
0.88 | 0.9469 | 17.8 | 0.9582 | 22.9 |
1.20 | 0.9782 | 44.8 | 0.9839 | 61 |
0.745 | 0.9217 | 11.8 | 0.9362 | 14.7 |
1.05 | 0.9671 | 29.4 | 0.975 | 39.1 |
0.972 | 0.9591 | 23.5 | 0.9685 | 30.7 |
0.901 | 0.95 | 19 | 0.9608 | 24.5 |
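
The probabilities and odds in this table can be reproduced without a statistics package. The sketch below is one possible way to do it, not necessarily how the table was produced: `t_pdf`, `t_cdf` and `odds` are hypothetical helper names, and the CDF is obtained by numerically integrating the T density with Simpson's rule instead of calling a library routine.

```python
import math

def t_pdf(x, df):
    """Density of the T-distribution with df degrees of freedom."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF via composite Simpson's rule on [0, x]; steps must be even.

    The density is symmetric around 0, so the left half contributes 0.5.
    """
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def odds(p):
    return p / (1 - p)

n = 6  # number of differences per yield type
for z in (0.88, 1.20, 0.745, 1.05, 0.972, 0.901):
    p_student = t_cdf(z * math.sqrt(n - 1), n - 1)  # scaling matching Student's tables
    p_modern = t_cdf(z * math.sqrt(n), n - 1)       # scaling used with the n - 1 std dev
    print(z, round(p_student, 4), round(odds(p_student), 1),
          round(p_modern, 4), round(odds(p_modern), 1))
```

Integrating the density directly keeps the example dependency-free. With SciPy available, `scipy.stats.t.cdf` would be the more usual choice.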
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/michiexile/rbind-io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".