History

A chronological narrative.

The historical episodes from across this book, assembled in chronological order. Each entry links back to the lesson where it appears in full context.

23 history entries from this book, in chronological order.

Before 1700

1614 Napier's logarithms, Briggs's base ten, Euler's e 0 Logarithms and exponentials

John Napier published the first table of logarithms in 1614 (Napier 1614), with the explicit aim of replacing the multiplication of large numbers — a daily burden in astronomy and navigation — by addition. His logarithms were defined kinematically, by comparing a point moving at constant speed with one moving at speed proportional to its distance remaining, and were not quite our natural logarithm, but the central property — products become sums — was there from the start. Henry Briggs visited Napier and together they recast the tables to base 10 (Briggs 1624), the form that dominated calculation for three centuries through printed tables and the slide rule. The constant $e$ and the natural logarithm came later, from the calculus: Euler named $e$ and established its central role in the Introductio (Euler 1748), where the exponential and the trigonometric functions are unified. The decibel is a direct descendant — Bell Labs' logarithmic unit for ratios of power, named for Alexander Graham Bell.

1665 Newton, Leibniz, and why we have multiple notations 1 Derivatives

Differential calculus was developed independently by Isaac Newton in England (1665–1666, "fluxions") and Gottfried Wilhelm Leibniz in Germany (1675–1684). The two formulations are mathematically equivalent but use different notation: Newton's $\dot x$ for time derivatives, $\ddot x$ for second derivatives; Leibniz's $df/dx$, $d^2 f/dx^2$. Leibniz's notation generalises cleanly to multivariable calculus and made his approach dominant on the continent; Newton's notation survived in physics and mechanics, where time is a privileged variable.

The dispute over priority — fuelled by national rivalries and by Newton's accusations that Leibniz had plagiarised his work — soured Anglo-Continental mathematics for nearly a century. Britain stayed loyal to Newton's clunkier "fluxional" calculus; the Continent ran with Leibniz's notation and produced Euler, Lagrange, Laplace, and Fourier. The British eventually capitulated in the early 1800s. We use both notations today as a residue of the history: $\dot x$ for time, $\partial / \partial x$ for space, $f'$ when there is one variable and we don't want to be fussy about which.

1687 From Newton's spring to the bandpass filter 5 Second-order linear ODEs

The equation $m \ddot x + k x = 0$ for simple harmonic motion appears as Proposition 38, Book I, of Newton's 1687 Principia, in his analysis of a body oscillating on a "perfectly elastic" spring. Newton already knew that the solution was sinusoidal and that the period depended only on $\sqrt{m/k}$ — independent of amplitude. The same equation governs the small-angle pendulum (his Proposition 52), which is where the more famous SHM derivation lives.

Damping was added gradually through the 18th and 19th centuries; Lord Rayleigh's Theory of Sound (1877) gives the equation $m \ddot x + b \dot x + k x = 0$ in the modern form. The classification of regimes — overdamped, underdamped, critically damped — comes from late-19th-century galvanometer design, where engineers cared about getting the needle to settle as quickly as possible without ringing. The optimum is critical damping, and "critical damping" is a term of art that crossed over from galvanometers into acoustics, mechanical engineering, and circuit design wholesale.

The complex-impedance approach to forced oscillators ($\tilde X = F_0 / [(k - m\omega^2) + ib\omega]$, written as one line of algebra) was systematised by Charles Steinmetz for AC circuits in the 1890s; see also the Complex Exponentials chapter. The same algebra of impedances ties together acoustic, electrical, and mechanical filters; the bandpass filter of every audio EQ and every radio tuner is exactly this driven-oscillator equation, with different physical meanings for the symbols.
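The one-line formula for $\tilde X$ can be evaluated directly. The following is a minimal Python sketch, assuming illustrative values for $m$, $b$, $k$ and $F_0$ (none of them come from the text); the function name `complex_amplitude` is likewise hypothetical. It shows the bandpass behaviour: the response peaks near $\omega_0 = \sqrt{k/m}$ and falls off on either side.

```python
import cmath
import math

def complex_amplitude(omega, m=1.0, b=0.2, k=4.0, F0=1.0):
    """Steady-state complex amplitude X = F0 / ((k - m*omega^2) + i*b*omega).

    The default parameter values are illustrative assumptions only.
    """
    return F0 / complex(k - m * omega**2, b * omega)

omega0 = math.sqrt(4.0 / 1.0)  # natural frequency sqrt(k/m) for the defaults

for omega in (0.5 * omega0, omega0, 2.0 * omega0):
    X = complex_amplitude(omega)
    phase_deg = math.degrees(cmath.phase(X))
    print(f"omega = {omega:.2f}   |X| = {abs(X):.4f}   phase = {phase_deg:.1f} deg")
```

With these assumed values the magnitude at $\omega_0$ is $F_0 / (b\omega_0)$, and the response lags the drive by a quarter cycle there, which is the usual signature of resonance.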

18th century

1733 From de Moivre to Laplace to Gauss 10 The Gaussian and the central limit theorem

The bell curve's first appearance was in 1733, when Abraham de Moivre computed the limiting shape of the binomial distribution as $n \to \infty$. He showed that the binomial probabilities $\binom{n}{k} p^k (1-p)^{n-k}$ are approximately Gaussian for large $n$, what we'd now call a special case of the Central Limit Theorem. The result was buried in an obscure pamphlet; few people read it.
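De Moivre's approximation is easy to check numerically. The sketch below is illustrative only: it compares the exact binomial probability with a Gaussian of mean $np$ and variance $np(1-p)$, using the hypothetical helper names `binomial_pmf` and `gaussian_approx` and an assumed $n = 100$, $p = 1/2$.

```python
import math

def binomial_pmf(n, k, p):
    """Exact binomial probability C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def gaussian_approx(n, k, p):
    """Gaussian density with the binomial's mean n*p and variance n*p*(1-p)."""
    mu = n * p
    var = n * p * (1 - p)
    return math.exp(-(k - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

n, p = 100, 0.5  # illustrative values, not taken from the text
for k in (40, 45, 50, 55, 60):
    exact = binomial_pmf(n, k, p)
    approx = gaussian_approx(n, k, p)
    print(f"k = {k}   binomial = {exact:.6f}   gaussian = {approx:.6f}")
```

The two columns should agree to within a fraction of a percent near the centre, with the agreement improving as $n$ grows.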

The curve was rediscovered and popularised by Pierre-Simon Laplace, who derived a more general central-limit result in his 1812 Théorie analytique des probabilités. Laplace argued that sums of many independent measurement errors should be Gaussian-distributed, regardless of the individual error distributions — the modern CLT framing.

Carl Friedrich Gauss developed the distribution from a completely different angle in 1809: he asked, what distribution makes the sample mean the maximum-likelihood estimator of the true value? The unique answer is the Gaussian. This is why we call it Gaussian today, even though de Moivre had the curve some three-quarters of a century earlier and Laplace had the limit theorem.

The proof of the CLT in its modern form is due to Aleksandr Lyapunov in 1901 and Jarl Waldemar Lindeberg in 1922. The Lindeberg condition — a precise statement of "no individual $X_i$ should dominate the sum" — is what makes the theorem rigorous.

1747 d'Alembert, Euler, and the vibrating-string controversy 6 The 1-D wave equation: d’Alembert and characteristics

Jean le Rond d'Alembert derived the traveling-wave solution $u(x, t) = F(x - ct) + G(x + ct)$ in his 1747 Recherches sur la courbe que forme une corde tendue mise en vibration, the first solution of a partial differential equation in history. The setup was a vibrating string of length $L$ pinned at both ends; his solution combined right- and left-going waves to satisfy both the wave equation and the boundary conditions.

A controversy followed almost immediately. Leonhard Euler in 1748 pointed out that d'Alembert's $F$ and $G$, each a function of the single combined variable $x \pm ct$, could in principle be arbitrary curves, not just analytic formulae. D'Alembert insisted on smooth analytic functions only; Euler insisted on admitting "geometric" curves like piecewise-linear shapes. Daniel Bernoulli in 1753 proposed a third approach: the solution should be a superposition of sinusoidal modes (exactly the Fourier-series picture), which led to a further dispute between Bernoulli, d'Alembert, and Euler over whether any function could be represented as such a sum.

The full reconciliation came only after Fourier's 1822 Théorie analytique de la chaleur (and a century of subsequent foundational work in analysis): yes, the two pictures are equivalent and both admit arbitrary reasonable functions, but doing so required a more careful understanding of what "function" and "convergence" meant. The 75-year vibrating-string controversy ended up being the seed dispute that motivated modern analysis. See also the History block in 7.1 — the two stories are continuous.

1748 Euler 1748, Steinmetz 1893 3 Euler's formula and the phasor

Leonhard Euler stated the identity $e^{i\theta} = \cos\theta + i\sin\theta$ in his 1748 Introductio in analysin infinitorum. He derived it from the Taylor series, much as in the lesson, treating the substitution $x \to i\theta$ in the exponential series as a formal manipulation. At the time the legitimacy of complex numbers was contested (some mathematicians regarded $\sqrt{-1}$ as a meaningless symbol), and Euler's identity was one of the strongest arguments for taking them seriously. The special case $\theta = \pi$ gives $e^{i\pi} + 1 = 0$, often cited as the most beautiful equation in mathematics for the way it ties together five fundamental constants.

The use of complex exponentials as phasors for engineering analysis came nearly 150 years later. Charles Proteus Steinmetz, a German-American engineer at General Electric, introduced the phasor method in an 1893 paper to handle AC-circuit analysis. Before Steinmetz, the equations of alternating-current networks were solved by trigonometric identities — slow, error-prone, and unscalable. Steinmetz's phasor representation collapsed the algebra into single-line formulas, and within a decade AC power systems were the standard. The same trick reaches into acoustics through the wave-equation phasor solutions you'll meet in Sound Ch 5.

1748 Chords before sines: Hipparchus to the analytic turn 0 Trigonometry

Trigonometry began as a table-making craft for astronomy. Hipparchus of Nicaea (2nd century BCE) is credited with the first table of chords (for each central angle, the length of the chord it cuts on a fixed circle), which Ptolemy systematised in the Almagest (2nd century CE). The chord is a near-relative of the sine: the chord of angle $\theta$ on a unit-radius circle is $2\sin(\theta/2)$. The half-chord, the modern sine, was the Indian refinement; the Sanskrit jyā ("bowstring") was transliterated into Arabic and then, by a copyist's reading of the consonants, mistranslated into Latin as sinus ("fold, bay"), which is the word we still use.

The decisive shift was from geometry to analysis. Leonhard Euler, in the Introductio in analysin infinitorum (Euler 1748), treated sine and cosine as functions of a real variable rather than ratios in a triangle, connected them to the exponential through $e^{i\theta} = \cos\theta + i\sin\theta$, and in doing so made the entire table of identities consequences of a single algebraic fact. Every identity in this lesson is, after Euler, a corollary of $e^{i\theta}$.

1763 Bayes 1763, Laplace 1774, and a 200-year argument 10 Bayesian inference and signal detection

Thomas Bayes was a Presbyterian minister and amateur mathematician in 18th-century England. He wrote An Essay towards solving a Problem in the Doctrine of Chances sometime before his death in 1761, but never published it. The manuscript was found among his papers by Richard Price, who edited and submitted it to the Royal Society; it appeared in the Philosophical Transactions in 1763, two years after Bayes had died.

The paper introduced what we now call Bayes' rule — initially as a special case for the binomial distribution — and applied it to the problem of estimating an unknown probability from observed successes and failures. The crucial conceptual move was to treat the unknown parameter (the probability of success) as itself having a distribution. This was philosophically radical: parameters were generally thought of as fixed unknowns, not as random variables.
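The conceptual move of treating the unknown probability as a distributed quantity can be sketched numerically. The following Python example is an illustration under stated assumptions: a flat prior on the success probability, a made-up data set of 7 successes and 3 failures, and a hypothetical helper `posterior_on_grid` that applies Bayes' rule on a grid of candidate values.

```python
def posterior_on_grid(successes, failures, grid_size=1001):
    """Posterior over the success probability on an evenly spaced grid.

    Assumes a flat prior; the likelihood is the binomial kernel
    theta^successes * (1 - theta)^failures.
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size
    likelihood = [t**successes * (1 - t) ** failures for t in thetas]
    unnormalised = [pr * li for pr, li in zip(prior, likelihood)]
    total = sum(unnormalised)
    return thetas, [u / total for u in unnormalised]

successes, failures = 7, 3  # made-up data for illustration
thetas, posterior = posterior_on_grid(successes, failures)
posterior_mean = sum(t * w for t, w in zip(thetas, posterior))
print("posterior mean on the grid:", posterior_mean)
print("closed form (s + 1) / (n + 2):", (successes + 1) / (successes + failures + 2))
```

Under a flat prior the posterior is a Beta distribution with mean $(s+1)/(n+2)$, so the two printed numbers should agree closely; a different prior would only change the `prior` list.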

Pierre-Simon Laplace independently rediscovered and generalised the rule in his 1774 Mémoire sur la probabilité des causes par les événements. Laplace took it much further — using Bayesian arguments throughout his career to tackle problems from celestial mechanics (determining the orbits of comets) to demography (estimating population sizes from birth-rate data).

The Bayesian / frequentist split crystallised in the early 20th century, with Ronald Fisher, Jerzy Neyman, and Egon Pearson on the frequentist side arguing for objective statistics that need no prior, and Harold Jeffreys, Bruno de Finetti, and L. J. Savage on the Bayesian side defending the subjective-probability interpretation. The argument lasted decades; modern statistics largely shrugs and uses both. The rise of computational Bayesian methods (Markov-chain Monte Carlo, variational inference) in the 1990s tipped the practical balance toward Bayesian methods for complex models, and machine learning's adoption of probabilistic-programming languages (Stan, PyMC, Pyro) has made Bayesian inference routine in practice.

19th century

1805 Gauss had the FFT in 1805 9 The FFT

The Cooley–Tukey algorithm was published in 1965, in James Cooley and John Tukey's six-page paper An algorithm for the machine calculation of complex Fourier series. The paper is credited as the foundation of modern digital signal processing — within a decade, every audio compression scheme, every radar pulse compression, every MRI reconstruction depended on it. By the 1980s the FFT was running on dedicated DSP chips in millions of consumer devices.

The algorithm had been written down before. Carl Friedrich Gauss, in 1805, was fitting trigonometric series to astronomical observations of the orbits of the asteroids Pallas and Juno. He computed the Fourier coefficients of his data points via what we now recognise as a radix-2 decomposition: the same butterfly structure as Cooley–Tukey, with the same $\mathcal{O}(N \log N)$ scaling. He wrote the calculation in a Latin notebook entry that was never published in his lifetime; the relevant section appeared only in Volume 3 of his collected works in 1866, almost a century before Cooley and Tukey's paper. The Gauss algorithm was found by historians of mathematics in the 1970s, after the FFT had already conquered signal processing under Cooley and Tukey's names.
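The radix-2 butterfly mentioned above can be sketched in a few lines. This is a textbook recursive decimation-in-time version, not a transcription of Gauss's notebook or of the 1965 paper; the function names `fft_radix2` and `dft_naive` are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # transform of the even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with one twiddle factor.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dft_naive(x):
    """Direct O(N^2) evaluation of the same transform, for comparison."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

samples = [1, 2, 3, 4, 0, -1, -2, -3]  # arbitrary test data
fast = fft_radix2(samples)
slow = dft_naive(samples)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # difference at rounding level
```

Each level of recursion does $N/2$ butterflies and there are $\log_2 N$ levels, which is where the $\mathcal{O}(N \log N)$ scaling comes from.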

The lesson, as far as there is one: an algorithm that no one knows about benefits no one. The FFT's 160-year hibernation between Gauss and Cooley–Tukey is one of the clearer cases of "the right idea, in the wrong notebook, at the wrong time." Modern numerical computing's debt is to the rediscovery and its consequences, not to the original.

1807 Fourier's heat equation and a rejected memoir 6 The heat equation and Laplace’s equation

Joseph Fourier wrote the heat equation $\partial_t u = D \nabla^2 u$ in his 1807 memoir to the French Academy of Sciences, Sur la propagation de la chaleur dans les corps solides. To solve it on a bounded interval, he proposed expanding the initial temperature as a sum of sinusoidal modes (what we now call a Fourier series) and showing that each mode decayed independently with rate $D k^2$.

The memoir was rejected. Lagrange, on the review panel, objected that "arbitrary functions" could not in general be expressed as such a sum, and the mathematics of convergence wasn't rigorous enough to settle the question. Fourier rewrote, expanded, and resubmitted; the work was published as Théorie analytique de la chaleur in 1822. Over the following decades it reshaped much of mathematics: the analytical machinery built to make Fourier's claims rigorous (Cauchy's theory of convergence, Riemann's theory of integration, Cantor's set theory, Lebesgue's measure theory) became the foundation of modern analysis. The same machinery underwrites every PDE technique in this chapter and the Fourier methods of Foundations 7.

The irony is that the heat equation, derived by Fourier as the physical motivation for the series, ended up far less famous in physics than the Fourier transform that came out of the analytic theory built to validate his solution. Generations of physics students meet Fourier methods without ever learning that he was trying to solve the heat-diffusion problem.

1821 Cauchy and the rigorisation of the calculus 1 Derivatives

For 150 years after Newton and Leibniz, calculus worked in practice but rested on shaky foundations. Newton's "fluxions" and Leibniz's "infinitesimals" were treated as quantities both vanishingly small and non-zero — a contradiction that Bishop Berkeley famously skewered in his 1734 pamphlet The Analyst, calling them "the ghosts of departed quantities." Mathematicians used the methods because they worked; philosophers complained because they made no logical sense.

Augustin-Louis Cauchy's 1821 Cours d'analyse and 1823 Résumé gave the modern definition of the derivative as a limit of difference quotients: $f'(x) = \lim_{h \to 0} [f(x+h) - f(x)]/h$, with the limit defined by what we now call an $\varepsilon$–$\delta$ statement. The reformulation eliminated infinitesimals entirely. Karl Weierstrass refined Cauchy's definitions in the 1850s into the rigorous $\varepsilon$–$\delta$ framework taught today.

This is the version of the derivative in the opening of this lesson — Cauchy's, not Newton's. The modern student inherits a calculus that has been logically clean for two centuries; the original was workable but informal for nearly as long as it has been rigorous.

1822 Fourier, Bernoulli, and the function controversy 7 Fourier series

Joseph Fourier introduced the trigonometric-series decomposition in his 1822 Théorie analytique de la chaleur (Fourier 1822), motivated by the heat equation. His claim — that any function on a bounded interval could be expanded as such a series — was sharply contested by Lagrange and others, because it required admitting functions with corners, jumps, and other "pathological" features that the 18th-century theory of analysis could not handle.

The same dispute, in different form, had played out 75 years earlier between d'Alembert, Euler, and Daniel Bernoulli over the vibrating-string solution (see Sound 3.3). Fourier's work forced the resolution: a "function" is anything that takes input to output, not just an analytic formula. Modern analysis — Cauchy's theory of convergence, Riemann's theory of integration, Cantor's set theory, Lebesgue's measure theory — was built to make Fourier's claim rigorous. Acoustics ended up getting its frequency-domain methods as a byproduct.

The Gibbs phenomenon is a footnote in the same story. Wilbraham noticed the overshoot in 1848, but his paper was forgotten. In 1898 the physicist Albert Michelson — of Michelson-Morley fame — built a mechanical harmonic analyser and observed the overshoot. When he wrote a letter to Nature asking whether this was an artefact of his apparatus, Gibbs replied in 1899 with the mathematical explanation. The phenomenon was named for Gibbs even though Wilbraham had it first.

1850 From Cayley to Hilbert: a century building the spectral theorem 4 Eigenvalues and eigenvectors

Matrix algebra as we know it was assembled by Arthur Cayley and James Joseph Sylvester in the 1850s in England. Cayley's 1858 Memoir on the Theory of Matrices defined matrix addition, multiplication, and the characteristic polynomial — the equation $\det(A - \lambda I) = 0$ from this lesson. Sylvester coined the word "matrix" in 1850 and introduced "discriminant" and "minor" along with much of the modern vocabulary. The two were friends and lifelong correspondents; the era is sometimes called the Cayley–Sylvester period of algebra.

The eigenvalue–eigenvector machinery was fully understood for finite matrices by the 1880s. The leap to infinite dimensions (operators on function spaces, the natural home of PDEs and quantum mechanics) was made by David Hilbert in the early 1900s, in his work on integral equations. Hilbert's six papers from 1904–1910 established what we now call Hilbert space, and the proof that self-adjoint operators on a Hilbert space have a complete orthonormal eigenbasis is the spectral theorem, the deepest result in the chain. The full machinery was reformulated and extended by John von Neumann, who had worked with Hilbert in Göttingen, in the late 1920s and early 1930s, providing the mathematical foundation that Werner Heisenberg's matrix mechanics and Erwin Schrödinger's wave mechanics needed to be the same theory. Eigenvalues, in other words, ran the central arc of mathematical physics from 1850 to 1930.

1854 Riemann's integral and the price of rigour 1 Integrals

Bernhard Riemann's 1854 Habilitationsschrift gave the integral its modern definition: the limit of sums $\sum f(t_i)\Delta t$ as the partition is refined, taken over arbitrary partitions and arbitrary sample points $t_i$, with the requirement that the limit exists and is independent of the choices. This was the first definition that worked for functions more pathological than Cauchy's 1823 version had allowed, in particular for functions with infinitely many discontinuities in any interval.
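The phrase "arbitrary partitions and arbitrary sample points" can be made concrete with a small experiment. The sketch below is illustrative: it draws a random partition of $[0, 1]$ and a random sample point in each subinterval, then sums $f(t_i)\Delta t$ for the assumed test function $f(t) = t^2$. The helper name `tagged_riemann_sum` is hypothetical.

```python
import random

def tagged_riemann_sum(f, a, b, n, rng):
    """Riemann sum over a random partition of [a, b] into n pieces,
    with one randomly chosen sample point per piece."""
    cuts = sorted(rng.uniform(a, b) for _ in range(n - 1))
    points = [a] + cuts + [b]
    total = 0.0
    for left, right in zip(points, points[1:]):
        tag = rng.uniform(left, right)  # arbitrary sample point t_i
        total += f(tag) * (right - left)
    return total

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 100, 10_000):
    print(n, tagged_riemann_sum(lambda t: t * t, 0.0, 1.0, n, rng))
```

As the partition is refined the sums settle toward $1/3$ regardless of how the cuts and sample points were chosen, which is what integrability of $t^2$ means in Riemann's sense.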

Riemann's framework made integration a property of the function rather than a recipe for evaluation. A function is Riemann-integrable if the limit exists; not all bounded functions are. Henri Lebesgue's 1902 reformulation extended the theory to a much wider class (the Lebesgue integral, which agrees with Riemann's on functions both can handle but assigns values to many that Riemann cannot). For the bounded, piecewise-continuous functions of acoustics and physics, the Riemann integral is enough and is what we use throughout the bookshelf.

1872 Dedekind constructs the real numbers 1 Integrals

Calculus assumes the real numbers — that there is a continuum on which limits, derivatives, and integrals make sense. For most of mathematical history this was taken as obvious. Richard Dedekind, in his 1872 essay Stetigkeit und irrationale Zahlen (Continuity and Irrational Numbers), made it rigorous: a real number is a Dedekind cut — a partition of the rationals into two non-empty sets, one entirely below the other, with no rational sitting between them. The cut "is" the irrational at the boundary.

Georg Cantor, the same year, gave an alternative construction via Cauchy sequences of rationals (declaring two sequences equivalent when their difference goes to zero). Both constructions produce the same field of real numbers and the same completeness property: every Cauchy sequence converges. Without one of these constructions, calculus has no logical ground to stand on; with either, every theorem from Cauchy and Riemann to the modern wave equation rests on a defensible foundation.

1877 Rayleigh, Buckingham, and the dimensionless number 8 Dimensional analysis

Lord Rayleigh's 1877 Theory of Sound used dimensional reasoning throughout — to guess scaling laws, to check derivations, to argue that certain phenomena could only depend on dimensionless combinations of parameters. He didn't formalise the technique; he just used it everywhere. By the early 1900s "Rayleigh's method" was an informal craft.

In 1883 Osborne Reynolds, studying flow through pipes, identified what we now call the Reynolds number $Re = \rho U L / \mu$ — a dimensionless group whose value distinguished laminar from turbulent flow regardless of the absolute scale of the pipe. This was the first time a named dimensionless number was understood as the physically-meaningful parameter of a problem.

In 1914 Edgar Buckingham (US Bureau of Standards) formalised what Rayleigh had been doing: if a physical relationship involves $n$ variables with $k$ independent dimensions, it can be rewritten as a relation among $n - k$ dimensionless groups. The Buckingham π theorem turned dimensional analysis from craft into a recipe. Almost every dimensionless number in physics (Mach, Reynolds, Prandtl, Strouhal, Helmholtz) fits into this framework, including those, like the Reynolds number, that were identified before the theorem was stated.
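The counting rule $n - k$ can be checked mechanically for the pipe-flow variables behind the Reynolds number. The following sketch assumes the standard dimensions of density, speed, length, and dynamic viscosity in terms of mass, length, and time; the helper `matrix_rank` is a hypothetical name for a plain Gaussian elimination over exact fractions.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a small matrix by Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

# Columns: rho, U, L, mu.  Rows: exponents of mass, length, time.
dimension_matrix = [
    [1, 0, 0, 1],     # mass:   rho ~ M L^-3,  mu ~ M L^-1 T^-1
    [-3, 1, 1, -1],   # length
    [0, -1, 0, -1],   # time:   U ~ L T^-1
]
n_variables = 4
k = matrix_rank(dimension_matrix)
n_groups = n_variables - k
print("independent dimensions k =", k, " dimensionless groups n - k =", n_groups)

# Exponents of (rho, U, L, mu) in Re = rho * U * L / mu.
reynolds_exponents = [1, 1, 1, -1]
residual = [sum(row[j] * reynolds_exponents[j] for j in range(n_variables))
            for row in dimension_matrix]
print("net dimensions of Re (mass, length, time):", residual)
```

With four variables and three independent dimensions there is a single dimensionless group, and the exponent vector for $Re$ cancels every dimension, so the Reynolds number is that group up to a power.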

1880 The vectors that fought a war 2 Divergence and curl

Vector calculus as we use it — gradient, divergence, curl, the $\nabla$ operator — was assembled between 1853 and the 1890s out of two competing formalisms.

William Rowan Hamilton invented quaternions in 1843 (allegedly carving the formula $i^2 = j^2 = k^2 = ijk = -1$ into the stone of Brougham Bridge in Dublin). He intended them as the natural algebra for three-dimensional rotations and physical quantities, and spent the rest of his life evangelising for them. James Clerk Maxwell's Treatise on Electricity and Magnetism (1873) was written in a hybrid quaternion notation: the operator we now call $\nabla$ was Hamilton's "nabla" (named after a Hebrew harp shaped like the symbol).

In the 1880s, J. Willard Gibbs (at Yale) and Oliver Heaviside (in England, working independently) extracted a stripped-down "vector algebra" from Hamilton's quaternions, keeping the dot and cross products and abandoning the quaternion arithmetic, and used it to reformulate Maxwell's equations into the form we now see. A pitched war broke out in the late-19th-century mathematical journals between the quaternion adherents (Peter Guthrie Tait was the loudest) and the new vector-calculus camp (Gibbs, Heaviside). The vector-calculus side won decisively. By 1900, physics and engineering had abandoned quaternions; today they survive mainly in computer graphics (for rotation interpolation) and in pure mathematics. The notation $\nabla$ and the calculus you use here are the residue of Gibbs and Heaviside's victory.

1890 May's 1976 plea to study the simple map 11 Sensitive dependence and the logistic map

The logistic map as a population model goes back to Pierre-François Verhulst in the 1840s, but its chaotic side was not appreciated until the 1970s. In a 1976 review in Nature, the physicist-turned-ecologist Robert May laid out how this "very simple" first-order difference equation runs through stable points, stable cycles, and "an apparently chaotic regime in which... the trajectory looks like the sample function of a random process", all by varying one parameter ([May 1976][1]). May's closing was a manifesto: he urged that the model "be studied in the elementary mathematics courses," arguing that the widespread intuition that simple equations have simple solutions was doing real harm in fields from ecology to economics. The deeper structure, that the rate of period-doubling is universal, was uncovered at around the same time by Mitchell Feigenbaum, the subject of the next lesson.
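The progression May described can be reproduced in a few lines. The sketch below iterates the logistic map $x \mapsto r x (1 - x)$, discards a transient, and prints the values the orbit then visits; the parameter values, starting point, and helper name `logistic_orbit_tail` are illustrative assumptions rather than anything taken from May's paper.

```python
def logistic_orbit_tail(r, x0=0.2, transient=2000, keep=8):
    """Iterate x -> r*x*(1-x), drop the transient, return the next `keep` values
    rounded to four decimals so repeated cycle points compare equal."""
    x = x0
    for _ in range(transient):
        x = r * x * (1 - x)
    tail = []
    for _ in range(keep):
        x = r * x * (1 - x)
        tail.append(round(x, 4))
    return tail

for r in (2.8, 3.2, 3.5, 3.9):
    tail = logistic_orbit_tail(r)
    print(f"r = {r}: {len(set(tail))} distinct value(s) in the tail -> {tail}")
```

For the smaller parameter values the tail collapses to a fixed point, then a 2-cycle, then a 4-cycle; for the largest one it does not settle into a short cycle at all, which is the regime May called apparently chaotic.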

The conceptual roots run back further, to Henri Poincaré, who in his 1890 work on the three-body problem found orbits "so tangled that I cannot even begin to draw them," and grasped that small differences in initial conditions could produce large differences later: sensitive dependence, decades before anyone had a name for it ([Poincaré 1890][2]).

[1]: /foundations/bibliography#may-1976
[2]: /foundations/bibliography#poincare-1890

1892 Lyapunov's stability, repurposed for instability 11 Lyapunov exponents and the horizon of prediction

Aleksandr Lyapunov introduced his exponents in his 1892 doctoral thesis The General Problem of the Stability of Motion, written in Kharkiv and concerned with the opposite of chaos: he wanted rigorous conditions under which a mechanical equilibrium is stable, so that perturbations decay. His "first method" linearised the dynamics and read stability from the exponential rates — negative rates meaning a perturbation dies away. The same exponents, taken positive, became the central diagnostic of chaos eighty years later, once Lorenz and others realised that the interesting systems were the ones where Lyapunov's quantity comes out positive. The numerical recipe for extracting the full spectrum from data or simulation — periodically renormalising the separation vector to keep it infinitesimal — was worked out by Benettin and others around 1980 and is still the standard method.
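The renormalisation recipe can be illustrated on a one-dimensional map, where the "spectrum" is a single exponent. This is a simplified sketch in the spirit of that method, not the published algorithm: it follows two nearby orbits of the logistic map, records the logarithm of how much their gap grew in one step, and resets the gap to a fixed small size. The map, the starting point, the gap `d0`, and the function name are all assumptions.

```python
import math

def lyapunov_logistic(r, x0=0.3, d0=1e-9, steps=20000, transient=100):
    """Estimate the Lyapunov exponent of x -> r*x*(1-x) by tracking a
    nearby orbit and renormalising the separation after every step."""
    def f(x):
        return r * x * (1 - x)

    x = x0
    for _ in range(transient):
        x = f(x)
    y = x + d0
    total = 0.0
    counted = 0
    for _ in range(steps):
        x, y = f(x), f(y)
        d = abs(y - x)
        if d > 0.0:               # skip the rare step where the gap underflows
            total += math.log(d / d0)
            counted += 1
        y = x + d0                # renormalise: reset the gap to d0
    return total / counted

print("r = 2.8:", lyapunov_logistic(2.8))   # stable fixed point, negative exponent
print("r = 4.0:", lyapunov_logistic(4.0))   # fully chaotic case
print("ln 2   :", math.log(2))
```

A negative value means perturbations die away, which was Lyapunov's original concern; a positive one is the signature of chaos. For $r = 4$ the exponent is known to equal $\ln 2$, which gives a convenient check on the estimate.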

Early 20th century

1926 Schrödinger 1926, and the two quantum mechanicses 6 The Schrödinger equation

Quantum mechanics was discovered twice within a year. Werner Heisenberg's 1925 paper introduced matrix mechanics: physical observables were represented by infinite matrices and the dynamics by matrix multiplication. The mathematics was unfamiliar to physicists (Born and Jordan had to teach Heisenberg what a matrix was), but it correctly predicted the spectral lines of the hydrogen atom and the spectra of more complicated atoms.

Erwin Schrödinger, working independently in early 1926, was guided by de Broglie's 1924 hypothesis that matter has wave-like character. He wrote down the wave equation $i\hbar\, \partial_t \Psi = \hat H \Psi$ and showed that its eigenvalues for the hydrogen-atom potential gave the Bohr energy levels exactly. The mathematics was the separation-of-variables technique already familiar from acoustics — which is precisely the parallel this lesson develops.

The two formulations looked utterly different. Heisenberg's was algebraic and discrete; Schrödinger's was differential and continuous. Within months of publication (1926), Schrödinger himself proved that the two were mathematically equivalent: different representations of the same theory. Paul Dirac's 1930 textbook The Principles of Quantum Mechanics and John von Neumann's 1932 Mathematische Grundlagen der Quantenmechanik gave the unified abstract formulation in terms of operators on Hilbert space, which is the formulation modern physics uses. This is the same Hilbert space, complete with self-adjoint operators and the spectral theorem, that runs through the rest of Foundations 6.

1933 Kolmogorov's axioms for probability 10 Random variables and distributions

For more than two centuries after Pascal and Fermat's 1654 correspondence on games of chance, probability was treated as a collection of computational recipes: useful, intuitive, and logically unmoored. The frequentist interpretation ("probability is long-run frequency") and the Laplacian interpretation ("probability is the fraction of equally likely cases that are favourable") each worked in specific settings but failed in others, and there was no agreed answer to what "probability" meant.

Andrey Kolmogorov's 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung (Foundations of the Theory of Probability) gave the axiomatic definition that is now standard: a probability is a non-negative, countably-additive measure on a sigma-algebra of events, normalised so that the certain event has measure 1. Random variables are measurable functions on this space; expectations are Lebesgue integrals.

The axiomatisation did three things at once. It unified frequentist and subjective probabilities into a single mathematical object (only the interpretation differs). It connected probability to measure theory and so to the rest of 20th-century analysis. And it provided the formal ground for the convergence theorems (law of large numbers, central limit theorem, martingale convergence) that underwrite all of modern statistical inference.

Late 20th century

1961 Lorenz, a truncated printout, and the butterfly 11 Flows, strange attractors, and the Lorenz system

Edward Lorenz found his attractor by accident. In 1961, rerunning a weather simulation on a Royal McBee vacuum-tube computer, he restarted midway by typing in numbers from an earlier printout — which showed only three decimal places, while the machine held six. The rounded restart tracked the original for a while, then diverged completely: a different forecast from a difference of one part in a thousand. Lorenz recognised that this was not a numerical artefact but a property of the equations, and distilled it into the three-variable system in his 1963 paper Deterministic Nonperiodic Flow — a title that states the paradox outright ([Lorenz 1963][1]).
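The effect of the truncated printout can be imitated on the three-variable system itself. The sketch below is an illustration, not a reconstruction of Lorenz's original weather model: it integrates the Lorenz equations with the standard parameters $\sigma = 10$, $\rho = 28$, $\beta = 8/3$ using a hand-written fourth-order Runge–Kutta step, once from a six-decimal state and once from the same state rounded to three decimals. The starting state, step size, and function names are assumptions.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system for state (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def separation_history(full, rounded, dt=0.01, steps=4000):
    """Distance between the two runs after each time step."""
    history = []
    for _ in range(steps):
        full, rounded = rk4_step(full, dt), rk4_step(rounded, dt)
        history.append(sum((p - q) ** 2 for p, q in zip(full, rounded)) ** 0.5)
    return history

full_state = (1.506127, 2.506127, 20.506127)           # hypothetical six-decimal state
rounded_state = tuple(round(v, 3) for v in full_state)  # what the printout would show
history = separation_history(full_state, rounded_state)

for step in (1, 500, 1000, 2000, 4000):
    print(f"t = {step * 0.01:5.2f}   separation = {history[step - 1]:.6f}")
```

The two runs start a few parts in ten thousand apart and track each other at first; the gap then grows roughly exponentially until it is as large as the attractor itself, after which the rounded run says nothing about the original.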

The paper, published in a meteorology journal, went largely unnoticed by mathematicians and physicists for a decade. Lorenz's later image stuck: a butterfly flapping its wings in Brazil might set off a tornado in Texas — not because the butterfly supplies the energy, but because the atmosphere amplifies the tiny perturbation until, weeks later, it has rearranged the large-scale weather. The shape of his attractor, by happy coincidence, looks like a butterfly too.

[1]: /foundations/bibliography#lorenz-1963

1975 Feigenbaum's pocket calculator and a universal constant 11 The period-doubling route to chaos

In 1975 Mitchell Feigenbaum, at Los Alamos, was computing the bifurcation points of period-doubling maps on an HP-65 programmable calculator. The convergence was slow, so to guess where the next bifurcation would fall he computed the ratio of successive gaps, and found it tending to a constant, $4.6692$. Trying a completely different map, $x \mapsto r\sin(\pi x)$, he expected a different number and got the same one. The shared constant told him the behaviour was universal, independent of the specific nonlinearity ([Feigenbaum 1978][1]).

The mechanism — a renormalisation-group fixed point in the space of maps — connected chaos to Kenneth Wilson's contemporaneous Nobel-winning work on phase transitions, where the same mathematics explains why fluids and magnets share critical exponents. Feigenbaum's papers were rejected repeatedly before publication; the result was so unexpected that referees did not believe a simple universal constant could govern the onset of chaos across unrelated systems.

[1]: /foundations/bibliography#feigenbaum-1978