11.1 Random variables and distributions
This chapter develops the probability and statistics that the rest of the bookshelf uses: random walks and Brownian motion for Sound 1.3, Poisson spike trains for Hearing Ch 5, Bayesian inference for Hearing 8, and the Gaussian distribution as the asymptotic shape that almost every noise model relaxes to.
This first lesson is the working vocabulary: what a random variable is, how to describe its distribution, and the five or six named distributions that appear repeatedly across the bookshelf.
What is a random variable?
A random variable is a quantity whose value is not known with certainty until you measure it. Examples:
- The number of times a tossed coin lands heads in 10 trials.
- The arrival time of the next photon from a faint star.
- The voltage measured across a noisy resistor at a particular instant.
- The displacement of a Brownian particle after one second.
A random variable is not the same as a single random outcome. It is the entire family of possible outcomes plus the probabilities of each. Two random variables are equal in distribution if they have the same probabilities of producing the same outcomes — they need not produce the same value in any particular trial.
Random variables come in two flavours:
- Discrete: takes values from a countable set, such as $\{0, 1\}$ or $\{0, 1, 2, \dots\}$ or the integers $\mathbb{Z}$. Examples: coin tosses, spike counts, photon counts.
- Continuous: takes values from an uncountable interval, typically $\mathbb{R}$ or $[0, \infty)$ or some subset. Examples: position of a Brownian particle, voltage, arrival time.
Describing a distribution
For a discrete random variable $X$, the probability mass function (PMF) gives the probability of each possible value:
$$p(x) = P(X = x), \qquad \sum_x p(x) = 1.$$
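For a continuous random variable, individual points have probability zero (the variable can take any particular value in its range, but the probability of landing on any single value with infinite precision is zero). Instead we describe the distribution by a probability density function (PDF) $f(x)$, which assigns probabilities to intervals:
$$P(a \le X \le b) = \int_a^b f(x)\,dx, \qquad f(x) \ge 0, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1.$$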
The density has units of “probability per unit $x$.” For voltage it would have units of $\mathrm{V}^{-1}$; for time, units of $\mathrm{s}^{-1}$.
Both kinds of distribution can be described equivalently by their cumulative distribution function (CDF):
$$F(x) = P(X \le x).$$
The CDF goes from 0 as $x \to -\infty$ to 1 as $x \to +\infty$, monotonically. For continuous $X$, the density is the derivative of the CDF: $f(x) = dF/dx$.
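The link between density and CDF can be checked numerically. The sketch below assumes a toy density $f(x) = 2x$ on $[0, 1]$ (chosen only for illustration, with CDF $F(x) = x^2$); the function names are hypothetical. It compares $F(b) - F(a)$ with a midpoint-rule integral of the density, and a finite-difference slope of the CDF with the density itself.

```python
def pdf(x):
    """Toy density f(x) = 2x on [0, 1], zero elsewhere (assumed for illustration)."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def cdf(x):
    """Matching CDF F(x) = x^2 on [0, 1], clamped to 0 below and 1 above."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x * x

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

a, b = 0.2, 0.7
prob_from_cdf = cdf(b) - cdf(a)        # F(b) - F(a)
prob_from_pdf = integrate(pdf, a, b)   # integral of the density over [a, b]

# A central difference of the CDF should approximate the density at x0.
x0, eps = 0.5, 1e-5
slope = (cdf(x0 + eps) - cdf(x0 - eps)) / (2 * eps)

# The density should integrate to (approximately) 1 over its support.
total_mass = integrate(pdf, 0.0, 1.0)

print(prob_from_cdf, prob_from_pdf, slope, pdf(x0), total_mass)
```

Both routes to $P(0.2 \le X \le 0.7)$ should agree to within rounding, which is the practical content of “the density is the derivative of the CDF.”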
Expectation and variance
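The expected value (or mean) of a random variable is the probability-weighted average of its possible values:
$$E[X] = \sum_x x\,p(x) \quad \text{(discrete)}, \qquad E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx \quad \text{(continuous)}.$$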
The expected value is a deterministic number — a property of the distribution, not of any particular realisation. It is the long-run average if you repeated the experiment many times.
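To make the “long-run average” reading concrete, here is a small sketch for a fair six-sided die (an assumed example, not one from the text). It computes the expected value exactly from the PMF and compares it with the average of many simulated rolls; the seed and the number of rolls are arbitrary choices.

```python
import random
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Expected value: probability-weighted average of the possible values.
expected = sum(face * p for face, p in pmf.items())

# Long-run average of simulated rolls (seeded so the run is repeatable).
rng = random.Random(0)
n_rolls = 100_000
sample_mean = sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

print(float(expected), sample_mean)
```

The exact value is a fixed number, while the sample mean changes with the seed and should only get closer to it as the number of rolls grows.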
The variance measures how spread out the distribution is around the mean:
$$\mathrm{Var}(X) = E\left[(X - \mu)^2\right],$$
where $\mu = E[X]$. The variance has units of $X$ squared; the square root, called the standard deviation $\sigma = \sqrt{\mathrm{Var}(X)}$, has the same units as $X$ and is the more natural “typical spread” measure.
The shortcut formula Var = E[X²] − μ²
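Expand the definition:
$$\mathrm{Var}(X) = E\left[(X - \mu)^2\right] = E\left[X^2 - 2\mu X + \mu^2\right].$$
Linearity of expectation (which holds because the integral or sum is linear in the integrand) then gives:
$$\mathrm{Var}(X) = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2.$$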
The variance is therefore “the mean of the square minus the square of the mean.” This is useful because it avoids computing the centred quantity $X - \mu$ and then squaring, which can be slower than computing $E[X^2]$ directly and subtracting $\mu^2$.
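As a quick check of the shortcut, the sketch below evaluates the variance both ways for a small three-valued PMF. The PMF is made up for illustration, and exact fractions are used so that the two results can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical PMF on the values 0, 1, 2 (probabilities sum to 1).
pmf = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}

mean = sum(x * p for x, p in pmf.items())

# Definition: expected squared deviation from the mean.
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Shortcut: mean of the square minus the square of the mean.
second_moment = sum(x ** 2 * p for x, p in pmf.items())
var_shortcut = second_moment - mean ** 2

print(mean, var_definition, var_shortcut)  # 7/10 61/100 61/100
```

One caveat for numerical work: in floating-point arithmetic the shortcut subtracts two nearly equal numbers when the mean is large compared with the spread, so the definition-based form is usually the safer one to code.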
Higher-order moments, such as $E[X^3]$ and $E[X^4]$, measure further aspects of the distribution. The third moment around the mean (normalised by $\sigma^3$) is the skewness, measuring asymmetry. The fourth moment around the mean (normalised by $\sigma^4$, minus 3) is the excess kurtosis, measuring tail heaviness relative to a Gaussian. For most distributions in the bookshelf, mean and variance suffice.
Six distributions to know
Six named distributions appear repeatedly across the bookshelf. They are worth knowing by name, mean, variance, and shape.
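1. Uniform on $[a, b]$
Continuous, PDF is constant on $[a, b]$ and zero outside:
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$
Mean $(a + b)/2$, variance $(b - a)^2/12$. The “default” distribution when you have no information beyond bounds.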
2. Bernoulli($p$)
Discrete, two outcomes: 1 with probability $p$, 0 with probability $1 - p$. A single coin toss. Mean $p$, variance $p(1 - p)$. The building block of the binomial and Poisson.
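3. Binomial($n, p$)
Discrete, the number of successes in $n$ independent Bernoulli($p$) trials:
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n.$$
Mean $np$, variance $np(1 - p)$. As $n \to \infty$ with $p$ fixed, approaches a Gaussian (Central Limit Theorem). As $n \to \infty$ with $np = \lambda$ fixed (i.e. $p \to 0$), approaches a Poisson; see lesson 11.4.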
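The binomial mean and variance can be checked directly from the PMF. The sketch below uses 10 tosses of a fair coin (parameters chosen to match the coin-toss example earlier in the lesson); `binomial_pmf` is a hypothetical helper name.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                            # should be 1
mean = sum(k * pk for k, pk in enumerate(pmf))              # should be n * p
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf)) # should be n * p * (1 - p)

print(total, mean, var)
print(pmf[5])  # probability of exactly 5 heads in 10 tosses
```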
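4. Gaussian (normal), $\mathcal{N}(\mu, \sigma^2)$
Continuous, the bell curve:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$
Mean $\mu$, variance $\sigma^2$. The asymptotic shape of any suitably centred and scaled sum of many independent identically-distributed (i.i.d.) random variables with finite variance: this is the Central Limit Theorem. It is developed in 11.2.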
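A short sketch of the Gaussian density and CDF, using the standard library's error function. The values of `mu` and `sigma` are arbitrary illustrative choices and the helper names are hypothetical. It evaluates the probability of falling within one and two standard deviations of the mean, which does not depend on the particular $\mu$ and $\sigma$.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def gaussian_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) at x, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 1.0, 2.0  # assumed values, for illustration only

within_1 = gaussian_cdf(mu + sigma, mu, sigma) - gaussian_cdf(mu - sigma, mu, sigma)
within_2 = gaussian_cdf(mu + 2 * sigma, mu, sigma) - gaussian_cdf(mu - 2 * sigma, mu, sigma)
peak = gaussian_pdf(mu, mu, sigma)  # height of the bell curve at its centre

print(within_1, within_2, peak)
```

The two probabilities come out near 0.683 and 0.954, the familiar “68% within one sigma, 95% within two” rule of thumb.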
5. Exponential(λ)
Continuous, supported on $[0, \infty)$:
$$f(x) = \lambda e^{-\lambda x}, \qquad x \ge 0.$$
Mean $1/\lambda$, variance $1/\lambda^2$. The waiting-time distribution for a memoryless process: the inter-arrival times of a Poisson process. It is developed in 11.4.
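“Memoryless” means that, having already waited a time $s$, the chance of waiting a further $t$ is the same as if you had just started: $P(X > s + t \mid X > s) = P(X > t)$. The sketch below checks this from the survival function $P(X > t) = e^{-\lambda t}$ and also draws samples by inverting the CDF. The rate, the times `s` and `t`, and the seed are assumed values for illustration.

```python
import math
import random

lam = 2.0  # assumed rate parameter

def survival(t):
    """P(X > t) = exp(-lam * t) for t >= 0."""
    return math.exp(-lam * t)

# Memorylessness: conditioning on having survived to s does not change the future.
s, t = 0.7, 0.4
conditional = survival(s + t) / survival(s)   # P(X > s + t | X > s)
unconditional = survival(t)                   # P(X > t)

def sample_exponential(rng):
    """Inverse-transform sampling: solve u = 1 - exp(-lam * x) for x."""
    u = rng.random()                  # uniform on [0, 1)
    return -math.log(1.0 - u) / lam   # 1 - u lies in (0, 1], so the log is finite

rng = random.Random(0)
n = 100_000
samples = [sample_exponential(rng) for _ in range(n)]
sample_mean = sum(samples) / n
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n

print(conditional, unconditional)
print(sample_mean, sample_var)  # should be near 1/lam and 1/lam**2
```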
6. Poisson(λ)
Discrete, supported on the non-negative integers:
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots$$
Mean $\lambda$, variance $\lambda$ (variance equals mean, a Poisson-process signature). The number of events in a fixed time interval for a Poisson process. It is developed in 11.4.
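The Poisson distribution is the limit of a Binomial($n, p$) as $n$ grows with $np = \lambda$ held fixed. The sketch below illustrates this numerically, with $\lambda = 3$ and the values of $n$ chosen arbitrarily; it also checks that the Poisson mean and variance both equal $\lambda$. Helper names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); zero when k exceeds n."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3.0
ks = range(21)  # compare the two PMFs on k = 0..20

max_diffs = []
for n in (10, 100, 10_000):
    p = lam / n  # keep n * p fixed at lam
    diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in ks)
    max_diffs.append(diff)
    print(n, diff)

# Mean and variance of the Poisson PMF (sum truncated where the tail is negligible).
poisson_mean = sum(k * poisson_pmf(k, lam) for k in range(100))
poisson_var = sum((k - poisson_mean) ** 2 * poisson_pmf(k, lam) for k in range(100))
print(poisson_mean, poisson_var)
```

The largest pointwise gap between the two PMFs shrinks as $n$ increases.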
Joint, marginal, and conditional
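When two random variables $X$ and $Y$ appear together we need new tools.
The joint distribution $p(x, y)$ or $f(x, y)$ gives the probability or density of $(X, Y)$ taking the pair $(x, y)$. The marginal distribution of $X$ alone is recovered by summing or integrating out $y$:
$$p_X(x) = \sum_y p(x, y), \qquad f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy.$$
The conditional distribution of $X$ given $Y = y$ is
$$p(x \mid y) = \frac{p(x, y)}{p_Y(y)}, \qquad p_Y(y) > 0.$$
Two random variables are independent if $p(x, y) = p_X(x)\,p_Y(y)$ for all $x$ and $y$, i.e. learning the value of one tells you nothing about the other.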
These three concepts — joint, marginal, conditional — are the algebraic infrastructure of Bayesian inference. Bayes’ rule rearranges them.
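A small worked example may help. The sketch below takes a made-up joint PMF for two binary variables, recovers both marginals, forms the conditionals, tests independence, and confirms that Bayes' rule gives the same conditional probability as the direct definition. The table entries and function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical joint PMF p(x, y) for binary X and Y; entries sum to 1.
joint = {
    (0, 0): Fraction(4, 10), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10), (1, 1): Fraction(3, 10),
}

def marginal_x(x):
    """p_X(x): sum the joint over all values of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    """p_Y(y): sum the joint over all values of x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def x_given_y(x, y):
    """p(x | y) = p(x, y) / p_Y(y)."""
    return joint[(x, y)] / marginal_y(y)

def y_given_x(y, x):
    """p(y | x) = p(x, y) / p_X(x)."""
    return joint[(x, y)] / marginal_x(x)

# Independence requires p(x, y) = p_X(x) p_Y(y) for every pair.
independent = all(p == marginal_x(x) * marginal_y(y) for (x, y), p in joint.items())

direct = x_given_y(1, 1)                                      # from the definition
via_bayes = y_given_x(1, 1) * marginal_x(1) / marginal_y(1)   # Bayes' rule

print(marginal_x(1), marginal_y(1), independent)  # 1/2 2/5 False
print(direct, via_bayes)                          # 3/4 3/4
```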
Two more facts that will come up
Linearity of expectation holds with or without independence: $E[aX + bY] = a\,E[X] + b\,E[Y]$ for any constants $a, b$.
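Variance is not linear in general, but for independent random variables it adds: $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$. In terms of standard deviations: for a sum of independent variables the standard deviations do not add, but their squares do, so $\sigma_{X+Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}$. This is the algebraic content of “errors add in quadrature.”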
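Both facts can be verified exactly with two fair dice (an assumed example). In the sketch below the independent case is built from the product of the two PMFs, while the dependent case sets the second variable equal to the first, so the sum is just twice the first die.

```python
import math
from fractions import Fraction
from itertools import product

die = {face: Fraction(1, 6) for face in range(1, 7)}

def variance(pmf):
    """Variance of a discrete PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

# Independent dice: the joint probability is the product of the marginals.
sum_indep = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    sum_indep[x + y] = sum_indep.get(x + y, 0) + px * py

# Fully dependent case: Y = X, so X + Y = 2X.
sum_dep = {2 * x: p for x, p in die.items()}

var_die = variance(die)
var_indep = variance(sum_indep)   # equals 2 * var_die
var_dep = variance(sum_dep)       # equals 4 * var_die, so variances do not add here

std_sum = math.sqrt(var_indep)      # quadrature result
std_naive = 2 * math.sqrt(var_die)  # what adding standard deviations would give

print(var_die, var_indep, var_dep)  # 35/12 35/6 35/3
print(std_sum, std_naive)
```

The standard deviation of the independent sum is smaller than the naive sum of standard deviations by a factor of $\sqrt{2}$.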
What we use this for
The vocabulary of this lesson is the prerequisite for everything else in the chapter:
- 11.2 develops the Gaussian and the CLT.
- 11.3 builds random walks and Brownian motion from sums of i.i.d. Bernoulli or Gaussian steps.
- 11.4 develops the Poisson process from a memoryless-arrivals argument.
- 11.5 does Bayesian inference, signal detection theory, and ROC curves.
Across the rest of the bookshelf:
- Thermal noise voltage on a resistor is Gaussian with variance $\langle V^2 \rangle = 4 k_B T R\,\Delta f$ (Johnson–Nyquist).
- Spike counts in an auditory-nerve fibre are Poisson-distributed with mean equal to the firing rate times the window.
- Brownian motion is the continuum limit of an i.i.d. random walk.
- Bayesian perception models the brain’s posterior over stimuli given sensory evidence.
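To give a feel for the magnitudes in the first two items, here is a rough sketch. It uses the Johnson–Nyquist formula $\langle V^2 \rangle = 4 k_B T R\,\Delta f$ for the noise variance and the rule “mean count = rate × window” for a Poisson spike count. The resistance, temperature, bandwidth, firing rate, and window are assumed values picked for illustration, not figures from the bookshelf.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact by SI definition)

def johnson_noise_rms(resistance_ohm, temperature_k, bandwidth_hz):
    """RMS thermal noise voltage: sqrt(4 k_B T R * bandwidth)."""
    return math.sqrt(4.0 * K_B * temperature_k * resistance_ohm * bandwidth_hz)

# Assumed example: 1 kilo-ohm resistor at 300 K over a 10 kHz bandwidth.
v_rms = johnson_noise_rms(1e3, 300.0, 1e4)

# Assumed example: 50 spikes/s observed over a 100 ms window.
rate_hz, window_s = 50.0, 0.1
mean_count = rate_hz * window_s       # Poisson mean
std_count = math.sqrt(mean_count)     # Poisson variance equals the mean

print(v_rms)                  # roughly 0.4 microvolts
print(mean_count, std_count)
```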