6.1 Signal averaging and the EEG noise floor

A scalp electrode pressed against the vertex of an adult head records continuous electrical activity from the cortex below — the electroencephalogram (EEG). Its dominant components are the alpha rhythm (~10 Hz, prominent when eyes are closed), beta activity (~20 Hz, present during attention), and slower delta and theta waves; the total recorded amplitude at the vertex sits between 10 and 100 µV, depending on state.

The auditory brainstem response (ABR) is a transient response to a brief sound stimulus, lasting about 10 ms and peaking at amplitudes of roughly 0.3–0.5 µV at the vertex, about a hundred times smaller than the EEG it rides on. A single click presentation, recorded directly, gives a trace in the response window that is dominated by EEG, with no visible structure attributable to the click. The auditory response is there, but invisible.

The technique that makes it visible is signal averaging. It has been known to physiologists since Dawson’s electromechanical superimposer of 1947, and it is one of the few measurement techniques in clinical medicine that derives directly from the central limit theorem (see Math Foundations 11.2).

[Interactive figure: N trials averaged (N = 1 shown); signal amplitude 0.30 µV; noise σ 1.50 µV]

A small time-locked signal (red dashed) is buried in EEG noise (~1.5 µV σ for scalp recordings). A single trial is unrecognisable. Averaging N independent trials leaves the signal at its true amplitude but shrinks the noise to σ/√N, so the SNR grows as √N: a 4× reduction in noise requires 16 trials, a 10× reduction requires 100, and a 100× reduction requires 10,000. The clinical ABR averages 1500–2000 trials (about 70–95 seconds at 21 clicks/sec) to extract a ~0.3 µV signal from a ~1.5 µV noise floor. That is a √N gain of about 39–45×, or 32–33 dB, which lifts the SNR from 0.2 on a single trial to roughly 8–9 (about 18–19 dB) and makes the response unambiguous. The same principle scales to all evoked-potential testing: longer averaging buys lower noise, but the gain plateaus once the SNR approaches the limit set by the trial-to-trial variability of the underlying response.

The √N rule

Suppose a click is presented at $t = 0$ and the scalp potential on trial $k$ is recorded as $V_k(t)$, with the response window $0 \le t \le 10$ ms. Model the trial as

$V_k(t) = s(t) + n_k(t),$

where $s(t)$ is the deterministic, time-locked, same-on-every-trial evoked response, and $n_k(t)$ is the EEG noise on that trial. Suppose the noise is

stationary (its statistics don’t change across trials),
independent across trials (different trials’ noise samples are uncorrelated),
zero-mean Gaussian with standard deviation $\sigma$.

Then the average over $N$ trials is

$\bar V_N(t) = \frac{1}{N} \sum_{k=1}^N V_k(t) = s(t) + \frac{1}{N} \sum_{k=1}^N n_k(t).$

The first term is unchanged. The second term is the mean of $N$ independent Gaussian variables of standard deviation $\sigma$; by elementary statistics this is itself Gaussian with standard deviation $\sigma / \sqrt{N}$.
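A quick simulation makes the $\sigma/\sqrt{N}$ result concrete. The sketch below is illustrative only: it assumes a hypothetical Gaussian-shaped $s(t)$ peaking at 0.3 µV, white Gaussian noise with $\sigma = 1.5$ µV (the values used in this section's figure) that is independent across samples and trials, and 200 samples per response window. It reports the RMS of $\bar V_N(t) - s(t)$, which should track $\sigma/\sqrt{N}$.

```python
import math
import random

def simulate_average(n_trials, sigma=1.5, amplitude=0.3, n_samples=200, seed=0):
    """Average n_trials of V_k(t) = s(t) + n_k(t); return (s, average).

    Illustrative assumptions: s(t) is a hypothetical Gaussian-shaped bump that
    peaks at `amplitude` µV mid-window, and n_k(t) is white Gaussian noise with
    standard deviation `sigma` µV, independent across samples and trials.
    """
    rng = random.Random(seed)
    centre, width = n_samples / 2, n_samples / 20
    s = [amplitude * math.exp(-0.5 * ((i - centre) / width) ** 2)
         for i in range(n_samples)]
    total = [0.0] * n_samples
    for _ in range(n_trials):
        for i in range(n_samples):
            total[i] += s[i] + rng.gauss(0.0, sigma)
    return s, [x / n_trials for x in total]

def residual_rms(s, avg):
    """RMS of the averaged trace minus the true signal: the noise left over."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(avg, s)) / len(s))

for n in (1, 16, 100, 1600):
    s, avg = simulate_average(n)
    print(f"N={n:5d}  residual RMS={residual_rms(s, avg):.3f} µV  "
          f"predicted sigma/sqrt(N)={1.5 / math.sqrt(n):.3f} µV")
```

The residual RMS should fall by about 4× from N = 1 to N = 16 and by about 10× from N = 1 to N = 100; the exact digits depend on the random seed.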

The signal-to-noise ratio after averaging is

$\text{SNR}_N = \frac{|s|}{\sigma / \sqrt{N}} = \sqrt{N} \cdot \frac{|s|}{\sigma}.$

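The SNR grows as $\sqrt{N}$, or equivalently by +3 dB per doubling of $N$ (since $20 \log_{10} \sqrt{2} \approx 3$ dB). To gain 20 dB of SNR (a factor of 10 in linear amplitude, a factor of 100 in power) requires $N = 10^2 = 100$ trials; to gain 40 dB (a factor of 100 in amplitude, 10,000 in power) requires $N = 100^2 = 10{,}000$. The 1500–2000 trials of a typical clinical ABR buy about 32–33 dB ($\sqrt{1500} \approx 39$, $\sqrt{2000} \approx 45$), which is sufficient for clinical interpretation.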
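The decibel bookkeeping is easy to get wrong, so here is a small sketch of it. It uses only the relations in this section: amplitude SNR gain $= 20\log_{10}\sqrt{N} = 10\log_{10} N$ dB, and $\text{SNR}_N = \sqrt{N}\,|s|/\sigma$ with the example values $|s| = 0.3$ µV and $\sigma = 1.5$ µV. The function names are hypothetical helpers, not part of any clinical software.

```python
import math

def snr_gain_db(n_trials):
    """Amplitude-SNR gain from averaging: 20*log10(sqrt(N)) == 10*log10(N) dB."""
    return 10 * math.log10(n_trials)

def trials_for_gain_db(gain_db):
    """Smallest N whose averaging gain reaches gain_db (inverse of snr_gain_db)."""
    # Small tolerance so that exact powers of ten (e.g. 100.0) are not rounded up.
    return math.ceil(10 ** (gain_db / 10) - 1e-9)

def snr_after(n_trials, signal_uv=0.3, sigma_uv=1.5):
    """Linear SNR |s| / (sigma / sqrt(N)) after averaging n_trials."""
    return signal_uv / (sigma_uv / math.sqrt(n_trials))

for n in (1, 2, 100, 1500, 2000, 10_000):
    snr = snr_after(n)
    print(f"N={n:6d}  gain={snr_gain_db(n):5.1f} dB  "
          f"SNR={snr:6.2f} ({20 * math.log10(snr):5.1f} dB)")
```

For example, `trials_for_gain_db(20)` gives 100 and `trials_for_gain_db(40)` gives 10,000, matching the $N = 10^2$ and $N = 100^2$ cases.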

Derivation: why the summed signal grows as N while the summed noise grows only as √N, and why the ratio is what matters

It’s tempting to say “the signal averages — therefore both signal and noise grow with N at the same rate.” Not quite. The averaging operator $\frac{1}{N} \sum$ has two ingredients: a sum (which adds N terms) and a normalisation (divide by N).

For the time-locked signal, every trial contributes the same $s(t)$ to the sum, so the sum is $N s(t)$ and the average is just $s(t)$.

For the noise, every trial contributes an independent zero-mean random number $n_k(t)$ to the sum. The variance of the sum is $\sum_k \text{Var}(n_k) = N \sigma^2$ (because variances add for independent variables), so the sum’s standard deviation is $\sigma \sqrt{N}$. Dividing by $N$ gives the average’s standard deviation as $\sigma / \sqrt{N}$.

The key is that for the signal, the $N$ copies of $s(t)$ add coherently (they’re all the same number); for the noise, the $n_k(t)$ add incoherently (they’re independent). Coherent addition gives $N$ times the amplitude; incoherent addition gives $\sqrt{N}$ times. The ratio, which is what determines visibility, therefore grows by $N / \sqrt{N} = \sqrt{N}$.
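The coherent/incoherent distinction can be checked numerically at a single time point. This is a sketch under the derivation's own assumptions (the same $s$ on every trial, independent zero-mean Gaussian noise of standard deviation $\sigma$); the values $s = 0.3$ µV and $\sigma = 1.5$ µV and the number of Monte Carlo repeats are arbitrary choices for illustration.

```python
import math
import random
import statistics

def sums_at_one_time_point(n_trials, s=0.3, sigma=1.5, repeats=4000, seed=1):
    """Return (signal part of the sum, spread of the noise part of the sum).

    Each of `repeats` simulated recordings sums n_trials trials. The signal part
    is identical in every recording (N * s, coherent addition); the noise part
    is a sum of independent N(0, sigma^2) draws (incoherent addition), so its
    standard deviation across recordings should be close to sigma * sqrt(N).
    """
    rng = random.Random(seed)
    noise_sums = [sum(rng.gauss(0.0, sigma) for _ in range(n_trials))
                  for _ in range(repeats)]
    return n_trials * s, statistics.pstdev(noise_sums)

for n in (1, 4, 16, 64):
    signal_sum, noise_sd = sums_at_one_time_point(n)
    print(f"N={n:3d}  signal sum={signal_sum:6.2f}  noise SD={noise_sd:6.2f} "
          f"(sigma*sqrt(N)={1.5 * math.sqrt(n):6.2f})  "
          f"ratio={signal_sum / noise_sd:5.2f}")
```

The ratio column should roughly double each time $N$ quadruples, which is the $\sqrt{N}$ growth in SNR.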

The same principle appears in radar (pulse compression), astronomy (multiple-exposure stacking), and digital communications (matched filtering): coherent summation grows the signal faster than incoherent summation grows the noise.

What “noise” actually means in clinical evoked-potential recording

The √N rule assumes idealised stationary Gaussian noise. Real EEG noise is:

Quasi-stationary at best. Patient drowsiness, awake-vs-asleep state changes, and movement artefacts produce non-stationary noise. Averaging across non-stationary epochs makes the effective noise σ larger than the within-epoch σ would suggest, slowing convergence.
Non-Gaussian. Eye blinks (10–50 µV bursts) and muscle artefacts (broadband, 50+ µV) produce outlier samples that distort the simple average. Modern clinical ABR systems implement artefact rejection: any trial whose peak amplitude exceeds a threshold (typically ±15 to ±25 µV) is discarded and not included in the average.
Time-correlated within a trial. EEG is dominated by low-frequency rhythms, so the noise within the response window is not white. Filtering (typically 100–3000 Hz bandpass for ABR) removes the dominant low-frequency components and effectively whitens the residual within the response band.
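Threshold-based artefact rejection, as described under "Non-Gaussian" above, reduces to a per-trial peak test before averaging. The sketch below assumes each trial is a list of samples in µV and uses a ±20 µV threshold (inside the ±15 to ±25 µV range quoted above); the toy data, including the 40 µV blink-like offset, are invented purely for illustration.

```python
import random

def reject_artefacts(trials, threshold_uv=20.0):
    """Return (accepted_trials, n_rejected).

    A trial is discarded if its peak absolute amplitude exceeds threshold_uv,
    i.e. if any sample falls outside ±threshold_uv.
    """
    accepted = [t for t in trials if max(abs(x) for x in t) <= threshold_uv]
    return accepted, len(trials) - len(accepted)

def average(trials):
    """Sample-by-sample mean of equal-length trials."""
    return [sum(col) / len(trials) for col in zip(*trials)]

# Toy data: 200 trials of 100 samples of Gaussian noise (sigma = 1.5 µV);
# every 20th trial gets a hypothetical 40 µV blink-like offset added.
rng = random.Random(2)
trials = [[rng.gauss(0.0, 1.5) for _ in range(100)] for _ in range(200)]
for k in range(0, 200, 20):
    trials[k] = [x + 40.0 for x in trials[k]]

kept, n_rejected = reject_artefacts(trials)
print(f"kept {len(kept)} trials, rejected {n_rejected}")
print(f"largest |average| with rejection:    {max(abs(v) for v in average(kept)):.2f} µV")
print(f"largest |average| without rejection: {max(abs(v) for v in average(trials)):.2f} µV")
```

In this toy case the ten contaminated trials, if kept, would shift the whole average by about 2 µV (10 × 40 µV / 200), several times the size of a typical ABR.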

Clinical practice handles these realities with:

Bandpass filtering appropriate to the response (100–3000 Hz for ABR, 30–500 Hz for ASSR, 1–30 Hz for CAEP).
Artefact rejection discarding noisy trials.
Weighted averaging (Don et al., 1984): rather than weighting every accepted trial equally, weight each trial by the inverse of its estimated noise variance. Noisy-but-not-rejected trials contribute less; quiet trials contribute more. For typical ABR, weighted averaging produces equivalent SNR with about 30% fewer trials than unweighted averaging.
Two-buffer (replicate) averaging: maintain two independent averages, one over odd-numbered and one over even-numbered trials. When the two replicates agree, the response is likely real; when they disagree, whatever the average shows is at noise level. This is the same A/B/A+B/A−B trick used in OAE testing.
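Weighted averaging and two-buffer averaging are both short enough to sketch. The assumptions here are illustrative, not taken from any particular clinical system: each trial's noise variance is estimated from the variance of its own samples (reasonable when the response is much smaller than the noise), weights are the inverse of that estimate, and the toy data mix 200 quiet trials (σ = 1.5 µV) with 200 noisier ones (σ = 4 µV) around a hypothetical 0.3 µV bump. The replicate function also returns (A−B)/2, in which the time-locked response cancels; for equal-sized buffers and stationary noise its noise level matches that of the full average.

```python
import math
import random

def weighted_average(trials):
    """Inverse-variance weighted average of equal-length trials.

    Assumption: each trial's noise variance is estimated by the variance of its
    own samples; weight w_k = 1 / var_k, result = sum_k w_k V_k / sum_k w_k.
    """
    weights = []
    for t in trials:
        m = sum(t) / len(t)
        weights.append(1.0 / (sum((x - m) ** 2 for x in t) / len(t)))
    total_w = sum(weights)
    return [sum(w * x for w, x in zip(weights, col)) / total_w
            for col in zip(*trials)]

def replicate_buffers(trials):
    """Two-buffer averaging: A = odd-numbered trials, B = even-numbered trials.

    Returns ((A + B) / 2, (A - B) / 2). The first estimates the response; in the
    second the time-locked response cancels, leaving only residual noise.
    """
    def mean(ts):
        return [sum(col) / len(ts) for col in zip(*ts)]
    a = mean(trials[0::2])   # trials 1, 3, 5, ... counting from 1
    b = mean(trials[1::2])   # trials 2, 4, 6, ...
    return ([(x + y) / 2 for x, y in zip(a, b)],
            [(x - y) / 2 for x, y in zip(a, b)])

# Hypothetical toy data: 0.3 µV bump, 200 quiet trials and 200 noisier ones.
rng = random.Random(3)
n_samples = 200
s = [0.3 * math.exp(-0.5 * ((i - 100) / 10) ** 2) for i in range(n_samples)]
sigmas = [1.5] * 200 + [4.0] * 200
rng.shuffle(sigmas)
trials = [[x + rng.gauss(0.0, sg) for x in s] for sg in sigmas]

def rms_error(est):
    """RMS difference between an estimate and the true (simulated) response."""
    return math.sqrt(sum((e - x) ** 2 for e, x in zip(est, s)) / n_samples)

plain = [sum(col) / len(trials) for col in zip(*trials)]
plus, minus = replicate_buffers(trials)
print(f"plain average, RMS error:    {rms_error(plain):.3f} µV")
print(f"weighted average, RMS error: {rms_error(weighted_average(trials)):.3f} µV")
print(f"(A - B) / 2, RMS:            {math.sqrt(sum(m * m for m in minus) / n_samples):.3f} µV")
```

With equal-sized buffers, (A+B)/2 is identical to the plain average; the point of keeping two buffers is the agreement check and the built-in noise estimate from (A−B)/2.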

How long does an ABR take?

The standard click ABR delivers stimuli at about 21 clicks/sec (an inter-click interval of about 48 ms, allowing the response to complete and the brain to recover). 1500 trials therefore take about 71 seconds, and 2000 trials about 95 seconds: call it a minute to a minute and a half per condition. A full diagnostic ABR battery (click + tone bursts at 500, 1000, 2000, 4000 Hz, each at 3–5 levels, on both ears) takes 60–90 minutes. Newborn ABR screening uses an automated rule (Cz-A1 vs Cz-A2 statistical detection, with a stopping criterion) that typically requires 500–1500 trials at a faster click rate (37 clicks/sec). That is about 14–41 seconds of stimulation, roughly a 30-second test per ear when the infant is quiet.
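The timing figures follow from simple division. The sketch counts stimulation time only: it ignores replicate runs, rejected trials, electrode checks and pauses, so under those assumptions it gives a lower bound on real test duration. The condition count (click plus four tone-burst frequencies, 3–5 levels, two ears) is taken from the battery described above; the helper names are hypothetical.

```python
def inter_stimulus_ms(rate_per_s):
    """Interval between stimuli, in milliseconds, at a given presentation rate."""
    return 1000.0 / rate_per_s

def recording_seconds(n_trials, rate_per_s):
    """Stimulation time needed to deliver n_trials at rate_per_s."""
    return n_trials / rate_per_s

def battery_minutes(n_conditions, n_trials=1500, rate_per_s=21.0):
    """Stimulation-only time for a battery of n_conditions, in minutes."""
    return n_conditions * recording_seconds(n_trials, rate_per_s) / 60.0

# Click + 4 tone-burst frequencies, 3 to 5 levels each, 2 ears -> 30 to 50 conditions.
for levels in (3, 5):
    n_conditions = 5 * levels * 2
    print(f"{n_conditions} conditions: {battery_minutes(n_conditions):.0f} min "
          f"at 1500 trials, {battery_minutes(n_conditions, n_trials=2000):.0f} min at 2000")

print(f"inter-click interval at 21/s: {inter_stimulus_ms(21):.1f} ms")
print(f"screening at 37/s: {recording_seconds(500, 37):.0f}-{recording_seconds(1500, 37):.0f} s per ear")
```

On these numbers, stimulation alone takes roughly 36–79 minutes; presumably the overheads the sketch leaves out account for the rest of the quoted 60–90 minutes.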

The fundamental limit

The √N rule promises that any time-locked response can in principle be recovered by averaging enough trials. In practice the recovery is limited by:

Trial-to-trial response variability. If $s(t)$ itself varies from trial to trial (e.g., a sleeping infant whose arousal state varies), this variability adds to the noise budget, and the averaged response converges to the trial average of $s(t)$, not to any single trial’s signal. Averaging cannot remove this limit: latency jitter across trials smears and flattens the peaks of that mean response however many trials are collected.
Patient time. A drowsy newborn might tolerate 5 minutes; a sedated adult might tolerate 30. The number of trials per condition is therefore a constrained resource.
Stationarity assumption. As trial count grows into the thousands, the assumption that EEG noise has been stationary across the entire recording becomes harder to defend. Brain-state changes within a 90-minute test mean that some response components may have shifted in latency or amplitude across the recording.
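One concrete form of trial-to-trial variability is latency jitter. The sketch below averages noise-free copies of a hypothetical 0.5 µV Gaussian-shaped peak (0.3 ms wide, nominally at 5.6 ms) whose latency varies from trial to trial with standard deviation `jitter_ms`; all of these values are assumptions chosen for illustration. Because the trials contain no EEG noise, any shortfall in the averaged peak comes from the variability alone, and collecting more trials does not remove it.

```python
import math
import random

def jittered_average_peak(n_trials, jitter_ms, amplitude=0.5, width_ms=0.3,
                          latency_ms=5.6, dt_ms=0.02, window_ms=10.0, seed=4):
    """Peak of the average of noise-free, latency-jittered copies of one wave."""
    rng = random.Random(seed)
    n = int(round(window_ms / dt_ms))
    total = [0.0] * n
    for _ in range(n_trials):
        lat = latency_ms + rng.gauss(0.0, jitter_ms)    # this trial's latency
        for i in range(n):
            total[i] += amplitude * math.exp(-0.5 * ((i * dt_ms - lat) / width_ms) ** 2)
    return max(total) / n_trials

# For a Gaussian peak and Gaussian jitter, the expected averaged peak is
# amplitude * width / sqrt(width**2 + jitter**2), whatever n_trials is.
for jitter in (0.0, 0.1, 0.3):
    for n in (100, 500):
        print(f"jitter={jitter:.1f} ms  N={n:4d}  "
              f"peak={jittered_average_peak(n, jitter):.3f} µV")
```

With 0.3 ms of jitter the averaged peak settles near 0.35 µV instead of 0.5 µV, and going from 100 to 500 trials does not bring it back.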

These limits set the practical reach of evoked-potential testing — somewhere between 0.1 and 0.3 µV of effective response amplitude is the floor below which clinical recordings cannot reliably go, regardless of trial count.

Next lesson: the ABR specifically — five waves, three minutes of recording per intensity, and a clinical interpretation framework that has been the backbone of objective hearing assessment since the 1970s.