10.5 The FFT

The discrete Fourier transform (DFT) takes $N$ time samples and produces $N$ frequency-domain coefficients:

X_k \;=\; \sum_{n=0}^{N-1} x_n\, e^{-2 \pi i\, k n / N}, \qquad k = 0, 1, \ldots, N-1.

Computed naively, this is $N^2$ complex multiplications: $N$ output coefficients, each a sum of $N$ products. For $N = 10^6$ — a routine size for an audio buffer — that’s $10^{12}$ multiplications, hours of compute on a desktop CPU.

The Fast Fourier Transform (FFT) computes the same thing in $\mathcal{O}(N \log N)$ multiplications. For $N = 10^6$ that’s $2 \times 10^7$ — about a millionth as much work. The factor-of- $N$ speedup is what makes modern signal processing practical: every spectrogram, every audio plugin, every MRI reconstruction, every JPEG-2000 transform, every weather-model spectral solver runs on it.

This lesson sketches how the speedup works.

The Cooley–Tukey idea

The trick is divide-and-conquer. Split the sum over $n$ into even-indexed and odd-indexed terms:

X_k \;=\; \sum_{m = 0}^{N/2 - 1} x_{2m}\, e^{-2\pi i\, k(2m)/N} \;+\; \sum_{m = 0}^{N/2 - 1} x_{2m+1}\, e^{-2\pi i\, k(2m+1)/N}.

Pull the $e^{-2\pi i k / N}$ factor out of the second sum:

X_k \;=\; \underbrace{\sum_{m=0}^{N/2-1} x_{2m}\, e^{-2\pi i\, k m / (N/2)}}_{E_k} \;+\; e^{-2\pi i k / N} \underbrace{\sum_{m=0}^{N/2-1} x_{2m+1}\, e^{-2\pi i\, k m / (N/2)}}_{O_k}.

Each of $E_k$ and $O_k$ is itself a DFT, but of length $N/2$ . The full transform $X_k$ is built from $E_k$ and $O_k$ as

X_k \;=\; E_k + e^{-2 \pi i\, k / N}\, O_k, \qquad X_{k + N/2} \;=\; E_k - e^{-2 \pi i\, k / N}\, O_k,

for $k = 0, 1, \ldots, N/2 - 1$ . The complex exponential $e^{-2\pi i k / N}$ is the twiddle factor. The two equations together are the butterfly — a single pair of complex multiplications and additions that produces a pair of output values from a pair of input partial transforms.

Recursing: an $N$ -point DFT reduces to two $(N/2)$ -point DFTs plus $N/2$ butterflies. Recursing again, each $(N/2)$ -point DFT reduces to two $(N/4)$ -point DFTs plus $N/4$ butterflies. The recursion bottoms out at trivial 1-point DFTs ( $X_0 = x_0$ ).

Total operation count: at each of the $\log_2 N$ levels of recursion, the cost is $N/2$ butterflies, each of $\mathcal{O}(1)$ complex multiplications. Total $\mathcal{O}(N \log N)$ .

Bit-reversal

The recursive split of inputs is by even/odd indexing. After all the splits, input $x_n$ ends up at output position $\text{bit-reverse}(n)$ . For $N = 8$ :

| input index $n$ | binary | bit-reversed | post-reordering position | |---|---|---|---| | 0 | 000 | 000 | 0 | | 1 | 001 | 100 | 4 | | 2 | 010 | 010 | 2 | | 3 | 011 | 110 | 6 | | 4 | 100 | 001 | 1 | | 5 | 101 | 101 | 5 | | 6 | 110 | 011 | 3 | | 7 | 111 | 111 | 7 |

So the FFT either reorders inputs before the butterflies (decimation-in-time) or reorders outputs after (decimation-in-frequency). Either way, the bit-reverse step is part of the standard implementation.

The signal-flow diagram

stage 0 of 3

The Cooley–Tukey radix-2 FFT computes the discrete Fourier transform in three stages for N = 8 (general: log₂N stages). Each stage applies N/2 "butterflies" — each butterfly is a pair of complex additions weighted by a *twiddle factor* w = e^(−2πi · k / m). The input is reshuffled in bit-reversed order (so that x₀, x₄, x₂, x₆, x₁, x₅, x₃, x₇ for N = 8), then the butterflies combine pairs at increasing scale: stage 1 pairs adjacent samples, stage 2 pairs across groups of 2, stage 3 pairs across groups of 4. The output X_k emerges in natural order. The total cost is N log₂N ≪ N² for any non-trivial N, which is why an FFT for a million samples takes microseconds and an explicit DFT would take minutes.

The interactive shows the Cooley–Tukey signal-flow graph for $N = 8$ . The input on the left is in bit-reversed order. Each of the three stages applies four butterflies in parallel; the active butterflies at the current stage are shaded. After three stages the output is in natural order on the right. Total: $3 \times 4 = 12$ butterflies, each two complex multiplications. The naive DFT would need $8 \times 8 = 64$ multiplications.

When N is not a power of 2

The pure radix-2 FFT requires $N$ to be a power of 2. There are generalisations:

Mixed-radix algorithms factor $N$ into small primes and apply different butterfly structures at each level. The FFTW library’s “wisdom-driven” approach finds optimal mixes empirically for any $N$ .
Bluestein’s algorithm handles arbitrary $N$ by padding to the next power of 2 and using a clever convolution trick. Cost is still $\mathcal{O}(N \log N)$ but with a larger constant.

In practice you pick a buffer size that’s a power of 2 (or a smooth product of small primes) and use whatever the standard library provides. For audio, $N = 1024$ to $N = 4096$ are typical.

Where FFT shows up

The FFT is the workhorse algorithm behind any application that processes signals or fields in the frequency domain. A partial list:

Spectrograms. A short-time Fourier transform takes overlapping windowed segments of audio and FFTs each one. Every visualisation in Sound 8.2 — Spectrograms uses this.
Convolution. The convolution theorem of Foundations 7 says convolution in the time domain equals multiplication in the frequency domain. So convolution of two signals of length $N$ can be done in $\mathcal{O}(N \log N)$ via FFT-multiply-IFFT, instead of $\mathcal{O}(N^2)$ direct. This is huge for things like filter design and image-processing kernels.
Spectral solvers for PDEs. Represent the field as a Fourier series; the Laplacian is then diagonal (acting on each mode as $-k^2$ ). Solving Poisson’s equation in Fourier space requires one FFT, one element-wise division by $-k^2$ , and one inverse FFT — total $\mathcal{O}(N \log N)$ . Way faster than relaxation for periodic domains.
MRI reconstruction. The MRI scanner samples the frequency domain directly. The image is the inverse FFT of the acquired data. Reconstruction time is dominated by the FFT.
JPEG, JPEG 2000, MP3, AAC. All compress in a frequency-related domain (DCT, MDCT) computed via FFT-like algorithms.

⏳ The history — Gauss had the FFT in 1805

The Cooley–Tukey algorithm was published in 1965, in James Cooley and John Tukey’s six-page paper An algorithm for the machine calculation of complex Fourier series. The paper is credited as the foundation of modern digital signal processing — within a decade, every audio compression scheme, every radar pulse compression, every MRI reconstruction depended on it. By the 1980s the FFT was running on dedicated DSP chips in millions of consumer devices.

The algorithm had been written down before. Carl Friedrich Gauss, in 1805, was fitting trigonometric series to astronomical observations of the orbits of the asteroids Pallas and Juno. He computed the Fourier coefficients of his data points via what we now recognise as a radix-2 decomposition — the same butterfly structure as Cooley–Tukey, with the same $\mathcal{O}(N \log N)$ scaling. He wrote the calculation in a Latin notebook entry that was never published in his lifetime; the relevant section appeared only in Volume 3 of his collected works in 1866, a year after Cooley and Tukey were born. The Gauss algorithm was found by historians of mathematics in the 1970s — after the FFT had already conquered signal processing under Cooley and Tukey’s names.

The lesson, as far as there is one: an algorithm that no one knows about benefits no one. The FFT’s 160-year hibernation between Gauss and Cooley–Tukey is one of the clearer cases of “the right idea, in the wrong notebook, at the wrong time.” Modern numerical computing’s debt is to the rediscovery and its consequences, not to the original.

What we use this for

The bookshelf’s spectrogram and frequency-spectrum interactives use FFT under the hood:

SineWaveDemo / TwoSineExplorer (MATH-9, MATH-10 in the Sound book) — FFT of generated waveforms.
WaveformSpectrum (SOUND-4) — short-time FFT.
Cochleagram (HEARING-15) — gammatone filterbank instead of FFT, but the same underlying frequency-decomposition idea.

The next and final lesson, 10.6, covers root-finding — the algorithm behind finding eigenvalues of the characteristic polynomial in Foundations 5.2, among many other things.