10.5 The FFT

The discrete Fourier transform (DFT) takes NN time samples and produces NN frequency-domain coefficients:

Xk  =  n=0N1xne2πikn/N,k=0,1,,N1.X_k \;=\; \sum_{n=0}^{N-1} x_n\, e^{-2 \pi i\, k n / N}, \qquad k = 0, 1, \ldots, N-1.

Computed naively, this is N2N^2 complex multiplications: NN output coefficients, each a sum of NN products. For N=106N = 10^6 — a routine size for an audio buffer — that’s 101210^{12} multiplications, hours of compute on a desktop CPU.

The Fast Fourier Transform (FFT) computes the same thing in O(NlogN)\mathcal{O}(N \log N) multiplications. For N=106N = 10^6 that’s 2×1072 \times 10^7 — about a millionth as much work. The factor-of-NN speedup is what makes modern signal processing practical: every spectrogram, every audio plugin, every MRI reconstruction, every JPEG-2000 transform, every weather-model spectral solver runs on it.

This lesson sketches how the speedup works.

The Cooley–Tukey idea

The trick is divide-and-conquer. Split the sum over nn into even-indexed and odd-indexed terms:

Xk  =  m=0N/21x2me2πik(2m)/N  +  m=0N/21x2m+1e2πik(2m+1)/N.X_k \;=\; \sum_{m = 0}^{N/2 - 1} x_{2m}\, e^{-2\pi i\, k(2m)/N} \;+\; \sum_{m = 0}^{N/2 - 1} x_{2m+1}\, e^{-2\pi i\, k(2m+1)/N}.

Pull the e2πik/Ne^{-2\pi i k / N} factor out of the second sum:

Xk  =  m=0N/21x2me2πikm/(N/2)Ek  +  e2πik/Nm=0N/21x2m+1e2πikm/(N/2)Ok.X_k \;=\; \underbrace{\sum_{m=0}^{N/2-1} x_{2m}\, e^{-2\pi i\, k m / (N/2)}}_{E_k} \;+\; e^{-2\pi i k / N} \underbrace{\sum_{m=0}^{N/2-1} x_{2m+1}\, e^{-2\pi i\, k m / (N/2)}}_{O_k}.

Each of EkE_k and OkO_k is itself a DFT, but of length N/2N/2. The full transform XkX_k is built from EkE_k and OkO_k as

Xk  =  Ek+e2πik/NOk,Xk+N/2  =  Eke2πik/NOk,X_k \;=\; E_k + e^{-2 \pi i\, k / N}\, O_k, \qquad X_{k + N/2} \;=\; E_k - e^{-2 \pi i\, k / N}\, O_k,

for k=0,1,,N/21k = 0, 1, \ldots, N/2 - 1. The complex exponential e2πik/Ne^{-2\pi i k / N} is the twiddle factor. The two equations together are the butterfly — a single pair of complex multiplications and additions that produces a pair of output values from a pair of input partial transforms.

Recursing: an NN-point DFT reduces to two (N/2)(N/2)-point DFTs plus N/2N/2 butterflies. Recursing again, each (N/2)(N/2)-point DFT reduces to two (N/4)(N/4)-point DFTs plus N/4N/4 butterflies. The recursion bottoms out at trivial 1-point DFTs (X0=x0X_0 = x_0).

Total operation count: at each of the log2N\log_2 N levels of recursion, the cost is N/2N/2 butterflies, each of O(1)\mathcal{O}(1) complex multiplications. Total O(NlogN)\mathcal{O}(N \log N).

Bit-reversal

The recursive split of inputs is by even/odd indexing. After all the splits, input xnx_n ends up at output position bit-reverse(n)\text{bit-reverse}(n). For N=8N = 8:

input index nnbinarybit-reversedpost-reordering position
00000000
10011004
20100102
30111106
41000011
51011015
61100113
71111117

So the FFT either reorders inputs before the butterflies (decimation-in-time) or reorders outputs after (decimation-in-frequency). Either way, the bit-reverse step is part of the standard implementation.

The signal-flow diagram

x_k (input, bit-reversed)after stage 1after stage 2after stage 3x_0X_0x_4X_1x_2X_2x_6X_3x_1X_4x_5X_5x_3X_6x_7X_7Total butterflies for N = 8: 12 (≈ N · log₂N = 24). Direct DFT would need N² = 64 complex multiplications.
stage 0 of 3

The Cooley–Tukey radix-2 FFT computes the discrete Fourier transform in three stages for N = 8 (general: log₂N stages). Each stage applies N/2 "butterflies" — each butterfly is a pair of complex additions weighted by a *twiddle factor* w = e^(−2πi · k / m). The input is reshuffled in bit-reversed order (so that x₀, x₄, x₂, x₆, x₁, x₅, x₃, x₇ for N = 8), then the butterflies combine pairs at increasing scale: stage 1 pairs adjacent samples, stage 2 pairs across groups of 2, stage 3 pairs across groups of 4. The output X_k emerges in natural order. The total cost is N log₂N ≪ N² for any non-trivial N, which is why an FFT for a million samples takes microseconds and an explicit DFT would take minutes.

The interactive shows the Cooley–Tukey signal-flow graph for N=8N = 8. The input on the left is in bit-reversed order. Each of the three stages applies four butterflies in parallel; the active butterflies at the current stage are shaded. After three stages the output is in natural order on the right. Total: 3×4=123 \times 4 = 12 butterflies, each two complex multiplications. The naive DFT would need 8×8=648 \times 8 = 64 multiplications.

When N is not a power of 2

The pure radix-2 FFT requires NN to be a power of 2. There are generalisations:

In practice you pick a buffer size that’s a power of 2 (or a smooth product of small primes) and use whatever the standard library provides. For audio, N=1024N = 1024 to N=4096N = 4096 are typical.

Where FFT shows up

The FFT is the workhorse algorithm behind any application that processes signals or fields in the frequency domain. A partial list:

The history — Gauss had the FFT in 1805

The Cooley–Tukey algorithm was published in 1965, in James Cooley and John Tukey’s six-page paper An algorithm for the machine calculation of complex Fourier series. The paper is credited as the foundation of modern digital signal processing — within a decade, every audio compression scheme, every radar pulse compression, every MRI reconstruction depended on it. By the 1980s the FFT was running on dedicated DSP chips in millions of consumer devices.

The algorithm had been written down before. Carl Friedrich Gauss, in 1805, was fitting trigonometric series to astronomical observations of the orbits of the asteroids Pallas and Juno. He computed the Fourier coefficients of his data points via what we now recognise as a radix-2 decomposition — the same butterfly structure as Cooley–Tukey, with the same O(NlogN)\mathcal{O}(N \log N) scaling. He wrote the calculation in a Latin notebook entry that was never published in his lifetime; the relevant section appeared only in Volume 3 of his collected works in 1866, a year after Cooley and Tukey were born. The Gauss algorithm was found by historians of mathematics in the 1970s — after the FFT had already conquered signal processing under Cooley and Tukey’s names.

The lesson, as far as there is one: an algorithm that no one knows about benefits no one. The FFT’s 160-year hibernation between Gauss and Cooley–Tukey is one of the clearer cases of “the right idea, in the wrong notebook, at the wrong time.” Modern numerical computing’s debt is to the rediscovery and its consequences, not to the original.

What we use this for

The bookshelf’s spectrogram and frequency-spectrum interactives use FFT under the hood:

The next and final lesson, 10.6, covers root-finding — the algorithm behind finding eigenvalues of the characteristic polynomial in Foundations 5.2, among many other things.