8.1 Sound as a spectrum — pitch, timbre, and the frequency axis

The Fourier transform (refresher: Fourier) gives us a second way to describe a sound. Instead of pressure as a function of time, $p(t)$, we look at amplitude as a function of frequency, $|\tilde p(\omega)|$. The two pictures are mathematically equivalent, since the full complex spectrum $\tilde p(\omega)$ is the Fourier transform of $p(t)$ and each can be recovered from the other, but psychologically they are very different. What we hear aligns much more closely with the frequency-domain picture than with the time-domain waveform.

Pitch and pure tones

A sinusoidal pressure variation at frequency $f$ Hz is heard as a pure tone of pitch $f$. The amplitude controls loudness; the frequency controls pitch.
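As a minimal sketch of the statement above, the following generates samples of a pure tone $A\sin(2\pi f t)$ using only the Python standard library. The sample rate, the function names `pure_tone` and `write_wav`, and the output file name are assumptions made for illustration; they do not come from this lesson.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed; a common audio rate)

def pure_tone(freq_hz, amplitude, duration_s, rate=SAMPLE_RATE):
    """Return samples of amplitude * sin(2*pi*f*t) as a list of floats."""
    n = int(round(duration_s * rate))
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]

def write_wav(path, samples, rate=SAMPLE_RATE):
    """Write samples in [-1, 1] to a mono 16-bit WAV file."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(frames)

# One second of A4 at half of full scale.
tone = pure_tone(440.0, 0.5, 1.0)
# write_wav("a440.wav", tone)  # hypothetical file name; uncomment to listen
```

Changing `freq_hz` changes the pitch and changing `amplitude` changes the loudness, which is the whole parameter space of a pure tone.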

Interactive demo: pure-tone player with sliders for frequency (440 Hz) and amplitude (0.50).

Slide the frequency and hear the pitch change. Note two things:

The mapping from frequency to perceived pitch is logarithmic. Doubling the frequency raises the pitch by an octave; a perfect fifth is a frequency ratio of $3:2$ (≈ 702 cents, against exactly 700 cents for the equal-tempered fifth); the equal-tempered semitone is a ratio of $2^{1/12} \approx 1.0595$.
The audible range spans roughly 20 Hz to 20 kHz for young humans — three orders of magnitude.
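The logarithmic mapping in the two points above can be made concrete with a short sketch. It assumes the standard definition of the cent (1200 cents per octave) and uses A4 = 440 Hz as the reference; the function names are illustrative.

```python
import math

def cents(f_low, f_high):
    """Interval from f_low to f_high in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_high / f_low)

def equal_tempered(semitones, reference_hz=440.0):
    """Frequency `semitones` equal-tempered semitones above the reference."""
    return reference_hz * 2.0 ** (semitones / 12.0)

print(round(cents(440.0, 880.0)))        # octave
print(round(cents(2.0, 3.0), 1))         # just perfect fifth, ratio 3:2
print(round(equal_tempered(7), 2))       # equal-tempered fifth above A4, in Hz
print(round(math.log2(20000 / 20), 2))   # audible range expressed in octaves
```

The just fifth comes out slightly wider than seven equal-tempered semitones, and the 20 Hz to 20 kHz range turns out to be just under ten octaves.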

Pure tones are perceptually clean but musically lifeless. Real musical and speech sounds are richer.

Harmonics and timbre

A note from a violin or a flute at the same pitch as our sinusoid above doesn’t sound the same. Why?

Look at the spectrum. A bowed violin string playing $A_4$ doesn’t produce a pure 440 Hz sinusoid — it produces a complex periodic waveform whose Fourier series has a large component at 440 Hz and additional components at 880, 1320, 1760, … Hz, the integer multiples. These higher components are the harmonics, and the relative amplitudes of the harmonics determine the timbre — the quality that distinguishes one instrument from another.
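A rough sketch of this idea: build a periodic waveform from a fundamental and its integer multiples, then measure the amplitude at a few probe frequencies with a single-bin discrete Fourier sum. The harmonic amplitudes below are hypothetical values chosen for illustration, not measured violin data, and the sample rate and duration are assumptions picked so that every probe frequency completes a whole number of cycles.

```python
import math

RATE = 8000                     # samples per second (assumed for this sketch)
F0 = 440.0                      # fundamental, A4
AMPS = [1.0, 0.5, 0.3, 0.2]     # hypothetical amplitudes of harmonics 1..4

def harmonic_tone(f0, amps, duration_s=0.25, rate=RATE):
    """Sum of sinusoids at f0, 2*f0, 3*f0, ... with the given amplitudes."""
    n = int(duration_s * rate)
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * i / rate)
            for k, a in enumerate(amps))
        for i in range(n)
    ]

def amplitude_at(samples, freq_hz, rate=RATE):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / rate) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / rate) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

tone = harmonic_tone(F0, AMPS)
for f in (440, 660, 880, 1320, 1760):
    print(f, round(amplitude_at(tone, f), 3))
```

The measured amplitudes recover the values in `AMPS` at 440, 880, 1320, and 1760 Hz and essentially nothing at 660 Hz, which is not a multiple of the fundamental. Changing `AMPS` while keeping `F0` fixed changes the timbre but not the pitch.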

Interactive demo: two superposed sinusoids, with controls f₁ = 440 Hz, a₁ = 0.80, f₂ = 660 Hz, a₂ = 0.40, phase Δ = 0°, and a set of presets.

Two superposed sinusoids are the simplest non-trivial periodic waveform. With a 2:1 frequency ratio you get something that sounds reedy. With 3:2 you get a perfect fifth, and the two notes blend into a single richer pitch. With a slight detuning (a ratio of, say, 1.01:1) you get beats: a slow amplitude envelope at the difference frequency.
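The beat envelope follows from a standard trigonometric identity. As a sketch, assume the two sinusoids have equal amplitude and zero relative phase (a simplification for the derivation):

$$
\sin(2\pi f_1 t) + \sin(2\pi f_2 t) = 2\cos\!\left(2\pi\,\frac{f_1 - f_2}{2}\,t\right)\sin\!\left(2\pi\,\frac{f_1 + f_2}{2}\,t\right)
$$

The sine factor is a tone at the mean frequency; the cosine factor is a slowly varying envelope. Loudness follows the magnitude of the envelope, which peaks twice per cycle of the cosine, so the beat rate is $|f_1 - f_2|$. For $f_1 = 440$ Hz and $f_2 = 444.4$ Hz (a ratio of 1.01:1) this gives a tone at 442.2 Hz that swells and fades 4.4 times per second.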

Why integer harmonics matter

Most pitched musical instruments produce sounds with integer-related harmonics. The reason traces back to chapter 3: a stretched string or the air column in a tube, both essentially 1-D systems with simple boundary conditions, supports modes at integer multiples of a fundamental (only the odd multiples for a tube closed at one end). An ideal circular membrane, by contrast, has modes at non-integer ratios. When the harmonics align in this way the resulting sound is consonant, which is what makes pitched instruments sound “musical”.

A bell, a cymbal, or a wood block has inharmonic overtones — modes at non-integer ratios of the fundamental. The resulting sound has a pitch that’s harder to identify and a quality that feels more percussive than melodic. The whole geography of acoustic music depends on this distinction.
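The contrast between harmonic and inharmonic overtones can be sketched numerically. The string below uses the ideal fixed-fixed formula $f_n = n v / 2L$ with hypothetical values of length and wave speed; the bar ratios are the usual textbook values for an ideal uniform bar with free ends, used here as a stand-in for percussion instruments. Function names and the tolerance are assumptions for illustration.

```python
def string_modes(length_m, wave_speed, count=4):
    """Ideal string fixed at both ends: f_n = n * v / (2 * L)."""
    return [n * wave_speed / (2.0 * length_m) for n in range(1, count + 1)]

def is_harmonic(freqs, tolerance=0.01):
    """True if every frequency is within `tolerance` of an integer multiple of the lowest."""
    lowest = min(freqs)
    return all(abs(f / lowest - round(f / lowest)) <= tolerance for f in freqs)

# Hypothetical string: 0.5 m long, wave speed 440 m/s, so f_1 = 440 Hz.
string = string_modes(0.5, 440.0)

# Textbook mode ratios for an ideal uniform bar with free ends, scaled to the
# same lowest frequency for comparison.
BAR_RATIOS = [1.0, 2.756, 5.404, 8.933]
bar = [440.0 * r for r in BAR_RATIOS]

print(string, is_harmonic(string))
print(bar, is_harmonic(bar))
```

The string's modes pass the harmonicity check and the bar's do not, even though both share the same lowest frequency.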

Speech in the spectrum

Speech is even richer. The vocal folds vibrate at a fundamental frequency (around 120 Hz for adult males, 220 Hz for adult females) producing harmonics up to 5 kHz or more. The vocal tract — a closed-open tube of length about 17 cm — acts as a resonator, amplifying certain frequencies and attenuating others. The resonant peaks are called formants, and their positions encode vowel identity.

Ee (as in “beet”): F1 ≈ 280 Hz, F2 ≈ 2250 Hz.
Ah (as in “father”): F1 ≈ 770 Hz, F2 ≈ 1100 Hz.
Oo (as in “boot”): F1 ≈ 290 Hz, F2 ≈ 870 Hz.
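For comparison with the formant values above, here is a rough sketch of the resonances of a uniform tube closed at one end and open at the other, using the quarter-wave formula $f_n = (2n-1)\,c/4L$. The 17 cm length is from the text; the sound speed of 350 m/s (warm, humid air) and the function name are assumptions.

```python
def closed_open_resonances(length_m, sound_speed=350.0, count=3):
    """Quarter-wave resonances of a uniform closed-open tube:
    f_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * sound_speed / (4.0 * length_m) for n in range(1, count + 1)]

# 0.17 m tract length from the text; 350 m/s is an assumed sound speed.
formants = closed_open_resonances(0.17)
print([round(f) for f in formants])
```

The result is roughly 515, 1544, and 2574 Hz, a pattern of odd multiples that corresponds to a neutral, unconstricted tract. Real vowels move F1 and F2 away from these values by changing the tube's shape, which a uniform-tube model cannot capture.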

Without the formant pattern the vocal-fold buzz is unintelligible. The vocal tract shapes it — and the shaping is what carries linguistic information. The Hearing book picks up this thread when speech meets the cochlea (chapter 4 there).

Why the frequency picture matters now

We need it because most acoustic systems — rooms, instruments, the ear — act on sounds frequency-by-frequency. The next three lessons make this operational:

8.2: spectrograms turn a time-domain recording into a time-frequency image where speech, music, and noise become legible.
8.3: acoustic filters and the room impulse response, viewed as multipliers in the frequency domain.
8.4: resonance is a Lorentzian peak in the frequency response, with width $\Delta\omega = \omega_0/Q$ .
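As a small preview of the width relation in 8.4, the sketch below evaluates a normalized Lorentzian whose full width at half maximum is $\Delta\omega = \omega_0/Q$. The resonance frequency and the value of $Q$ are hypothetical, and the function name is an assumption for illustration.

```python
import math

def lorentzian(omega, omega0, q):
    """Normalized Lorentzian peak centred at omega0 with FWHM omega0 / q."""
    half_width = omega0 / (2.0 * q)
    return half_width ** 2 / ((omega - omega0) ** 2 + half_width ** 2)

omega0 = 2 * math.pi * 440.0   # hypothetical resonance at 440 Hz, in rad/s
q = 50.0                       # hypothetical quality factor
delta_omega = omega0 / q       # full width at half maximum

print(lorentzian(omega0, omega0, q))                     # peak value
print(lorentzian(omega0 + delta_omega / 2, omega0, q))   # half-maximum point
print(delta_omega / (2 * math.pi))                       # width in Hz
```

The response falls to half its peak at $\omega_0 \pm \Delta\omega/2$, so a higher $Q$ means a narrower, more selective resonance.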

⏳ The history — Helmholtz's resonators and the analysis of tone

Hermann von Helmholtz, in his 1863 Die Lehre von den Tonempfindungen, demonstrated that complex musical tones could be analysed into their Fourier components using a set of precisely tuned acoustic resonators — hollow brass spheres, each with a narrow opening, that amplified a single frequency from the ambient sound field. By holding different resonators to his ear, Helmholtz could identify the individual harmonics present in a sung vowel or a bowed string. The experiments provided the first empirical confirmation that Fourier’s mathematics described the physical reality of sound.

The resonators also let Helmholtz demonstrate that timbre, the quality distinguishing a violin from a flute playing the same note, is determined by the relative amplitudes of the harmonics, not by the fundamental frequency alone; he concluded that their relative phases play essentially no role. This insight connects the physics of sound (this chapter) to the neuroscience of hearing: the cochlea performs the same kind of Fourier-like decomposition that Helmholtz did with his brass spheres, but continuously and in real time.

Read the original: Hermann von Helmholtz, On the Sensations of Tone (1863; tr. Alexander Ellis, 1895).

The Fourier mathematics is in Foundations 7; what’s new here is how it speaks for sound.