8.1 Sound as a spectrum — pitch, timbre, and the frequency axis
The Fourier transform (refresher: Fourier) gives us a second, equivalent way to describe a sound. Instead of pressure as a function of time, $p(t)$, we look at amplitude as a function of frequency, $\hat{p}(f)$. The two pictures are mathematically equivalent, since one is the Fourier transform of the other, but psychologically they are very different. What we hear as a sound aligns much more closely with the frequency-domain picture than with the time-domain waveform.
Pitch and pure tones
A sinusoidal pressure variation at frequency $f$ Hz is heard as a pure tone of pitch $f$. The amplitude controls loudness; the frequency controls pitch.
Sweep the frequency of such a tone and the pitch changes with it. Note two things:
- The mapping from frequency to perceived pitch is logarithmic. Doubling the frequency raises the pitch by an octave; an interval that sounds like a perfect fifth is a frequency ratio of 3:2 (≈ 702 cents); the equal-tempered semitone is a ratio of $2^{1/12} \approx 1.0595$.
- The audible range spans roughly 20 Hz to 20 kHz for young humans — three orders of magnitude.
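The logarithmic mapping in the list above can be checked with a few lines of arithmetic. The following is a minimal sketch; the helper names `cents` and `semitone_steps` are hypothetical, introduced here for illustration, and it relies only on the standard definition of 1200 cents per octave.

```python
import math

def cents(ratio: float) -> float:
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

def semitone_steps(f_ref: float, steps: int) -> float:
    """Frequency `steps` equal-tempered semitones above f_ref (hypothetical helper)."""
    return f_ref * 2.0 ** (steps / 12.0)

octave = cents(2.0)                        # 1200 by definition
just_fifth = cents(3.0 / 2.0)              # the 3:2 ratio, a little over 700 cents
tempered_fifth = cents(2.0 ** (7 / 12))    # seven equal-tempered semitones
audible_octaves = math.log2(20_000 / 20)   # 20 Hz to 20 kHz, in octaves

print(f"octave          : {octave:.1f} cents")
print(f"just fifth (3:2): {just_fifth:.1f} cents")
print(f"tempered fifth  : {tempered_fifth:.1f} cents")
print(f"audible range   : {audible_octaves:.2f} octaves")
print(f"12 semitones above 440 Hz: {semitone_steps(440.0, 12):.1f} Hz")
```

Three orders of magnitude in frequency therefore amount to just under ten octaves of pitch.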
Pure tones are perceptually clean but musically lifeless. Real musical and speech sounds are richer.
Harmonics and timbre
A note from a violin or a flute at the same pitch as our sinusoid above doesn’t sound the same. Why?
Look at the spectrum. A bowed violin string playing A4 doesn’t produce a pure 440 Hz sinusoid. It produces a complex periodic waveform whose Fourier series has a large component at 440 Hz and additional components at 880, 1320, 1760, … Hz, the integer multiples. These higher components are the harmonics, and the relative amplitudes of the harmonics determine the timbre: the quality that distinguishes one instrument from another.
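One way to make the link between harmonic amplitudes and timbre concrete is to build two waveforms on the same 440 Hz fundamental and read their harmonic amplitudes back out. The sketch below is illustrative only: the two amplitude recipes (`BRIGHT`, `HOLLOW`) are made-up examples, not measurements of a violin or a flute, and the sample rate and the 0.1 s window are assumptions chosen so that every harmonic completes a whole number of cycles.

```python
import math

SAMPLE_RATE = 44_100   # samples per second (assumed)
F0 = 440.0             # fundamental in Hz, as in the text
N_SAMPLES = 4_410      # 0.1 s: exactly 44 cycles of the fundamental

# Hypothetical harmonic recipes: amplitudes of harmonics 1, 2, 3, 4.
BRIGHT = [1.0, 0.5, 0.33, 0.25]   # all harmonics present
HOLLOW = [1.0, 0.0, 0.33, 0.0]    # odd harmonics only

def synthesize(harmonic_amps):
    """Sum of sinusoids at F0, 2*F0, 3*F0, ... with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * F0 * n / SAMPLE_RATE)
            for k, a in enumerate(harmonic_amps))
        for n in range(N_SAMPLES)
    ]

def amplitude_at(signal, freq):
    """Amplitude of the component at `freq`, by projecting onto cos and sin."""
    re = sum(x * math.cos(2 * math.pi * freq * i / SAMPLE_RATE)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
             for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / len(signal)

def harmonic_spectrum(signal, count=4):
    """Amplitudes at the first `count` integer multiples of F0."""
    return [amplitude_at(signal, (k + 1) * F0) for k in range(count)]

bright_spectrum = harmonic_spectrum(synthesize(BRIGHT))
hollow_spectrum = harmonic_spectrum(synthesize(HOLLOW))

for name, spec in (("bright", bright_spectrum), ("hollow", hollow_spectrum)):
    print(name, [round(a, 3) for a in spec])
```

Both waveforms repeat 440 times per second and so share a pitch; only the recovered amplitude lists differ, and that difference is what the text calls timbre.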
Two sinusoids superposed are the simplest non-trivial periodic waveform. With a 2:1 frequency ratio (an octave) you get something that sounds reedy. With 3:2 you get a perfect fifth, and the two notes blend into a single richer pitch. With a slight detuning (say 1.01:1) you get beats: a slow envelope at the difference frequency, for example 4.4 Hz for tones at 440 Hz and 444.4 Hz.
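The beat envelope follows from the sum-to-product identity sin A + sin B = 2 sin((A+B)/2) cos((A−B)/2): two equal-amplitude tones are the same thing as one tone at the mean frequency multiplied by a slow cosine. The sketch below checks this numerically; the function names are hypothetical and the 440 Hz reference is only an example.

```python
import math

def two_tone(f1, f2, t):
    """Direct sum of two unit-amplitude sinusoids at time t (seconds)."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)

def carrier_times_envelope(f1, f2, t):
    """Same signal rewritten as a mean-frequency tone times a slow cosine."""
    mean = (f1 + f2) / 2.0
    half_diff = (f1 - f2) / 2.0
    return 2.0 * math.cos(2 * math.pi * half_diff * t) * math.sin(2 * math.pi * mean * t)

def beat_frequency(f1, f2):
    """Loudness peaks per second: |cos| peaks twice per cosine cycle."""
    return abs(f1 - f2)

f1 = 440.0
f2 = 440.0 * 1.01          # the 1.01:1 detuning from the text
times = [k / 1000.0 for k in range(1000)]   # 1 s sampled every millisecond
max_error = max(abs(two_tone(f1, f2, t) - carrier_times_envelope(f1, f2, t))
                for t in times)

print(f"beat frequency: {beat_frequency(f1, f2):.2f} Hz")
print(f"largest mismatch between the two forms: {max_error:.2e}")
```

The cosine factor runs at half the difference frequency, but loudness follows its magnitude, so the audible beat rate is the full difference.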
Why integer harmonics matter
Most pitched musical instruments produce sounds with integer-related harmonics. The reason traces back to chapter 3: a stretched string or the air column in a tube, both essentially one-dimensional resonators with simple boundary conditions, supports modes at (nearly) integer multiples of a fundamental. An ideal two-dimensional membrane does not, which is one reason most drums sound less clearly pitched. When the overtones line up in a single harmonic series the ear fuses them into one pitch, which is what makes these instruments sound “musical”.
A bell, a cymbal, or a wood block has inharmonic overtones — modes at non-integer ratios of the fundamental. The resulting sound has a pitch that’s harder to identify and a quality that feels more percussive than melodic. The whole geography of acoustic music depends on this distinction.
Speech in the spectrum
Speech is even richer. The vocal folds vibrate at a fundamental frequency (around 120 Hz for adult males, 220 Hz for adult females) producing harmonics up to 5 kHz or more. The vocal tract — a closed-open tube of length about 17 cm — acts as a resonator, amplifying certain frequencies and attenuating others. The resonant peaks are called formants, and their positions encode vowel identity.
- Ee (as in “beet”): F1 ≈ 280 Hz, F2 ≈ 2250 Hz.
- Ah (as in “father”): F1 ≈ 770 Hz, F2 ≈ 1100 Hz.
- Oo (as in “boot”): F1 ≈ 290 Hz, F2 ≈ 870 Hz.
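To illustrate how formant positions can encode vowel identity, the sketch below looks up the nearest of the three vowels listed above for a measured (F1, F2) pair. It is a toy, not a speech recognizer: the function name `nearest_vowel` and the choice of a log-frequency distance are assumptions made for this example, and real vowels vary considerably between speakers.

```python
import math

# (F1, F2) in Hz, copied from the list above.
VOWEL_FORMANTS = {
    "ee": (280.0, 2250.0),
    "ah": (770.0, 1100.0),
    "oo": (290.0, 870.0),
}

def nearest_vowel(f1: float, f2: float) -> str:
    """Return the listed vowel whose (F1, F2) is closest in log frequency."""
    def distance(target):
        t1, t2 = target
        # Assumed metric: Euclidean distance in octaves along each formant axis.
        return math.hypot(math.log2(f1 / t1), math.log2(f2 / t2))
    return min(VOWEL_FORMANTS, key=lambda v: distance(VOWEL_FORMANTS[v]))

for measured in [(300.0, 2100.0), (700.0, 1200.0), (320.0, 900.0)]:
    print(measured, "->", nearest_vowel(*measured))
```

Note that F1 alone cannot separate "ee" from "oo" (280 Hz versus 290 Hz); it is the second formant that tells them apart.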
Without the formant pattern the vocal-fold buzz is unintelligible. The vocal tract shapes it — and the shaping is what carries linguistic information. The Hearing book picks up this thread when speech meets the cochlea (chapter 4 there).
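As a rough check on where formants fall, the vocal tract can be idealized as a uniform tube closed at the vocal folds and open at the lips. Such a tube resonates at odd multiples of c/(4L). The sketch below evaluates this for the 17 cm length quoted above; the speed of sound of 350 m/s (warm, humid air) and the uniform-tube idealization are assumptions, and the function name is hypothetical.

```python
SPEED_OF_SOUND = 350.0   # m/s, assumed value for warm humid air
TRACT_LENGTH = 0.17      # m, from the text

def closed_open_resonances(length_m, count, c=SPEED_OF_SOUND):
    """First `count` resonances of a uniform closed-open tube: (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]

resonances = closed_open_resonances(TRACT_LENGTH, 3)
for n, f in enumerate(resonances, start=1):
    print(f"resonance {n}: {f:.0f} Hz")
```

Under these assumptions the resonances land near 500, 1500, and 2500 Hz, roughly the formants of a neutral vowel. Moving the tongue and lips changes the tube's shape and shifts these peaks toward patterns like those listed for "ee", "ah", and "oo".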
Why the frequency picture matters now
We need the frequency picture because most acoustic systems (rooms, instruments, the ear) act on sounds frequency by frequency. The next three lessons make this operational:
- 8.2: spectrograms turn a time-domain recording into a time-frequency image where speech, music, and noise become legible.
- 8.3: acoustic filters and the room impulse response, viewed as multipliers in the frequency domain.
- 8.4: resonance is a Lorentzian peak in the frequency response, with width $\Delta f = f_0/Q$.
The Fourier mathematics is in Foundations 7; what’s new here is how it speaks for sound.