8.1 Sound as a spectrum — pitch, timbre, and the frequency axis

The Fourier transform (refresher: Fourier) gives us a second, equivalent way to describe a sound. Instead of pressure as a function of time, $p(t)$, we look at amplitude as a function of frequency, $|\tilde p(\omega)|$. The two pictures are mathematically equivalent, since one is the Fourier transform of the other, but psychologically they are very different. What we hear as a sound aligns much more closely with the frequency-domain picture than with the time-domain waveform.

Pitch and pure tones

A sinusoidal pressure variation at frequency $f$ Hz is heard as a pure tone of pitch $f$. The amplitude controls loudness; the frequency controls pitch.
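To make the two descriptions concrete, here is a minimal sketch that samples a pure tone and looks at its amplitude spectrum with NumPy's FFT. The sample rate, the duration, and the helper names `pure_tone` and `amplitude_spectrum` are assumptions chosen for illustration; they do not come from the lesson.

```python
import numpy as np

SAMPLE_RATE = 8000  # samples per second; an assumed value for this sketch


def pure_tone(freq_hz, amplitude, duration_s=1.0, sample_rate=SAMPLE_RATE):
    """Sampled sinusoidal pressure p(t) = amplitude * sin(2*pi*f*t)."""
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    return amplitude * np.sin(2 * np.pi * freq_hz * t)


def amplitude_spectrum(p, sample_rate=SAMPLE_RATE):
    """Return (frequencies in Hz, amplitude per frequency) for a real signal.

    The 2/N scaling recovers sinusoid amplitudes for every bin except
    the DC and Nyquist bins, which this sketch does not use.
    """
    freqs = np.fft.rfftfreq(len(p), d=1.0 / sample_rate)
    amps = 2.0 * np.abs(np.fft.rfft(p)) / len(p)
    return freqs, amps


p = pure_tone(freq_hz=440.0, amplitude=0.5)
freqs, amps = amplitude_spectrum(p)

# The spectrum of a pure tone is a single spike: its position is the
# frequency (pitch), its height is the amplitude (loudness).
peak = int(np.argmax(amps))
peak_hz, peak_amp = freqs[peak], amps[peak]

# Going back to the time domain recovers the waveform, which is the
# sense in which the two pictures are equivalent.
p_back = np.fft.irfft(np.fft.rfft(p), n=len(p))
```

With these settings the tone fits a whole number of cycles into the window, so the spike lands exactly on one frequency bin. A tone that does not fit would smear across neighbouring bins.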

[Interactive demo: a pure tone with an adjustable frequency; sliding the frequency changes the pitch heard.]

Pure tones are perceptually clean but musically lifeless. Real musical and speech sounds are richer.

Harmonics and timbre

A note from a violin or a flute at the same pitch as our sinusoid above doesn’t sound the same. Why?

Look at the spectrum. A bowed violin string playing $A_4$ doesn’t produce a pure 440 Hz sinusoid. It produces a complex periodic waveform whose Fourier series has a large component at 440 Hz and additional components at 880, 1320, 1760, … Hz, the integer multiples of that fundamental. These higher components are the harmonics, and the relative amplitudes of the harmonics determine the timbre: the quality that distinguishes one instrument from another.
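The harmonic picture can be sketched numerically: build a tone from a 440 Hz fundamental plus its integer multiples, then read the components back off the spectrum. The amplitude list below is an illustrative assumption, not a measured violin spectrum, and the sample rate is likewise chosen only for convenience.

```python
import numpy as np

SAMPLE_RATE = 8000   # samples per second (assumed)
F0 = 440.0           # fundamental of A4 in Hz
# Hypothetical relative amplitudes of harmonics 1..4; changing these
# changes the timbre but not the pitch.
HARMONIC_AMPLITUDES = [1.0, 0.5, 0.3, 0.2]

t = np.arange(SAMPLE_RATE) / SAMPLE_RATE  # one second of samples

# Sum the fundamental and its integer multiples.
tone = np.zeros_like(t)
for n, amplitude in enumerate(HARMONIC_AMPLITUDES, start=1):
    tone += amplitude * np.sin(2 * np.pi * n * F0 * t)

# Amplitude spectrum of the real signal.
freqs = np.fft.rfftfreq(len(tone), d=1.0 / SAMPLE_RATE)
amps = 2.0 * np.abs(np.fft.rfft(tone)) / len(tone)

# Keep only the bins that carry appreciable energy.
present = amps > 0.01
peaks_hz = freqs[present]
peak_amps = amps[present]
print(peaks_hz)
```

Two instruments playing the same note would, in this picture, share the same peak positions and differ in the peak heights.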

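[Interactive figure: two sinusoids, 440 Hz (amplitude 0.80) and 660 Hz (amplitude 0.40), plotted individually and as their sum against time over a 9.1 ms window, with selectable presets.]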

Two superposed sinusoids are the simplest non-trivial periodic waveform. With a 2:1 frequency ratio (an octave) you get something that sounds reedy. With 3:2 you get a perfect fifth, and the two notes blend into a single richer pitch. With a slight detuning (say a 1.01:1 ratio) you get beats: a slow envelope at the difference frequency.
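The arithmetic behind these cases can be sketched in a few lines. The helper names are hypothetical, the repetition-rate shortcut assumes both frequencies are whole numbers of hertz, and the beat check assumes two tones of equal amplitude, for which the identity $\sin A + \sin B = 2 \sin\tfrac{A+B}{2} \cos\tfrac{A-B}{2}$ applies.

```python
import math
import numpy as np


def repetition_rate_hz(f1_hz: int, f2_hz: int) -> int:
    """Rate at which the sum of two integer-frequency sinusoids repeats."""
    return math.gcd(f1_hz, f2_hz)


def beat_rate_hz(f1_hz: float, f2_hz: float) -> float:
    """Beat rate of two nearby tones: the difference frequency."""
    return abs(f1_hz - f2_hz)


# A 3:2 pair (perfect fifth): the sum repeats at the common fundamental.
rate = repetition_rate_hz(440, 660)
period_ms = 1000.0 / rate

# A 1.01:1 pair: check numerically that the sum is a fast carrier at the
# mean frequency multiplied by a slow envelope.
f1, f2 = 440.0, 444.4
t = np.linspace(0.0, 1.0, 48001)
two_tone = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)
carrier = np.sin(np.pi * (f1 + f2) * t)
envelope = 2.0 * np.cos(np.pi * (f1 - f2) * t)
identity_holds = np.allclose(two_tone, carrier * envelope)

beats = beat_rate_hz(f1, f2)
```

The envelope's cosine oscillates at half the difference frequency, but loudness follows its magnitude, which peaks twice per cycle. That is why the audible beat rate is the full difference, here about 4.4 Hz.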

Why integer harmonics matter

Most musical instruments produce sounds with integer-related harmonics. The reason traces back to chapter 3: a string or an air column in a tube, essentially a one-dimensional resonator with simple boundary conditions, supports modes at integer multiples of a fundamental. The resulting sound is consonant, because the harmonics align, which is what makes pitched instruments sound “musical”.

A bell, a cymbal, or a wood block has inharmonic overtones — modes at non-integer ratios of the fundamental. The resulting sound has a pitch that’s harder to identify and a quality that feels more percussive than melodic. The whole geography of acoustic music depends on this distinction.
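One rough way to quantify the distinction is to ask how far each partial sits from the nearest integer multiple of the lowest one. The sketch below does that. The function name is hypothetical, it assumes the lowest partial is the fundamental, and the second list of partials is made up for illustration rather than measured from a real bell.

```python
def max_inharmonicity(partials_hz):
    """Largest distance of any partial's ratio from the nearest integer.

    Ratios are taken relative to the first (lowest) partial, which this
    sketch assumes is the fundamental.
    """
    f0 = partials_hz[0]
    return max(abs(f / f0 - round(f / f0)) for f in partials_hz)


harmonic_partials = [440.0, 880.0, 1320.0, 1760.0]
# Hypothetical inharmonic partials, for illustration only.
inharmonic_partials = [440.0, 1210.0, 2330.0]

harmonic_score = max_inharmonicity(harmonic_partials)
inharmonic_score = max_inharmonicity(inharmonic_partials)
print(harmonic_score, inharmonic_score)
```

A score near zero means the partials line up as a harmonic series; a score approaching 0.5 means at least one partial falls about midway between two harmonics.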

Speech in the spectrum

Speech is even richer. The vocal folds vibrate at a fundamental frequency (around 120 Hz for adult males, 220 Hz for adult females) producing harmonics up to 5 kHz or more. The vocal tract — a closed-open tube of length about 17 cm — acts as a resonator, amplifying certain frequencies and attenuating others. The resonant peaks are called formants, and their positions encode vowel identity.
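As a rough sketch of where formants come from, an idealized uniform tube closed at one end and open at the other resonates at odd multiples of $c / 4L$. The code below evaluates that formula for the 17 cm length quoted above. The speed of sound (343 m/s) and the function name are assumptions, and a real vocal tract is neither uniform nor rigid, so these are only ballpark values.

```python
def closed_open_resonances(length_m, n_modes=3, speed_of_sound=343.0):
    """Resonances f_n = (2n - 1) * c / (4 * L) of an ideal closed-open tube.

    speed_of_sound is in m/s; 343 m/s is an assumed round value for air.
    """
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_modes + 1)
    ]


# Neutral vocal tract modelled as a uniform 17 cm tube.
resonances_hz = closed_open_resonances(0.17)
print([round(f) for f in resonances_hz])
```

With these assumptions the first three resonances come out near 500, 1500, and 2500 Hz. Changing the shape of the tract, which the uniform-tube model cannot capture, is what moves the formants from one vowel to another.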

Without the formant pattern the vocal-fold buzz is unintelligible. The vocal tract shapes it — and the shaping is what carries linguistic information. The Hearing book picks up this thread when speech meets the cochlea (chapter 4 there).

Why the frequency picture matters now

We need it because most acoustic systems (rooms, instruments, the ear) act on sounds frequency by frequency. The next three lessons make this operational.

The Fourier mathematics is in Foundations 7; what’s new here is how it speaks for sound.