1.5 The McGurk effect

Watch (and listen to) the clip below. Try the demonstration the presenter sets up: keep your eyes open, then close them, and notice what you hear.

BBC Horizon's demonstration of the McGurk effect. The effect has been documented since the 1970s, and Lawrence Rosenblum's lab is among those that study it.

What is happening

The audio track is the same throughout. What changes is the face on the screen. When you watch a mouth articulate /ga/ while the audio plays /ba/, your brain integrates the two streams and reports a third syllable, often /da/. With your eyes closed and the audio alone, you hear /ba/ — the literal acoustic signal. Open your eyes and the percept moves to /da/, against everything the cochlea actually sent your brain.

The McGurk effect — first reported by Harry McGurk and John MacDonald in 1976 — is one of the most studied phenomena in multisensory perception, and one of the cleanest demonstrations that the auditory percept is the result of inference, not transcription. The brain has a generative model of how mouths produce sounds; when the visual input is incompatible with the audio, the brain finds the most likely syllable that could produce both, and reports that.

In Bayesian terms (movement 9):

$$P(\text{syllable} \mid \text{audio}, \text{video}) \propto P(\text{audio}, \text{video} \mid \text{syllable}) \cdot P(\text{syllable}).$$

If only the audio is present, the likelihood term is sharp on /ba/ and the listener hears /ba/. With incongruent video, the joint likelihood peaks somewhere else, typically at a place of articulation between the two: /da/ is alveolar, which sits between bilabial /ba/ and velar /ga/. The brain commits to that compromise and the listener hears it, clearly, with no sense that anything was reconciled.
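
To make the argument concrete, here is a minimal sketch of the posterior computation over three candidate syllables. It rests on two assumptions that are not in the text above: that audio and video are conditionally independent given the syllable, so the joint likelihood factors into an audio term times a video term, and that the prior over syllables is uniform. The likelihood values are invented purely for illustration (they are not measurements), and the names `AUDIO_BA`, `VIDEO_GA`, `posterior`, and `percept` are hypothetical.

```python
# Candidate syllables with a uniform prior (an assumption of this sketch).
PRIOR = {"ba": 1 / 3, "da": 1 / 3, "ga": 1 / 3}

# Hypothetical likelihoods, invented for illustration only.
# P(audio | syllable) for an acoustic /ba/: peaked on "ba".
AUDIO_BA = {"ba": 0.70, "da": 0.25, "ga": 0.05}
# P(video | syllable) for a mouth articulating /ga/: the lips never close,
# so "ba" is very unlikely, while "da" and "ga" look similar.
VIDEO_GA = {"ba": 0.02, "da": 0.45, "ga": 0.53}


def posterior(prior, *likelihoods):
    """Return P(syllable | cues), assuming the cues are conditionally
    independent given the syllable (likelihoods multiply)."""
    unnormalised = {}
    for syllable, p in prior.items():
        for likelihood in likelihoods:
            p *= likelihood[syllable]
        unnormalised[syllable] = p
    total = sum(unnormalised.values())
    return {s: p / total for s, p in unnormalised.items()}


def percept(post):
    """The reported syllable: the one with the highest posterior."""
    return max(post, key=post.get)


audio_only = posterior(PRIOR, AUDIO_BA)              # eyes closed
audio_video = posterior(PRIOR, AUDIO_BA, VIDEO_GA)   # eyes open

print(percept(audio_only))   # ba
print(percept(audio_video))  # da
```

With these made-up numbers, neither cue on its own favours /da/, yet /da/ wins once the two likelihoods are multiplied, because it is the only syllable that both cues find reasonably plausible. Different numbers would shift the result, which is consistent with the percept being "often" rather than always /da/.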

This effect is one of the strongest cases that perception is the output of a generative model running inference, not a transcript of sensory input.

We will pick this up in detail in movement 9.