3.1 Speech reception threshold and word recognition

The two foundational measurements of speech audiometry are the speech reception threshold (SRT) and the word recognition score (WRS). Together they characterise the listener’s two complementary speech abilities: at what level speech becomes intelligible, and how clearly the listener can identify words once they are well above that level.

The SRT

The speech reception threshold is the lowest level (in dB HL) at which the listener correctly identifies 50% of presented two-syllable words (“spondees” — words with equal stress on each syllable like baseball, hotdog, cowboy). The CID W-1 word list of 36 spondees has been the US standard since the 1950s; equivalent lists exist in most clinical languages.

The procedure mirrors pure-tone audiometry:

Present a spondee at a level well above expected threshold (typically pure-tone average + 30 dB).
Lower the level in 10-dB steps until the patient fails to repeat.
Increase in 5-dB steps until reliable repetition resumes.
The SRT is the lowest level at which the patient correctly repeats at least 50% of presentations.
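
The steps above can be sketched in code. This is a minimal illustration rather than a clinical protocol: the function name `estimate_srt`, the `repeats_correctly` callback (standing in for the patient's response to one spondee), the four presentations per level, and the level limits are all assumptions introduced here.

```python
from typing import Callable


def estimate_srt(repeats_correctly: Callable[[int], bool],
                 pta_db: int,
                 trials_per_level: int = 4,
                 min_level_db: int = -10,
                 max_level_db: int = 110) -> int:
    """Descend in 10-dB steps until a miss, then ascend in 5-dB steps.

    repeats_correctly(level) is a hypothetical callback that presents one
    spondee at `level` (dB HL) and returns True if the patient repeats it.
    """
    level = pta_db + 30  # start well above the expected threshold

    # Descending phase: one spondee per level until the first miss.
    while level > min_level_db and repeats_correctly(level):
        level -= 10

    # Ascending phase: several spondees per level, 5-dB steps, until at
    # least 50% of the presentations at a level are repeated correctly.
    while level <= max_level_db:
        correct = sum(repeats_correctly(level) for _ in range(trials_per_level))
        if 2 * correct >= trials_per_level:
            return level
        level += 5

    raise ValueError("no tested level met the 50% criterion")


# Idealised listener who repeats every word at or above 22 dB HL.
srt = estimate_srt(lambda level: level >= 22, pta_db=20)
```

A real listener responds probabilistically near threshold, so the returned level depends on how many presentations are scored at each step.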

The SRT should agree with the pure-tone average (PTA at 500, 1000, 2000 Hz) within about ±6 dB. A discrepancy of more than 10 dB suggests one of:

A misunderstood test — the patient didn’t grasp the task. Common with elderly patients, children, or those with cognitive impairment.
A functional / non-organic loss — the patient’s volunteered pure-tone thresholds are worse (higher) than their true sensitivity (e.g., compensation-seeking exaggeration), so the SRT typically comes out better than the PTA. SRT-PTA disagreement is the audiologist’s classic flag for further investigation.
A central auditory processing disorder — the patient hears the tones but processes the words poorly. Rare but real; SRT > PTA in such cases.
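
The agreement check can be written as a short calculation. This is a sketch under stated assumptions: the function names and the "borderline" label for differences between 6 and 10 dB are introduced here for illustration, and the output is a prompt for clinical judgement, not a diagnosis.

```python
from typing import Mapping, Tuple


def pure_tone_average(thresholds_db_hl: Mapping[int, float]) -> float:
    """Three-frequency PTA (500, 1000, 2000 Hz) in dB HL."""
    return sum(thresholds_db_hl[f] for f in (500, 1000, 2000)) / 3


def srt_pta_check(srt_db_hl: float,
                  thresholds_db_hl: Mapping[int, float]) -> Tuple[float, str]:
    """Return (SRT minus PTA, label) using the 6-dB and 10-dB limits."""
    diff = srt_db_hl - pure_tone_average(thresholds_db_hl)
    if abs(diff) <= 6:
        label = "agreement"
    elif abs(diff) <= 10:
        label = "borderline"  # hypothetical label for the 6-10 dB gap
    elif diff < 0:
        label = "discrepant: SRT better than PTA"
    else:
        label = "discrepant: SRT worse than PTA"
    return diff, label


audiogram = {500: 40, 1000: 45, 2000: 50}  # PTA = 45 dB HL
result = srt_pta_check(25, audiogram)
```

Here `result` carries a difference of −20 dB with the SRT better than the PTA, the direction expected when pure-tone thresholds are exaggerated.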

The WRS

The word recognition score measures, at a level well above threshold, what percentage of mono-syllabic words the patient correctly identifies. The standard test in the US is the CID W-22 list (50 phonetically-balanced mono-syllables like give, toe, bath), each presented in the carrier phrase “You will say…”. The patient repeats; the audiologist scores.

Mono-syllables (as opposed to the redundant spondees used for SRT) are chosen because they offer minimal contextual cues — the patient must rely on the acoustic signal alone, without the linguistic redundancy that would let context fill in missed phonemes. This makes the WRS sensitive to subtle phonemic confusions that the SRT would miss.

The WRS is reported as a percent-correct at a specified presentation level, conventionally 40 dB above the SRT (or at the patient’s “most comfortable level,” MCL). A normal-hearing listener should score 92–100% at 40 dB SL.
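
As a small worked example of the scoring arithmetic, the sketch below computes a percent-correct score and the conventional presentation level. The function names are hypothetical, and the 40-dB sensation level is simply the convention quoted above.

```python
from typing import Sequence


def word_recognition_score(responses_correct: Sequence[bool]) -> float:
    """Percent of presented words repeated correctly."""
    return 100.0 * sum(responses_correct) / len(responses_correct)


def wrs_presentation_level(srt_db_hl: float,
                           sensation_level_db: float = 40) -> float:
    """Presentation level in dB HL: the SRT plus the chosen sensation level."""
    return srt_db_hl + sensation_level_db


# A 50-word list with 46 words repeated correctly, SRT of 25 dB HL.
responses = [True] * 46 + [False] * 4
score = word_recognition_score(responses)  # 92.0
level = wrs_presentation_level(25)         # 65 dB HL
```

With a 50-word list each word is worth 2 percentage points, so 46 correct gives 92%, the bottom of the normal range quoted above.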

The psychometric function

If the WRS is measured at multiple presentation levels, the resulting performance-intensity function shows the standard psychometric shape: a sigmoidal climb from 0% to a maximum, with the 50% point at or near the SRT.
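
One common way to model such a sigmoid is a logistic curve. The sketch below is an assumed illustrative model, not a fitted clinical function: the name `pi_function`, the default slope value, and the logistic form itself are choices made here.

```python
import math


def pi_function(level_db: float,
                midpoint_db: float,
                slope_per_db: float = 0.15,
                ceiling_pct: float = 100.0) -> float:
    """Logistic performance-intensity curve, returned as percent correct.

    midpoint_db is the level at which the score is half of ceiling_pct;
    that equals the 50%-correct point only when the ceiling is 100%.
    """
    return ceiling_pct / (1.0 + math.exp(-slope_per_db * (level_db - midpoint_db)))


# Score every 10 dB for a listener with a 20 dB HL midpoint.
curve = {level: round(pi_function(level, midpoint_db=20), 1)
         for level in range(0, 90, 10)}
```

Raising `midpoint_db` moves the curve to the right and lowering `ceiling_pct` caps the maximum score, the two changes associated with hearing loss in this section.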

[Interactive figure: performance-intensity function for a selectable listener type. The plot marks the SRT (the level at which the listener correctly repeats 50% of presented words), the maximum WRS (the asymptotic score at high presentation levels), the slope (how much the score improves as level increases), and any rollover (a decline in WRS at high levels). Example shown: moderate SNHL shifts the curve further to the right and lowers the ceiling, so the patient cannot reach 100% no matter how loud.]

Pick a listener type and read the psychometric function. Three features deserve attention:

The SRT (50%-correct point) shifts to the right with increasing hearing loss. Mild SNHL produces an SRT around 35 dB HL; severe loss can push it past 75 dB HL.
The ceiling drops with severity. Normal-hearing listeners reach 100%; moderate-SNHL listeners may plateau at 85–90%; severe losses may plateau around 70%. The phrase “speech becomes loud but garbled” describes a listener whose curve has shifted to the right and whose ceiling has dropped below 90%: extra level restores audibility but not clarity.
Rollover — a decline in WRS at very high levels — is the audiometric red flag for retrocochlear pathology. The classic example is a vestibular schwannoma (acoustic neuroma) pressing on cranial nerve VIII. The cochlea works, the brain works, but neural transmission across the affected segment degrades at higher intensities (more fibres saturate or fail). The rollover index, (PBmax − PBmin) / PBmax, where PBmax is the peak WRS and PBmin is the lowest score at any level above the peak, is the quantitative test; values above about 0.4 typically prompt MRI, though the exact cutoff depends on the word list used.
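
Rollover is commonly quantified as a rollover index, (PBmax − PBmin) / PBmax, where PBmax is the peak score and PBmin is the lowest score at any level above the peak. A minimal sketch assuming that definition follows; the function name and the example scores are hypothetical.

```python
from typing import Mapping


def rollover_index(scores_by_level: Mapping[float, float]) -> float:
    """(PBmax - PBmin) / PBmax for a {level_dB_HL: percent_correct} mapping.

    PBmin is taken as the lowest score at or above the level where the
    peak score first occurs.
    """
    pb_max = max(scores_by_level.values())
    if pb_max <= 0:
        raise ValueError("peak score must be above 0%")
    peak_level = min(lvl for lvl, s in scores_by_level.items() if s == pb_max)
    pb_min = min(s for lvl, s in scores_by_level.items() if lvl >= peak_level)
    return (pb_max - pb_min) / pb_max


# Score peaks at 80% at 60 dB HL, then falls to 40% at 100 dB HL.
index = rollover_index({40: 60, 60: 80, 80: 76, 100: 40})  # 0.5
```

An index of 0.5 exceeds the roughly 0.4 criterion mentioned above; a curve that keeps rising or plateaus gives an index of 0.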

Most-comfortable and uncomfortable levels

Two additional speech-derived measurements:

MCL (most comfortable level) — the presentation level at which the patient prefers to listen for sustained speech. Typically 35–50 dB above SRT for adult listeners with normal or near-normal hearing; higher for impaired listeners.
UCL (uncomfortable level, also UL or LDL) — the level at which speech becomes physically uncomfortable. The dynamic range (UCL − SRT) is the useful operating range for a hearing aid. Normal-hearing dynamic range is about 100 dB; loudness recruitment in cochlear hearing loss can compress this to 30–40 dB, which is the central design problem that wide dynamic range compression (WDRC) addresses (see Ch 7).
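
The dynamic-range arithmetic is simple enough to show directly. The numbers below are illustrative values chosen to match the ranges quoted above, not patient data, and the function name is hypothetical.

```python
def dynamic_range_db(ucl_db_hl: float, srt_db_hl: float) -> float:
    """Usable range between speech threshold and discomfort, in dB."""
    return ucl_db_hl - srt_db_hl


normal = dynamic_range_db(ucl_db_hl=100, srt_db_hl=0)    # 100 dB
impaired = dynamic_range_db(ucl_db_hl=95, srt_db_hl=60)  # 35 dB
```

In the second case the SRT has risen by 60 dB while the UCL has barely moved, which is the pattern that loudness recruitment produces.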

⏳ The history — Carhart and the CID lists

The Central Institute for the Deaf (CID) in St. Louis, founded in 1914 by Max Goldstein, became a leading American research centre for clinical hearing measurement, particularly after Hallowell Davis arrived as director of research in 1946. During and after WWII, with thousands of veterans needing aural rehabilitation, CID developed the standardised word lists that became the US clinical baseline.

The CID W-1 spondee list (36 two-syllable words) grew out of the spondee tests that Hudgins, Hawkins, Karlin, and Stevens developed at Harvard's Psycho-Acoustic Laboratory in 1947. It remains in clinical use today, though digital recordings have replaced the original 78-rpm phonograph records. The CID W-22 mono-syllabic word lists (four phonetically-balanced lists of 50 words each) were developed by Hirsh, Davis, Silverman, Reynolds, Eldert, and Benson in 1952 as the open-set word-recognition standard; the same 1952 work introduced the W-1 selection.

Raymond Carhart, at Northwestern, championed using both — SRT plus a separate WRS at a comfortable level — as the speech-audiometry “fingerprint” of a hearing loss. Carhart’s clinical protocols, codified in his 1971 Modern Developments in Audiology chapter on speech audiometry, are essentially the protocols US audiologists still follow.

The biggest modern evolution is the move from quiet to noise: speech-in-noise testing (HINT, QuickSIN, BKB-SIN, AzBio) developed from the 1990s onward to address the well-documented fact that quiet WRS poorly predicts real-world function. We cover those in 3.3.

What’s next

The next lesson, 3.2 — The speech banana and the audibility map, pivots from behavioural speech testing to a spectral picture of speech: the long-term average speech spectrum, the famous “speech banana” on the audiogram, and the question of which phonemes a given threshold curve renders inaudible.