Woman talking through a megaphone

Cheering, Talking, Singing

For many of us, the voice is an unreliable tool. Through inflammation, stress, injury, and emotional trauma, we may lose our voices and discover that, for all its limitations, speech is an irreplaceable means of communication (though speech also contains many unvoiced sounds).

We have an impressive ability to yell, though it's not recommended. Yelling stresses the voices of coaches, drill instructors and others more than we might guess. This is an occupational hazard that drill instructors share with voice actors.

Drill instructors won't deafen recruits by yelling, because they can't yell loud enough to deafen themselves. I haven't gone through the literature on the psychological effects of yelling, but it is interesting that researchers studying how power and status are communicated through the voice noted less variability in pitch and more variability in loudness.

The particular contributor to loudness that is responsible for the “command voice” is resonance. Even if you have no immediate need for voice dominance, you could demonstrate the effectiveness of resonance by shattering a wine glass.

We may look upon yelling as uncivil, but consider the Lombard effect: raising your voice in a noisy room. It may feel rude, but it's actually involuntary. It's the reason the noise in a noisy restaurant is bound to get louder.

Our voices don’t just communicate dinner orders and football plays, of course. They express our emotions, using a variety of acoustic patterns, perhaps more effectively than our facial expressions do. Sometimes vision triumphs, as in the McGurk effect. If it didn’t, it would be all the worse for ventriloquists.

A man and a woman are shown at a restaurant table, chatting.

BIO: As for the mechanics, the larynx and voice are fascinating adaptations. Here’s the really quick story* about the one musical instrument the musician never gets to see.

While exhaling, we tighten and relax the vocal folds (or vocal cords) in our throats (or larynges) to make a sound. (The singular form is larynx, which should never be pronounced "lair-nicks".) The vocal folds are made to vibrate in an air current and make sound by the Bernoulli effect. (To see vocal cords in action, go to this site and click on "tutorial", then on "next", where you'll find a stroboscopic video of voicing in action. Here's the explanation of what you saw.)
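For the curious, here is a back-of-the-envelope illustration of that Bernoulli effect (my numbers are rough assumptions, not measurements from the linked tutorial). Where airflow speeds up through the narrowed glottis, the pressure drops by about

$$\Delta p \approx \tfrac{1}{2}\rho v^{2} = \tfrac{1}{2}\,(1.2\ \mathrm{kg/m^3})(30\ \mathrm{m/s})^{2} \approx 540\ \mathrm{Pa},$$

assuming an air density of 1.2 kg/m³ and an airflow of roughly 30 m/s. A pressure drop of that size is roughly comparable to typical subglottal pressures, which is why the suction can help pull the folds back together after each puff of air escapes and keep the cycle going.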

Since the voice is a social instrument, our use of it depends on listeners, who have biological adaptations of their own. The shape of the pinna assists our hearing, though the details can get complicated. But the contours are not random.

The outer ear always seems to be pink in young animals, doesn’t it? That’s related to the fact that ears do more than hear.

Does it seem funny that we create artificial pinnas to enlarge our own when we cup our hands around our ears to hear better? It boosts middle-frequency sound intensities by about 8 dB.

This is like listening to the ocean in a seashell. Our cupped hands are sound reflectors, like high-backed chairs, but, to even greater effect, they are resonators, like the seashell.
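To put that roughly 8 dB boost in perspective (a quick calculation of my own, not a figure from the linked source):

$$10^{8/10} \approx 6.3 \ \text{(intensity ratio)}, \qquad 10^{8/20} \approx 2.5 \ \text{(sound-pressure ratio)},$$

so cupped hands deliver roughly six times the acoustic power, or about two and a half times the pressure amplitude, at those middle frequencies.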

The cartilaginous ridges and folds of the external ear are mostly there for structural support. They don't provide much resonance (p. 9). The ear canal (the external auditory meatus) resonates maximally at 3-4 kHz, right in the middle of the speech range.
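That 3-4 kHz figure falls out of treating the canal as a tube closed at one end (the eardrum) and open at the other, a quarter-wave resonator. Taking a canal length of about 2.5 cm and the speed of sound in air as 343 m/s (typical textbook values, not measurements from this site):

$$f \approx \frac{c}{4L} = \frac{343\ \mathrm{m/s}}{4 \times 0.025\ \mathrm{m}} \approx 3.4\ \mathrm{kHz}.$$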

A speaker creates vowels by re-shaping the vocal tract (mouth and throat) into different Helmholtz resonators.
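As a minimal sketch of the idea (all values below are my own illustrative assumptions, not data from the article), you can plug mouth-sized numbers into the standard Helmholtz-resonator formula and land near the first formant of an open vowel:

```python
import math

# Helmholtz resonator: f = (c / 2*pi) * sqrt(A / (V * L))
# Treating the mouth/throat as the cavity and the lip opening as the "neck".
c = 343.0      # speed of sound in air, m/s
A = 3.3e-4     # assumed lip-opening area, m^2 (about 3.3 cm^2)
V = 1.0e-4     # assumed cavity volume, m^3 (about 100 cm^3)
L = 0.02       # assumed effective neck length, m (about 2 cm)

f = (c / (2 * math.pi)) * math.sqrt(A / (V * L))
print(f"Resonant frequency: {f:.0f} Hz")  # about 700 Hz, near the first formant of "ah"
```

Narrow the lips (shrink A) or enlarge the cavity (grow V) and the resonance drops, which is roughly what happens when you round your lips for "oo".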

We also cup our hands around our mouths when we shout, but that's a different principle, or really two: directing the sound and preventing some back-interference, like a megaphone. Still, it puts us in an exclusive category with orangutans and harmonica players.

Both talking and singing make use of sound resonance, which is particularly noticeable when a shower stall makes your singing sound operatic. The ear relies on resonance too. Because the basilar membrane varies in stiffness along its length from base to apex, its resonance to sound differs along its length as well. Near the base, where the membrane is stiffest, it resonates most strongly to high frequencies. At the apex, a looser membrane resonates more to low frequencies. You can see how this works in animations here.
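A toy model makes the trend concrete (my own illustration with made-up numbers, not the animations linked above): treat each spot on the membrane as a little mass-spring resonator, with stiffness falling off exponentially from base to apex.

```python
import math

# Mass-spring resonance: f = sqrt(k / m) / (2 * pi).
# The stiffness gradient and mass below are invented so the results span the
# audible range; only the trend matters: stiffer (base) -> higher resonance.
m = 1.0e-6                              # hypothetical moving mass per segment, kg
for x in (0.0, 0.25, 0.5, 0.75, 1.0):   # 0 = base (stiffest), 1 = apex (floppiest)
    k = 1.6e4 * math.exp(-13.8 * x)     # hypothetical stiffness, N/m
    f = math.sqrt(k / m) / (2 * math.pi)
    print(f"{x:.2f} of the way to the apex -> resonates near {f:,.0f} Hz")
```

With these numbers the base lands near 20 kHz and the apex near 20 Hz, the nominal limits of human hearing; the real cochlea is far more complicated, but the stiffness-to-frequency logic is the same.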

Resonance has sometimes been called "sympathetic vibration". The external ear canal amplifies sound by resonance. The tectorial membrane resonates to sound, too. Some of the biggest resonators in the body are our bones, which we experience as bone conduction.

The funny way our voice sounds when we have a cold is partly due to fluid in the middle ear, which presses outward on the eardrum. The middle ear is normally an air-filled cavity, drained of fluid by the Eustachian tube. When a cold makes the walls of the Eustachian tube swell, fluid is trapped in the middle ear. Then the ossicles can't perform their main function, impedance-matching. Normally sound waves in air flail weakly against the eardrum and must be amplified by the ossicles to make an impression against the resistance, or impedance, of the denser fluid within the cochlea. Ever notice that when you are underwater you can't hear the lifeguard very well? That's because of an impedance mismatch. The lifeguard's voice bounces off the water's surface without transferring enough energy to produce sound waves in the denser water. So you don't hear him.
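A rough calculation shows how severe that mismatch is (standard textbook values, not figures from this article). With characteristic acoustic impedances of about 415 Pa·s/m for air and 1.48 × 10⁶ Pa·s/m for water, the fraction of sound power transmitted across the boundary is

$$T = \frac{4 Z_1 Z_2}{(Z_1 + Z_2)^2} \approx \frac{4(415)(1.48\times 10^{6})}{(1.48\times 10^{6})^{2}} \approx 0.001,$$

about 0.1 percent, or roughly a 30 dB loss. That is close to what the eardrum and ossicles are usually credited with recovering at the air-to-cochlear-fluid boundary, which is why a fluid-filled middle ear makes such a difference.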

When a cold, or being underwater, keeps us from hearing airborne sounds well, bone conduction becomes more noticeable. Bone conduction* is an important factor in how we judge sounds. (You can see for yourself in the illustration below, from this site. It shows the cochlea embedded in bone.)

Movies have used infrasound to arouse fear, but mostly they use other tricks. Like screams.

Speech and voices evolved together, along with other traits including gestures and brain organization, as a complex example of coevolution still built as much on hypothesis as on evidence. Clearly speech demands extensive brain connections to make use of the vocal apparatus, but answering the chicken-and-egg (research here) and cart-before-horse questions of coevolution will not be easy.

*There are longer ones here, there, and yonder.

PSYCHO: As far as our sense of hearing is concerned, sound varies in intensity and frequency, which we experience as loudness* and pitch**. A third feature, timbre, reflects the mix of overtones that accompany a sound's fundamental frequency and lets us tell one voice or instrument from another.

We can examine these in a sound spectrogram. You can download Audacity, a fascinating free tool for displaying sound spectrograms of your voice or other sounds. (Here's the manual, if you prefer instructions to tinkering. Incidentally, I have no connection to the folks responsible for Audacity.)
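If you'd rather tinker in code than in Audacity, here is a minimal sketch of the same idea in Python (the synthetic sweep is just a stand-in; you could read in a WAV recording of your voice with scipy.io.wavfile.read instead):

```python
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

# Build a two-second test signal: a tone sweeping from 200 Hz up to 3 kHz.
fs = 16000                                    # sample rate, Hz
t = np.arange(0, 2.0, 1 / fs)
x = signal.chirp(t, f0=200, t1=2.0, f1=3000)

# Compute and plot the spectrogram (time on x, frequency on y, power as color).
f, times, Sxx = signal.spectrogram(x, fs, nperseg=512)
plt.pcolormesh(times, f, 10 * np.log10(Sxx + 1e-12), shading="gouraud")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.title("Spectrogram of a synthetic sweep")
plt.show()
```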

If you would like a look at sounds you might not be able to generate, try this website. Just click on the drop-down menu labeled “sound sample”.

The listening that makes voice production useful drives the complicated process of speech perception, which you are invited to pursue further in this article and that one. Several surprising facts will turn up. For one thing, we learn to categorize each speech sound we hear as one of the recognizable sounds of our own language, even if what we heard is not typical. This is categorical perception. Although 6-month-old infants are "universal listeners" who can distinguish the sounds in any language, a few months later they hear only the speech sounds that belong to their native language.

Another surprise is that word boundaries are so often illusory. We hear the boundaries between one word and the next, but we’re the ones who placed them there! We usually don’t divide speech sounds into words when we speak them, but only when we listen to them. This is the segmentation problem of distinguishing one word from another. Listen to yourself say “We were away a year ago.” Now look at a website that shows such a sentence’s sound pattern. (It’s near the bottom of the Web page.) There are no word boundaries in that sentence! This is typical of our speech.

There is often more than one way to place breaks in a speech stream to mark the word boundaries. Evidence suggests that we hold several possible meanings of a sentence in mind until it ends, then quickly (and unconsciously) choose the best interpretation. Ambiguity, which is extremely common, slows the process. And when our top-down expectations don't match the bottom-up sensation of speech? The result is a mondegreen (and its bilingual equivalent, soramimi).

Now add to that our propensity for mispronunciation. Speaking an unfamiliar language creates even more mondegreens; the French invented a new verb for it, yaourter. However, a team of Belgian and Swiss psycholinguists has argued that syllable segmentation saves us from outright despair.

If we were aware of the ambiguity of speech we wouldn't be tricked as often into hearing mondegreens. (For example, isn't it reasonable that linguists love ambiguity more than most people? That's an ambiguous sentence.) Surprisingly, though, ambiguity helps us to communicate by making speech more efficient (research here). To resolve the ambiguity we rely mostly on context. Without ambiguity, despite its unexpected effects, we would have an even harder time communicating.

So I guess we should be proud of our mondegreens. Lots of folks think that accents (which most of us have, to some extent) are what make speech hard to understand, but they aren’t a big problem after some practice unless we can’t hear well or we’re getting on in years. We’re mostly stuck with them, anyway.

Incidentally, efforts to identify the most ambiguous English word show up on the Internet now and then. Favorites include “get”, “set” and “run”.

As the MIT research article above indicated, such ambiguity is related to the efficiency of English, which can express the speaker's thoughts without being required by grammar to specify gender, emphasis, and other conditions that inflections bring. There are other efficient languages, such as Indonesian and Mandarin Chinese, as well as highly inflected, less efficient ones such as Kabardian, Dyirbal, and Icelandic, and an artificial language called Lojban that may be the least ambiguous language on earth.

Psycholinguists also distinguish lexical ambiguity and syntactic ambiguity, but this discussion is supposed to be about the voice. I don’t think ambiguous voicing is usually a problem.

One further point is that the voice is a stress indicator. The three major processes involved in voice production—breathing, phonation, resonance—are altered by stress. Consequently, a dysphonia may arise from stress and the voice may sometimes betray a lie.

*Because loudness is a subjective experience, it has subjective units, called phons and sones. They become important when you want to explain how the loudness (or "loud") button on a radio works or why one animal sounds louder than another. It's not just a matter of decibels.
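The usual conversion (a standard psychoacoustics rule of thumb, not something from the sources above) is that loudness in sones doubles for every 10-phon increase above 40 phons:

$$S = 2^{(P - 40)/10},$$

so a 40-phon sound is 1 sone, a 50-phon sound is 2 sones (it sounds twice as loud), and a 60-phon sound is 4 sones, even though the decibel scale climbs in even steps.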

** We don't so much hear a pitch as invent it. Pitch is the subjective impression of a sound's frequency. Our ears are not physical measuring instruments, but pitch can still be measured, though only indirectly, in units called mels.
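One common formula for the mel scale (a widely used approximation, not the only one) is

$$m = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right),$$

which is built so that 1000 Hz comes out to about 1000 mels; above that, equal steps in mels correspond to ever larger steps in hertz, matching the way pitch judgments flatten out at high frequencies.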

SOCIAL: Speech must have evolved as a social trait, and not just to report the weather. We emulate the speech of our friends, for example.

The voice itself is intrinsically social (research here), and infants are correspondingly good at recognizing Mom’s voice.

More generally, there’s evidence that men prefer higher pitches in women’s voices and women find lower pitches more attractive in men’s voices. Sadly, recognizing people by their voices is a challenge for people with phonagnosia.

We relish the chance to cheer or sing with others (and it may have specific clinical benefits), though participating in musical performances without singing confers its own social rewards.

If you and your browser allow programs using Adobe Flash, you can test your knowledge of the voice with a Jeopardy-style game here. Then patch up deficiencies in your score there.
