We hugely underestimate how processed all of our senses are.
Hearing doesn't listen to pressure waves. It does some very complex real-time source separation to distinguish between different sound sources.
Then it performs overtone and resonance matching to identify different speakers.
Then it follows up with phoneme recognition to identify words - which somehow identifies phoneme invariants across a wide range of voice types, including kids and adults, male/female, local/foreign, and social register (class) and accent.
Then it recognises emotional cues from intonation, which again are invariant across a huge range of sources.
And then finally it labels all of that with linguistic metadata, converts the phonemes into words, and parses the word combinations.
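To get a feel for how hard even the first couple of steps are in engineering terms, here is a minimal Python sketch (assuming numpy and scikit-learn are available): it separates two synthetic "voices" from two mixed channels with FastICA, then tells the recovered voices apart by pitch using a simple autocorrelation estimate. The two-microphone setup, the 120/210 Hz toy voices, and the mixing matrix are all made up for illustration, and this is nothing like what the auditory system actually does - the brain separates sources from a single pair of ears, using onsets, harmonicity, and learned priors.

    import numpy as np
    from sklearn.decomposition import FastICA

    sr = 8000                          # assumed sample rate, Hz
    t = np.arange(sr // 2) / sr        # half a second of samples

    def voice(f0):
        # A harmonic stack: fundamental plus two overtones, a crude "vowel".
        return sum(np.sin(2 * np.pi * f0 * k * t) / k for k in (1, 2, 3))

    def estimate_f0(x, fmin=80.0, fmax=400.0):
        # Autocorrelation pitch estimate: the strongest peak in the
        # plausible lag range marks the fundamental period.
        x = x - x.mean()
        ac = np.correlate(x, x, mode="full")[len(x) - 1:]
        lo, hi = int(sr / fmax), int(sr / fmin)
        return sr / (lo + np.argmax(ac[lo:hi]))

    # Two "speakers" talking at once, heard by two "microphones" that
    # each pick up a different blend of the pair.
    sources = np.c_[voice(120.0), voice(210.0)]
    mixing = np.array([[0.7, 0.3],
                       [0.4, 0.6]])
    mixed = sources @ mixing.T

    # FastICA recovers the sources up to permutation and scale...
    recovered = FastICA(n_components=2, random_state=0).fit_transform(mixed)

    # ...and pitch alone is enough to tell the two toy voices apart.
    for i in range(2):
        print(f"component {i}: f0 ~ {estimate_f0(recovered[:, i]):.0f} Hz")

Even this toy version only works because it gets two cleanly mixed channels of stationary signals; the ear does the equivalent job on one noisy, reverberant pair of inputs, in real time.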
It's not until you try to listen to a foreign language that you hear the almost unprocessed audio for what it is. And even that still has elements of accent and intonation recognition.
Non-linguistic elements of verbal communication are so universal that nobody really notices when fictional alien species in media communicate with human protagonists. It's uncontroversial to the audience that hostile/non-hostile intent, instruction, friendliness, cooperation, and so on are all embedded in the tones of animals, robots, and even tree creatures throughout the universe.
I’ve heard garbled words over a bad connection that I didn’t understand, only to have my brain parse them seconds later without intentional effort. It makes me wonder: is the language center processing a memorized version of the sound here, or is it reprocessing at a lower level?
This is basically an everyday experience for me, and not limited to telephone calls. My hearing is not great, and I’ll often ask someone to repeat something only to finally finish parsing it about a second after I asked because in the intervening time my brain reconstructed a signal from a bunch of noise.
I've had the same experience my entire life. That said, testing shows my hearing isn't great, but it's not horrible either. I even have an exceptional ability to identify actors solely by their voice when others can't identify them at all. I've often wondered if there is something else going on that runs deeper than just surface level "hearing". Like I'm hearing slower but more deeply.
I suspect I have it, too. I seem to experience sound differently than other people. Separating voices from background noise is HARD, but I'm also very good (I think) at recognizing actors by voice.
I don’t know if I’m hearing more deeply, but I have basically the same experience: I failed every hearing test I ever took, but I’m not deaf either. I just need to rely more on post-processing than others do. The best I’ve been able to figure is that I didn’t get enough socialization at critical points of my early development, which is true; I just can’t prove that it’s also related to my hearing.
Fascinating... this happens to me a lot also, but I never really realized what I was experiencing until reading your comment. Later on I will feel guilty asking people to repeat things when I actually understood them. I never considered what you are saying, that I didn't yet have the information when I asked them, but later did.
I often wonder if my hearing is poor, or if I am just so sensitive to the possibility of mishearing people that I would rather err on the side of confusion. I overhear a lot of conversations where I can tell the people misunderstand each other, though neither is aware of it, and I wouldn't want to do that.
And, if it's your native language, you can't help but process it.
This is why I like talking to 4-year-olds: they see the world as it truly is, and can communicate it back out. They don't have all the conditioned learning the rest of us have, so they see a clearer picture, without bias.
You're asking for a summary of the entire fields of speech perception and psycholinguistics. A good place to start is the groundbreaking speech perception experiments done at Haskins Laboratories in the 1950s by people such as Alvin Liberman.