« Chuc Mung Nam Moi
» Some observations about the difference in tones and vowels between northern Vietnamese and southern Vietnamese

Speech recognition software and the human brain

02.10.08 | admin | In general, Mandarin, Vietnamese, Chinese, French

If you’ve ever used speech recognition software you’ll know that the state of the art is still far from perfect, and far from the performance of a real human being. Even with big players like IBM and Microsoft in the field, the latter including speech recognition in its Vista operating system, the software still requires a significant amount of training by the person using it and still makes many mistakes.

Why is this? Why do computers make seemingly trivial mistakes that even a child wouldn’t make? Part of it is that we don’t speak as clearly as we think we do. Sometimes we are lazy about enunciating each word properly, and words often run together without our noticing. Fortunately, given appropriate context, a human listener can figure out what we’re saying quite seamlessly. Homophones, words that have the same pronunciation but different spellings and meanings, are generally not a problem in spoken language. Computers, on the other hand, have trouble distinguishing words and phrases that seem very different to us, and they have a hard time deciding where one word ends and the next begins.
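
To make the word boundary problem concrete, here is a minimal sketch in Python. It assumes the sound stream has already been turned into a string of letters with no spaces, and that the listener knows only a tiny vocabulary. The function name `segmentations` and the vocabulary are made up for illustration; real recognizers work on audio features, not letters.

```python
def segmentations(stream, vocabulary):
    """Return every way to split `stream` into words found in `vocabulary`."""
    if not stream:
        return [[]]  # one way to split nothing: no words at all
    results = []
    for end in range(1, len(stream) + 1):
        head = stream[:end]
        if head in vocabulary:
            # Keep this word and try to split whatever is left.
            for rest in segmentations(stream[end:], vocabulary):
                results.append([head] + rest)
    return results


# Hypothetical toy vocabulary, chosen so the stream is ambiguous.
vocabulary = {"i", "is", "scream", "cream", "ice"}

for words in segmentations("iscream", vocabulary):
    print(" ".join(words))
```

The same stream splits into both "i scream" and "is cream", and sound alone cannot say which one was meant. If the stream contains even one piece that is not in the vocabulary, this naive approach finds no split at all.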

What does this have to do with learning a foreign language? Well, humans have the same problem when listening to a foreign language that they don’t know. We don’t know where one word ends and the next begins. French is especially difficult because of liaison, which connects the often unpronounced end of one word to the beginning of the next, although there are some tricks. For example, French words almost never begin with a ‘z’ sound, so when you hear that sound at what seems to be the start of a word it is usually an ‘s’, ‘x’ or ‘z’ from the end of the previous word.

But in general, when listening to French you have to recognize something like 90% of the words being spoken in a sentence, even if you don’t know what they mean. Otherwise you will just hear a stream of random syllables. Many languages are like this, although in (nominally) monosyllabic languages like Chinese and Vietnamese it’s less of a problem because every “word” is just a vowel (or diphthong, etc.) surrounded by optional single consonant sounds.

And in any language, the more grammar you know the more you can place words in context by category, and the more vocabulary you know the more complete a context you will have to separate out the words you don’t know. In the beginning, though, when you don’t know most of the words, it’s difficult even to repeat or write down a spoken sentence, because it’s just a jumble of sounds rather than a small number of distinct words. This is the problem that speech recognition software has, because computers mostly rely on sound and on the probability that two or more words go together. Beyond that, computers generally don’t “understand” a sentence well enough to distinguish homophones and similar sounding phrases. So when we as humans try to understand a foreign language we must strive to go beyond that and understand enough of what’s being said to guess the meaning of the words we don’t know.
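
As a rough illustration of "the probability that two or more words go together", here is a small Python sketch that picks between homophones by counting how often pairs of words appear next to each other. The counts, the homophone table and the function names are all invented for the example; a real system would estimate the numbers from a large body of text and combine them with an acoustic score.

```python
from itertools import product

# Hypothetical counts of adjacent word pairs (made up for this example).
BIGRAM_COUNTS = {
    ("i", "want"): 30,
    ("want", "to"): 40,
    ("want", "two"): 3,
    ("want", "too"): 1,
    ("to", "go"): 25,
    ("two", "apples"): 20,
}

# Hypothetical table: each sound (written here as one spelling) maps to
# every spelling that is pronounced the same way.
HOMOPHONES = {"to": ["to", "two", "too"]}


def score(words):
    """Multiply add-one-smoothed counts for every adjacent pair of words."""
    total = 1
    for pair in zip(words, words[1:]):
        total *= BIGRAM_COUNTS.get(pair, 0) + 1  # +1 so unseen pairs are not zero
    return total


def best_transcription(heard):
    """Try every homophone combination and keep the highest scoring one."""
    candidates = product(*(HOMOPHONES.get(word, [word]) for word in heard))
    return max(candidates, key=score)


print(" ".join(best_transcription(["i", "want", "to", "go"])))
print(" ".join(best_transcription(["i", "want", "to", "apples"])))
```

With these toy counts the first sentence keeps "to" and the second switches to "two", purely because of which neighbours each spelling has been seen with. Nothing here knows what an apple is, which is exactly the limitation described above.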
