How do machines mimic human speech?

Machines are learning to talk. Through the combined efforts of language scientists, acousticians and electronics experts, synthetic speech is allowing clocks to announce the time, machines to read to the blind and cars to warn their owners that it’s time to fill up.

In order to develop such chatty contraptions, researchers first had to learn what makes up a word. Linguists have broken human language down into a small number of identifiable sounds, or phonemes. All the words in Standard English are said to be composed of just 40 to 50 basic phonemes strung together and adjusted for syntax.
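
To make the idea concrete, here is a minimal sketch of a toy pronunciation lexicon. The ARPAbet-style symbols, the three entries and the phonemes_for helper are invented for illustration and stand in for the real 40-to-50-phoneme inventory.

```python
# Toy illustration: words decompose into sequences drawn from a small
# phoneme inventory. The symbols are loosely ARPAbet-style and this tiny
# lexicon is purely illustrative, not a real pronunciation dictionary.
LEXICON = {
    "cat":     ["K", "AE", "T"],
    "speech":  ["S", "P", "IY", "CH"],
    "machine": ["M", "AH", "SH", "IY", "N"],
}

def phonemes_for(word: str) -> list[str]:
    """Look up the phoneme sequence for a word in the toy lexicon."""
    return LEXICON[word.lower()]

if __name__ == "__main__":
    for w in ("cat", "speech", "machine"):
        print(w, "->", " ".join(phonemes_for(w)))
```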

A computer is taught to recognize and synthesize words in one of two ways. In the first, known as synthesis by analysis, it takes recorded samples of the human voice and analyses their sound waves every one-hundredth of a second. It then extracts and stores certain key attributes, such as the predominant frequencies and energy levels. Later, the machine can reproduce these patterns electrically and, using filters, oscillators and noise generators, turn them back into sounds. Since the computer captures each tiny nuance, synthesis by analysis can produce extremely lifelike voices. Vocabulary, however, is limited to the words actually programmed into its memory.
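
The sketch below is a drastically simplified stand-in for this analysis-and-resynthesis idea, assuming NumPy, an 8 kHz sampling rate and just one feature pair (dominant frequency and RMS energy) per one-hundredth-of-a-second frame. Real systems store far richer parameters and rebuild the sound with filters and noise generators rather than a single sine oscillator per frame.

```python
# Simplified "synthesis by analysis": slice a recording into 10 ms frames,
# keep only each frame's dominant frequency and energy, then rebuild a
# crude copy of the signal from those stored attributes.
import numpy as np

RATE = 8000          # samples per second (assumed)
FRAME = RATE // 100  # one-hundredth of a second per frame, as in the text

def analyze(signal: np.ndarray) -> list[tuple[float, float]]:
    """Return (dominant frequency in Hz, RMS energy) for each frame."""
    features = []
    for start in range(0, len(signal) - FRAME + 1, FRAME):
        frame = signal[start:start + FRAME]
        spectrum = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(FRAME, d=1.0 / RATE)
        dominant = freqs[np.argmax(spectrum[1:]) + 1]   # skip the DC bin
        energy = float(np.sqrt(np.mean(frame ** 2)))
        features.append((float(dominant), energy))
    return features

def resynthesize(features: list[tuple[float, float]]) -> np.ndarray:
    """Rebuild a crude signal: one sine oscillator per frame, scaled by energy."""
    t = np.arange(FRAME) / RATE
    out = [energy * np.sqrt(2) * np.sin(2 * np.pi * freq * t)
           for freq, energy in features]
    return np.concatenate(out)

if __name__ == "__main__":
    # A synthetic "voice": a fading 440 Hz tone standing in for real speech.
    t = np.arange(RATE) / RATE
    original = np.sin(2 * np.pi * 440 * t) * np.linspace(1.0, 0.2, RATE)
    copy = resynthesize(analyze(original))
    print(f"{len(analyze(original))} frames analyzed, {len(copy)} samples rebuilt")
```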

The other method, synthesis by rule, allows enormous versatility because any word can be produced. The computer is programmed with the basic phonemes and the rules of pronunciation and stress, from which it assembles words. But what is gained in flexibility is lost in clarity, since it’s difficult to reduce all the permutations of pronunciation and inflection to a single set of rules. Regardless of the technique used, voice-synthesis systems are becoming ever more commonplace.
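
A minimal sketch of the rule-based approach is below; the letter-to-sound rules, the phoneme symbols and the fallback table are all invented for illustration. It will happily pronounce any spelling it is given, and the mistakes it makes on irregular words show exactly why clarity suffers when pronunciation is reduced to a small set of rules.

```python
# Simplified "synthesis by rule": ordered letter-to-sound rules turn any
# spelled word into a phoneme string. Real systems need hundreds of rules
# plus stress marking and exception lists.
RULES = [            # longest patterns first, so "ch" wins over plain "c"
    ("ch", "CH"),
    ("sh", "SH"),
    ("ee", "IY"),
    ("th", "TH"),
]
DEFAULTS = {"a": "AE", "c": "K", "e": "EH", "i": "IH", "m": "M",
            "n": "N", "p": "P", "s": "S", "t": "T"}

def to_phonemes(word: str) -> list[str]:
    """Scan left to right, applying the first rule that matches at each point."""
    word = word.lower()
    phonemes, i = [], 0
    while i < len(word):
        for pattern, phone in RULES:
            if word.startswith(pattern, i):
                phonemes.append(phone)
                i += len(pattern)
                break
        else:
            # No rule matched: fall back to a one-letter default.
            phonemes.append(DEFAULTS.get(word[i], word[i].upper()))
            i += 1
    return phonemes

if __name__ == "__main__":
    for w in ("cheese", "pants", "machine"):   # "machine" comes out wrong
        print(w, "->", " ".join(to_phonemes(w)))
```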
