I’ve Always Wanted To Know:
How Does A Computer Understand Speech?
Software like Dragon NaturallySpeaking and services like Google Voice Search or Apple’s Siri digital assistant let you speak to a computer or smartphone and have it understand what you are saying. But how does this work?
At the most basic level, a microphone records sound, and a program or server analyzes that sound and converts it into text that the program can use for whatever action is desired. The process is complex, and it begins with the microphone. The microphone records not only the sound of your voice but also background sounds such as spinning fans, a car outside, or your home heating system, all of which the program has to deal with. This noise can be reduced by noise-canceling microphones or by multiple microphones, which use hardware and software to limit the background noise recorded.
The next step uses a server or computer program to translate the audio into text. This involves a lot of advanced mathematics to determine which words were spoken. Since people have different accents and speak at different paces, the software needs to be designed to recognize a wide variety of voices. The quality of the speech recognition is usually determined by this step, since better-designed software picks the correct words more often. The software also looks at which words are likely to appear together, so if the initial recognition comes out as “I will fall you back later,” the software may change “fall” to “call” because, based on commonly used phrases, that is more likely what was said.
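To make the “fall” versus “call” idea concrete, here is a minimal sketch in Python. It is a toy illustration, not how any real product works: the word-pair counts, the CONFUSABLE table, and the function names are all invented for this example. A real system would learn these numbers from huge amounts of text and combine them with how well each word matches the audio.

```python
# Toy counts of how often two words appear side by side.
# These numbers are made up for illustration only.
BIGRAM_COUNTS = {
    ("will", "call"): 50,
    ("call", "you"): 40,
    ("will", "fall"): 5,
}

# Hypothetical table of words that sound alike and are easily confused.
CONFUSABLE = {
    "fall": ["fall", "call"],
    "call": ["call", "fall"],
}


def score(words):
    """Add up the counts of every adjacent word pair (unknown pairs count as 0)."""
    return sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))


def rescore(sentence):
    """Swap in a sound-alike word wherever it makes the phrase more common."""
    best = sentence.lower().split()
    for i, word in enumerate(list(best)):
        for alternative in CONFUSABLE.get(word, [word]):
            candidate = best[:i] + [alternative] + best[i + 1:]
            if score(candidate) > score(best):
                best = candidate
    return " ".join(best)


print(rescore("I will fall you back later"))
```

With these made-up counts, the phrase containing “call” scores higher than the one containing “fall,” so the sketch picks “call.”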
The final step is using what was spoken. Programs like Dragon Dictation simply put the recognized words on the screen as text, so you can type by speaking. Smartphone applications like Google Search or Siri (among many others) can have keywords or phrases that perform specific actions. The software then needs to recognize where the keywords begin and what action is being requested. If you say “Send a text message to Molly” to Siri, the result is a new message addressed to Molly from your address book. If you say “Molly I need to send her a text message,” the result is typed text on screen that reads “Molly I need to send her a text message.”
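The difference between a command and plain dictation can be sketched in a few lines of Python. This is only an assumption about the general idea, not Siri’s actual method: the COMMAND pattern, the interpret function, and the “new_message” and “dictation” labels are hypothetical names made up for this example.

```python
import re

# Hypothetical command pattern: the phrase must START with the keywords.
COMMAND = re.compile(r"^send a text message to (?P<name>\w+)$", re.IGNORECASE)


def interpret(transcript):
    """Return what to do with the recognized words.

    If the words match the command pattern, start a new message to the
    named contact. Otherwise, treat the words as text to be typed out.
    """
    text = transcript.strip().rstrip(".")
    match = COMMAND.match(text)
    if match:
        return ("new_message", match.group("name"))
    return ("dictation", transcript)


print(interpret("Send a text message to Molly"))
print(interpret("Molly I need to send her a text message"))
```

The first phrase begins with the keywords, so the sketch treats it as a command. The second contains similar words in a different order, so it falls through and is treated as text to type.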
What can you do to make speech recognition more accurate?
- Speak clearly, at a normal pace and volume.
- Use voice recognition in places with lower background noise.
- Use a high-quality microphone.
- Don’t use slang or uncommon words.
- Learn the commands if your software has them.
Do you have a general technology or electronics question you have always wanted answered, like “How does a microwave work?” or “Why do LEDs last so long?” Write me at Tim@WorldStart.com and your question may be answered in an upcoming “I’ve Always Wanted To Know.” For specific computer support questions, ask our writers by clicking here.