How voice recognition works in today’s connected cars

By eeNews Europe



Audio has a long, sometimes difficult route in the connected car — traveling from your mouth all the way to the speech recognizer “hearing” what you said. In the short version of this journey, there are two halves:

Part 1: Inside the car cabin

The first half of the journey takes you through the interior of the vehicle – from your mouth to the car’s microphone. Unfortunately, cars can be very noisy environments. If you could hear the bumping, grunting, and shuffling of the stagehands, would you really enjoy the play? Think of everything that you can hear in the car: engine revs, potholes, tractor trailers passing on the right, kids playing in the backseat, windshield wipers, climate control noise… and then finally your voice.

Take potholes, a common condition on many of the country roads that I frequent. You engage the VR system and say “Call Al”. You merge onto an off-ramp at just the right time, and the VR system might actually hear “Call <BUMP><BUMP><BUMP>” instead. Competing voices are another common pitfall in the voice-enabled car. While driving with your kids, you attempt to change the radio station by voice. The VR system now has to interpret what is meant by “DADD…” – “Tune to DAD 100.3 FM” – “…DYYY”. Noise and interference like this can cause significant misrecognitions and other unwanted behavior from the VR system.



Part 2: Inside the voice recognition (VR) system

The second half of audio’s journey can be equally difficult. Having a correctly structured audio configuration within an infotainment system is critical to a successful user experience. During a voice recognition dialog, the system must know when to start and stop listening for the user (the “listening window”). Like stagehands opening and closing the curtain during scene changes, this has a significant impact on the user experience. If the curtain opens early, the audience sees what they shouldn’t. If it closes too quickly, the audience will miss key elements of the plot. In the case of the car’s VR, this is equivalent to the system hearing “<BEEP> Dial 911” or “Dial 1-800-5//cutoff” respectively. Both situations could cause the user to get an unexpected result.
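
To make the listening window concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular VR product works. It assumes the audio arrives as a list of per-frame energy values, that the beep occupies a known number of leading frames, and that a run of quiet frames marks the end of speech. The function name, parameters, and thresholds are all hypothetical.

```python
def listening_window(frame_energy, beep_frames, threshold, hangover):
    """Return (start, end) frame indices of the user's speech, or None.

    frame_energy: per-frame energy values (hypothetical input format)
    beep_frames:  number of leading frames occupied by the prompt beep
    threshold:    energy above which a frame counts as speech
    hangover:     quiet frames tolerated before the window closes
    """
    start = None
    last_speech = None
    # Open the window only after the beep, so the beep is never recognized.
    for i in range(beep_frames, len(frame_energy)):
        if frame_energy[i] > threshold:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech > hangover:
            break  # enough trailing silence: close the window
    if start is None:
        return None
    return (start, last_speech + 1)  # end index is exclusive


# Frames: two beep frames, silence, "Dial", a short pause, "911", silence.
energy = [9, 9, 0, 5, 6, 0, 7, 8, 0, 0, 0, 0]
window = listening_window(energy, beep_frames=2, threshold=1, hangover=2)
too_short = listening_window(energy, beep_frames=2, threshold=1, hangover=0)
too_early = listening_window(energy, beep_frames=0, threshold=1, hangover=2)
```

Here `window` covers both words, `too_short` closes at the first pause and loses “911”, and `too_early` starts at frame 0 and so includes the beep.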

Other areas of audio configuration also present potential difficulties for the end user experience. A common reaction to a non-working VR system is to speak more loudly with each failure (as humans, we sometimes do this in conversation to make sure we are clearly heard). But what if the audio level in the voice recognizer is already configured at too high a volume? Yelling will only make the problem worse, frustrating the user with multiple failed recognitions. This is why proper configuration and tuning are so important.
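
A small Python sketch shows why yelling backfires when the input level is set too high. It assumes a simple model in which the input stage multiplies each sample by a gain and hard-clips anything beyond full scale. The gain values and the sine-wave “voice” are made up for illustration.

```python
import math


def apply_gain(samples, gain, full_scale=1.0):
    """Scale samples, then hard-clip them to the full-scale range."""
    return [max(-full_scale, min(full_scale, s * gain)) for s in samples]


def clipped_fraction(samples, full_scale=1.0):
    """Fraction of samples sitting at the clipping limit."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if abs(s) >= full_scale)
    return hits / len(samples)


# A sine wave stands in for the voice (hypothetical test signal).
normal_voice = [0.4 * math.sin(2 * math.pi * k / 16) for k in range(64)]
raised_voice = [2.0 * s for s in normal_voice]  # the user speaks louder

hot_gain = 3.0    # hypothetical gain that is configured too high
tuned_gain = 1.0  # hypothetical well-tuned gain

tuned_clip = clipped_fraction(apply_gain(normal_voice, tuned_gain))
normal_clip = clipped_fraction(apply_gain(normal_voice, hot_gain))
raised_clip = clipped_fraction(apply_gain(raised_voice, hot_gain))
```

With the tuned gain nothing clips. With the hot gain some of the normal voice is already clipped, and raising the voice clips even more of the waveform, which distorts exactly the signal the recognizer needs.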

The voice-enabled car of the future will selectively ignore driving and passenger noises, allowing a seamless and error-free experience for the operator. Luckily for us, the future is approaching quickly. Today, there are exciting new technologies aimed at addressing some of these common audio challenges. New developments in Digital Signal Processing allow both stationary (like road and fan noise) and non-stationary (like road bumps) noises to be well-suppressed. Other new technologies allow the system to ignore interfering speakers (one variant is called “off-axis suppression”). With this enabled, passengers are able to hold side conversations while you speak voice recognition commands without worry.
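
The article does not say which method is behind off-axis suppression, so the Python sketch below uses a textbook stand-in: a two-microphone delay-and-sum beamformer aimed straight ahead. It assumes the driver's voice reaches both microphones at the same instant, while a passenger's voice reaches the second microphone a few samples late. The signals, the delay, and the frequencies are contrived so that the effect is easy to see.

```python
import math


def delay(signal, n):
    """Delay a signal by n samples, padding the start with silence."""
    return [0.0] * n + signal[:len(signal) - n]


def power(signal):
    """Mean squared amplitude."""
    return sum(s * s for s in signal) / len(signal)


def delay_and_sum(mic_a, mic_b):
    """Average two microphones; sound arriving in step adds up coherently."""
    return [(a + b) / 2 for a, b in zip(mic_a, mic_b)]


n = 256
driver = [math.sin(2 * math.pi * k / 32) for k in range(n)]    # on-axis talker
passenger = [math.sin(2 * math.pi * k / 8) for k in range(n)]  # off-axis talker

# Hypothetical geometry: the passenger's voice reaches mic B 4 samples late,
# which is half a period of the passenger tone, the most favorable case.
mic_a = [d + p for d, p in zip(driver, passenger)]
mic_b = [d + p for d, p in zip(driver, delay(passenger, 4))]

output = delay_and_sum(mic_a, mic_b)
leftover = [o - d for o, d in zip(output, driver)]  # passenger sound that remains
```

The driver's voice passes through unchanged, while the two copies of the passenger's tone arrive out of step and largely cancel. Real voices are broadband, so a practical system would attenuate an off-axis talker rather than remove it completely.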

It’s time to “Barge-in”


But being well understood is not all you should expect. Wouldn’t it be great to activate your car by voice or interrupt it at any time? With Dragon Drive you are already able to “Barge-in” and use “Wake-up-words”. Barge-in allows the user to speak at any time during a dialog – no more waiting for your chance to speak. Wake-up-word is a customizable feature of the voice recognition system that starts a new dialog. “Hello Dragon” is one example, or you can customize your Wake-up-word to be almost anything you want.

Did you know that you can literally wake up your car with your voice? Just speak and an ‘assistant-like’ experience is at the ready, delivering content, information, or whatever else you may need. This allows for a completely hands- and eyes-free voice session, which is both convenient and safe in an automotive setting. And go ahead – interrupt your assistant at any time – no more waiting for your turn to speak. Wake-up-word and barge-in are at your service.

Most current automotive voice recognition systems require the press of a button (the Push-to-talk button, or PTT) to initiate voice dialog or to interrupt prompt playback. Pressing the PTT button is often followed by a “beep” sound, which indicates the system is listening. Then the user is presented with two options: wait to speak until after the beep sounds, or speak over the beep and system prompts with no hope that the message is understood (…not unlike candidates at a political primary debate…!). This is somewhat unnatural when compared to normal human conversation, though. In practice, this type of system design risks the user speaking too early – by not waiting for or over-anticipating the “beep” – causing the first words of the user’s command to be missed. With wake-up-word and Barge-in, everything changes.

Wake-up-word is a technology that allows initiation of the voice recognition system by speaking a predefined phrase, such as “Hello, Dragon.” Wake-up-word specifically refers to starting a new dialog by voice. With Wake-up-word enabled, the user can start a fresh voice recognition session at any time by simply saying “Hello, Dragon” (or a custom phrase of almost any kind). The system can be ready for wake-up-word commands even while the user is listening to music or in the middle of navigation route guidance.
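
The difference between a push-to-talk dialog and one with wake-up-word and barge-in can be sketched as a small state machine. The Python below is a simplified model with hypothetical state and event names, not the design of any shipping system.

```python
IDLE, PROMPTING, LISTENING = "idle", "prompting", "listening"


class VoiceDialog:
    """Toy dialog controller; states and events are illustrative only."""

    def __init__(self, wake_word_enabled=False, barge_in_enabled=False):
        self.wake_word_enabled = wake_word_enabled
        self.barge_in_enabled = barge_in_enabled
        self.state = IDLE
        self.heard = []  # utterances that reached the recognizer

    def handle(self, event, text=None):
        if event == "ptt":
            # From idle the button starts a prompt; otherwise it cuts the
            # prompt short and the system listens.
            self.state = PROMPTING if self.state == IDLE else LISTENING
        elif event == "wake_word":
            if self.state == IDLE and self.wake_word_enabled:
                self.state = PROMPTING
        elif event == "prompt_done":
            if self.state == PROMPTING:
                self.state = LISTENING
        elif event == "speech":
            if self.state == PROMPTING and self.barge_in_enabled:
                self.state = LISTENING  # barge-in: stop prompting, listen
            if self.state == LISTENING:
                self.heard.append(text)
                self.state = IDLE
            # In any other state the words are simply lost.
        return self.state


# Push-to-talk only: speaking over the prompt is lost, so the user repeats.
legacy = VoiceDialog()
legacy.handle("ptt")
legacy.handle("speech", "Call Al")
legacy.handle("prompt_done")
legacy.handle("speech", "Call Al")

# Wake-up-word plus barge-in: no button, and no waiting for the prompt.
modern = VoiceDialog(wake_word_enabled=True, barge_in_enabled=True)
modern.handle("wake_word")
modern.handle("speech", "Call Al")
```

In the push-to-talk run the command has to be spoken twice before it is heard once. In the second run the first attempt gets through.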


Barge-in technology, while similar to Wake-up-word, offers a slightly different user interface enhancement: it allows a user to speak at any time during a dialog. Imagine someone is sending a text message by voice. After the user finishes dictating, the system might read back a final confirmation: “Please confirm: your text will go to John Smith. It says: ‘Hi John, are you available…’” Rather than waiting until the whole text message is read back, or having to hit a steering-wheel button to confirm by voice, the user can simply say “OK, send it” at any time. Literally, just barge in and speak. Specialized digital signal processing allows the system to listen to your speech while ignoring any sounds of its own that are being played in the car cabin.
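
One textbook way for a system to ignore its own prompt is acoustic echo cancellation with a normalized least-mean-squares (NLMS) adaptive filter. The article does not name the technique actually used, so treat the Python below as a stand-in sketch. It assumes the system knows the prompt it is playing, that the cabin echo is a short linear filter of that prompt, and that random noise can stand in for the prompt audio. All values are hypothetical.

```python
import math
import random


def nlms_echo_cancel(reference, mic, taps=8, step=0.1, eps=1e-6):
    """Subtract an adaptive estimate of the prompt echo from the mic signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    cleaned = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what remains once the estimated echo is removed
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm
                   for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def power(signal):
    """Mean squared amplitude."""
    return sum(s * s for s in signal) / len(signal)


random.seed(0)
n = 3000
prompt = [random.uniform(-1.0, 1.0) for _ in range(n)]  # stand-in for prompt audio
room = [0.6, 0.3, 0.1]                                   # hypothetical echo path
echo = [sum(room[j] * prompt[k - j] for j in range(len(room)) if k - j >= 0)
        for k in range(n)]

# The user barges in during the last 500 samples (a sine stands in for speech).
speech = [0.5 * math.sin(2 * math.pi * k / 40) if k >= n - 500 else 0.0
          for k in range(n)]
mic = [e + s for e, s in zip(echo, speech)]

cleaned = nlms_echo_cancel(prompt, mic)

tail = slice(n - 500, n)
echo_before = power(echo[tail])
echo_after = power([c - s for c, s in zip(cleaned[tail], speech[tail])])
```

Comparing `echo_after` with `echo_before` shows how much of the prompt remains in the signal handed to the recognizer while the user is talking. The filter keeps adapting while the user speaks, which disturbs its estimate, so production systems typically add double-talk handling that this sketch leaves out.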

Barge-in and Wake-up-word make the in-car experience incredibly intuitive, allowing you to speak and converse with infotainment systems just as you would another person. The back and forth dialog is far more natural, more conversational.

Looking ahead, what if I could not only talk with my car, asking for information or giving instructions, but also pay at a parking lot or toll booth with my voice? The car would verify my identity through voice biometrics and provide a token of that identity to the payment system, with my voice serving as the password, so there would be no need to mess with signatures or typed PINs and passwords.

I would buy a car like that. And chances are, once my car has gotten to know me and built a personal, efficient UI for me over its lifetime, I’d stick with that carmaker and buy from the same brand again, provided it lets me carry my settings over to my next car. Then, right from the start, the new one recognizes me and gives me what I want from my car: my own, familiar experience.

About the author:
Connor Smith is Senior Audio Engineer at Nuance Communications, Inc.
