Voice input processing for automotive speech recognition systems

By eeNews Europe

The automobile is one of the most challenging environments for speech recognition. A variety of noise sources, both outside the car (passing cars, honking horns) and inside it (multiple passengers talking, the air conditioning fan, the radio), along with audio reverberation off the hard surfaces, result in the lackluster performance with which many car owners are familiar.

Further, in order to avoid false triggers, the driver of the car needs to push a button to trigger the speech command system. This is not just a nuisance but also a safety hazard.

Yet few applications could benefit more from using speech recognition for voice command operation than the automobile. It would therefore be of great value if technology could make speech recognition more effective in cars, detecting commands reliably in the presence of all these disturbances without the use of button presses. Although this is fundamentally a speech recognition problem, performance improvements will come primarily from processing the voice input signal to remove noise and disturbances.

In recent years, Voice Input Processing (VIP) has been one of the key areas in which Conexant has applied its extensive experience in audio technology. Through careful design, from the microphone interface (clean bias signals, low-noise pre-amplification and gain control) to complex digital signal processing algorithms running on its high-performance yet low-power DSPs, Conexant has been able to deliver VIP devices for a number of applications including TVs, home appliances and automobiles. Within those applications, one of the primary advantages of the Conexant solution is improved performance of speech recognition engines: the solution has been optimized for many of the common speech recognition algorithms for use in challenging environments.

To achieve superior performance, several algorithms are employed to enhance the desired input signal and suppress noise sources in a coordinated manner. Conexant’s Selective Source Pickup (SSP) algorithm is uniquely able to separate the desired signal from the noise sources by analyzing statistical and spatial information in the signal.

The interference coming from the local loudspeakers is cancelled with Conexant’s advanced Multi-channel Acoustic Echo Canceller (MAEC), reverberation is suppressed with a novel de-reverberation algorithm, and the remaining environmental noise is attenuated by a Non-Stationary Noise Reduction (NSNR) algorithm. Tuning these algorithms together, particularly for a specific speech recognition engine, can vastly improve the word hit rate without any changes to the speech recognition system.

Figure 1. Disturbances in automobile environment

Selective source pickup (SSP)

Independent Component Analysis (ICA) is an emerging area of research within audio technology that attempts to separate or extract different voice or noise sources. Established in the early 1990s, it is based on the idea that the underlying sources of a mixed signal are statistically independent. Using prior knowledge of the statistics of certain types of signals, combined with measured correlation parameters, adaptive techniques can separate or “de-mix” the combined signal to extract one or more of the underlying sources. Typically, ICA algorithms require an extreme amount of processing power and memory, which makes them impractical for implementation in embedded real-time systems.

Conexant’s SSP algorithm utilizes some of the fundamental ideas from ICA, reduces these requirements to a practical level and yet delivers on the promise of separating one talker from another talker or from the environmental noise using only two microphones. The decision of which source to extract can be made in real time. The algorithm can simply extract the dominant talker or use the position of the talker with respect to the microphones to decide what signal to extract. In effect, this allows the VIP to zoom in on a single talker in a room or car filled with interference from other sources, which can be extremely useful for a speech recognition application in an automobile environment.
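
The statistical-independence idea behind ICA can be illustrated with a toy two-microphone example. The sketch below is not Conexant's SSP algorithm, whose details are not given here; it is a minimal textbook-style ICA (whitening followed by a search for the rotation that makes the outputs least Gaussian, measured by kurtosis). The synthetic "talker" and "noise" signals, the mixing gains and all function names are assumptions chosen for illustration, and real car audio involves delays and reverberation that this instantaneous mix ignores.

```python
import math
import random

def center(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

def whiten(x1, x2):
    """Decorrelate two zero-mean channels and scale each to unit variance."""
    n = len(x1)
    a = sum(v * v for v in x1) / n
    d = sum(v * v for v in x2) / n
    b = sum(p * q for p, q in zip(x1, x2)) / n
    theta = 0.5 * math.atan2(2 * b, a - d)       # angle of the principal axes
    c, s = math.cos(theta), math.sin(theta)
    l1 = a * c * c + 2 * b * c * s + d * s * s   # variance along each axis
    l2 = a * s * s - 2 * b * c * s + d * c * c
    z1 = [(c * p + s * q) / math.sqrt(l1) for p, q in zip(x1, x2)]
    z2 = [(-s * p + c * q) / math.sqrt(l2) for p, q in zip(x1, x2)]
    return z1, z2

def excess_kurtosis(y):
    """For zero-mean, unit-variance data: about 0 for Gaussian data."""
    return sum(v ** 4 for v in y) / len(y) - 3.0

def rotate(z1, z2, phi):
    c, s = math.cos(phi), math.sin(phi)
    return ([c * p + s * q for p, q in zip(z1, z2)],
            [-s * p + c * q for p, q in zip(z1, z2)])

def demix(x1, x2, steps=90):
    """Whiten, then pick the rotation whose outputs are least Gaussian."""
    z1, z2 = whiten(center(x1), center(x2))
    best_phi, best_score = 0.0, -1.0
    for k in range(steps):
        phi = k * math.pi / (2 * steps)
        score = sum(excess_kurtosis(y) ** 2 for y in rotate(z1, z2, phi))
        if score > best_score:
            best_phi, best_score = phi, score
    return rotate(z1, z2, best_phi)

def abs_corr(u, v):
    u, v = center(u), center(v)
    num = sum(p * q for p, q in zip(u, v))
    return abs(num) / math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))

# Synthetic, statistically independent sources (assumed for illustration).
random.seed(0)
n = 4000
talker = [random.choice((-1, 1)) * random.expovariate(1.0) for _ in range(n)]
noise = [random.uniform(-1, 1) for _ in range(n)]

# Each microphone hears a different blend of the two sources.
mic1 = [1.0 * t + 0.6 * v for t, v in zip(talker, noise)]
mic2 = [0.5 * t + 1.0 * v for t, v in zip(talker, noise)]

out1, out2 = demix(mic1, mic2)
match_talker = max(abs_corr(talker, out1), abs_corr(talker, out2))
match_noise = max(abs_corr(noise, out1), abs_corr(noise, out2))
print("talker recovered, |corr| =", round(match_talker, 3))
print("noise recovered,  |corr| =", round(match_noise, 3))
```

The de-mixed outputs come back in arbitrary order and with arbitrary sign, which is why the check uses the absolute correlation against each source. Deciding which output is the wanted talker is a separate step, as the text notes.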

Multi-channel acoustic echo cancelling (MAEC)

One of the most controlled sources of noise in a car is the audio being played back from a radio or CD. Most current speech recognition systems require that audio playback be either attenuated or fully squelched for the recognition system to work.

However, using echo cancellation techniques, the audio playback signal can be estimated as it appears at the microphone and subtracted out, leaving only the desired voice signal for the speech recognition engine. This is common practice for speakerphone conversations over Bluetooth, but with audio playback there are typically multiple loudspeakers playing the audio, potentially from multiple independent tracks.

This makes the echo cancelling problem significantly more difficult, as the algorithm must try to estimate the different echo paths from the different loudspeakers for multiple tracks using a limited number of microphones. Some MAEC algorithms require modification of the playback signal to de-correlate the channels so that these echo paths can be resolved. For high-fidelity audio, such modifications are not acceptable.

Conexant has developed an MAEC algorithm that does not require any such modifications yet is able to deliver true full-duplex performance. The result is that the speech recognition engine can reliably detect speech even with audio playback at a high level, without the need for a button-push to lower or squelch the playback level first.
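
The basic principle, estimating the playback signal as it appears at the microphone and subtracting it, can be sketched for the simple single-channel case with a normalized LMS (NLMS) adaptive filter. This is a generic textbook technique, not Conexant's MAEC, and it does not address the multi-loudspeaker problem discussed above. The echo path, signal levels, filter length and step size are made-up values for illustration.

```python
import math
import random

def nlms_echo_cancel(playback, mic, taps=8, mu=0.2, eps=1e-6):
    """Return (cleaned, weights): mic minus an adaptively estimated echo."""
    w = [0.0] * taps     # running estimate of the loudspeaker-to-mic path
    buf = [0.0] * taps   # most recent playback samples, newest first
    cleaned = []
    for x, d in zip(playback, mic):
        buf = [x] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))
        e = d - echo_est                       # voice plus residual echo
        norm = eps + sum(b * b for b in buf)   # normalize by input energy
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        cleaned.append(e)
    return cleaned, w

def power(sig):
    return sum(s * s for s in sig) / len(sig)

random.seed(1)
n = 4000
true_path = [0.6, -0.3, 0.15, 0.05]   # hypothetical echo path (FIR taps)
playback = [random.uniform(-1, 1) for _ in range(n)]   # stand-in for radio
echo = []
for i in range(n):
    echo.append(sum(h * playback[i - k]
                    for k, h in enumerate(true_path) if i - k >= 0))
voice = [0.05 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
mic = [e + v for e, v in zip(echo, voice)]

cleaned, weights = nlms_echo_cancel(playback, mic)

# Echo return loss enhancement over the last 1000 samples, after convergence.
tail = slice(n - 1000, n)
residual_echo = [c - v for c, v in zip(cleaned[tail], voice[tail])]
erle_db = 10 * math.log10(power(echo[tail]) / power(residual_echo))
print("ERLE (dB):", round(erle_db, 1))
```

In this sketch the filter keeps adapting while the voice is present, so the voice itself slightly disturbs the echo estimate; practical cancellers usually add some form of double-talk handling, which is omitted here.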

High dynamic range analog to digital converter (ADC)

Human hearing has a large dynamic range, allowing people to hear signals over a 100 dB span. This in turn means that signals encountered in everyday life span this range, from low-level whispers to rock music testing the upper limits of hearing tolerance. For an audio input system to effectively process its relevant inputs and remove all extraneous noise, the analog-to-digital converter needs to convert signals cleanly at all levels, without saturating on the high-level ones and without adding noticeable quantization noise from the ADC itself or noise from the input amplifier.

Many of the DSP algorithms depend on linearity of the input signal, and if it is saturated they will quickly break down. If the speech input itself is saturated, the speech recognition algorithm will perform poorly. To achieve optimal performance, Conexant has developed a microphone pre-amplifier and ADC that can achieve 106 dB dynamic range, enough to cover the range of signal levels required for an automobile environment. For example, when playing loud music from the car radio, this dynamic range allows the MAEC to estimate and linearly subtract the radio signal echo received at the microphones, leaving only a clean representation of the driver’s commands.
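
To put these decibel figures in perspective, the short calculation below converts a dynamic range in dB into an amplitude ratio and into an equivalent number of ideal converter bits. It relies on the standard textbook approximation that an ideal N-bit quantizer driven by a full-scale sine wave has a signal-to-noise ratio of about 6.02 N + 1.76 dB. This is an idealization used only for scale; it says nothing about how Conexant's converter is actually built, and the function names are illustrative.

```python
import math

def db_to_amplitude_ratio(db):
    """Amplitude (pressure or voltage) ratio for a level difference in dB."""
    return 10 ** (db / 20)

def ideal_dynamic_range_db(bits):
    """Textbook SNR of an ideal N-bit quantizer with a full-scale sine."""
    return 6.02 * bits + 1.76

def ideal_bits_for_dynamic_range(db):
    """Inverse of the rule above: ideal bits needed for a given range."""
    return (db - 1.76) / 6.02

hearing_ratio = db_to_amplitude_ratio(100.0)       # the 100 dB hearing span
bits_for_106 = ideal_bits_for_dynamic_range(106.0)
range_16_bit = ideal_dynamic_range_db(16)

print("100 dB amplitude ratio:", round(hearing_ratio))
print("ideal bits for 106 dB:", round(bits_for_106, 1))
print("ideal 16-bit range (dB):", round(range_16_bit, 1))
```

Under this idealization, a 106 dB range corresponds to a little more than 17 bits of noise-free resolution, which is more than an ideal 16-bit converter provides.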

Figure 2. Conexant’s CX20805 voice input processor

Conexant’s dedicated voice input processor

Conexant has developed an integrated device, the CX20805, that performs all necessary voice input functions for speech recognition in multiple environments, including the automobile. It includes low-noise microphone preamplifiers and high dynamic range ADCs supporting up to four microphones, combined with a low-power yet high-performance 800 MIPS DSP to run the algorithms described above. Putting this all together, Conexant is able to offer a solution that significantly improves the quality of voice reception in multiple environments. In particular, it has the potential to make voice command systems in automobiles reliable enough that voice command becomes the driver's preferred method of control.

About the author

Sverrir Olafsson is VP of engineering at Conexant, managing audio product development. He has managed the development of various technologies for Conexant, including voice-band data modems, VoIP, Wireless LAN and MFP. He was one of the key developers of Rockwell’s and Conexant’s data and fax modems, and an active participant in the development of the V.34, V.90 and V.92 standards. Olafsson holds bachelor’s and master’s degrees in electrical engineering from the California Institute of Technology. He holds over 50 patents in communication technologies.

Article by courtesy of EE Times
