Speech recognition in the car
Integrated in-car speech solutions are becoming increasingly popular with drivers, as the technology not only helps motorists stay connected while they’re on the move but also helps to improve driver safety. Research suggests that in the UK, 82% of drivers who have access to speech solutions in the car make use of this technology. Speech recognition and text-to-speech technology let drivers make full use of their mobile phones, enter destinations into navigation systems and control their infotainment systems without taking their hands off the wheel.
As laws on the use of mobile phones while driving become ever more stringent in order to improve driver safety, speech solutions can eliminate the notion that the car is a white spot when it comes to communication. However, despite this technology being available and despite these increasingly strict laws, recent UK research from the RAC revealed that 50% of motorists admitted to checking their phone, 21% were likely to read a social media alert and 31% admitted to texting at the wheel.
These statistics highlight drivers’ desire to stay connected while on the road; however, it’s essential that drivers start making the most of the technology available to them so that they can stay in touch safely. Earlier speech solutions were sometimes difficult and unintuitive to use. If drivers could not easily operate in-car systems or master the correct series of commands, they were likely to become distracted, which completely undermined the benefits of using speech solutions in the car.
The accuracy and usability of embedded speech solutions continue to improve, and drivers are no longer required to learn a series of complicated commands, making it easier to interact vocally with electronic devices. By providing drivers with a single, easy-to-use interface for voice control, in-car speech solutions can help to improve the convenience and usability of the technology in their car. For system integrators, however, the challenge lies in making extremely complex systems work together to create an easy-to-use offering for drivers.
To address this, more and more OEMs are requesting a complete offering from speech solution providers. When developing infotainment systems, however, it’s not enough for system integrators to have only the core function engines available: text-to-speech and automatic speech recognition. OEMs are increasingly requesting that the internal communication between these functions be hidden from them so that they can present a single platform that can be easily integrated into customer systems. Developers can then rework these platforms to include any specific requirements, patented algorithms or customised voice commands to fit their individual needs.
Infotainment systems are becoming more and more sophisticated, with OEMs combining navigation, media control, point-of-interest search and phone control. An integrated speech platform can hide all of the main components that enable in-car voice control, and its simple interface can help speed up the development process. Additional software can be layered on top of the core engines to make system integration more efficient for developers.
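As a rough illustration of the single-platform idea, the sketch below wraps a stand-in recogniser and a stand-in synthesiser behind one class, so that an integrator only registers handlers and passes in utterances. Every name here (ToyRecognizer, ToySynthesizer, SpeechPlatform, on, handle) is hypothetical and is not taken from any real product; the "utterance" is plain text rather than audio, purely to keep the example self-contained.

```python
from typing import Callable, Dict, List


class ToyRecognizer:
    """Stand-in for an ASR engine (assumption): maps a text utterance to a command name."""

    def __init__(self, commands: Dict[str, str]) -> None:
        self._commands = {phrase.lower(): name for phrase, name in commands.items()}

    def recognize(self, utterance: str) -> str:
        return self._commands.get(utterance.strip().lower(), "unknown")


class ToySynthesizer:
    """Stand-in for a TTS engine (assumption): records what would be spoken."""

    def __init__(self) -> None:
        self.spoken: List[str] = []

    def speak(self, text: str) -> None:
        self.spoken.append(text)


class SpeechPlatform:
    """Single entry point; the engines and their interplay stay internal."""

    def __init__(self, commands: Dict[str, str]) -> None:
        self._asr = ToyRecognizer(commands)
        self._tts = ToySynthesizer()
        self._handlers: Dict[str, Callable[[], str]] = {}

    def on(self, command: str, handler: Callable[[], str]) -> None:
        """Register application logic for a recognised command."""
        self._handlers[command] = handler

    def handle(self, utterance: str) -> str:
        """Recognise, dispatch to the handler, and speak the reply."""
        command = self._asr.recognize(utterance)
        handler = self._handlers.get(command)
        reply = handler() if handler else "Sorry, I did not understand."
        self._tts.speak(reply)
        return reply
```

An integrator using such a facade would only write something like platform.on("phone.call_home", ...) followed by platform.handle("call home"), without touching either engine directly.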
Below is a diagram of an integrated speech system platform:
Using the above model, music search by voice is a prime example of how complex processes can be hidden from system integrators, and therefore from the end user, by having an easy-to-manage interface in place. A music list on an MP3 player is known as a dynamic vocabulary, as opposed to a static vocabulary such as a destination list on a navigation system. The track list varies with every MP3 player that’s connected to the system, and each time the artist and song names are downloaded to create a list. This information is then sent to the speech platform, which internally does all the complex processing needed to create a grammar that covers the commands and the different vocabulary variations a user might say.
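The grammar-building step described above can be sketched in a much simplified form. The example below assumes a track list arrives as (artist, title) pairs and expands each into a few spoken command variants; a real platform would work with phonetic transcriptions and a proper grammar format, which are left out here. The function names and the "play ..." phrasing are assumptions made for illustration.

```python
import re
from typing import Dict, Iterable, List, Tuple

Track = Tuple[str, str]  # (artist, title)


def normalize(text: str) -> str:
    """Lower-case and replace punctuation with spaces so spoken and stored forms match."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())


def build_music_grammar(tracks: Iterable[Track]) -> Dict[str, List[Track]]:
    """Generate command phrases at runtime from whatever device is connected."""
    grammar: Dict[str, List[Track]] = {}
    for artist, title in tracks:
        a, t = normalize(artist), normalize(title)
        # Several ways a driver might ask for the same content (assumed phrasings).
        for phrase in (f"play {t}", f"play {t} by {a}", f"play {a}"):
            grammar.setdefault(phrase, []).append((artist, title))
    return grammar


def match(grammar: Dict[str, List[Track]], utterance: str) -> List[Track]:
    """Return the tracks a recognised phrase refers to, or an empty list."""
    return grammar.get(normalize(utterance), [])
```

Because the grammar is rebuilt from the downloaded list, connecting a different player simply means calling build_music_grammar again with the new tracks.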
Dynamic vocabularies require online grammar generation, that is, generation at runtime, as the text involved cannot be known in advance. Navigation systems, however, can have a list of destination names already stored on the system, along with stored phonetic data that can be used for giving directions and for recognising location requests. This data can therefore be prepared offline, ahead of time, and simply retrieved when needed. The speech platform must be flexible enough to process both static and dynamic grammars efficiently.
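One way to picture the difference is a manager that merely loads a precompiled static grammar but compiles dynamic entries when a device connects. This is a sketch under assumptions: compile_entry is only a placeholder for real grapheme-to-phoneme conversion, and GrammarManager and its methods are hypothetical names.

```python
from typing import Dict, Iterable, Optional


def compile_entry(text: str) -> str:
    """Placeholder for grapheme-to-phoneme conversion (assumption): just normalises text."""
    return " ".join(text.lower().split())


class GrammarManager:
    def __init__(self, precompiled_static: Dict[str, str]) -> None:
        # Static grammar: compiled ahead of time, only loaded here.
        self._static = dict(precompiled_static)
        # Dynamic grammars: one per connected device, built at runtime.
        self._dynamic: Dict[str, Dict[str, str]] = {}
        self.runtime_compilations = 0

    def connect_device(self, device_id: str, names: Iterable[str]) -> None:
        """Compile a device's vocabulary when it is plugged in."""
        compiled: Dict[str, str] = {}
        for name in names:
            compiled[compile_entry(name)] = name
            self.runtime_compilations += 1
        self._dynamic[device_id] = compiled

    def disconnect_device(self, device_id: str) -> None:
        """Drop the device's grammar; the static grammar is unaffected."""
        self._dynamic.pop(device_id, None)

    def lookup(self, utterance: str) -> Optional[str]:
        """Check the static grammar first, then every connected device."""
        key = compile_entry(utterance)
        if key in self._static:
            return self._static[key]
        for grammar in self._dynamic.values():
            if key in grammar:
                return grammar[key]
        return None
```

The runtime_compilations counter is there only to make the cost difference visible: static lookups never increase it, while each connected device does.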
Moving forward, speech dialog systems will become more distributed. This technology will help to enhance connectivity and productivity in the car. As well as requesting destinations and point-of-interest searches, users will be able to retrieve information such as what the weather is like at their destination or what the local cinema is showing. People will be constantly connected to a server from their cars over a wireless connection. This will allow their requests to be processed by more sophisticated speech recognition systems than the ones that are currently available. It will soon be possible to have one speech recogniser on board and one in the cloud; however, the transition between the two will be seamless, as all information will be presented on one simple voice user interface integrated into the human machine interface (HMI).
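A hybrid arrangement of this kind could, for instance, try the cloud recogniser first and fall back to the on-board one when the connection drops or the cloud has no result. The routing policy, the CloudUnavailable exception and the make_hybrid helper below are all assumptions chosen for illustration; the article does not prescribe how the two recognisers are arbitrated.

```python
from typing import Callable, Optional, Tuple

# A recogniser takes an utterance (text here, for simplicity) and returns a command or None.
Recognizer = Callable[[str], Optional[str]]


class CloudUnavailable(Exception):
    """Raised by the cloud recogniser when there is no connectivity (hypothetical)."""


def make_hybrid(onboard: Recognizer, cloud: Recognizer) -> Callable[[str], Tuple[Optional[str], str]]:
    """Combine two recognisers behind one call; the caller never picks a backend."""

    def recognize(utterance: str) -> Tuple[Optional[str], str]:
        try:
            result = cloud(utterance)
            if result is not None:
                return result, "cloud"
        except CloudUnavailable:
            pass  # no connectivity: fall through to the on-board engine
        return onboard(utterance), "onboard"

    return recognize
```

The second element of the returned tuple is only there for inspection; a voice user interface built on top would present the result the same way regardless of which recogniser produced it.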
For this to be possible, speech solution providers need to ensure that integrating in-car speech solutions as they exist now is as simple as possible. This means concealing the complex systems that work together to provide a seamless and complete offering, including speech recognition and text-to-speech capabilities. OEMs can work with solution providers to customise their solutions as required; however, implementing a middleware framework will make integration quicker and simpler for OEMs as well as car manufacturers.
About the author: Gerhard Hanrieder is head of speech platform at SVOX AG, Zurich, Switzerland.