An algorithm developed by European researchers allows smaller devices such as smartphones, smart glasses or wearables to decode speech without needing substantial memory or internet access.
The algorithm reduces the memory required by discarding data as soon as it has been used, rather than retaining all of it to analyse every possible combination, and it does so without sacrificing recognition quality.
It was developed by Professor Panagiotis Karras of the University of Copenhagen's Department of Computer Science, together with linguist Nassos Katsamanis of the Athena Research Centre in Greece and researchers from Aalto University in Finland and KTH in Sweden.
"Speech recognition fundamentally works by matching the small speech sounds we use to form words and sentences, known as phonemes, with a library of corresponding sounds," said Karras. "Probabilities are calculated for matches and the subsequent combinations that go on to form our words and sentences. The most likely sequences are calculated and the software translates these sounds into text."
Current software must store alternative sequences and libraries of possible sound interpretations, which is manageable for single words or very short sentences. As sentences grow longer and the possible word combinations become more complex, however, the demand for RAM increases.
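Scoring phoneme sequences and picking the most likely one is classically done with Viterbi decoding over a hidden Markov model. The article does not name the "gold standard" algorithm, so treating it as Viterbi is an assumption here. The minimal sketch below, with hypothetical toy probabilities and an illustrative function name, keeps one back-pointer per state for every time step, which is why its memory grows with the length of the utterance.

```python
import math
from typing import List, Sequence

def viterbi(obs: Sequence[int],
            start: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden-state path for a non-empty observation sequence.

    start[s]    : probability of beginning in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s produces observation symbol o
    """
    def log(p: float) -> float:
        return math.log(p) if p > 0 else -math.inf

    states = range(len(start))
    # score[s]: best log-probability of any path that ends in state s so far
    score = [log(start[s]) + log(emit[s][obs[0]]) for s in states]
    # One row of back-pointers per time step: memory grows as len(obs) * K
    backptr: List[List[int]] = []
    for o in obs[1:]:
        row, new_score = [], []
        for s in states:
            p = max(states, key=lambda q: score[q] + log(trans[q][s]))
            row.append(p)
            new_score.append(score[p] + log(trans[p][s]) + log(emit[s][o]))
        backptr.append(row)
        score = new_score
    # Trace back from the best final state through the stored back-pointers
    state = max(states, key=lambda s: score[s])
    path = [state]
    for row in reversed(backptr):
        state = row[state]
        path.append(state)
    return path[::-1]

# Toy example with two hypothetical "phoneme" states and two observation symbols
print(viterbi([0, 0, 1, 1, 0],
              [0.6, 0.4],
              [[0.7, 0.3], [0.4, 0.6]],
              [[0.9, 0.1], [0.2, 0.8]]))  # → [0, 0, 1, 1, 0]
```

For an utterance of T time steps and K candidate states, the back-pointer table holds roughly T × K entries, so longer sentences need proportionally more RAM.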
Instead, the SIEVE algorithm uses a beam search approach that repeatedly halves the problem. For each stretch of its path through the analysis, it remembers only the midpoint. The steps between these "midpoints" are recalculated before the final route is presented, which significantly reduces the need for temporary memory.
"The algorithm conceived by Panos and developed further by our team does something entirely new," said Katsamanis. "Unlike the existing gold-standard algorithm used since speech recognition's early days, our algorithm stores only a fraction of the processing data, which serves as a set of 'coordinates.' With these, an entire sequence can be reconstructed, which makes speech recognition possible with significantly less RAM."
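The halving idea can be illustrated with a generic divide-and-conquer Viterbi decoder in the style of Hirschberg's linear-space alignment. This is a minimal sketch under that assumption, not the patented SIEVE code, and it omits the beam-search pruning mentioned above; the function name and parameters are hypothetical. Each forward pass keeps only the current scores and, for every state, which state its best path occupied at the segment's midpoint. That midpoint state plays the role of a stored "coordinate", and both halves are then recomputed recursively.

```python
import math
from typing import List, Optional, Sequence

def midpoint_viterbi(obs: Sequence[int],
                     start: Sequence[float],
                     trans: Sequence[Sequence[float]],
                     emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely state path (obs must be non-empty), found by splitting at midpoints.

    A forward pass over a segment keeps O(K) numbers (K = number of states):
    the running scores and, per state, the state the best path used at the
    segment midpoint. The two halves are then solved recursively.
    """
    states = range(len(start))

    def log(p: float) -> float:
        return math.log(p) if p > 0 else -math.inf

    def init(t0: int, first: Optional[int]) -> List[float]:
        if first is None:  # only the leftmost segment, where t0 == 0
            return [log(start[s]) + log(emit[s][obs[t0]]) for s in states]
        return [0.0 if s == first else -math.inf for s in states]

    def step(score: List[float], t: int):
        # Advance one time step; also return each state's best predecessor
        new, arg = [], []
        for s in states:
            p = max(states, key=lambda q: score[q] + log(trans[q][s]))
            new.append(score[p] + log(trans[p][s]) + log(emit[s][obs[t]]))
            arg.append(p)
        return new, arg

    def pick_end(score: List[float], last: Optional[int]) -> int:
        return last if last is not None else max(states, key=lambda s: score[s])

    def solve(t0: int, t1: int, first: Optional[int], last: Optional[int]) -> List[int]:
        score = init(t0, first)
        if t1 == t0:
            return [pick_end(score, last)]
        if t1 - t0 == 1:
            score, arg = step(score, t1)
            end = pick_end(score, last)
            return [arg[end], end]
        mid = (t0 + t1) // 2
        via: List[int] = []  # via[s]: state at time `mid` on the best path ending in s
        for t in range(t0 + 1, t1 + 1):
            score, arg = step(score, t)
            if t == mid:
                via = list(states)
            elif t > mid:
                via = [via[arg[s]] for s in states]
        m = via[pick_end(score, last)]  # the midpoint state handed to the recursion
        left = solve(t0, mid, first, m)
        right = solve(mid, t1, m, last)
        return left + right[1:]  # both halves include time `mid`; keep it once

    return solve(0, len(obs) - 1, None, None)
```

For example, `midpoint_viterbi([0, 0, 1, 1, 0], [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.9, 0.1], [0.2, 0.8]])` returns `[0, 0, 1, 1, 0]`. Apart from the output path, working memory is proportional to K per recursion level, and the recursion is about log2(T) levels deep; the price is that each level rescans its segments, which is where the extra computation comes from.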
The approach relies on entirely new code, for which the researchers have sought a patent. Reconstructing the sequence takes more time and computing power, but the researchers say the difference is negligible and will not be an issue given the increasing performance of edge devices.
"Certain small devices can already recognize and act on a few words without internet connectivity. For example, a smart home system can recognize keywords such as 'turn on' or 'turn off'. This is known as small-vocabulary speech recognition. With our algorithm, it will be possible to recognize more extensive instructions or, in principle, entire languages, without an internet connection. This is referred to as large-vocabulary speech recognition," said Karras.
This could be used to translate foreign languages while traveling, regardless of internet access.
"This algorithm can help democratize language technology by making information more accessible. Making translation tools and speech assistants available regardless of internet access will allow more people to engage in society. In particular, it will help people without written language skills or those with physical disabilities by enabling them to understand and influence societal decisions," said Katsamanis.
Another key advantage of this speech recognition invention is data security: because speech can be processed on the device, it need not travel over wireless links to datacentre AI processing, which also reduces overall power consumption.
"It is vital to reduce energy consumption to minimize reliance on fossil fuels, as many data centres still use these energy sources," said Karras.