
Cognitive hearing aid puts DNNs to work with the wearer’s brain
Here, the neural recording serves as a way to identify which speaker (among multiple interfering speakers) the user is listening to, so the hearing aid can then use its DNN-based audio source separation algorithms to amplify only the speech of that speaker. Although this approach could be combined with more traditional techniques, such as audio beamforming, it is more accurate at automatically and dynamically tracking a user's direction of attention.
With their paper “Neural decoding of attentional selection in multi-speaker environments without access to clean sources” published in the Journal of Neural Engineering, the researchers claim a breakthrough in auditory attention decoding (AAD), bringing cognitively controlled hearing aids one step closer to reality.
The hearing aid system relies on deep neural network (DNN) audio source separation algorithms that work with the spectrogram of simultaneous speakers. The audio mixture is fed to several DNNs, each trained to separate a specific speaker from a mixture (DNN Spk1 to DNN SpkN).
Simultaneously, as the wearer is attending to one of the speakers, a spectrogram of that speaker is reconstructed from the neural recordings of the user, which can then be compared with the outputs of each of the DNNs. The spectrogram most similar to the neural reconstruction is then converted to audio and added to the mixture to amplify the speech of the attended speaker.
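To make the selection step concrete, here is a minimal NumPy sketch of how the neurally reconstructed spectrogram could be compared against each DNN output via correlation, with the best match taken as the attended speaker. The function name, array shapes, and the use of Pearson correlation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_attended_speaker(neural_spec, dnn_specs):
    """Return the index of the DNN output most similar to the neural reconstruction.

    neural_spec: (freq, time) spectrogram reconstructed from the wearer's neural data.
    dnn_specs:   list of (freq, time) spectrograms, one per speaker-specific DNN.
    """
    target = neural_spec.ravel()
    # Pearson correlation between the reconstruction and each candidate spectrogram.
    scores = [np.corrcoef(target, spec.ravel())[0, 1] for spec in dnn_specs]
    return int(np.argmax(scores))

# Toy usage with random placeholder spectrograms (128 frequency bins, 1000 frames):
# the noisy copy of candidate 2 stands in for the neural reconstruction.
rng = np.random.default_rng(0)
candidates = [rng.random((128, 1000)) for _ in range(4)]
noisy_reconstruction = candidates[2] + 0.5 * rng.random((128, 1000))
assert select_attended_speaker(noisy_reconstruction, candidates) == 2
```

The spectrogram that wins this comparison is the one converted back to audio and boosted in the final mix.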

In order to automatically separate each speaker from the mixture, the researchers employed a method of single-channel speech separation that uses a class of DNNs known as long short-term memory (LSTM) networks. Each DNN was trained to separate one specific speaker from two-speaker mixtures, but the researchers also propose a system that could work in a real-world situation, where a device would contain multiple DNNs, each trained to separate a specific speaker, any of whom may or may not be present in the environment.
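As a rough illustration of such a speaker-specific separator, the PyTorch-style sketch below estimates a soft time-frequency mask for one target speaker from the mixture spectrogram. The layer sizes, the mask-multiplication design, and the class name are assumptions for the sake of the example, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class SpeakerLSTMSeparator(nn.Module):
    """Sketch of a speaker-specific LSTM separator.

    Takes magnitude spectrogram frames of the mixture, shaped (batch, time, freq),
    and predicts a soft mask for one target speaker; multiplying the mask with the
    mixture spectrogram yields an estimate of that speaker alone.
    """
    def __init__(self, n_freq=257, hidden=300, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hidden, num_layers=num_layers, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mixture_spec):
        h, _ = self.lstm(mixture_spec)   # (batch, time, hidden)
        soft_mask = self.mask(h)         # mask values in [0, 1]
        return soft_mask * mixture_spec  # estimated spectrogram of the target speaker

# Example: one 4-second utterance at 100 frames/s with 257 frequency bins.
estimate = SpeakerLSTMSeparator()(torch.rand(1, 400, 257))
```

In the proposed system, one such network would be trained per known speaker, and all of them would run on the incoming mixture in parallel.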
Adding a novel speaker requires only about 20 minutes of clean speech (for example, recorded while the listener is conversing with only one person), the authors note; once trained, the system needs only about 10 seconds to identify which speech to track and amplify.
The authors argue that although the number of DNNs (and hence of recognizable speakers) could be limited by hardware constraints, such smart hearing aids could offload some of the computation to a nearby smartphone.
What’s more, thanks to the live neural recordings, the DNN-based hearing aid system was able to dynamically switch from one recognized speaker to another, naturally following the wearer’s focus as it moved between conversations, and applying a +12 dB amplification relative to the speech mixture. Because the system amplifies the attended speech rather than filtering out everything else, wearers can still hear the other speakers in the background, making it possible for them to switch their attention should they choose to do so.
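For a sense of what that means in practice: a +12 dB boost corresponds to a factor of 10**(12/20) ≈ 3.98 in amplitude. The NumPy sketch below illustrates the dynamic-switching behaviour with a sliding-window loop that periodically re-runs the spectrogram comparison and moves the boost to whichever speaker currently matches the neural reconstruction; the 10-second window, the 100 frames-per-second rate, and the function name are assumptions for illustration, not values taken from the paper.

```python
import numpy as np

def track_attention(neural_spec, dnn_specs, mixture, dnn_waveforms,
                    sr=16000, frames_per_sec=100, window_s=10, gain_db=12.0):
    """Sliding-window attention tracking and selective amplification.

    neural_spec:   (freq, frames) spectrogram reconstructed from neural data.
    dnn_specs:     list of (freq, frames) spectrograms, one per speaker DNN.
    mixture:       (samples,) raw mixture waveform.
    dnn_waveforms: list of (samples,) separated waveforms, aligned with the mixture.
    Returns the remixed waveform with the currently attended speaker boosted.
    """
    gain = 10.0 ** (gain_db / 20.0)          # +12 dB ~= 3.98x in amplitude
    win_frames = window_s * frames_per_sec
    win_samples = window_s * sr
    output = mixture.astype(float).copy()

    for start in range(0, neural_spec.shape[1] - win_frames + 1, win_frames):
        # Correlate this window of the neural reconstruction with each DNN output.
        target = neural_spec[:, start:start + win_frames].ravel()
        scores = [np.corrcoef(target, s[:, start:start + win_frames].ravel())[0, 1]
                  for s in dnn_specs]
        attended = int(np.argmax(scores))

        # Boost only the attended speaker's waveform in this time window;
        # the rest of the mixture is untouched, so background speech stays audible.
        a = (start // frames_per_sec) * sr
        b = a + win_samples
        output[a:b] += (gain - 1.0) * dnn_waveforms[attended][a:b]

    return output
```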
In the absence of a known target speaker, the cognitive hearing aid could switch to a default mode of operation.
In their study, the researchers relied on invasive electrocorticography (ECoG) recordings, placing electrodes directly in contact with the brain, in order to determine the varying contributions of different auditory cortical areas to AAD. In previous work, however, they had already established the feasibility of applying such AAD techniques to non-invasive neural recordings, which would make the approach far more accessible.
Columbia Engineering – https://engineering.columbia.edu
