The sensor consists of an off-the-shelf 850nm VCSEL laser with an embedded photodiode, packaged with the company’s proprietary ASIC, which processes the sound waves detected optically by reading out the vibrations of the speaker’s skin. Interviewed by eeNews Europe, Rammy Bahalul, Vice President of Sales and Business Development for VocalZoom, gave more details about the technology.
“When someone speaks, the sound propagates all over the skin too, and we can measure these vibrations by detecting the laser’s reflection on the skin through an interferometer. The way the interferometer works is that any back reflections interfere with the stabilized laser wavelength in the cavity, and that impacts the laser power.”
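The effect Bahalul describes is commonly called self-mixing interferometry. The sketch below is not VocalZoom's implementation; it uses the textbook weak-feedback model, in which the laser power varies with the cosine of the round-trip phase, so the power goes through one full fringe for every half wavelength the skin moves. Only the 850nm wavelength comes from the article; the modulation depth, vibration amplitude, tone frequency and sample rate are illustrative assumptions.

```python
import math

WAVELENGTH_M = 850e-9  # VCSEL wavelength quoted in the article


def self_mixing_power(displacement_m, p0=1.0, modulation=0.01):
    """Relative laser power for a target displaced by displacement_m.

    Weak-feedback model: the round trip adds a phase of 4*pi*d/lambda,
    so the power completes one fringe per half wavelength of motion.
    p0 and modulation are illustrative values, not device data.
    """
    phase = 4.0 * math.pi * displacement_m / WAVELENGTH_M
    return p0 * (1.0 + modulation * math.cos(phase))


def skin_displacement(t, amplitude_m=50e-9, freq_hz=200.0):
    """Hypothetical skin vibration: a single 200 Hz tone, 50 nm amplitude."""
    return amplitude_m * math.sin(2.0 * math.pi * freq_hz * t)


SAMPLE_RATE_HZ = 16000  # assumed sampling rate of the photodiode readout

# 10 ms of simulated photodiode signal
samples = [
    self_mixing_power(skin_displacement(n / SAMPLE_RATE_HZ))
    for n in range(SAMPLE_RATE_HZ // 100)
]
```

In this model a displacement of a quarter wavelength (about 212nm) swings the power from its maximum to its minimum, which gives a feel for why sub-micron skin motion is measurable at all.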
The ASIC monitors the laser power fluctuations as read by the built-in photodiode and turns them into a noise-free “audio” signal that can then be fused with the real audio signal recorded by a microphone, either in an audio processor or in cloud software.
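The article does not say how the two signals are fused. As one simple possibility, and not VocalZoom's method, the sketch below uses the optical channel as a voice-activity gate: microphone frames are kept only while the optical channel carries energy, on the assumption that ambient noise does not reach it. The function names, frame length and threshold are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def gate_microphone(mic, optical, frame_len=160, threshold=1e-4):
    """Zero out microphone frames where the optical channel shows no speech.

    mic and optical are assumed to be time-aligned sample lists at the
    same sample rate; frame_len and threshold are illustrative values.
    """
    n = min(len(mic), len(optical))
    out = []
    for start in range(0, n, frame_len):
        mic_frame = mic[start:min(start + frame_len, n)]
        opt_frame = optical[start:min(start + frame_len, n)]
        if frame_energy(opt_frame) >= threshold:
            out.extend(mic_frame)                  # speaker is talking: keep audio
        else:
            out.extend([0.0] * len(mic_frame))     # no skin vibration: mute
    return out


# Usage: the speaker is silent in the first frame and talking in the second.
mic = [0.5] * 320
optical = [0.0] * 160 + [0.1] * 160
gated = gate_microphone(mic, optical)
```

A real fusion stage would presumably be more sophisticated than a hard gate, but the sketch shows why a noise-immune second channel is useful even before any signal enhancement is attempted.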
“It is similar to bone conduction, but without contact. We can measure vibrations up to 1.5kHz,” continued Bahalul. “We are not reading lips but actual facial vibrations; these can be detected from the cheeks, all around the neck and even behind the ears.”
The optical sensor can be placed anywhere from a few millimetres to a metre from the speaker, making it practical for headsets, wearables, smartphones and laptops, but also for ATMs and for automotive applications, where it could be mounted in the rear-view mirror.
The startup claims that, in tests with leading speech recognition providers, its HMC sensor made all the difference in noisy environments, even in strong and complex noise, eliminating almost all errors and making speech recognition more widely usable. In a high-noise environment, the company says it can recover the original speech, improving it from -10dB (voice inaudible against the noise) to 20dB when VocalZoom is enabled.
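To put those figures in perspective, here is a short worked calculation. It assumes the two numbers are signal-to-noise ratios expressed as power ratios in decibels, which the article implies but does not state outright.

```python
def db_to_power_ratio(db):
    """Convert a decibel value to a linear power ratio."""
    return 10.0 ** (db / 10.0)


snr_before_db = -10.0  # figure quoted without the sensor
snr_after_db = 20.0    # figure quoted with the sensor enabled

improvement_db = snr_after_db - snr_before_db          # 30 dB
ratio_before = db_to_power_ratio(snr_before_db)        # speech at 1/10 of the noise power
ratio_after = db_to_power_ratio(snr_after_db)          # speech at 100 times the noise power
improvement_ratio = db_to_power_ratio(improvement_db)  # about a 1000-fold gain
```

Under that assumption, the speech goes from a tenth of the noise power to a hundred times the noise power, a 30dB or roughly thousandfold improvement.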
As well as improving speech recognition, fusing the audio signals from the optical sensor and a microphone could enable many features currently served by discrete sensors. It could be used to perform more robust voice identification through multi-factor biometrics (each individual having a unique facial “sound signature”), but also serve as an accurate, low-power voice wake-up solution. The sensor is accurate enough to detect the speaker’s heart rate from the skin, so it doubles as a liveness sensor: it can tell the difference between a loudspeaker and a live person.
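The article gives no detail on how the liveness check works. Purely as an illustration, the sketch below estimates a pulse rate from a slow skin-displacement signal by timing upward zero crossings, and treats the source as live only if the rate falls in a plausible human range. The function names, the 40 to 200 beats-per-minute window and the synthetic test signals are all assumptions.

```python
import math


def estimate_bpm(signal, sample_rate_hz):
    """Estimate beats per minute from upward zero crossings.

    Returns None when fewer than two crossings are found.
    """
    mean = sum(signal) / len(signal)
    centred = [x - mean for x in signal]
    crossings = [
        i for i in range(1, len(centred))
        if centred[i - 1] < 0.0 <= centred[i]
    ]
    if len(crossings) < 2:
        return None
    span_s = (crossings[-1] - crossings[0]) / sample_rate_hz
    return 60.0 * (len(crossings) - 1) / span_s


def looks_alive(signal, sample_rate_hz, low_bpm=40.0, high_bpm=200.0):
    """Hypothetical liveness rule: a pulse in a plausible human range."""
    bpm = estimate_bpm(signal, sample_rate_hz)
    return bpm is not None and low_bpm <= bpm <= high_bpm


RATE_HZ = 50  # assumed sample rate for the slow pulse signal

# Synthetic 10-second signals: a 1.2 Hz pulse (72 bpm) and a flat line,
# standing in for a live face and a loudspeaker respectively.
pulse = [math.sin(2.0 * math.pi * 1.2 * n / RATE_HZ) for n in range(10 * RATE_HZ)]
flat = [0.0] * (10 * RATE_HZ)
```

With these synthetic inputs, `looks_alive(pulse, RATE_HZ)` accepts the pulsing signal and `looks_alive(flat, RATE_HZ)` rejects the flat one. A real signal would first need filtering to separate the pulse from speech vibrations and motion.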
“For biometrics, all the processing can be done locally on the VocalZoom chip. By not having to transit to the device’s main processor, data is much more secure,” emphasized Bahalul.
By detecting the direction of arrival, the sensor can also verify that the person of interest is in the right direction and at the right distance, so that in ATM or automotive use cases, say, the system listens only to a particular user.
VocalZoom has recently completed a design-in phase with speech recognition provider iFLYTEK, which plans to launch a headset based on its technology before the end of the year. Even in a busy call centre, the audio fusion performed in the headsets would pick up a clean signal from each individual call receptionist. The two companies are also collaborating on the development of an automotive voice control and voice biometrics product, based on a far-range version of VocalZoom’s HMC sensor.
Bahalul told us the company expected its sensor to be integrated into motorbike helmet headsets by the end of 2017, and that it could be prototyped in a car mirror or in a car’s infotainment system by the first half of 2017.
For now VocalZoom is open to different business models.
“We could sell a complete sensor module with the laser and the ASIC on a small PCB, or for high-volume OEMs, we could license the technology so they could manufacture everything and optimise the sensor, for example to integrate it into smartphone cameras,” said Bahalul.
“We can perform all the audio processing on our ASIC, so MEMS microphones could share the ASIC. We are looking at multiple vendors to shrink the technology to their standards. When you actually look at the cost of the individual sensors that our solution could replace in a smartphone, we could offer a much better cost structure,” he concluded.
Visit VocalZoom at www.VocalZoom.com