
Unlike traditional high-powered and directional microphones, which can’t eliminate nearby sounds, ambient noise, or acoustic effects when they capture audio, the novel system uses two cameras and a laser to sense high-speed, low-amplitude surface vibrations. These vibrations can be used to reconstruct sound, capturing isolated audio without interference and without a microphone.
“We’ve invented a new way to see sound,” says Mark Sheinin, a post-doctoral research associate at the Illumination and Imaging Laboratory (ILIM) in the School of Computer Science’s Robotics Institute (RI). “It’s a new type of camera system, a new imaging device, that is able to see something invisible to the naked eye.”
The researchers say they have completed several successful demos showing both how well their system senses vibrations and the quality of its sound reconstruction. They captured isolated audio of separate guitars playing at the same time and of individual speakers playing different music simultaneously.
They analyzed the vibrations of a tuning fork, and used the vibrations of a bag of Doritos placed near a speaker to capture the sound coming from that speaker. This demo, the researchers say, pays tribute to prior work by MIT researchers who developed one of the first visual microphones in 2014.
The new system dramatically improves upon past attempts to capture sound using computer vision. The researchers’ work uses ordinary cameras that cost a fraction of the price of the high-speed cameras employed in past research, while producing higher-quality recordings. The dual-camera system can capture vibrations from objects in motion, such as a guitar moving while a musician plays it, and can simultaneously sense individual sounds from multiple points.
“We’ve made the optical microphone much more practical and usable,” says Srinivasa Narasimhan, a professor in the RI and head of the ILIM. “We’ve made the quality better while bringing the cost down.”
The system works by analyzing differences in speckle patterns (the random pattern of bright and dark spots that coherent light forms in space after it reflects off a rough surface) in images captured with a rolling shutter and a global shutter. An algorithm computes the differences between the speckle patterns in the two video streams and converts those differences into vibrations to reconstruct the sound.
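To make the idea of "differences in speckle patterns" more concrete, here is a minimal sketch of one common way to measure how far a speckle pattern has moved: cross-correlate a row of the current pattern against a reference row and locate the correlation peak, refining it to sub-pixel precision with a parabolic fit. It assumes the vibration shows up as a small horizontal translation of a 1D intensity profile. The function name `estimate_shift` and the synthetic data are hypothetical, and this is not necessarily the algorithm the researchers use.

```python
import numpy as np


def estimate_shift(reference, row):
    """Estimate how far `row` is shifted relative to `reference`, in pixels.

    Both inputs are 1D intensity profiles of equal length. The integer shift
    is the lag maximizing the circular cross-correlation; a parabola through
    the peak and its two neighbours refines it to sub-pixel precision.
    """
    ref = np.asarray(reference, dtype=float)
    cur = np.asarray(row, dtype=float)
    ref = ref - ref.mean()
    cur = cur - cur.mean()
    n = ref.size
    # Circular cross-correlation via FFT: corr[k] = sum_i ref[i] * cur[i + k]
    corr = np.fft.ifft(np.conj(np.fft.fft(ref)) * np.fft.fft(cur)).real
    k = int(np.argmax(corr))
    # Vertex of the parabola through (k-1, k, k+1) gives the sub-pixel offset
    left, center, right = corr[(k - 1) % n], corr[k], corr[(k + 1) % n]
    denom = left - 2 * center + right
    offset = 0.5 * (left - right) / denom if denom != 0 else 0.0
    shift = k + offset
    # Lags past the midpoint correspond to negative (leftward) shifts
    if shift > n / 2:
        shift -= n
    return shift


# Synthetic 1D "speckle": random intensities, plus shifted copies of it.
rng = np.random.default_rng(0)
reference = rng.random(256)
print(estimate_shift(reference, np.roll(reference, 3)))   # close to 3.0
print(estimate_shift(reference, np.roll(reference, -5)))  # close to -5.0
```

Applied to every rolling-shutter row against a matching global-shutter reference row, a routine like this would yield one displacement estimate per row; that per-row signal is what the next step treats as samples of the vibration.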
The researchers create the speckle pattern by aiming a laser at the surface of the object producing the vibrations, such as the body of a guitar. That speckle pattern changes as the surface vibrates. A rolling shutter captures an image by rapidly scanning the sensor, usually from top to bottom, building the image one row of pixels at a time. A global shutter captures the entire image at a single instant.
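Why pair the two shutters? One plausible reading, sketched below under stated assumptions: because each rolling-shutter row is exposed a fixed line delay after the previous one, a single frame samples the vibrating speckle pattern many times, far more often than the frame rate alone would allow, while a global-shutter frame records the whole pattern at one instant and can serve as an undistorted reference. The helper names `row_timestamps` and `resample_uniform`, the camera numbers (60 frames/s, 1,000 rows, 15 µs line delay), and the sine wave standing in for per-row shift measurements are all hypothetical.

```python
import numpy as np


def row_timestamps(num_frames, rows_per_frame, frame_period, line_delay):
    """Capture time (seconds) of every rolling-shutter row, in readout order.

    Assumes row r of frame k starts exposing at k * frame_period + r * line_delay.
    """
    # The readout must finish before the next frame starts, or times would overlap.
    assert rows_per_frame * line_delay <= frame_period
    frames = np.arange(num_frames)[:, None]
    rows = np.arange(rows_per_frame)[None, :]
    return (frames * frame_period + rows * line_delay).ravel()


def resample_uniform(times, samples, rate):
    """Linearly interpolate irregularly timed samples onto a uniform audio grid.

    The blanking gap between frames is simply bridged by interpolation,
    which is a crude fill rather than true recovery of the missing signal.
    """
    grid = np.arange(times[0], times[-1], 1.0 / rate)
    return grid, np.interp(grid, times, samples)


# Hypothetical camera: 60 frames/s, 1000 rows, 15 microseconds between rows.
times = row_timestamps(num_frames=3, rows_per_frame=1000,
                       frame_period=1 / 60, line_delay=15e-6)
# Stand-in for per-row speckle shifts: a 440 Hz vibration.
displacement = np.sin(2 * np.pi * 440 * times)
grid, audio = resample_uniform(times, displacement, rate=44_100)

print(len(times))        # 3000 row samples from only 3 frames
print(round(1 / 15e-6))  # 66667 samples/s within each frame's readout
```

Under these assumed numbers, three frames yield 3,000 time-stamped samples, and the within-frame sampling rate is set by the line delay rather than the frame rate, which is what lets an ordinary sensor follow audio-rate vibrations.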
“This system pushes the boundary of what can be done with computer vision,” says Matthew O’Toole, an assistant professor in the RI and Computer Science Department. “This is a new mechanism to capture high speed and tiny vibrations, and presents a new area of research.”
Most work in computer vision focuses on training systems to recognize objects or track them through space, research that is important to advancing technologies such as autonomous vehicles. By enabling systems to see imperceptible, high-frequency vibrations, the researchers say, this work opens new applications for computer vision.
For example, the dual-shutter optical vibration-sensing system could allow sound engineers to monitor individual instruments, free from interference from the rest of the ensemble, to fine-tune the overall mix. Manufacturers could use the system to monitor the vibrations of individual machines on a factory floor to spot early signs that maintenance is needed.
“If your car starts to make a weird sound, you know it is time to have it looked at,” says Sheinin. “Now imagine a factory floor full of machines. Our system allows you to monitor the health of each one by sensing their vibrations with a single stationary camera.”
For more, see “Dual-Shutter Optical Vibration Sensing.”
