
Embedded AI enables custom gesture recognition in ToF sensor
STMicroelectronics has added several embedded AI models to its fifth generation time of flight sensor, with one allowing custom gesture recognition.
“A new feature is people monitoring with smart gesture detection. That’s a new market that was almost unknown a few years ago,” said David Maucotel, product line manager in the imaging sub-group. The 3D depth sensing market is set to almost triple by 2030, he says, rising from $6.7bn in 2020 to $17.6bn in 2030.
The latest VL53L8 is an 8 x 8 time of flight array with a metasurface lens, backed by four AI models and an analytic model for a range of human presence detection functions, as well as gesture recognition when used in a laptop or monitor. ST is also adding posture recognition to highlight when people are not sitting properly.
“For presence detection we are combining an 8×8 multizone sensor with a 940nm IR laser to measure the absolute distance. Then we add AI software for a range of smart features. It is not a camera: there are only 64 zones hidden behind the glass in the bezel, and the data processing is very light, so the AI fits into any small microcontroller in the sensor hub of any computer and takes less than 1mW in standby,” said Herve Grotard, marketing director at ST.
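As a rough illustration of what such light data processing involves, the sketch below models an 8×8 multizone frame and picks out the nearest zone, a typical first step for presence logic. The field and function names are assumptions for illustration, not the VL53L8 driver’s actual API.

```c
#include <stdint.h>

#define TOF_ZONES 64   /* 8 x 8 grid */

/* Hypothetical frame layout: 64 zones, each with an absolute distance
   and a return-signal strength. */
typedef struct {
    uint16_t distance_mm[TOF_ZONES];  /* per-zone absolute distance */
    uint16_t signal[TOF_ZONES];       /* per-zone return-signal level */
} tof_frame_t;

/* Nearest zone across the frame -- a common first step for presence logic. */
static uint16_t tof_min_distance(const tof_frame_t *f)
{
    uint16_t min = f->distance_mm[0];
    for (int z = 1; z < TOF_ZONES; z++)
        if (f->distance_mm[z] < min)
            min = f->distance_mm[z];
    return min;
}
```

With only 64 values per frame, even a scan like this is trivial work for a sensor-hub microcontroller, which is what keeps the standby power so low.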
ST has trained a simple neural network of around 10,000 weights on a set of gestures, and allows engineers to retrain the model with their own gestures.
Other embedded AI features include putting the laptop to sleep immediately if there is no one at the screen, as well as waking up if the right person looks at the screen. There is also multi-person detection to prevent shoulder-surfing.
This can boost battery life by as much as 20%.
“We have developed algorithms that monitor the presence of the user in front of the laptop and detect a departure. As soon as we detect the user has left, we want to lock the PC and then, a few seconds later, push it into sleep mode,” said Oliver Lemarchand, application team manager at ST.
“The sensor also provides a motion map and we use that with the distance map to detect the person. We have designed and trained a dedicated network to differentiate between a human and an object that could stay in the field of view, such as a chair.”
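The lock-then-sleep behaviour Lemarchand describes is essentially a small state machine driven by the per-frame presence result. A minimal sketch, assuming an illustrative 5-second delay and hypothetical names (ST’s implementation is not public):

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { USER_PRESENT, PC_LOCKED, PC_SLEEPING } pc_state_t;

typedef struct {
    pc_state_t state;
    uint32_t   absent_since_ms;   /* timestamp when the user last left */
} walkaway_t;

#define SLEEP_DELAY_MS 5000u      /* assumed "few seconds later" delay */

/* Called on every sensor frame with the presence result and a timestamp. */
static pc_state_t walkaway_update(walkaway_t *w, bool user_present,
                                  uint32_t now_ms)
{
    if (user_present) {
        w->state = USER_PRESENT;           /* user seen: stay (or go) active */
    } else if (w->state == USER_PRESENT) {
        w->state = PC_LOCKED;              /* lock as soon as the user leaves */
        w->absent_since_ms = now_ms;
    } else if (w->state == PC_LOCKED &&
               now_ms - w->absent_since_ms >= SLEEP_DELAY_MS) {
        w->state = PC_SLEEPING;            /* push to sleep after the delay */
    }
    return w->state;
}
```

Wiring the human-vs-object network’s output into `user_present` is what stops a chair left in front of the screen from holding the PC awake.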
“In Gen5 there is adaptive screen dimming. This is a pure AI solution trained to detect the user’s head orientation in six possible directions and dim the screen slowly when they are not looking at it. When we visited Microsoft they explained there is a default timer set to 5 minutes, and 75% of users never change this default, during which anyone can use the laptop without any security, while we can switch everything off in 2s.”
“A new feature is an onlooker alert and notification, and some OEMs give the option to blur the screen. This is possible because the VL53L8 improves the signal-to-noise ratio and combines the distance and motion data to detect four different people in front of the sensor using a purely analytic algorithm.”
The hand gesture detection is based purely on embedded AI designed by ST, along with the data capture. “We are willing to share that with PC OEMs, with a web application, and you can retrain the network with any gesture and deploy it on any STM32 microcontroller. This is typically 10,000 weights and, because they are small, they are easy to deploy on a 32bit microcontroller. It doesn’t need an AI accelerator, as it’s just multiply-accumulate instructions.”
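To see why no accelerator is needed, consider what a fully connected layer of such a network reduces to: nested multiply-accumulate loops that any 32bit MCU can execute. The sketch below is illustrative (sizes, names and the ReLU activation are assumptions, not ST’s actual network):

```c
#include <stddef.h>

/* One dense layer with ReLU, written as plain multiply-accumulate loops.
   A ~10,000-weight network is a handful of layers like this. */
static void dense_relu(const float *in, size_t n_in,
                       const float *weights,   /* n_out x n_in, row major */
                       const float *bias,
                       float *out, size_t n_out)
{
    for (size_t o = 0; o < n_out; o++) {
        float acc = bias[o];
        for (size_t i = 0; i < n_in; i++)
            acc += weights[o * n_in + i] * in[i];   /* multiply-accumulate */
        out[o] = acc > 0.0f ? acc : 0.0f;           /* ReLU */
    }
}
```

At 10,000 weights the whole forward pass is roughly 10,000 MAC operations per frame, which is negligible load for a modern 32bit microcontroller.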
The 8 x 8 array is the optimum size, says Grotard. “We tried 16×16, but you get a quarter of the light in each zone, which hits the accuracy and the range, and that is why we stayed with 8 x 8,” he said. The performance has also been improved with a metasurface lens.
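The quarter-of-the-light figure follows directly from the zone count: for a fixed return signal, doubling the resolution in both axes splits the light across four times as many zones (64 versus 256). A trivial sketch of that arithmetic, with illustrative names:

```c
/* For a fixed total return signal, per-zone signal scales as 1/zones,
   so an N x N array gets total/(N*N) per zone. */
static double light_per_zone(double total_signal, int zones_per_side)
{
    return total_signal / (double)(zones_per_side * zones_per_side);
}
```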
“The previous VL53L5 used a refractive lens, but it was not sharp enough to separate one zone from another. The L8 is a breakthrough as we are using a metasurface diffractive optical element, and the lens is square to improve the sharpness,” he said.
“Body posture detection is something we have been developing and we have delivered to an OEM as a proof of concept for mass deployment next year,” said Lemarchand. “In discussion with doctors we wanted to see if we could use the time of flight data to monitor the posture of people in front of the laptop during the day and trigger notifications if people do not move enough.” In 2020 lower back pain affected 619m people, and that is expected to increase to 843m by 2050.
