VTT depth camera tracks shoppers
The research focuses mainly on behaviour-pattern analysis based on data fusion across a network of low-cost, ceiling-mounted depth cameras, which measure the distance to surrounding surfaces by projecting a laser dot pattern in the IR range.
With appropriate refresh rates, and with data fused into a single coordinate system so that tracking spans the whole physical space the cameras cover, the video-analytics algorithms can track moving customers across a wide range of lighting conditions (including complete darkness), and can handle view occlusions that are typically difficult to interpret with 2D cameras.
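The fusion step amounts to mapping each sensor's local detections into one shared floor coordinate system. A minimal sketch (not VTT's pipeline) of that transformation, where each sensor's mounting offset and yaw are hypothetical calibration values:

```python
import math

def to_world(sensor_pose, local_xy):
    """Rotate and translate a sensor-local (x, y) detection into the
    shared world frame. sensor_pose = (offset_x, offset_y, yaw)."""
    ox, oy, yaw = sensor_pose
    x, y = local_xy
    wx = ox + x * math.cos(yaw) - y * math.sin(yaw)
    wy = oy + x * math.sin(yaw) + y * math.cos(yaw)
    return wx, wy

# Two sensors see the same person; after transformation both detections
# land at the same world position, so the tracker can merge them.
sensor_a = (0.0, 0.0, 0.0)        # mounted at the origin, no rotation
sensor_b = (4.0, 0.0, math.pi)    # mounted 4 m away, facing the other way
print(to_world(sensor_a, (2.0, 1.0)))                                  # → (2.0, 1.0)
print(tuple(round(v, 6) for v in to_world(sensor_b, (2.0, -1.0))))     # → (2.0, 1.0)
```

Once all detections live in one frame, a single tracker can follow a person as they walk from one camera's field of view into the next.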
Built up histogram-style, traffic heat maps can be superimposed on the shop's floor plan to analyse how the product layout affects consumers' whereabouts and dwell time.
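The heat map itself is essentially a 2D histogram over the floor. A minimal sketch (the grid size and cell resolution are invented for illustration): tracked floor positions, in metres, are binned into a grid whose per-cell counts approximate dwell time.

```python
GRID_W, GRID_H = 20, 10   # hypothetical shop floor: 20 m x 10 m, 1 m cells

def heat_map(positions):
    """Count how many position samples fall into each 1 m x 1 m floor cell."""
    grid = [[0] * GRID_W for _ in range(GRID_H)]
    for x, y in positions:
        col, row = int(x), int(y)
        if 0 <= col < GRID_W and 0 <= row < GRID_H:
            grid[row][col] += 1
    return grid

# Each sample is one frame's tracked position; at a fixed frame rate,
# cell counts are proportional to time spent in that cell.
track = [(2.4, 3.1), (2.6, 3.0), (2.7, 3.2), (8.9, 5.5)]
grid = heat_map(track)
print(grid[3][2])  # → 3 (three samples in the cell around x=2, y=3)
```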
In the paper, which will be presented at a workshop on “Video Analytics for Audience Measurement in Retail and Digital Signage”, the researchers explain that depth sensors pose less of a privacy threat than 2D or 3D stereo cameras, as they do not capture actual photographic information.
That said, depth data may still be combined with the other aspects of audience measurement considered at the workshop, such as gender recognition, age-group estimation, ethnicity recognition, emotion analysis, or free eye-gaze estimation (think of adaptive displays that trigger their advertising message when you gaze in their direction).
In any case, as a way to keep up with online competition, the retail and advertising industries are seeking to extract more data from their customers' in-shop behaviour, to measure their engagement with products, booths, or newly launched campaigns.
And store performance optimization, as the paper describes it, comes in the form of always-on analytics. Precise real-time information on customer behaviour goes beyond today's point-of-sale data analysis and opens up new adaptive advertising scenarios to funnel more customers towards the cashier.
Along these lines, the researchers identified three classes of motion patterns to act upon: passers-by in passage areas; decisive customers, dubbed “quick shoppers”, who know precisely what they are looking for; and exploratory customers, qualified as “slow shoppers”.
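The paper does not spell out the classification rules, but one can imagine a rough sketch based on average walking speed and the number of near-stationary intervals in a track; the thresholds below are invented for illustration, not VTT's algorithm.

```python
import math

def classify(track, fps=10.0, stop_speed=0.3):
    """Label a track of (x, y) floor positions, sampled at `fps` Hz, as
    passer-by, quick shopper, or slow shopper (toy thresholds)."""
    speeds = [
        math.hypot(x1 - x0, y1 - y0) * fps
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]
    avg_speed = sum(speeds) / len(speeds)
    stops = sum(s < stop_speed for s in speeds)   # near-stationary intervals
    if stops == 0 and avg_speed > 1.0:
        return "passer-by"                        # steady walk, no stops
    if stops <= 2:
        return "quick shopper"                    # brief, targeted visit
    return "slow shopper"                         # lingering, exploratory

# A steady 1.5 m/s walk straight through a passage area:
walk = [(0.15 * i, 0.0) for i in range(20)]
print(classify(walk))  # → passer-by
```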
From a retailer’s perspective, identifying these classes of motion is an enabler, since different classes of advertisements (concise or more detailed) will be served to different types of customers. Sound and lighting effects distributed throughout the shop could even come into play for the most elaborate scenarios. In fact, retail shops of the future may well have to display warning signs that by entering their premises, you agree to be subconsciously manipulated to the maximum extent permitted by law.
Depth versus 2D cameras
“On the sensing side, with off-the-shelf depth sensors the detection range is about 8 meters, with an accuracy of 1 to 10 cm depending on the distance from the sensor. With the upcoming stereo-camera version we can extend the range up to 20 meters, but this version is not yet in the piloting phase”, Johannes Peltola, Team Leader for Smart Service Interfaces at VTT, told us.
But why not use regular video analytics to determine people’s positions on video streams from installed CCTVs instead of relying on a dedicated network of depth sensing nodes?
Peltola acknowledges that competing solutions include regular cameras and thermal sensors. He pointed out that while the range and accuracy of thermal imagers are about the same as those of depth-sensing cameras, their price is higher; regular cameras allow a longer range but are less accurate at detecting and tracking people.
“We could, and we have, developed such camera analytics tools. The camera performs well if it is positioned directly above the monitored area, but this limits the range and is often impractical given normal ceiling heights. Calculating people flow from a typical CCTV stream is possible, but in crowded situations it generates many more errors than using depth information. Better visual algorithms require a lot more CPU/GPU power, so low-cost processing is difficult to achieve”, says Peltola.
But what additional information does depth sensing provide? we asked.
“Compared to a camera image, depth allows better object segmentation (segmenting by distance rather than by texture, which may be similar between two objects or vary within a single one). It also improves calibration, since the sensor position and physical object properties can be calculated directly from the sensor data.
Depth information also helps manage occlusion, when one person temporarily blocks the view of another. The distance information tells you directly which object is behind which, and based on that distance the occluded object can be tracked again when it becomes visible”, clarified Peltola.
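A toy sketch of that last point (not VTT's tracker): an occluded track keeps its last known distance to the sensor, and a reappearing detection is re-matched to the lost track whose remembered distance is closest, within a tolerance. All names and values here are hypothetical.

```python
def reassociate(lost_tracks, detection, tol=0.5):
    """Match a reappearing detection to the lost track with the closest
    remembered distance (in metres), within `tol`; None if no match."""
    best = min(lost_tracks,
               key=lambda t: abs(t["distance"] - detection["distance"]))
    if abs(best["distance"] - detection["distance"]) <= tol:
        return best["id"]
    return None

# Two people were occluded at different distances; someone reappears at 4.5 m.
lost = [{"id": "A", "distance": 3.2}, {"id": "B", "distance": 4.6}]
print(reassociate(lost, {"distance": 4.5}))  # → B
```

A real tracker would combine distance with position and motion prediction, but the depth value alone already resolves most who-is-in-front ambiguities.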
WiFi-enabled depth sensors
To add more data to the system, could depth sensing be conceived from a product's perspective? Could a shelf or a booth, for example, detect people approaching or reaching for a product?
“It’s a bit early to define the exact functionalities of a commercial product, but we are investigating how to detect groups, group behaviour, front-of-booth behaviour (e.g. reaching for the shelf), predicting the next points of interest, etc.”, Peltola told us.
In fact, VTT has another paper in the making, “Predicting Consumers’ Locations in Dynamic Environments via 3D Sensor-Based Tracking”, to be presented at the 8th International Conference on Next Generation Mobile Apps, Services and Technologies.
With this know-how, the Finnish research centre is investigating the possibility of spinning off a company to commercialize low-cost, WiFi-enabled depth-sensing nodes.
“In current pilots we are already using small, cheap ARM Cortex-A9-based computers, so it is just a matter of housing the sensor in the same package”, added Peltola.
Pilots of the tracking system will run this summer and autumn at the Shalkwijk Shopping Centre in Haarlem in the Netherlands, with Procter & Gamble in Brussels, and in the city of Rovaniemi in Finland.