“Until now, if you were to hang an advertising poster in the pedestrian zone, and wanted to know how many people actually looked at it, you would not have had a chance,” said Andreas Bulling, head of the Perceptual User Interfaces group at Saarland University and the Max Planck Institute for Informatics.
At the moment, this information is captured with special eye tracking equipment that requires minutes of calibration, and every participant has to wear such a tracker. Real-world studies, such as in a pedestrian zone, or even studies with just a few people at once, are therefore very complicated and in the worst case impossible.
Together with his PhD student Xucong Zhang and his former postdoc Yusuke Sugano, now a professor at Osaka University, Bulling has developed a new generation of algorithms for estimating gaze direction.
These use a neural network to estimate gaze directions, which are then clustered. In a second step, the most likely cluster is identified, and its gaze direction estimates are used to train a target-object-specific eye contact detector. This means the tracking requires no involvement from the user, and the method keeps improving the longer the camera remains next to the target object and records data. “In this way, our method turns normal cameras into eye contact detectors, without the size or position of the target object having to be known or specified in advance,” said Bulling.
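The clustering step can be illustrated with a minimal sketch. This is not the authors' implementation (which clusters the output of an appearance-based gaze estimation network); it is a hypothetical toy version that treats each gaze estimate as a 2D (yaw, pitch) point, finds the densest cluster of estimates, and labels the samples in that cluster as eye contact with the target object:

```python
import numpy as np

def detect_eye_contact(gaze, eps=0.15):
    """Toy unsupervised eye contact detection.

    gaze: (N, 2) array of gaze direction estimates (yaw, pitch), e.g.
          produced per video frame by a gaze estimation network.
    eps:  radius (in the same units) within which estimates are
          considered part of the same cluster.

    Returns a boolean mask marking samples in the densest cluster
    (assumed to correspond to looks at the target object) and the
    cluster center.
    """
    gaze = np.asarray(gaze, dtype=float)
    # Pairwise distances between all gaze estimates.
    dists = np.linalg.norm(gaze[:, None, :] - gaze[None, :, :], axis=-1)
    # Neighbor count per sample; the densest sample anchors the cluster.
    counts = (dists < eps).sum(axis=1)
    center = gaze[counts.argmax()]
    # Everything near that anchor is labeled "eye contact".
    labels = np.linalg.norm(gaze - center, axis=1) < eps
    return labels, center

# Synthetic demo: 70 estimates tightly grouped near the target
# direction (0, 0), 30 estimates scattered elsewhere in the scene.
rng = np.random.default_rng(0)
on_target = rng.normal(0.0, 0.01, size=(70, 2))
off_target = rng.uniform(0.4, 1.0, size=(30, 2))
gaze = np.vstack([on_target, off_target])

labels, center = detect_eye_contact(gaze)
```

In the real method, the samples falling into this dominant cluster then serve as automatically generated training labels for the target-object-specific detector, so no manual annotation or calibration is needed.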
The researchers have tested their method in two scenarios: in a workspace, the camera was mounted on the target object, and in an everyday situation, a user wore an on-body camera, so that it recorded a first-person perspective. The result: since the method works out the necessary knowledge for itself, it is robust even when the number of people involved, the lighting conditions, the camera position, and the types and sizes of target objects vary.
“We can in principle identify eye contact clusters on multiple target objects with only one camera, but the assignment of these clusters to the various objects is not yet possible,” said Bulling. “Our method currently assumes that the nearest cluster belongs to the target object, and ignores the other clusters. This limitation is what we will tackle next. This paves the way not only for new user interfaces that automatically recognize eye contact and react to it, but also for measurements of eye contact in everyday situations, such as outdoor advertising, that were previously impossible.”
A demonstration video is available at https://www.youtube.com/watch?v=ccrS5XuhQpk