
Vision-In-Package system integrates real-time face detection and recognition

Technology News
By Julien Happich



The Vision-In-Package (VIP) system, as CSEM's researchers call it, packs a camera system with a low-power processor (ARM Cortex-M4/M7 with 8MB of RAM), a high-dynamic-range imager, optics, and a communication interface. The unit occupies only around 4 cm³ and weighs less than 20g including a battery cell, yet it runs a complete facial analysis pipeline in real time, fully embedded within the package.

The software is compact and stand-alone with no external dependencies. It comprises a minimal version of the uKOS operating system (developed under the μKernel project – www.ukos.ch) and a face analysis package running on top of it. Unlike existing systems that run on powerful hardware architectures, the VIP system requires several orders of magnitude less CPU time and memory, and its analysis pipeline runs at around 4 to 5 frames per second at QVGA resolution.
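As a rough back-of-the-envelope check (these figures are derived here, not quoted by CSEM), the reported frame rate translates into the following per-frame time budget and pixel throughput at QVGA:

```python
# Back-of-the-envelope budget for the reported 4-5 fps at QVGA (320x240).
QVGA_PIXELS = 320 * 240                    # 76,800 pixels per frame
for fps in (4, 5):
    budget_ms = 1000.0 / fps               # time available to process one frame
    throughput = QVGA_PIXELS * fps / 1e6   # megapixels analysed per second
    print(f"{fps} fps -> {budget_ms:.0f} ms/frame, {throughput:.2f} Mpixel/s")
```

At 4 to 5 fps the pipeline therefore has roughly 200 to 250 ms to spend on each 76,800-pixel frame.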

First, all the faces in an acquired frame are detected, which typically takes less than a hundred milliseconds and requires only a few hundred KB of RAM. Then facial landmarks, such as the corners of the eyes and the nose, are located within each detected face region, and the face undergoes a normalization step: a rough geometric transformation that aligns the eyes horizontally and scales the face to a standard size, together with a photometric normalization that removes non-linear intensity variations caused by shadows and non-uniform illumination. Finally, the actual face recognition takes place, extracting descriptive features at the landmark locations to uniquely identify people in a database of registered faces. New individuals can be registered to this database instantly at any time with a single click and without requiring any re-training.
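To make the three stages concrete, the following is a hedged, illustrative sketch using generic OpenCV and NumPy building blocks on a PC: a stock Haar/AdaBoost cascade for detection, a crop-and-resize plus histogram equalization for normalization, and a simple LBP-style histogram for matching. The model file, descriptor, distance measure and threshold are illustrative assumptions, not CSEM's embedded implementation, and the landmark-driven alignment described above is replaced here by a plain crop for brevity.

```python
import cv2
import numpy as np

# A stock OpenCV Haar cascade stands in for the embedded AdaBoost face detector.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def normalize_face(gray, box, size=96):
    """Crop the detected region, rescale it and equalize its illumination."""
    x, y, w, h = box
    face = cv2.resize(gray[y:y + h, x:x + w], (size, size))  # geometric step
    return cv2.equalizeHist(face)                            # photometric step

def lbp_descriptor(face):
    """Tiny LBP-style descriptor: a normalized histogram of 3x3 binary codes."""
    codes = np.zeros(face.shape, dtype=np.uint8)
    center = face[1:-1, 1:-1]
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                  (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(neighbours):
        shifted = face[1 + dy:face.shape[0] - 1 + dy,
                       1 + dx:face.shape[1] - 1 + dx]
        codes[1:-1, 1:-1] |= (shifted >= center).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()

def enroll(gallery, name, frame, box):
    """Instant registration: store one descriptor per person, no re-training."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gallery[name] = lbp_descriptor(normalize_face(gray, box))

def recognize(frame, gallery, threshold=0.5):
    """Detect all faces, describe each one and match it against the gallery."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    results = []
    for box in detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        desc = lbp_descriptor(normalize_face(gray, box))
        # L1 distance between histograms; the threshold is a placeholder value.
        name, dist = min(((n, float(np.abs(desc - d).sum()))
                          for n, d in gallery.items()),
                         key=lambda t: t[1], default=("unknown", np.inf))
        results.append((tuple(box), name if dist < threshold else "unknown"))
    return results
```

Enrolment in this sketch mirrors what the article describes at system level: one stored descriptor per person, with no re-training of the detection or landmark models.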


To achieve this, the researchers used efficient machine learning algorithms, including AdaBoost, ensembles of regression trees and LBP (local binary patterns), which they trained on millions of examples with ground-truth annotations. The resulting classifiers typically take up a few hundred kilobytes and are fast to run even on low-end mobile processors, they say.
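As an illustration of why such boosted classifiers stay small, here is a generic AdaBoost-with-decision-stumps trainer in NumPy (a textbook formulation, not CSEM's detector). Each weak learner reduces to four numbers: a feature index, a threshold, a polarity and a weight, so even a few hundred of them occupy only kilobytes.

```python
import numpy as np

def train_adaboost_stumps(X, y, n_stumps=20):
    """Train decision stumps with AdaBoost; labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                      # per-example weights
    model = []
    for _ in range(n_stumps):
        best = None
        for j in range(d):                       # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # weak-learner weight
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)           # boost misclassified examples
        w /= w.sum()
        model.append((j, float(thr), pol, float(alpha)))
    return model

def predict(model, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                for j, thr, pol, alpha in model)
    return np.sign(score)

# Example on synthetic data: two noisy blobs with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(1.5, 1.0, (100, 4))])
y = np.hstack([-np.ones(100), np.ones(100)])
model = train_adaboost_stumps(X, y, n_stumps=15)
print("training accuracy:", (predict(model, X) == y).mean())
```

Storing 500 such stumps at four 4-byte values each is roughly 8 KB; production face detectors combine far more features and cascade stages, which is consistent with the few hundred kilobytes quoted above.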

The standalone unit could find use in wearables, in marketing and advertisement analytics (collecting viewership and demographics data), and in robotics (for more personalized interactions), but also among TV manufacturers (1984-style TV sets are coming), in the automotive industry (to monitor driver drowsiness and distraction, or for automated settings adjustments) and in the pervasive security cameras that are getting smarter every day.

CSEM – www.csem.ch

Related articles:

The future of video surveillance: HD, hyperspectral and stereoscopic
