The Pneuro Engine is a multipurpose, energy-optimized accelerator designed for neural networks and image-processing chains. It uses a clustered SIMD architecture tuned for multiply-accumulate (MAC) operations, like those found in the SIMD extensions of popular processor architectures, but pairs it with a distributed memory optimized for near-neighbourhood accesses and design reuse management.
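To illustrate the general idea of SIMD MAC processing with near-neighbourhood memory access, here is a minimal sketch in plain Python. It is not CEA's design or programming model. The lane layout, the function name `simd_mac_filter` and the zero-padding at the edges are all hypothetical assumptions. Each "lane" holds one pixel of an image row, and all lanes execute the same multiply-accumulate step in lock-step while reading their immediate neighbours.

```python
def simd_mac_filter(row, taps):
    """Apply a 3-tap filter to `row`, producing one output per lane.

    Hypothetical model: lane i owns row[i] and can read its left and right
    neighbours (near-neighbourhood access). Every lane executes the same
    MAC instruction (acc += w * x) in lock-step, as a SIMD unit would.
    Out-of-range neighbours at the row edges are treated as zero.
    """
    n = len(row)
    acc = [0] * n  # one accumulator register per lane
    # Three SIMD steps, one per filter tap: offsets -1, 0, +1
    for offset, weight in zip((-1, 0, 1), taps):
        for lane in range(n):  # conceptually parallel across lanes
            src = lane + offset
            x = row[src] if 0 <= src < n else 0
            acc[lane] += weight * x  # multiply-accumulate
    return acc
```

For example, `simd_mac_filter([1, 2, 3, 4], [1, 1, 1])` returns `[3, 6, 9, 7]`. Because each lane only ever touches its own data and that of its direct neighbours, such a layout favours a memory distributed next to the lanes rather than one large shared memory.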
The research team is from CEA List, based at Saclay near Paris. At Embedded World it demonstrated an FPGA-based Pneuro on a GlobalSensing Technologies board performing face recognition. The same technology has also been ported to a Raspberry Pi 2B, which contains a quad-core Cortex-A7, and to an Odroid XU3 board based on a quad-core Cortex-A15 processor.
The embedded convolutional neural network requires 450 kilo-operations per second and has 60 neurons in a hidden layer. It was tasked with identifying faces from a database of 18,000 images, which it did with 96 percent accuracy.
Pneuro delivers more than five times the performance when implemented in an FPGA than when running in software on an ARM processor. Because of the parallelism it offers, it achieves this at much lower clock frequencies, as shown in the table below.
It would be more efficient still if implemented as synthesized hardware in an SoC. The engineers estimate that in a 28nm FDSOI manufacturing process the Pneuro block would occupy less than 0.5 square millimeters and could run at up to 1.8 tera operations per second.
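As a rough illustration of what these two estimates imply together, dividing the peak throughput by the area bound gives a compute density. This is simple arithmetic on the quoted figures, not a figure stated by CEA. Because the area is quoted as an upper bound, the result is a lower bound:

```latex
\text{density} > \frac{1.8\ \text{TOPS}}{0.5\ \text{mm}^2} = 3.6\ \text{TOPS/mm}^2
```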
Frederic Surleau, the research executive responsible for industrial partnerships, can be contacted through CEA Tech List at Saclay, Paris, to discuss licensing and collaboration.