
General purpose neural inferencing engine targets DSP acceleration

Technology News | By eeNews Europe



For FIR (finite impulse response) filters, Flex Logix says nnMAX can process up to 1 gigasample per second with hundreds or even thousands of “taps” (coefficients). FIR filters are widely used in commercial and aerospace applications. Cheng Wang, Flex Logix’s senior VP of engineering and co-founder, disclosed these benchmarks and more at the online Linley Spring Processor Conference in a presentation titled “DSP Acceleration using nnMAX.”
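To make the "taps" figure concrete, here is a minimal Python sketch of a direct-form FIR filter. It is illustrative only and says nothing about how nnMAX implements the operation; the function names `fir_filter` and `required_macs_per_second` are hypothetical. Each output sample costs one multiply-accumulate (MAC) per tap, which is why a filter with many taps at high sample rates maps naturally onto hardware built for the dense MAC workloads of neural inference.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * samples[n - k].

    Samples before the start of the input are treated as zero.
    """
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]  # one multiply-accumulate per tap
        out.append(acc)
    return out


def required_macs_per_second(sample_rate_hz, num_taps):
    """MAC operations per second needed to sustain the given sample rate."""
    return sample_rate_hz * num_taps


# A 4-tap moving-average filter applied to a short ramp signal.
taps = [0.25, 0.25, 0.25, 0.25]
signal = [4.0, 8.0, 12.0, 16.0, 20.0]
filtered = fir_filter(signal, taps)
print(filtered)  # [1.0, 3.0, 6.0, 10.0, 14.0]

# 1 gigasample per second with 1,000 taps needs 10^12 MACs per second.
print(required_macs_per_second(1_000_000_000, 1000))  # 1000000000000
```

The second figure shows the scale involved: sustaining 1 gigasample per second with a 1,000-tap filter requires a trillion multiply-accumulates every second.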

“Because nnMAX is so good at accelerating AI inference, customers started asking us if it could also be applied to DSP functions,” said Geoff Tate, CEO and co-founder of Flex Logix. “When we started evaluating their models, we found that it can deliver similar performance to the most expensive Xilinx FPGAs in the same process node (16nm), and is also faster than TI’s highest-performing DSP – but in a much smaller silicon area than both those solutions. nnMAX is available now for 16nm SoC designs and will be available for additional process nodes in 2021.”


nnMAX is a general-purpose neural inferencing engine that can run any type of neural network (NN), from simple fully connected DNNs to RNNs and CNNs, and can run multiple NNs at a time. According to the company, it has demonstrated excellent inference efficiency, delivering more throughput on tough models at lower cost and lower power.

nnMAX is programmed with TensorFlow Lite and ONNX. The supported numerics are INT8, INT16 and BFloat16, which can be mixed layer by layer to maximize prediction accuracy. INT8/16 activations are processed at full rate; BFloat16 at half rate. Hardware converts between INT and BFloat as needed, layer by layer. 3×3 convolutions of stride 1 are accelerated by Winograd hardware: YOLOv3 is 1.7x faster and ResNet-50 is 1.4x faster. This is done at full precision. Weights are stored in non-Winograd form to keep memory bandwidth low. nnMAX is a tile architecture: any required throughput can be delivered, with the right amount of SRAM for the model.
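For readers unfamiliar with Winograd convolution, the following Python sketch shows the standard one-dimensional F(2,3) form, which is the building block of the two-dimensional version used for 3×3, stride-1 convolutions. It is a textbook illustration, not a description of the nnMAX hardware, and the function names are hypothetical. It produces two outputs of a 3-tap filter with four multiplications instead of six; the weights are transformed on the fly from their ordinary form, in the spirit of the article's note that weights are stored in non-Winograd form.

```python
def direct_conv_2_outputs(d, g):
    """Two outputs of a 3-tap filter, computed directly (6 multiplies)."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f2_3(d, g):
    """Same two outputs using Winograd F(2,3) (4 multiplies)."""
    # Weight transform: derived from the ordinary (non-Winograd) weights.
    u1 = g[0]
    u2 = (g[0] + g[1] + g[2]) / 2
    u3 = (g[0] - g[1] + g[2]) / 2
    u4 = g[2]
    # Four element-wise multiplies on transformed inputs.
    m1 = (d[0] - d[2]) * u1
    m2 = (d[1] + d[2]) * u2
    m3 = (d[2] - d[1]) * u3
    m4 = (d[1] - d[3]) * u4
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


inputs = [1.0, 2.0, 3.0, 4.0]   # four consecutive input values
weights = [2.0, 4.0, 6.0]       # three filter weights
print(direct_conv_2_outputs(inputs, weights))  # [28.0, 40.0]
print(winograd_f2_3(inputs, weights))          # [28.0, 40.0]
```

In two dimensions the same idea computes a 2×2 output tile of a 3×3 convolution with 16 multiplications instead of 36, at the cost of extra additions.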

Flex Logix – https://flex-logix.com
