CPU, GPU or NNA?
Neural networks have become prevalent thanks to the vast increase in readily available compute power, usually running on either CPUs or GPUs. However, AI workloads are highly compute-intensive, which makes it challenging to achieve satisfactory performance on edge devices; a dedicated hardware solution is a much more preferable option. For example, if we benchmark a typical mobile CPU at "1x" for its performance at running neural networks, then a GPU will accelerate this by around 12x. On a dedicated neural network accelerator, however, these operations can run over 100x faster (for supported layers), and at lower bit-depths such as 4-bit they can be around 200x faster.
This approach uses a fixed-point data type with quantization to minimise the size of the model and the bandwidth required, and lossless weight compression further enhances efficiency. In addition, some NNA hardware core IP supports variable bit-depth, so that weights can be adjusted on a layer-by-layer basis to achieve maximum accuracy for the deployed model while minimising the model size, reducing memory bandwidth and power consumption. Overall, this gives very efficient performance with low power requirements. The impressive power efficiency of a small (circa 1 mm²) NNA can even enable devices to run on batteries or on energy harvested from solar or wind power.
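To make the trade-off concrete, the sketch below shows symmetric fixed-point quantization of a layer's weights at a chosen bit-depth, the kind of per-layer choice described above. The `quantize_per_layer` helper is illustrative, not a vendor API; real toolchains also calibrate activations and may use asymmetric or per-channel scales.

```python
import numpy as np

def quantize_per_layer(weights, bits):
    """Symmetric fixed-point quantization at a per-layer bit-depth (sketch)."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(weights)) / qmax      # one scale factor per layer
    q = np.clip(np.round(weights / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale                             # integers plus scale to dequantize

# Quantize the same layer at 8-bit and 4-bit and compare the error
w = np.random.randn(64, 64).astype(np.float32)
q8, s8 = quantize_per_layer(w, 8)
q4, s4 = quantize_per_layer(w, 4)
err8 = np.mean(np.abs(w - q8 * s8))
err4 = np.mean(np.abs(w - q4 * s4))
# 4-bit halves the weight storage again relative to 8-bit, at the cost
# of higher quantization error - hence choosing bit-depth per layer.
```

A per-layer toolchain would keep sensitive layers (often the first and last) at higher bit-depths and push the rest down to 4-bit where accuracy allows.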
Use case: Image pre-processing
Sophisticated use of the GPU and NNA together allows an image from a wide-angle/fish-eye lens to be de-warped by the GPU and then fed to the NNA, which runs SSD (single-shot detection) for object detection on the de-warped image. This is of real use for monitoring, for reducing lens distortion, or wherever a huge field of vision needs to be captured, say from a smart camera or a camera mounted overhead.
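A minimal sketch of the de-warp stage is below, assuming a simple radial-distortion model; in a real deployment this remap would run as a GPU shader and the commented `run_ssd_on_nna` call is a hypothetical stand-in for the NNA inference step, not a real API.

```python
import numpy as np

def dewarp(image, strength=0.3):
    """Inverse radial-distortion remap (the kind of job offloaded to a GPU)."""
    h, w = image.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float32)
    cx, cy = w / 2.0, h / 2.0
    x, y = (xx - cx) / cx, (yy - cy) / cy        # normalise to [-1, 1]
    r2 = x * x + y * y                           # squared radius from centre
    xs = np.clip(x * (1 + strength * r2) * cx + cx, 0, w - 1).astype(int)
    ys = np.clip(y * (1 + strength * r2) * cy + cy, 0, h - 1).astype(int)
    return image[ys, xs]                         # sample the warped positions

frame = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
flat = dewarp(frame)                 # GPU stage: straighten the fish-eye image
# detections = run_ssd_on_nna(flat)  # NNA stage (hypothetical stand-in)
```

The key point is the hand-off: the GPU produces a rectilinear frame in memory, and the NNA consumes it directly as network input.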
Use case: Two-stage object detection
For networks that require intermediate processing, such as selecting a trained region of interest (for example, a face in a crowd), a two-stage object detection network like Faster R-CNN could be employed. Another example is chaining neural networks together, where the output of one becomes the input of another, with pre-processing, intermediate processing and post-processing along the way. Non-neural-network operations in the middle of the processing path can run on the GPU or CPU depending on suitability, whilst layers that can be accelerated would run on the NNA.
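The routing described above can be sketched as a simple dispatch table. The stage names, device assignments and `run_*` helpers here are illustrative assumptions, not a real scheduler or vendor API; the point is only that each stage of a chained pipeline declares where it should run.

```python
# Hypothetical heterogeneous dispatch for a two-stage detection pipeline.
def run_on_nna(stage, data):
    return f"NNA({stage})"   # accelerated NN layers

def run_on_gpu(stage, data):
    return f"GPU({stage})"   # parallel non-NN work, e.g. image ops

def run_on_cpu(stage, data):
    return f"CPU({stage})"   # control-flow-heavy intermediate steps

# Each stage is routed to the most suitable processor (assumed mapping).
PIPELINE = [
    ("region_proposal_net", "nna"),  # Faster R-CNN stage 1: accelerated
    ("roi_selection",       "cpu"),  # non-NN intermediate processing
    ("classification_net",  "nna"),  # Faster R-CNN stage 2: accelerated
    ("draw_overlays",       "gpu"),  # post-processing
]

DISPATCH = {"nna": run_on_nna, "gpu": run_on_gpu, "cpu": run_on_cpu}

def run_pipeline(data):
    trace = []
    for stage, device in PIPELINE:
        data = DISPATCH[device](stage, data)  # output feeds the next stage
        trace.append(data)
    return trace

trace = run_pipeline("frame")
```

In practice a vendor framework would make these placement decisions per layer, falling back to the CPU or GPU only for operations the NNA does not support.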