Neurons firing at the interim stage
Such neural networks, whether convolutional, deep, or in other forms, can learn, or be trained, and are useful for a broad range of applications: recognizing patterns in data, image recognition, face and gesture recognition, and natural language processing. Many of these techniques could become key to the efficient implementation of the Internet of Things, drone deployment, and automotive driver assistance systems, to mention just a few application areas.
Movidius, now working with Google, and Qualcomm have both recently made announcements (see Movidius shows neural network stick and Qualcomm offers neural network SDK for Snapdragon processor). Cadence has also made an announcement (see Embedded neural networks: Cadence's latest DSP target), and Ceva has been working on this front for a while (see CEVA invests in gesture recognition software firm). Numerous startups are also getting involved.
It seems that we are on the brink of a neural networking hardware revolution, and such a revolution could yet upset the pecking order in processor architectures. Intel and ARM should be, and I am sure are, keeping a close watch on the developing situation.
Interim
However, I contend we are still at an interim stage. In these latest developments the vendors are running neural networks as software on processors that are primarily, or at least partly, optimized for other functions, such as GPUs that render graphics or general-purpose DSPs. This interim stage may not last long: quite soon we may see dedicated neural processing units (NPUs) added to SoCs.
As an example, Qualcomm has not yet included a dedicated neural processing unit (NPU) in its Snapdragon range of processors, but it does offer its Zeroth neural network processing software platform running heterogeneously on the Kryo CPU, Adreno GPU, and Hexagon DSP cores within the Snapdragon 820. If deep learning applications arrive piggybacking on the existing computing resources within smartphones and mobile devices, it is likely only a matter of time before more specialized NPUs become desirable.
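To make the heterogeneous approach concrete, here is a minimal sketch of how a runtime might dispatch individual network layers to whichever core runs them most cheaply. None of these names or numbers come from Qualcomm's Zeroth SDK; the scheduler, the relative energy figures, and the layer parameters are all hypothetical, chosen only to illustrate the general idea.

```python
# Hypothetical sketch of heterogeneous dispatch of neural network layers
# across CPU, GPU, and DSP cores. These names and figures are NOT from
# Qualcomm's Zeroth SDK; they only illustrate the scheduling concept.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    macs: int        # multiply-accumulate operations in this layer
    parallel: bool   # True if the layer is highly data-parallel

# Assumed relative energy cost per MAC for each core type
# (illustrative numbers, not measurements).
ENERGY_PER_MAC = {"cpu": 10.0, "gpu": 2.0, "dsp": 1.0}

def dispatch(layer: Layer) -> str:
    """Pick the cheapest suitable core for a layer."""
    if layer.parallel:
        # Data-parallel layers (convolutions, matrix multiplies) map well
        # onto the wide vector units of a GPU or DSP.
        return min(("gpu", "dsp"), key=ENERGY_PER_MAC.get)
    # Control-heavy or small layers stay on the CPU.
    return "cpu"

net = [Layer("conv1", 100_000_000, True),
       Layer("pool1", 1_000_000, False),
       Layer("fc1", 20_000_000, True)]

for layer in net:
    core = dispatch(layer)
    cost = layer.macs * ENERGY_PER_MAC[core]
    print(f"{layer.name}: run on {core} (relative energy {cost:.0f})")
```

A dedicated NPU would slot into this picture as simply another, cheaper entry in the cost table, which is exactly why its arrival changes the dispatch decision.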
There are certainly broad system-level energy-efficiency advantages to doing such neuromorphic processing locally rather than "in the cloud," but those advantages can be realized only if the processing does not drain the mobile device's battery.
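A back-of-envelope calculation shows why local processing can win. Every figure below is an assumption picked for illustration, not a measurement: the radio energy per bit, the image size, the MAC count, and the energy per MAC all vary widely across real devices and networks.

```python
# Back-of-envelope comparison of local inference versus cloud offload.
# ALL numbers below are assumptions for illustration, not measurements.

RADIO_ENERGY_PER_BIT = 0.5e-6   # joules/bit over a cellular uplink (assumed)
IMAGE_BITS = 8 * 100_000        # a ~100 kB compressed image (assumed)

LOCAL_MACS = 1e9                # MACs for one inference pass (assumed)
LOCAL_ENERGY_PER_MAC = 20e-12   # joules/MAC on a mobile GPU (assumed)

cloud_joules = RADIO_ENERGY_PER_BIT * IMAGE_BITS   # energy just to ship the image
local_joules = LOCAL_MACS * LOCAL_ENERGY_PER_MAC   # energy to compute locally

print(f"offload radio energy: {cloud_joules * 1e3:.0f} mJ")   # ~400 mJ
print(f"local compute energy: {local_joules * 1e3:.0f} mJ")   # ~20 mJ
```

Under these assumed numbers, merely transmitting the image costs an order of magnitude more energy than computing the answer on the device, before the cloud has done any work at all.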
Thus the reason for introducing dedicated NPUs is that, while GPUs, with their multiple processing elements, run neural networks much more efficiently than single- or quad-core CPUs, a purpose-designed NPU will run them more energy efficiently still.
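The underlying reason is that neural network inference is dominated by one operation, the multiply-accumulate (MAC), which a purpose-built NPU can hard-wire into dense arrays, skipping per-operation instruction fetch and decode. Counting the MACs in a single, modestly sized convolutional layer shows the scale; the layer dimensions below are assumed for illustration.

```python
# Count the multiply-accumulate (MAC) operations in one convolutional
# layer; the dimensions are assumed, picked only to show the scale.
out_h, out_w = 56, 56      # output feature-map size
in_ch, out_ch = 64, 128    # input / output channel counts
k = 3                      # 3x3 kernel

macs = out_h * out_w * out_ch * in_ch * k * k
print(f"{macs / 1e6:.0f} million MACs for a single layer")   # ~231 million

# A CPU issuing a handful of MACs per cycle grinds through this serially;
# a GPU's many ALUs help; an NPU that dedicates silicon to arrays of MAC
# units does better still, joule for joule.
```

Multiply that by the dozens of layers in a full network, and by every frame of a live camera feed, and the appeal of dedicated MAC hardware becomes obvious.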
At that point economic and other trade-offs enter the scene. Using multipurpose cores to perform multiple functions can be silicon-area (cost) efficient, but it runs the risk of software complexity and of conflicts between cores that must run visual routines and NN software for image recognition at the same time. Adding specialized NPUs makes those conflicts go away, but die area and cost increase.
And then there is the question of when an application-specific neural network becomes too specialized to command a market large enough to justify its existence. This conundrum is, of course, one that has been puzzled over many times in the world of the CPU since the dawn of software-programmable hardware.
The lesson to be learned is that the financial rewards will go to the NPU design group that hits the sweet spot between generality and application-specificity, perhaps somewhere along the lines of the human brain and its sensing systems, but not necessarily so.
Related links and articles:
Google’s deep learning comes to Movidius
Movidius shows neural network stick
Qualcomm offers neural network SDK for Snapdragon processor
Embedded neural networks: Cadence’s latest DSP target
CEVA invests in gesture recognition software firm