Convolutional neural networks: What’s next?
CNNs, artificial neural networks built from one or more convolutional layers, have practical applications in a variety of electronic systems. Their power lies in their ability to quickly process large volumes of data and derive intelligent decisions from it. And they're invaluable in the growing number of real-time systems that we use, from smart watches to security devices and advanced driver assistance systems (ADAS).
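To make the core operation concrete, here is a minimal sketch of the computation inside a convolutional layer, written in NumPy. It implements valid-mode 2D cross-correlation (the operation most deep-learning frameworks actually call "convolution"); the image and kernel values are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: the core operation of a
    convolutional layer, producing one feature-map element per window."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product of the kernel with the image patch under it
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Apply a 3x3 vertical-edge filter to a toy 5x5 "image"
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
print(conv2d(image, kernel))  # 3x3 feature map
```

In a real CNN this operation is repeated across many kernels and channels, which is why convolutional layers dominate both the compute and the memory traffic of these networks.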
Compared to traditional pattern-detection methods, CNNs are advantageous because they can be trained efficiently (see Figure 1 for a depiction of how neural networks are trained). They have demonstrated correct detection rates that surpass those of other detection algorithms, and in some benchmarks even human performance.
A commonly cited example is that of the German Traffic Sign Recognition Benchmark (GTSRB). Traffic sign recognition algorithms, in conjunction with a proprietary hierarchical CNN methodology, have achieved a correct detection rate of 99.8%. In the future, we may be able to train CNNs reliably for more complex deep-learning tasks, such as judgment and strategy, particularly if the networks become sophisticated enough to recognize actions and context accurately.
Reducing Complexity of CNNs
In the near term, the industry continues to research ways to reduce the complexity of CNNs without sacrificing any of their accuracy. Automation provides one way to accomplish this. One of the key challenges engineers face when adopting CNN algorithms is deciding which deep-learning network to design to process and analyze the large amounts of data they've amassed. Often, they must turn to an expert, who assesses the data and recommends the optimal network to build.
Efforts are underway in the industry to automate the decision process that determines the starting point for the network, along with the steps that come afterwards. This level of automation can address complexity by optimizing for compute reduction, memory sizes, and number format. These efforts could eventually lead to simpler, much more efficient implementations of network designs, area and power advantages for the architecture, and an end to manual training of the network.
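One of the optimizations mentioned above, number format, is commonly realized as post-training quantization: storing weights in int8 instead of float32 cuts memory size and bandwidth by 4X. The sketch below shows a simple symmetric per-tensor scheme; the function names and the example weight tensor are illustrative, not any particular tool's API.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric post-training quantization of float32 weights to int8.
    Uses one scale for the whole tensor; per-channel scales would
    reduce the error further."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights for accuracy checks."""
    return q.astype(np.float32) * scale

# Hypothetical conv-layer weight tensor: 64 filters of 3x3
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(64, 3, 3)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.max(np.abs(dequantize(q, scale) - w))
print(f"int8: {q.nbytes} bytes vs float32: {w.nbytes} bytes, "
      f"max error {err:.5f}")
```

The quantization error is bounded by half the scale step, which is why well-trained networks typically lose little accuracy at 8 bits while gaining large memory and MAC-energy savings.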
The industry also continues to examine ways in which engineers are using neural networks and, specifically, to consider network simplification in a broader context. Neural networks were initially used as classifiers.
For example, in an automotive application, they can distinguish between a tree and a traffic sign in real time. Now, neural networks are being extended into full scene analysis. Using our automotive application example, full scene analysis would involve more sophisticated recognition activities, such as the ability to first distinguish among a generic sign, a stop sign, and a man wearing a shirt with the number 50 on it, and then to respond accordingly based on that assessment.
There are also ways to explore frame sequences to address larger issues such as context of the scene. For example, from a single frame it’s hard to determine if a pedestrian on the sidewalk is about to cross the street or not; frame sequences would provide more clarity. Additionally, sensor fusion technologies open up new possibilities beyond the images captured by traditional RGB cameras. By automatically fusing additional information that is captured by sensors into the pipeline of the neural network, that network can be trained to provide even richer insights.
Improving Efficiency of Today’s Neural Networks
There’s also room to vastly improve the efficiency of today’s neural networks. Today’s networks demand substantial memory bandwidth and compute resources, especially for multiply-accumulate operations, and their accuracy can still be enhanced and fine-tuned. Reducing redundancy in the coefficients and the effective depth of the network can help improve both the training efficiency and performance of a neural network.
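One common way to reduce coefficient redundancy is magnitude-based pruning: zeroing out the smallest weights, which tend to contribute least to the output. The sketch below is a minimal illustration of the idea; the 50% sparsity target and the weight tensor are hypothetical, and production flows typically retrain after pruning to recover accuracy.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the given fraction of weights with the smallest
    absolute value, leaving the large-magnitude coefficients intact."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

# Hypothetical fully-connected layer: 128 x 64 weight matrix
rng = np.random.default_rng(1)
w = rng.normal(size=(128, 64)).astype(np.float32)
pw = prune_by_magnitude(w, sparsity=0.5)
print(f"nonzero fraction after pruning: "
      f"{np.count_nonzero(pw) / pw.size:.2f}")
```

A sparse weight tensor can then be stored in compressed form and, on hardware that skips zero operands, translates directly into fewer MACs and less memory traffic.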
Running CNN algorithms on a DSP with clusters of cores, rather than a GPU, can also yield more efficiency, along with performance scaling. Indeed, a bandwidth-optimized vision cluster with configurable processors is one path for supporting high-bandwidth, low-energy computations in a variety of deep-learning applications.
As neural networks continue to evolve, they will most likely proliferate in cloud-based applications and extend into real-time embedded functions. In addition, processor platforms will need to be optimized for CNNs in such a way that they support the required power constraints and extreme throughput needs.
As a step on this evolutionary path, Cadence has launched a DSP that was designed specifically for CNN applications. Compared to its predecessor, the Tensilica Vision P6 DSP (Figure 2) offers up to 4X better performance with quadruple the available multiply-accumulate (MAC) horsepower (MACs are a major computation block for CNN applications). When compared to commercially available GPUs, the Vision P6 DSP provides twice the frame rate at much lower power consumption on a typical neural network implementation.
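To see why MAC throughput dominates a CNN processor's design, consider a back-of-envelope count for a single convolutional layer. The layer dimensions below are hypothetical, chosen to resemble an early layer processing a 1080p ADAS camera frame.

```python
def conv_macs(h_out, w_out, c_in, c_out, kh, kw):
    """MACs for one convolutional layer: every output element is a
    dot product over a kh x kw x c_in window, for c_out filters."""
    return h_out * w_out * c_out * (kh * kw * c_in)

# Hypothetical layer: 1920x1080 output, 3 -> 32 channels, 3x3 kernels
macs = conv_macs(1080, 1920, 3, 32, 3, 3)
print(f"{macs / 1e9:.1f} GMACs per frame")  # prints "1.8 GMACs per frame"
```

At 30 frames per second, this one layer alone needs on the order of 50 billion MACs per second, and real networks stack many such layers, which is why quadrupling MAC resources translates so directly into CNN performance.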
While embedded devices such as smart watches have much to gain from CNN capabilities, their small form factor and power constraints make them a challenging environment for these compute-intensive algorithms. What’s more, expertise in neural networks has traditionally been concentrated in academia. As a result, neural networks aren’t yet deeply understood by embedded architects.
There’s opportunity here for software differentiation and for the emergence of optimized SoCs for low-cost, mass-produced embedded supercomputers. GPU and specialized DSP suppliers are ready to meet this demand, which, in turn, creates a need for new hardware, IP, memory, and interconnect technology for embedded devices. We are also seeing a need for deep-learning algorithms that are designed to specifically address embedded requirements.
In order for CNNs to become pervasive and deliver a broad impact, now is the time to address the complexity of the technology. Leaders in the industry are researching ways to leverage automation to simplify the development of neural networks while maintaining their accuracy.
There are also opportunities to improve the efficiency of these networks and to address the unique requirements of integrating neural networks into embedded applications. These efforts should pave the way for CNNs to be a foundation for even more intelligent electronic products.
About the author:
Samer Hijazi is Engineering Director at Cadence Design Systems.