Raghavan asked if the time is ripe for dedicated machine learning technology and, not surprisingly, decided it was. Up until now machine learning has been done in software – algorithms running on general-purpose uniprocessors – supported more recently by software running on multicore processors, GPUs and DSPs.
However, the idea of moving data up to the cloud to be processed is clearly unsustainable given the expected explosion of data that is coming, Raghavan said.
There is a desire, even a necessity, to move beyond data to information, knowledge and wisdom. And that will require more processing done at the source of data measurement to minimize the communication burden, he argued.
Next: Billions of weights and calculations
Industry is already making moves in that direction, Raghavan said, with compute-heavy training performed in the cloud and inference and recognition done on the terminal. However, machine learning is a challenge in almost every dimension: storage, cost, computation, bandwidth and energy, he added.
Next: Roof-line limits
For given machine learning architectures there are peak performance and bandwidth limits that produce a “roof-line” limit, said Raghavan. This slide compares different data resolutions and specific architectures such as Google’s tensor processing unit (TPU) and Nvidia’s K80 GPU-based accelerator unit. Technology can help with many of these parameters but tends to do so at added cost. The use of MRAM, which can save on die area by a factor of about three for a given process node and reduce power consumption, could help with that.
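The roof-line bound Raghavan described can be sketched in a few lines: attainable throughput is capped either by a chip's peak compute rate or by its memory bandwidth multiplied by the workload's arithmetic intensity, whichever is lower. The peak and bandwidth numbers below are illustrative placeholders, not measured figures for the TPU or K80.

```python
# Roof-line model sketch: performance is limited by the lower of the
# "compute roof" (peak GFLOP/s) and the "memory roof"
# (bandwidth in GB/s times arithmetic intensity in FLOPs/byte).

def attainable_gflops(peak_gflops, bandwidth_gb_s, intensity_flops_per_byte):
    """Return the roof-line bound for a given arithmetic intensity."""
    return min(peak_gflops, bandwidth_gb_s * intensity_flops_per_byte)

# A hypothetical accelerator: 90,000 GFLOP/s peak, 30 GB/s bandwidth.
peak, bw = 90_000.0, 30.0

for intensity in (1, 10, 100, 1_000, 10_000):
    print(intensity, attainable_gflops(peak, bw, intensity))
```

At low arithmetic intensity the bound grows linearly with intensity (memory-bound); past the "ridge point" it flattens at the compute peak, which is why accelerators with modest bandwidth only reach their headline numbers on compute-dense workloads.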
Next: Compress the data
The use of compression to minimize the storage of weights can also help, as can reduced data resolution. The SqueezeNet architecture can provide AlexNet-level accuracy with a 510-fold reduction in model size, as the slide above shows.
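A minimal sketch of two standard model-compression steps, pruning small weights and then storing the survivors at reduced precision, shows how large reductions accumulate. This is an illustration of the general technique, not SqueezeNet's method (SqueezeNet shrinks the architecture itself), and the 90% pruning rate and 8-bit storage are assumed values.

```python
import numpy as np

# Illustrative compression sketch: prune the smallest-magnitude weights,
# then quantize the survivors from 32-bit floats to 8-bit integers.
rng = np.random.default_rng(0)
weights = rng.normal(size=10_000).astype(np.float32)  # a fake layer's weights

# Prune: drop the 90% of weights with the smallest magnitude.
threshold = np.quantile(np.abs(weights), 0.9)
pruned = weights[np.abs(weights) >= threshold]

# Quantize: map survivors onto a signed 8-bit range.
scale = np.abs(pruned).max() / 127.0
quantized = np.round(pruned / scale).astype(np.int8)

dense_bytes = weights.nbytes
sparse_bytes = quantized.nbytes  # index storage ignored for simplicity
print(f"compression ratio ~ {dense_bytes / sparse_bytes:.0f}x")  # ~40x here
```

Pruning alone gives 10x and the 32-bit-to-8-bit step another 4x; stacking further tricks (weight sharing, entropy coding, smaller architectures) is how figures in the hundreds are reached.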
Next: Power-precision trade off
Raghavan then referenced a paper out of the University of Leuven, presented at the VLSI Symposium in 2016, that reported on a processor built in 40nm CMOS designed to implement convolutional neural networks with computational precision selectable down to 1 bit.
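The idea of selectable precision can be illustrated with a simple quantizer that maps values onto a chosen bit width, degenerating to pure sign information at 1 bit. This is a software sketch of the concept only, not the Leuven chip's arithmetic.

```python
import numpy as np

# Sketch of precision-selectable computation: quantize values to a chosen
# bit width; at 1 bit only the sign survives.

def quantize(x, bits):
    if bits == 1:
        return np.sign(x)                 # binary case: sign only
    levels = 2 ** (bits - 1) - 1          # symmetric signed range
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale

x = np.array([0.9, -0.31, 0.05, -0.77])
for bits in (8, 4, 2, 1):
    print(bits, quantize(x, bits))
```

Lowering the bit width shrinks multipliers and memory traffic roughly in proportion, which is the hardware motivation for making precision a runtime knob rather than a fixed design choice.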
Next: ImageNet table
Here are the results for ImageNet classification using binary CNNs.
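The hardware appeal of binary CNNs is that once weights and activations are constrained to ±1, a dot product collapses into an XNOR and a population count, replacing multipliers entirely. A minimal sketch of that equivalence (the bit encoding below is an illustrative convention, not any specific chip's):

```python
# Binary dot product via bitwise ops: encode each ±1 vector as a bit mask
# (bit set = +1, bit clear = -1). For n-element vectors,
#   dot(a, w) = n - 2 * popcount(a XOR w)
# since XOR marks exactly the positions where the signs disagree.

def binary_dot(a_bits, w_bits, n):
    """Dot product of two ±1 vectors of length n encoded as bit masks."""
    return n - 2 * bin(a_bits ^ w_bits).count("1")

# Example: a = [+1, -1, +1, +1], w = [+1, +1, -1, +1] (MSB first)
a, w = 0b1011, 0b1101
print(binary_dot(a, w, 4))  # (+1)(+1) + (-1)(+1) + (+1)(-1) + (+1)(+1) = 0
```

One machine word then carries 32 or 64 synapses, which is where the large energy and area savings over floating-point multiply-accumulate come from.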
Next: Table of comparison
The combination of computational reduction and memory compaction can produce marked benefits in the performance of CNNs, as shown in this comparison of Intel, IBM and Nvidia hardware, the Chinese DaDianNao machine learning supercomputer, Stanford University’s Efficient Inference Engine (EIE) and an IMEC ASIC simulation.
Next: Going analog
However, as well as simulating a binary machine, IMEC has made a hardware ASIC. This is a 65nm CMOS neural networking chip based on a ReRAM array that can perform self-learning. Raghavan showed the application of self-learning to a saw-tooth signal wave and to music.
The ReRAM memory is an OxRAM cell based on tantalum oxide, and the strength of the neural network weight is modeled on the conductance of the memory cell. A characteristic of the OxRAM cell is that the more it is written to, the larger its conductance becomes, which is used as an analog of the strength of the synapse connection weight. About one million cells are included on the chip. The memory is arranged as an array of sub-arrays and can flexibly model multiple synapses per chip.
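The write-strengthens-conductance behavior described above can be captured in a toy model: each write pulse nudges the cell's conductance toward a saturation level, and that conductance is read out as the synaptic weight. This is a simplified behavioral sketch under assumed parameters, not IMEC's cell physics.

```python
# Toy behavioral model of an OxRAM-like analog synapse: conductance rises
# with each write pulse and saturates, acting as the connection weight.

class OxramSynapse:
    def __init__(self, g_min=1.0, g_max=100.0, step=0.1):
        self.g = g_min          # current conductance (arbitrary units)
        self.g_max = g_max      # saturation conductance
        self.step = step        # fraction of remaining headroom per pulse

    def write(self):
        # Each pulse closes a fixed fraction of the gap to saturation,
        # so repeated writes monotonically strengthen the "synapse".
        self.g += self.step * (self.g_max - self.g)
        return self.g

s = OxramSynapse()
for _ in range(10):
    s.write()
print(round(s.g, 2))  # conductance has grown from 1.0 toward 100.0
```

Reading a weight is then just measuring a conductance, and strengthening it is just pulsing the cell, which is why an array of such cells can learn in place without shuttling weights to and from separate memory.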
One of the key things of this demonstration is the ability of machine learning hardware to continue learning in the field, said Raghavan.
While some people may be dismayed at the prospect of machines acquiring human-like learning characteristics, Raghavan argued that there are still many things that humans excel at that machines do not. These include so-called soft skills such as moral judgment, compassion, imagination, abstraction and generalization. It is the case that cognitive systems will excel in areas that used to be the sole domain of humans, such as pattern recognition, natural language interpretation and locating knowledge. Machines can also excel at skills humans have never had, such as avoiding bias and offering endless capacity.
In short, the time is ripe for machine learning hardware, and we are at the start of a long journey that promises to be beneficial for humankind – and fun.
Related links and articles: