This is set to be a product, as distinct from what is described as a demonstrator IC, and it is a technology that CEO Carlo Bozotti is enthusiastic about. Bozotti spoke about the technology during his keynote speech at the combined European MEMS, Imaging and Sensors summits, held in Grenoble, France. The European summits are hosted by the MEMS and Sensors Industry Group under the auspices of industry organization SEMI.
Bozotti said the technology could be used to distribute artificial intelligence throughout a system based on STM32 microcontrollers and sensors.
Speaking in the main program of the event, Thomas Boesch, a member of the technical staff at ST, said his company is now working on a new implementation of the technology for accelerating convolutional neural networks, one that is more optimized and more targeted.
In his talk, Boesch laid out the architecture and performance of the dedicated deep convolutional neural network (DCNN) SoC, reiterating much of what was disclosed at ISSCC (see ST, FDSOI lead machine learning spike at ISSCC).
The SoC is implemented in 28nm FDSOI and has an extended dynamic voltage and frequency scaling (DVFS) regime that allows it to operate at a clock frequency of 200MHz at 0.575V and at up to 1.1GHz at 1.1V.
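To give a rough sense of what that DVFS range means, the classic dynamic-power model P ∝ C·V²·f can be applied to the two published operating points. This is an illustrative sketch only: the chip's switched capacitance is not disclosed, so only the ratio between the points is computed, and static leakage is ignored.

```python
# Illustrative only: relative dynamic power across the two published
# DVFS operating points, using the P_dyn ~ C * V^2 * f model.
# Switched capacitance C is unknown, so we compare ratios, not watts.

def relative_dynamic_power(v, f_hz, v_ref, f_ref_hz):
    """Dynamic power relative to a reference operating point (C cancels)."""
    return (v ** 2 * f_hz) / (v_ref ** 2 * f_ref_hz)

low_v, low_f = 0.575, 200e6    # low-voltage, low-frequency point
high_v, high_f = 1.1, 1.1e9    # high-voltage, high-frequency point

ratio = relative_dynamic_power(high_v, high_f, low_v, low_f)
print(f"High point draws ~{ratio:.0f}x the dynamic power of the low point")
```

Under this simple model the top operating point burns roughly 20 times the dynamic power of the low one for 5.5 times the clock frequency, which is why dialing performance down for light-duty monitoring tasks pays off.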
This is useful for machine learning, Boesch said, giving an example where the low-performance node can be used to monitor for and detect an object efficiently, while the higher performance can be used to identify the object.
The architecture has eight DSP clusters, each of two custom 32bit DSPs, plus a coprocessor subsystem that includes eight convolutional accelerators, 16 DMA engines and a universal stream switch that allows unidirectional linking of all these resources.
Each convolutional accelerator includes 36 16bit-by-16bit multiplier-accumulators (MACs). In CNNs such as AlexNet, or more recent and advanced architectures, the convolutions themselves make up 85 to 90 percent of the workload. The architecture allows the hardware accelerators to operate autonomously of the DSPs and allows MAC operations to be concatenated to exploit parallelism and locality.
The overall chip includes 4Mbytes of on-chip RAM plus 8 by 192kbytes of local memory, and it is controlled by a Cortex-M4 processor core operating at a 1GHz clock frequency. The chip has a peak efficiency of about 2.9TOPS/W when running AlexNet, Boesch said.
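The published figures allow a back-of-envelope estimate of peak convolution throughput. This sketch assumes every MAC is busy on every cycle and counts a multiply-accumulate as two operations; real utilisation and op-counting conventions vary, so treat the numbers as indicative only.

```python
# Back-of-envelope peak throughput from the published figures.
# Assumes full MAC utilisation and counts a multiply-accumulate as
# two operations (one multiply, one add); conventions differ.

ACCELERATORS = 8          # convolutional accelerators on the SoC
MACS_PER_ACC = 36         # 16bit x 16bit multiplier-accumulators each
OPS_PER_MAC = 2           # one multiply + one add per cycle
CLOCK_HZ = 1.1e9          # top DVFS clock frequency

peak_ops = ACCELERATORS * MACS_PER_ACC * OPS_PER_MAC * CLOCK_HZ
print(f"Peak ~{peak_ops / 1e12:.2f} TOPS at 1.1GHz")

# At the quoted ~2.9 TOPS/W efficiency, sustaining that peak would
# imply power on the order of peak_ops / 2.9e12 watts.
implied_power_w = peak_ops / 2.9e12
print(f"Implied power at peak efficiency: ~{implied_power_w * 1e3:.0f} mW")
```

Roughly 0.63 TOPS of convolution throughput at the top clock, which at the quoted efficiency would correspond to a power budget of a couple of hundred milliwatts, consistent with the embedded positioning discussed below.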
Boesch would not reveal much about the second generation of the architecture, but there is a clue to at least one of the favoured application areas: Boesch was speaking in a session on technologies for autonomous vehicles. Deep convolutional neural networks (DCNNs) are often used for scene classification and object recognition, vital tasks in taking vehicles towards autonomous driving. It is also notable that ST has long been a supporter of automotive supplier Mobileye, now an Intel company, and that Mobileye has expressed strong views about the need for machine learning in automotive vision (see Video: Mobileye CTO on deep learning and automotive sensing).
When asked directly what the marketplace for ST’s reworking of its technology is, he said: “We are talking about the embedded world, mobile; you don’t want GPUs doing neural networks.”
In the automotive sector Nvidia has enjoyed some success re-applying its GPU technology to machine learning, but there are signs that the industry is now turning to dedicated neural network accelerators. Boesch said that while ST’s first chip was a demonstrator and a superset of the required capabilities, a future design could be stripped down and more tailored to an application, either as part of an SoC or on a microcontroller. It is notable that in 28nm FDSOI the DCNN demonstrator die measures 6.2mm by 5.6mm.
Boesch also noted the capability to link the demonstrator SoC with more of the same devices to cater for large CNN loads and, potentially, to support distributed computation.
Although Boesch was taciturn, Carlo Bozotti, CEO of STMicroelectronics, was more effusive in his keynote speech.
“This architecture is suitable for integration into a wide range of IoT devices. The sensors can be images, video, sound, motion or environmental information, or a combination of these. The neural networking array for deep learning that has been implemented in the device is scalable, depending on the type of input, and the training flows are optimized for industry standards,” Bozotti said.
He added: “Thanks to dynamic voltage scaling possible in FDSOI technology and the inherent architecture, the chip is 100 times more power efficient than low-power general purpose GPUs used for AI today and 2 to 5 times more efficient than the most advanced specialized AI and computer vision chips already announced.”
Related links and articles: