ARM tweaks its way to higher graphics and AI performance

By Nick Flaherty

The changes support more shader cores in the 10nm designs and machine learning algorithms for artificial intelligence. 

The ARM Mali G72 supports the same 32 shader cores in the Bifrost architecture that was introduced with the G71, but the sweet spot for the design is now around 20 cores, up from 12 to 16 in previous devices, says Anand Patel, director of product marketing for the media processing group.

“We expect new process nodes with higher core configurations,” he said. “As you go to higher core counts you tail off in performance, but with a core to core comparison you see more performance at the higher numbers.”

This has been achieved by optimizing the tiler block that controls the data flow to the shaders. “We have increased the capability of the tiler to improve the scalability and reduced the area of the individual shader cores so the net area is smaller on the same process,” he said. “It is significant tuning, identifying issues we didn’t see in G71 with the latest gaming content.”

ARM has made a big thing about AI performance, predicting a 50x increase with the cluster of A75 and A55 cores. In the G72, instructions such as reciprocal square root and General Matrix Multiply (GeMM) have been tuned for machine learning frameworks such as Caffe, and the execution cycles of other, less-used instructions have been reduced.

“We have not designed the G72 specifically for AI – the improvements we have made do help with AI but our focus is still graphics,” he said.

“Some of the gains are from targeted implementation,” he said. “We have increased the local L1 memories in certain places in the shader cores, and as the GeMM operation is memory intensive and I/O bound, this reduces the need for memory bandwidth and improves the performance.”
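The link between local memory and bandwidth in a matrix multiply can be illustrated with a small sketch. This is not ARM's implementation: the function name `gemm_tiled`, the `tile` parameter and the load-counting model are all assumptions made for illustration. The toy model assumes each block of the two input matrices is fetched from main memory once per block-level multiply and then reused from local memory.

```python
def gemm_tiled(a, b, tile):
    """Multiply square matrices a and b in tile x tile blocks.

    Returns (c, loads). `loads` counts input elements fetched from main
    memory under a toy model (an assumption, not a hardware measurement):
    each block of a and b is fetched once per block-level multiply and
    then reused from local memory.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    loads = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                i1 = min(i0 + tile, n)
                j1 = min(j0 + tile, n)
                k1 = min(k0 + tile, n)
                # Fetch one block of a and one block of b into local memory.
                loads += (i1 - i0) * (k1 - k0) + (k1 - k0) * (j1 - j0)
                # Accumulate the partial products for this output block.
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        acc = c[i][j]
                        for k in range(k0, k1):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c, loads


if __name__ == "__main__":
    n = 8
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for tile in (1, 4, 8):
        _, loads = gemm_tiled(a, b, tile)
        print(f"tile={tile}: {loads} element loads")
```

In this model an 8x8 multiply needs 1,024 element loads with a tile size of 1, and 256 with a tile size of 4, so a local memory large enough to hold bigger blocks cuts main memory traffic roughly in proportion to the tile size. Real GPU memory hierarchies are more complicated, so the numbers show only the trend.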

Patel sees a variety of architectures being used for AI. “What we have learnt over the last few years with computer vision is that no one size fits all. Some applications work better on the GPU, others on the specific accelerators, so you need a system level approach,” he said. “The GPU is coherent with the DynamIQ cluster so you don’t suffer from a cache miss. The frameworks can start to target specific IP, so the recent libraries launched by ARM allow the frameworks to target a GPU, a CPU or an accelerator. We need to be flexible across all the different platforms, whether it’s gaming or automotive.”

ARM expects the G72 to be in mass production in consumer devices in Q1 2018, which means 10nm system-on-chip devices are taping out now. Synopsys, for example, has supported customer tapeouts of designs using the A75 and A55 cores.
