ARM’s Bifrost steps up graphics, bridges to machine learning

Technology News | May 30, 2016

By Peter Clarke


The architecture includes maths capabilities that other software could use as part of a heterogeneous system architecture. That could include neural network software, but ARM executives stressed that Bifrost is first and foremost an architecture for raster, tile-based graphics processing units (GPUs).

The previous architecture – Midgard – is the one that underlies ARM’s T-series Mali GPUs and has up to 16 unified shader cores and a SIMD [single-instruction, multiple-data] instruction set architecture (ISA). Bifrost supports up to 32 unified shader cores with a scalar ISA, full hardware cache coherency and something called clause execution.

Top-level architecture of Bifrost showing up to 32 universal shader cores. Source: ARM.

Inside the shader core showing quad-thread fragment management and execution engines. Source: ARM.

The primary goal, according to Sean Ellis, GPU architect with ARM, was to achieve more performance per square millimeter of silicon and per line of “real-world” shader code. This has been achieved to the tune of about 50 percent through the use of a new scalar, clause-based ISA with quad-based arithmetic units.

Whereas Midgard GPUs use SIMD vectorization, Bifrost GPUs will use quad vectorization, in which four scalar threads from a 2-by-2 pixel quad are executed in lock step. Each thread fills one 32-bit lane of the hardware, and four threads doing a vec3 FP32 add take three cycles. In short, quad vectorization is compiler friendly and improves resource utilization.
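The cycle count quoted above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the function name `quad_vec3_add` and the one-component-per-cycle schedule are invented for illustration and are not taken from ARM's documentation.

```python
# Toy model of quad vectorization (illustrative assumption, not ARM's design):
# four scalar threads from a 2x2 pixel quad run in lock step, one 32-bit lane each.

def quad_vec3_add(a, b):
    """Add vec3 operands for four threads; return (results, cycles).

    a, b: lists of four 3-component tuples, one per thread in the quad.
    """
    assert len(a) == len(b) == 4, "a quad holds exactly four threads"
    results = [[0.0, 0.0, 0.0] for _ in range(4)]
    cycles = 0
    for component in range(3):      # x, y, z: one scalar op per cycle
        for lane in range(4):       # all four lanes execute in the same cycle
            results[lane][component] = a[lane][component] + b[lane][component]
        cycles += 1
    return results, cycles


quad_a = [(1.0, 2.0, 3.0)] * 4
quad_b = [(0.5, 0.5, 0.5)] * 4
sums, cycles = quad_vec3_add(quad_a, quad_b)
print(cycles)  # 3
```

In this model all four lanes are busy on each of the three cycles, whereas a four-wide SIMD unit working on a single thread's vec3 would leave one of its lanes unused.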

Clause execution is another refinement, used to reduce overhead compared with the previous graphics architecture. A “clause” is defined as a sequence of instructions that are self-dependent and without variable latency. Whereas previously temporary registers were used after every instruction, under Bifrost an architecturally visible state through temporary registers is guaranteed only after each clause. The back-to-back execution of instructions within a clause allows for aggressive optimization and saves power. Clause boundaries are decided in the compiler, Ellis told journalists and analysts.
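How a compiler might place clause boundaries can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `VARIABLE_LATENCY` opcode set, the opcode names and the splitting rule are hypothetical and do not describe ARM's actual compiler.

```python
# Sketch of compiler-side clause formation (assumed rule, for illustration only).
# VARIABLE_LATENCY is a hypothetical set of opcodes; real hardware differs.
VARIABLE_LATENCY = {"texture", "load", "store"}


def form_clauses(instructions):
    """Group opcodes into clauses of fixed-latency instructions.

    A variable-latency opcode ends the current clause and is emitted
    on its own, so every multi-instruction clause can run back to back.
    """
    clauses, current = [], []
    for op in instructions:
        if op in VARIABLE_LATENCY:
            if current:
                clauses.append(current)
                current = []
            clauses.append([op])    # boundary: latency unknown at compile time
        else:
            current.append(op)
    if current:
        clauses.append(current)
    return clauses


program = ["fmul", "fadd", "texture", "fadd", "fmul", "fadd", "store"]
print(form_clauses(program))
```

In this toy model, state only needs to be visible at the boundaries between the returned groups, which is where the overhead saving described above would come from.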

When asked if there was specific support within Bifrost for GPU computing – where the GPU is used to run software to which it may be better suited than the CPU core cluster – ARM executives said that decisions had been taken to include support for a variety of data types that are not generally used in graphics. These include 8-, 16- and 32-bit integers as well as 16-bit floating point.

FP16 can be used for some pixel shaders at twice the nominal throughput. Similarly, Bifrost supports 64-bit floating-point precision at half the nominal throughput. Meanwhile, the integer math and FP16 are useful for deep learning applications, Ellis said.

ARM has never been particularly keen on the ray-tracing approach to graphics rendering, which is completely different from tile-based rendering. Indeed, in 2013 it acquired Geomerics Ltd., a leader in software engines for lighting effects in games. Ellis told eeNews Europe: “Ray tracing is not explicitly excluded [from Bifrost]. But we can do lighting, shadowing, glare effects in other ways.”

Vulkan

Vulkan is a 3D graphics API for the next 20 years, said Jem Davies, ARM Fellow and vice president of technology for media processing. “Vulkan 1.0 was released in February with unprecedented support. It is available on the desktop in Windows and Linux and will be supported in the upcoming N generation of the Android operating system.

“In 2014 the traditional 3D APIs were in trouble with unpredictable performance and the emergence of proprietary efforts such as Mantle, DX12.” So a crash effort in a next-generation OpenGL initiative was launched. AMD donated its Mantle technology.

The major result is that under Vulkan more responsibility is given to the application, making for a lower-overhead driver. The application handles memory allocation, resources and thread management to generate command buffers. Vulkan is multithread and multicore friendly, and error checking is opt-in, said Davies. “Vulkan is a great fit for mobile graphics architectures because there is no wasted effort trying to look like a desktop GPU,” he added.

ARM already has Vulkan drivers for T880/T860/T760 and the Mali-G71 driver is ready and awaiting silicon.

And progress continues, with Vulkan 1.1 expected soon, said Davies. “I think we will see features added to further reduce power and bandwidth. Thermal throttling of processors is a big deal.” Davies said that texture compression helps in this regard and that AFBC [ARM Frame Buffer Compression] is becoming commonly supported, but when asked if AFBC would be standardized within Vulkan 1.1 he said: “We would welcome AFBC being established as standard but it’s unlikely.”

However, Vulkan 1.1 could also include further developments to support GPU computing. “The GPU-compute voice is getting louder as time goes on,” Davies said.
