ARM-based SoCs from Qualcomm have already been used for Qualcomm’s initial Always-Connected PC offerings, although this met with some resistance from rival Intel (see Intel warns over Snapdragon PC, Qualcomm responds). Now ARM is claiming the microarchitecture implementation in the Cortex-A76 CPU enables a 35 percent uplift in performance year-over-year. The Mali-G76 provides support for both gaming and machine learning with 30 percent higher efficiency and performance over previous generations. The Mali-V76 is set up to support playback at up to UHD 8K resolution.
The suite of IP is intended not only to enable a second generation of laptop processors and laptop computers with 20-hour-plus battery life, but also to provide more opportunities for smartphone innovation.
Based on the same ARMv8.2 architecture as its predecessor, the Cortex-A76 also supports DynamIQ big-little combinations. Implemented on a 7nm process at a 3GHz clock frequency, the Cortex-A76 is expected to achieve 35 percent more performance, 40 percent better power efficiency and 4 times the machine learning performance of a Cortex-A75 implemented in 10nm silicon and running at a 2.8GHz clock frequency.
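The headline comparison mixes a clock-frequency increase with microarchitecture gains. As a rough sketch, and assuming performance scales linearly with clock frequency (a simplification that ARM does not state and that ignores memory and process effects), the quoted figures can be split into a clock component and an implied per-clock component. The function and variable names are illustrative only.

```python
# Figures quoted in the article; the frequency-normalised split is our own arithmetic.
A76_CLOCK_GHZ = 3.0
A75_CLOCK_GHZ = 2.8
TOTAL_SPEEDUP = 1.35  # "35 percent more performance"


def per_clock_speedup(total_speedup, new_clock_ghz, old_clock_ghz):
    """Divide out the clock ratio, assuming performance scales linearly with clock."""
    return total_speedup / (new_clock_ghz / old_clock_ghz)


clock_ratio = A76_CLOCK_GHZ / A75_CLOCK_GHZ
ipc_ratio = per_clock_speedup(TOTAL_SPEEDUP, A76_CLOCK_GHZ, A75_CLOCK_GHZ)

print(f"clock ratio: {clock_ratio:.3f}")
print(f"implied per-clock uplift: {ipc_ratio:.3f}")
```

Under that assumption, roughly 7 percent of the uplift would come from the higher clock and about 26 percent from per-clock improvements.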
How Cortex-A76 compares. Source: ARM
ARM has not iterated the design of the equivalent little processor, which remains the Cortex-A55. It could be paired with the Cortex-A76, typically in 4/4, 2/8 or 1/7 big-little combinations.
“Cortex-A76 represents the best fit for the laptop space because the performance uplifts allow exceptional delivery of the most important productivity apps such as the Microsoft Office suite, providing a much faster, smoother user experience. Cortex-A76 based laptops are expected to deliver twice the performance on the current Arm based generation,” ARM states on its website.
Several changes have been made in the Cortex-A76 to increase performance, raise instructions per clock cycle and deepen memory-level parallelism.
Some of the key enhancements include:
Decoupled branch prediction and instruction fetch: Built to hide latency at high bandwidth, the in-order Cortex-A76 front-end is able to fetch 4 to 8 instructions per cycle.
Cortex-A76 provides a 4-instruction-wide decode core, increasing the maximum instructions-per-cycle capability. Up to 8 operations per cycle can then be dispatched to the out-of-order core.
Quad-issue integer units are integrated in the core, including three simple ALUs and a multicycle integer unit. Cortex-A76 supports dual-issue native 16-byte (128-bit) vector and floating-point units, twice the throughput of any previous Arm CPU.
The Cortex-A76 IP is available for TSMC 16FFC and 7FF manufacturing processes.
How Mali-G72 compares. Wider execution saves power in datapath control. Source: ARM
The Mali-G76 complies with ARM’s Bifrost GPU architecture and provides 30 percent more performance and 30 percent more energy efficiency compared with the Mali-G72 on the same node and under the same clock frequency and other conditions.
The Mali-G76, like the Mali-G52 before it, supports 8-bit integer dot products. This is key to machine learning performance. Some leading-edge applications may have dedicated machine learning acceleration while others may not, so strong support within the CPU and GPU, as well as in dedicated hardware, would form part of a heterogeneous approach. Mali-G76 offers 2.7x the machine learning performance of the Mali-G72.
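To illustrate why an 8-bit integer dot product matters for machine learning, here is a minimal sketch of the operation itself: four signed 8-bit values from each input are multiplied pairwise and summed into a wider 32-bit accumulator. This is a generic software model, not a description of how Mali-G76 implements it; the function name and the four-element width are assumptions made for illustration.

```python
def int8_dot_accumulate(acc, a, b):
    """Return acc + sum(a[i] * b[i]) for four signed 8-bit values,
    wrapped to a signed 32-bit result (hypothetical helper)."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected four-element vectors")
    for value in (*a, *b):
        if not -128 <= value <= 127:
            raise ValueError("value out of signed 8-bit range")

    total = acc + sum(x * y for x, y in zip(a, b))

    # Wrap to a signed 32-bit integer, as a fixed-width accumulator would.
    total &= 0xFFFFFFFF
    return total - 0x100000000 if total >= 0x80000000 else total


# Quantized weights and activations, as might appear in a neural-network layer.
weights = [1, 2, 3, 4]
activations = [5, 6, 7, 8]
print(int8_dot_accumulate(10, weights, activations))
```

Packing four multiply-accumulates into one operation on narrow data is what lets quantized inference workloads do more work per cycle than they could with 32-bit arithmetic.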
Mali-G76 also benefits from a dual texture mapper, providing twice the throughput of the Mali-G72.
The third element in a laptop processor SoC would be the Mali-V76, which supports 8K decode at up to 60 frames per second or four 4K streams at 60 frames per second, giving consumers the opportunity to stream four movies, record video while video conferencing, or watch four games in 4K. At lower resolution, though still at full HD, the Mali-V76 will support up to 16 streams of content, creating a 4×4 video wall, a very popular use case in the Chinese market.
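The three decode modes quoted for the Mali-V76 line up as the same raw pixel throughput. The short check below assumes the standard UHD and full HD frame sizes (7680×4320, 3840×2160 and 1920×1080) and assumes 60 frames per second for the 16-stream full HD case, which the article does not state explicitly.

```python
# Assumed pixel dimensions (standard UHD/FHD sizes; not stated in the article).
RESOLUTIONS = {
    "8K": (7680, 4320),
    "4K": (3840, 2160),
    "FHD": (1920, 1080),
}


def pixels_per_second(resolution, streams, fps):
    """Total decoded pixels per second across all streams."""
    width, height = RESOLUTIONS[resolution]
    return width * height * streams * fps


# (resolution, number of streams, frames per second); 60 fps for FHD is an assumption.
modes = [("8K", 1, 60), ("4K", 4, 60), ("FHD", 16, 60)]
rates = {f"{n}x{res}@{fps}": pixels_per_second(res, n, fps) for res, n, fps in modes}

for label, rate in rates.items():
    print(f"{label}: {rate:,} pixels/s")
```

Each mode works out to the same figure, which is consistent with one 8K stream, four 4K streams and a 4×4 full HD wall all drawing on the same decode budget.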