ARM Cortex-R82 targets storage computation

By Nick Flaherty

ARM has launched its first 64bit real-time processor design, adding an MMU so that it can run Linux.

The Cortex-R82 delivers twice the performance of the previous 32bit R8 and R5 designs and is aimed squarely at real-time embedded systems, particularly Solid-State Drives (SSDs). These systems have historically required less than 4GB of DRAM and addressable space and have not needed to run Linux. Now, with AI algorithms, there is a need for higher performance, real-time compute with more addressable space and the ability to run Linux, says Neil Werdmuller, senior manager for storage solutions at ARM.

With a footprint of 2mm² for four cores in a 5nm process, the Cortex-R82 can be used as part of an array of cores, some running a real-time operating system or bare metal code directly on the core, and others running Linux, configurable in software.

This compares to one major disk drive developer, Western Digital, which is designing its own 32bit embedded processor cores around the open source RISC-V architecture. However, ARM points out that around 85 percent of hard disk drive controllers and solid-state drive controllers use its technology.

The core uses 64bit ARMv8-R instructions with 40 address bits to support 1TByte of memory directly, and adds the optional memory management unit (MMU) that allows it to run Linux, giving access to the Linux ecosystem that has been ported, optimized and validated on ARM’s Cortex-A cores. It also supports ARM’s Neon technology for accelerating Machine Learning frameworks that are widely used in computational storage applications. The ARM NN library, for example, can be used to search for a specific image in a drive full of images.
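
The address-space figures follow from simple arithmetic. As a quick check, here is a short Python sketch (an illustration only, not ARM code; the function name is made up for this example) comparing 32 and 40 address bits on a byte-addressed system:

```python
def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with the given number of address bits (byte addressing)."""
    return 1 << address_bits

GIB = 1 << 30  # 2^30 bytes
TIB = 1 << 40  # 2^40 bytes

legacy = addressable_bytes(32)  # 32bit designs such as the R8 and R5
r82 = addressable_bytes(40)     # 40 address bits on the Cortex-R82

print(f"32 address bits: {legacy // GIB} GiB")
print(f"40 address bits: {r82 // TIB} TiB")
print(f"increase: {r82 // legacy}x")
```

The extra eight address bits multiply the directly addressable space by 256, from 4GB to 1TByte.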

A Cortex-R82 core can still be configured with a Memory Protection Unit (MPU) to run bare metal code and an RTOS, but the same core can also be configured with the optional MMU. Both the real-time and MMU contexts can be handled by the same core, with context switching between them, or selected cores in a cluster can be dedicated to real-time or Linux, which increases the flexibility of an SoC design. This choice is handled by software and can even be changed at run time, enabling the balance to be adjusted depending on demand.
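
As a rough picture of what "handled by software" could look like, the Python sketch below splits a cluster between real-time and Linux roles and changes the split on demand. It is a toy model: the `Role` enum and `assign_roles` function are hypothetical names invented for this illustration, and the article does not describe the actual firmware mechanism.

```python
from enum import Enum
from typing import List

class Role(Enum):
    REAL_TIME = "real-time (MPU context)"
    LINUX = "Linux (MMU context)"

def assign_roles(num_cores: int, linux_cores: int) -> List[Role]:
    """Return one role per core, giving `linux_cores` of them to Linux."""
    if not 0 <= linux_cores <= num_cores:
        raise ValueError("linux_cores must be between 0 and num_cores")
    real_time_cores = num_cores - linux_cores
    return [Role.REAL_TIME] * real_time_cores + [Role.LINUX] * linux_cores

# Hypothetical four-core cluster: mostly real-time under heavy I/O load,
# mostly Linux when more on-drive compute is wanted.
heavy_io = assign_roles(4, 1)
heavy_compute = assign_roles(4, 3)

print([role.name for role in heavy_io])
print([role.name for role in heavy_compute])
```

The point of the sketch is only that the split is a software decision that can be revisited, rather than something fixed when the SoC is built.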

In such a situation, managing the protection and security of the cores is key. The core has three Exception levels (ELs). EL2 is the highest level and enables a secure enclave and the separation and isolation of virtual machines for OEM code and customer code. More specifically, a Memory Protection Unit (MPU, real-time) context running at EL2 handles context switches between MPU and MMU contexts at EL1, which run OEM and/or OS code, while user code runs at EL0.

This means Linux can be running and when a real-time event occurs, the processor can switch to handle the real-time event, then switch back to Linux. The security enables isolation of the main firmware and enables end customers of Cortex-R82 based devices to add custom software, either real time or Linux based. 
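
The switch to a real-time event and back can be pictured with a small toy model in Python. All names here (`El2Supervisor`, `real_time_event` and so on) are hypothetical and chosen for illustration; the sketch models only the ordering of the switch, not real exception handling, memory protection or timing.

```python
class El2Supervisor:
    """Toy stand-in for the EL2 context that switches between EL1 contexts."""

    def __init__(self) -> None:
        self.active = "linux"  # EL1 context currently running
        self.log = []          # (context, work) pairs in execution order

    def run(self, work: str) -> None:
        """Record a piece of work as running in the active EL1 context."""
        self.log.append((self.active, work))

    def real_time_event(self, handler: str) -> None:
        """Switch to the real-time context, run the handler, switch back."""
        previous = self.active
        self.active = "real-time"
        self.run(handler)
        self.active = previous

sup = El2Supervisor()
sup.run("filesystem scan")
sup.real_time_event("host read completion")
sup.run("filesystem scan (resumed)")

for context, work in sup.log:
    print(f"{context:>9}: {work}")
```

In the model, Linux work is interrupted, the real-time handler runs in its own context, and Linux then resumes, which is the sequence described above.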

The direct addressability enables real-time systems with very large memories or devices and improves performance over windowing solutions. This large address space can be accessed over either AXI or CHI to enable additional capabilities including atomics and cache stashing.
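
To illustrate why windowing costs performance, the Python sketch below counts how often a windowed design would have to move its window for a given sequence of accesses. It assumes a generic scheme in which a core sees a large memory through one movable window; the 1 GiB window size and the access trace are invented for the example and are not taken from any real controller.

```python
GIB = 1 << 30
WINDOW_SIZE = 1 * GIB  # hypothetical window size; real designs vary

def window_remaps(addresses, window_size=WINDOW_SIZE):
    """Count window moves needed to service the accesses in order.

    The initial programming of the window counts as the first move.
    """
    remaps = 0
    current = None
    for addr in addresses:
        window = addr // window_size
        if window != current:
            remaps += 1
            current = window
    return remaps

# Accesses alternating between two regions 8 GiB apart.
trace = [0x1000, 8 * GIB, 0x2000, 8 * GIB + 64]

print("window moves needed:", window_remaps(trace))
print("all addresses fit in 40 bits:", all(a < (1 << 40) for a in trace))
```

With a 40bit address every access in the trace can be issued directly, whereas the windowed model has to reprogram its window on each of the four accesses.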

The ability to run both real-time and Linux on the same core or cluster of cores is key in emerging technologies such as computational storage. The real-time capability is required for the data transfers through the SSD, but running Linux and associated software tools directly on the drive provides computational workload management and filesystem recognition. This greatly reduces data movement, latencies and energy consumption, says Werdmuller.

Using an array of R82 cores means one SoC could be used for an ordinary enterprise SSD and reconfigured for a computational storage drive (CSD) product, or even be dynamically configured through software to run SSD functions during the day and switch to computational storage at night.

On a 5nm process, an array of four R82 cores with 32KB L1 instruction and data caches, a full floating point and SIMD engine and a 1MByte L2 cache runs at 1.8GHz, delivers 30DMIPS/mW and takes up 2mm².

eeNews Europe