ARM details updated v9 architecture

By Nick Flaherty

ARM has released more details of the latest version of its Armv9-A architecture.

The complete Arm Architecture Reference Manual (Arm ARM), covering the 2022 extensions and earlier functionality, is due for release in early 2023. The Instruction Set and System Register information was released this week.

This is for the A class of processor cores, used in high performance data centre chips, rather than the M-class microcontroller cores. 

The v9 architecture has been revised in partnership with lead customers, notably Nvidia, which has suggested changes for its Grace data centre chip that uses the Neoverse V2 core design. The additions include a new generation of scalable matrix and vector operations as part of the Scalable Matrix Extension (SME2) to accelerate AI applications.

The 2022 extensions include several other updates to the Virtual Memory System Architecture (VMSA) as well as translation hardening and new instructions to make designs more secure.

The extensions introduce a new way to control memory permissions. Instead of directly encoding the permission in the Translation Table Entry (TTE), fields in the TTEs are used to index into an array of permissions specified in a register. This indirection provides greater flexibility, greater encoding density and enables the representation of new permissions.

Each TTE can select two values, a base permission, and an overlay. The base permission represents the maximum set of permissions that the block or page has. The overlay can be used to further restrict the permission.

The base permission is permitted to be cached in TLBs, but the overlay is not. This means that the effective permission of a block or page can be efficiently changed dynamically.

For operating systems, the architecture provides separate EL1 and EL0 overlay registers. This can allow an operating system to set a maximum permission for a page allocated to an application, then allow the application to further manage permissions within those constraints. For example, a JIT might be allocated a page that the operating system permits to be writeable or executable. The JIT could then control, with the overlays, whether the page was currently writeable or executable. This has the advantage of reducing the number of system calls and TLB invalidations.
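
The base-plus-overlay scheme can be modelled in a few lines of Python. This is a simplified sketch, not the architectural encoding: the `Perm` flags, the array contents, the index widths and the `jit_tte` dictionary are all assumptions made for illustration.

```python
from enum import IntFlag


class Perm(IntFlag):
    NONE = 0
    R = 1
    W = 2
    X = 4


# Hypothetical register contents: each register is a small array of permission sets.
# The TTE holds only indices into these arrays, not the permissions themselves.
base_perms = [Perm.NONE, Perm.R, Perm.R | Perm.W, Perm.R | Perm.W | Perm.X]
overlay_perms = [Perm.R | Perm.W | Perm.X, Perm.R | Perm.W, Perm.R | Perm.X, Perm.R]


def effective_permission(base_index, overlay_index):
    """The overlay can only remove permissions that the base grants."""
    return base_perms[base_index] & overlay_perms[overlay_index]


# JIT page: the OS grants R|W|X as the base (slot 3); the TTE selects overlay slot 1.
jit_tte = {"base_index": 3, "overlay_index": 1}

# While generating code, overlay slot 1 allows read and write but not execute.
while_writing = effective_permission(**jit_tte)

# The JIT rewrites its own overlay slot; the TTE itself is left unchanged.
overlay_perms[1] = Perm.R | Perm.X
while_running = effective_permission(**jit_tte)
```

In this model the switch from writeable to executable touches only the overlay array, which mirrors why the real mechanism can avoid a system call and a TLB invalidate for each change.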

Permission indirection also has benefits where the same tables are shared by multiple entities. For example, a set of tables might be used by both an Arm processor and an Arm System Memory Management Unit (SMMU).  The permissions that we want to apply to software accesses might be different to those we want to apply to an accelerator behind the SMMU. With permission indirection, the processor and SMMU can use the same tables but interpret the permissions differently.
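
As a rough illustration of shared tables, the sketch below resolves one descriptor index against two different permission arrays. The array contents and the names `cpu_perms`, `smmu_perms` and `lookup` are hypothetical and do not correspond to real register layouts.

```python
# Hypothetical permission arrays, one per observer of the shared tables.
cpu_perms = ["none", "read-only", "read-write", "read-execute"]
smmu_perms = ["none", "read-only", "read-only", "none"]


def lookup(perm_array, tte_index):
    """Resolve the index stored in a shared TTE against one observer's array."""
    return perm_array[tte_index]


# The same descriptor field is interpreted differently by each observer.
shared_tte_index = 2
cpu_view = lookup(cpu_perms, shared_tte_index)
smmu_view = lookup(smmu_perms, shared_tte_index)
```

Here software running on the processor may write the page, while the accelerator behind the SMMU sees it as read-only, without maintaining a second set of tables.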

The translation tables are fundamental to the isolation model and are a high-value target for attackers. The 2022 extensions introduce a series of features to harden the MMU table walk process by reducing the available attack surface.

New ARMv9 instructions 

A new instruction type, RCW (Read-Check-Write), has been added for updating translation table entries, while a new stage 2 “Mostly Read-only” (MRO) permission enables software to restrict what can write into a page. A page marked as MRO permits hardware updates of the Access Flag and Dirty state, as well as updates due to an RCW instruction. However, other forms of store, such as STR (store) instructions, will fail with a permission fault.

Together, the Protected attribute at stage 1 and the MRO permission at stage 2 give robust protection against many types of attack. The MRO attribute prevents stores, other than those from RCW instructions, from changing mappings. The Protected attribute and RCW instruction limit which fields in TTEs can be updated.
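
The MRO write rule can be expressed as a simple filter on the source of a write. The sketch below is a behavioural model only: `WriteSource`, `PermissionFault` and `write_to_page` are illustrative names, and the field-level restrictions imposed by the Protected attribute are not modelled.

```python
from enum import Enum, auto


class WriteSource(Enum):
    STORE = auto()           # ordinary store such as STR
    RCW = auto()             # RCW instruction
    HW_ACCESS_FLAG = auto()  # hardware update of the Access Flag
    HW_DIRTY = auto()        # hardware update of the dirty state


class PermissionFault(Exception):
    """Raised when a write is not allowed by the page permission."""


# Write sources that an MRO page accepts, as described in the text.
MRO_ALLOWED = {WriteSource.RCW, WriteSource.HW_ACCESS_FLAG, WriteSource.HW_DIRTY}


def write_to_page(page_is_mro, source):
    """Return True if the write proceeds; raise PermissionFault otherwise."""
    if page_is_mro and source not in MRO_ALLOWED:
        raise PermissionFault(f"{source.name} is not permitted on an MRO page")
    return True
```

Under this model, a page holding translation tables can be marked MRO so that an ordinary store faults while table updates made through RCW still succeed.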

The feature also introduces a stage 2 attribute, AssuredOnly, that can be used to ensure that only Protected tables can point to a certain page. This is to help protect against aliasing attacks.

ARM is also adding a new translation table format to Armv9-A. The translation format follows the same principle as the existing format but increases the size of each descriptor to 128 bits. The new format allows larger output addresses and leaves room for new attribute fields.


ARM announced the Scalable Matrix Extension (SME) for Armv9-A in 2021. This added new capabilities to efficiently process matrices, including matrix tile storage and outer-product operations. SME2 significantly extends the capabilities with instructions for multi-vector operations, multi-vector predicates, range prefetches and 2-bit and 4-bit weight compression.

The new instructions enable SME2 to accelerate more workloads than the original SME, including GEMV, non-linear solvers, small and sparse matrices, and feature extraction or tracking.

Guarded Control Stack (GCS)

ARM also adds support for a Guarded Control Stack (GCS) in Armv9-A to protect against return-oriented programming (ROP) attacks. GCS also provides an efficient mechanism for profiling tools to get a copy of the current call stack, without needing to unwind the main stack.

A GCS is a protected region of virtual address space allocated by software. When the processor executes a Branch with Link instruction, the return address is pushed onto the GCS as well as being written into the Link Register (LR). On a procedure return, the latest stored return address is popped from the GCS. The processor either compares the popped value with the LR, or uses the popped value directly.
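
The push-and-compare behaviour can be modelled as a minimal sketch. The `Core` class, its method names and the `GCSFault` exception are hypothetical, and only the compare variant of the return is shown.

```python
class GCSFault(Exception):
    """Raised when the return address in LR does not match the GCS entry."""


class Core:
    def __init__(self):
        self.gcs = []   # stands in for the protected control stack region
        self.lr = None  # Link Register

    def branch_with_link(self, return_address):
        # The return address goes to both the LR and the GCS.
        self.lr = return_address
        self.gcs.append(return_address)

    def procedure_return(self):
        # Pop the latest stored return address and compare it with the LR.
        expected = self.gcs.pop()
        if expected != self.lr:
            raise GCSFault(f"LR {self.lr:#x} does not match GCS entry {expected:#x}")
        return expected


core = Core()
core.branch_with_link(0x4000)
target = core.procedure_return()  # LR is intact, so the return proceeds
```

If an attacker overwrote the saved return address so that a different value was loaded into the LR before the return, the comparison in `procedure_return` would fail in this model instead of branching to the attacker's address.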

There are times when the software needs to make manual adjustments to the control stack, for example to handle some long jumps. To enable this, the architecture provides dedicated instructions for maintaining the GCS: GCSPUSHx and GCSPOPx.

To prevent accidental or malicious changes to the GCS, a new Stage 1 permission is introduced. This permission allows reads by software, but restricts writes to those made by GCSPUSH instructions or as a side-effect of executing a BL.

Confidential Computing

In 2021 Arm announced the Realm Management Extension (RME), part of the Arm Confidential Compute Architecture. The 2022 extensions enhance RME with Memory Encryption Contexts to support multiple memory encryption contexts for the Realm physical address space. This can be used to implement memory encryption with a unique key for each Realm, which provides defence-in-depth to the security already afforded by Realms.

A Device Assignment extension enhances the RME System Architecture to enable the secure assignment of devices to Realms. Each Realm can independently choose whether to allow an off-processor resource such as an accelerator to access a region of its address space.

Other enhancements include support for a Hybrid Vector Length Agnostic (HVLA) programming model in SVE2, updates to the Memory Tagging Extension (MTE), Performance Monitor (PMU) snapshot support and a fixed-function instruction counter.
