Nvidia boosts safety critical design with Jetson Orin AGX module

By Nick Flaherty

Nvidia has launched its latest Jetson System-on-Module for high performance machine learning with safety critical hardware.

The Jetson AGX Orin SoM measures 100mm x 87mm and delivers six times the performance of Jetson AGX Xavier in a pin compatible form factor. The module is aimed at next generation autonomous delivery and logistics robots, factory systems and large industrial UAVs.

“Jetson AGX Orin is the most powerful AI edge computer, delivering up to 200 TOPS of AI performance for powering autonomous systems,” said Leela Karumbunathan, Hardware Product Manager, Autonomous Machines. 

The module uses the Orin SoC, which combines an Nvidia Ampere architecture GPU that supports sparse AI with ARM Cortex-A78AE CPU cores, next-generation deep learning and vision accelerators, and a video encoder and decoder. High speed IO, 204 GB/s of memory bandwidth, and 32 GB of DRAM enable the module to feed multiple concurrent AI application pipelines.

The Jetson AGX Orin uses an integrated Ampere GPU composed of two Graphics Processing Clusters (GPCs), eight Texture Processing Clusters (TPCs) and 16 Streaming Multiprocessors (SMs), with 192 KB of L1 cache per SM and 4 MB of L2 cache. Each SM has 128 CUDA cores and four Tensor Cores, compared to the 64 CUDA cores per SM of the previous generation Volta architecture. This provides a total of 2048 CUDA cores and 64 Tensor Cores with up to 131 Sparse TOPS of INT8 Tensor compute, and up to 4.096 FP32 TFLOPS of CUDA compute.
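The totals quoted above are consistent with 16 SMs, each carrying 128 CUDA cores and four Tensor Cores. The short Python check below is only a sketch: the 1 GHz GPU clock and the two floating point operations per CUDA core per cycle (one fused multiply-add) are assumptions inferred from the 4.096 TFLOPS figure, not values given in the announcement.

```python
# Figures quoted in the article.
SMS = 16
CUDA_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4

# Assumptions (not stated in the article): a 1 GHz GPU clock and one
# fused multiply-add (2 floating point operations) per CUDA core per cycle.
ASSUMED_CLOCK_HZ = 1.0e9
ASSUMED_FLOPS_PER_CORE_PER_CYCLE = 2

cuda_cores = SMS * CUDA_CORES_PER_SM
tensor_cores = SMS * TENSOR_CORES_PER_SM
fp32_tflops = cuda_cores * ASSUMED_FLOPS_PER_CORE_PER_CYCLE * ASSUMED_CLOCK_HZ / 1e12

print(cuda_cores, tensor_cores, fp32_tflops)
```

Under those assumptions the script reproduces the 2048 CUDA cores, 64 Tensor Cores and roughly 4.096 FP32 TFLOPS cited for the module.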

The sparsity is supported with a fine-grained compute structure that doubles throughput and reduces memory usage.

The biggest change in the CPU in Jetson AGX Orin is moving from Nvidia’s custom Carmel CPU clusters to the ARM Cortex-A78AE. The Orin CPU complex is made up of 12 cores running at 2GHz, each with a 64KB L1 instruction cache, a 64KB L1 data cache and 256KB of L2 cache. Like Jetson AGX Xavier, each cluster also has 2MB of L3 cache.

This enables 1.7x the performance compared to the eight core Carmel CPU on Jetson AGX Xavier.

The updated Tensor Cores provide the performance necessary to accelerate next generation AI applications. These are programmable fused matrix-multiply-and-accumulate units that execute concurrently alongside the CUDA cores to implement floating point HMMA (Half-Precision Matrix Multiply and Accumulate) and IMMA (Integer Matrix Multiply and Accumulate) instructions for accelerating dense linear algebra computations, signal processing, and deep learning inference.
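As a rough illustration of what a matrix multiply-and-accumulate computes, the Python sketch below evaluates D = A x B + C on a small tile. It shows only the arithmetic: the `mma` helper is hypothetical, and the tile size, data types and scheduling of the real Tensor Core instructions are not modelled.

```python
def mma(a, b, c):
    """Return d = a x b + c for small matrices given as lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [row[:] for row in c]  # start from a copy of the accumulator
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                d[i][j] += a[i][k] * b[k][j]
    return d


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
result = mma(a, b, c)
print(result)  # [[20, 23], [44, 51]]
```

Using integer inputs here loosely mirrors the IMMA case; half-precision inputs with a wider accumulator would correspond to HMMA.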

The Tensor Cores support 16x HMMA, 32x IMMA, and a new sparsity feature. Sparsity is constrained to at most 2 out of every 4 weights being nonzero, which lets the Tensor Core skip the zero values, doubling the throughput and significantly reducing the memory storage. Networks can be trained first on dense weights, then pruned, and later fine-tuned on sparse weights.
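The 2-out-of-4 pattern can be sketched in a few lines of Python. This is an illustrative model only: keeping the two largest-magnitude weights in each group is an assumed pruning rule (the announcement does not say how weights are selected), and the function names are hypothetical.

```python
def prune_2_of_4(weights):
    """Keep two weights per group of four; return kept values and positions."""
    assert len(weights) % 4 == 0, "length must be a multiple of four"
    values, positions = [], []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Assumed rule: keep the two largest magnitudes in each group.
        by_magnitude = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        for i in sorted(by_magnitude[:2]):
            values.append(group[i])
            positions.append(start + i)
    return values, positions


def sparse_dot(values, positions, activations):
    """Dot product that touches only the kept weights."""
    return sum(v * activations[p] for v, p in zip(values, positions))


weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.0, -0.3, 0.01]
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

values, positions = prune_2_of_4(weights)
print(values, positions)
print(sparse_dot(values, positions, activations))
```

Only half of the weights are stored and multiplied, which is where the halved weight storage and doubled throughput come from. The position list stands in for the small per-group index metadata a compressed format would need.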

The deep learning accelerator (DLA) is a fixed-function accelerator optimized for deep learning operations. It is designed to do full hardware acceleration of convolutional neural network inferencing.

The second generation DLA in the Orin AGX provides an 8x performance boost, reaching 97 INT8 Sparse TOPS in a highly energy efficient design. Increased local buffering boosts efficiency and reduces the DRAM bandwidth required.
