New AI technology delivers unprecedented on-device intelligence for IoT
As an engineer, you’re familiar with the promise of artificial intelligence. You use it on your phone for voice commands and help from Siri or Google. You use it on your computer for predictive typing assistance. You read about its promise every day. In fact, you see the potential to leverage AI – and one of its key applications, machine learning – to tap vast amounts of Internet of Things (IoT) endpoint data to transform businesses fundamentally and deliver Industry 4.0. But you’ve been stymied trying to easily port those types of capabilities into your IoT and embedded systems.
It can be mystifying to work out which processors are the most efficient fit for your design. The toolchain has been fragmented, and you wonder whether your team has enough time to learn the new programming skills required when time-to-market pressures are worse than ever.
But you have also seen this movie before. Any sufficiently promising technology trend eventually converges on standard approaches and an ecosystem that paves the way for new innovation. The same is true of AI and IoT, or, as the combination is increasingly known, AIoT. For AIoT to scale to the trillions of predicted endpoint devices, and for companies to take advantage of the enhanced insights and experiences offered by advanced AI, small IoT devices must become more capable of on-device processing. In other words, we need to shift compute closer to the source of data along the compute continuum from cloud to endpoint, an approach we call endpoint AI.
At Arm, we’ve leveraged three decades’ worth of IP design and development expertise in the mobile market to begin simplifying AI and machine learning implementations for IoT. Arm’s AI platform provides flexible support for ML workloads across all programmable Arm IP, as well as partner IP, and maintains open-source software libraries, such as Arm NN, CMSIS-NN and CMSIS-DSP, to simplify development. It integrates with existing neural network frameworks, such as TensorFlow Lite for Microcontrollers, and is supported by a vibrant and diverse ecosystem that drives innovation and choice. Arm’s AI platform is the most efficient, scalable and heterogeneous in the industry and supports a development ecosystem that is redefining the capabilities of devices everywhere.
Engineers at AquaSeca, for example, are hard at work to exploit this potential. AquaSeca has developed an Arm Cortex-M4 based vibration sensor that attaches to a water pipe to form a simple and low-cost method for detecting the relative flow of water. Changes to the vibration signature might indicate cause for alarm, such as the faster flow caused by a leak, or the impeded flow caused by a blockage. The tell-tale vibrations that result from these kinds of faults are detected by the sensor and the causes inferred using several levels of ML. The AIoT system can then alert the owner before the fault escalates. Today, the AquaSeca sensor sends its data to the cloud, but engineers are migrating some of the ML inference to the sensor itself for faster responses, greater privacy, scalability and enhanced reliability.
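To make the idea of a changing vibration signature concrete, here is a minimal host-side sketch in Python. It is not AquaSeca’s method, which the article describes only as several levels of ML; it swaps in a simple threshold on RMS vibration energy as a stand-in. The function names, the 25% tolerance and the assumption that stronger vibration corresponds to faster flow are all hypothetical and chosen for illustration.

```python
import math

def rms(window):
    """Root-mean-square amplitude of one window of vibration samples."""
    if not window:
        raise ValueError("window must contain at least one sample")
    return math.sqrt(sum(s * s for s in window) / len(window))

def classify_flow(window, baseline_rms, tolerance=0.25):
    """Compare a window's RMS energy with a learned baseline.

    Assumption (hypothetical): vibration energy rises with flow rate, so
    energy well above the baseline hints at a leak (faster flow) and energy
    well below it hints at a blockage (impeded flow).
    """
    level = rms(window)
    if level > baseline_rms * (1.0 + tolerance):
        return "possible leak"
    if level < baseline_rms * (1.0 - tolerance):
        return "possible blockage"
    return "normal"
```

Running a check like this on the sensor itself, rather than in the cloud, is the kind of migration the AquaSeca engineers are described as making; a production device would more likely implement it in C and replace the threshold with a trained model.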
Scaling endpoint AI for billions of devices
The latest additions to Arm’s AI portfolio include a CPU with enhanced AI capabilities, the Arm Cortex-M55 processor, along with the supporting Corstone-300 reference design for faster SoC implementation. They also include the industry’s first ‘microNPU’ (neural processing unit), the Ethos-U55, which is fully integrated with the Cortex-M toolchain. These technologies enhance on-device machine learning (ML) capabilities and simplify software development for IoT and embedded applications, especially power- and size-constrained devices, unlocking the potential of endpoint AI for all developers.
The Cortex-M55 is the first processor with Arm Helium vector extensions for improved performance and efficiency. It delivers up to a 15x improvement in machine learning performance and up to a 5x improvement in digital signal processing (DSP) performance compared to previous Arm Cortex-M processors. Designers also gain more freedom to differentiate within a coherent software development environment. In future, designers will also be able to leverage Arm Custom Instructions at the RTL level, coming in 2021, which provide the opportunity to extend the MCU’s capabilities for workload-specific optimization.
Based on a four-stage integer pipeline design, the Cortex-M55 processor is a fully synthesizable, mid-range processor that is designed for the microcontroller and deeply embedded system market. The processor offers high compute performance across both scalar and vector operations with low power consumption, fast interrupt handling, and enhanced system debug with extensive breakpoint and trace capabilities.
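As a rough illustration of why vector operations help ML and DSP kernels, the sketch below compares a scalar dot product with one that consumes a fixed number of elements per step, the way a vector unit does. It is a conceptual model in Python, not Helium code: the step counter is a hypothetical stand-in for issued multiply-accumulate operations, and the 16-lane figure assumes 8-bit data in a 128-bit vector register. Real speedups depend on the implementation, memory system and data types.

```python
LANES_INT8 = 16  # assumption: 128-bit vector register holding 8-bit elements

def dot_scalar(a, b):
    """Dot product with one multiply-accumulate step per element pair."""
    acc = 0
    steps = 0
    for x, y in zip(a, b):
        acc += x * y
        steps += 1
    return acc, steps

def dot_vector(a, b, lanes=LANES_INT8):
    """Dot product that handles `lanes` element pairs per step.

    Each loop iteration models a single vector multiply-accumulate; the
    final chunk may be partially filled.
    """
    acc = 0
    steps = 0
    for i in range(0, len(a), lanes):
        acc += sum(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return acc, steps
```

For two 64-element inputs, both functions return the same sum, but the scalar version takes 64 steps and the vector version takes 4. On a real device, libraries such as CMSIS-NN and CMSIS-DSP are intended to spare developers from hand-writing kernels like this.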
The Ethos-U55 is the industry’s first licensable microNPU designed for microcontroller-class devices. To make designing with the Ethos-U55 more efficient, it is integrated within a single Cortex-M toolchain, familiar to millions of developers, so that the performance improvement comes without additional software complexity. The open-source CMSIS-NN library for machine learning and the CMSIS-DSP library for signal processing are available to make the design experience even more efficient.
When combined with the Cortex-M55, the Ethos-U55 increases ML workload performance by up to 480x over existing Cortex-M based systems: the up to 15x uplift of the Cortex-M55 itself, compounded by an additional 32x ML performance boost from the microNPU for more demanding ML systems. The Ethos-U55 can be as small as 0.1 mm² in 16 nm, which suits AI applications in cost-sensitive and energy-constrained devices. A single toolchain for Ethos-U55 and Cortex-M eases development and accelerates the creation of AI applications.
The Corstone-300 reference design offers the lowest-risk, lowest-cost route to incorporating the Cortex-M55 and, optionally, the Ethos-U55 into an SoC design. It integrates the processor, security components (e.g. the SIE-300 AXI5 TrustZone controllers) and system IP (e.g. the Power Control Kit PCK-600). Since trust is key for IoT devices to deliver real value, the Corstone-300 also simplifies security implementation with an optimized AXI5 system for TrustZone, accelerating the route to PSA Certified silicon and devices. Developers can speed time to market with open-source software, such as Trusted Firmware-M, and familiar development tools.
Cortex-M55 and Ethos-U55 extend Arm’s AI portfolio of CPUs, GPUs and NPUs into a new class, and ultimately provide more options for device makers to balance performance, cost and energy efficiency for endpoint AI. It is certainly possible to build silicon today with a combination of a CPU, a DSP and an NPU to get the same raw hardware horsepower as the Cortex-M55 and Ethos-U55 combination. However, once the hardware is built, how do hundreds of programmers write, debug and tune code for chips with three separate toolchains, three compilers, three debuggers and three probes? That introduces complexity and slows time to market. Arm instead ensures these hardware capabilities are all integrated into a single, high-productivity toolchain, supported by optimized software libraries and a broad knowledge base. Most importantly, as mentioned, they work with existing ML frameworks, such as TensorFlow Lite for Microcontrollers, and enable developers to draw on support from an existing, broad software and tools ecosystem. This makes it vastly easier and quicker to design, develop and maintain AI-based IoT applications with the lowest risk and cost possible.
As AI continues to advance and converge with IoT and 5G, new opportunities, such as more natural human interaction on smaller devices, require more distributed compute from the cloud and edge to endpoint devices. The benefits of faster responses, greater privacy and enhanced reliability can only be achieved with on-device processing, bringing AI to the trillions of smaller, power-constrained endpoint devices.
Arm continues to drive innovation in open and accessible ways to help more engineers more easily deliver new levels of efficient AI performance for microcontrollers.
Learn more about Cortex-M55 and Ethos-U55 and access the wealth of material and information available by searching for “Machine Learning on Arm” in your favorite browser. You can also get started on ML today with our existing Cortex-M portfolio.