
ARM shows first port of KleidiAI CPU open source libraries
ARM has shown the first port of its KleidiAI software for running AI on CPUs rather than GPUs.
The KleidiAI open source microkernels are optimised for ARM processor cores and are designed for ease of adoption into C or C++ machine learning (ML) and AI frameworks. Developers who want to incorporate specific microkernels into their projects need only include the corresponding standalone .c and .h files for those microkernels, plus a common header file.
The microkernels have no dependencies on external libraries, perform no dynamic memory allocation or memory management, and do no scheduling. They provide a stateless, stable and consistent API. There are also specialised microkernels for different fusion patterns.
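To illustrate what a stateless microkernel with no allocation and a fused operation can look like, here is a minimal sketch in plain C. The function name example_matmul_clamp_f32 and its parameters are hypothetical and are not taken from the KleidiAI API. The sketch assumes row-major float matrices and uses a simple scalar loop rather than any ARM-specific optimisation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical microkernel for illustration only; not the KleidiAI API.
 * Computes dst = clamp(lhs * rhs, lo, hi) for row-major float matrices,
 * where lhs is m x k, rhs is k x n and dst is m x n.
 * Every buffer is owned by the caller: the function allocates nothing,
 * keeps no state between calls and starts no threads. */
void example_matmul_clamp_f32(size_t m, size_t n, size_t k,
                              const float *lhs, const float *rhs,
                              float *dst, float lo, float hi)
{
    assert(lhs != NULL && rhs != NULL && dst != NULL);
    assert(lo <= hi);

    for (size_t row = 0; row < m; ++row) {
        for (size_t col = 0; col < n; ++col) {
            float acc = 0.0f;
            for (size_t i = 0; i < k; ++i) {
                acc += lhs[row * k + i] * rhs[i * n + col];
            }
            /* Fused step: clamp in the same pass as the multiply,
             * so the result is written to memory only once. */
            if (acc < lo) acc = lo;
            if (acc > hi) acc = hi;
            dst[row * n + col] = acc;
        }
    }
}
```

Because the caller supplies every buffer and decides when and on which thread to call the function, a framework could slot a routine of this shape into its own memory planner and scheduler without adaptation.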
“At the end of May 2024, we launched Kleidi, which are broad software deliverables and community engagements for accelerating AI across the developer ecosystem. The first part are the Kleidi Libraries for popular AI frameworks, which feature KleidiAI,” said Ronan Naughton, director of Product Management, Client Line of Business at ARM.
The first of these is MediaPipe from Google, using XNNPACK. Running on a CPU, this improves the time to first token by 30% for the Gemma model with 2bn parameters and delivers 250 tokens/s when summarising text on a Samsung S24 smartphone with the Exynos 2400 system on chip.
This is a key step in running large language models locally on smartphones and at the edge of the network, as highlighted by eeNews Europe in discussion with Paul Williamson at ARM on the Ethos-U85 accelerator core. It is also likely to be a key part of ARM’s AI push following the acquisition of AI chip designer Graphcore by ARM’s owner SoftBank.
ARM is working directly with a range of AI frameworks on KleidiAI integrations to make it seamless and transparent for developers so there is no need to learn additional tools and skills. This allows developers to move faster and extract more performance for AI-based applications.
Google, Meta and Samsung Mobile are all working on how KleidiAI will enable AI frameworks across multiple markets.
The demonstration uses MediaPipe APIs and the XNNPACK CPU backend, which is then accelerated by the KleidiAI integration. XNNPACK has over 7 billion third-party installs, so the integration gives KleidiAI the widest possible reach.
“We are excited to support KleidiAI in Google AI Edge’s XNNPACK to accelerate AI workloads on current and future Arm CPUs. This allows AI developers to access existing and new Arm architecture features to deliver outstanding performance that will only improve over time,” said Matthias Grundmann, Google AI Edge Lead.
KleidiAI also works across ARM CPUs that use architectural features such as Neon, SVE2 and the Scalable Matrix Extension (SME), which covers the A and X-class devices. This enables application developers to build portable software.
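As a rough illustration of how portable C code can adapt to these features, the sketch below checks the predefined macros from the Arm C Language Extensions (ACLE) at compile time. The helper name example_target_feature is hypothetical, and this is not a description of how KleidiAI itself selects a code path. Production libraries often detect features at run time instead.

```c
#include <stddef.h>

/* Hypothetical helper for illustration only.
 * Reports the most capable ARM vector or matrix feature that the
 * compiler was told to target, based on ACLE predefined macros.
 * Falls back to "scalar" on non-ARM targets or when no such feature
 * is enabled. */
const char *example_target_feature(void)
{
#if defined(__ARM_FEATURE_SME)
    return "sme";
#elif defined(__ARM_FEATURE_SVE2)
    return "sve2";
#elif defined(__ARM_NEON)
    return "neon";
#else
    return "scalar";
#endif
}
```

A build that targets several feature levels could compile one variant of a routine per level and use a check like this, or a run-time equivalent, to choose between them.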
The KleidiAI technical demo is available on GitLab.
