
Plumerai boosts embedded AI efficiency
UK embedded AI developer Plumerai has optimised its inference software for the latest long short-term memory (LSTM) frameworks, opening up applications ranging from text processing to sensor data monitoring.
It has seen embedded AI applications increasingly using recurrent neural networks (RNNs), in particular RNNs built on the LSTM cell architecture. Example uses of LSTMs include analysing time-series data from sensors such as microphones, human activity recognition for fitness and health monitoring, detecting whether a machine is about to break down, and speech recognition.
The company has optimised its deep learning inference software for LSTMs on microcontrollers for all metrics: speed, accuracy, RAM usage, and code size.
The company selected four common LSTM-based recurrent neural networks and measured latency and memory usage against the September 2022 version of TensorFlow Lite for Microcontrollers (TFLM for short) with CMSIS-NN enabled.
Plumerai chose TFLM because it is freely available and widely used. ST’s X-CUBE-AI also supports LSTMs, but it runs only on ST chips and computes LSTMs in 32-bit floating point, making it much slower. The benchmarks ran on an STM32L4R9AI board with an Arm Cortex-M4 microcontroller at 120 MHz, with 640 KB of RAM and 2 MB of flash. Similar results were obtained on an Arm Cortex-M7 board.
The networks used for testing were:
- A simple LSTM model from the TensorFlow Keras RNN guide (sketched in code after this list).
- A weather prediction model that performs time series data forecasting using an LSTM followed by a fully-connected layer.
- A text generation model using a Shakespeare dataset using an LSTM-based RNN with a text-embedding layer and a fully-connected layer.
- A bi-directional LSTM that uses context from both directions of the 'time' axis, using a total of four individual LSTM layers.
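For reference, the first of these models can be reproduced in a few lines of Keras and converted to the .tflite format that both TFLM and Plumerai's engine consume. This is a minimal sketch: the layer sizes are illustrative, not the exact configuration Plumerai benchmarked.

```python
import tensorflow as tf

# Minimal stand-in for the "simple LSTM" benchmark model, in the spirit of
# the TensorFlow Keras RNN guide; sizes here are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),   # 28 time steps x 28 features
    tf.keras.layers.LSTM(64),                # single LSTM layer
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert to a TensorFlow Lite flatbuffer; the converter fuses the Keras
# LSTM into the fused LSTM op that TFLM executes.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("simple_lstm.tflite", "wb") as f:
    f.write(converter.convert())
```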
| Model | TFLM latency | Plumerai latency | TFLM RAM | Plumerai RAM |
|---|---|---|---|---|
| Simple LSTM | 941.4 ms | 189.0 ms | 19.3 KiB | 14.3 KiB |
| Weather prediction | 27.5 ms | 9.2 ms | 4.1 KiB | 1.8 KiB |
| Text generation | 7366.0 ms | 1350.5 ms | 61.1 KiB | 51.6 KiB |
| Bi-directional LSTM | 61.5 ms | 15.1 ms | 12.8 KiB | 2.5 KiB |
Microcontrollers are also very memory-constrained, making thrifty memory usage crucial. In many cases code size (ROM usage) is also important, and here the Plumerai tools outperform TFLM by a large margin: Plumerai's implementation of the weather prediction model uses 48 KiB including weights and support code, whereas TFLM uses 120 KiB.
The inference engine performs the same computations as TFLM, with 16-bit weights and no extra quantization or pruning, to maintain accuracy.
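Since the engine consumes standard .tflite files, one way to check such a parity claim is to run the same model file through the stock TFLite Python interpreter on the host and compare outputs against those captured on the device. A minimal sketch, reusing the hypothetical simple_lstm.tflite built above and assuming the device-side output can be exported for comparison:

```python
import numpy as np
import tensorflow as tf

# Host-side reference run of the same model file; an engine performing the
# same computations should reproduce these outputs on identical input.
interpreter = tf.lite.Interpreter(model_path="simple_lstm.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.default_rng(0).random(inp["shape"]).astype(inp["dtype"])
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
reference = interpreter.get_tensor(out["index"])
# Compare `reference` element-wise against the output captured on the device.
```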
Besides Arm Cortex-M0/M0+/M4/M7/M33, the company also optimises its inference software for Arm Cortex-A applications processors, the ARC EM series from Synopsys, and RISC-V architectures.
