Plumerai boosts embedded AI efficiency

Technology News |
By Nick Flaherty

UK embedded AI developer Plumerai has optimised its inference software for the latest long short-term memory (LSTM) frameworks, opening up applications with text as well as sensor data monitoring.

Plumerai has seen growing interest in embedded AI applications using recurrent neural networks (RNNs), in particular RNNs built on the LSTM cell architecture. Example uses of LSTMs include analysing time-series data from sensors such as microphones, human activity recognition for fitness and health monitoring, detecting whether a machine will break down, and speech recognition.
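For background, an LSTM cell keeps a hidden state and a cell state that are updated at each time step through input, forget, and output gates. The following NumPy sketch of a single time step is purely illustrative and is not Plumerai's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step.

    x: input vector (n_in,); h, c: previous hidden and cell state (n_hid,)
    W: (4*n_hid, n_in) input weights; U: (4*n_hid, n_hid) recurrent weights;
    b: (4*n_hid,) biases, stacked in [input, forget, cell, output] gate order.
    """
    n = h.size
    z = W @ x + U @ h + b
    i = sigmoid(z[0*n:1*n])        # input gate
    f = sigmoid(z[1*n:2*n])        # forget gate
    g = np.tanh(z[2*n:3*n])        # candidate cell update
    o = sigmoid(z[3*n:4*n])        # output gate
    c_new = f * c + i * g          # new cell state
    h_new = o * np.tanh(c_new)     # new hidden state
    return h_new, c_new

# Run a short random time series through the cell
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(10):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
```

The recurrent matrix multiplications at every time step are what make LSTMs expensive on microcontrollers, and are the main target of inference-engine optimisation.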

The company has optimised its deep learning inference software for LSTMs on microcontrollers for all metrics: speed, accuracy, RAM usage, and code size.

The company selected four common LSTM-based recurrent neural networks and measured latency and memory usage against the September 2022 version of TensorFlow Lite for Microcontrollers (TFLM for short) with CMSIS-NN enabled.

Plumerai chose TFLM because it is freely available and widely used. ST’s X-CUBE-AI also supports LSTMs, but it only runs on ST chips and works in 32-bit floating point, making it much slower. Plumerai used an STM32L4R9AI board with an ARM Cortex-M4 microcontroller at 120 MHz with 640 KB RAM and 2 MB flash. Similar results were obtained using an ARM Cortex-M7 board.

The networks used for testing were:

  • A simple LSTM model from the TensorFlow Keras RNN guide.
  • A weather prediction model that performs time series data forecasting using an LSTM followed by a fully-connected layer.
  • A text generation model using a Shakespeare dataset using an LSTM-based RNN with a text-embedding layer and a fully-connected layer.
  • A bi-directional LSTM that uses context from both directions of the ’time’ axis, using a total of four individual LSTM layers.
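As a rough idea of the first benchmark's shape, the simple model from the TensorFlow Keras RNN guide is an embedding layer feeding an LSTM and a dense output layer. The layer sizes below follow that guide and may differ from the exact model benchmarked:

```python
import numpy as np
import tensorflow as tf

# Simple LSTM model in the style of the TensorFlow Keras RNN guide
# (sizes per the guide; the benchmarked variant may differ)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=1000, output_dim=64),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(10),
])

# Sanity-check a forward pass on a batch of two token sequences of length 20
tokens = np.random.randint(0, 1000, size=(2, 20))
out = model(tokens)  # shape (2, 10)
# For microcontroller deployment, a model like this would then be
# converted to a TensorFlow Lite flatbuffer and run with TFLM.
```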

Network             | TFLM latency | Plumerai latency       | TFLM RAM | Plumerai RAM
--------------------|--------------|------------------------|----------|----------------------
Simple LSTM         | 941.4 ms     | 189.0 ms (5.0x faster) | 19.3 KiB | 14.3 KiB (1.4x lower)
Weather prediction  | 27.5 ms      | 9.2 ms (3.0x faster)   | 4.1 KiB  | 1.8 KiB (2.3x lower)
Text generation     | 7366.0 ms    | 1350.5 ms (5.5x faster)| 61.1 KiB | 51.6 KiB (1.2x lower)
Bi-directional LSTM | 61.5 ms      | 15.1 ms (4.1x faster)  | 12.8 KiB | 2.5 KiB (5.1x lower)

Microcontrollers are also very memory-constrained, making thrifty memory usage crucial. In many cases code size (ROM usage) is also important, and here the Plumerai tools outperform TFLM by a large margin: Plumerai’s implementation of the weather prediction model uses 48 KiB including weights and support code, whereas TFLM uses 120 KiB.

The inference engine performs the same computations as TFLM with 16-bit weights, without extra quantization or pruning, to maintain accuracy.
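As a toy illustration of why weight width matters for accuracy (this is not Plumerai's scheme, just standard symmetric linear quantization): quantizing the same weights to 16 bits instead of 8 bits shrinks the round-trip error by roughly the ratio of the integer ranges.

```python
import numpy as np

def quantize_roundtrip(w, bits):
    """Symmetric linear quantization of w to signed `bits`-bit integers and back."""
    qmax = 2 ** (bits - 1) - 1           # 127 for 8-bit, 32767 for 16-bit
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000) * 0.05   # toy "weight" tensor

err8 = np.max(np.abs(w - quantize_roundtrip(w, 8)))
err16 = np.max(np.abs(w - quantize_roundtrip(w, 16)))
# err16 is roughly 256x smaller than err8 (32767/127 finer resolution)
```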

Besides Arm Cortex-M0/M0+/M4/M7/M33, the company also optimises its inference software for Arm Cortex-A application processors, the ARC EM series from Synopsys, and RISC-V architectures.
