The inference software developed by London-based AI startup Plumerai is an essential component that directs resource management in a similar way to an operating system. With an inference time of 77 ms, it has 40 percent lower latency and requires 49 percent less RAM (80 Kbits) than TensorFlow Lite for Microcontrollers with ARM’s CMSIS-NN kernels, while retaining the same accuracy. This makes it the fastest inference software currently available, says the company.
The inference software builds on TensorFlow Lite for Microcontrollers but does not use the TensorFlow or ARM kernels for the most performance-critical layers. Instead, custom kernel code was developed and optimized for lowest latency and memory usage. This includes optimized code for regular convolutions, depthwise convolutions, fully-connected layers, various pooling layers and more.
To be faster than the heavily optimized, ARM Cortex-M specific CMSIS-NN kernels, the developers had to go deep inside the inner loops and also rethink the higher-level algorithms. This includes optimizations such as hand-written assembly blocks, improved register usage, pre-processing of weights and input activations, and template-based loop unrolling.
While these generic per-layer-type optimizations provided faster operation, the developers also performed specific optimizations for each layer in a neural network. For instance, rather than only optimizing convolutions in general, the inference software makes specific improvements based on the actual values of layer parameters such as kernel sizes, strides and padding.
As the software can run many different types of machine learning models, these optimizations are made together with the compiler. This is achieved by generating code in an automated pre-processing step that takes the neural network as input. The tool then guides the compiler to do all the necessary constant propagation, function inlining and loop unrolling to achieve the lowest possible latency.
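The pre-processing step described above can be illustrated with a minimal sketch. It is not Plumerai's tool: the `ConvParams` class, the `generate_conv_kernel` function and the simplified 1-D convolution are all hypothetical, chosen only to show how writing layer parameters into generated C source as literals leaves the compiler with constants to propagate and a kernel loop that is already unrolled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvParams:
    """Hypothetical description of one 1-D convolution layer."""
    name: str
    input_len: int
    kernel_size: int
    stride: int


def generate_conv_kernel(p: ConvParams) -> str:
    """Emit C source for a convolution specialized to a single layer.

    Every layer parameter is written into the source as a literal and the
    loop over the kernel taps is unrolled at generation time, so the C
    compiler only ever sees constants for this layer.
    """
    output_len = (p.input_len - p.kernel_size) // p.stride + 1
    # One multiply-accumulate term per kernel tap (the unrolled inner loop).
    taps = " + ".join(
        f"in[i * {p.stride} + {k}] * w[{k}]" for k in range(p.kernel_size)
    )
    return (
        f"static inline void conv_{p.name}(const int8_t *in, "
        f"const int8_t *w, int32_t *out) {{\n"
        f"    for (int i = 0; i < {output_len}; ++i) {{\n"
        f"        out[i] = {taps};\n"
        f"    }}\n"
        f"}}\n"
    )


# Illustrative layer; the values are made up for the example.
source = generate_conv_kernel(
    ConvParams("layer0", input_len=16, kernel_size=3, stride=2)
)
print(source)
```

A generic kernel would read the kernel size and stride from memory on every call. In the generated function they are fixed, which is the kind of information a compiler needs for constant propagation and inlining.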
Memory usage is an important constraint on embedded devices; however fast or slow the software is, it has to fit in memory to run at all. TensorFlow Lite for Microcontrollers already comes with a memory planner that ensures a tensor only takes up space while there is a layer using it. Plumerai's software reduces memory usage further with a smart offline memory planner that analyzes the memory access patterns of each layer of the network. Depending on properties such as filter size, the memory planner allows the input and output of a layer to partially or even completely overlap, effectively computing the layer in-place.
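The two ideas in this paragraph can be sketched in a few lines. This is an illustration under stated assumptions, not the planner used by either TensorFlow Lite for Microcontrollers or Plumerai. The first part is a simple greedy planner (the `Tensor` tuple, `plan_offsets` and the tensor sizes are hypothetical) that lets tensors share memory when their lifetimes do not overlap. The second part shows, for a stride-1 1-D convolution, why a layer's output can sometimes be written over its own input.

```python
from typing import NamedTuple


class Tensor(NamedTuple):
    name: str
    size: int        # bytes
    first_use: int   # index of the layer that produces the tensor
    last_use: int    # index of the last layer that reads it


def plan_offsets(tensors):
    """Greedy offline planner: place the largest tensors first, each at the
    lowest offset that does not collide with a tensor alive at the same time."""
    placed = {}  # name -> (offset, tensor)
    for t in sorted(tensors, key=lambda t: -t.size):
        # Memory ranges held by tensors whose lifetimes overlap t's lifetime.
        busy = sorted(
            (off, off + other.size)
            for off, other in placed.values()
            if t.first_use <= other.last_use and other.first_use <= t.last_use
        )
        offset = 0
        for start, end in busy:
            if offset + t.size <= start:
                break  # t fits in the gap before this range
            offset = max(offset, end)
        placed[t.name] = (offset, t)
    return {name: off for name, (off, _) in placed.items()}


def conv1d_in_place(buf, weights):
    """Stride-1 'valid' 1-D convolution written over its own input buffer.

    Output i reads buf[i .. i+k-1] and no later output reads buf[i], so
    buf[i] can be overwritten as soon as output i has been computed.
    """
    k = len(weights)
    n_out = len(buf) - k + 1
    for i in range(n_out):
        buf[i] = sum(buf[i + j] * weights[j] for j in range(k))
    return n_out  # the first n_out entries of buf now hold the output


# Illustrative network; sizes and lifetimes are made up for the example.
tensors = [
    Tensor("input", 4096, 0, 1),
    Tensor("conv1", 8192, 1, 2),
    Tensor("conv2", 2048, 2, 3),
    Tensor("logits", 64, 3, 4),
]
offsets = plan_offsets(tensors)
arena_size = max(offsets[t.name] + t.size for t in tensors)
total_size = sum(t.size for t in tensors)
print(offsets, arena_size, total_size)

buf = [1, 2, 3, 4, 5]
n_out = conv1d_in_place(buf, [1, 0, -1])
print(buf[:n_out])
```

The planner here treats the input and output of a layer as both alive while that layer runs, so they never share memory. A planner that knows the access pattern of each layer, as in `conv1d_in_place`, could relax that rule and let the two buffers overlap.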
As well as ARM Cortex-M, the inference software also works with ARM Cortex-A and RISC-V architectures. Binarized Neural Networks (BNNs) are deep learning models that use only a single bit to encode each weight and activation. Plumerai is building improved deep learning model architectures and training algorithms for BNNs, along with a custom IP-core for customers with FPGAs.
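To show why single-bit weights and activations are cheap to compute with, here is a minimal sketch of the standard BNN dot product. It assumes each value is either +1 or -1 and is stored as one bit (1 for +1, 0 for -1). The helper names `pack_bits` and `binary_dot` are hypothetical and this is not Plumerai's implementation.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Matching bits contribute +1 and differing bits contribute -1, so the
    result is n minus twice the number of differing bits (popcount of XOR).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

expected = sum(x * y for x, y in zip(a, b))  # ordinary multiply-accumulate
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
print(result, expected)
```

The multiply-accumulate loop collapses into one XOR and one bit count per machine word, which is why BNNs map well onto small processors and FPGA logic.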
The company is also working with UK microcontroller maker XMOS on developing the BNN technology.
The lower latency and memory usage allow the inference engine to process more frames per second, save more energy, run larger and more accurate AI models and deploy on cheaper hardware.