Sima Technologies Inc. (San Jose, Calif.) has started shipping a 16nm machine learning SoC the company has been working on since its founding in 2018.
The company sells the MLSoC processor along with software to create what it claims is the industry’s first software-centric, purpose-built MLSoC platform.
The company claims the chip addresses computer vision applications and delivers a 10x improvement in performance per watt and the most efficient frames per second per watt in the industry. Sima said the chip will find use in robotics, smart vision, government, autonomous vehicles, drones, and healthcare.
The chip is manufactured by foundry Taiwan Semiconductor Manufacturing Co. Ltd. (Hsinchu, Taiwan) in a 16nm FinFET process. The MLSoC was designed and verified using Synopsys tools and incorporates a range of Synopsys IP, including the ARC EV processor for real-time vision processing.
“We’ve seen over a dozen edge processing solutions, and have never seen anything approaching the performance and power efficiency of SiMa.ai’s MLSoC platform. Their solution is an order of magnitude faster and more energy efficient,” said Karl Freund, founder and principal analyst at Cambrian-AI Research, in a statement issued by Sima. “So far, they are blowing past their customer’s requirements by accelerating the entire vision processing pipeline, not just the ML inferencing portion. Early customers are finding it extremely easy and simple to implement SiMa.ai into their current solutions.”
Details from Linley
Sima did not provide technical details of the architecture alongside the announcement that the company is shipping.
However, the company did present a paper at the Linley Spring Processor Conference in April 2020 saying it planned to tape out a chip on a 16nm process at the end of 2020.
That chip was planned to be capable of 1,000fps/watt for ResNet50 working with 224 by 224 pixel frames from conventional image sensors. In design-stage benchmarks it was also rated at between 50TOPS at 5W and 200TOPS at 20W, depending on clock frequency.
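The arithmetic behind those design-stage figures can be sketched in a few lines of Python. Both quoted operating points work out to the same 10TOPS/W. The throughput figures are illustrative only: they assume the 1,000fps/watt ResNet50 number holds across the whole power range, which the presentation as reported here does not state. The function names are hypothetical.

```python
# Design-stage figures quoted from the April 2020 presentation.
OPERATING_POINTS = [
    {"tops": 50, "watts": 5},
    {"tops": 200, "watts": 20},
]
FPS_PER_WATT = 1000  # ResNet50, 224 x 224 pixel frames


def tops_per_watt(tops, watts):
    """Compute efficiency in TOPS per watt for one operating point."""
    return tops / watts


def implied_fps(fps_per_watt, watts):
    """Throughput implied by a frames-per-second-per-watt figure.

    Assumption (not from the source): efficiency is constant
    across the quoted power range.
    """
    return fps_per_watt * watts


for point in OPERATING_POINTS:
    eff = tops_per_watt(point["tops"], point["watts"])
    fps = implied_fps(FPS_PER_WATT, point["watts"])
    print(f'{point["tops"]} TOPS at {point["watts"]} W: '
          f'{eff:.0f} TOPS/W, implied {fps} fps on ResNet50')
```

On these numbers the higher clock frequency scales compute and power together, so the claimed efficiency is the same at both ends of the range.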
The design targeted up to four camera lines and a video pipeline including a licensed image signal processor, an ARM subsystem and LPDDR4 or LPDDR5 data connections out.
The heavy machine learning is performed in a tile-based machine learning accelerator (MLA) block. Multiple tiles can be connected, either off-chip in component arrays or on-chip, by a proprietary AXI-based interconnect. At that time the toolchain was being aimed at the mainstream, with support for TensorFlow, PyTorch, ONNX and other frameworks.