The startup has also discussed a PCIe card, called tsunAImi, that integrates four such processors to provide up to two peta-operations per second (2 POPS), an efficiency of 8 TOPS/W. The announcement was made by way of a presentation at the Fall Linley Processor Conference.
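The two headline figures imply a power budget. As a back-of-envelope check (assuming the 2 POPS throughput and the 8 TOPS/W efficiency refer to the same int8 workload, which the announcement does not state explicitly):

```python
# Back-of-envelope check: power draw implied by the tsunAImi card's
# quoted throughput (2 POPS = 2,000 TOPS) and efficiency (8 TOPS/W).
# Assumes both figures describe the same workload -- an assumption,
# not a stated spec.

def implied_power_watts(throughput_tops: float, efficiency_tops_per_watt: float) -> float:
    """Power (W) implied by a throughput figure and an efficiency figure."""
    return throughput_tops / efficiency_tops_per_watt

CARD_TOPS = 2_000.0   # 2 peta-operations/s
EFFICIENCY = 8.0      # TOPS per watt

print(implied_power_watts(CARD_TOPS, EFFICIENCY))      # → 250.0 (W per card)
print(implied_power_watts(CARD_TOPS / 4, EFFICIENCY))  # → 62.5 (W per chip)
```

On these assumptions the four-chip card would draw on the order of 250 W, or roughly 62.5 W per chip.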
Untether AI was founded in 2017 to develop a high-performance neural network inference engine based on the idea of moving processing to the data sets rather than moving data to the processor, as in the von Neumann architecture at the heart of most uniprocessor designs. Untether AI calls this "at-memory" computing, an approach that relies on a rich compiler and software stack capable of pre-placing data and optimizing resource allocation.
Bob Beachler, vice president of products at Untether AI, said that the chips and PCIe board are optimized for inference and would be used across a range of applications, from data centers through AI service providers down to edge servers.
Untether's architectural decision is supported by its own analysis showing that, in von Neumann architectures, 91 percent of the energy is consumed moving data and only 8 percent in the multiply-accumulate logic.
The inference optimization means that runAI200 is designed to hold complete neural networks and their coefficients on a single chip, or across four chips in the case of the PCIe card. The native batch size is one, to support the lowest latency. The chip is implemented in a 16nm CMOS process from TSMC and contains 200Mbytes of SRAM with 260,000 processing elements dispersed among the SRAM. The design supports int8 and int16 data types and runs at a 720MHz clock frequency in an efficiency mode, with a 960MHz mode optimized for performance.
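These per-chip figures line up with the card-level claim. A quick sanity check, assuming (the article does not say this) that each processing element completes one int8 multiply-accumulate, counted as two operations, per clock cycle:

```python
# Sanity check of the headline numbers from the published specs.
# Assumption (not stated in the article): each processing element
# performs one int8 multiply-accumulate (= 2 ops) per clock cycle.

PES_PER_CHIP = 260_000
OPS_PER_MAC = 2           # one multiply + one accumulate
PERF_CLOCK_HZ = 960e6     # performance-optimized mode

chip_tops = PES_PER_CHIP * OPS_PER_MAC * PERF_CLOCK_HZ / 1e12
card_tops = 4 * chip_tops   # four runAI200 chips on the tsunAImi card

print(f"{chip_tops:.1f} TOPS per chip")   # → 499.2 TOPS per chip
print(f"{card_tops:.1f} TOPS per card")   # → 1996.8 TOPS ≈ 2 POPS
```

Under that assumption, 260,000 elements at 960MHz work out to roughly 500 TOPS per chip, and four chips land within rounding of the 2 POPS quoted for the card.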