Blaize has developed a graph streaming processor (GSP), codenamed El Cano, for AI edge applications as well as real-time video processing and sensor fusion. The fully programmable engine has 16 cores, delivers up to 16TOPS of 8-bit integer operations, and has been used in a range of boards costing from $300 to $1000.
“We are a chip company and there will be a chip business for us, for example in automotive,” said Richard Terrill, VP of strategic business development at Blaize, following last week’s announcement of the company’s first system on module (SoM) and PCI Express boards. “The board companies are likely to participate, but with the module with the Samtec connector for the SoM, where there is no host.”
The chip is built on Samsung’s 14nm process technology and consumes 7W. This low power consumption is achieved with a streaming approach that analyses the framework during compilation, then allocates resources across multiple chips using a scheduler and 4Mbit of optimised on-chip memory.
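As a rough check on the efficiency these headline figures imply, the short Python sketch divides the quoted 16TOPS peak by the 7W power figure and by the 16 cores. It assumes the peak throughput and the power figure refer to the same operating point, which the article does not state; the constant and function names are illustrative only.

```python
# Back-of-envelope efficiency from the article's headline numbers.
# Assumption: the 16TOPS peak and the 7W power figure apply to the same
# operating point, which is not stated explicitly.

PEAK_INT8_TOPS = 16.0  # peak 8-bit integer throughput, tera-operations per second
POWER_W = 7.0          # quoted chip power consumption, watts
NUM_CORES = 16         # number of GSP cores


def tops_per_watt(tops: float, watts: float) -> float:
    """Peak throughput divided by power."""
    return tops / watts


def tops_per_core(tops: float, cores: int) -> float:
    """Peak throughput shared evenly across cores (an idealisation)."""
    return tops / cores


if __name__ == "__main__":
    print(f"{tops_per_watt(PEAK_INT8_TOPS, POWER_W):.2f} TOPS/W")           # prints 2.29 TOPS/W
    print(f"{tops_per_core(PEAK_INT8_TOPS, NUM_CORES):.2f} TOPS per core")  # prints 1.00 TOPS per core
```

Real workloads rarely sustain peak throughput, so these figures are upper bounds rather than measured efficiency.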
The hardware scheduler is a frame around the cores. It takes a metamap of the framework, created at compile time, and uses it to feed multiple instances of the scheduler in each core. Each scheduler is aware of the data complexities and has full autonomy to allocate the thread slots in its core. Any forks or deadlocks are resolved ahead of time by the NetDeploy compiler tool so that the scheduler does not become a bottleneck.
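The article does not describe the scheduler's internals, so the following is only a generic dataflow-scheduling sketch of the division of labour it outlines: a compile-time pass builds an ordered "metamap" and rejects cycles (which would deadlock), and at run time each core's scheduler independently fills its own thread slots with nodes whose inputs are ready. Kahn's topological sort stands in for whatever analysis NetDeploy actually performs, and `Node`, `build_metamap`, `run`, the core assignments and the slot counts are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operation in the dataflow graph (hypothetical representation)."""
    name: str
    core: int                                   # core chosen by the compiler
    inputs: list = field(default_factory=list)  # names of producer nodes


def build_metamap(nodes):
    """Compile-time pass: topologically order the graph (Kahn's algorithm).

    A cycle would leave some node waiting on its own output, i.e. a
    deadlock, so it is rejected here instead of being found at run time.
    """
    by_name = {n.name: n for n in nodes}
    indegree = {n.name: len(n.inputs) for n in nodes}
    consumers = {n.name: [] for n in nodes}
    for n in nodes:
        for src in n.inputs:
            consumers[src].append(n.name)
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for c in consumers[name]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle and would deadlock")
    return {"order": order, "nodes": by_name}


def run(metamap, num_cores, slots_per_core):
    """Run-time pass: in each step, every core's scheduler independently
    fills its thread slots with nodes whose inputs have all completed.
    Returns the list of node names launched in each step."""
    nodes = metamap["nodes"]
    if slots_per_core < 1:
        raise ValueError("each core needs at least one thread slot")
    if any(not 0 <= n.core < num_cores for n in nodes.values()):
        raise ValueError("node assigned to a non-existent core")
    done, pending, timeline = set(), list(metamap["order"]), []
    while pending:
        used = [0] * num_cores  # thread slots taken on each core this step
        step = []
        for name in pending:
            n = nodes[name]
            if used[n.core] < slots_per_core and all(i in done for i in n.inputs):
                step.append(name)
                used[n.core] += 1
        if not step:
            raise RuntimeError("no runnable node")  # unreachable for a valid metamap
        done.update(step)
        pending = [p for p in pending if p not in done]
        timeline.append(step)
    return timeline


if __name__ == "__main__":
    graph = [
        Node("camera", 0), Node("radar", 1),
        Node("resize", 0, ["camera"]), Node("detect", 1, ["resize"]),
        Node("track", 1, ["detect"]), Node("fuse", 0, ["track", "radar"]),
    ]
    print(run(build_metamap(graph), num_cores=2, slots_per_core=1))
    # prints [['camera', 'radar'], ['resize'], ['detect'], ['track'], ['fuse']]
```

Doing the cycle check at compile time means the run-time loop only has to test readiness and slot availability, which is the kind of lightweight decision a per-core hardware scheduler can make without becoming the bottleneck.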
Each core is an array of small 4-bit load/store execution units that can be combined in real time to handle larger operations, from 8-bit integer up to 16-bit floating point. These are connected via a two-dimensional register file that is contiguous, with a single unified address space. The schedulers allocate resources according to the incoming data and the register file, using the AI framework and algorithms from the compilation tool.
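The article does not say how the 4-bit units are combined, but a standard way to build a wider operation from narrow units is to split each operand into 4-bit halves and sum shifted partial products. The sketch shows this for an unsigned 8-bit multiply using four 4-bit multiplies; signed arithmetic and 16-bit floating point are omitted, and `mul4` and `mul8_from_4bit` are hypothetical names, not Blaize's documented microarchitecture.

```python
def split_nibbles(x: int) -> tuple[int, int]:
    """Split an unsigned 8-bit value into (high, low) 4-bit halves."""
    if not 0 <= x <= 0xFF:
        raise ValueError("expected an unsigned 8-bit value")
    return x >> 4, x & 0xF


def mul4(a: int, b: int) -> int:
    """Model of one 4-bit unit: 4-bit x 4-bit -> 8-bit product."""
    if not (0 <= a <= 0xF and 0 <= b <= 0xF):
        raise ValueError("expected unsigned 4-bit operands")
    return a * b


def mul8_from_4bit(x: int, y: int) -> int:
    """Unsigned 8-bit x 8-bit multiply from four 4-bit partial products.

    With x = xh*16 + xl and y = yh*16 + yl:
    x*y = (xh*yh << 8) + ((xh*yl + xl*yh) << 4) + xl*yl
    """
    xh, xl = split_nibbles(x)
    yh, yl = split_nibbles(y)
    return (mul4(xh, yh) << 8) + ((mul4(xh, yl) + mul4(xl, yh)) << 4) + mul4(xl, yl)


if __name__ == "__main__":
    print(mul8_from_4bit(200, 123))  # prints 24600
```

The same idea extends to wider operands at the cost of more partial products, which is one reason composable narrow units can trade throughput for precision.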