The move was expected (see High-flying Achronix plans move to ML).
Achronix Speedcore7t delivers 60 percent higher performance, 50 percent lower power and a 65 percent smaller die size than the existing product line, Speedcore22i, said Robert Blake, CEO of Achronix. And just as FPGA vendors have added memory blocks and DSPs as dedicated hard blocks within their FPGAs, Achronix has prepared machine learning processor blocks that are rich in multiply and matrix-multiply resources to accelerate AI/ML applications.
The Speedcore7t makes several changes to the basic fabric. The company has moved to a six-input lookup table (6-LUT) from the 4-LUT used in the Speedcore22. Other additions include 8:1 multiplexers, 8-bit ALUs, 8-bit bus-maximum functions (widely used in AI/ML), two registers per LUT and dedicated shift registers.
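To make the 4-LUT versus 6-LUT difference concrete, here is a minimal Python sketch that models a k-input LUT as a 2^k-entry truth table. The `LUT` class and `mux4_truth_table` helper are hypothetical illustrations, not Achronix tooling. A 4:1 multiplexer has six inputs (four data, two select), so it fits in a single 6-LUT, whereas a 4-LUT's 16-entry table cannot hold any six-input function.

```python
from typing import List, Sequence


class LUT:
    """Hypothetical model of a k-input lookup table: a 2**k-entry truth table
    indexed by the input bits (input 0 is the least-significant index bit)."""

    def __init__(self, k: int, truth_table: Sequence[int]):
        if len(truth_table) != 2 ** k:
            raise ValueError(f"a {k}-LUT needs {2 ** k} truth-table entries")
        self.k = k
        self.table = [bit & 1 for bit in truth_table]

    def eval(self, inputs: Sequence[int]) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        index = 0
        for position, bit in enumerate(inputs):
            index |= (bit & 1) << position
        return self.table[index]


def mux4_truth_table() -> List[int]:
    """Truth table for a 4:1 mux: inputs 0-3 are data d0..d3, inputs 4-5 select s0, s1."""
    table = []
    for index in range(64):
        data = index & 0b1111          # data inputs d0..d3
        select = (index >> 4) & 0b11   # select inputs s0, s1
        table.append((data >> select) & 1)
    return table


# Six inputs need 64 entries: they fit one 6-LUT, while a 4-LUT only has 16.
mux4 = LUT(6, mux4_truth_table())
print(mux4.eval([0, 1, 0, 0, 1, 0]))  # select = 1 picks d1, prints 1
```

Trying to build `LUT(4, mux4_truth_table())` raises `ValueError`, which is the point of the comparison: on a 4-LUT fabric the same multiplexer has to be split across several LUTs.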
Achronix said the changes increase the functional efficiency of the FPGA fabric. Leading FPGAs implement a 6-by-6 multiplier in 21 LUTs, whereas Speedcore Gen4 implements it in 11 LUTs and can run it at 1 GHz.
The Speedcore7t also adds a second hierarchy of high-performance routing. Separating bus routing from the standard nearest-neighbour routing prevents congestion, lets islands of compute communicate quickly with distant regions of the die, and is particularly suited to moving data between memories and the machine learning processors. Routes can originate and land at any LUT, making this layer effectively a run-time configurable switching network.
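The article does not say how this routing layer is programmed. As a rough mental model only, the sketch below treats a run-time configurable switching network as a crossbar whose per-output select registers can be rewritten while the design runs. The `Crossbar` class and its methods are hypothetical and say nothing about Achronix's actual implementation.

```python
from typing import List, Optional, Sequence


class Crossbar:
    """Hypothetical run-time configurable switch: each output port has a
    select register naming the input port that drives it (None = idle)."""

    def __init__(self, num_inputs: int, num_outputs: int):
        self.num_inputs = num_inputs
        self.select: List[Optional[int]] = [None] * num_outputs

    def connect(self, output: int, source: int) -> None:
        # Rewriting a select register changes the route while the design runs.
        if not 0 <= output < len(self.select):
            raise ValueError(f"no output port {output}")
        if not 0 <= source < self.num_inputs:
            raise ValueError(f"no input port {source}")
        self.select[output] = source

    def route(self, inputs: Sequence[int]) -> List[Optional[int]]:
        # Each output forwards the word on its selected input.
        if len(inputs) != self.num_inputs:
            raise ValueError("one word per input port expected")
        return [None if s is None else inputs[s] for s in self.select]


xbar = Crossbar(num_inputs=4, num_outputs=2)
xbar.connect(0, 3)
xbar.connect(1, 0)
print(xbar.route([10, 11, 12, 13]))  # [13, 10]
xbar.connect(0, 1)                   # re-route output 0 at run time
print(xbar.route([10, 11, 12, 13]))  # [11, 10]
```

The contrast with ordinary FPGA routing is that the connection pattern is held in registers that logic can update, rather than being fixed when the bitstream is loaded.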
Achronix says this is the first time run-time logic functionality has been available in the routing structure, and that it suits high-bandwidth, low-latency applications.
“Achronix was the first company to deliver production eFPGA IP to companies developing SoCs, enabling them to create programmable hardware data accelerators supporting new applications,” said Blake. “The new Speedcore Gen4 eFPGA architecture provides an optimal balance of hardware acceleration previously found only in ASIC implementations and adds the flexibility and reprogrammability of our production-proven FPGA technology to support increasing demand for new AI/ML and high data bandwidth applications.”
Each machine learning processor (MLP) block includes a local cyclical register file that exploits temporal locality to reuse stored weights or data. The MLPs are tightly coupled with neighbouring MLP blocks and with larger embedded memory blocks, which Achronix says delivers the highest processing performance and operations per second with the lowest power profile.
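The benefit of a cyclical register file can be illustrated with a small model, assuming the weights are loaded once and then read through a wrapping pointer while input samples stream past. `CyclicRegisterFile` and `mac_stream` are hypothetical names; the real register-file depth, width and access pattern inside a machine learning processor block are not given here.

```python
from typing import List, Sequence


class CyclicRegisterFile:
    """Hypothetical model: a fixed-depth register file read through a
    wrapping pointer, so stored weights are reused without being reloaded."""

    def __init__(self, weights: Sequence[int]):
        self.regs = list(weights)
        self.ptr = 0
        self.writes = len(self.regs)  # weights are written once, at load time
        self.reads = 0

    def next(self) -> int:
        value = self.regs[self.ptr]
        self.ptr = (self.ptr + 1) % len(self.regs)  # wrap around: cyclic access
        self.reads += 1
        return value


def mac_stream(rf: CyclicRegisterFile, samples: Sequence[int]) -> List[int]:
    """Dot each consecutive group of len(rf.regs) samples with the stored weights."""
    depth = len(rf.regs)
    if len(samples) % depth != 0:
        raise ValueError("sample count must be a multiple of the register-file depth")
    results, acc = [], 0
    for i, x in enumerate(samples):
        acc += x * rf.next()  # multiply-accumulate against the next stored weight
        if (i + 1) % depth == 0:
            results.append(acc)
            acc = 0
    return results


rf = CyclicRegisterFile([1, 2, 3])
print(mac_stream(rf, [1, 1, 1, 2, 0, 1]))  # [6, 5]
print(rf.writes, rf.reads)                 # 3 6: each weight loaded once, read twice
```

The ratio of reads to writes grows with every additional group of samples, which is the temporal-locality saving: the weights never have to be fetched again from the larger embedded memories.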
The MLPs support multiple fixed-point and floating-point precisions, including bfloat16, 16-bit half-precision floating point, 24-bit floating point and block floating point (BFP). Users can select the precision that best balances performance, power and area for their application.
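For readers unfamiliar with the formats, the sketch below shows two of them in generic form: bfloat16 as the top 16 bits of an IEEE float32 (rounded to nearest even), and block floating point as a group of integer mantissas sharing one exponent. The function names and the 8-bit BFP mantissa width are assumptions for illustration; the article does not specify how the MLPs encode BFP blocks.

```python
import math
import struct
from typing import List, Sequence, Tuple


def float_to_bfloat16_bits(x: float) -> int:
    """Keep the top 16 bits of the float32 encoding, rounding to nearest even.
    NaN and float32 overflow handling are omitted for brevity."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def to_bfp(values: Sequence[float], mantissa_bits: int = 8) -> Tuple[int, List[int]]:
    """Block floating point: one shared exponent plus a signed integer mantissa per value."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0, [0] * len(values)
    shared_exp = math.frexp(max_abs)[1]   # max_abs < 2**shared_exp
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    hi = 2 ** (mantissa_bits - 1) - 1
    return shared_exp, [max(-hi - 1, min(hi, round(v / scale))) for v in values]


def from_bfp(shared_exp: int, mantissas: Sequence[int], mantissa_bits: int = 8) -> List[float]:
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


print(bfloat16_bits_to_float(float_to_bfloat16_bits(3.14159265)))  # 3.140625
exp, mants = to_bfp([1.0, 0.5, -0.25, 0.001])
print(exp, mants)              # 1 [64, 32, -16, 0]
print(from_bfp(exp, mants))    # [1.0, 0.5, -0.25, 0.0]
```

The BFP example shows the trade-off: small values in a block dominated by a large one (here 0.001) lose their precision entirely, in exchange for storing only one exponent per block.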
The ACE design environment has been upgraded to support Speedcore Gen4 features and machine learning. Achronix itself uses its Speedcore Builder tool to create new Speedcore instances to match user requests.
Speedcore Gen4 for TSMC 7nm is available today and will be in production in 1H19. Achronix will also debut its own Speedster7t FPGAs in 1H19. Achronix plans to backfill its line-up with Gen4 Speedcore on TSMC 16nm and 12nm processes in 2H19.