The company claims the HL-1000 is the highest-performance AI inference processor, although different companies aim their AI processors at different applications and computing environments, so such claims need to be read with care.
Habana has demonstrated the processor on a PCIe card, claiming throughput of 15,000 images per second on the ResNet-50 network at a batch size of 10, with latency of 1.3ms and power consumption of 100 watts.
The HL-1000 Goya is clearly intended for deployment in data centers, where Habana claims it will offer one to three orders of magnitude better performance than the solutions commonly deployed today. That comparison, of course, sets a neural processor against CPUs and GPUs rather than against other neural processors.
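To put those claimed figures in context, the short Python sketch below derives a few secondary numbers from them. It is only back-of-the-envelope arithmetic on Habana's quoted values. The function name `derived_metrics` is hypothetical, and the in-flight estimate assumes steady-state operation and that the 1.3ms figure is end-to-end latency for a batch (Little's law), neither of which Habana has confirmed.

```python
# Figures quoted by Habana for the HL-1000 PCIe card running ResNet-50.
THROUGHPUT_IMG_PER_S = 15_000   # images per second
BATCH_SIZE = 10                 # images per batch
LATENCY_S = 1.3e-3              # 1.3 ms
POWER_W = 100                   # watts


def derived_metrics(throughput, batch_size, latency_s, power_w):
    """Derive secondary figures from claimed throughput, latency, and power.

    Hypothetical helper for illustration; not part of any Habana tool.
    """
    # Images per second per watt is the same quantity as images per joule.
    images_per_joule = throughput / power_w
    # Average time between completed batches at the claimed throughput.
    batch_interval_ms = 1000.0 * batch_size / throughput
    # Little's law (assumes steady state): work in flight = rate * latency.
    images_in_flight = throughput * latency_s
    batches_in_flight = images_in_flight / batch_size
    return {
        "images_per_joule": images_per_joule,
        "batch_interval_ms": batch_interval_ms,
        "images_in_flight": images_in_flight,
        "batches_in_flight": batches_in_flight,
    }


if __name__ == "__main__":
    metrics = derived_metrics(
        THROUGHPUT_IMG_PER_S, BATCH_SIZE, LATENCY_S, POWER_W
    )
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

On the quoted numbers this works out to about 150 images per joule, a batch completing roughly every 0.67ms, and, under the stated assumptions, about two batches in flight at any moment.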
General architecture of HL-1000, Goya inference processor. Source: Habana Labs.
Fabless chip company Habana has not provided information about the process used to manufacture the HL-1000 Goya chip or a follow-on chip, the HL-2000.
The chip is described as a general-purpose AI processor able to support multiple neural networks and various applications, including image recognition, language translation, and sentiment analysis.
The HL-1000 chip architecture is based on eight tensor processor cores (TPCs) that are programmable in C and C++ using an LLVM-based compiler. Each TPC has local memory and supports multiple general multiply-accumulate and matrix-multiply functions. Habana has also produced supporting development tools and libraries.
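For readers unfamiliar with the terms, the sketch below shows how a matrix multiply is built from repeated multiply-accumulate (MAC) steps. It is a generic illustration in plain Python, not Habana's TPC programming model or API, which the company says is exposed through C and C++. The names `mac` and `matmul` are chosen here for illustration only.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: return acc + a * b."""
    return acc + a * b


def matmul(A, B):
    """Multiply two matrices (lists of rows) using only MAC steps."""
    rows, inner, cols = len(A), len(B), len(B[0])
    if any(len(row) != inner for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            # Each output element is a chain of `inner` MAC operations.
            for k in range(inner):
                acc = mac(acc, A[i][k], B[k][j])
            C[i][j] = acc
    return C


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Hardware of this kind gets its throughput by running many such MAC chains in parallel out of local memory rather than one at a time as in this loop.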
The SynapseAI software stack analyses the trained model and then optimizes it for use on the HL-1000 processor. It also provides an interface from neural-network frameworks and formats such as MXNet, Caffe2, TensorFlow, Microsoft Cognitive Toolkit, PyTorch, and the Open Neural Network Exchange (ONNX) format.
However, Habana feels that training and inference are not conveniently performed by the same chip. To optimize efficiency, the company is developing separate processors for training and inference workloads. Habana Labs plans to sample the HL-2000, or Gaudi, training processor in the second quarter of 2019. Gaudi has a 2Tbps interface per device and its training performance scales well to thousands of processors, Habana claims.
But the Goya processor is not restricted to working on models trained by the Gaudi HL-2000. The inference processor supports models trained on any processor: GPU, TPU, CPU, or Habana's own Gaudi.
Habana Labs was founded in 2016 and employs 120 people worldwide.