Groq’s website claims that its first chip will run 400 trillion operations per second, more than twice the throughput of Google’s latest tensor processing unit – more commonly known as the TPU – which supports 180 teraops in the training phase of deep learning. It will perform eight trillion operations per second per watt, the website said.
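As a rough check on those figures, the short Python sketch below computes the ratio between the two quoted throughput numbers and the power draw implied by the efficiency claim. It assumes the efficiency figure means trillions of operations per second per watt, and that the Groq and TPU numbers are measured on comparable terms, which the claims themselves do not establish. The function and variable names are illustrative only.

```python
# Figures quoted above, in trillions of operations per second (TOPS).
GROQ_TOPS = 400.0
TPU_TOPS = 180.0

# Assumption: the efficiency claim is read as TOPS per watt.
GROQ_TOPS_PER_WATT = 8.0


def throughput_ratio(a_tops: float, b_tops: float) -> float:
    """Return how many times faster chip A is than chip B."""
    return a_tops / b_tops


def implied_power_watts(tops: float, tops_per_watt: float) -> float:
    """Return the power draw implied by a throughput and an efficiency figure."""
    return tops / tops_per_watt


ratio = throughput_ratio(GROQ_TOPS, TPU_TOPS)
power = implied_power_watts(GROQ_TOPS, GROQ_TOPS_PER_WATT)

print(f"Groq vs. TPU throughput: {ratio:.2f}x")   # 2.22x
print(f"Implied power at peak: {power:.0f} W")    # 50 W
```

Under those assumptions, the claimed chip would be about 2.2 times faster than the TPU figure and would draw roughly 50 watts at peak throughput.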
Groq is tapping into a creative resurgence in the chip industry to make custom server chips for artificial intelligence. Like others, it is attempting to unseat Nvidia, whose graphics chips are currently the gold standard for running the intense calculations required to train deep learning software and then make inferences with it.
The start-up, funded with $10.3 million from venture capitalist Chamath Palihapitiya, is staffed with eight of the first 10 members of the team that designed the TPU, including Groq’s founder Jonathan Ross. It also recently hired Xilinx’s vice president of sales Krishna Rangasayee as chief operating officer.
It would be an accomplishment for Groq to reveal its silicon less than two years after it was founded. The company’s chip engineers met similarly tight deadlines at Google, where they taped out the first TPU in only around 14 months. The second generation came out a year later in time for Google’s I/O conference.
Groq is battling not only Nvidia for the hearts and minds of data scientists but also Google, which offers its custom silicon over the cloud. It will also compete with Intel, which plans to release a custom processor before the end of the year that provides 55 trillion operations per second for training neural networks.
Every chip company has painted a target on Nvidia, which has a stranglehold over the market for deep learning hardware. On Monday, Nvidia said that most major server manufacturers and cloud computing firms were using graphics chips based on its new Volta architecture. (It did not call out Google as one of its customers.)
Nvidia created Volta to handle machine learning software faster and more efficiently than its previous designs. Like rivals, Nvidia built it to take advantage of lower-precision numbers, which require less computing power and memory when training, for instance, self-driving cars or algorithms that diagnose skin diseases. Inside its Volta graphics chips are hundreds of tensor cores that together can perform 120 trillion operations per second.
These efforts can be extremely costly. Nvidia poured around $3 billion into the Volta architecture, according to its chief executive Jensen Huang. And rival chipmakers are raising hundreds of millions of dollars – and bucking venture capital’s quarantine of the chip industry for the last decade – to stay within striking distance.
Wave Computing has aimed $60 million at chips based on its coarse-grained reconfigurable array architecture, which acts like a hybrid of programmable chips called FPGAs and custom ones called ASICs. Cerebras Systems has raised $112 million, giving it a valuation of around $860 million, Forbes reported.
Graphcore just raised $50 million from venture capital firm Sequoia Capital. Last month, the chipmaker shared preliminary benchmarks that claim its intelligence processing unit – also called the IPU – handles training and inference tasks 10 to 100 times faster than Nvidia’s previous Pascal architecture.
Groq – www.groq.com
This article first appeared in Electronic Design.