LeapMind aims for 2PFLOPS AI chip
LeapMind in Japan has started the development of an AI server chip using its binary neural network technology.
The LeapMind chip is aimed at large-scale AI models, such as large language models with billions of parameters, and targets a performance of 2PFLOPS.
“Up until now, we have been selling only semiconductor IP, or designs, for the edge market, but in the server market it is difficult to sell just semiconductor designs, so we will be doing business in the form of actual chips (or boards). The new product will be used not only for inference but also for training,” said Hiroyuki Tokunaga, CTO at LeapMind.
“Leveraging our expertise from AI accelerator development for edge devices, this new AI chip targets a computing performance of 2 PFLOPS (petaflops) while aiming for a cost performance 10 times higher than that of an equivalent GPU. We anticipate that this product will be ready for shipment by the end of 2025 at the latest.”
The primary bottleneck in AI model calculations is matrix multiplication, which involves an enormous number of multiplications and additions. Multipliers typically require large circuits, but LeapMind reduces the number of transistors needed by adopting lower bit-width data types such as fp8. Reducing the volume of data processed also makes more effective use of DRAM bandwidth, which has become a bottleneck in recent years.
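As a rough illustration of why bit width matters, the sketch below compares the weight-storage footprint of fp32, fp16 and fp8 for a model of a few billion parameters. The parameter count and bandwidth figure are hypothetical examples, not LeapMind specifications:

```python
# Illustrative only: memory footprint of model weights at different
# bit widths. Parameter count and DRAM bandwidth are made-up examples.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}

def weight_footprint_gb(num_params: int, dtype: str) -> float:
    """Bytes needed to store the weights, expressed in gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

params = 7_000_000_000  # e.g. a 7B-parameter language model
for dtype in ("fp32", "fp16", "fp8"):
    print(f"{dtype}: {weight_footprint_gb(params, dtype):.0f} GB")
# fp32: 28 GB, fp16: 14 GB, fp8: 7 GB

# At a fixed DRAM bandwidth (hypothetically 1 TB/s), fp8 streams the
# full weight set four times faster than fp32, easing the bottleneck.
dram_gbps = 1000
fp32_ms = weight_footprint_gb(params, "fp32") / dram_gbps * 1000
fp8_ms = weight_footprint_gb(params, "fp8") / dram_gbps * 1000
print(f"one full weight pass: fp32 {fp32_ms:.0f} ms vs fp8 {fp8_ms:.0f} ms")
```

The same factor-of-four applies to arithmetic: narrower multipliers need far fewer transistors, which is the circuit-area saving the article describes.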
“To create high-quality AI models, a substantial number of processors are needed for parallel computing, which requires a sizable budget. However, with the availability of cost-effective processors, it’s now possible to develop improved AI models within the same budget. The shift in demand for AI training processors is moving from sheer performance to cost-effectiveness.”
“Since neural network training can be parallelized, a processor with excellent cost performance is more suitable than a single high-speed processor. SIMT (single instruction, multiple threads) is easier for programmers to use than SIMD, and it can handle a wider range of tasks, but to achieve both ease of use and performance, caching is almost essential. We believe we can improve cost performance by specializing in neural network training and inference in this area.”
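The cost-performance argument can be made concrete with a back-of-the-envelope calculation. All figures below are hypothetical, not LeapMind's: for an ideally data-parallel training job, the dollar cost is set by price per FLOP, not by any single device's speed, so many cheap devices can beat one fast one.

```python
# Hypothetical illustration: under ideal data parallelism, training cost
# depends on $/FLOP rather than per-device speed. All numbers made up.

def training_cost(total_flop: float, flops_per_device: float,
                  price_per_device_hour: float, n_devices: int,
                  efficiency: float = 1.0) -> float:
    """Dollar cost of a training run with near-linear parallel scaling."""
    hours = total_flop / (flops_per_device * n_devices * efficiency) / 3600
    return hours * n_devices * price_per_device_hour

JOB = 1e21  # total training FLOPs for some model (hypothetical)

# One fast processor: 1 PFLOPS at $4.00/hour
fast = training_cost(JOB, 1e15, 4.00, n_devices=1)

# Ten slower processors with better cost performance: 0.2 PFLOPS at $0.40/hour
cheap = training_cost(JOB, 2e14, 0.40, n_devices=10)

print(f"one fast device:   ${fast:,.0f}")   # higher $/FLOP, higher cost
print(f"ten cheap devices: ${cheap:,.0f}")  # finishes sooner and cheaper
```

In practice scaling is not perfectly linear (the `efficiency` term), which is where the caching and specialization Tokunaga mentions would matter.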
This is a big shift for the company, he says.
“The cost of training advanced AI models, including large-scale language models (LLMs), has increased significantly over the past decade due to larger model sizes and greater computational complexity. This rising cost has become a major bottleneck in AI progress,” he said.
“The amount of money that moves in the current semiconductor industry is extremely large, and it takes billions to tens of billions of yen to make chips on processes close to the cutting edge. It is not enough simply to carry development through successfully; we have to run the business while weighing many other factors.”
“Technically, we have a track record of actually delivering semiconductor IP and making it work as a physical chip. Although this is a slightly different field, we have a proven record of creating IP and software that work properly. I am confident that we are qualified to enter the market,” he said.
Developing advanced software stacks is essential for AI model development, and no single company can provide all the required components. An open-source software ecosystem involving multiple companies already exists, and to be a part of this ecosystem, it is crucial to engage as an open-source software community member.
The company plans to release comprehensive hardware specifications and software, including drivers and compilers, under OSI-compliant licenses to contribute to the open-source community.
“In the processor business, it is not enough to just create hardware; we need a system that allows us to utilize existing software assets. In this regard, the situation has improved considerably compared to a few years ago. With PyTorch it’s now possible to use Triton’s mechanism, and with TensorFlow and JAX, a mechanism called PJRT was created as part of the OpenXLA project. By writing a plugin for PJRT, you can receive network definitions in StableHLO format, compile them, and run them. In this way, instead of handwriting a huge amount of operator code, it is now possible to run major ML frameworks by simply implementing primitive operators.”
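The flow Tokunaga describes, where a backend receives a compiled network and only has to supply primitive operators, can be sketched with a toy interpreter. Everything below (the `Op` record, the kernel registry, the op names) is a simplified illustration, not the actual PJRT or StableHLO API, which works on MLIR modules rather than Python objects:

```python
# Toy sketch of the "implement primitive operators, run whole models"
# idea. Op names and structures are illustrative stand-ins for StableHLO.
from dataclasses import dataclass

@dataclass
class Op:
    name: str        # primitive operator, e.g. "add", "mul", "dot"
    inputs: tuple    # indices of earlier values in the program

# A backend only has to supply kernels for a small set of primitives...
KERNELS = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "mul": lambda a, b: [x * y for x, y in zip(a, b)],
    "dot": lambda a, b: [sum(x * y for x, y in zip(a, b))],
}

def run(program: list, inputs: list):
    """Interpret a straight-line program of primitive ops."""
    values = list(inputs)
    for op in program:
        args = [values[i] for i in op.inputs]
        values.append(KERNELS[op.name](*args))
    return values[-1]

# ...and any network lowered to those primitives will run.
# y = dot(a + b, a * b) for two small vectors:
prog = [Op("add", (0, 1)), Op("mul", (0, 1)), Op("dot", (2, 3))]
print(run(prog, [[1.0, 2.0], [3.0, 4.0]]))  # -> [60.0]
```

A real PJRT plugin does the same job at a different level: it accepts a StableHLO module from JAX or TensorFlow, compiles it for the device, and executes it, so the hardware vendor never hand-writes the thousands of high-level framework operators.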
“Initially, we plan to start with large, high-performance products, but we are aiming for a future where, like today’s GPUs, inexpensive versions can be picked up easily at parts shops in Akihabara and our products are also installed in supercomputers,” he said.
“Up until now, I have personally avoided the word AI as much as possible, but from now on, I will stop being stubborn and use the word AI liberally. Go with the flow of the times. Hello AI.”