Researchers at DeepMind, the artificial intelligence (AI) subsidiary of Alphabet Inc., have introduced the first AI system for discovering novel, efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. The system, called AlphaTensor, builds upon AlphaZero, an agent that has shown superhuman performance on board games such as chess, Go and shogi.
AlphaTensor, say the researchers, marks the first time AlphaZero has moved from playing games to tackling unsolved mathematical problems, and it sheds light on a 50-year-old open question in mathematics: what is the fastest way to multiply two matrices?
“Matrix multiplication is one of the simplest operations in algebra, commonly taught in high school maths classes,” say the researchers. “But outside the classroom, this humble mathematical operation has enormous influence in the contemporary digital world and is ubiquitous in modern computing.”
The operation is used for processing images on smartphones, recognizing speech commands, generating graphics for computer games, running simulations to predict the weather, compressing data and videos for sharing on the internet, and much more. Companies around the world spend large amounts of time and money developing computing hardware to efficiently multiply matrices, so even minor improvements to the efficiency of matrix multiplication can have a widespread impact.
For centuries, mathematicians believed that the standard matrix multiplication algorithm was the best that one could achieve in terms of efficiency. But in 1969, German mathematician Volker Strassen shocked the mathematical community by showing that better algorithms do exist.
Through studying very small matrices (size 2×2), Strassen discovered an ingenious way of combining the entries of the matrices to yield a faster algorithm. Despite decades of research following Strassen’s breakthrough, say the researchers, larger versions of this problem have remained unsolved – to the extent that it’s not known how efficiently it’s possible to multiply two matrices that are as small as 3×3.
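To make Strassen's idea concrete, here is a minimal Python sketch of his 2×2 scheme, which uses seven scalar multiplications where the schoolbook method uses eight. The function and variable names (`strassen_2x2`, `standard_2x2`, `m1` to `m7`) are illustrative choices and do not come from the researchers.

```python
def standard_2x2(a, b):
    """Schoolbook product of two 2x2 matrices: 8 scalar multiplications."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def strassen_2x2(a, b):
    """Strassen's product of two 2x2 matrices: 7 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # Seven products of sums and differences of the entries.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # The result entries are recovered using additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(strassen_2x2(a, b))  # [[19, 22], [43, 50]]
```

The saving of one multiplication looks small, but it compounds when the same scheme is applied to blocks of larger matrices.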
“In our paper,” say the researchers, “we explored how modern AI techniques could advance the automatic discovery of new matrix multiplication algorithms. Building on the progress of human intuition, AlphaTensor discovered algorithms that are more efficient than the state of the art for many matrix sizes. Our AI-designed algorithms outperform human-designed ones, which is a major step forward in the field of algorithmic discovery.”
To do so, the researchers first converted the problem of finding efficient algorithms for matrix multiplication into a single-player game in which the board is a three-dimensional tensor (array of numbers), capturing how far from correct the current algorithm is. Through a set of allowed moves, corresponding to algorithm instructions, the player attempts to modify the tensor and zero out its entries. When the player manages to do so, this results in a provably correct matrix multiplication algorithm for any pair of matrices, and its efficiency is captured by the number of steps taken to zero out the tensor.
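The game described above can be sketched in a few lines of Python for the 2×2 case. This is only an illustration under stated assumptions, not DeepMind's implementation: the index convention, the helper names (`matmul_tensor`, `play_move`, `is_solved`) and the `STRASSEN_MOVES` list are choices made here. Each move is a triple of vectors (u, v, w) whose outer product is subtracted from the board, and Strassen's algorithm corresponds to a sequence of seven moves that zeroes out the tensor.

```python
from itertools import product


def matmul_tensor(n):
    """Build the n^2 x n^2 x n^2 board encoding c[i][k] += a[i][j] * b[j][k]."""
    size = n * n
    t = [[[0] * size for _ in range(size)] for _ in range(size)]
    for i, j, k in product(range(n), repeat=3):
        t[i * n + j][j * n + k][i * n + k] = 1
    return t


def play_move(t, u, v, w):
    """Subtract the rank-one tensor (outer product of u, v, w) from the board."""
    for x, y, z in product(range(len(u)), range(len(v)), range(len(w))):
        t[x][y][z] -= u[x] * v[y] * w[z]


def is_solved(t):
    """The game is won when every entry of the board is zero."""
    return all(entry == 0 for plane in t for row in plane for entry in row)


# Strassen's seven products written as moves. Entries are ordered
# (11, 12, 21, 22): u weights the entries of A, v the entries of B,
# and w says which entries of C the product contributes to.
STRASSEN_MOVES = [
    ((1, 0, 0, 1), (1, 0, 0, 1), (1, 0, 0, 1)),
    ((0, 0, 1, 1), (1, 0, 0, 0), (0, 0, 1, -1)),
    ((1, 0, 0, 0), (0, 1, 0, -1), (0, 1, 0, 1)),
    ((0, 0, 0, 1), (-1, 0, 1, 0), (1, 0, 1, 0)),
    ((1, 1, 0, 0), (0, 0, 0, 1), (-1, 1, 0, 0)),
    ((-1, 0, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1)),
    ((0, 1, 0, -1), (0, 0, 1, 1), (1, 0, 0, 0)),
]

board = matmul_tensor(2)
for u, v, w in STRASSEN_MOVES:
    play_move(board, u, v, w)

steps = len(STRASSEN_MOVES)
print(is_solved(board), steps)
```

In this framing, the number of moves needed to clear the board equals the number of scalar multiplications in the resulting algorithm, so a shorter winning game means a faster algorithm.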
“This game is incredibly challenging – the number of possible algorithms to consider is much greater than the number of atoms in the universe, even for small cases of matrix multiplication,” say the researchers. “Compared to the game of Go, which remained a challenge for AI for decades, the number of possible moves at each step of our game is 30 orders of magnitude larger (above 10^33 for one of the settings we consider).”
Essentially, to play this game well, one needs to identify the tiniest of needles in a gigantic haystack of possibilities. To tackle the challenges of this domain, which significantly departs from traditional games, the researchers developed multiple crucial components including a novel neural network architecture that incorporates problem-specific inductive biases, a procedure to generate useful synthetic data, and a recipe to leverage symmetries of the problem.
An AlphaTensor agent was then trained using reinforcement learning to play the game, starting without any knowledge of existing matrix multiplication algorithms. Through learning, AlphaTensor gradually improved, rediscovering historical fast matrix multiplication algorithms such as Strassen’s and eventually surpassing human intuition by discovering algorithms faster than any previously known.
AlphaTensor’s algorithm improves on Strassen’s two-level algorithm in a finite field for the first time since its discovery 50 years ago. These algorithms for multiplying small matrices can be used as primitives to multiply much larger matrices of arbitrary size.
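The following Python sketch illustrates how a small algorithm serves as a primitive for larger matrices: Strassen's 2×2 scheme is applied recursively to 2×2 grids of blocks, and the scalar multiplications are counted. Two levels of recursion on a 4×4 matrix cost 7 × 7 = 49 multiplications, which is the "two-level" baseline mentioned above; the AlphaTensor paper reports a 47-multiplication algorithm for this case in modular arithmetic. The helper names are illustrative, and the sketch assumes square matrices whose size is a power of two.

```python
def add(x, y):
    return [[p + q for p, q in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    return [[p - q for p, q in zip(r, s)] for r, s in zip(x, y)]


def split(m):
    """Split a square matrix into four equally sized blocks."""
    h = len(m) // 2
    return (
        [row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
        [row[:h] for row in m[h:]], [row[h:] for row in m[h:]],
    )


def join(c11, c12, c21, c22):
    """Reassemble four blocks into one matrix."""
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def strassen(a, b, counter):
    """Recursive Strassen product; counter[0] tallies scalar multiplications."""
    if len(a) == 1:
        counter[0] += 1
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Blocks of A stay on the left and blocks of B on the right,
    # because matrix blocks do not commute.
    m1 = strassen(add(a11, a22), add(b11, b22), counter)
    m2 = strassen(add(a21, a22), b11, counter)
    m3 = strassen(a11, sub(b12, b22), counter)
    m4 = strassen(a22, sub(b21, b11), counter)
    m5 = strassen(add(a11, a12), b22, counter)
    m6 = strassen(sub(a21, a11), add(b11, b12), counter)
    m7 = strassen(sub(a12, a22), add(b21, b22), counter)
    return join(
        add(sub(add(m1, m4), m5), m7), add(m3, m5),
        add(m2, m4), add(add(sub(m1, m2), m3), m6),
    )


def naive(a, b):
    """Schoolbook product, used here only as a correctness reference."""
    n = len(a)
    return [[sum(a[i][j] * b[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 1, 1], [3, 2, 2, 0]]
count = [0]
product4 = strassen(a, b, count)
print(product4 == naive(a, b), count[0])  # True 49
```

The schoolbook method needs 64 multiplications for the same 4×4 product, so any reduction at the small-matrix level carries over to every level of the recursion.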
Moreover, say the researchers, AlphaTensor also discovers a diverse set of algorithms with state-of-the-art complexity – up to thousands of matrix multiplication algorithms for each size, showing that the space of matrix multiplication algorithms is richer than previously thought. Algorithms in this rich space have different mathematical and practical properties.
Leveraging this diversity, the researchers adapted AlphaTensor to find algorithms that are fast on a specific piece of hardware, such as the Nvidia V100 GPU or the Google TPU v2. These algorithms multiply large matrices 10-20% faster than the commonly used algorithms on the same hardware, which showcases AlphaTensor’s flexibility in optimizing arbitrary objectives.
“From a mathematical standpoint,” say the researchers, “our results can guide further research in complexity theory, which aims to determine the fastest algorithms for solving computational problems. By exploring the space of possible algorithms in a more effective way than previous approaches, AlphaTensor helps advance our understanding of the richness of matrix multiplication algorithms. Understanding this space may unlock new results for helping determine the asymptotic complexity of matrix multiplication, one of the most fundamental open problems in computer science.”
Because matrix multiplication is a core component in many computational tasks, spanning computer graphics, digital communications, neural network training, and scientific computing, AlphaTensor-discovered algorithms could make computations in these fields significantly more efficient. AlphaTensor’s flexibility to consider any kind of objective, say the researchers, could also spur new applications for designing algorithms that optimize metrics such as energy usage and numerical stability, helping prevent small rounding errors from snowballing as an algorithm works.