Computer scientists at Rice University say they have developed a breakthrough low-memory technique that could put one of the most popular but resource-intensive forms of artificial intelligence – deep-learning recommendation models (DLRM), which learn to make suggestions users will find relevant – within reach of small companies. Up to now, top-of-the-line DLRM training has required more than a hundred terabytes of memory and supercomputer-scale processing, and has been available only to a short list of technology giants with deep pockets.
Designed to address this challenge, the scientists’ technique – called “random offset block embedding array” (ROBE Array) – is an algorithmic approach for slashing the size of DLRM memory structures called embedding tables.
“Using just 100 megabytes of memory and a single GPU, we showed we could match the training times and double the inference efficiency of state-of-the-art DLRM training methods that require 100 gigabytes of memory and multiple processors,” says Anshumali Shrivastava, an associate professor of computer science at Rice. “ROBE Array sets a new baseline for DLRM compression. And it brings DLRM within reach of average users who do not have access to the high-end hardware or the engineering expertise one needs to train models that are hundreds of terabytes in size.”
DLRM systems are machine learning algorithms that learn from data. For example, a recommendation system that suggests products for shoppers would be trained with data from past transactions, including the search terms users provided, which products they were offered and which, if any, they purchased.
One way to improve the accuracy of recommendations is to sort training data into more categories. For example, rather than putting all shampoos in a single category, a company could create categories for men’s, women’s and children’s shampoos.
For training, these categorical representations are organized in memory structures called embedding tables, the sizes of which “have exploded” due to increased categorization, say the scientists.
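To make the scale concrete, here is a minimal NumPy sketch of what an embedding table is: a matrix with one learned row per categorical value, looked up by ID. The category count, dimension, and product-ID values below are illustrative assumptions, not figures from the Rice work.

```python
import numpy as np

# Hypothetical sizes for ONE categorical feature (e.g. product IDs).
NUM_CATEGORIES = 10_000_000   # large catalogs drive table growth
EMB_DIM = 128                 # dimension of each learned vector

# Memory footprint of a single float32 table of that size:
bytes_per_table = NUM_CATEGORIES * EMB_DIM * 4   # ≈ 5.1 GB
# A DLRM has many such features, so tables dominate total model memory.

# A small working example of the lookup itself:
small_table = np.random.default_rng(0).normal(size=(1000, 8)).astype(np.float32)
product_id = 42
vec = small_table[product_id]   # embedding lookup is a simple row fetch
```

Each lookup is cheap on its own, but at production scale these fetches dominate inference time, as noted below.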
“Embedding tables now account for more than 99.9% of the overall memory footprint of DLRM models,” says ROBE Array co-creator Aditya Desai, a Rice graduate student. “This leads to a host of problems. For example, they can’t be trained in a purely parallel fashion because the model has to be broken into pieces and distributed across multiple training nodes and GPUs. And after they’re trained and in production, looking up information in embedding tables accounts for about 80% of the time required to return a suggestion to a user.”
ROBE Array does away with the need for storing embedding tables by using a data-indexing method called hashing to create a single array of learned parameters that is a compressed representation of the embedding table. Accessing embedding information from the array can then be performed using GPU-friendly universal hashing, say the scientists.
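The idea above can be sketched in a few lines of NumPy. This is an illustrative toy, not the authors' implementation: the array size, block size, hash constants, and the `embed` helper are all assumptions, and in the real ROBE Array the shared array's parameters are learned during training rather than fixed at random.

```python
import numpy as np

ROBE_SIZE = 1000   # size of the single shared parameter array (assumption)
EMB_DIM = 4        # embedding dimension (assumption)
BLOCK = 2          # block size: consecutive dimensions read contiguously

rng = np.random.default_rng(0)
# One flat array replaces all embedding tables; in training these
# values would be the learned parameters.
robe = rng.normal(size=ROBE_SIZE).astype(np.float32)

def universal_hash(x, a=1000003, b=12345, p=2**61 - 1, m=ROBE_SIZE):
    # Classic ((a*x + b) mod p) mod m universal hash (constants are
    # arbitrary choices for illustration).
    return (a * x + b) % p % m

def embed(token_id):
    # Reconstruct an embedding row by hashing (token, block) pairs to
    # offsets in the shared array and reading BLOCK contiguous values.
    vec = np.empty(EMB_DIM, dtype=np.float32)
    for blk in range(EMB_DIM // BLOCK):
        start = universal_hash(token_id * 7919 + blk)
        idx = (start + np.arange(BLOCK)) % ROBE_SIZE  # wrap at array end
        vec[blk * BLOCK:(blk + 1) * BLOCK] = robe[idx]
    return vec

v = embed(42)   # deterministic: the same ID always hashes to the same slots
```

Because lookups are deterministic hashes into one contiguous array, memory drops from the full table size to `ROBE_SIZE` parameters, and the block-contiguous reads are the kind of access pattern GPUs handle efficiently.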
The researchers tested ROBE Array using the sought-after MLPerf DLRM benchmark, which measures how fast a system can train models to a target quality metric. Using a number of benchmark data sets, they found ROBE Array could match or beat previously published DLRM techniques in terms of training accuracy even after compressing the model by three orders of magnitude.
“Our results clearly show that most deep-learning benchmarks can be completely overturned by fundamental algorithms,” says Shrivastava. “Given the global chip shortage, this is welcome news for the future of AI.”