
18th-century mathematics shows simpler AI models don’t need deep learning

Researchers at the University of Jyväskylä have simplified deep learning, the most popular technique in artificial intelligence, using 18th-century mathematics.
Deep learning enables computers to perform complex tasks such as analyzing and generating images and music, playing digitized games and, most recently, in connection with ChatGPT and other generative AI techniques, acting as a natural-language conversational agent that provides high-quality summaries of existing knowledge.
Six years ago, Professor Tommi Kärkkäinen and doctoral researcher Jan Hänninen conducted preliminary studies on data reduction. The results were surprising: if one combines simple network structures in a novel way, depth is not needed. Similar or even better results can be obtained with shallow models.
Layerwise pretraining from the output head to the inner layers: the outermost layer is trained first, and its residual is then fed as training data to the next hidden layer, until all layers have been sequentially pretrained. Credit: Neurocomputing (2023). DOI: 10.1016/j.neucom.2023.126520
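The caption above describes a sequential, residual-driven scheme. The following is a minimal sketch of one possible reading of it, not the authors' implementation: each shallow stage is fit to reconstruct its input, and the leftover residual becomes the training data for the next stage. The module structure, stage count, and training loop are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def train_stage(module, data, epochs=200, lr=1e-2):
    """Fit one shallow stage to reconstruct `data` (hypothetical helper)."""
    opt = torch.optim.Adam(module.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((module(data) - data) ** 2).mean()
        loss.backward()
        opt.step()
    return module

def layerwise_pretrain(data, n_stages=3, hidden=16):
    """Sequentially pretrain shallow stages; each new stage models the
    residual left behind by the stages trained so far (an assumed
    interpretation of the caption, not the paper's exact procedure)."""
    d = data.shape[1]
    stages, residual = [], data.clone()
    for _ in range(n_stages):
        stage = nn.Sequential(nn.Linear(d, hidden), nn.Tanh(),
                              nn.Linear(hidden, d))
        train_stage(stage, residual)
        with torch.no_grad():
            # What this stage failed to capture is passed on as
            # training data for the next stage.
            residual = residual - stage(residual)
        stages.append(stage)
    return stages  # the combined model sums the stage outputs
```

Because every stage is shallow, each fit is a small, well-conditioned problem, which is consistent with the article's claim that depth can be traded for a novel combination of simple structures.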
“The use of deep learning techniques is a complex and error-prone endeavor, and the resulting models are difficult to maintain and interpret,” says Kärkkäinen. “Our new model in its shallow form is more expressive and can reliably reduce large datasets while maintaining all the necessary information in them.”
The structure of the new AI technique dates back to 18th-century mathematics. Kärkkäinen and Hänninen also found that traditional optimization methods from the 1970s work better for training their model than the 21st-century techniques used in deep learning.
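The article does not name the optimization methods in question. As an illustration only, BFGS (published in 1970) is one classical quasi-Newton method of that era, and a shallow model has few enough parameters that such a method is practical; the toy data and network below are assumptions, not the authors' setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))           # toy data, for illustration only
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

H = 8  # hidden units in a single shallow layer

def unpack(theta):
    """Split the flat parameter vector into layer weights and biases."""
    i = 0
    W1 = theta[i:i + 2 * H].reshape(2, H); i += 2 * H
    b1 = theta[i:i + H]; i += H
    w2 = theta[i:i + H]; i += H
    b2 = theta[i]
    return W1, b1, w2, b2

def loss(theta):
    """Mean squared error of a one-hidden-layer network."""
    W1, b1, w2, b2 = unpack(theta)
    pred = np.tanh(X @ W1 + b1) @ w2 + b2
    return np.mean((pred - y) ** 2)

theta0 = rng.normal(scale=0.1, size=2 * H + H + H + 1)
# BFGS is a 1970s quasi-Newton method; SciPy estimates the gradient
# by finite differences when none is supplied, which is feasible here
# because the shallow model has only a few dozen parameters.
result = minimize(loss, theta0, method="BFGS")
print("final MSE:", result.fun)
```

With so few parameters, a quasi-Newton method converges quickly and deterministically, without the learning-rate tuning that stochastic deep-learning optimizers require.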
