
Large language models feel the direction of time


News | By Wisse Hettinga

Researchers have found that large language models such as GPT-4 are better at predicting what comes next in a sentence than what came before.

  • This “Arrow of Time” effect could reshape our understanding of the structure of natural language, and the way these models understand it.

Large language models (LLMs) such as GPT-4 have become indispensable for tasks such as text generation, coding, chatbots and translation. At their heart, LLMs work by predicting the next word in a sentence from the words that precede it, a simple but powerful idea that drives much of their functionality. But what happens when we ask these models to predict backward, going "backwards in time" to determine the previous word from the ones that follow it?
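The two scoring directions can be made concrete with a toy sketch. The snippet below is purely illustrative and is not the researchers' method: it trains a simple count-based bigram model on a made-up corpus and scores the sequence forward (predict the next token from the current one) and backward (predict the previous token from the next one). A count-based model this simple will not reproduce the asymmetry the paper reports in trained LLMs; it only shows what the two tasks are.

```python
import math
from collections import Counter

def avg_bits_per_token(tokens, reverse=False):
    # Fit a bigram model by counting adjacent pairs, then compute the
    # average negative log-likelihood (bits per token) of the sequence.
    # reverse=True scores the backward task: predicting each token from
    # its successor instead of its predecessor.
    seq = tokens[::-1] if reverse else tokens
    pair_counts = Counter(zip(seq, seq[1:]))   # (context, target) counts
    ctx_counts = Counter(seq[:-1])             # context counts
    nll = 0.0
    for (ctx, nxt), c in pair_counts.items():
        p = c / ctx_counts[ctx]                # conditional probability
        nll -= c * math.log2(p)
    return nll / (len(seq) - 1)

# Hypothetical toy corpus, chosen only for illustration
tokens = "the cat sat on the mat and the dog sat on the log".split()
fwd = avg_bits_per_token(tokens)
bwd = avg_bits_per_token(tokens, reverse=True)
print(f"forward  bits/token: {fwd:.3f}")
print(f"backward bits/token: {bwd:.3f}")
```

In a real LLM, the analogous comparison would use the model's forward and backward cross-entropy over large corpora, where the paper reports the backward direction is consistently a few percent worse.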

The question led Professor Clément Hongler at EPFL and Jérémie Wenger of Goldsmiths (London) to explore whether LLMs could construct a story backward, starting from the end. Working with Vassilis Papadopoulos, a machine learning researcher at EPFL, they discovered something surprising: LLMs are consistently less accurate when predicting backward than forward.

A fundamental asymmetry

The researchers tested LLMs of different architectures and sizes, including Generative Pre-trained Transformers (GPT), Gated Recurrent Units (GRU), and Long Short-Term Memory (LSTM) neural networks. Every one of them showed the “Arrow of Time” bias, revealing a fundamental asymmetry in how LLMs process text.

Hongler explains: “The discovery shows that while LLMs are quite good both at predicting the next word and the previous word in a text, they are always slightly worse backwards rather than forward: their performance at predicting the previous word is always a few percent worse than at predicting the next word. This phenomenon is universal across languages, and can be observed with any large language model.”

The work also connects to Claude Shannon, the father of information theory, and his seminal 1951 paper. Shannon explored whether predicting the next letter in a sequence is as easy as predicting the previous one. He found that although both tasks should in theory be equally difficult, humans found backward prediction more challenging, though the difference in performance was minimal.

Intelligent agents

“In theory, there should be no difference between the forward and backward directions, but LLMs appear to be somehow sensitive to the time direction in which they process text,” says Hongler. “Interestingly, this is related to a deep property of the structure of language that could only be discovered with the emergence of Large Language Models in the last five years.”

The researchers link this property to the presence of intelligent agents processing information, meaning it could serve as a tool to detect intelligence or life, and help in designing more powerful LLMs. It could also point to new directions in the long-standing quest to understand the passage of time as an emergent phenomenon in physics.

The work was presented at the prestigious International Conference on Machine Learning (2024) and is currently available on arXiv.
