Medical researchers have charted the performance of AI chatbots across successive versions using tools designed to measure dementia in humans.
The peer-reviewed research, ‘Age against the machine’, found that all the chatbots tested, from OpenAI’s GPT-4o to Google’s Gemini 1.5, show signs of cognitive impairment, with older versions scoring lower than newer ones.
The Montreal Cognitive Assessment (MoCA) test, version 8.1, was administered to the leading large language models with instructions identical to those given to human patients. Scoring followed official guidelines and was evaluated by a practising neurologist.
None of the chatbots examined obtained the full score of 30 points, and most scored below the threshold of 26. A score below that threshold indicates mild cognitive impairment and possibly early dementia, say the researchers. “Older” large language model versions scored lower than their “younger” counterparts, as is often the case with human participants, showing cognitive decline seemingly comparable to neurodegenerative processes in the human brain.
Specifically, ChatGPT 4 showed a minor loss of executive function compared with ChatGPT 4o, as measured by a one-point difference in their MoCA scores, but the effect was far more pronounced between Gemini 1.0 and Gemini 1.5, which differed by six points. Versions of Anthropic’s Claude chatbot were also assessed.
As the two versions of Gemini are less than a year apart in “age”, this may indicate rapidly progressing dementia. Additional tests, such as the Clinical Dementia Rating, would be needed to confirm this hypothesis.
Article DOI: 10.1136/bmj-2024-081948