AI closes in on matching human general intelligence
H2O.ai has announced that its h2oGPTe Agent has secured the top position on the GAIA (General AI Assistants) benchmark leaderboard with an unprecedented score of 65% — outperforming the Google Langfun Agent at 49%, Microsoft Research at 38%, and Hugging Face at 33%.
This remarkable achievement on the GAIA benchmark shows that H2O.ai is dominating the emerging domain of general-purpose AI agents, setting a new gold standard for the industry.
The GAIA benchmark measures how useful AI systems are at solving real-world tasks that demand substantial time, thought and effort from skilled humans. It consists of hundreds of challenges that require laborious research, data analysis, document handling and reasoning. Degree-holding human respondents achieve a score of 92% and need several person-days to solve all 300 test set problems.
The h2oGPTe Agent outpaced competitors by delivering consistent robustness, accuracy and efficiency, highlighting its readiness for enterprise use cases that depend heavily on skilled human assistants.
Sri Ambati, Founder and CEO of H2O.ai, commented: “Today we are announcing that AI is only 30% away from matching human-level general intelligence on the GAIA benchmark. Open-ended questions in GAIA are a better measure of intelligence than MMLU, which relies on multiple choice. The entire Gen AI ecosystem was barely able to pass a tenth in accuracy on one of the toughest AGI benchmarks merely a year ago.”
“Makers at H2O.ai built h2oGPTe Agentic AI wielding the best models in the world for reasoning, multi-modal image, video, language understanding, code generation and execution to ace the GAIA benchmark with a stunning 15% accuracy leap over the previous record set by researchers from Google DeepMind using the same Claude-3.5-Sonnet. The h2oGPTe Agent also beat Microsoft Research’s agent Magentic-1 that used OpenAI’s o1 model by 27%.”
“Agentic AI is eating SaaS and with h2oGPTe Agentic AI now being generally available, all our enterprise customers can solve a wide range of sophisticated business and research problems.”
H2O.ai’s success on GAIA underscores its philosophy of simplicity and adaptability. Key capabilities of the h2oGPTe Agent include:
- Advanced reasoning and planning for solving complex, real-world tasks.
- Multimodal comprehension across text, images, and audio for seamless context understanding.
- Integration of enterprise tools like Python execution and DriverlessAI for predictive analytics and decision-making.
Enterprise h2oGPTe 1.6 includes the Agent feature and is available on all public clouds, in virtual private clouds and for on-premises deployments: https://h2o.ai/platform/enterprise-h2ogpte.