What is the performance of top AI systems on the GAIA benchmark?

    2025-03-31T18:31:41+00:00

    Performance on the GAIA benchmark varies widely across systems. Transformers Agent scores 44.2% on the validation set and 33.3% on the test set, and performs best on level-three questions, the hardest of GAIA's three difficulty levels. Autogen-based submissions score 40%, while GPT-4-Turbo scores below 7%. Even the strongest of these systems answers fewer than half of the questions correctly, which shows how challenging GAIA remains for current AI systems.