What is the performance of top AI systems in the GAIA benchmark?
Question
Answers ( 1 )
Performance of top AI systems on the GAIA benchmark varies widely. For example, Transformers Agent scores 44.2% on the validation set and 33.3% on the test set, with its strongest showing on level-three questions. Autogen-based submissions score 40%, while GPT-4-Turbo scores less than 7%. These results show how difficult GAIA remains even for leading AI systems.