lennxa

Gemini 2.5 Pro and Google's second chance with AI

Importance: 3 | ai, gemini, nathan-lambert

Nathan Lambert:

Google, with its immense infrastructure and talent, has been the safe bet for the question of “Who will have the best models in a few years?” Google took a long time to get here, overcoming Bard’s launch and some integration headaches, and yet the model they launched today, Gemini 2.5 Pro, feels like the biggest jump in evaluation scores we’ve seen in quite some time.

To summarize, while more evaluations are rolling in, Gemini 2.5 Pro is 40+ Elo points clear on the popular ChatBotArena / LM Arena benchmark. Normally, when a model launches and claims the top spot, it’s barely ahead. In fact, this is the second biggest jump of the top model in LMSYS history, behind only GPT-4 Turbo overtaking Claude 1. GPT-4 Turbo arrived when models were not really trained for the benchmark, so progress was much faster.

The more often state-of-the-art models are released in a fixed time window, the more confident you can be in the pace of progress continuing... The ceiling on performance is rising and the potential value underneath it that we haven’t unlocked is continuing to balloon.

It’s [Gemini 2.5] perfectly capable, but without a depth of personality it feels lost relative to the fun GPT-4.5 or the ever quirky Claude.
