The AI wars continue to heat up. Just weeks after OpenAI declared a “code red” in its race against Google, Google released its newest lightweight model: Gemini 3 Flash. It's the latest entry in the Gemini 3 family, which started with Gemini 3 Pro and Gemini 3 Deep Think. But while this model is meant to be a lighter, less expensive variant of the existing Gemini 3 models, Gemini 3 Flash is quite powerful in its own right. In fact, it beats out both Gemini 3 Pro and OpenAI’s GPT-5.2 in some benchmarks.
Lightweight models are typically meant for more basic queries, lower-budget requests, or lower-powered hardware. That makes them faster than more powerful models, which take longer to process requests but can do more. According to Google, Gemini 3 Flash combines the best of both worlds, pairing Gemini 3’s “Pro-grade reasoning” with “Flash-level latency, efficiency, and cost.” While that likely matters most to developers, general users should notice the improvements too, as Gemini 3 Flash is now the default model for both Gemini (the chatbot) and AI Mode, Google’s AI-powered search.
Gemini 3 Flash performance
You can see these improvements in Google’s reported benchmarking stats for Gemini 3 Flash. In Humanity’s Last Exam, an academic reasoning benchmark that tests LLMs on 2,500 questions across over 100 subjects, Gemini 3 Flash scored 33.7% with no tools, and 43.5% with search and code execution. Compare that to Gemini 3 Pro’s 37.5% and 45.8% scores, respectively, or OpenAI’s GPT-5.2’s scores of 34.5% and 45.5%. In MMMU-Pro, a benchmark that tests a model’s multimodal understanding and reasoning, Gemini 3 Flash got the top score (81.2%), compared to Gemini 3 Pro (81%) and GPT-5.2 (79.5%). In fact, across the 21 benchmarking tests Google highlights in its announcement, Gemini 3 Flash has the top score in three: MMMU-Pro (tied with Gemini 3 Pro), Toolathlon, and MMMLU. Gemini 3 Pro still takes the number one spot on the most tests here (14), and GPT-5.2 topped eight tests, but Gemini 3 Flash is holding its own.
Google notes that Gemini 3 Flash also outperforms both Gemini 3 Pro and the entire 2.5 series in the SWE-bench Verified benchmark, which tests a model’s coding agent capabilities. Gemini 3 Flash scored 78%, while Gemini 3 Pro scored 76.2%, Gemini 2.5 Flash scored 60.4%, and Gemini 2.5 Pro scored 59.6%. (Note that GPT-5.2 scored the best of the models Google mentions in this announcement.) It’s a close race, especially when you consider this is a lightweight model scoring alongside these companies’ flagship models.
Gemini 3 Flash cost
That might present an interesting dilemma for developers who pay to use AI models in their programs. Gemini 3 Flash costs $0.50 per million input tokens (what you ask the model to do), and $3.00 per million output tokens (the result the model returns from your prompt). Compare that to Gemini 3 Pro, which costs $2.00 per million input tokens and $12.00 per million output tokens, or GPT-5.2’s $3.00 and $15.00 costs, respectively. For what it’s worth, it’s not as cheap as Gemini 2.5 Flash ($0.30 and $2.50), or Grok 4.1 Fast for that matter ($0.20 and $0.50), but it does outperform those models in Google’s reported benchmarks. Google notes that Gemini 3 Flash uses 30% fewer tokens on average than 2.5 Pro, which will save on cost, while also being three times faster.
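To make those per-million-token rates concrete, here's a quick back-of-the-envelope comparison in Python, using the prices cited above. The workload size (10 million input tokens and 2 million output tokens) is a made-up example, not anything from Google's announcement:

```python
# Published prices as (input $/1M tokens, output $/1M tokens).
PRICES = {
    "Gemini 3 Flash": (0.50, 3.00),
    "Gemini 3 Pro": (2.00, 12.00),
    "GPT-5.2": (3.00, 15.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
}

def workload_cost(model, input_tokens_m, output_tokens_m):
    """Dollar cost for a workload measured in millions of tokens."""
    in_rate, out_rate = PRICES[model]
    return input_tokens_m * in_rate + output_tokens_m * out_rate

# Hypothetical workload: 10M input tokens, 2M output tokens.
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 10, 2):.2f}")
# Gemini 3 Flash comes out to $11.00 vs. $44.00 for Gemini 3 Pro
# and $60.00 for GPT-5.2 on this example workload.
```

The gap scales linearly, so the same workload run at Pro prices costs four times as much, which is the trade-off the benchmark numbers above frame.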
If you’re someone who needs LLMs like Gemini 3 Flash to power your products, but you don’t want to pay the higher costs associated with more powerful models, I could imagine this latest lightweight model looking appealing from a financial perspective.
How the average user will experience Gemini 3 Flash
Most of us using AI aren’t doing so as developers who need to worry about API pricing. The majority of Gemini users are likely experiencing the model through Google’s consumer products, like Search, Workspace, and the Gemini app.
Starting today, Gemini 3 Flash is the default model in the Gemini app. Google says it can handle many tasks “in just a few seconds.” That might include asking Gemini for tips on improving your golf swing based on a video of yourself, or uploading a speech on a given historical topic and requesting any facts you might have missed. You could also ask the bot to code you a functioning app from a series of thoughts.
You’ll also experience Gemini 3 Flash in Google Search’s AI Mode. Google says the new model is better at “parsing the nuances of your question,” and thinks through each part of your request. AI Mode tries to return a more complete search result by scanning hundreds of sites at once, and putting together a summary with sources for your answer. We’ll have to see if Gemini 3 Flash improves on previous iterations of AI Mode.
I’m someone who still doesn’t find much use for generative AI products in my day-to-day life, and I’m not entirely sure Gemini 3 Flash is going to change that for me. However, the balance of performance gains against the cost to process that power is interesting, and I’m particularly intrigued to see how OpenAI responds.
Gemini 3 Flash is available to all users starting today. In addition to general users in Gemini and AI Mode, developers will find it in the Gemini API in Google AI Studio, Gemini CLI, and Google Antigravity, the company’s new agentic development platform. Enterprise users can use it in Vertex AI and Gemini Enterprise.