nsorros .com
online
← back to writing

Gemini 3.5 Flash — the secret

Gemini 3.5 Flash's secret 🤫

Google just released Gemini 3.5 Flash ⚡ and on the spec sheet it looks like a great model:

🧠 Intelligence Index: 55 (top 7 of 147 on Artificial Analysis) 💰 Blended price: $1.31 per 1M tokens 🏎️ Output speed: 278 tok/s — the fastest model in its intelligence band

Someone picking a model from the above spec sheet sees: smart enough, cheap enough, fast. Lets use it.

But there is a hidden cost the spec sheet does not show, how many tokens the model actually spends to complete a task. This is the per-token vs per-task gap, and it is the reason "low / medium / high" effort knobs exist on every reasoning model now. At its default reasoning level, the new Flash thinks a lot. According to Artificial Analysis in their full Intelligence Index suite: 📊 Gemini 3.5 Flash (high): $1,552 to complete the suite 📊 Gemini 3.1 Pro Preview: $892 to complete the same suite

This paints a different picture. 57 intelligence at $892 vs 55 intelligence at $1,552. The cheaper-per-token model is more expensive because it spends 2.3× more tokens to answer the same questions. Per token Flash is 25% cheaper than Pro. Per task it is 74% more expensive 😱

Maybe the right way to think about Flash going forward is the speed tier, not the cheap tier. At 278 tok/s it is the fastest model on the intelligence-vs-speed frontier, and that is a real value proposition for latency-sensitive workloads..

Gemini 3.5 Flash — per-token vs per-task cost