Google has unveiled Gemini 3.1 Flash-Lite, the newest and least expensive model in its Gemini 3 lineup, built for high-volume developer workloads and data processing tasks.
The model is priced at $0.25 per million input tokens and $1.50 per million output tokens, which targets developers who need strong performance at low cost. Gemini 3.1 Flash-Lite is available in preview through the Gemini API in Google AI Studio and Vertex AI; it is not offered in the consumer Gemini app.
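To make the pricing concrete, the sketch below estimates the cost of a batch job from the published per-million-token rates. The workload figures (request count and tokens per request) are hypothetical and chosen only for illustration. The calculation also ignores any discounts, such as caching or batch pricing, that may apply.

```python
# Published preview prices for Gemini 3.1 Flash-Lite, in US dollars per million tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for the given token counts."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 100,000 requests, each with 2,000 input and 300 output tokens.
    requests = 100_000
    total_in = requests * 2_000   # 200,000,000 input tokens
    total_out = requests * 300    # 30,000,000 output tokens
    print(f"Estimated cost: ${estimate_cost(total_in, total_out):.2f}")
    # → Estimated cost: $95.00
```

In this example the 30 million output tokens cost nearly as much as the 200 million input tokens, because output is billed at six times the input rate.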
Gemini 3.1 Flash-Lite costs more than its direct predecessor, Gemini 2.5 Flash-Lite, but offers substantially stronger capabilities. It also generally outperforms the larger Gemini 2.5 Flash while costing less than that model, which makes it the more cost-effective choice for many applications.
On performance, Gemini 3.1 Flash-Lite beats competitors such as GPT-5 mini and Claude Haiku 4.5. Grok 4.1 Fast is cheaper, but Gemini 3.1 Flash-Lite is faster, generating up to 363 tokens per second. On multimodal benchmarks, the model scored 1432 Elo on the Arena.ai leaderboard, which places it in the same range as open-weight models and previous-generation commercial models.
Google has emphasized that Gemini 3.1 Flash-Lite is built for high-volume tasks and data processing rather than for managing fleets of agents, and the company did not publish agent benchmarks for this release. Developers can, however, use the API to control how much the model reasons. Lower reasoning settings produce fewer tokens and therefore cost less, a tradeoff that matters most for high-volume workloads.
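Because output tokens cost six times as much as input tokens, the amount of reasoning the model does can dominate spend. The sketch below compares two hypothetical reasoning settings for the same request. The token counts are illustrative assumptions, not measured figures. The sketch also assumes that reasoning tokens are billed at the output rate, and it does not call the actual Gemini API.

```python
from dataclasses import dataclass

# Published preview prices for Gemini 3.1 Flash-Lite, in US dollars per million tokens.
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 1.50


@dataclass
class RequestProfile:
    """Token usage for one request under a given reasoning setting (hypothetical numbers)."""
    name: str
    input_tokens: int
    reasoning_tokens: int  # assumption: billed at the output rate
    answer_tokens: int

    def cost(self) -> float:
        """Dollar cost of a single request under this profile."""
        billed_output = self.reasoning_tokens + self.answer_tokens
        return (self.input_tokens * INPUT_PRICE_PER_M
                + billed_output * OUTPUT_PRICE_PER_M) / 1_000_000


def cost_at_scale(profile: RequestProfile, requests: int) -> float:
    """Total dollar cost of running the same request profile many times."""
    return profile.cost() * requests


if __name__ == "__main__":
    # Same prompt and answer length; only the (assumed) reasoning token count differs.
    low = RequestProfile("low reasoning", 1_500, 200, 250)
    high = RequestProfile("high reasoning", 1_500, 2_000, 250)
    for p in (low, high):
        print(f"{p.name}: ${cost_at_scale(p, 1_000_000):,.2f} per million requests")
```

With these assumed numbers, the low-reasoning profile costs about $1,050 per million requests and the high-reasoning profile about $3,750, roughly 3.6 times as much, even though the prompt and the visible answer are identical.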
Gemini 3.1 Flash-Lite arrives before any Gemini 3.1 Flash model. That is a departure from Google's usual approach of launching the more capable Flash model first, or skipping Flash-Lite entirely, as it did with Gemini 3. The release follows Gemini 3.1 Pro, which launched two weeks earlier.