Gemini 2.5 Flash-Lite pricing.

$0.10 per million input tokens, $0.40 per million output tokens. 1M context window. Free calculator below — paste your actual prompt to see what your real workload will cost.

What does Gemini 2.5 Flash-Lite cost in 2026?

Gemini 2.5 Flash-Lite is the cheapest major-vendor model from Google. As of May 2026, it is billed at $0.10 per million input tokens and $0.40 per million output tokens. The output rate is approximately 4.0× higher than the input rate, which is standard across the frontier LLM tier — output generation is autoregressive and computationally more expensive than input processing.

To put it in concrete terms: a typical chat call with a 1,500-token prompt and a 500-token response on Gemini 2.5 Flash-Lite costs $0.0003 per invocation. That includes both directions. If you scale that to 10,000 calls per month, you are looking at around $3.50 in API charges before any optimizations like prompt caching or batch processing.

The 1M context window means you can fit roughly 750,000 words of input into a single request. This matters for retrieval-augmented generation (RAG), document analysis, and agentic workflows where the model needs to track long histories.

Monthly cost scenarios for Gemini 2.5 Flash-Lite

The table below assumes a 1,500-token input and 500-token output per call — close to a typical chat or moderate-complexity reasoning task. Adjust upward if you do heavy RAG (longer inputs) or code generation (longer outputs).

Use caseCalls / monthEstimated costPer call
small project1,000$0.35$0.0003
startup app10,000$3.50$0.0003
production scale100,000$35.00$0.0003
enterprise1,000,000$350.00$0.0003

When to use Gemini 2.5 Flash-Lite

Gemini 2.5 Flash-Lite sits in a specific tier of the Google lineup. It is designed for cheapest major-vendor model. Use it when the task requires the quality bump that this tier provides — typically multi-step reasoning, complex extraction, long-context comprehension, or code that has to compile on the first try. Avoid using it when a cheaper model can do the job adequately; the cost difference compounds quickly at production volumes.

A pattern that works well: use Gemini 2.5 Flash-Lite as the "expert" model in a tiered routing system, where a cheaper model (like Google's smaller variant or a budget alternative) handles 70–80% of incoming requests, and only the genuinely hard cases get escalated to Gemini 2.5 Flash-Lite. This typically cuts total costs by 60–80% versus running everything through Gemini 2.5 Flash-Lite, with negligible quality impact.

How to cut your Gemini 2.5 Flash-Lite bill

Three levers, in descending order of impact:

Prompt caching. Google supports caching of static prompt prefixes. Cached input bills at approximately 10% of the standard rate. If your application sends the same long system message or document context across many calls, enabling caching can cut your input bill by 80–90%. This is the single biggest optimization available.

Batch API. For workloads that can tolerate a 24-hour latency window (nightly summarization, data enrichment, eval runs), the Batch API gives a flat 50% discount. This applies to both input and output tokens. If even 30% of your traffic can be batched, you save 15% off your total bill with no quality compromise.

Output capping. Since output is roughly 5× more expensive than input on Gemini 2.5 Flash-Lite, controlling output length is high-leverage. Set max_tokens aggressively. Explicitly prompt for terse responses ("answer in one sentence", "respond with JSON only, no prose"). A 50% reduction in output length translates directly to a 40%+ reduction in total cost for most workloads.

Tokenization notes for Gemini 2.5 Flash-Lite

Gemini 2.5 Flash-Lite uses Google's tokenizer, which is optimized for English. Non-English text — particularly Cyrillic, CJK, and Arabic — produces 2-4× more tokens than the equivalent English content. This effectively makes the model 2-4× more expensive for those languages. If you serve a multilingual user base, factor this into your cost projections, and consider whether translating to English at preprocessing time might be cheaper than paying the tokenization tax.

JSON whitespace also matters. Pretty-printed JSON in prompts can cost 30% more tokens than minified JSON. The model does not care about formatting; your invoice does.

Advertisement

Related models

price-adjacent alternatives
OpenAI

GPT-4.1 Nano

$0.10 in · $0.40 out
DeepSeek

DeepSeek V3.2

$0.14 in · $0.28 out
OpenAI

GPT-5.4 Nano

$0.20 in · $1.25 out
Anthropic

Claude Haiku 3

$0.25 in · $1.25 out