A direct API cost comparison: headline rates, the per-call math on a typical workload, and which one to pick beyond price.
On a standard 1,500-input / 500-output call, GPT-4.1 Nano and Gemini 2.5 Flash-Lite tie at exactly $0.00035 per call. Their input and output rates are identical, so they also tie at every other input-to-output ratio. Price alone does not pick a winner.
On both models, output tokens cost four times as much as input tokens ($0.40 vs $0.10 per million). The more your workload leans toward long generated answers, the more the output rate dominates the bill.
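Here is a minimal Python sketch of the per-call arithmetic. It uses the $0.10/M input and $0.40/M output rates that both models share, and shows how much of each call's bill comes from output tokens. The function names are illustrative and are not part of either vendor's SDK.

```python
# Headline per-million-token rates (USD), identical for both models.
INPUT_RATE_PER_M = 0.10
OUTPUT_RATE_PER_M = 0.40


def call_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (input_cost, output_cost) in USD for a single call."""
    input_cost = input_tokens * INPUT_RATE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_RATE_PER_M / 1_000_000
    return input_cost, output_cost


def output_share(input_tokens: int, output_tokens: int) -> float:
    """Fraction of the per-call bill that comes from output tokens."""
    inp, out = call_cost(input_tokens, output_tokens)
    return out / (inp + out)


# Balanced 1,500-in / 500-out call.
print(f"{output_share(1_500, 500):.0%}")   # 57%
# Output-heavy 500-in / 1,500-out call.
print(f"{output_share(500, 1_500):.0%}")   # 92%
```

In the balanced call, output is only a quarter of the tokens but more than half of the cost.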
| Metric | GPT-4.1 Nano | Gemini 2.5 Flash-Lite |
|---|---|---|
| Vendor | OpenAI | Google |
| Input price / M tokens | $0.10 | $0.10 |
| Output price / M tokens | $0.40 | $0.40 |
| Context window | 1M | 1M |
| Cost per typical call | $0.00035 | $0.00035 |
| Cost per 10,000 calls | $3.50 | $3.50 |
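The table's cost rows can be reproduced with a short Python sketch. `RATES` and `cost_per_call` are illustrative names. The rates are the headline (uncached, non-batch) prices from the table. The second loop checks that the tie is not specific to the 1,500/500 call.

```python
# Headline per-million-token rates (USD) from the comparison table.
RATES = {
    "GPT-4.1 Nano": {"input": 0.10, "output": 0.40},
    "Gemini 2.5 Flash-Lite": {"input": 0.10, "output": 0.40},
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at the model's headline (uncached, non-batch) rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000


# Reproduce the "per typical call" and "per 10,000 calls" rows.
for model in RATES:
    per_call = cost_per_call(model, 1_500, 500)
    print(f"{model}: ${per_call:.5f}/call, ${per_call * 10_000:.2f} per 10,000 calls")

# Identical rates mean the tie holds at any input/output mix.
for inp, out in [(4_000, 200), (1_500, 500), (300, 2_000)]:
    a = cost_per_call("GPT-4.1 Nano", inp, out)
    b = cost_per_call("Gemini 2.5 Flash-Lite", inp, out)
    print(f"{inp:>5} in / {out:>5} out -> tie: {a == b}")
```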
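Beyond the per-token math, two things can shift the real cost. First, the two models use different tokenizers. The same text, especially non-English text, can produce a different number of tokens on each model and therefore cost a different amount at identical rates. Second, your own quality evals matter. A model that needs retries or longer prompts to reach your quality bar costs more per usable answer. Price your actual prompts, in the language you serve, before committing.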
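Because the rates match, any cost gap on the same request comes down to token counts. This sketch assumes you already have the input and output token counts each model reported for the same prompt and answer; both APIs return usage metadata with each response. The helper names are hypothetical. The 20% difference in the usage line is an illustrative assumption, not a measured tokenizer result.

```python
# Identical headline rates (USD per million tokens) for both models.
INPUT_RATE_PER_M = 0.10
OUTPUT_RATE_PER_M = 0.40


def cost_from_tokens(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call given the token counts a model actually billed."""
    return (input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000


def relative_cost(tokens_a: tuple[int, int], tokens_b: tuple[int, int]) -> float:
    """Ratio of model B's cost to model A's cost for the same request,
    where each tuple is (input_tokens, output_tokens) as that model counted them."""
    return cost_from_tokens(*tokens_b) / cost_from_tokens(*tokens_a)


# Hypothetical: tokenizer B produces 20% more tokens for the same text.
print(f"{relative_cost((1_500, 500), (1_800, 600)):.2f}x")
```

At equal rates, the cost ratio simply tracks the token-count ratio. Measuring token counts on your own prompts is therefore the whole comparison.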
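Both providers support prompt caching, where cached input bills at a fraction of the standard rate; the exact discount differs between the two providers. Both also offer batch processing at about 50% off for jobs that can tolerate a 24-hour turnaround. With identical headline rates, these discounts are where a real price gap can open. If one model caches more of your static prefix, or caches it more cheaply, it will be the cheaper of the two in practice.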
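The following is a sketch of how caching and batching change the per-call figure. `cached_rate_fraction` is a placeholder assumption, not either provider's published cache price, so set it from current pricing. The sketch also assumes the batch discount stacks multiplicatively on top of caching. That may not match how a given provider combines the two.

```python
# Headline rates (USD per million tokens), identical for both models.
INPUT_RATE_PER_M = 0.10
OUTPUT_RATE_PER_M = 0.40


def effective_cost(
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    cached_rate_fraction: float = 0.10,  # ASSUMPTION: placeholder; use the provider's real cache price
    batch_discount: float = 0.0,         # e.g. 0.50 for a 24-hour batch job
) -> float:
    """USD cost of one call when part of the input hits the prompt cache and the
    call may run at a batch discount (assumed here to stack with caching)."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    input_cost = (fresh_tokens + cached_tokens * cached_rate_fraction) * INPUT_RATE_PER_M
    output_cost = output_tokens * OUTPUT_RATE_PER_M
    return (input_cost + output_cost) * (1 - batch_discount) / 1_000_000


print(f"${effective_cost(1_500, 500):.5f}")                                        # no discounts
print(f"${effective_cost(1_500, 500, cached_tokens=1_000):.5f}")                   # cached prefix
print(f"${effective_cost(1_500, 500, cached_tokens=1_000, batch_discount=0.5):.5f}")  # cached + batch
```

Take the 1,500/500 call with a 1,000-token cached prefix. At the placeholder 10% cache rate, the call drops from $0.00035 to $0.00026. A 50% batch discount on top brings it to $0.00013. Plug in each provider's actual cache rate to see which model comes out ahead for your prefix size.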
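On this balanced workload it is a tie. Because the rates are identical, it stays a tie at any input/output ratio. Let quality, tokenizer efficiency on your own prompts, and caching or batch terms break it.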
On a balanced 1,500/500 call they tie at $0.00035. Both charge $0.10/M input and $0.40/M output, so neither model wins on price for input-heavy or output-heavy work either.
GPT-4.1 Nano: $0.10 input / $0.40 output. Gemini 2.5 Flash-Lite: $0.10 input / $0.40 output.
Neither, on price. Generation-heavy work is output-bound, but both models charge $0.40/M output, so long generations cost the same on either one.