All LLM API Pricing — 21 Models Compared (June 2026)

How to read this page

API pricing for large language models is quoted in dollars per million tokens. Input tokens (what you send) and output tokens (what the model returns) are billed separately, almost always at a 1:5 ratio across the frontier tier — output costs five times more than input.

A "token" is a chunk of text the model processes. Roughly 4 characters of English text equals one token. Other scripts tokenize less efficiently: Cyrillic and CJK can produce 2–4× more tokens than the equivalent English string.

To estimate your monthly bill: take your average prompt length, multiply by your call volume, multiply by the input rate. Then take your average response length, multiply by call volume, multiply by the output rate. Add the two numbers. That's your floor — caching and batch APIs can take 50–90% off that figure for the right workloads.

The current price floor and ceiling

The cheapest production text API right now is Gemini 2.5 Flash-Lite at $0.10/$0.40 per million tokens, followed by GPT-4.1 Nano at $0.10/$0.40 and DeepSeek V3.2 at $0.14/$0.28. At the other end, GPT-5.5 Pro commands $30/$180 per million — a 300× spread between the floor and the ceiling.

For most production workloads the sweet spot sits between Claude Haiku 4.5 ($1/$5), Gemini 3.5 Flash ($1.50/$9) and GPT-5.4 ($2.50/$15). The flagship tier (Opus 4.8, GPT-5.5, Gemini 3.1 Pro) starts paying for itself only on tasks where the quality gain meaningfully reduces downstream cost — code review, complex extraction, agentic loops. Above the flagship tier, the new Mythos-class models (Claude Fable 5, GPT-5.5 Pro) target the hardest long-horizon work — autonomous coding agents and multi-repo refactors where per-run cost is small compared to engineer time saved.

Why prices vary so much

The 300× price spread is not arbitrary. Larger models with more parameters require more GPU memory and compute per token, which translates directly to higher hosting costs. Reasoning models like o3 and GPT-5.5 Pro generate hidden "thinking" tokens that consume compute even though they never reach the user, justifying their higher rates. Smaller models (Nano, Lite, Mini) are typically distilled from larger ones — they retain most capability on routine tasks at a fraction of the inference cost.

The cheap end of the market has been compressed dramatically over 2024–2026. What cost $20 per million tokens in late 2023 (GPT-4) now costs $0.10 (Gemini 2.5 Flash-Lite) — a 200× drop in 30 months. That trend continues to put pressure on the entire stack, which is good news for application builders.

Every LLM API price, in one place.

Claude Fable 5

Claude Opus 4.8

Claude Opus 4.8 Fast

Claude Opus 4.7

Claude Sonnet 4.6

Claude Haiku 4.5

Claude Haiku 3

GPT-5.5

GPT-5.5 Pro

GPT-5.4

GPT-5.4 Mini

GPT-5.4 Nano

GPT-4.1 Nano

o3

Gemini 3.5 Flash

Gemini 3.1 Pro

Gemini 3.1 Flash-Lite

Gemini 2.5 Flash

Gemini 2.5 Flash-Lite

DeepSeek V3.2

Llama 4 Maverick

How to read this page

The current price floor and ceiling

Why prices vary so much