Compare LLM API Pricing Side-by-Side

How to compare LLM APIs intelligently

The headline price per million tokens is a useful starting point, but it is almost never the full picture. Three factors routinely flip cost comparisons in practice: tokenization efficiency, caching support, and the input-to-output ratio of your specific workload.

Tokenization efficiency varies by vendor because each model family uses its own tokenizer. The same English sentence takes ~10% fewer tokens on Claude than on GPT, and the same Russian sentence takes ~30% fewer tokens on Gemini than on GPT. For multilingual products, this can swing the comparison entirely.
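
One way to fold tokenizer differences into a comparison is to scale each list price by how many tokens that vendor needs for the same text. The sketch below is illustrative only: the function name `effective_price` and both prices are hypothetical, and the 0.90 ratio simply mirrors the ~10% figure quoted above.

```python
def effective_price(list_price_per_mtok: float, relative_token_count: float) -> float:
    """Price per million *baseline* tokens, adjusted for tokenizer efficiency.

    relative_token_count is this vendor's token count for a text divided by
    the baseline vendor's count for the same text (0.90 = 10% fewer tokens).
    """
    if relative_token_count <= 0:
        raise ValueError("relative_token_count must be positive")
    return list_price_per_mtok * relative_token_count


# Hypothetical list prices in USD per million tokens, not real vendor rates.
baseline = effective_price(3.00, 1.00)
vendor_b = effective_price(3.20, 0.90)  # pricier per token, but 10% fewer tokens

print(f"baseline: ${baseline:.2f}  vendor_b: ${vendor_b:.2f}")
```

In this made-up case the vendor with the higher list price ends up cheaper per unit of text. To get a real ratio, tokenize a sample of your own traffic with each vendor's tokenizer or token-counting endpoint.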

Caching support is the biggest single discount available. If your application sends a long static system message, all three major vendors can cache it and bill subsequent reads of that prefix at ~10% of the standard input rate. The vendor with the larger context window, and thus the larger cacheable prefix, often wins on total cost even if its per-token rate is higher.
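
To see how much a cached prefix can matter, here is a minimal sketch of input-side cost over many requests. It assumes the first request pays the full rate and every later request reads the static prefix at 10% of that rate, as described above. It ignores cache-write surcharges, cache expiry, and minimum cacheable lengths, which differ by vendor. The function name and all figures are hypothetical.

```python
def input_cost_with_cache(
    prefix_tokens: int,
    dynamic_tokens: int,
    requests: int,
    input_price_per_mtok: float,
    cached_read_fraction: float = 0.10,  # assumption: cached reads at ~10% of list rate
) -> float:
    """Total input cost in USD when a static prefix is cached after the first request."""
    if requests < 1:
        raise ValueError("requests must be at least 1")
    per_token = input_price_per_mtok / 1_000_000
    # First request: nothing is cached yet, so every token is billed at the full rate.
    first = (prefix_tokens + dynamic_tokens) * per_token
    # Later requests: the prefix is billed at the cached rate, the rest at the full rate.
    later = (requests - 1) * (
        prefix_tokens * cached_read_fraction + dynamic_tokens
    ) * per_token
    return first + later


# Hypothetical workload: 10k-token system message, 500 dynamic tokens, 1,000 requests.
cached = input_cost_with_cache(10_000, 500, 1_000, input_price_per_mtok=3.00)
uncached = input_cost_with_cache(
    10_000, 500, 1_000, input_price_per_mtok=3.00, cached_read_fraction=1.0
)

print(f"cached: ${cached:.2f}  uncached: ${uncached:.2f}")
```

Setting `cached_read_fraction=1.0` reproduces the no-cache case, which makes it easy to compare the two totals for the same workload.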

Input/output ratio matters because output tokens cost roughly 5× as much as input tokens across the board. A document Q&A workload (long input, short output) cares mostly about the input rate. A content generation workload (short input, long output) cares mostly about the output rate. Run the math on your own ratio before deciding.
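
Running that math can be as simple as the sketch below, which prices one request under two workload shapes. The vendor labels, prices, and token counts are all hypothetical and chosen only to show how the ranking can flip with the input/output ratio.

```python
def cost_per_request(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost in USD of a single request, given per-million-token prices."""
    total = input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok
    return total / 1_000_000


# Hypothetical (input, output) prices in USD per million tokens.
vendors = {
    "vendor_a": (2.00, 10.00),  # cheaper input, pricier output
    "vendor_b": (3.00, 8.00),   # pricier input, cheaper output
}

# Hypothetical (input_tokens, output_tokens) per request.
workloads = {
    "document_qa": (20_000, 500),  # long input, short output
    "generation": (500, 4_000),    # short input, long output
}

# For each workload, compute every vendor's cost and pick the cheapest.
cheapest = {}
for workload, (tokens_in, tokens_out) in workloads.items():
    costs = {
        name: cost_per_request(tokens_in, tokens_out, price_in, price_out)
        for name, (price_in, price_out) in vendors.items()
    }
    cheapest[workload] = min(costs, key=costs.get)
    print(workload, costs, "cheapest:", cheapest[workload])
```

With these invented numbers, the vendor with the lower input rate wins the document Q&A workload and the vendor with the lower output rate wins the generation workload.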

Side-by-side pricing comparisons

GPT-5.5 vs Claude Opus 4.8

Claude Fable 5 vs Claude Opus 4.8

Claude Sonnet 4.6 vs GPT-5.4

Gemini 3.5 Flash vs Claude Haiku 4.5

DeepSeek V3.2 vs GPT-5.4 Nano

Claude Opus 4.8 vs GPT-5.5 Pro

Gemini 3.1 Pro vs Claude Sonnet 4.6

Gemini 3.5 Flash vs GPT-5.4 Mini

DeepSeek V3.2 vs Gemini 2.5 Flash-Lite

GPT-5.5 vs Gemini 3.1 Pro

Claude Opus 4.8 vs Gemini 3.1 Pro

GPT-5.5 vs GPT-5.5 Pro

Claude Sonnet 4.6 vs GPT-5.5

GPT-5.4 vs Gemini 3.1 Pro

GPT-5.4 vs Claude Opus 4.8

GPT-4.1 Nano vs Gemini 2.5 Flash-Lite

Claude Haiku 4.5 vs GPT-5.4 Mini

DeepSeek V3.2 vs GPT-4.1 Nano

Gemini 2.5 Flash vs GPT-5.4 Mini

o3 vs GPT-5.4

o3 vs Claude Opus 4.8
