gpt-cost.com

How much will your next AI call actually cost?

Paste a prompt. Pick a model. See the bill before you ship. Covers 21 production LLMs from OpenAI, Anthropic, Google, DeepSeek and Meta — updated July 2026.

Last price sync · June 10, 2026

Models tracked · 21

Avg. tokenization error · ±4%

Choose a model01

Paste your prompt02

Input tokens

Characters

Expected output tokens

Estimate at scale (× calls/month)

Total cost per call

$0.0000

Input cost

$0.0000

0 tokens × $0/M

Output cost

$0.0000

0 tokens × $0/M

Per 1,000 calls

$0.00

at scale

Get the full breakdown free

Enter your email to unlock the exact cost, plus a one-page PDF report showing how this prompt compares across all 21 models. We'll also send the weekly LLM pricing changelog — opt out anytime.

No spam. Used only for the changelog. POST your collection to your ESP — replace the handler in the script below.

9 ways to cut your token bill

SEO-friendly long-form content

→ 01

Cache your system prompt

Anthropic, OpenAI and Google all let you mark a static prefix as cached. Subsequent calls bill that prefix at 10% of the input rate. For a chatbot with a 2,000-token system message, this alone cuts cost by 40-60%.

→ 02

Use the Batch API for non-realtime work

Anthropic, OpenAI and Google Vertex all offer a 50% discount for jobs you can wait 24 hours on: nightly summaries, data enrichment, eval runs, embedding refresh.

→ 03

Route by difficulty

Don't send "extract this email address" to Opus. A two-tier setup — Haiku/Flash/Nano for triage, Sonnet/GPT-5.4 for the hard 20% — typically saves 70-85% versus single-model routing.

→ 04

Cap your output

Output tokens cost 5× input tokens across every current frontier model. Set max_tokens aggressively and prompt for terse responses. "Reply in one sentence" routinely halves output cost.

→ 05

Compress your context

Stop dumping full documents. Use a retrieval step to pull only the relevant passages — top-k=5 with 200-token chunks beats stuffing a 50K-token doc into context, and is 20× cheaper.

→ 06

Strip JSON whitespace

Pretty-printed JSON in a prompt costs roughly 30% more tokens than minified. Same for indented YAML. The model doesn't care; your invoice does.

→ 07

Switch to a cheaper tokenizer

Non-English text is tokenized far less efficiently. Cyrillic, CJK and Arabic can produce 2-4× more tokens than equivalent English. Gemini and DeepSeek tokenize Cyrillic noticeably better than GPT-4 family.

→ 08

Reuse with prompt templating

Build prompts from versioned templates instead of regenerating them per request. Combined with caching, this is the single biggest lever for chat products.

→ 09

Monitor before you optimize

Log token usage per endpoint from day one. Most teams discover that 3 endpoints account for 80% of spend — and they're rarely the ones you'd guess.

Honest answers

FAQ

How accurate is this calculator?

Token counts here use the standard ~4-chars-per-token heuristic for Latin scripts and ~2-chars-per-token for Cyrillic/CJK. That matches official OpenAI and Anthropic tokenizer output to within roughly ±4% on typical English prose. For exact billing, use each provider's official tokenizer library (tiktoken, anthropic-tokenizer, sentencepiece) — but for budgeting and pre-flight estimates this is more than precise enough.

Where do the prices come from?

Directly from each vendor's official pricing page, last synced June 10, 2026. We don't include batch or cache discounts in the default calculation because they apply conditionally — see the Tips section above for how to layer them in.

Why are output tokens so much more expensive?

Generation is autoregressive — each output token requires a full forward pass through the model. Input tokens can be processed in parallel. Across every current frontier model the ratio is fixed at 5×.

Does this work for images, audio or video?

Not yet — this build covers text-only models. Multimodal pricing is a separate calculation (Vision charges per tile, Veo per second of video, Whisper per minute of audio) and we'll add a dedicated tab for it in the next revision.

Can I use this commercially?

Yes. The whole thing is a single HTML file with no external dependencies beyond Google Fonts. Self-host it, white-label it, monetize it. If you ship it as a public tool, a link back is appreciated but not required.

Token cost guides

Go deeper on how LLM API pricing works and how to estimate your own bill:

AI token cost explained — what a token is, cost per million tokens across GPT, Claude and Gemini, and why output is 5× input.
Anthropic API cost per million tokens — every Claude model's rate, plus caching and batch discounts.
How to calculate Claude API costs — the exact formula, a worked example, and a free calculator.
LLM token pricing compared — every provider's input and output rate in one table.
Cost per LLM token — what a single token costs and how it scales to a monthly bill.
Token cost calculator guide — how to estimate any LLM's price in four steps.
The cheapest LLM API in 2026 — the cheapest model for chat, RAG, content, batch and agents.

How much will your next AI call actually cost?

9 ways to cut your token bill

Cache your system prompt

Use the Batch API for non-realtime work

Route by difficulty

Cap your output

Compress your context

Strip JSON whitespace

Switch to a cheaper tokenizer

Reuse with prompt templating

Monitor before you optimize

Honest answers

Token cost guides

We use cookies