How much will your next AI call actually cost?

Paste a prompt. Pick a model. See the bill before you ship. Covers 18 production LLMs from OpenAI, Anthropic, Google, DeepSeek and Meta — updated May 2026.

Last price sync · May 25, 2026
Models tracked · 18
Avg. tokenization error · ±4%
Choose a model01
Paste your prompt02
Input tokens
0
Characters
0
Total cost per call
$0.0000
Input cost
$0.0000
0 tokens × $0/M
Output cost
$0.0000
0 tokens × $0/M
Per 1,000 calls
$0.00
at scale
Get the full breakdown free

Enter your email to unlock the exact cost, plus a one-page PDF report showing how this prompt compares across all 18 models. We'll also send the weekly LLM pricing changelog — opt out anytime.

No spam. Used only for the changelog. POST your collection to your ESP — replace the handler in the script below.

Advertisement

9 ways to cut your token bill

SEO-friendly long-form content
→ 01

Cache your system prompt

Anthropic, OpenAI and Google all let you mark a static prefix as cached. Subsequent calls bill that prefix at 10% of the input rate. For a chatbot with a 2,000-token system message, this alone cuts cost by 40-60%.

→ 02

Use the Batch API for non-realtime work

Anthropic, OpenAI and Google Vertex all offer a 50% discount for jobs you can wait 24 hours on: nightly summaries, data enrichment, eval runs, embedding refresh.

→ 03

Route by difficulty

Don't send "extract this email address" to Opus. A two-tier setup — Haiku/Flash/Nano for triage, Sonnet/GPT-5.4 for the hard 20% — typically saves 70-85% versus single-model routing.

→ 04

Cap your output

Output tokens cost 5× input tokens across every current frontier model. Set max_tokens aggressively and prompt for terse responses. "Reply in one sentence" routinely halves output cost.

→ 05

Compress your context

Stop dumping full documents. Use a retrieval step to pull only the relevant passages — top-k=5 with 200-token chunks beats stuffing a 50K-token doc into context, and is 20× cheaper.

→ 06

Strip JSON whitespace

Pretty-printed JSON in a prompt costs roughly 30% more tokens than minified. Same for indented YAML. The model doesn't care; your invoice does.

→ 07

Switch to a cheaper tokenizer

Non-English text is tokenized far less efficiently. Cyrillic, CJK and Arabic can produce 2-4× more tokens than equivalent English. Gemini and DeepSeek tokenize Cyrillic noticeably better than GPT-4 family.

→ 08

Reuse with prompt templating

Build prompts from versioned templates instead of regenerating them per request. Combined with caching, this is the single biggest lever for chat products.

→ 09

Monitor before you optimize

Log token usage per endpoint from day one. Most teams discover that 3 endpoints account for 80% of spend — and they're rarely the ones you'd guess.

Advertisement

Honest answers

FAQ
How accurate is this calculator?

Token counts here use the standard ~4-chars-per-token heuristic for Latin scripts and ~2-chars-per-token for Cyrillic/CJK. That matches official OpenAI and Anthropic tokenizer output to within roughly ±4% on typical English prose. For exact billing, use each provider's official tokenizer library (tiktoken, anthropic-tokenizer, sentencepiece) — but for budgeting and pre-flight estimates this is more than precise enough.

Where do the prices come from?

Directly from each vendor's official pricing page, last synced May 25, 2026. We don't include batch or cache discounts in the default calculation because they apply conditionally — see the Tips section above for how to layer them in.

Why are output tokens so much more expensive?

Generation is autoregressive — each output token requires a full forward pass through the model. Input tokens can be processed in parallel. Across every current frontier model the ratio is fixed at 5×.

Does this work for images, audio or video?

Not yet — this build covers text-only models. Multimodal pricing is a separate calculation (Vision charges per tile, Veo per second of video, Whisper per minute of audio) and we'll add a dedicated tab for it in the next revision.

Can I use this commercially?

Yes. The whole thing is a single HTML file with no external dependencies beyond Google Fonts. Self-host it, white-label it, monetize it. If you ship it as a public tool, a link back is appreciated but not required.

Advertisement