Paste a prompt. Pick a model. See the bill before you ship. Covers 18 production LLMs from OpenAI, Anthropic, Google, DeepSeek and Meta — updated May 2026.
Enter your email to unlock the exact cost, plus a one-page PDF report showing how this prompt compares across all 18 models. We'll also send the weekly LLM pricing changelog — opt out anytime.
No spam. Used only for the changelog. POST your collection to your ESP — replace the handler in the script below.
Anthropic, OpenAI and Google all let you mark a static prefix as cached. Subsequent calls bill that prefix at 10% of the input rate. For a chatbot with a 2,000-token system message, this alone cuts cost by 40-60%.
Anthropic, OpenAI and Google Vertex all offer a 50% discount for jobs you can wait 24 hours on: nightly summaries, data enrichment, eval runs, embedding refresh.
Don't send "extract this email address" to Opus. A two-tier setup — Haiku/Flash/Nano for triage, Sonnet/GPT-5.4 for the hard 20% — typically saves 70-85% versus single-model routing.
Output tokens cost 5× input tokens across every current frontier model. Set max_tokens aggressively and prompt for terse responses. "Reply in one sentence" routinely halves output cost.
Stop dumping full documents. Use a retrieval step to pull only the relevant passages — top-k=5 with 200-token chunks beats stuffing a 50K-token doc into context, and is 20× cheaper.
Pretty-printed JSON in a prompt costs roughly 30% more tokens than minified. Same for indented YAML. The model doesn't care; your invoice does.
Non-English text is tokenized far less efficiently. Cyrillic, CJK and Arabic can produce 2-4× more tokens than equivalent English. Gemini and DeepSeek tokenize Cyrillic noticeably better than GPT-4 family.
Build prompts from versioned templates instead of regenerating them per request. Combined with caching, this is the single biggest lever for chat products.
Log token usage per endpoint from day one. Most teams discover that 3 endpoints account for 80% of spend — and they're rarely the ones you'd guess.
Token counts here use the standard ~4-chars-per-token heuristic for Latin scripts and ~2-chars-per-token for Cyrillic/CJK. That matches official OpenAI and Anthropic tokenizer output to within roughly ±4% on typical English prose. For exact billing, use each provider's official tokenizer library (tiktoken, anthropic-tokenizer, sentencepiece) — but for budgeting and pre-flight estimates this is more than precise enough.
Directly from each vendor's official pricing page, last synced May 25, 2026. We don't include batch or cache discounts in the default calculation because they apply conditionally — see the Tips section above for how to layer them in.
Generation is autoregressive — each output token requires a full forward pass through the model. Input tokens can be processed in parallel. Across every current frontier model the ratio is fixed at 5×.
Not yet — this build covers text-only models. Multimodal pricing is a separate calculation (Vision charges per tile, Veo per second of video, Whisper per minute of audio) and we'll add a dedicated tab for it in the next revision.
Yes. The whole thing is a single HTML file with no external dependencies beyond Google Fonts. Self-host it, white-label it, monetize it. If you ship it as a public tool, a link back is appreciated but not required.