Pricing changes, tokenizer quirks, and the optimizations we keep finding in production LLM deployments.
June 10, 2026 — fresh.
Anthropic released Claude Opus 4.8 on May 28, 2026, and the headline is that the price card did not move: still $5 per million input tokens, $25 per million output tokens, same as Opus 4.7. The real news is buried in the supporting tiers. Opus 4.8 Fast — the latency-optimized variant that streams 2.5× faster than standard — now lists at $10/$50 per million tokens, which is roughly 3× cheaper than the equivalent Fast Mode on Opus 4.7 ($30/$150). For applications where a user is waiting on the response (real-time chat, voice agents, IDE completions), that delta changes the equation significantly. Anthropic also reports that Opus 4.8 is "around four times less likely than its predecessor" to produce certain failure modes — we are running our own production-traffic comparison and will publish numbers next week.
June 10, 2026 — released yesterday.
Anthropic followed Opus 4.8 with Claude Fable 5 on June 9, 2026. It is the first generally available Mythos-class model — Anthropic's new tier sitting above the Opus line for the hardest long-horizon work. Pricing matches Opus 4.8 Fast at $10/$50 per million tokens. The benchmark gap that justifies the upgrade is real on coding tasks: Anthropic reports 80.3% on SWE-Bench Pro versus 69.2% for Opus 4.8, with the difference concentrated on multi-file refactors and multi-repo migrations. For most application work, Opus 4.8 remains the better economic choice; Fable 5 is targeted at autonomous coding agents that run for hours without supervision and at multi-repo refactor jobs where the per-run cost is small compared to the engineer time saved.
Draft — coming next week.
If you run an LLM application that serves Russian-speaking users, you are probably paying more than you should. The reason is tokenizer design. GPT, Claude, and Gemini were all trained on English-heavy corpora, and their byte-pair encoders carve English into long, efficient tokens. The same sentence in Russian gets shattered into character-level fragments. We measured the effect across 200 representative prompts: Russian text consistently costs 2.1× to 2.6× more tokens than the English equivalent on GPT-5.5, 1.9× to 2.4× on Claude Opus 4.8, and a more modest 1.4× to 1.7× on Gemini 3.5 Flash. The Gemini gap is narrower because Google's tokenizer was trained on a more multilingual corpus from the start.
Draft — in progress.
We ran four real production applications side-by-side on GPT-5.5 and Claude Opus 4.8 for a full billing cycle, with identical prompts and matched user traffic. The applications: a customer support agent, a code review bot, a contract analysis pipeline, and a content generation tool. Total spend over 30 days: $4,200 on GPT-5.5, $3,650 on Claude Opus 4.8. Per-call quality scores: statistically indistinguishable on three of four apps, with Claude winning narrowly on the contract analysis task. Full breakdown by application coming next week.
Draft — planned.
Prompt caching is the single biggest cost lever in LLM applications, and almost every team we audit is leaving 30–60% of potential savings on the table. The mechanics are different across providers — Anthropic uses explicit cache_control breakpoints, OpenAI auto-caches prefixes over 1,024 tokens, Google requires explicit context_caching API calls. The cookbook will cover the syntax for each, the minimum prefix length that actually saves money, and the gotchas (cache invalidation, TTL behavior, cost when cache misses).
Draft.
RAG workloads have a specific cost profile: long inputs (retrieved chunks + question), short outputs (answer). The cost-per-call equation strongly favors models with low input rates. Currently the three contenders are Gemini 2.5 Flash-Lite ($0.10/$0.40), GPT-4.1 Nano ($0.10/$0.40), and DeepSeek V3.2 ($0.14/$0.28). We tested all three on 1,000 real customer support queries. Spoiler: the cheapest option lost on quality enough that the slightly more expensive one was the better economic choice. Full results next month.