
DeepSeek's Permanent 75% Cut and the AI Price War

DeepSeek made its 75% V4-Pro API discount permanent, dropping input pricing to roughly $0.0035 per million tokens. Here's how every major AI lab is forced to respond.

The AI Dude · May 25, 2026 · 6 min read

The Discount That Became the New Floor

On May 23, DeepSeek announced that its 75% promotional discount on V4-Pro API access is now permanent (per Reuters and The Next Web). What started as a limited-time incentive to attract developers is now the sticker price—input tokens at approximately $0.0035 per million, output at commensurately low rates. That's not a promotional gimmick anymore. It's a statement about where DeepSeek thinks model pricing is heading: toward near-zero marginal cost.

This matters less for what DeepSeek alone does and more for what it forces everyone else to do. When a frontier-class model—one that benchmarks competitively against GPT-5 and Claude on standard evals—prices itself at a fraction of Western alternatives, the entire market's pricing logic gets stress-tested.

What Actually Changed

DeepSeek initially rolled out the 75% discount as a time-limited promotion to pull developers onto the V4-Pro API. The model itself launched with strong reasoning and code generation benchmarks that put it in conversation with top-tier Western models. The discount was aggressive but temporary—or so everyone assumed.

Making it permanent signals two things:

  • Adoption worked. Enough developers switched or experimented that DeepSeek has the volume to justify the lower margins permanently.
  • Cost structure allows it. DeepSeek's inference costs—likely benefiting from aggressive quantization, custom silicon strategies, and lower labor costs—let them sustain this pricing without bleeding cash at scale.

Per Engadget's coverage, the new permanent pricing makes V4-Pro one of the cheapest frontier-tier models available via API, full stop.

How It Stacks Up Against Western Pricing

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| DeepSeek V4-Pro (permanent) | ~$0.0035 | Low (exact rate varies by tier) | 128K |
| GPT-5 (OpenAI) | $2.50–$15 | $10–$60 | 128K–1M |
| Claude Opus 4 (Anthropic) | $15 | $75 | 200K |
| Claude Sonnet 4 (Anthropic) | $3 | $15 | 200K |
| Gemini 2.5 Pro (Google) | $1.25–$2.50 | $10–$15 | 1M |

Note: Pricing from each provider's public pricing page as of May 2026. DeepSeek's exact per-token breakdown varies by plan; the $0.0035 figure comes from the Reuters report on the announcement. Western model prices reflect standard API rates without volume discounts.

The gap is staggering. Even comparing DeepSeek V4-Pro against the cheapest Western frontier option (Gemini 2.5 Pro input at $1.25/M), DeepSeek is roughly 350x cheaper on input tokens. That's not a competitive discount—that's a different pricing universe.
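To make the arithmetic behind that multiple explicit, here is a minimal Python sketch that divides each model's input price by DeepSeek's. The prices are the ones in the comparison table above; where the table gives a range, the sketch uses the low end, which is an assumption of this example rather than anything the providers publish as a single rate. The dictionary and function names are illustrative.

```python
# Input prices in USD per 1M tokens, copied from the comparison table.
# Assumption: where the table lists a range, the LOW end is used.
INPUT_PRICE_PER_M = {
    "DeepSeek V4-Pro": 0.0035,
    "GPT-5": 2.50,
    "Claude Opus 4": 15.00,
    "Claude Sonnet 4": 3.00,
    "Gemini 2.5 Pro": 1.25,
}


def price_ratio(model: str, baseline: str = "DeepSeek V4-Pro") -> float:
    """Return how many times more `model` costs than `baseline` on input tokens."""
    return INPUT_PRICE_PER_M[model] / INPUT_PRICE_PER_M[baseline]


if __name__ == "__main__":
    for name in INPUT_PRICE_PER_M:
        print(f"{name}: {price_ratio(name):,.0f}x the DeepSeek input price")
```

For Gemini 2.5 Pro this works out to 1.25 / 0.0035, or about 357, which is where the "roughly 350x" figure comes from. Output-token prices are left out because the DeepSeek output rate is not stated precisely.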

Why This Isn't Just a China Subsidy Story

The lazy take is "China subsidizes AI to gain market share." My read: that's partly true but misses the structural picture. DeepSeek's cost advantage comes from multiple compounding factors:

  • Architecture efficiency. DeepSeek has consistently published research on mixture-of-experts and efficient inference. V4-Pro likely activates a fraction of its total parameters per token, dramatically reducing compute per query.
  • Hardware arbitrage. While US labs pay premium rates for H100/B200 clusters, Chinese labs have been creative with older hardware, custom ASICs, and optimization tricks born from export-control constraints.
  • Different economics. DeepSeek (backed by High-Flyer, a quant fund) doesn't need to justify VC returns or $20B valuations. They can price at cost-plus-tiny-margin rather than cost-plus-SaaS-multiple.

None of this means the model is as good as GPT-5 or Claude Opus on every task. But for the 80% of API workloads that don't need absolute peak performance (summarization, classification, structured extraction, routine code generation), V4-Pro at $0.0035/M input tokens makes the Western alternatives look like luxury goods.

The Pressure on OpenAI, Anthropic, and Google

Each major lab faces a different version of this problem:

OpenAI

OpenAI has the most pricing tiers and the most aggressive history of cuts. They dropped GPT-4o pricing repeatedly through 2025. But their current GPT-5 pricing ($2.50+ input) still sits orders of magnitude above DeepSeek. OpenAI's play has been to justify premium pricing through capability gaps—reasoning, tool use, multimodality. That works as long as the capability gap holds. It's narrowing.

Anthropic

Anthropic's pricing is the highest of the big three, especially at the Opus tier. Their argument is safety, reliability, and enterprise trust—not cost efficiency. DeepSeek's price cut doesn't directly threaten Anthropic's core enterprise customers who pay for compliance and predictability. But it absolutely threatens the developer/startup tier that Sonnet targets.

Google

Google is best positioned to respond on price. They own their own TPU silicon, have massive scale economics, and already price Gemini Flash aggressively. Google could plausibly match or approach DeepSeek's pricing on their lower-tier models. Whether they choose to is a strategic question about margin preservation vs. market share.

What This Means for Developers and Agent Builders

If you're building AI agents—systems that make dozens or hundreds of LLM calls per task—the cost per token isn't just a line item. It's the difference between viable and non-viable product economics. Consider:

  • A customer support agent making 50 LLM calls per conversation at GPT-5 rates might cost $0.50–$2.00 per conversation
  • The same workflow on DeepSeek V4-Pro might cost fractions of a penny

That's the kind of gap that changes what's economically buildable. Coding agents, research assistants, and data pipeline automations that were marginal at Western API prices become trivially cheap on DeepSeek's pricing.
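The per-conversation figures above can be roughed out with a few lines of Python. This is a back-of-the-envelope sketch, not a billing calculator: the token counts per call (3,000 in, 500 out) are hypothetical, the GPT-5 rates are the low end of the ranges in the pricing table, and because DeepSeek's exact output rate isn't given, the DeepSeek line counts input tokens only.

```python
def conversation_cost(
    calls: int,
    in_tokens: int,
    out_tokens: int,
    in_price_per_m: float,
    out_price_per_m: float,
) -> float:
    """Estimated USD cost of one conversation made of `calls` LLM calls.

    Token counts are per call; prices are USD per 1M tokens.
    """
    input_cost = calls * in_tokens / 1_000_000 * in_price_per_m
    output_cost = calls * out_tokens / 1_000_000 * out_price_per_m
    return input_cost + output_cost


# Hypothetical workload: 50 calls, each with 3,000 input and 500 output tokens.
CALLS, IN_TOK, OUT_TOK = 50, 3_000, 500

# GPT-5 at the low end of the table's ranges ($2.50 in, $10 out).
gpt5_cost = conversation_cost(CALLS, IN_TOK, OUT_TOK, 2.50, 10.00)

# DeepSeek V4-Pro: input side only, since the output rate is not stated.
deepseek_input_cost = conversation_cost(CALLS, IN_TOK, 0, 0.0035, 0.0)

if __name__ == "__main__":
    print(f"GPT-5 (low-end rates): ${gpt5_cost:.4f} per conversation")
    print(f"DeepSeek V4-Pro (input only): ${deepseek_input_cost:.6f} per conversation")
```

Under these assumptions the GPT-5 conversation comes to about $0.63, inside the $0.50–$2.00 range quoted above, while the DeepSeek input cost is roughly $0.0005. Swapping in your own token counts changes the absolute numbers but not the shape of the gap.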

The tradeoffs are real, though:

  • Data residency. DeepSeek routes through Chinese infrastructure. For regulated industries (healthcare, finance, government), that's often a non-starter regardless of price.
  • Reliability and SLAs. DeepSeek's API uptime and rate limit guarantees are less established than OpenAI's or Anthropic's.
  • Censorship patterns. The model has documented refusal patterns on politically sensitive topics that may not align with all use cases.
  • Long-term continuity. Pricing this low raises questions about sustainability—will the rate hold if DeepSeek hits 10x current volume?

The Bigger Picture: Commoditization Is Accelerating

The permanent price cut isn't the news. The news is that a frontier-class model can be priced this low and still be economically viable for its operator.

That tells you something about where inference costs are actually headed industry-wide. The Western labs charge what they charge not because inference costs that much, but because they're recouping massive training investments ($100M+ per frontier run) and funding the next generation. DeepSeek, with reportedly lower training costs and different capital structures, can price closer to true marginal cost.

This is the pattern we've seen in every compute commodity—cloud storage, bandwidth, basic compute. Margins compress toward zero over time. The question for AI labs isn't whether this happens, but how fast, and whether they can stay ahead on capabilities long enough to justify premium pricing.

My Read on What Comes Next

Three predictions, clearly labeled as speculation:

  • Google responds first. They have the margin structure and silicon ownership to cut Gemini Flash pricing aggressively. I'd expect a response within weeks, not months.
  • OpenAI leans into differentiation, not price matching. They'll emphasize GPT-5's reasoning, tool use, and multimodal capabilities rather than trying to win a race to the bottom against Chinese cost structures.
  • The mid-tier collapses. Models priced at $1–5/M input tokens without a clear capability moat become hard to justify. The market bifurcates into "cheap and good enough" (DeepSeek, Gemini Flash) and "expensive but measurably better on hard tasks" (GPT-5, Claude Opus).

For developers: the practical move is to architect your systems for model swappability. Route easy tasks to the cheapest viable model, reserve expensive calls for where you need peak performance. DeepSeek's permanent price cut just made the "cheap" tier dramatically cheaper—and made that routing strategy more important than ever.
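Here is one way that routing idea might look in Python. Treat it as a sketch under stated assumptions: the `Backend` class, the `EASY_TASKS` set, and the model labels are all hypothetical, and `fake_call` stands in for whatever provider SDK you would actually wrap. The point is only that callers depend on a task kind, not on a specific vendor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Backend:
    """A swappable model backend (hypothetical wrapper, not a real SDK type)."""
    name: str                    # placeholder label, not a real API model ID
    input_price_per_m: float     # USD per 1M input tokens
    call: Callable[[str], str]   # adapter around a provider client


# Task kinds the article describes as not needing peak performance.
EASY_TASKS = {"summarization", "classification", "extraction", "routine_code"}


def pick_backend(task_kind: str, cheap: Backend, premium: Backend) -> Backend:
    """Send easy task kinds to the cheap backend, everything else to premium."""
    return cheap if task_kind in EASY_TASKS else premium


def fake_call(label: str) -> Callable[[str], str]:
    """Stand-in for a real API call so the sketch runs offline."""
    return lambda prompt: f"[{label}] {prompt}"


cheap = Backend("deepseek-v4-pro", 0.0035, fake_call("cheap"))
premium = Backend("claude-opus-4", 15.0, fake_call("premium"))

if __name__ == "__main__":
    for kind in ("classification", "multi_step_reasoning"):
        backend = pick_backend(kind, cheap, premium)
        print(kind, "->", backend.name, "|", backend.call("example prompt"))
```

Because each backend is just a name, a price, and a callable, replacing a provider means changing one `Backend(...)` line rather than touching every call site. A real router would likely classify tasks with something smarter than a fixed set, and would add fallbacks for rate limits and outages.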
