Topics: AI Models, Open Source, Enterprise Costs, API Pricing


Claude Opus 4.7 costs $5 per million input tokens and $25 per million output tokens. GPT-5.5 is $5 input and $30 output. DeepSeek V4, released as open weights on Friday, costs $1.74 input and $3.48 output, runs a 1 million token context window, and scores within a few benchmark points of both on math and Q&A.

The pricing argument for closed frontier models just got harder to make.


What Near-SOTA Actually Means Here

Near-SOTA is doing real work in this sentence. DeepSeek V4 is not the strongest model available. On complex multi-step reasoning tasks, long-chain analysis, and edge cases where the tail of the distribution matters as much as the median, Opus 4.7 and GPT-5.5 are still ahead. Those gaps are real.

The question is whether the workload being priced needs the gap closed.

Customer support routing, document summarization, pattern detection in structured data, light agents for intake and scheduling: on those tasks, the performance gap between DeepSeek V4 and a frontier closed model does not justify an 8x difference in output cost. Companies processing millions of tokens daily for predictable, repeatable workflows have known this math was coming. This week it arrived.


The Open Weight Factor

The per-token comparison is one part of it. The other part is that open weight means the model can run on your own hardware. The per-token cost for a locally-hosted model is electricity and amortized compute. For organizations where data handling is a compliance requirement or where vendor outages are an operational risk, the closed API option has always carried structural problems that pricing cannot fix.

Anthropic goes down for maintenance. OpenAI has degraded service incidents. Every outage on a closed API is an operational dependency that a locally-hosted model eliminates by definition.


The GPU Restriction Irony

Export controls on high-end GPU hardware to China were designed to slow Chinese AI development. DeepSeek trained V4 on restricted hardware and produced a model that competes with models trained on far more powerful chips. The constraint forced more compute-efficient training methods. Those efficiency gains are why V4 is competitive at a fraction of the infrastructure cost.

The restriction accelerated the efficiency innovation it was designed to prevent.


What This Does Not Change

DeepSeek V4 is not a drop-in replacement for the frontier models on the hardest tasks. Novel research synthesis, complex legal reasoning, production systems where wrong answers carry significant downstream consequences: those workloads still favor the models with the most capable reasoning.

The model also still requires either DeepSeek’s own API or significant local infrastructure to run. The 1 million token context window is real, but processing at that scale has its own cost structure.


The Actual Shift

The argument that open weight models cannot reach frontier-level performance was never going to survive indefinitely. It has held up on the gap being large enough to justify the cost difference for most enterprise workloads. That gap is now close enough that the cost math runs in a different direction for a large portion of production AI deployments.

If you are building Python automation or evaluating which AI backend fits a production agent, I work through those architecture decisions as a freelance developer. Rates and availability at ku5e.com/services.

Written by Mario Martinez Jr. (ku5e / Gary7) | TryHackMe Profile | ku5e.com/blog