lennxa

Are OpenAI and Anthropic Really Losing Money on Inference?

Importance: 4 | #llm

Martin Alderson:

I'm only going to look at raw compute costs. This is obviously a complete oversimplification, but given how useful the current models are - even assuming no improvements - I want to stress test the idea that everyone is losing so much money on inference that it is completely unsustainable.

Martin makes some rough calculations - starting with DeepSeek V3 as a baseline, taking the hourly rate of H100s, and assuming a bandwidth-bound process (3.35 TB/s HBM bandwidth) - to calculate the number of forward passes per second. There are two separate calculations, one for the prefill stage and one for decode, and here's the outcome:

The asymmetry is stark: $144 ÷ 46,800M = $0.003 per million input tokens versus $144 ÷ 46.7M = $3.08 per million output tokens. That's a thousand-fold difference!

Of course, the prefill stage can become compute-bound at long context lengths.
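The decode side of this reasoning is easy to reproduce. Here's a minimal sketch of the bandwidth-bound estimate in Python; the constants (FP8 throughput, $/GPU-hour, single-sequence decode) are my own assumptions for illustration, not the article's exact figures:

```python
# Back-of-envelope prefill vs. decode throughput on one H100,
# assuming a DeepSeek V3-style MoE (37B active params, FP8 weights).
# All constants are assumptions, not the article's figures.

HBM_BW = 3.35e12      # HBM3 bandwidth, bytes/s
FP8_FLOPS = 1.0e15    # assumed usable FP8 compute, FLOP/s
ACTIVE = 37e9         # active parameters per token
GPU_HR = 2.0          # assumed $ per H100-hour

# Prefill: compute-bound, ~2 FLOPs per active parameter per token.
prefill_tok_s = FP8_FLOPS / (2 * ACTIVE)

# Decode: bandwidth-bound, all active weights streamed from HBM once per
# generated token (1 byte/param in FP8); batching would amortize this
# weight traffic across concurrent requests.
decode_tok_s = HBM_BW / ACTIVE

for name, tok_s in [("prefill", prefill_tok_s), ("decode", decode_tok_s)]:
    dollars_per_m = GPU_HR / (tok_s * 3600 / 1e6)
    print(f"{name}: {tok_s:,.0f} tok/s, ${dollars_per_m:.3f}/M tokens")
```

Even with these crude single-sequence numbers the prefill/decode asymmetry is around 150x; production batching amortizes the decode-side weight reads and shrinks the output-token cost further.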

These costs map to what DeepInfra charges for R1 hosting, except that there is a much higher markup on input tokens.

By these numbers, there is a good chance that even the Claude Max plan is running at a 10x markup for heavy Claude Code users - costing around $20 in compute while charging $200.

One interesting point here is how Anthropic and OpenAI differ in their approach to models. Judging from raw token speed, my guess would be that Claude models are larger but token-efficient (Opus 4.1 barely thinks and is competitive with, say, GPT-5-thinking-high), whereas GPT-5 without thinking (minimal) is a pretty dumb model, and for high-quality outputs it thinks forever. If each output token is so costly, and OpenAI was supposedly optimizing for inference cost - wouldn't they have gone for a model like Claude? One explanation is that these estimates assume an MoE model with something like 5% active parameters; it could be that Claude has a much higher active-parameter count, or is not MoE at all. It will be interesting to see what approach people take moving forward - Anthropic doesn't need to optimize for inference as much as OpenAI does, since it has no free users.
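To see how much the active-parameter count drives this, the same bandwidth-bound decode estimate can be parameterized. Again, the batch size, $/GPU-hour, and the "dense 400B" comparison point are assumptions of mine, not claims about any real model:

```python
# Bandwidth-bound decode cost as a function of active parameter count.
# Assumed constants: FP8 weights (1 byte/param), 32 concurrent sequences
# sharing each weight read, $2 per H100-hour, 3.35 TB/s HBM bandwidth.

HBM_BW = 3.35e12
BATCH = 32
GPU_HR = 2.0

def cost_per_m_output(active_params_b: float, bytes_per_param: int = 1) -> float:
    """Dollars per million output tokens under the bandwidth-bound model."""
    tok_per_hr = HBM_BW / (active_params_b * 1e9 * bytes_per_param) * BATCH * 3600
    return GPU_HR / (tok_per_hr / 1e6)

for name, active_b in [("~5% active MoE (37B)", 37),
                       ("hypothetical dense 400B", 400)]:
    print(f"{name}: ${cost_per_m_output(active_b):.2f}/M output tokens")
```

Under these assumptions, a dense model pays roughly a 10x decode premium over a 5%-active MoE of similar total size, which would explain a lot about diverging design choices.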
