Skip to content

The Inference-Cost Collapse Is About to Break Every AI Pricing Model

The price of a fixed unit of model intelligence is falling roughly 10x a year, and that single curve quietly invalidates the pricing model most AI companies are built on. Build on what the curve can't touch.

By Mehdi7 min read
Share
On this page

The cost of a fixed unit of model intelligence has been falling by roughly an order of magnitude a year, and that one curve invalidates the pricing model most AI companies are built on. Not the go-to-market. Not the feature roadmap. The pricing model itself — the assumption that you can charge a stable price for something whose main input cost is in freefall. If your plan quietly treats today's cost-per-token as a floor you build margin on top of, you are pricing against a number that will be a fraction of itself by the time your next contract renews.

Start with the curve, because everything follows from it. Take a capability level — say, the reasoning quality that sat at the frontier in early 2023 — and track what it costs to buy that exact capability over time. The price to serve that fixed level of intelligence has collapsed by something close to 10x per year, driven by a stack of compounding effects: better hardware utilization, quantization and distillation, smaller models trained to match older large ones, speculative decoding, and brutal provider competition. The frontier keeps moving up in absolute capability, but the price of any given rung on the ladder falls fast. Yesterday's premium model is this year's commodity endpoint and next year's rounding error.

This is not a productivity story about doing more with less. It is a structural story about what happens to a business when its dominant variable cost approaches zero on a predictable schedule. Most AI pricing is quietly built as if that cost were fixed.

Per-seat pricing decouples from value and gets competed to the floor

Per-seat SaaS pricing survived for two decades because it worked: a seat was a rough proxy for value delivered, and the marginal cost of a seat was near zero and stable. Both halves are breaking at once.

The value half breaks because AI decouples work from headcount. If a model does the work of a team, "number of humans with logins" stops tracking what the product actually produces. You are charging per steering wheel for a car that increasingly drives itself. Charge $50 a seat while the model does three analysts' worth of output, and you have left the difference on the table for a competitor to price against.

The cost half breaks in the other direction, and this is the part people underweight. When your marginal cost per seat is genuinely near zero and falling, a competitor can undercut your per-seat price to almost nothing and still make money. Price competition doesn't stop at your comfortable margin; it stops at the new, much lower cost floor. In a market where the floor drops 10x a year, "we have a healthy per-seat price today" is not a position. It's a target. Anyone willing to run last year's model — now dirt cheap and still excellent for most tasks — can undercut you next quarter without bleeding.

So per-seat pricing on an AI product is a slow-motion race to a floor that is itself falling. You can win that race and still end up with a commodity margin.

The moat moves to exactly the things the curve does not touch

Here is the reframe that matters. The cost collapse is not eroding your defensibility. It is eroding one specific source of it — the ability to afford to run the model — and that source was never much of a moat. It just felt like one in 2023, when access and cost were real barriers.

Ask what the cost curve does not commoditize. It commoditizes raw inference. It does not commoditize the workflow the model is embedded in, the proprietary data that makes your outputs better than a generic call, the switching costs of a system your customer has wired into their operations, or the trust required to let a model act on their behalf in a domain where being wrong is expensive.

I feel this from the other side of my own work. At Kommerce, the hard problem in cash-on-delivery commerce across trust-scarce markets was never model access — it was that a buyer and a seller who have never met and share no legal recourse need a reason to transact at all. Trust is the entire product. You cannot download it at a lower price next year. The collapse in inference cost does nothing to that moat, because the moat was never made of compute. If anything, cheaper intelligence makes the trust layer more valuable: everyone now has the same raw reasoning, so the only remaining differentiator is who the customer is willing to believe and hand control to.

That generalizes. As the commodity input gets cheaper and more uniform, competitive advantage migrates to the non-commodity layers — integration, data, trust. It's the same reason I keep arguing that network effects are more often invoked than earned: founders reach for the defensibility story that sounds strongest rather than the one that's actually load-bearing. Post-collapse, the honest question is not "do we have a moat" but "is our moat made of the thing that's about to be free?" If the answer is yes, you don't have a moat. You have a head start with an expiration date.

Outcome pricing becomes viable — and exposes your own margin

If per-seat is breaking and per-token is a commodity, the structurally correct move is to price on outcomes: charge for the resolved support ticket, the completed reconciliation, the qualified lead, the passing legal review. Outcome pricing has an elegant property in a cost-collapse world. Your revenue is anchored to the value of the result, so as your input cost falls, your margin widens instead of your top line eroding. You capture the collapse instead of being captured by it. Per-token pricing does the opposite: it chains your revenue to the very number racing to zero.

Outcome pricing carries a trap that follows from the same curve, and almost nobody prices it in: it makes your margin legible. When you charge $2 to resolve a ticket, a sophisticated buyer knows your underlying inference cost and knows it is falling an order of magnitude a year. Your margin becomes a visible, contestable number, and every renewal turns into a negotiation over how much of the cost collapse you get to keep versus how much the customer claws back. Cost-plus framing — even implicit — is a losing frame here, because your cost is public knowledge and heading to zero.

The only defense is to make the price a function of the value and the trust, not the compute. A resolved dispute in a market with no legal recourse is worth what it's worth whether the model behind it costs a dollar or a cent to run. If your customer is anchoring on your token cost, you have already lost the framing — you taught them to see you as a reseller. If they're anchoring on what the outcome is worth to them, the cost collapse is pure margin expansion you get to keep.

The token-resale tier compresses to zero

Now the blunt part. A whole tier of companies has one actual business: call a foundation model, add a thin layer, resell the tokens with a markup. In a stable-cost world, that markup is viable. In a world where the underlying cost falls 10x a year and the frontier labs sell their own endpoints ever cheaper, the markup gets squeezed from both sides — your input cost is falling, but so is the price the customer will pay, because they can watch the same public price curve you can. The spread you live on compresses toward zero on a schedule.

This is the real substance under every tired "is it just a wrapper" debate. The debate usually gets argued as aesthetics — wrappers are lazy, wrappers are cheating — which misses the point. The question is purely structural: is the company's margin a markup on a commodity that's collapsing, or a charge for a layer the collapse doesn't reach? A "wrapper" with deep workflow lock-in, proprietary data, and a trust relationship is not a wrapper in any economically meaningful sense; it just happens to call an API. A beautifully engineered product whose entire margin is a token markup is a wrapper in the only sense that counts, however good the engineering.

The reflexive response to this pressure is to add more: more features, more surface area, more model calls, more markets, on the theory that scale outruns the compression. That instinct is usually the disease presenting as the cure. Most startups facing margin pressure die of indigestion, not starvation — they take on scope to escape a structural problem and choke on the complexity instead of fixing the thing commoditizing underneath them. The cost collapse does not reward doing more. It rewards being irreplaceable at one thing the curve can't touch.

What to actually build on

The discipline is simple to state and hard to live by: assume the raw intelligence in your product will be free, and ask what you have left. If the answer is "a great UI on a commodity model," you have a feature, and features get absorbed. If the answer is a proprietary dataset that makes you measurably better and compounds with every customer, a workflow so embedded that ripping you out means re-architecting an operation, or a trust relationship in a domain where a wrong answer costs real money — you have a business that gets stronger as inference gets cheaper, because the expensive part was never the inference.

The founders who win the next few years won't be the ones who found a clever markup on tokens. They'll be the ones who looked at a cost curve falling an order of magnitude a year and built everything that curve can never reach. Price the collapse into your model now, or the market will price it in for you — at the floor.

Frequently asked questions

Does the inference-cost collapse mean AI startups have no defensible margins?
No. It means margin can no longer come from being the party that can afford to run the model. The collapse commoditizes raw inference, not the things wrapped around it: workflow integration, proprietary data that improves the product with use, switching costs, and trust. Companies whose margin rests on those layers keep it; companies whose margin rests on a token markup lose it.
Why is per-seat pricing specifically threatened?
Per-seat pricing assumes the marginal cost of serving a seat is stable and that a seat is a good proxy for value delivered. When your dominant input cost falls an order of magnitude a year, a competitor can undercut your per-seat price without losing money, and the number of humans holding seats stops tracking the work the model does. The unit you're charging for decouples from both your cost and your value.
Is usage-based or outcome-based pricing the safe alternative?
Outcome-based pricing is structurally better because it anchors revenue to value rather than token count, so falling costs widen your margin instead of eroding your top line. The catch is transparency: when buyers know your input cost is collapsing, cost-plus framing invites relentless price pressure. Outcome pricing only holds if it's justified by the result and the trust behind it, not defended as a markup over compute.

Filed under Business & Tech News. The signal underneath the headline.

Essays like this, in your inbox.

Thoughtful essays. No spam. Unsubscribe anytime.

Business & Strategy

Network Effects Are a State You Maintain, Not a Wall You Own

"We have network effects" is the most over-claimed moat in startup strategy. Most so-called network effects saturate, cluster, and leak — and advantage is a metabolism you run, not an asset you possess.

9 min read