AI unit economics, and the bill nobody modeled.
AI spend behaves like a metered utility, not a software license — and what keeps it from becoming a surprise is pricing the finished outcome, tokens and retries and reliability included, rather than the seat.
Enterprise AI arrived on the budget as a software line and now bills like a utility. A license is a fixed number, negotiated once and paid whether a seat is worked hard or left idle. Inference is a meter: a variable cost that moves with every unit of work the system performs, billed increasingly by the token to the organization rather than the user. Most enterprises budgeted for the license and are now receiving invoices for the meter. The question that decides whether an AI program survives its second year is no longer what the models can do; it is what they cost to run at volume, and that answer behaves like nothing the software budget was built to hold.
The mismatch begins with the denominator. Software has always been priced and governed by the seat, so that is how AI arrived on the budget — a per-user line, forecast from headcount. Yet a seat measures access, not consumption, and two people holding identical licenses can run bills that differ enormously depending on what they ask the system to do. Governing agentic AI by the seat is like metering a factory’s electricity by its headcount rather than the machines it runs. The seat count tells you who is allowed to consume; it says nothing about what is actually being consumed.
The unit that actually governs the bill is cost per successful outcome — what one finished piece of work costs to produce, whether that is a resolved ticket, a closed claim, or a drafted contract. That figure is a function of variables a license never exposed: the tokens a task burns, the number of steps it takes to finish, how often it retries when a step fails, and which model each step is routed to. A pilot conceals all of it, because a pilot is small, curated, and closely watched. Production surfaces it, because production is where volume compounds and no one is standing over each individual run.
The term most budgets ignore is the one that punishes hardest: a failed run still bills. The meter issues no refund for a wrong answer, an abandoned task, or a loop that never converged, so the true cost of a successful outcome is the average cost of an attempt divided by the share of attempts that succeed. A workflow that completes half the time costs roughly twice as much per delivered result; one that completes a third of the time, three times as much, and that is before the cost of human rework is added back. Reliability and cost are the same problem, not two. Money spent making an agent finish its work correctly is money spent defending margin, which makes reliability the single most underpriced lever on the AI P&L.
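The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a costing model: the `Workload` fields, the blended per-token rate, and the idea of charging a flat rework cost for each failed attempt are all assumptions introduced here for the example.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical inputs for one recurring workload (all names are illustrative)."""
    tokens_per_attempt: int           # average tokens one attempt burns, input plus output
    price_per_million_tokens: float   # assumed blended rate across the models used
    success_rate: float               # share of attempts that finish correctly, 0 < p <= 1
    rework_cost_per_failure: float = 0.0  # assumed human cost to clean up one failed attempt


def cost_per_attempt(w: Workload) -> float:
    """Metered cost of a single attempt, whether or not it succeeds."""
    return w.tokens_per_attempt * w.price_per_million_tokens / 1_000_000


def cost_per_successful_outcome(w: Workload) -> float:
    """Average cost of an attempt divided by the success rate, plus rework on failures."""
    if not 0 < w.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    attempts_per_success = 1 / w.success_rate
    failures_per_success = attempts_per_success - 1
    return (cost_per_attempt(w) * attempts_per_success
            + w.rework_cost_per_failure * failures_per_success)


if __name__ == "__main__":
    # Same metered cost per attempt, three different reliability levels.
    for p in (1.0, 0.5, 1 / 3):
        w = Workload(tokens_per_attempt=200_000, price_per_million_tokens=5.0, success_rate=p)
        print(f"success rate {p:.2f}: {cost_per_successful_outcome(w):.2f} per delivered result")
```

With the metered cost per attempt held fixed, halving the success rate doubles the cost per delivered result, and any nonzero `rework_cost_per_failure` widens the gap further.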
This is why the reassurance that token prices keep falling is a trap rather than a rescue. Per-token rates do decline, quarter after quarter. Yet usage grows faster than price falls — the meter gets cheaper per unit while the number of units multiplies — so the rate on the invoice drops while the total on it climbs. A cheaper token consumed without discipline is simply a larger ungoverned bill that arrives a little more slowly. Falling prices relieve nothing on their own; they change the slope, not the direction.
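A short projection makes the point concrete. The sketch below assumes a constant quarterly price decline and a constant quarterly usage growth; the function name and every figure in it are hypothetical and chosen only to show the shape of the curve, not drawn from any vendor's pricing.

```python
def project_spend(price_per_million: float,
                  tokens: float,
                  price_decline: float,
                  usage_growth: float,
                  quarters: int) -> list[tuple[int, float, float, float]]:
    """Return (quarter, rate per million tokens, tokens used, total spend) per quarter.

    Each quarter the rate falls by `price_decline` and usage rises by
    `usage_growth`, so total spend is multiplied by
    (1 - price_decline) * (1 + usage_growth).
    """
    rows = []
    for quarter in range(quarters + 1):
        spend = price_per_million * tokens / 1_000_000
        rows.append((quarter, price_per_million, tokens, spend))
        price_per_million *= 1 - price_decline
        tokens *= 1 + usage_growth
    return rows


if __name__ == "__main__":
    # Illustrative assumptions: rate falls 10% a quarter, usage grows 30% a quarter.
    for quarter, rate, used, spend in project_spend(10.0, 100_000_000, 0.10, 0.30, 4):
        print(f"Q{quarter}: rate {rate:6.2f}  tokens {used:14,.0f}  spend {spend:10,.2f}")
```

Under these assumed rates the per-token price drops every quarter while the total grows by about 17 percent a quarter. Spend only falls when usage growth stays below `price_decline / (1 - price_decline)`.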
The discipline that changes the direction is structural, and it is imposed at the workload rather than the seat. Each recurring workload carries its own cost-to-serve, a named owner, a target cost per successful outcome, and the specific business number it exists to move; a workload that cannot name its number should not be handed a meter. Every step is routed to the cheapest model that clears its reliability bar, reserving the expensive tier for the judgment that genuinely needs it. Because the counsel here sits behind no house product, no reseller agreements, and no vendor incentives, the optimization serves your margin rather than any platform’s consumption target. Run this way, unit cost bends downward as adoption rises — the inverse of the default curve, where spend climbs in lockstep with usage.
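The routing rule described above, cheapest model that clears the reliability bar, can be sketched as a simple selection. The tier names, costs, and success rates below are invented for illustration, and the sketch assumes each tier's success rate has already been measured on the step type in question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    """Hypothetical model tier; the figures would come from your own evaluations."""
    name: str
    cost_per_step: float          # assumed average metered cost of one step on this tier
    measured_success_rate: float  # assumed measured share of correct results on this step type


def route(tiers: list[ModelTier], reliability_bar: float) -> ModelTier:
    """Pick the cheapest tier whose measured success rate meets the bar."""
    eligible = [t for t in tiers if t.measured_success_rate >= reliability_bar]
    if not eligible:
        raise LookupError(f"no tier clears a reliability bar of {reliability_bar}")
    return min(eligible, key=lambda t: t.cost_per_step)


# Illustrative tiers only; none of these correspond to a real product or price.
TIERS = [
    ModelTier("small", cost_per_step=0.002, measured_success_rate=0.90),
    ModelTier("mid", cost_per_step=0.010, measured_success_rate=0.97),
    ModelTier("large", cost_per_step=0.060, measured_success_rate=0.995),
]

if __name__ == "__main__":
    for bar in (0.85, 0.95, 0.99):
        print(f"bar {bar}: route to {route(TIERS, bar).name}")
```

Raising a `LookupError` when nothing clears the bar is a deliberate choice in this sketch: a step that no tier can perform reliably is a workload design problem, not something to settle by silently falling back to the most expensive model.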
Intelligence has become a metered input, and metered inputs are governed by unit economics, not by license negotiation. The board-level read is plain: carve the token line out of the platform budget and project it on its own; track cost per successful outcome instead of gross spend; and hold a gross-margin target on AI-delivered work the way the business holds one on everything else it sells. The meter runs in someone else’s cloud, but the margin is yours to govern or yours to lose. The firm that prices the outcome scales AI as an asset with widening margin; the firm that prices the seat learns the shape of the bill only after it has already been paid.
Begin with a Charter.
A fixed-fee diagnostic that turns these arguments into a plan for your operation — scoped, costed, and run by the people who would operate it.