Your Most Expensive Developer Might Be Your Most Efficient
Per-developer AI spend data creates an instinct to rein in the top spenders. But raw cost without a denominator is meaningless - here's how to measure what agentic coding actually delivers.

Our CEO gets some version of this question in his LinkedIn comments most weeks now. An engineering manager somewhere pulled up their Cursor bill, saw that one person on the team was spending four times what anyone else was, and wants to know how to rein it in. Sometimes the framing is "how do we set limits" or "how do we push people onto cheaper models." Sometimes it's just open frustration about the variance.
Usually, they're looking at the wrong number.
Per-developer AI spend, whether that's Cursor seats, Anthropic calls through Claude Code, OpenAI tokens, or some mix of all three, is the first metric most teams look at once they start tracking agentic coding costs. It's also a numerator without a denominator. Knowing that Developer A costs $1,200 a month in tokens - $14,400 a year, roughly $150K annually if a ten-person team spent at that rate, seven figures at a hundred-developer shop - tells you nothing on its own. What matters is what the $1,200 produced.
Per-Developer AI Spend Is Lopsided by Design
If you've rolled out agentic coding tools, your spend is probably lopsided. A handful of people account for most of the bill.
Hypothetical per-developer AI spend distribution for a 10-person engineering team. The top two developers account for nearly half the total.
That looks alarming as a cost report and unremarkable as an engineering observation. Agentic coding tools amplify differences in how developers already work. Some engineers live in long multi-turn sessions where the agent reads a codebase, implements a feature, runs tests, and iterates through failures. Others mostly use AI for autocomplete and the occasional quick question. Token consumption between those two modes differs by orders of magnitude before you even account for model choice.
That's a pattern to understand, not a problem to fix.
Cost Per PR Changes the Story
The fix is putting something under the number. Pick an output metric - pull requests merged is the most accessible - and divide.
Cost per PR = Monthly AI spend / PRs merged
Now the same team reads differently:
The same team with PRs merged as a denominator. Developer A, the highest spender, has the second-lowest cost per PR and nearly 2x the throughput of the next closest.
Suddenly Developer A is the person you'd ask the others to learn from. Their cost per PR is roughly in line with everyone else, and they're shipping nearly twice as many PRs as the next-closest person. Developer D is the cheapest per PR, but their total output is less than half of Developer A's.
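A minimal sketch of that division, using four of the hypothetical ten developers for brevity. Developer A's figures match the example at the end of this post; the others are invented for illustration:

```python
# Hypothetical monthly AI spend and merged-PR counts. Developer A's numbers
# match the example later in this post; the rest are invented.
team = {
    "Developer A": {"spend": 1200, "prs": 38},
    "Developer B": {"spend": 600,  "prs": 14},
    "Developer C": {"spend": 450,  "prs": 12},
    "Developer D": {"spend": 300,  "prs": 15},
}

# Sort by cost per merged PR, cheapest first.
for dev, d in sorted(team.items(), key=lambda kv: kv[1]["spend"] / kv[1]["prs"]):
    print(f"{dev}: ${d['spend'] / d['prs']:.2f} per merged PR, {d['prs']} PRs")
```

Developer D comes out cheapest at $20 per PR, Developer A second at about $32, even though A's raw spend is four times D's.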
Now the team-level question is actually useful. Do you want more Developer A's, or do you want everyone closer to Developer D's cost per unit? That depends on whether your bottleneck is throughput or budget, but at least you're asking a real question instead of reacting to a line item.
The same framing changes how you think about model choice. A developer spending more because they route agentic sessions to Opus instead of a cheaper model isn't necessarily being wasteful. If the more capable model resolves tasks in fewer turns with fewer retry loops, the cost-per-outcome can end up lower despite the higher per-token price.
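Here's a back-of-the-envelope version of that tradeoff. The per-token prices, token counts, and turn counts below are all invented for illustration; real prices vary by model and change over time:

```python
# Hypothetical cost-per-task comparison: a pricier model that resolves the
# task in fewer turns can beat a cheaper model that loops. All prices and
# token counts here are invented for illustration.
def task_cost(price_per_mtok: float, tokens_per_turn: int, turns: int) -> float:
    """Total cost of one task at a blended $/million-token price."""
    return price_per_mtok * tokens_per_turn * turns / 1_000_000

cheaper_model = task_cost(price_per_mtok=3.0, tokens_per_turn=40_000, turns=16)
capable_model = task_cost(price_per_mtok=15.0, tokens_per_turn=40_000, turns=3)

print(f"cheaper model, 16 turns: ${cheaper_model:.2f}")  # $1.92
print(f"pricier model,  3 turns: ${capable_model:.2f}")  # $1.80
```

At a 5x price premium, the capable model only has to cut turns by more than 5x to win on cost per outcome - and that's before counting the developer time spent babysitting the retry loops.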
Which Engineering Output to Normalize Against
PRs merged is the default starting point. The data's already in GitHub, and the units are roughly comparable across developers. The obvious problem is that PRs vary in size. A one-line config change and a multi-file feature both land as "one PR," and your top-spending developer is probably doing more of the hard ones.
If your team lives in Linear or Jira, tickets closed gets you closer to "work items completed" than raw PRs do. Tickets at least represent scoped, intentional work, though ticket granularity is famously inconsistent. Some teams write three-line fix tickets; others roll two weeks of work into a single epic.
Story points are tempting as a complexity-adjusted denominator, and in teams that actually calibrate them they're probably the best option available. Most teams don't calibrate them. If you've spent any time in planning poker sessions you already know whether yours is the exception.
Deploys work as a proxy on teams shipping continuously, less so for anyone batching releases.
None of these are perfect on their own, and they don't need to be. Track two or three in parallel and look for consistency. If a developer looks efficient on multiple axes you've got a real signal. If they only look good on one, dig in.
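A sketch of what tracking several denominators in parallel looks like, with hypothetical spend and counts:

```python
# Unit costs against several denominators. Spend and counts are hypothetical.
developers = {
    "A": {"spend": 1200, "prs": 38, "tickets": 22, "deploys": 30},
    "B": {"spend": 600,  "prs": 14, "tickets": 10, "deploys": 12},
    "C": {"spend": 450,  "prs": 12, "tickets": 11, "deploys": 9},
}
units = ["prs", "tickets", "deploys"]

for dev, d in developers.items():
    costs = ", ".join(f"${d['spend'] / d[u]:.0f}/{u[:-1]}" for u in units)
    print(f"Developer {dev}: {costs}")

# A developer who ranks well on every axis is a real signal; one who only
# looks good on a single axis deserves a closer look.
for u in units:
    cheapest = min(developers, key=lambda dev: developers[dev]["spend"] / developers[dev][u])
    print(f"cheapest per {u[:-1]}: Developer {cheapest}")
```

In this made-up data, Developer A is cheapest per PR and per deploy but not per ticket - the kind of split that tells you to look at how tickets are being scoped before drawing conclusions.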
Patterns in Per-Developer AI Costs
Your power users have figured something out. They know when to use agentic sessions and how to scope a task so the agent doesn't spiral. They also know when to bail on a stuck session and open a fresh one instead of pushing a bloated context further. Their spend is high, but their throughput is higher and their cost per PR lands at or below team average. Most teams have one or two of these people and don't realize it until they put the denominator in.
You'll also find people reaching for the most capable model on tasks that don't need it. A refactor that Sonnet would handle in three turns doesn't need Opus. This is one of the easier conversations to have; most people default to the best model because they haven't had a reason to think about the tradeoff, not because they're doing anything wrong.
The worst offender for wasted spend is the runaway agentic session. An 80-turn session thrashing through test failures usually means the task was scoped too broadly. The agent re-reads context, retries, re-reads, retries, and eventually hits the context window and gives up. Three focused 20-turn sessions on the same work produce better results for less money. Nobody knows that until they see the bill.
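A rough model of why that happens, assuming (as most agent loops do) that each turn re-sends the accumulated conversation as input. The price and per-turn token count are invented for illustration:

```python
# Rough model of session cost: each turn re-sends the accumulated context as
# input, so input tokens grow roughly quadratically with session length.
# The price and per-turn token count are invented for illustration.
INPUT_PRICE_PER_TOKEN = 3.0 / 1_000_000  # hypothetical $/input token
TOKENS_ADDED_PER_TURN = 2_000            # new context accumulated each turn

def session_cost(turns: int) -> float:
    # Turn k re-reads everything from turns 1..k, so total input tokens are
    # the sum 1 + 2 + ... + turns, times the per-turn context growth.
    total_input_tokens = sum(k * TOKENS_ADDED_PER_TURN for k in range(1, turns + 1))
    return total_input_tokens * INPUT_PRICE_PER_TOKEN

print(f"one 80-turn session:    ${session_cost(80):.2f}")      # $19.44
print(f"three 20-turn sessions: ${3 * session_cost(20):.2f}")  # $3.78
```

Under this toy model the single long session costs about 5x what the three short ones do, for the same nominal amount of work - the quadratic context re-reading is the whole story.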
And then there are the low spenders who aren't actually being efficient - they're just not using the tools the team is paying for. Low cost looks responsible on a dashboard. It's genuinely fine if they're still shipping; if they're not, that's its own kind of waste. It just doesn't show up on the AI bill.
The response to all of this is shifting spend, not cutting it. Your total bill might stay flat or even grow. Cost per unit of output is what goes down.
Connecting AI Spend to Engineering Output
Two data sources have to be connected for this to work: AI tool spend and engineering output. Neither naturally talks to the other, which is most of the reason teams don't do this already.
On the cost side you want per-developer, per-model spend. If your team uses Cursor, the Cursor integration in Vantage breaks the bill out by developer, model, and token type. If you've also got workloads running through Anthropic or OpenAI APIs directly, whether that's Claude Code, custom tooling, or CI pipelines, those costs can land in the same view. The non-negotiable is getting spend attributed to individuals, not just rolled up at the org level.
The output side is usually easier. GitHub gives you PRs merged per developer through its API. Linear and Jira give you tickets closed. Most teams already pipe this data into some reporting surface somewhere. The gap is usually that nobody has connected it to the cost data yet.
Unit costs are the connective tissue. Define your unit as "PR merged" or "ticket closed," feed in the count from your engineering tools, and divide the attributed AI spend by that count. The output is cost-per-PR or cost-per-ticket, at the team and individual level, updating as either side changes.
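A minimal sketch of that join, assuming you've already exported per-developer spend from your cost tooling and merged-PR counts from GitHub. The data and its shape below are hypothetical, not a real Vantage or GitHub schema:

```python
# Join per-developer AI spend (from your cost tooling) with merged-PR counts
# (from GitHub) into a cost-per-PR view. Both inputs are hypothetical exports.
monthly_spend = {"alice": 1200.00, "bob": 600.00, "carol": 450.00}
prs_merged = {"alice": 38, "bob": 14, "carol": 12}

unit_costs = {
    dev: monthly_spend[dev] / prs_merged[dev]
    for dev in monthly_spend
    if prs_merged.get(dev, 0) > 0  # skip anyone with no merged PRs this month
}

for dev, cost in sorted(unit_costs.items(), key=lambda kv: kv[1]):
    print(f"{dev}: ${cost:.2f} per merged PR")
```

Once the join exists, rerunning it monthly is trivial - the hard part is the one-time work of getting spend attributed per developer on the cost side.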
If you want to understand why the cost side has the shape it does, why one session runs $6 and another $0.60, why input tokens drive most of the bill, we wrote about that in what agentic coding actually costs.
The Bill Isn't the Point
The teams that get the most out of agentic coding aren't the ones spending the least. They're the ones who know what they're getting for what they spend. "Bring the bill down" and "make sure the bill is earning its keep" are different conversations, and only the second one has much to do with shipping faster.
A $1,200/month developer merging 38 PRs with a cost-per-PR in line with the team isn't a cost problem; it's a productivity story. A $300/month developer merging 9 PRs isn't really a cost question either. It's a question about whether they're getting anything out of tools the team is already paying for.
This topic is going to keep landing in LinkedIn comments and engineering management Slacks for a while. The teams that handle it well are the ones who already learned, from a decade of cloud cost work, that the goal was never to minimize the bill. It was to get more out of each dollar going out the door.