Token Budgeting: How To Think About AI Cost Control

AI costs are exploding. Engineering teams have rightly been pushed to adopt AI as quickly as possible so that they can continue to compete in their markets. Industry leaders are now calling out the resulting cost problem and offering high-level recommendations for how to think about managing it.

How should your company think about token budgeting? How do you determine if your engineers are spending efficiently or not? What is token budgeting? This blog post will talk through all of these topics and enable you to run your organization more efficiently as we lean into the AI era.

What is Token Budgeting and Token Allocation?

Token Budgeting, in essence, is a simple question: what is the “right” amount of money to allocate to a company, team, or developer? We’re all used to budgeting for other parts of our stack, like cloud infrastructure and SaaS - but those categories were easier to plan for. Infrastructure spend largely maps to end-customer demand and typically hits COGS in a predictable way. SaaS contracts are usually per-seat or flat annual fees that can be forecasted and negotiated.

Similar to paying a bulk AWS or Datadog bill, paying an OpenAI, Anthropic or Cursor bill in bulk usually causes people to ask: What’s driving this much spend? Is this good spend or bad spend?

Token bills need to be broken down like anything else. Token costs are also driven heavily by R&D (i.e., your engineers building new features), not just by things that hit COGS (the cost of model usage when serving the product). Whether you’re using Anthropic, OpenAI, Cursor, or one of the other providers in the space, the first step is to break down or allocate your costs to the most granular functions you can. Separating R&D from COGS is an easy first step, and companies usually handle this by using separate API keys for each.
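As a rough illustration of that first split, here is a minimal Python sketch that rolls spend up into R&D and COGS by API key. The key labels, the row format, and the "Unallocated" bucket are all assumptions made for the example; your provider's usage export will look different.

```python
from collections import defaultdict

# Hypothetical mapping from API key label to cost bucket. The labels are
# illustrative only, not a provider convention.
KEY_BUCKETS = {
    "prod-chat-service": "COGS",
    "prod-search-agent": "COGS",
    "eng-cursor-seats": "R&D",
    "eng-ci-code-review": "R&D",
}

def allocate_by_key(usage_rows):
    """Sum spend per bucket; keys we don't recognize land in 'Unallocated'."""
    totals = defaultdict(float)
    for row in usage_rows:
        bucket = KEY_BUCKETS.get(row["api_key_label"], "Unallocated")
        totals[bucket] += row["cost_usd"]
    return dict(totals)

# Made-up usage rows, standing in for a provider's usage export.
usage = [
    {"api_key_label": "prod-chat-service", "cost_usd": 4200.0},
    {"api_key_label": "eng-cursor-seats", "cost_usd": 9100.0},
    {"api_key_label": "eng-ci-code-review", "cost_usd": 650.0},
    {"api_key_label": "sandbox-unknown", "cost_usd": 75.0},
]

print(allocate_by_key(usage))
```

Keeping an explicit "Unallocated" bucket is a deliberate choice here: it makes keys that nobody has claimed visible instead of silently folding them into one side of the split.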

COGS Token Allocation

COGS Token Allocation is the trickier allocation problem, and it involves far more data. API Keys are going to get overloaded quickly, so you’ll need to fall back on logging metadata on a per-request basis. That metadata can then be used to virtually tag and allocate COGS costs by service, team, agent, or function.

API gateways like OpenRouter and LiteLLM also have ways to record and export this metadata so that it can be used by your own data-warehousing or cost management tool of choice. Allocation aside, COGS is usually the easier part of the token budgeting equation. Once you’re recording the data and allocating it accordingly, you can map that to demand. Companies will usually look at the total cost of ownership for the full stack, define a Cost Report for that, and assign a single budget across all providers (e.g., AWS, Anthropic, Datadog), then ensure they’re on target with that budget.
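To make the per-request tagging idea concrete, here is a small Python sketch that aggregates request-level cost records along whichever dimension you care about. The record fields (service, team, agent, cost_usd) are assumptions for the example and do not reflect the export schema of any particular gateway.

```python
from collections import defaultdict

# Hypothetical per-request log records. Field names are assumptions, not
# the export format of any specific gateway or provider.
requests_log = [
    {"service": "support-bot", "team": "cx", "agent": "triage", "cost_usd": 0.04},
    {"service": "support-bot", "team": "cx", "agent": "reply", "cost_usd": 0.06},
    {"service": "search", "team": "platform", "agent": "rerank", "cost_usd": 0.25},
    {"service": "summarizer", "cost_usd": 0.01},  # request logged without tags
]

def allocate(records, dimension):
    """Total cost per value of `dimension`; untagged requests are kept visible."""
    totals = defaultdict(float)
    for record in records:
        totals[record.get(dimension, "untagged")] += record["cost_usd"]
    # Round to avoid floating-point noise in the reported totals.
    return {key: round(value, 4) for key, value in totals.items()}

print(allocate(requests_log, "team"))
print(allocate(requests_log, "service"))
```

The same records can be sliced by service, team, or agent without re-instrumenting anything, which is the main reason to log metadata per request rather than relying on API keys alone.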

R&D Token Allocation

Now on to the messier part: R&D Token Allocation. The allocation portion of this is usually pretty easy. Anthropic, OpenAI, and Cursor all provide per-user costs for the most part. However, things quickly become opaque the second you try to interpret these costs per developer. There can be a massive disparity in token spend per developer. How do you know which developer is performing and which one isn’t? How do you set a budget on a per-developer basis? All of these questions require you to define an efficiency metric. And the only way to do that is to bring in other data.

Token Unit Costs: Define Your Efficiency Metric

I recently spoke to the CTO of an organization with thousands of engineers that spends 8 figures on AI annually. His quote stuck with me: “I want to see my top developers by token spend, but then I don’t know if I should promote them or fire them.” I chuckled for a moment, and we immediately began talking through the importance of defining unit costs.

Looking at cost in isolation isn’t enough. You need to define what your efficiency metric is. A few examples are below:

Number of features shipped
Number of pull requests opened (and merged!)
Number of Linear issues closed
Number of JIRA issues closed

When you have your efficiency metric defined, unit costs are easy to uncover. Let’s use Linear issues as the efficiency metric:

Developer	Token Spend MTD	Linear Issues Closed	Unit Cost
Linda	$1200	12	$100
Brian	$700	3	$233
Stephanie	$500	4	$125

If we looked at just the token spend, Linda might look like she’s the biggest offender. That being said, when taking into consideration the number of issues she closed, she’d actually be the most efficient of our three developers. Brian might look like he’s in the middle of the road, but he’s actually the least efficient. By defining your efficiency metric, you can start to benchmark across your organization and get clarity with unit costs.
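The arithmetic behind the table is just spend divided by issues closed. A minimal Python sketch, using the three developers above (the handling of developers with zero closed issues is an assumption for the example, since the table doesn't include that case):

```python
# Month-to-date figures taken from the table above.
developers = [
    {"name": "Linda", "token_spend": 1200, "issues_closed": 12},
    {"name": "Brian", "token_spend": 700, "issues_closed": 3},
    {"name": "Stephanie", "token_spend": 500, "issues_closed": 4},
]

def unit_cost(token_spend, issues_closed):
    """Dollars spent per closed issue; None when nothing was closed."""
    if issues_closed == 0:
        return None
    return token_spend / issues_closed

def rank_by_unit_cost(devs):
    """Most efficient first; developers with no closed issues sort last."""
    scored = [(d["name"], unit_cost(d["token_spend"], d["issues_closed"])) for d in devs]
    return sorted(scored, key=lambda pair: float("inf") if pair[1] is None else pair[1])

for name, cost in rank_by_unit_cost(developers):
    print(f"{name}: ${cost:.2f} per issue")
```

Sorted this way, the order is Linda, Stephanie, Brian, which is the reverse of what a leaderboard ranked by raw spend would suggest for Linda.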

Counterintuitively, you may want to give Linda a far bigger token budget (or remove the cap entirely) because she’s shipping more efficiently than anyone else. Imagine if leaderboards at companies were ranked by your efficiency metric vs just cost - you’d likely see an entirely different (and better!) set of behaviors.

How Do I Set AI Budgets for Developers?

Daniel Gross recently gave an interesting interview on this topic and said the following: “I think the closest analogy that I have is we are all kinds of portfolio managers in a hedge fund. And every IC you have is running a strategy, and you have to decide how much budget you're going to allocate to their strategies; in some sense, they're going to do better with it than without it... I think it's now a game basically of: ‘how far can an individual get with a certain token budget?’”

Every company is going to need to think about its budgeting approach, and there are two methods we see across our customer base: a finance-driven budget or a developer-influenced budget. A finance-driven budget is simple - and ignores basically all context. You set an arbitrary number per developer (e.g., $2,000 per month), multiply that by the number of developers, and voila, you’ve got your budget. The benefit here is that you have a basic way to budget, forecast, and adjust moving forward.
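For completeness, the finance-driven calculation is a one-liner. In this Python sketch the $2,000 figure comes from the example above, while the headcount of 150 is a made-up number for illustration:

```python
def finance_budget(per_developer_monthly, developer_count):
    """Flat per-developer allowance multiplied across headcount."""
    monthly = per_developer_monthly * developer_count
    return {"monthly": monthly, "annual": monthly * 12}

# 150 developers is a hypothetical headcount, not a recommendation.
print(finance_budget(2000, 150))
```

Nothing in this function knows what any developer actually shipped, which is exactly the limitation of the approach.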

A developer-influenced budget starts with observing developer usage for a set period of time that’s both uncapped and undisclosed (we recommend 1-2 months) to understand the usage baseline across the organization. By monitoring usage and the associated unit costs for whatever your efficiency metric is, you can get a pretty good grasp of what your median spend should be - or at least see what’s possible. Once you’ve seen the usage, you can roll out a blanket budget across your organization.
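One way to turn an observation period into a blanket budget is sketched below in Python. The spend figures are invented, and the idea of setting the budget at the median times a headroom multiplier (1.5 here) is an assumption made for the example, not a rule from this post; pick whatever policy fits your organization.

```python
import statistics

# Hypothetical monthly spend per developer, observed while usage was
# uncapped and undisclosed. All figures are made up for illustration.
observed_monthly_spend = {
    "dev_a": 300.0,
    "dev_b": 450.0,
    "dev_c": 500.0,
    "dev_d": 700.0,
    "dev_e": 1200.0,
    "dev_f": 2600.0,
}

# Assumed policy knob: how far above the median the blanket budget sits.
HEADROOM = 1.5

def blanket_budget(spend_by_developer, headroom=HEADROOM):
    """Derive a per-developer budget from the observed median spend."""
    median_spend = statistics.median(spend_by_developer.values())
    budget = median_spend * headroom
    # Developers already above the proposed budget deserve a closer look
    # at their unit costs before any cap is applied.
    over_budget = sorted(
        dev for dev, spend in spend_by_developer.items() if spend > budget
    )
    return {"median": median_spend, "budget": budget, "over_budget": over_budget}

print(blanket_budget(observed_monthly_spend))
```

The developers who land above the proposed budget are not automatically a problem. Checking their unit costs first tells you whether they are candidates for a bigger allowance or for a conversation.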

Dynamic Token Budgeting for Top Performers

The more advanced companies I’m speaking with are giving dynamic token budgets to top performers. The easiest thing to do here is just to remove the token budget entirely. Otherwise, dynamic budgets change as your developers' habits change. For example, if a developer’s unit cost begins to go down, you can give them more in absolute budget. The idea here is that you shouldn’t restrict your top performers and should actually give them more budget to ship faster.

Our more advanced users will query this unit cost information via API or MCP, then update budgets accordingly on a dynamic basis. It makes sense too: you incentivize behavior that’s good for the company. Developers (or agents) will naturally get smart about which model they’re using and how they’re getting work done to improve their efficiency.
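A dynamic budget rule can be very small. The Python sketch below raises a developer's budget when their unit cost improves and holds it otherwise. The 20% step, the ceiling, and the choice to hold rather than cut are all assumptions for illustration, and the unit costs are passed in directly rather than fetched from any API.

```python
def next_budget(current_budget, prev_unit_cost, curr_unit_cost,
                step=0.20, ceiling=10_000):
    """Raise the budget when unit cost improves; otherwise hold it steady.

    `step` and `ceiling` are hypothetical policy knobs, not recommendations.
    """
    if curr_unit_cost < prev_unit_cost:
        return min(round(current_budget * (1 + step), 2), ceiling)
    return current_budget

# Unit cost fell from $125 to $100 per issue, so the budget grows by 20%.
print(next_budget(2000, prev_unit_cost=125, curr_unit_cost=100))

# Unit cost rose, so the budget is left where it is.
print(next_budget(2000, prev_unit_cost=100, curr_unit_cost=140))
```

Holding the budget flat when unit cost rises, instead of cutting it, is a conservative choice: a single bad month (a large refactor, an unfamiliar codebase) doesn't immediately penalize the developer.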

Why AI Cost Visibility Improves Spending Behavior

Regardless of what method you employ, merely giving teams, managers, and individuals visibility into their progress towards their budget is important. In the same way that you need to weigh yourself regularly when you have a weight-loss goal, you need a system of record for cost management so that developers are aware of the costs they’re driving.
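What "visibility on progress" might look like in practice is sketched below in Python. The function name, the output fields, and the straight-line projection (which assumes the rest of the month looks like the start) are assumptions for the example.

```python
def budget_progress(spend_mtd, budget, day_of_month, days_in_month):
    """Summarize month-to-date spend against a monthly budget."""
    pct_used = round(100 * spend_mtd / budget, 1)
    # Straight-line run rate: assumes spend continues at the same daily pace.
    projected = round(spend_mtd / day_of_month * days_in_month, 2)
    return {
        "pct_used": pct_used,
        "projected_month_end": projected,
        "on_track": projected <= budget,
    }

# $1,200 spent by day 15 of a 30-day month against a $2,000 budget.
print(budget_progress(1200, 2000, day_of_month=15, days_in_month=30))
```

Surfacing a projection alongside the raw percentage gives a developer time to change course mid-month rather than finding out after the cap is hit.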

The good news is that this has never been easier. Integrating cost data into the IDE where work is being done is as easy as popping in an MCP server. [Warning…intentional shill: this is exactly why Vantage exists. Let us know if you want to discuss this topic.]

Conclusion

We’re in the early innings of cost management for AI. The most important thing you can do is educate yourself on the topic - and if you’ve made it this far in the blog post, chances are you’ve got a pretty good start. Whether you decide to build this in-house or leverage a vendor like Vantage, getting some form of visibility and driving a FinOps culture is the easy first step.

We’ll be publishing more on the topic of token economics as the market shifts. If you have any thoughts on this, feel free to share on LinkedIn or X and tag us accordingly.