Rethinking Agent Costs at Scale

Lying in bed this morning with a headache, I was scrolling through the pricing updates from Anthropic, and honestly it got me thinking about how we structure costs for our customers.

We're at the point where we're onboarding our 30th customer on the agent platform, and the unit economics are finally starting to feel real. Input tokens, output tokens, API calls: everything stacks up fast. When Claude's pricing shifts, even incrementally, it ripples down to what we can charge without killing margins.

So I built out a simple spreadsheet to model different pricing structures. Nothing fancy, just scenarios: per-agent-per-month, per-API-call, hybrid models, everything. The problem is that none of them feel quite right yet. We're still in that awkward phase where we're not sure if we're selling a feature, a tool, or a full platform.

What I realized is that our current pricing is almost arbitrary. We're doing per-agent-per-month at $99, but that's really just a guess wrapped in a number. It doesn't account for how much Claude we're actually burning on each customer's workload. Some agents are chatbots doing light inference. Others are running 50 API calls a day with long context windows. We're subsidizing the heavy users with margin from the light ones.

I threw together an Airtable base to track actual token usage per customer for the last month. Took maybe an hour. Attached formulas to pull in what our costs actually are, what we're charging, and the gap. It's... illuminating. Not terrible, but definitely not optimized.
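For anyone who wants to run the same check outside Airtable, here's a rough Python sketch of the cost-versus-price gap. Everything in it is an assumption for illustration: the per-1M rates are placeholders (not Anthropic's actual prices), and the `Usage` fields and function names are made up. Only the $99 per-agent price comes from the post.

```python
from dataclasses import dataclass

# Placeholder model rates in USD per 1M tokens (assumed values,
# swap in your provider's current prices).
INPUT_RATE_PER_M = 3.00
OUTPUT_RATE_PER_M = 15.00

# Flat per-agent-per-month price mentioned in the post.
MONTHLY_PRICE_PER_AGENT = 99.00


@dataclass
class Usage:
    """One customer's usage for a billing month (hypothetical schema)."""
    customer: str
    agents: int
    input_tokens: int
    output_tokens: int


def model_cost(u: Usage) -> float:
    """What the month's tokens cost us at the placeholder rates."""
    return (
        u.input_tokens / 1_000_000 * INPUT_RATE_PER_M
        + u.output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    )


def margin_report(usages: list[Usage]) -> list[dict]:
    """Revenue, cost, and gap per customer, worst gap first."""
    rows = []
    for u in usages:
        revenue = u.agents * MONTHLY_PRICE_PER_AGENT
        cost = model_cost(u)
        rows.append({
            "customer": u.customer,
            "revenue": round(revenue, 2),
            "cost": round(cost, 2),
            "gap": round(revenue - cost, 2),
        })
    return sorted(rows, key=lambda r: r["gap"])


if __name__ == "__main__":
    sample = [
        Usage("light-chatbot", 1, 2_000_000, 500_000),
        Usage("heavy-context", 1, 40_000_000, 4_000_000),
    ]
    for row in margin_report(sample):
        print(row)
```

Sorting by gap puts the customers you're subsidizing at the top, which is the part a flat per-agent price hides.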

The thing is, we can't just pass costs through to customers 1:1. That's not a business, that's a commoditized API wrapper. But we also can't stay blind to what we're actually spending. The Anthropic pricing change just crystallized that.

I'm thinking about moving to a hybrid model. Base fee for the agent setup and management, then a token budget that resets monthly. You get X million input tokens, Y million output tokens included. Go over and you pay per 1M. Keeps customers thinking about efficiency, gives us predictability, and it actually scales with how much compute they're using.
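To make the hybrid idea concrete, here's a minimal sketch of how a monthly bill could be computed under it. The plan numbers (base fee, included tokens, overage rates) are invented for the example, and I'm assuming overage is billed pro rata per token rather than per started 1M block; the post doesn't settle either.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical hybrid plan: base fee plus a monthly token budget."""
    base_fee: float              # USD per month for setup and management
    included_input: int          # input tokens included each month
    included_output: int         # output tokens included each month
    overage_input_per_m: float   # USD per 1M input tokens over budget
    overage_output_per_m: float  # USD per 1M output tokens over budget


def monthly_bill(plan: Plan, input_tokens: int, output_tokens: int) -> float:
    """Base fee plus pro-rata overage; usage under budget costs nothing extra."""
    over_in = max(0, input_tokens - plan.included_input)
    over_out = max(0, output_tokens - plan.included_output)
    overage = (
        over_in / 1_000_000 * plan.overage_input_per_m
        + over_out / 1_000_000 * plan.overage_output_per_m
    )
    return round(plan.base_fee + overage, 2)


if __name__ == "__main__":
    # Assumed example plan: $99 base, 10M input and 2M output included.
    plan = Plan(99.0, 10_000_000, 2_000_000, 5.0, 25.0)
    print(monthly_bill(plan, 8_000_000, 1_000_000))   # within budget
    print(monthly_bill(plan, 12_000_000, 3_000_000))  # over on both
```

The budget resets simply by calling this once per billing month with that month's totals. Unused tokens don't roll over in this sketch.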

Tbh, I'm also realizing we need to be smarter about which customers get onboarded where. Some of these use cases are better served by lighter-weight tools or just raw API access. We've been saying yes to everyone, and that's eating into margin.

If anyone else is scaling an agent or AI product and has dealt with this, I'd love to hear how you landed on pricing. Are you tracking actual usage per customer? How do you account for model API shifts? And more specifically, how do you have the conversation with customers about costs without sounding like you're nickel-and-diming them?

Right now my head is still foggy, but I'm pretty sure this needs to be sorted before we hit 50 customers. Shipping fast is good until your unit economics break.

Replies (7)

Mike 1d ago

lol the per-agent-per-month guess wrapped in a number hits different. mine's completely arbitrary too

Tobi 1d ago

We track everything in Postgres now, but honestly the harder part is just accepting that some customers will always be more profitable than others. The hybrid model sounds solid though. Question: how do you handle the conversation when someone's suddenly hitting overages monthly? That's where it gets awkward for us.

Harshad 1d ago

yeah the overage conversation is brutal. we just started tracking per-customer token spend last month and half my customers are suddenly confused about their bills

zoeyx 1d ago

lol the overage conversation is genuinely where business meets reality and everyone's suddenly very interested in reading the fine print

Lucia Vargas 16h ago

The language thing is real. I'm building for Spanish speakers in LATAM and the variance in voice model quality across dialects is basically the same problem, just upstream.

Khanh Tran 1d ago

This is exactly where I'm getting stuck with my Vietnamese coding assistant too. Token costs vary wildly depending on language, and I have no idea if my pricing even makes sense yet.

Kenji 12h ago

The language variance thing is brutal. I'm running into similar problems with Japanese on Discord bots. Some customers' workloads are 3x the token cost of English and there's no clean way to price that without looking
