Lying in bed this morning with a headache, I was scrolling through the pricing updates from Anthropic, and honestly it got me thinking about how we structure costs for our customers.
We're at the point where we're onboarding our 30th customer on the agent platform, and the unit economics are finally starting to feel real. Input tokens, output tokens, API calls, everything stacks up fast. When Claude's pricing shifts, even incrementally, it ripples down to what we can charge without killing margins.
So I built out a simple spreadsheet to model different pricing structures. Nothing fancy, just scenarios: per-agent-per-month, per-API-call, hybrid models, everything. The problem is that none of them feel quite right yet. We're still in that awkward phase where we're not sure if we're selling a feature, a tool, or a full platform.
What I realized is that our current pricing is almost arbitrary. We're doing per-agent-per-month at $99, but that's really just a guess wrapped in a number. It doesn't account for how much Claude usage we're actually burning on each customer's workload. Some agents are chatbots doing light inference. Others are making 50 API calls a day with long context windows. We're subsidizing the heavy users with margin from the light ones.
I threw together an Airtable base to track actual token usage per customer for the last month. Took maybe an hour. I added formulas to pull in what our costs actually are, what we're charging, and the gap between the two. It's... illuminating. Not terrible, but definitely not optimized.
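For anyone who'd rather do this in code than in Airtable, here's a rough sketch of the same cost-vs-charge calculation. The two token rates are placeholders I made up for illustration, not Anthropic's actual prices, and the `Usage` fields and `monthly_margin` name are hypothetical. The only real number is the $99 per-agent price.

```python
from dataclasses import dataclass

# Placeholder rates in USD per 1M tokens. These are made-up numbers,
# NOT real Anthropic pricing; swap in the current rates for your model.
INPUT_RATE_PER_M = 2.00
OUTPUT_RATE_PER_M = 10.00

# Flat per-agent-per-month price currently charged.
PRICE_PER_AGENT = 99.00


@dataclass
class Usage:
    """One customer's usage for one month (hypothetical schema)."""
    customer: str
    agents: int
    input_tokens: int
    output_tokens: int


def monthly_margin(u: Usage) -> dict:
    """Return model cost, revenue, and the gap (revenue minus cost)."""
    cost = (
        u.input_tokens / 1_000_000 * INPUT_RATE_PER_M
        + u.output_tokens / 1_000_000 * OUTPUT_RATE_PER_M
    )
    revenue = u.agents * PRICE_PER_AGENT
    return {
        "customer": u.customer,
        "cost": round(cost, 2),
        "revenue": round(revenue, 2),
        "gap": round(revenue - cost, 2),
    }


if __name__ == "__main__":
    # Illustrative usage figures, not real customer data.
    for u in [
        Usage("light-chatbot", 1, 2_000_000, 500_000),
        Usage("heavy-long-context", 1, 40_000_000, 5_000_000),
    ]:
        print(monthly_margin(u))
```

A negative gap is a customer whose flat fee doesn't cover their model spend, which is the subsidy problem in one number. This only counts token cost, so hosting and support time would need to be layered on separately.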
The thing is, we can't just pass costs through to customers 1:1. That's not a business, that's a commoditized API wrapper. But we also can't stay blind to what we're actually spending. The Anthropic pricing change just crystallized that.
I'm thinking about moving to a hybrid model. Base fee for the agent setup and management, then a token budget that resets monthly. You get X million input tokens and Y million output tokens included. Go over and you pay per additional 1M tokens. Keeps customers thinking about efficiency, gives us predictability, and it actually scales with how much compute they're using.
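To make the hybrid idea concrete, here's a minimal sketch of how one month's invoice might be computed. Everything in it is an assumption on my part: the function name, the parameter names, and especially the choice to bill overage pro rata (so half a million extra tokens costs half the per-1M rate) rather than in whole 1M blocks. I haven't decided on X, Y, or the overage rates, so the numbers in the example call are placeholders.

```python
def hybrid_invoice(
    input_tokens: int,
    output_tokens: int,
    *,
    base_fee: float,
    included_input_m: float,
    included_output_m: float,
    overage_input_per_m: float,
    overage_output_per_m: float,
) -> float:
    """Base fee plus pro-rata overage on tokens beyond the monthly budget.

    All parameters are hypothetical knobs for modeling, not settled prices.
    Included budgets and overage rates are expressed per 1M tokens.
    """
    # Tokens above the included budget; never negative.
    over_in = max(0, input_tokens - included_input_m * 1_000_000)
    over_out = max(0, output_tokens - included_output_m * 1_000_000)

    overage = (
        over_in / 1_000_000 * overage_input_per_m
        + over_out / 1_000_000 * overage_output_per_m
    )
    return round(base_fee + overage, 2)


if __name__ == "__main__":
    # Placeholder plan: 10M input and 2M output tokens included.
    total = hybrid_invoice(
        12_000_000,
        1_500_000,
        base_fee=99.0,
        included_input_m=10,
        included_output_m=2,
        overage_input_per_m=4.0,
        overage_output_per_m=20.0,
    )
    print(total)
```

Keeping input and output budgets separate mirrors how the model provider bills, so a long-context customer and a long-output customer hit different limits. Unused budget doesn't roll over here, which matches the "resets monthly" idea.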
Tbh, I'm also realizing we need to be smarter about which customers we onboard at all. Some of these use cases are better served by lighter-weight tools or just raw API access. We've been saying yes to everyone, and that's eating into margin.
If anyone else is scaling an agent or AI product and has dealt with this, I'd love to hear how you landed on pricing. Are you tracking actual usage per customer? How do you account for model API shifts? And more specifically, how do you have the conversation with customers about costs without sounding like you're nickel-and-diming them?
Right now my head is still foggy, but I'm pretty sure this needs to be sorted before we hit 50 customers. Shipping fast is good until your unit economics break.