So I'm sitting in a meeting right now (sorry, multitasking), and we're talking about scaling our content workflows with agents. One of my team members mentions that we're worried about reliability, and honestly, it's making me think about a bigger gap I keep noticing.
Everyone is obsessed with agent capability right now. Can it write? Can it code? Can it reason? Can it call APIs? Fine, great. But what about when it fails? What about when it hallucinates mid-workflow, or gets stuck in a loop, or just... stops?
I've been using Claude and GPT-5 for months now, pushing them into production workflows for clients. And they're good, tbh. But the moment something goes sideways (and it does), the tooling for handling that failure is basically nonexistent. You get error messages, maybe a retry mechanism if you're lucky. But what about observability? What about recovery patterns that don't require me to manually debug and restart? What about confidence scoring that tells me when an agent output is actually trustworthy versus when it's guessing?
Look, I came through the old world where you tested everything to death before launch. Now clients want speed, and they want AI, and they want it yesterday. So we're building workflows that spin up fast, but we have almost no infrastructure for knowing when they're about to fail or how to gracefully degrade when they do.
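To be concrete about what "a retry mechanism if you're lucky" looks like in practice, here's a rough sketch of the kind of wrapper you end up hand-rolling. It's hypothetical: every name in it is made up for illustration, not any vendor's SDK. It retries on errors, treats a repeated output as a likely loop, and falls back to a safe default instead of crashing, keeping a small log along the way.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StepResult:
    output: str
    status: str  # "ok" if a validated output came back, "degraded" if we fell back
    attempts: int
    log: List[str] = field(default_factory=list)


def run_with_guardrails(
    step: Callable[[str], str],       # hypothetical: one agent call
    prompt: str,
    validate: Callable[[str], bool],  # crude stand-in for real confidence scoring
    fallback: Callable[[str], str],   # hypothetical: safe default or human review queue
    max_attempts: int = 3,
) -> StepResult:
    log: List[str] = []
    seen = set()
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        try:
            out = step(prompt)
        except Exception as exc:  # agent/API error: record it and retry
            log.append(f"attempt {attempts}: error {exc!r}")
            continue
        if out in seen:  # same output twice: likely stuck in a loop, stop retrying
            log.append(f"attempt {attempts}: repeated output, likely loop")
            break
        seen.add(out)
        if validate(out):
            log.append(f"attempt {attempts}: ok")
            return StepResult(out, "ok", attempts, log)
        log.append(f"attempt {attempts}: failed validation")
    # Degrade gracefully instead of killing the whole workflow.
    log.append("falling back")
    return StepResult(fallback(prompt), "degraded", attempts, log)


# Usage: a step that times out once, then succeeds.
calls = {"n": 0}


def flaky_step(prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("upstream timeout")
    return f"draft for: {prompt}"


result = run_with_guardrails(
    flaky_step,
    "blog intro",
    validate=lambda s: len(s) > 0,
    fallback=lambda p: "queued for human review",
)
print(result.status, result.attempts)  # ok 2
```

Even this is just retries plus a tripwire. The `validate` hook is where actual confidence scoring would need to go, and that's exactly the part I can't find good tooling for.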
I'm curious if this is actually just me. Maybe there's a tool or service I'm missing that handles this well. Or maybe the market is just building toward this and nobody has solved it yet. Because right now, the best practice seems to be "add a human in the loop," which kind of defeats the purpose of using an agent in the first place.
The vendors are all selling speed and capability. But reliability? That's somehow still treated like a bonus feature. I'm wondering if there's someone out there building this properly, or if it's just wide open. Because if I'm right and this gap exists, it seems like someone should be filling it.
Anyone else dealing with this? How are you handling agent failures in production? And more importantly, how are you making your clients feel confident about it when things go wrong?