The Cross-Cutting Concerns Trap in AI Agent Systems

Every system I’ve ever built eventually teaches me the same lesson, in a slightly different costume. The latest version showed up while building Azure AI agents this year, and it took me longer to recognize than it should have. Here’s the pattern. You start with a clean agent. It calls one tool, the tool returns a response, the agent reasons over it, the workflow proceeds. Beautiful. Then production happens. The first tool starts timing out occasionally. So you wrap it in a retry block. The second tool hits rate limits. So you add backoff. The third tool needs auth tokens that expire. So you add a refresh handler. The fourth tool’s responses sometimes come back malformed. So you add validation. By the time you have eight tools wired up, your agent code is 70% infrastructure babysitting and 30% actual orchestration logic. You notice this and tell yourself it’s fine. The retries work. The auth refreshes. The failures get caught. The system runs. Except it doesn’t run well. Each tool’s retry policy is slightly different because three different engineers wrote them on three different sprints. The auth refresh logic gets duplicated in subtle ways. Rate limit handling is consistent in five places and forgotten in three. Adding a ninth tool means copying the boilerplate from somewhere and hoping you remembered everything. This is the cross-cutting concerns trap, and AI agents are unusually vulnerable to it.

Why Agents Make This Worse

Traditional services have one or two external dependencies. Maybe a database. Maybe one downstream API. The pattern of wrapping calls in resilience logic doesn’t scale up because it doesn’t need to. You handle two dependencies; you don’t have a sprawl problem. Agents are different. A modestly useful agent talks to five to fifteen tools. Each tool is a distinct external dependency with its own failure modes, latency profile, auth model, and rate limits. The agent itself is also stateful in a way most services aren’t, because it’s reasoning across multiple tool calls in a single workflow. This combination means the surface area for cross-cutting concerns explodes. Retry logic, auth handling, rate limiting, observability, cost tracking, error classification, prompt logging, response validation. Every one of these wants a home in the application code, and every one of them is a mistake to put there.

The Symptoms Show Up Slowly

The painful part of this trap is that it doesn’t fail loudly at first. The agent works. It handles failures. The team feels productive. The codebase is just a little messier than last sprint, and a little messier than the sprint before that. Then one of these things happens: A new engineer joins the team and asks how retries work. The answer is “depends on the tool.” They look at you funny. That’s the first signal. A tool changes its rate limit policy. You realize you have to update three different places, and you’re not sure if you got them all. That’s the second signal. Production has an incident where a single misbehaving tool cascades into a multi-tool agent freeze. You realize none of the resilience logic actually isolated failures the way you assumed. That’s the third signal. By the time you have all three signals, you’re months into the trap and the refactor is going to be a real effort.

The Pattern That Actually Works

The fix is the same pattern enterprise architecture has been teaching for two decades, but it bears repeating because every new generation of systems forgets it. Cross-cutting concerns belong at the platform layer, not the application layer. For Azure AI agents this means putting your tool calls behind something that can apply policies uniformly. Azure API Management is the obvious choice if you’re already in that ecosystem. The agent calls APIM. APIM handles retries with exponential backoff. APIM enforces rate limits per agent identity. APIM rotates auth credentials. APIM logs every call to Application Insights. The agent code goes back to doing what it should do, which is reason about the response, not handle the wire. If you’re not on APIM, the same idea applies. A custom middleware layer in front of your tool calls. A sidecar pattern. A gateway. The implementation is less important than the boundary. Whatever you choose, it has one job, which is to make tool calls a uniform abstraction from the agent’s perspective. The test for whether you’ve drawn the boundary correctly is simple. Can the agent code treat every tool call the same way? Can it ignore the question of whether a particular call needs retries, or rate limit awareness, or auth refresh? If yes, you’ve moved the right things to the platform. If no, you have more lifting to do.

What Changes After You Refactor ?

The first thing that changes is the agent file gets dramatically smaller. The second thing is that adding a new tool stops being a multi-day effort. The third thing is that observability gets coherent. Every call goes through the same path, so every call shows up in monitoring the same way. You can answer questions like “which tool failed most this week” or “what’s our token cost per agent” without writing custom queries. The thing that surprised me most was the effect on prompt iteration. When the tool layer is reliable and uniform, you can actually iterate on agent reasoning without your changes interacting unpredictably with infrastructure failures. The signal-to-noise ratio in agent debugging goes up by an order of magnitude.

The Broader Lesson

Every cross-cutting concern in an agent system wants to live in the agent code, and almost none of them belong there. Auth, rate limiting, retries, logging, cost tracking, observability. These look like agent concerns when you’re writing the first version. They reveal themselves to be platform concerns once you have more than two agents and more than five tools. The signal that you’re in the trap is when adding tools feels harder than it should. The fix is to step back, identify which concerns are repeating across tools, and pull them out. Your agent code should describe what to do, not how to deal with the world being unreliable. If you’re building agent workflows in production, this is one of the patterns that pays for itself many times over. The earlier you draw the boundary, the less code you have to refactor later.

I’m a Solution Architect at IBM working on Agentic AI orchestration and MCP integrations. Writing about the patterns that actually work in production and the ones that don’t.