When Your AI Agents Start Creating More Work Than They Save

Nobody warns you about this part.

You build your first agent. It works. You build your second. It works better. By the fifth or sixth, you’ve got a genuine system forming — contract review, vendor onboarding, due diligence, renewal monitoring. Things that used to take hours are happening in minutes. You feel like you’ve cracked something.

And then, quietly, a different problem shows up.

You start your day and there are fourteen agent outputs waiting for you. Detailed analyses. Flagged issues. Recommended actions. Each one thorough. Each one well-structured. Each one demanding your attention.

By Wednesday you’ve got a backlog. Not of contracts or vendor forms. A backlog of agent outputs that you haven’t had time to review.

The agents didn’t slow down. You did.

I’ve built over fifty agents at this point. Contract review agents, due diligence assessors, compliance monitors, document validators, onboarding chasers. I’ve written about the process of building them, and I stand behind it — most agents take twenty minutes to get working and a couple of hours to refine properly.

But there’s a phase that comes after the building phase that I don’t see discussed much. The phase where you realise that an agent with no output constraints will generate near-infinite output. And your attention is not infinite.

Here’s a specific example. I have a review agent. Its job is to assess incoming documents against a set of criteria and produce a summary. When I first built it, I was impressed by how thorough it was. Every document got a detailed, structured analysis. Every clause reviewed. Every deviation noted. Every risk flagged with an explanation.

The problem is that I don’t need all of that for every document. Most documents are fine. They match the playbook. The deviations are minor. What I actually need is a signal: does this need my attention or not?

But the agent didn’t know that. I’d told it to review documents thoroughly. So it reviewed documents thoroughly. Every single one. Regardless of whether the output warranted my time.

I was spending more time reading agent summaries than I would have spent doing the reviews myself. Not because the agent was wrong — the summaries were accurate. But because it was giving me everything when I only needed the exceptions.

That’s the trap. The agent is doing exactly what you told it to do. The problem is what you told it to do.

And it gets worse when agents don’t just produce output — they take action.

I built an executive assistant agent in ClickUp. My task management runs through there, and I wanted help managing blocked tasks. Things that are stuck waiting on someone or something. Simple enough brief: find my blocked tasks and help me unblock them.

Here’s what it started doing. Every time it found a blocked task, it would schedule an “unblock session” meeting. Not just with me. With whoever it thought might be involved. People across the business, getting calendar invites they didn’t ask for, for meetings I hadn’t approved.

I’d open my calendar on Monday morning and there’d be four new meetings in there. All created by the agent. All perfectly logical from the agent’s perspective — there’s a blocked task, here are the people who could unblock it, let’s get them in a room.

But I didn’t want meetings. I wanted visibility. I wanted to know what was stuck so I could decide what to do about it. The agent skipped past my decision entirely and went straight to booking everyone’s time.
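One way to picture the gap between visibility and action is a minimal Python sketch. It is an illustration under assumptions, not how a ClickUp agent is actually configured: `BlockedTask`, `report_blocked`, and `propose_meetings` are hypothetical names, and the task data is made up. The reporting step only describes what is stuck. Anything that touches other people's calendars has to pass through an explicit approval from the human.

```python
from dataclasses import dataclass


@dataclass
class BlockedTask:
    """Hypothetical record of a task that is stuck (illustrative only)."""
    title: str
    waiting_on: str


def report_blocked(tasks: list[BlockedTask]) -> list[str]:
    """Visibility only: describe what is stuck and take no action."""
    return [f"'{t.title}' is blocked, waiting on {t.waiting_on}" for t in tasks]


def propose_meetings(tasks: list[BlockedTask], approved_titles: set[str]) -> list[str]:
    """Turn a task into a meeting proposal only if the human approved it by title."""
    return [
        f"Schedule unblock session: '{t.title}' with {t.waiting_on}"
        for t in tasks
        if t.title in approved_titles
    ]


# Made-up example data.
tasks = [
    BlockedTask("Renew supplier contract", "Legal"),
    BlockedTask("Approve vendor onboarding form", "Finance"),
]

# Step 1: the agent reports. Nothing is booked.
for line in report_blocked(tasks):
    print(line)

# Step 2: the human decides. Only approved tasks become proposals.
for line in propose_meetings(tasks, approved_titles={"Renew supplier contract"}):
    print(line)
```

The design choice is that the default is to do nothing beyond reporting. With an empty approval set, no meetings are proposed at all.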

Here’s the part I should be honest about: I still haven’t properly fixed it. I know exactly what the instructions need to say. I know the fix would take twenty minutes. But I keep putting it off because the agent isn’t actively breaking anything — it’s just mildly annoying. So it stays on the list, and the occasional rogue calendar invite keeps appearing.

Which is actually the point. The agents that create the most persistent problems aren’t the ones that fail dramatically. They’re the ones that are almost right. Just useful enough that you don’t prioritise fixing them. Just annoying enough that they chip away at your time every week.

The review agent buried me in output. The EA agent started making decisions I hadn’t delegated. Same root cause — I hadn’t been specific enough about what I actually wanted the agent to do versus what I wanted to retain control of. But one was too much information. The other was too much action. And too much action is harder to ignore.

The bottleneck shifts, and it shifts in a way that’s hard to see until you’re in it.

In the beginning, the challenge is getting agents to do useful things. Can this agent actually review a contract? Can it catch the issues I’d catch? Does it understand whose side it’s on? That’s the building phase. That’s where most of the content about AI agents lives — how to build them, how to test them, how to get them working.

But once they work, the challenge becomes something else entirely. Managing what they produce. Keeping your own focus on what actually requires a human decision. Not drowning in the volume of output from the very tools that were supposed to free up your time.

I think of it like hiring. If you hired five junior analysts and told each of them to produce a detailed report on everything they touched, you’d have a mountain of reports within a week. Good reports, probably. But more reports than you could ever read. At some point you’d pull them into a room and say: “Stop telling me everything you found. Tell me what I need to do about it.”

That’s the conversation you need to have with your agents. Except agents don’t learn from a meeting. They learn from instruction changes.

The fix for the review agent, when I figured it out, was embarrassingly simple.

I restructured the output format. Instead of a full analysis for every document, the agent now produces a one-line status for anything that passes cleanly: “Reviewed. No deviations. No action required.” That’s it. I don’t need three paragraphs explaining that a standard contract met standard criteria.

For documents that actually have issues, the output is still detailed — flagged deviations, recommended actions, context for my decision. The depth is there when it matters. It’s just not there when it doesn’t.
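The two-tier format described above can be sketched in a few lines of Python. This is a rough illustration rather than the actual agent configuration, which lives in the agent's instructions: `ReviewResult` and `format_output` are hypothetical names, and the documents and deviations are invented for the example. Only the one-line status text is taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewResult:
    """Hypothetical container for what a review agent found in one document."""
    document: str
    deviations: list[str] = field(default_factory=list)
    recommended_actions: list[str] = field(default_factory=list)


def format_output(result: ReviewResult) -> str:
    """Return a one-line status for clean documents and full detail otherwise."""
    if not result.deviations:
        return f"{result.document}: Reviewed. No deviations. No action required."
    lines = [f"{result.document}: NEEDS ATTENTION"]
    lines += [f"  Deviation: {d}" for d in result.deviations]
    lines += [f"  Recommended action: {a}" for a in result.recommended_actions]
    return "\n".join(lines)


# Invented example documents.
clean = ReviewResult("vendor_nda.pdf")
flagged = ReviewResult(
    "supply_agreement.pdf",
    deviations=["Liability cap below playbook minimum"],
    recommended_actions=["Request the standard liability cap"],
)

print(format_output(clean))
print(format_output(flagged))
```

The clean document collapses to a single scannable line, while the flagged one keeps its deviations and recommended actions.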

The difference this made was immediate. My morning went from fourteen detailed summaries to maybe three that actually needed my attention, plus eleven one-liners I could scan in thirty seconds.

Same agent. Same documents. Different instructions.

I’ve since applied the same principle across most of my agents. The question I now ask before deploying anything is: “What’s the minimum output this agent needs to produce when everything is fine?” Because “everything is fine” is the most common outcome, and if you don’t design for it, your agents will bury you in thoroughness.

There’s a broader point here that I think matters for anyone building agents in procurement or anywhere else.

We talk a lot about what agents can do. We talk about accuracy, about speed, about the workflows they can handle. What we don’t talk about enough is what happens to the human on the other end. The person whose job it is to review what agents produce, make the judgement calls, handle the exceptions.

That person has a finite amount of attention. And attention is the actual bottleneck in an agent-assisted workflow. Not processing power. Not accuracy. Your ability to focus on the things that actually require you.

If your agents are producing output that doesn’t respect that constraint, they’re not saving you time. They’re relocating it. You’ve moved the work from “do the review” to “read the review the agent did.” And if the second task takes as long as the first, you haven’t gained anything. You’ve just changed what the work feels like.

Constraining output isn’t a failure of the agent. It’s maturity. The best-configured agents I have don’t tell me everything they found. They tell me what I need to do about it. And when the answer is nothing, they say so in a single line and move on.

I think there’s a version of this lesson that applies at the team level too, not just the individual level.

When you deploy agents across a procurement team, every person on that team becomes a consumer of agent outputs. If those outputs aren’t constrained, you don’t have a team that’s working faster. You have a team that’s reading more. And reading is not the same as deciding.

The design question isn’t just “can this agent do the work?” It’s “what does this agent’s output do to the person who receives it?” Does it give them clarity? Or does it give them homework?

I got this wrong before I got it right. And I suspect most people building agents will go through the same phase — the excitement of output, followed by the overwhelm of output, followed by the discipline of constraint.

If you’re in the first phase, enjoy it. It’s genuinely exciting.

If you’re in the second phase, you’re not doing it wrong. You’re just ready for the third.

Next week: the other thing I got wrong about AI agents — why speed isn’t actually the advantage everyone thinks it is.

The AI Procurement Blueprint publishes every week. If someone on your team would find this useful, forward it on.
