Agent Fatigue Is a System Design Problem

Steve Yegge's "AI Vampire" piece, which Simon Willison surfaced last week, describes something real. People who go deep on AI tooling exhaust themselves. Yegge describes working at 10x productivity, capturing none of the value, needing more sleep. He frames it as an economic problem: early adopters generate surplus that employers extract. His prescription is a pace limit. Four hours of agent work per day.

The symptom is right. I think the diagnosis is incomplete.

The exhaustion comes from building and operating inside systems that produce faster than any human can consume. Yegge is experiencing the absence of backpressure, and calling it burnout.

A Pipeline With No Off Switch

I run a content pipeline called Distill that transforms coding sessions into journal entries, blog posts, and social content. Last week I ran it end-to-end for the first time as a production system rather than a development target. Twenty sessions fired. Most of them were the pipeline's own subprocess calls: entity extraction, blog synthesis, social adaptation. Distill analyzing Distill sessions. Synthesizing journal entries about the very code that generates journal entries.

The loop has no natural stopping point. Every output creates input for the next cycle. Every synthesis generates material that the next synthesis can draw from. The pipeline will run until you stop it, and stopping feels like falling behind because the queue never drains.

Traditional software systems have structural answers for this. A message queue fills up and producers block. A database connection pool exhausts and requests wait. A rate limiter throttles throughput to match what's downstream. These are not features that someone adds later. They are built into the architecture because without them the system overwhelms its consumers.

Agent pipelines have no equivalent. The LLM will generate as much content as you ask for. The orchestrator will dispatch as many agents as you configure. The social publishers will produce LinkedIn posts, tweets, and Slack messages for every artifact in the pipeline. Nothing in the system asks whether the human has read yesterday's output yet.

Consumption Debt

I found this in my own work. For five consecutive days, I deferred reading the pipeline's output while building more infrastructure on top of it. Five days of unreviewed journal entries, blog posts, and social previews piling up in a directory I checked sporadically. The pipeline was functioning perfectly. I was the bottleneck, and I was too busy extending the pipeline to notice.

I started thinking of this as consumption debt: the gap between production rate and review rate. It compounds the same way technical debt does. Unreviewed output doesn't just accumulate; it degrades your confidence in the pipeline's quality. Lower confidence makes you less likely to publish. Less publishing makes the pile grow. The cycle reinforces itself.

Yegge's four-hour limit is a human-side circuit breaker. It works. But it addresses the load on the person instead of the design of the system. A fire hose pointed at your face is still a fire hose even if you only stand in front of it for four hours.

Where Backpressure Belongs

The fix is simple enough that I'm embarrassed I haven't built it yet.

A content pipeline needs a consumption gate: a mechanism that tracks whether previous output has been reviewed before generating new output. Not approved, necessarily. Just seen. Approval is a quality judgment that takes time. Acknowledgment is a liveness signal that takes seconds.

When the blog synthesizer generates a weekly post, the post enters a "pending review" state. The next pipeline run checks that state. If the previous post is still pending, the pipeline can skip blog generation for this cycle, or flag the output as "stacked," or notify the human that the queue depth has exceeded a threshold.

None of this requires AI. It requires a counter and a conditional. The reason these mechanisms don't exist in most agent pipelines is that builders optimize for throughput. We measure how much the system produces. We should also measure how much the human absorbs.
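As a minimal sketch of that counter and conditional, here is what a consumption gate could look like. All names here are hypothetical (a `review_ledger.json` file, `may_generate`); the real pipeline would track state however it already persists artifacts. The key property is that acknowledgment is cheap: flipping a flag, not passing a quality bar.

```python
import json
from pathlib import Path

# Hypothetical review ledger: one JSON file mapping artifact IDs to states.
LEDGER = Path("review_ledger.json")

def load_ledger() -> dict:
    return json.loads(LEDGER.read_text()) if LEDGER.exists() else {}

def mark_generated(artifact_id: str) -> None:
    """New output enters the 'pending_review' state."""
    ledger = load_ledger()
    ledger[artifact_id] = "pending_review"
    LEDGER.write_text(json.dumps(ledger, indent=2))

def acknowledge(artifact_id: str) -> None:
    """A liveness signal, not an approval: the human has seen it."""
    ledger = load_ledger()
    ledger[artifact_id] = "seen"
    LEDGER.write_text(json.dumps(ledger, indent=2))

def may_generate() -> bool:
    """The gate: block new output while anything is still pending."""
    return "pending_review" not in load_ledger().values()
```

A pipeline run would call `may_generate()` before each synthesis stage and skip, stack, or notify when it returns false.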

The Parallelization Trap

There is a second structural contributor to the fatigue that Yegge describes.

My social publishing pipeline runs three platform adaptations sequentially: LinkedIn, Twitter, Slack. Each gets its own Claude subprocess call. The obvious optimization is to parallelize them. They are independent. No shared state. Pure throughput gain.

But parallelizing production without parallelizing consumption makes things worse, not better. Three social posts generated simultaneously means three posts to review simultaneously. The generation wall-clock time drops. The review wall-clock time holds steady or increases, because switching between platform-specific formats is more expensive than reading them in sequence.
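The asymmetry is easy to see in miniature. The sketch below is illustrative, not my pipeline's code: `generate` stands in for a per-platform subprocess call, and the `time.sleep` is a placeholder for real latency. Production parallelizes; the review loop on the other side of the queue does not.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def generate(platform: str) -> str:
    """Stand-in for a per-platform subprocess call."""
    time.sleep(0.1)  # placeholder for real generation latency
    return f"{platform} post"

platforms = ["linkedin", "twitter", "slack"]

# Parallel production: wall-clock time is roughly one generation, not three.
start = time.monotonic()
with ThreadPoolExecutor(max_workers=3) as pool:
    posts = list(pool.map(generate, platforms))
parallel_time = time.monotonic() - start

# Review does not parallelize: one human reads each post in sequence,
# so review wall-clock time stays proportional to the number of posts.
```

The generation side finishes three times faster. The human side is unchanged, and the queue between them absorbs the difference.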

This is the structural version of what Yegge calls 10x productivity. The system produces at 10x. The human evaluates at 1x. The difference accumulates as fatigue, and no amount of discipline closes the gap because the gap is architectural.

Circuit Breakers for Agent Systems

Distributed systems solved this decades ago. Netflix's Hystrix pattern. Michael Nygard's stability patterns in Release It!. Erlang's let-it-crash supervisors. The shared principle: systems need mechanisms that degrade gracefully when downstream consumers fall behind.

For agent pipelines, the practical equivalents look like this:

Queue depth limits. If unreviewed output exceeds N items, pause production. The simplest form of backpressure, and the one that prevents the pile from becoming insurmountable.

Digest compression. When the queue backs up, synthesize the backed-up items into a single summary rather than presenting them individually. Five unreviewed blog posts become one digest with the key themes and decisions that need attention. This matches how humans actually process information when behind: skim for decisions, not details.

Cadence matching. Set production frequency to consumption capacity, not generation capacity. If you review content once a day, the pipeline should produce once a day. If you review twice a week, produce twice a week. The LLM can generate every hour. That does not mean it should.

Explicit stopping points. Every pipeline run should produce a summary of what it generated and what it expects from the human. Not files in a directory. An action list: "Review these 3 posts. Approve or edit. 2 posts from yesterday still pending." Make the consumption obligation visible.
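The first two mechanisms compose into a single decision per pipeline run. Here is one way it could look, assuming a hypothetical policy where a small backlog produces flagged output and a deep backlog triggers digest compression; the threshold and action names are illustrative.

```python
# Hypothetical backpressure policy, evaluated once per pipeline run.
MAX_PENDING = 5  # queue depth limit: beyond this, stop producing individually

def next_action(pending: list[str]) -> str:
    """Decide what this run should do, given the unreviewed backlog."""
    if len(pending) == 0:
        return "generate"           # consumer is caught up: produce normally
    if len(pending) < MAX_PENDING:
        return "generate_and_flag"  # produce, but mark the output as stacked
    return "digest"                 # compress the backlog into one summary
```

Cadence matching and explicit stopping points would sit around this function: the scheduler decides how often to call it, and whatever it returns gets reported to the human as an action list.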

The Self-Referential Problem

There is a special case of pipeline fatigue that Yegge does not address, probably because he has not built it yet: pipelines that consume their own output.

My pipeline analyzes its own sessions. The journal entry about building the brainstorm feature becomes input for the blog post about content pipelines. That blog post becomes a session log that feeds the next journal entry. Each layer adds synthesis, but also adds volume. The system is a positive feedback loop with no damping.

Without damping, the self-referential loop amplifies noise faster than signal. The journal entry about the pipeline mentions the pipeline twelve times. The blog post synthesized from that journal doubles down on the self-reference. By the third cycle, you are generating content about generating content about generating content, and the original signal, what you actually built and what you actually learned, is buried under meta-commentary.

The fix here is a topic filter at the ingestion boundary. Sessions where the primary project is the content pipeline itself get tagged as "infrastructure" and excluded from blog synthesis unless they contain insights relevant to an outside reader. This is editorial judgment applied at the system level. It requires a human decision about which self-referential content is interesting to others and which is navel-gazing.
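A filter like this could be a few lines at the ingestion boundary. The sketch below is hypothetical: the project name, session fields, and the `outside_reader_relevant` flag are all illustrative, and the flag itself is where the human editorial judgment lives.

```python
# Hypothetical ingestion filter: sessions whose primary project is the
# pipeline itself get tagged "infrastructure" and excluded from blog
# synthesis unless a human has flagged them as relevant to outside readers.
SELF_PROJECTS = {"distill"}  # illustrative project name

def tag_session(session: dict) -> dict:
    if session.get("project", "").lower() in SELF_PROJECTS:
        session["tag"] = "infrastructure"
    return session

def eligible_for_blog(session: dict) -> bool:
    if session.get("tag") != "infrastructure":
        return True
    # Self-referential content passes only on an explicit human decision.
    return session.get("outside_reader_relevant", False)
```

The damping comes from the default: self-referential sessions are excluded unless someone affirmatively decides they are interesting to an outside reader.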

I have not built this filter yet. That gap is itself a symptom of the problem: I have been too busy generating to think about what should be filtered.

What Yegge Gets Right

The four-hour limit captures something important. The exhaustion is real. It does not come from laziness, and it will not go away with practice. You do not build endurance for consuming infinite output. You build systems that produce the right amount of output for the attention you have available.

Where I part ways with his framing is on where the fix lives. Telling people to work less is a valid personal strategy. Building systems that produce less when the human falls behind is a durable architectural one. The first requires discipline from every person who touches the system. The second requires one engineer to add a counter and a conditional.

The AI is a fire hose. The question is whether you build a valve or just learn to drink faster.