-
Taylor Brooks - 26 Apr, 2026
More AI Agents Usually Make the Workflow Worse
My current contrarian take on AI workflows is pretty simple. Most people do not need more agents. They need a better control layer.

I keep seeing the same pattern. A workflow feels messy, so the answer becomes adding another model, another prompt chain, another autonomous step, another clever routing layer. It looks like progress because the diagram gets more impressive. In practice, the system usually gets harder to trust.

That has been the big lesson for me building with AI every day. When a workflow is already shaky, adding more intelligence on top rarely fixes the real problem. It just gives the confusion more places to hide.

The real bottleneck is usually coordination

Most failures I run into are not about raw model capability. They are about handoffs. A step runs too early. A tool gets the wrong input shape. A model returns something technically valid but useless for the next step. Nobody is totally sure which part is responsible, so the default move is to bolt on another layer and hope the system smooths itself out.

That move feels modern. I think it is usually wrong.

I wrote recently about AI demos not surviving random Tuesdays. This is the same problem from a different angle. The workflow does not get stronger because it has more moving parts. It gets stronger when the moving parts are easier to inspect.

More agents can be a trap

I am not anti-agent. I use Claude and ChatGPT constantly. I think agentic workflows are real. I also think a lot of people reach for them before they have earned the complexity.

If one agent cannot do useful work inside a clear sequence, five agents probably will not save you. They might make the output look smarter in a demo. They might even improve the happy path. But they also create more state, more retries, more weird edge cases, and more places where responsibility gets blurry. That is a bad trade unless the underlying workflow is already solid.

Even Anthropic's guide to building effective agents makes a similar point in practice. Start with simple patterns. Add complexity when it is justified. That advice gets ignored because simple systems do not sound exciting.

What I want instead

I want a boring control layer. I want to know:

- what triggered the workflow
- what each step received
- what each step produced
- where a failure happened
- what should happen next if something breaks

That is the part I trust. (There is a small sketch of what that can look like at the end of this post.)

For me, that usually means keeping the workflow visible in a repo like GitHub, making the steps explicit, and resisting the urge to hide sloppy process behind smarter prompts. If the sequence is unclear, I try to fix the sequence. If the handoff is weak, I try to fix the handoff. If the output is inconsistent, I try to tighten the contract before I add another model to clean it up.

This sounds less ambitious than building a swarm of agents. I think it is actually more ambitious because it forces you to understand the work.

The thing I have started watching for

When someone shows me an AI workflow now, I am not mostly asking how smart the model is. I am asking whether the control layer makes sense. Can a normal person trace the job from start to finish? Can they tell what failed without turning the whole thing into a detective story? Can they change one step without breaking three others?

If the answer is no, I do not think the main problem is model quality. I think the system design is doing too much improvising. That is why I am increasingly skeptical of workflows where the fix for every rough edge is "add another agent."
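To make "boring control layer" concrete, here is a minimal sketch in TypeScript. Everything in it is hypothetical (Step, TraceEntry, and runPipeline are my names, not any real library); it just shows one way to record what triggered a run, what each step received and produced, and where a failure happened.

```ts
// A minimal control-layer sketch. Every name here is hypothetical; this is one
// way to record what triggered a run, what each step saw, and where it failed.

type StepResult =
  | { ok: true; output: unknown }
  | { ok: false; error: string };

interface Step {
  name: string;
  run: (input: unknown) => Promise<StepResult>;
}

interface TraceEntry {
  step: string;
  input: unknown;     // what the step received
  result: StepResult; // what it produced, or how it failed
  at: string;         // ISO timestamp
}

async function runPipeline(trigger: string, input: unknown, steps: Step[]) {
  const trace: TraceEntry[] = [];
  let current = input;

  for (const step of steps) {
    const result = await step.run(current);
    trace.push({ step: step.name, input: current, result, at: new Date().toISOString() });

    if (!result.ok) {
      // Stop at the first failure and report exactly where it happened.
      return { trigger, completed: false, failedAt: step.name, trace };
    }
    current = result.output; // explicit handoff to the next step
  }

  return { trigger, completed: true, output: current, trace };
}
```

The trace is the whole point: when a run fails, failedAt plus the per-step inputs answer "what broke?" without any detective work.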
Sometimes the grown-up answer is less magic. Fewer agents. Better boundaries. Clearer steps. More boring control. That is usually the version that survives real work.
-
Taylor Brooks - 22 Apr, 2026
AI Demos Don't Survive Random Tuesdays
The biggest gap in AI right now is not intelligence. It's survivability.

A lot of AI products look incredible in a demo. Clean prompt. Clean input. Clean output. Everything works. Then the system meets a normal workday, an API changes, a field comes back empty, or someone passes in messy data, and the whole thing starts acting weird.

That is the part I care about now. I do not think the best AI workflow is the one that looks smartest on launch day. I think it is the one that still makes sense on a random Tuesday when something breaks and I need to trace what happened. That standard sounds boring, but I think it is the real dividing line between tools people try and tools people keep.

Most failures are painfully ordinary

The failure mode is usually not some dramatic model collapse. It is smaller than that. A step silently fails. A webhook shape changes. A tool sends back incomplete data. A retry hides the first error. Someone trusts the output a little too quickly. Now the workflow technically ran, but nobody feels good about what it actually did.

That is why I keep coming back to simple infrastructure. A visible repo in GitHub. A deployment path in Vercel. Official model docs from OpenAI when I need to check what changed. Not because those tools are magical. Because they make it easier to inspect the system when reality gets messy.

I wrote recently about the best AI workflow being the one you can debug. I still think that is true. This is the uglier version of the same lesson. A workflow does not earn trust when it works in perfect conditions. It earns trust when it degrades in a way a normal human can understand.

Fancy is cheap, clarity is expensive

The market still rewards demos. I get why. Demos are legible. They compress well into clips and screenshots. They make the future feel close. But operators do not live inside demos. We live inside edge cases, handoffs, broken assumptions, weird data, and tasks that need to happen again tomorrow.

That changes the bar. For me, a useful AI workflow needs a few things:

- clear steps
- observable inputs and outputs
- failure points I can actually find
- retry behavior that makes sense (sketched at the end of this post)
- enough logging to explain what happened after the fact

Without that, the system might still be impressive. I just would not want to depend on it. And dependence is the whole game.

The tools that last usually feel less magical

This is the weird thing. The workflows I trust most usually look less futuristic than the ones getting the most attention. They are more explicit. More structured. Sometimes a little less slick. But when something goes wrong, they give me a path back to the truth. That matters more than style.

The official OpenAI production best practices point toward the same thing: evaluation, monitoring, and reliability work matter once a model is doing real jobs. That is not the fun part of the story, but it is the part that makes systems usable.

I think a lot of the current AI wave will sort itself the same way every software market does. First, people reward what looks exciting. Then, over time, they reward what breaks less, wastes less time, and can be understood by the team using it. That second category is where the durable products usually come from.

My filter now

When I look at any new AI workflow, I basically ask one question: If this thing goes sideways on a busy weekday, can I figure out what happened fast enough to trust it again?

If the answer is yes, I am interested. If the answer is no, I do not care how good the demo looked.
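To pin down what "retry behavior that makes sense" can look like, here is a hedged TypeScript sketch. The names are hypothetical; the design point is that every attempt's error is kept and reported, so a retry never hides the first failure.

```ts
// Hypothetical retry helper. The design point: keep every attempt's error and
// surface the full history, so a retry never hides the first failure.

async function retryWithLog<T>(
  name: string,
  attempts: number,
  fn: () => Promise<T>,
): Promise<T> {
  const errors: string[] = [];

  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      const msg = err instanceof Error ? err.message : String(err);
      errors.push(`attempt ${i}: ${msg}`);
      console.error(`[${name}] attempt ${i} failed: ${msg}`);
    }
  }

  // Every error is in the message, not just the last one.
  throw new Error(`[${name}] failed after ${attempts} attempts: ${errors.join("; ")}`);
}
```

Wrapping a flaky call might look like retryWithLog("fetch-orders", 3, () => fetchOrders()), where "fetch-orders" and fetchOrders are placeholders for whatever the step actually does.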
A lot of AI still gets judged like entertainment. I think the winners will be the systems that feel more like infrastructure.
-
Taylor Brooks - 21 Apr, 2026
A Content System That Removes Excuses
Most content systems are too clever. They have a planning board, a capture app, an AI prompt library, a review queue, a repurposing workflow, and five places for drafts to die. That setup looks productive. It also gives you a hundred places to stall.

What finally worked for me was cutting the system down until it was almost boring. An idea goes into a markdown file. The post lives in Astro. The repo goes to GitHub. The site ships through Vercel. That's basically it.

I still use AI while writing. I'm not pretending otherwise. But the useful part isn't "AI content generation." The useful part is removing friction between having a thought and publishing it.

The real enemy is drag

When people say they want to post more, what they usually mean is they want to feel more consistent. Consistency is not a motivation problem. It's a drag problem. If publishing requires opening three apps, cleaning up a draft, moving text into a CMS, uploading an image, fixing formatting, and checking whether the slug broke, you will absolutely find a reason to do it tomorrow.

That's why I moved toward a simpler publishing setup. I already wrote about why I switched to Astro. The bigger lesson wasn't about frameworks. It was about reducing the number of excuses available to me.

My current rule

The system should make the next step obvious. For me that means:

- write in one place
- publish from the repo
- keep the frontmatter predictable (there is a small sketch of this at the end of the post)
- use one image format
- avoid any step that needs me to "figure it out again"

That last one matters more than people think. A lot of workflow pain comes from re-deciding little things. What's the right metadata format? Where does the image go? Which path does the URL use? Did I call this a tag or a category? Tiny questions, but they add up.

If the system answers those questions for me up front, I write more. If it doesn't, I procrastinate while pretending I'm being thoughtful.

AI helps, but not where people think

The boring truth is that AI is better at compression than commitment. It can help me sharpen an angle, pressure test a claim, or turn a half-formed note into something clearer. That's useful. But AI does not create a publishing habit by itself. The habit comes from having a system where the path from draft to live post is short and repeatable.

This is the same reason a lot of "AI workflow" products feel impressive in demos and annoying in real life. They add capability while quietly adding drag. And if your real bottleneck is drag, more capability can make the problem worse.

The Astro content collections docs are a good example of the opposite approach. It's just a clean content model. Not flashy. Very little mystery. That kind of simplicity compounds.

What I'm optimizing for now

I'm not trying to build the most advanced content machine on the internet. I'm trying to build a system I will still use on a random Thursday when I'm busy, distracted, and not particularly inspired.

That standard is underrated. A workflow that only works when you're energized is not a workflow. It's a mood. The best systems survive low motivation. They reduce the gap between intention and action until posting feels almost mechanical.

That's what I want from tooling now. Less ceremony. Less reinvention. Fewer moving parts. Not more ideas about content. More shipped content.
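For what "keep the frontmatter predictable" can look like in practice, here is a minimal content collection schema in the spirit of those Astro docs. The field names are my own choices, not a required shape; the point is that the schema answers the metadata questions once, up front.

```ts
// src/content/config.ts — a minimal Astro content collection schema.
// The fields below are my own picks; the schema is what makes frontmatter predictable.
import { defineCollection, z } from "astro:content";

const posts = defineCollection({
  type: "content",
  schema: z.object({
    title: z.string(),
    pubDate: z.coerce.date(),
    tags: z.array(z.string()).default([]), // settled once: tags, not categories
    image: z.string().optional(),          // one image field, one format
  }),
});

export const collections = { posts };
```

With this in place, a post with a missing title or an unparseable date fails the build instead of shipping broken, which is exactly the kind of question I never want to re-decide.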
-
Taylor Brooks - 19 Apr, 2026
Governed Execution Beats Raw Model Quality
I think we are getting close to the end of the "best model wins" phase.

Model quality still matters. Obviously. But once AI moves from a chat box into a real business process, intelligence stops being the whole game. The thing that matters more is whether the system can do useful work without creating a mess.

That is why I think the next wave of valuable AI products will win on governed execution. Not just raw intelligence. Not just benchmark screenshots. Not just who shipped the wildest demo this week. I mean products that can take action inside real constraints. Follow rules. Respect approvals. Show what happened. Escalate when confidence is low. Finish the job without making everyone nervous.

That sounds less exciting than "our model is smarter." It is also way more useful.

Smart is cheap. Trust is expensive.

I use ChatGPT and Claude constantly. The intelligence jump over the last couple years has been real. You can feel it. But when I look at where AI breaks in practice, it is usually not because the model was too dumb. It is because the execution layer was sloppy.

The handoff was unclear. The tool call failed quietly. The retry behavior was weird. The system did not know when to stop. Nobody could tell what happened after the fact. The result looked plausible enough to slip through, but not reliable enough to trust.

That is not a model problem. That is an operating problem.

The hard part is everything around the model

The products I trust most are the ones that make constraints visible. They tell me what step is running. They show me what data came in. They make approval points explicit. They log the output. They give me a sane fallback when something gets weird.

That kind of product feels very different from a flashy agent demo. A demo says, "look what it can do." A governed product says, "here is what it did, here is why, and here is what happens next if something goes wrong." That second category is where the durable value is going to come from.

I wrote yesterday about why the best AI workflow is the one you can debug. This feels like the next layer of the same point. Debuggability matters because most business use cases do not fail from lack of intelligence. They fail from lack of control. If the system cannot be inspected, constrained, and trusted by a normal operator, it is still basically a magic trick.

Real businesses buy reliability, not vibes

This is the part I think a lot of AI discourse still misses. Buyers do not just want a model that can impress them for five minutes. They want something that can survive procurement, compliance review, internal politics, edge cases, and the random Tuesday afternoon where the input is messy and the stakes are not theoretical.

That is why a slightly worse model with strong execution controls can beat a better model with weak operational discipline. If one product is 3 percent smarter but I cannot trust it with approvals, audit trails, retries, or exception handling, that edge does not matter much.

The official Anthropic piece on building effective agents makes this pretty clear too. Once you move past toy examples, the work is mostly about orchestration, tool use, evaluation, and guardrails. In other words, the wrapper starts to matter more than the raw IQ.

What I think wins from here

I think the most useful AI products over the next few years will feel less like chatbots and more like accountable systems. They will still use great models. Of course they will.

But the thing users actually pay for will be the governed execution layer around them (sketched at the end of this post):

- clear operating boundaries
- visible steps and status
- approvals where they matter
- logs that explain what happened
- safe retries and fallback paths
- sane escalation when confidence drops

That is not the sexy part of AI. It is the part that turns intelligence into something a business can actually live with.

So yeah, I still care about model quality. I just think the bigger product question now is simpler than people want it to be: Would I trust this system to do real work when nobody is standing over it?

If yes, that is interesting. If not, I do not care how good the benchmark chart looks.
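To ground that list, here is a hypothetical TypeScript sketch of a governed-execution gate. None of these shapes come from a real product; they just show every proposed action being checked against explicit boundaries before anything runs.

```ts
// Hypothetical sketch of a governed-execution gate. The shapes are illustrative,
// not a real product API: every proposed action is checked against explicit
// boundaries before anything is allowed to run.

interface ProposedAction {
  kind: string;       // e.g. "send_email" or "update_record"
  payload: unknown;
  confidence: number; // 0..1, reported by the model or a separate verifier
}

interface Policy {
  allowedKinds: string[];  // operating boundary: anything else is rejected
  needsApproval: string[]; // action kinds that require human sign-off
  minConfidence: number;   // below this, escalate instead of acting
}

type Decision = "execute" | "hold_for_approval" | "escalate" | "reject";

function govern(action: ProposedAction, policy: Policy): Decision {
  if (!policy.allowedKinds.includes(action.kind)) return "reject";
  if (action.confidence < policy.minConfidence) return "escalate";
  if (policy.needsApproval.includes(action.kind)) return "hold_for_approval";
  return "execute";
}
```

Logging each decision next to the action it gated is most of an audit trail already, and "escalate" is where a human gets pulled in when confidence drops.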
-
Taylor Brooks - 18 Apr, 2026
The Best AI Workflow Is the One You Can Debug
A lot of AI workflows look great right up until they break. That is the part people keep skipping.

I keep seeing setups built from a pile of prompts, a few connected tools, and a nice looking demo. Then one API changes, one field comes back weird, one step silently fails, and suddenly the whole thing turns into a ghost story. Nobody knows what happened. Nobody trusts the output. Nobody wants to touch it.

That is why I think the best AI workflow is not the fanciest one. It's the one you can debug.

If I cannot quickly answer basic questions like "what step failed?", "what data went in?", or "what should have happened next?", I do not think I have a real workflow yet. I think I have a fragile magic trick.

This is one reason I still like boring infrastructure. A plain repo in GitHub. A visible automation path in Zapier. Logs I can inspect. Output I can verify. Clean handoffs between steps. Nothing about that sounds sexy. It is still what makes the system usable.

The same thing is true with AI-heavy tooling. People get excited about model quality, and I get it. I use Claude and ChatGPT constantly. But once those tools are part of a real workflow, the hard problem stops being "is the model smart?" and becomes "can I trust this system when the inputs get messy?"

That trust usually comes from pretty unglamorous things:

- visible steps
- clear failure points
- retry rules that make sense
- logs that tell the truth
- outputs that are easy to check (a sketch of that last one closes this post)

Without those, the workflow may still work in a demo. It just will not survive contact with normal work.

I wrote recently about building a content system that removes excuses. This feels like the same lesson in a different form. The goal is not adding maximum capability. The goal is reducing the number of ways the system can become confusing.

That matters more than people think. If an automation saves me 20 minutes a day but takes an hour to untangle every time it drifts, it is not really saving me time. It is borrowing credibility from the future.

The teams I think will get the most value from AI are not necessarily the ones with the wildest agent demos. They are the ones willing to build workflows that are inspectable by normal humans. Open the logs. Check the inputs. Trace the path. Fix the broken step. Move on.

That sounds obvious, but a lot of software still pushes in the other direction. More abstraction. More hidden logic. More "just trust the system."

Even the official guidance from OpenAI's production best practices docs points back to reliability, evaluation, and monitoring. Which makes sense. Once the model is inside real work, operations discipline matters more than hype.

So this is the bar I keep coming back to now: If the AI workflow fails on a random Tuesday, can I understand it fast enough to fix it without blowing up my day?

If yes, that is interesting. If not, I do not care how impressive the screenshot looked on launch day. The future of AI at work is probably less about magic and more about systems you can actually live with.
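As a closing footnote on "outputs that are easy to check," here is a small, hypothetical sketch of a tightened handoff contract. The Summary shape is invented for illustration; the idea is that the next step only runs if the model's output actually matches what it expects, and anything else fails loudly with enough context to debug.

```ts
// A tightened handoff contract, with hypothetical field names. The next step
// only runs if the model's output matches the shape it expects; anything else
// fails loudly with enough context to debug.

interface Summary {
  title: string;
  bullets: string[];
}

function parseSummary(raw: string): Summary {
  const data = JSON.parse(raw); // malformed JSON throws here, loudly
  if (typeof data.title !== "string" || !Array.isArray(data.bullets)) {
    throw new Error(`summary contract violated, got: ${raw.slice(0, 120)}`);
  }
  return { title: data.title, bullets: data.bullets.map(String) };
}
```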