Not Every Task Deserves One

Hello CyberBuilders 🖖

This week, I’m diving into a brilliant video from Barry Zhang at Anthropic. You might know him as the guy behind that must-read post on what AI workflows and agents are. If you haven’t seen the video yet, I’ve dropped the link at the end of the post; it’s worth every minute.

Zhang talks about building AI agents but takes a different approach from the thousands of social media posts you’ve seen on the topic. He shows how to think effectively about it. As I listened, I noticed something: the buzz in our cybersecurity circles doesn’t always align with how the AI world thinks.

In this post, I’ll explain Zhang’s framework and apply it directly to cybersecurity use cases. Here’s what you’ll find:

  • A breakdown of Barry Zhang’s agent checklist—adapted for real-world cybersecurity use cases

  • How coding and customer support nailed agent design (and why they work)

  • A reality check for SOC alert triage agents: where they shine and where they still fall short

  • A clear conclusion: building agents in 2025 is all about reliability and UX, not about building AI

Zhang says that you should slow down before you fall in love with building an AI agent. Ask four hard questions. This simple checklist helps you decide if what you’re working on needs an agent or if you’re better off with a good old-fashioned workflow.

Here’s the rundown:

1️⃣ Is the task complex enough?
If it’s not, don’t overengineer it. Stick with a script or a simple workflow. Recognize that most applications are not “agents”; they are workflows that can be powered by LLM calls to generate content (text, images, etc.). Save agents for the multi-step, branching, messy tasks that a playbook can’t handle, where an “agent” has to decide which actions to take.

2️⃣ Is the task valuable enough?
Don’t waste compute or mental energy on an agent if the task is worth pennies. Zhang draws the line at $1. Think about that. If it doesn’t save or generate at least a buck per run, workflows win. I would have set the bar even higher.

3️⃣ Are all parts of the task doable?
Here, be honest and don’t expect AI magic to happen. If parts of your task are still fuzzy or impossible to automate, don’t force it. Cut the scope until you’ve got something clean that an agent can handle from start to finish.

4️⃣ What’s the cost of error?
Most cybersecurity professionals need to pay attention to this one. If a mistake can escalate into real damage, you’re not building an agent; you’re building a read-only assistant or something with a human in the loop. A low cost of error means that the agent’s output, whatever it has “generated,” will be reviewed, corrected, or overridden by a human user.

Suddenly, “AI agent” doesn’t sound so sexy anymore—unless your use case deserves one.
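
If it helps to see the checklist as something executable, here’s a tiny sketch of the four questions as a pre-build gate. To be clear, this is my own illustration, not code from Zhang’s talk: the field names, the $1 line, and the verdict strings are all assumptions.

```python
from dataclasses import dataclass

# Illustrative only: a tiny gate that encodes the four questions above.
# The field names and the $1 threshold are my assumptions, not Zhang's code.

@dataclass
class TaskProfile:
    multi_step_and_branching: bool   # 1. complex enough?
    value_per_run_usd: float         # 2. valuable enough?
    fully_automatable: bool          # 3. all parts doable?
    error_is_recoverable: bool       # 4. cheap to catch and undo mistakes?

def recommend(task: TaskProfile) -> str:
    if not task.multi_step_and_branching:
        return "workflow"                        # don't overengineer a simple script
    if task.value_per_run_usd < 1.0:
        return "workflow"                        # not worth the agent overhead per run
    if not task.fully_automatable:
        return "narrow the scope first"          # cut the fuzzy parts before agentifying
    if not task.error_is_recoverable:
        return "agent with a human in the loop"  # high cost of error: keep a reviewer
    return "agent"

# Example: SOC alert triage, as discussed later in this post.
triage = TaskProfile(
    multi_step_and_branching=True,
    value_per_run_usd=5.0,
    fully_automatable=False,
    error_is_recoverable=False,
)
print(recommend(triage))  # -> "narrow the scope first"
```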

Zhang gives a perfect example: coding. It sounds simple, but it hits every checkpoint for being agent-worthy.

  • Complexity: You’re juggling dozens of steps from design doc to pull request. Planning, implementation, formatting, tests, and review. This is the perfect playground for an agent.

  • Value: Time is money. And developers are expensive. Every hour saved is easily worth more than $1.

  • Viability: Any strong LLM can already code decently. It’s doable today.

  • Cost of Error: This is the kicker. You’ve got guardrails: unit tests, CI pipelines, and code review. Even if an agent makes a mistake, you’ll catch it fast.

You already know the flow if you use tools like Cursor or GitHub Copilot. The agent generates a pull request or code snippet, follows your project style, and cites relevant APIs.

You can also compare it to another classic: customer support agents. You’ve lived this one: stuck on the phone, waiting forever. When an AI agent can read a support policy, understand your email, and draft an answer or action (like opening an RMA or issuing a coupon), it saves a lot of time.

The customer knows it’s an AI. They’re okay with that—if the answer is fast and valuable. If not? They will escalate. There’s always a human supervisor behind the curtain, ready to step in. That’s a healthy agent loop.
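
If you want to picture that loop in code, here’s a minimal sketch: the agent drafts, and anything low-confidence or off-policy goes straight to the human supervisor. The function names, the 0.8 threshold, and the action list are placeholders, not a real vendor API.

```python
# A minimal sketch of the "healthy agent loop" described above: the agent drafts,
# and a human supervisor takes over whenever the agent is unsure or off-policy.
# All names and thresholds here are illustrative, not a vendor API.

CONFIDENCE_FLOOR = 0.8
ALLOWED_ACTIONS = {"answer_only", "open_rma", "issue_coupon"}

def draft_response(ticket: str, policy: str) -> dict:
    """Stand-in for an LLM call that returns a structured draft."""
    # In a real system this would call your LLM with the ticket and policy text.
    return {"reply": "Your replacement is on its way.", "action": "open_rma", "confidence": 0.9}

def handle_ticket(ticket: str, policy: str) -> str:
    draft = draft_response(ticket, policy)

    # Escalate instead of acting when the agent is unsure or outside the allowed actions.
    if draft["confidence"] < CONFIDENCE_FLOOR or draft["action"] not in ALLOWED_ACTIONS:
        return f"ESCALATED to human supervisor: {draft}"

    # Otherwise act, but keep the action set small and reversible.
    return f"ACTION {draft['action']} taken, reply sent: {draft['reply']}"

print(handle_ticket("My router arrived broken.", "Defective items: offer an RMA within 30 days."))
```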

Take a well-known use case: alert triage in a Security Operations Center (SOC). Everyone wants to automate it, and vendors love calling this an “AI agent.” But does it check the boxes?

Let’s run it through Zhang’s agent checklist.

  1. Is the task complex enough?
    Definitely. Triage isn’t just “look at alert, click a button.” It’s correlating logs, checking asset context, pulling threat intel, reading playbooks, and sometimes asking clarifying questions. That’s complexity. ✅

  2. Is the task valuable enough?
    A junior analyst can easily burn 15–30 minutes per alert. Multiply that by thousands per day. You do the math: even 1,000 alerts at 20 minutes each is more than 300 analyst-hours a day. This isn’t a $0.10 workflow. It’s a real cost sink and a real opportunity. ✅

  3. Are all parts of the task doable?
    Some parts—like log gathering or enrichment—are straightforward. But judgment calls? Pattern recognition? Still tricky. If your agent needs to call for help 50% of the time, it’s not ready yet. Either scope it tighter, or you’re building a glorified automation script. ⚠️

  4. What’s the cost of error?
    Big. You’re in trouble if an agent mislabels a true positive as a false alarm. So unless you’ve set tight confidence thresholds, implemented audit logs, and established a human-in-the-loop setup, it’s wiser to be cautious here (see the sketch below). ⚠️
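
To make point 4 concrete, here’s a rough sketch of what those guardrails could look like: confidence thresholds, an audit trail for every decision, and “queue it for an analyst” as the default path when the agent is unsure. The thresholds, field names, and verdict labels are my own assumptions; your SIEM or SOAR will have its own schema.

```python
# Illustrative guardrails for a triage agent: confidence thresholds, an audit
# trail, and a human in the loop for anything the agent is unsure about.
# Thresholds and field names are assumptions, not a product's real schema.

import json
import time

AUTO_CLOSE_THRESHOLD = 0.95     # only auto-close as benign with very high confidence
AUTO_ESCALATE_THRESHOLD = 0.80  # only hand straight to IR when the call is clear

def triage_verdict(alert: dict) -> dict:
    """Stand-in for the agent's enrichment and reasoning step."""
    # A real agent would correlate logs, check asset context, pull threat intel,
    # and read the playbook before producing a verdict.
    return {"verdict": "false_positive", "confidence": 0.88, "rationale": "known admin script"}

def handle_alert(alert: dict, audit_log: list) -> str:
    result = triage_verdict(alert)
    audit_log.append({"ts": time.time(), "alert_id": alert["id"], **result})  # every decision is logged

    if result["verdict"] == "false_positive" and result["confidence"] >= AUTO_CLOSE_THRESHOLD:
        return "auto-closed"
    if result["verdict"] == "true_positive" and result["confidence"] >= AUTO_ESCALATE_THRESHOLD:
        return "escalated to IR with agent notes"
    return "queued for analyst review"  # the default path when the agent is unsure

audit = []
print(handle_alert({"id": "ALRT-1234", "source": "edr"}, audit))  # -> "queued for analyst review"
print(json.dumps(audit, indent=2))
```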

Alert triage can be a good candidate for an agent, but only if you treat it like serious engineering, not a marketing term or a chatbot bolted on top of your SIEM.

Does that mean SOC triage is a dead end for agents? Not at all. But here’s the catch: security only becomes a good agent use case when you design around the limits of automation, not despite them.

That’s precisely what the Google Security team showed off at the last Google Cloud Next. They didn’t just slap “agent” on a workflow. They built an agentic design, deeply integrated into the analyst’s UX and tooling, to group related alerts, pull from multiple sources, and reduce the noise analysts usually face.

I found it quite a nice collaboration between humans and agents. Here’s why:

  • The agent doesn’t decide everything on its own. It can take immediate response actions, such as quarantining an endpoint, but it also suggests next steps for the investigation.

  • In such scenarios, the cost of error remains low. If something looks off, you can ungroup the alerts, dig deeper, or release endpoints from quarantine.

  • The value is clear: it saves time and improves focus.

That’s a textbook agent pattern—what Zhang would call a low-risk, high-complexity task with a human in the loop.

If you’re building an agent in 2025, you’re not doing AI research. You’re not training a foundation model. That ship has sailed and belongs to the few labs that can burn hundreds of millions of dollars training LLMs.

But here’s the twist: you still have a considerable role to play. And it’s not about the model itself. It’s about the agent: the system that turns that LLM into something useful. Two things matter.

Reliability. Can you make the agent behave the same way, every time, in production, without random hallucinations or breakdowns? This isn’t a chatbot demo. This is real workflow support.

User Experience. Do you understand what humans need? What slows them down? What’s the job to be done—and how can the agent fit into that flow, without pretending to replace the human?

That’s what the Chronicle team did.
That’s what customer support agents get right.
That’s why coding agents are thriving.
And that’s the litmus test for your cybersecurity use case.

See you in the next post!

Laurent 💚