Not Every Task Deserves One
Hello CyberBuilders 👋
This week, I'm diving into a brilliant video from Mr. Zhang at Anthropic. You might know him as the guy behind that must-read post on what AI workflows and agents are. If you haven't seen the video yet, I've dropped the link at the end of the post; it's worth every minute.
Zhang talks about building AI agents but takes a different approach from the thousands of social media posts you've seen on the topic. He shows how to think effectively about it. As I listened, I noticed something: the buzz in our cybersecurity circles doesn't always align with how the AI world thinks.
In this post, I'll explain Zhang's framework and apply it directly to cybersecurity use cases. Check out the YouTube video at the end of the post.
- A breakdown of Barry Zhang's agent checklist, adapted for real-world cybersecurity use cases
- How coding and customer support nailed agent design (and why they work)
- A reality check for SOC alert triage agents: where they shine and where they still fall short
- A clear conclusion: building agents in 2025 is all about reliability and UX, not about building AI
Zhang says that you should slow down before you fall in love with building an AI agent. Ask four hard questions. This simple checklist helps you decide if what you're working on needs an agent or if you're better off with a good old-fashioned workflow.
Hereâs the rundown:
1️⃣ Is the task complex enough?
If it's not, don't overengineer it. Stick with a script or a simple workflow. Recognize that most applications are not "agents"; they are workflows that can be powered by LLM calls to generate content (text, images, etc.). Save agents for the messy, multi-step, branching tasks that a playbook can't handle, and where an "agent" can decide which actions to take.
2️⃣ Is the task valuable enough?
Don't waste compute or mental energy on an agent if the task is worth pennies. Zhang draws the line at $1. Think about that. If it doesn't save or generate at least a buck per run, workflows win. I would have set the threshold higher.
3️⃣ Are all parts of the task doable?
Here, be honest and don't expect AI magic to happen. If parts of your task are still fuzzy or impossible to automate, don't force it. Cut the scope until you've got something clean that an agent can handle from start to finish.
4️⃣ What's the cost of error?
Most cybersecurity professionals need to pay attention to this one. If a mistake can escalate, you're not building an agent; you're building a read-only assistant or something with a human in the loop. A low cost of error means that the agent's output, the result of what it has generated, can be reviewed, modified, or corrected by a human user.
Suddenly, "AI agent" doesn't sound so sexy anymore, unless your use case deserves one.
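To make the checklist concrete, here is a minimal Python sketch of it as a pre-flight gate. The field names and thresholds (other than Zhang's $1 line) are my own illustrative assumptions, not anything from the video.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Rough answers to Zhang's four questions for one candidate task."""
    steps_required: int          # how many branching steps the task involves
    value_per_run_usd: float     # what one successful run saves or earns
    fully_automatable: bool      # can an LLM own every step, end to end?
    error_is_recoverable: bool   # can a human catch or undo a bad output cheaply?


def should_build_agent(task: TaskProfile) -> str:
    """Apply the four-question gate; the step threshold is an assumption."""
    if task.steps_required < 5:
        return "Use a script or a plain LLM-powered workflow."
    if task.value_per_run_usd < 1.0:  # Zhang's $1 line; I'd set it higher
        return "Not valuable enough per run; keep it a workflow."
    if not task.fully_automatable:
        return "Cut the scope until an agent can own the task end to end."
    if not task.error_is_recoverable:
        return "Build a read-only assistant or keep a human in the loop."
    return "Agent-worthy: now invest in reliability and UX."


# Example: a rough profile for SOC alert triage, discussed later in the post
print(should_build_agent(TaskProfile(
    steps_required=12,
    value_per_run_usd=8.0,
    fully_automatable=False,
    error_is_recoverable=False,
)))
```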
Zhang gives a perfect example: coding. It sounds simple, but it hits every checkpoint for being agent-worthy.
- Complexity: You're juggling dozens of steps from design doc to pull request: planning, implementation, formatting, tests, and review. This is the perfect playground for an agent.
- Value: Time is money, and developers are expensive. Every hour saved is easily worth more than $1.
- Viability: Any strong LLM can already code decently. It's doable today.
- Cost of Error: This is the kicker. You've got guardrails: unit tests, CI pipelines, and code review. Even if an agent makes a mistake, you'll catch it fast.
You already know the flow if you use tools like Cursor or GitHub Copilot. The agent generates a pull request or code snippet, follows your project style, and cites relevant APIs.
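That guardrail loop is easy to sketch: before an agent-generated change goes anywhere near a merge, it has to pass the same tests humans rely on. A minimal illustration, assuming the agent has already written its changes into a local checkout and that pytest is the project's test runner:

```python
import subprocess


def gate_agent_patch(repo_dir: str) -> bool:
    """Run the project's test suite against an agent-generated change.

    Assumes the agent already wrote its edits into repo_dir; the merge
    decision still belongs to the humans reviewing the pull request.
    """
    result = subprocess.run(
        ["python", "-m", "pytest", "-q"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        print("Agent change rejected by the test suite:")
        print(result.stdout[-2000:])  # tail of the failure report
        return False
    print("Tests pass; change moves on to human code review.")
    return True
```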
You can also compare it to another classic: customer support agents. You've lived this one: stuck on the phone, waiting forever. When an AI agent can read a support policy, understand your email, and draft an answer or action (like opening an RMA or issuing a coupon), it saves a lot of time.
The customer knows it's an AI. They're okay with that, if the answer is fast and valuable. If not? They will escalate. There's always a human supervisor behind the curtain, ready to step in. That's a healthy agent loop.
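That loop is simple to sketch. The draft_reply function below is a hypothetical stand-in for the LLM call; the point is the escalation path, not the model.

```python
CONFIDENCE_FLOOR = 0.75  # assumption: below this, hand the ticket to a human


def draft_reply(ticket_text: str, policy: str) -> tuple[str, float]:
    """Hypothetical stand-in for the LLM call.

    A real system would prompt a model with the support policy and the
    customer's email, and return a draft plus a confidence estimate.
    """
    return "Hi! I've opened an RMA for your device.", 0.62


def handle_ticket(ticket_text: str, policy: str) -> str:
    draft, confidence = draft_reply(ticket_text, policy)
    if confidence < CONFIDENCE_FLOOR:
        # The healthy loop: a human supervisor is always behind the curtain.
        return f"ESCALATED to a human agent (confidence {confidence:.2f})"
    return f"SENT to customer: {draft}"


print(handle_ticket(
    "My router died after two weeks, I want a replacement.",
    policy="Hardware under 30 days: open an RMA.",
))
```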
Take a well-known use case: alert triage in a Security Operations Center (SOC). Everyone wants to automate it, and vendors love calling this an "AI agent." But does it check the boxes?
Let's run it through Zhang's agent checklist.
- Is the task complex enough? Definitely. Triage isn't just "look at alert, click a button." It's correlating logs, checking asset context, pulling threat intel, reading playbooks, and sometimes asking clarifying questions. That's complexity. ✅
- Is the task valuable enough? A junior analyst can easily burn 15-30 minutes per alert. Multiply that by thousands per day. You do the math. This isn't a $0.10 workflow. It's a real cost sink, and a real opportunity. ✅
- Are all parts of the task doable? Some parts, like log gathering or enrichment, are straightforward. But judgment calls? Pattern recognition? Still tricky. If your agent needs to call for help 50% of the time, it's not ready yet. Either scope it tighter, or you're building a glorified automation script. ⚠️
- What's the cost of error? Big. You're in trouble if an agent mislabels a true positive as a false alarm. So, unless you've set tight confidence thresholds, implemented audit logs, or established a human-in-the-loop setup… it's wiser to be cautious here. ⚠️
Alert triage can be a good candidate for an agent, but only if you treat it like serious engineering, not a marketing term or a chatbot on top of your SIEM.
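If you do build one, the caution flags above translate directly into engineering: confidence thresholds, an audit trail, and a human on the other side of anything that is not trivially reversible. A rough sketch under those assumptions, with classify_alert standing in for whatever model or playbook produces the verdict:

```python
import json
import time

AUTO_CLOSE_THRESHOLD = 0.95  # assumption: only very confident false positives auto-close
AUDIT_LOG = "triage_audit.jsonl"


def classify_alert(alert: dict) -> tuple[str, float]:
    """Stand-in for the agent's judgment: returns (verdict, confidence)."""
    return "false_positive", 0.71


def triage(alert: dict) -> str:
    verdict, confidence = classify_alert(alert)
    decision = {
        "alert_id": alert["id"],
        "verdict": verdict,
        "confidence": confidence,
        "timestamp": time.time(),
    }
    # Audit log first: every agent decision must be reviewable later.
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(decision) + "\n")
    if verdict == "false_positive" and confidence >= AUTO_CLOSE_THRESHOLD:
        return "auto-closed"
    # Anything else goes to an analyst, with the agent's reasoning attached.
    return "queued for human review"


print(triage({"id": "ALR-1042", "source": "EDR", "rule": "suspicious_powershell"}))
```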
Does that mean security is a bad fit for agents? Not at all. But here's the catch: security only becomes a good agent use case when you design around the limits of automation, not despite them.
That's precisely what the Google Security team showed off at the last Google Cloud Next. They didn't just slap "agent" on a workflow. They used an agentic design deeply integrated with UX tools to group related alerts, pull from multiple sources, and reduce the volume of noise analysts usually face.
I found it quite a nice collaboration between humans and agents. Here's why:
- The agent doesn't determine all actions on its own. It can take immediate action, such as quarantining an endpoint, but it also suggests next steps for the investigation.
- In such scenarios, the cost of error remains low. If something looks off, you can ungroup the alerts, dig deeper, or remove endpoints from quarantine.
- The value is clear: it saves time and improves focus.
That's a textbook agent pattern: what Zhang would call a low-risk, high-complexity task with a human in the loop.
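To illustrate the grouping idea (a toy sketch, not Google's implementation): collapsing related alerts into a single case is often one correlation key away, and the analyst keeps the ability to ungroup.

```python
from collections import defaultdict


def group_alerts(alerts: list[dict]) -> dict[tuple, list[dict]]:
    """Group related alerts so an analyst sees one case instead of dozens.

    Grouping on (host, rule family) is only an illustrative key; a real
    system would also correlate on identities, processes, and threat intel.
    """
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for alert in alerts:
        key = (alert["host"], alert["rule"].split(".")[0])
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"host": "ws-042", "rule": "powershell.encoded_command"},
    {"host": "ws-042", "rule": "powershell.download_cradle"},
    {"host": "srv-db1", "rule": "auth.brute_force"},
]
for key, members in group_alerts(alerts).items():
    print(key, "->", len(members), "alert(s)")
```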
If you're building an agent in 2025, you're not doing AI research. You're not training a foundation model. That ship has sailed and belongs to the few labs that can burn hundreds of millions of dollars training LLMs.
But here's the twist: you still have a considerable role. And it's not about the LLM itself. It's about the agent: the system that turns that LLM into something useful. That comes down to two things.
Reliability. Can you make the agent behave the same way, every time, in production, without random hallucinations or breakdowns? This isn't a chatbot demo. This is real workflow support.
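In practice, that reliability is mostly boring engineering: constrain what the agent may return, validate it, and retry or fail safely when it drifts. A minimal sketch with a stubbed model call:

```python
import json

ALLOWED_ACTIONS = {"enrich", "escalate", "close"}


def call_model(prompt: str) -> str:
    """Stub for the LLM call; in production this is your provider's API."""
    return '{"action": "escalate", "reason": "new C2 domain observed"}'


def reliable_action(prompt: str, max_retries: int = 2) -> dict:
    """Only return output that matches the contract the workflow expects;
    otherwise retry, then fall back to a safe default."""
    for _ in range(max_retries + 1):
        raw = call_model(prompt)
        try:
            parsed = json.loads(raw)
            if parsed.get("action") in ALLOWED_ACTIONS and parsed.get("reason"):
                return parsed
        except json.JSONDecodeError:
            pass  # malformed output: try again
    return {"action": "escalate", "reason": "agent output failed validation"}


print(reliable_action("Triage alert ALR-1042"))
```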
User Experience. Do you understand what humans need? What slows them down? What's the job to be done, and how can the agent fit into that flow without pretending to replace the human?
Thatâs what the Chronicle team did.
Thatâs what customer support agents get right.
Thatâs why coding agents are thriving.
And that's the litmus test for your cybersecurity use case.
See you in the next post!
Laurent 👋