AI Red Teaming Has An Agentic Problem
Don’t buy, invest in, or pay for a course about “AI red teaming” until you read this | Part IV | Edition 33
In the lab, circa 2022, just after I published my paper on AIML Information Warfare defense. As previously mentioned, all these attacks can be performed on a laptop, in the field–even under a bridge, if needed.
AI red teaming has an Agentic problem.
Actually, it has several.
We’ve covered the how and why of some of AI red teaming’s most glaring technical problems, and why they can’t be remediated.
In my opinion, AI red teaming’s current iteration mainly consists of celebrity hacker mindset applied to the “next cool thing”, often without anyone bothering to do the research into the field that they wanted to “red team” in the first place.
Pretend hackers, selling pretend hacks, that actually made the industry less secure.
It would have been less egregious had they stuck to pretending to be cool online–and not charging actual money to “secure” real systems.
As a reminder: I approached many of these teams prior to publishing any of this work.
I was laughed at, told nobody cared as long as they could extract money from their clients, blocked, and now that I’m going public, threatened.
Guess this hit some nerves. Or more likely, some bank accounts.
Good.
I’m all about doing things the right way. Because when lives are on the line–as they increasingly are with AI applications–we have a moral obligation to do no less.
Red Team Fantasies, Agentic Realities
Prompt spraying was never going to work.
It’s expensive, not scalable, and worst of all, ignores the well-known, decades-old realities of ALL systems that use AI.
But particularly, AI red teaming via prompt spraying won’t work for Agents. This is because Agentic deployments depend on architectures.
It’s the interaction points among Agentic components that real red teams would need to attack. And attacking these components requires in-depth architectural understanding.
On top of this, Agentic deployments in enterprise are often massive–meaning that analysis of these interaction points has to scale.
“AI red teaming” currently does not. Period.
It doesn’t scale simply because it can’t be automated to test the architectural interactions of Agentic deployments.
And this won’t change until AI red teams can learn and automate analysis of Agentic architectures.
When you think about it, this makes total intuitive sense: Agentic deployments can be massive, with thousands of Agents deployed to do a task. And all of their interaction points become a vector for attack.
How will an AI red team test all these interactions, repeatedly, and at scale?
The answer: They most likely can’t.
And even if AI red teaming could be automated at the Agentic architectural level, there would still be no point.
Because spraying a few thousand prompts in a sea of near-infinite attacks is a waste of time.
Full stop.
Not only that, but if the tests aren’t tailored to the architectural capacity being tested at each interaction, there is literally no point.
Without a provable, scalable, and repeatable way to test the architectural interactions of Agentic systems, in a mathematically defensible manner, “AI red teaming” for Agents is an exercise in futility.
AI Red Teaming Vs Red Teaming With AI: Two Sides Of The Same Scam
Similarly, using AI Agents to repair code vulnerabilities will not work. Anyone who claims otherwise is similarly selling make believe.
Find? Maybe.
Fix? LOL no, fam. For the hundredth time, you cannot do this. It is a fantasy.
It’s an evil fantasy too in my opinion, because the entire premise of it is automation at scale. But code remediation cannot be automated reliably, and anyone who has read basic computer science should know this–and also that logically, adding scale into the mix only scales errors.
And again, anyone who has studied software development should know this.
“AI red teaming” and AI code remediation are, in my opinion, two sides of the same scam.
Two episodes of the same security theater, if you will.
Except security theater is supposed to be a show that neither protects, nor does additional harm.
The metaphor breaks down when it comes to red teaming AI, or allowing AI to perform the red teaming.
Why? Because in the first case, make-believe “AI red team” engagements actually benefit adversaries, along with the OPSEC nightmare of their publicly-available repos of prompts.
And in the second case, because allowing AI Agents access to your codebase adds another vector–with no guarantees that the existing ones will be remediated.
Will it find some vulnerabilities? Most likely. Will it find all of them? Will it find any certain percentage of them? And if so, which ones?
The real question: How would you know?
If the point is deployment at scale, then your goal is to remove humans from loops, right? So who checks to see what the AI itself did?
You see the problem here, right?
What’s that, you say? You’ll just have the Agentic system police itself by scanning to see if it fixed the vulnerabilities it discovered?
Perfect, one single system to secure your entire codebase, no supervision required! What could possibly go wrong?
Besides of course the fact that you now have added another vector, which you must implicitly trust due to the scale at which it is supposed to operate.
And also the fact that this specific vector is rife with irreparable vulnerabilities baked into the mathematical realities of what makes it scalable in the first place.
And also the fact that static code analysis is notoriously difficult to pull off at enterprise scale–and remediation remains an unsolved problem in computer science.
So an important, safety-critical engineering problem that requires human attention? Let’s slap some AI on it!
Yeah, no.
Single Channel, Infinite Vulnerability - Now In Your Codebase
Anyone selling such a system will likely argue that it’s no worse than the backdoors created by any third-party monitoring system.
This is absolutely, fundamentally false.
The non-deterministic nature of AI Agents, coupled with their dual-channel failure modes and near infinite attack surfaces makes them a vector that outweighs any potential for secure remediation.
There is no fixing this.
Let’s review the single channel problem.
The reason that LLM-backed “AI” technologies are vulnerable to so-called “injection” attacks is twofold. The single channel problem is the first aspect.
All LLM systems operate using a natural language interface. (There are proposals to move this to something approximating machine language, but for our purposes, they don’t matter here.)
This interface is where the software–remember, AI is just software–gets both its data, and its instructions for processing that data.
What this means, practically speaking, is that anyone with access to the interface can command the software.
Generally, in software engineering, that’s bad. Allowing users to do whatever they want to a software system rarely has good outcomes.
But allowing AI to touch your codebase is even more problematic.
It’s compounded attack vectors, cascading failures, zero-click vulnerabilities, and forever-day attacks–all now accessing your most sensitive IP.
Giving an AI Agent access to your codebase isn’t the same as allowing an untrusted stranger. It’s much worse.
Anyone who tells you otherwise is either a liar, or so dangerously ignorant to the mathematics & engineering of these systems that they shouldn’t be allowed near any codebases, either.
Just my opinion.
AI Does Not Solve Engineering Problems–Engineering Does
Once again, we are (in my opinion) witnessing the application of AI to solving a scale problem by skipping the real solutions which require time & energy investment.
Glossing over very real industry needs to rush a sloppy AI-powered solution that enriches a select few developers, while making everyone less safe, should horrify everyone more when the application is safety or security related.
For a civil application of this same effect, see The Dispatch’s recent article on how a personified AI system is answering 911 calls and deciding what constitutes an emergency or not–billed as a solution to burned out, overworked 911 operators who would probably have preferred a pay raise.
But now AI application developers are getting it instead.
There’s an analog in AI red teaming’s problems.
In my professional opinion as an engineer, AI-powered security testing and code remediation sell a dangerous illusion: That code creation & maintenance are automatable, and that they can be cheaply outsourced to a mathematically unsecureable bot in lieu of proper secure development practices.
Anyone who has experience building and maintaining enterprise software knows this is categorically false.
And if you sell such a product, shame on you.
You didn’t want to read the endless papers on why this is a bad idea? The decades of research that would’ve told you to improve the human factors that make software secure, rather than trying to automate them away with a bespoke new critical vector?
Or are you a liar?
Just my opinion, but it really seems like those are the only plausible options.
Feel free to prove me wrong–but you’d better bring math, not promises.
Not CVEs, not a slide deck, not proof of vulnerabilities your alleged system allegedly found–you’d better bring actual engineering and mathematics that demonstrate why your system magically beats the known constraints of all AI systems to date.
Just saying.
Why (Fake) AI Security Teams Don’t Model Threats
The second aspect that makes AI’s “injection” vulnerabilities so problematic is that, as we have covered previously, the subspace problem means that the attacks are unknowably large in number–potentially nearing infinity.
And the uncertainty problem makes this unpatchable.
Search is impossible. We will never find them all. And only finding some of these attacks only benefits attackers.
And this, in my opinion, is why even so many alleged AI security practitioners don’t threat model their systems.
Because going through this exercise will tell 99% of would-be hackers that whatever they’re making is a very bad plan, that absolutely will become a liability for themselves and their clients.
And for the other 1%, what they’ll learn is that some of the risks of that still-maybe-bad plan can be potentially mitigated, but only with serious engineering and secure lifecycle development.
AI red teaming cannot mitigate Agentic AI’s failures–it can’t even properly test for them.
Using AI to secure code is an equally impossible end goal–which should be obvious to anyone who’s read the literature on AI security from the last decade.
The AI goldrush attracts charlatans, and AI security is no different.
Of course, I could always be wrong. Maybe they have extensive threat models, which they simply ignore and keep secret from their clients.
Or maybe they’re just really, really bad at it.
I guess it could also always be both. I’m an open-minded person.
Maybe, just maybe, there’s a system that magically beats the odds, and I just haven’t heard of it yet. Sure. Anything’s possible.
But before anyone in leadership pays for or deploys such a system, I encourage you to do all due diligence, beyond just the hype from someone whose interest is profiting off your deployment–at any cost to your brand, reputation, and security.
Stay frosty.
The Threat Model
AI-automated security testing and code remediation introduce a critical new security vector into potentially hyper-sensitive areas of the enterprise.
Code creation & maintenance are not automatable at scale, and neither is code remediation–and anyone who studied computer science should know this.
Attempting to cheaply outsource critical security functions to a mathematically unsecureable bot in lieu of proper, secure, human-driven development practices is on-trend with AI’s other irresponsible applications in industry.
Resources To Go Deeper
Zhang, W., Quan.Z Sheng, Ahoud Abdulrahmn F. Alhazmi and Chenliang Li. “Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey.” arXiv: Computation and Language (2019): n. Pag.
Challita, Brian and Pierre Parrend. “RedTeamLLM: an Agentic AI framework for offensive security.” ArXiv abs/2505.06913 (2025): n. pag.
Maini, Pratyush, Eric Wong and J. Zico Kolter. “Adversarial Robustness Against the Union of Multiple Perturbation Models.” ArXiv abs/1909.04068 (2019): n. Pag.
Executive Analysis, Research & Talking Points
No Perimeters: Why Designing To Test Matters For Agentic AI
Earlier in this brief, I wrote that if Agentic system red team tests aren’t tailored to the architectural capacity being tested at each interaction, there is literally no point.
Here’s why–and how to do it right:




