The Problem With AI Red Teaming Is That It’s Fake
Make-believe hackers selling make-believe attacks, why smart CISOs aren’t buying, and what to build next | Edition 27
This one’s going to get personal.
Buckle up, because I’ve got something to say.
For more than a year, I spent my own time talking to AI red teaming companies. I met with their CEOs. I talked to practitioners through every back channel I had. I leveraged all my industry connections to get audiences with the teams claiming they were the cutting edge of this technology.
My goal: To warn these people that the tech they were peddling was, unfortunately, fake.
The Real And The Pretend
Let me back up for a minute.
A quick biographical note: I’m a former offensive security professional.
I won’t say where I worked. But you can think of me as a Red Team Lead.
Not an “AI red team lead”.
The real kind.
You have to understand, I grew up in security culture. Not to be all once-upon-a-time, but a quick tidbit about me: My father was a black belt in ninjutsu who had often worked as private security, and my mother was a homicide detective.
Yes, it was an unusual upbringing.
And yes, I am very good at security.
There’s very little that I can’t get into or out of. This includes everything from your office building to your network.
This isn’t magic, it’s a skill. Anyone can learn.
Few do–because it’s hard, often thankless, and the only people getting glamour points are the ones bragging about (often) fake exploits.
Real red teams operate in a variety of changing circumstances, with interdisciplinary techniques. We operate in (often) dangerous situations. Law enforcement may or may not be aware of our presence. As a red teamer, I’ve had to scale fences, jump flights of stairs, outrun LEOs, and more. I’ve been threatened with arrest. I’ve been shot at.
TLDR: Red teamers will come break your computer on site.
Photo: Me in an Airbnb, getting ready to operate on site as Red Team Lead at [REDACTED]
More often than not, we can never discuss the cool things we did, or the places we’ve worked. We’ll never get applause. There’s no glory. Only the satisfaction of knowing that you hopefully exposed flaws in a critical system. That you hopefully helped make people safer.
Why did I do it? Because the mission mattered.
Because if we don’t break it, the bad guys will.
A Decade Of AI Security Papers Ignored
AI “red teaming” is none of that.
Just like “prompt engineering” is cosplay for people who want to pretend to be engineers, “AI red teaming” is usually for people who want to cosplay as hackers.
How do I know?
If they were real hackers, then they would’ve read the manual.
You know, RTFM? But they didn’t.
Let me be clear who I’m talking about here. Because AI security is a field with a decade of research in it.
I’m not talking about the real hackers in this space.
I’m talking about your newly minted GenAI prompt engineers turned “AI red teams”.
There’s a reason many of these people only started “attacking AI” after the creation of GenAI, and it isn’t because of the economy or the use cases.
It’s because you could feel like you were “hacking AI” just by using natural language–no math required.
Or so they thought, because again–they didn’t do the reading.
If they had actually read the very large body of work around AI security, they would have known that the “new” attacks they’d found were in fact very, very old.
We’re talking about the better part of a decade.
Just so I’m being clear: These people had nearly a decade to learn the fundamentals of AI security, but they chose not to when an opportunity to make a quick buck presented itself.
These aren’t hackers. They’re hucksters.
Do these people even know who Ian Goodfellow is? The inventor of Generative Adversarial Networks? The guy who wrote “Explaining and Harnessing Adversarial Examples”, the paper showing that all their little “attacks” are script kiddie silliness?
I’m going to guess no, they don’t.
Because the alternative is worse: They knew all along, and deliberately misled investors, customers, and an entire industry.
What I can say definitively: Many of them knew at least months ago, because I personally told them.
And they told me that they think their customers are rubes.
Why AI Red Teaming Is Fake
To be quite honest, it’s been difficult talking about this for the last year or so–it has meant fighting an uphill battle against what I consider to be a culture of ignorance and greed.
And I’ve paid a pretty hefty personal price.
The ignorance these “AI red teams” assume on the part of their customers is what really gets me: these people think they can convince you they’re smarter than you because they’re a “hacker” or they have a PhD.
I’m over it.
And I’m grateful that my paper explaining why these people are hucksters is finally out on ArXiv.
I’ve talked before about how the AI attack surface is nearly infinite and cannot be properly tested. This latest paper lays out why–and what to do about it.
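To get a feel for the scale, here’s a back-of-the-envelope illustration. The numbers are my own toy assumptions, not figures from the paper: a 50,000-token vocabulary and a 1,000-token context window.

```python
# Toy calculation (illustrative numbers only): a model with a 50,000-token
# vocabulary and a 1,000-token context window admits on the order of
# 50,000^1,000 distinct prompts. You cannot "test" that space by spraying
# a few thousand prompts at it.
import math

vocab_size, context_len = 50_000, 1_000
log10_inputs = context_len * math.log10(vocab_size)
print(f"~10^{log10_inputs:.0f} possible prompts")  # ~10^4699
```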
If you’d like to understand the underlying math, please do check out the foundational work by my co-author Niklas Bunzel and by Gerrit Klause.
You can read our paper yourself–it’s free. I wrote it to show the industry the research they refused to do themselves.
And to show their customers exactly who they should trust. Hint: It’s nobody who sells you a spray-and-pray prompt solution.
Tell them to GTFO.
Real red teaming is done by mapping the adversarial subspaces of models. This requires math. There are no two ways about it.
Real AI attacks are engineered to find the boundaries of the adversarial subspace.
Fake AI attacks are spraying thousands of natural language prompts at a model and calling it a day.
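If you want to see the difference in code, here is a minimal sketch of what boundary-finding looks like. Everything in it is an illustrative assumption on my part, not anyone’s product: model stands in for any differentiable classifier, x is a single correctly classified input (a batch of one, values in [0, 1]), and y is its label. The point is that the attack direction falls out of the gradient, i.e., the math, not out of a pile of prompts.

```python
# Minimal sketch of boundary-finding in the adversarial subspace (illustrative
# placeholders, not any vendor's tooling). Assumes a differentiable PyTorch
# classifier `model`, a single input `x`, and its true label `y`.
import torch
import torch.nn.functional as F

def distance_to_boundary(model, x, y, eps_max=1.0, steps=20):
    """Binary-search the smallest perturbation along the gradient-sign (FGSM)
    direction that flips the model's decision, as a crude estimate of how
    close x sits to the adversarial boundary."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    direction = x.grad.sign()       # attack direction comes from the gradient, not guesswork

    lo, hi = 0.0, eps_max
    with torch.no_grad():
        for _ in range(steps):
            mid = (lo + hi) / 2.0
            x_adv = torch.clamp(x.detach() + mid * direction, 0.0, 1.0)
            if model(x_adv).argmax(dim=1).item() != y.item():
                hi = mid            # decision flipped: boundary is closer than mid
            else:
                lo = mid            # still classified correctly: boundary is farther out
    return hi                       # returns eps_max if nothing flipped within the search range
```

Note the contrast: spray-and-pray can’t even tell you how far a given input sits from the decision boundary. This can, and that estimate is what real testing is built on.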
Worse yet, when you hire one of these wannabe hackers to spray their little natural language “attacks” at your system, you provide a golden key to the real attackers:
If I, as a mathematically trained hacker, know an “AI red teamer” already tested your system, I’m simply going to iterate off of their tests, saving myself orders of magnitude of compute time.
Basically, I’m just going to take one of their attacks, perturb it slightly, and use what your system will treat as a totally new attack to breach it.
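Here’s a sketch of that iteration step, just to make it concrete. Again, these are my illustrative placeholders, not real tooling: x_adv is one known-successful attack (say, reconstructed from a prior red team’s probes), and the loop samples cheap variations around it. Adversarial regions tend to be locally dense, so many of the variants still land.

```python
# Illustrative sketch of iterating off someone else's attack. Given one
# known-successful adversarial input `x_adv` and its true label `y_true`,
# small random perturbations around it often still fool the model, and each
# one looks like a "brand new" attack to the defender.
import torch

def iterate_off_known_attack(model, x_adv, y_true, radius=0.02, tries=100):
    variants = []
    with torch.no_grad():
        for _ in range(tries):
            delta = torch.empty_like(x_adv).uniform_(-radius, radius)  # tiny nudge
            candidate = torch.clamp(x_adv + delta, 0.0, 1.0)
            if model(candidate).argmax(dim=1).item() != y_true.item():
                variants.append(candidate)  # still fools the model: a "new" attack, almost for free
    return variants
```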
The fact that the “AI red teamers” didn’t know this speaks volumes about their technical acumen–and their alleged experience as “hackers”.
AI Security Has Always Been Math
AI existed long before GenAI, and so did the attack vectors.
AI attacks have always been math.
As an example: my paper from 2022 predates ChatGPT, yet it calls out the exact malicious applications we’ve since seen it used for.
How did I do this? It wasn’t psychic powers; it’s because I learned the math.
And the math of these systems doesn’t change. It didn’t change with GenAI, and it won’t change with any other statistical “reasoning” system.
All effective AI attacks in the wild–you know, REAL hackers–are math-based. Again, this hasn’t changed with GenAI.
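If you want the canonical reference point, here it is: the fast gradient sign method from Goodfellow, Shlens, and Szegedy’s 2014 paper “Explaining and Harnessing Adversarial Examples”. One line of math, and it predates every GenAI “red team” vendor on the market.

```latex
% Fast Gradient Sign Method (Goodfellow, Shlens & Szegedy, 2014):
% nudge the input x in the direction that maximally increases the loss J
% of a model with parameters \theta on the true label y.
\[
  x_{\mathrm{adv}} = x + \epsilon \cdot \operatorname{sign}\!\big( \nabla_{x} J(\theta, x, y) \big)
\]
```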
And if these “red teamers” were real hackers, they’d know that. Sorrynotsorry.
Meanwhile, a lot of these “red teamers” are just shooting natural language prompts and calling it a test. GTFO.
The REAL kicker: Before I started talking about this publicly, I approached MULTIPLE of these “AI red teams” including their CEOs to tell them their methodology was critically flawed.
Know what they told me?
“We don’t care, as long as people pay for it.”
THAT’s what you’re buying when you hire these people.
Wondering why AI attacks keep showing up, despite the proliferation of “AI red teams”?
This is why.
Fake service, from fake hackers. But the money lost is very real. And so are the breaches.
Stay frosty.
The Threat Model
AI attacks by blackhats are not random natural language prompts; they are mathematically crafted. If AI red teamers aren’t testing with the attacks criminals would actually use, what are they even doing?
Effectively attacking AI requires mapping the adversarial subspace, not spraying prompts at a system while wearing a gray hat.
The AI red teaming industry as a whole doesn’t want to do the reading; they go for hacking glory without doing the work, and that’s antithetical to the real hacker mindset.
Resources To Go Deeper
Cox, Disesdi Susanna, and Niklas Bunzel. ‘Quantifying the Risk of Transferred Black Box Attacks’. arXiv [Cs.CR], 2025. arXiv. http://arxiv.org/abs/2511.05102.
Klause, Gerrit, and Niklas Bunzel. ‘The Relationship Between Network Similarity and Transferability of Adversarial Attacks’. arXiv [Cs.CR], 2025. arXiv. http://arxiv.org/abs/2501.18629.
Feffer, Michael, Anusha Sinha, Zachary Chase Lipton, and Hoda Heidari. ‘Red-Teaming for Generative AI: Silver Bullet or Security Theater?’. arXiv, 2024. arXiv. http://arxiv.org/abs/2401.15897.
Executive Analysis, Research, & Talking Points
How To Take Every Fake AI Red Team’s Lunch
Now that my paper is out, we can talk about how to build an AI red teaming product that really works.
Not just pretend. Not a cool 1337 h4xxor dude in a black hat showing up to your meeting and talking down to everyone until you’re convinced he’s smarter than you.
An actual product, based on actual math, powered by actual engineering. Here’s the blueprint:
Keep reading with a 7-day free trial
Subscribe to Angles of Attack: The AI Security Intelligence Brief to keep reading this post and get 7 days of free access to the full post archives.