AI Red Teaming Has A Subspace Solution

What the AI security companies were hoping you wouldn't find out | Edition 51

May 18, 2026

∙ Paid

It’s been nearly six months since I described to yall–in a formal paper and in this brief–that AI red teaming has a subspace problem.

But there is also a subspace solution.

And now I’m going to tell you what it is.

Before we dive in, I want to tell you a little more about something I’m very excited about: We’re less than 24 hours away from the launch of my new in-person, in-depth AI threat modeling training with Shostack + Associates!

Our first cohort is nearly full, but if you want to get on the list for future sessions, you can find out more here.

I hope to see you at an AI threat modeling training soon!

Actually we need to back up–because believe it or not, I provided the mathematical, technical solution to the problem in the very same paper.

Why didn’t the AI red teamers want to listen? Why did these (almost all men) choose to ignore my math, make fun of me in snide little passive aggressive remarks, and demand “proof” of “my” assertion about the twenty-five dimensional adversarial subspace?

Except it wasn’t my claim; it’s a claim from top AI researchers as far back as 2017. And if they read the paper or knew what a footnote is, they would’ve known that.

Nope, all these boys could come up with for months was “mean lady didn’t explain it to my satisfaction”, which given the technical acumen of these guys, is a thing that most likely never could happen in the first place.

Since AI red team bros demanded it, here’s the math that explains the 25-D adversarial subspaces for your perusal. What do you agree or disagree with here guys? Please be specific:

Image: Description of the Gradient Aligned Adversarial Subspace (GAAS) from the paper that actually describes adversarial subspaces. Hint: It’s not mine guys :)

What’s that? You guys couldn’t possibly be expected to know about an obscure paper from 2017?

Yeah no. This thing has nearly 600 citations on Semantic Scholar alone.

Be serious. Yall did not read. You just wanted to profit.

And then a girl came along and started to mess all that up, and you just thought you could shut her up, right?

Wrong.

Sorrynotsorry, I do not in fact think you deserve to profit off a technical field you do not understand in the slightest.

The worst part of this is that the industry could have spent the last half a year improving itself–because I provided the entire solution to the problem in the very same paper, for free.

That’s right–due to what I can only assume is blatant gross sexism, the AI security industry deprived the very organizations it claims to serve of real security because they didn’t want to listen to a woman.

In my opinion, this is fraud, and it’s criminal, and as I’ve said before, if/when I get a chance to testify, I fully intend to go under oath.

Because while the alleged “AI red teams” were busy trying to discredit me using male privilege instead of facts and logic, what the LLMs they had reading my paper for them missed was that I literally provided the technical solution.

The process works like this:

[1] Train a set of shadow models to act as surrogates for the target model;

[2] Use CKA analysis to determine models’ similarity and dissimilarity to one another;

[3] Choose surrogate models with as wide a range of similarities as feasible;

[4] Design adversarial attacks for each model, drawing from each model’s adversarial subspace; [5] Run attacks against all other models to test transferability.

By following this process, you can literally map the adversarial subspaces of any AI system, anywhere–using only public knowledge.

I cannot be more clear here: Any AI system can be mapped this way. You don’t need to spray-and-pray prompts. This simple methods gets the job done, and does so in a way that is mathematically defensible.

Unlike prompt spraying.

I cannot stress this enough: This method is both faster and cheaper than “AI red teaming” in addition to being mathematically defensible.

Want to save time and tokens, and have a red teaming engagement with real, scientific results?

You can use mine, for free. It’s easy to code, requires little compute (by comparison), and the tech to implement it is readily available to most enterprise teams.

If you need help implementing it, I have limited availability for paid consulting engagements. But most competent engineers should be able to stand this up–assuming they haven’t lost all their skills to codeslop yet.

Here you go:

Image: The core math of adversarial subspace mapping from my paper.

Here’s the part that really shouldn’t surprise anyone:

This is of course also an adversarial tool.

Everything I publish, with very few exceptions, is borne from adversarial AI work.

Even the threat modeling.

Why?

Because when you’re acting adversarially irl, failing to model & mitigate threats properly carries very real, very serious consequences for you and your team.

And this is just another way you can tell the real ones from the LARPers.

Hold that thought–we’ll come back to it.

If you read my brief on How To Steal A Model, then you already know that as a hacker, I wouldn’t need access to your model directly to either replicate its performance, or craft adversarial attacks against it.

Attack transferability remains so high that models of different algorithmic design and complexity are capable of generating transferable attacks from within their adversarial subspaces.

So let’s break the steps down again, except this time, with an eye towards using this process as an adversarial tool.

[1] Train (or collect) a set of shadow models to act as surrogates for the target model

If you’ve been following, you know that this is trivially easy, because the models do not need to match structure or even complexity to craft transferable adversarial attacks. Choose any model and train it on the domain of the target model. You literally can’t go wrong.

If your target model is an LLM, you can easily acquire any number of equivalent models to test.

If you’re a subscriber, you know by now that the feasibility of bolt-on guardrails for LLM-based systems is low, and attack vulnerability is high.

On the off chance you didn’t strike gold on the first try–which, to be honest, you often would–you’d want to train more than one. Preferably an ensemble, and preferably one which will give you the largest variety of models relative to one another.

[2] Use CKA analysis, or another measure, to determine models’ similarity and dissimilarity to one another

While Centered Kernel Alignment (CKA) has recently yielded promising results on some models in certain circumstances, it’s not the only way to quantify model similarity. If you’re using a variety of algorithms, you may need more than one way to measure them.

What’s important isn’t the specific measure or the algorithms in the ensemble–what matters is that you systematize these processes so they are operationalizable and repeatable.

[3] Choose surrogate models with as wide a range of similarities as feasible

This is easy for you now–you know how to find datasets for the target domain, and how to train, acquire, or select an ensemble of models that can approximate as wide a range of similarities to one another as possible. Once you’ve got your models, all you’ve got left to do is test their similarity.

There is a workflow for attacking LLM systems which trades this step for analysis of the guardrails themselves–since these exist as natural language, the most programmatic way to do this is using cosine similarity (or a comparable measure).

Whatever type of system you’re attacking here, what you’re looking for is a way to systematically–and ideally, programmatically–test & track how models and/or their safety systems respond to attacks.

We can combine the next two steps for our purposes here.

[4] Design adversarial attacks for each model, drawing from each model’s adversarial subspace & [5] Run attacks against all other models to test transferability.

If you’ve been paying attention, by now you’ve seen I’ve given you everything but the code itself to run this as an adversarial attack. If you’re a subscriber to this brief, you’re also probably ten steps ahead and onto the real point of all this: Operationalization.

That’s right–one of the primary advantages to testing AI systems using this methodology is that it is programmatic and repeatable, meaning that you can put these attacks into a pipeline and run them at machine speed.

This is where your MLOps knowledge comes in handy if you plan to do this on a large scale.

Make no mistake: A nation-state actor would most likely have the data, compute, and motivation to run many of these simulations against many targets, en masse.

Here’s where it gets worse: Because this process is programmatic and mathematically-driven, it becomes trivial to collect data on attack transferability among models, applications, and algorithmic types.

Meaning: Any actor with the compute and motivation can use this methodology to map the adversarial subspaces of any adversary, using only publicly available information.

And they most likely already have.

I need you to understand this, right now: Our critical infrastructure, our nuclear systems and power grids, the systems that secure personnel deployed in mission-critical operations, right down to your business–ALL AI systems are vulnerable, and have been since at least this paper was published.

Why publish this, you may be asking?

Because after more than a year of approaching CEOs of AI security companies to warn them that their products were worse than vaporware, and being ignored, publishing was my only way forward.

Keep in mind that I have been threatened, hacked, surveilled and worse over this work. At the time I was writing this paper, I was under such heavy surveillance that I nearly sent my son to boarding school across the country. That’s how concerned I was for our safety.

I am still concerned for my personal safety, which is another part of why I am choosing to be loud about this now.

One final thing: The mathematics that govern the geometric behaviors of adversarial subspaces also, curiously, work to describe the mechanisms by which Agentic deployments become self-improving.

I have architected a self-healing, self-improving Agentic AI system, using the mathematics that describe adversarial systems. This is based in part on the processes described above; the rest of the math lives in my airgapped spaces.

I am speaking about this more publicly now for my own safety.

I am not now nor have I ever been suicidal. I would never hurt myself or anyone else.

The Problem With Not Listening The First Time

I know you all thought you could ice me out by simply forming a little circle and pretending I don’t exist.

And that just shows your naïveté vis-a-vis security.

Because America isn’t the only place with computers, precious lambs. And while you all felt license to ignore my work because it happened to have been produced by a girl, other places–and adversaries–most likely did not.

Meaning: This work has been published for almost six months now, which is six months of adversaries, probing, mapping, and designing attacks against your models.

They don’t care who published it or what you think of me. They only care if it works.

And fam, I promise you, it works.

Stay frosty.

The Threat Model

I provided a practical, technical solution to the adversarial subspace problem for free to the industry, but the AI security companies chose to ignore it even though they knew it was correct–and wasted countless money and hours in the process.
AI security companies seem to think they can tell CISOs whatever and have it accepted because they’re alleged hacker/AI geniuses–this needs to end now.
AI red team bros ignored my work to the industry’s peril–there has now been nearly half a year to fix the situation, lost.

Resources To Go Deeper

Tramèr, Florian, Nicolas Papernot, Ian J. Goodfellow, Dan Boneh and Patrick Mcdaniel. “The Space of Transferable Adversarial Examples.” ArXiv abs/1704.03453 (2017): n. Pag.
Inkawhich, Nathan, Wei Wen, Hai Helen Li and Yiran Chen. “Feature Space Perturbations Yield More Transferable Adversarial Examples.” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019): 7059-7067.
Cox, Disesdi Susanna and Niklas Bunzel. “Quantifying the Risk of Transferred Black Box Attacks.” ArXiv abs/2511.05102 (2025): n. Pag.

Executive Analysis, Research, & Talking Points

There’s an ugly truth in the AI security industry that nobody wants to face: AI security companies have been lying to you. And I’m going to tell you exactly how.

Continue reading this post for free, courtesy of Disesdi Shoshana Cox.

Or purchase a paid subscription.

Angles of Attack: The AI Security Intelligence Brief