Attacking & Threat Modeling The Agentic Top Ten: ASI01 - Agent Goal Hijack, Part 1

Part 1 in a series applying Future Proof AI engineering--threat modeling, secure ops, & design for test--to the OWASP Agentic Top 10

Dec 22, 2025

∙ Paid

Hi from DC. AISEC briefings are my life now.

Future-proof AI. It’s time.

Much has been said about the AI bubble, but there’s very little advice to leaders & developers who are working with these systems on how to position for the future.

AI security is necessary. But AISEC has problems.

I was the first to write about AI security operationalization, about how “AI red teaming” is security theater, how to implement engineering design assurance for AI (AI-DAL), and how the mathematics & game theory of AISEC favor attackers.

These problems are real. But what about solutions?

This series on the OWASP Agentic Top Ten is intended to bring together the Top Ten’s practical advice with bleeding edge research and attack methodology from the trenches.

The dual focus on attacking & threat modeling is because these are two sides of the same coin. They always have been–but AI is now making that fact unavoidable.

Future proof AI is threat modeled, designed for testing, and operationalized securely.

Let’s go.

The OWASP Agentic Top Ten is out, and I have thoughts.

First off: It’s good to have a list to iterate from when we talk about these risks, and a common reference point. Not just good, great.

Second, naming things is a helpful step towards mastering them conceptually.

But it’s not the highest one by far. This industry desperately needs a more holistic understanding of how AI-powered systems work, and break.

We need to move beyond taxonomical understanding to practice.

This means breaking down what these threats are really about, how attackers think, and how defenders can model this when the Agentic deployment scales beyond its humans-in-the-loop.

Should you build with AI Agents? No fam, you should not!

But if you insist, I will tell you how attackers will attack them–and how you can model these threats, and more.

Notice I didn’t say I’d show you how to “secure you AI Agents”.

Why? That is not a possible thing to do.

Because the threats I’m going to show you how to model aren’t just from attackers. They arise from the underlying technology itself.

You want to know the “safest” way to deploy AI Agents?

You’re basically asking for the “safest” way to deploy lions into a daycare.

There’s not one.

I’m not going to lie to you. There’s just not.

But what I will do is treat you like an adult: I’ll explain what can go wrong, and how, and how you can model these threats to make your own informed decisions.

If you want to deploy AI Agents or any solution that wrangles them, you had better start with a threat model. If you’re building without one, you’re already late.

Late it is still better than never.

The best time to threat model is constantly. Especially when it comes to Agentic AI.

The best place to start? At the beginning: ASI01, Agent Goal Hijack.

Let’s dive in.

Language & Autonomy

“AI Agents exhibit autonomous ability to execute a series of tasks to achieve a goal.”

The entire point of AI Agents–any Agentic deployment–is autonomy at scale.

If you don’t have one of those two things, why are you even deploying Agents?

The autonomy part is not as straightforward as it seems. You could have any number of software systems, with any number of configurations, acting in a way that we would understand as “automatic”.

But that’s not what autonomous means–at least not in the context of Agentic deployments.

Autonomous, in the context of Agentic AI, implies the ability to make decisions in the course of executing a task. And these are not just simple logic gates–these are the types of decisions that humans would understand as something like judgement calls.

E.g. actions that are taken with less-than-complete instructions.

At its core, this is the enterprise goal for Agentic: Reducing both oversight and instruction programming.

This is what autonomy will look like in practice. Or rather, what it should.

The reality is that Agents are based on LLMs, to achieve the goal of reducing programming overload–it’s thought that the natural language interface, at scale, will produce results that are easier for less technically specialized personnel to attain.

And that means two distinct, but related things:

First, that Agentic deployments inherit–and add to–all the vulnerabilities of LLMs, which in turn inherited–and added to–the vulnerabilities of “Classical” AIML (sometimes called Predictive AI or PredAI).

Second, that all the inherent vulnerabilities of natural language itself are scaled up with any Agentic deployment.

The final blow to securing this vector: The single-channel problem.

The crux: Data and instructions are functionally inseparable for LLM-based systems. This includes AI Agents.

Anyone who says these can be separated is selling a fantasy.

That’s just not how things work.

Because the Agent, meaning the LLM that powers it, can’t reliably tell what’s data vs what’s instructions. Much less differentiate between trusted vs untrusted–facts we’ll see reflected when we get to the mitigation section of the Agentic Top 10.

There is no patching this vulnerability. It is inherent to the design paradigm.

On top of this, typical Agentic governance is nowhere near the level of maturity required for production deployment.

Most Agentic systems are deployed with barely-governed orchestration. Now add scale to this, and it’s easy to see how it rapidly becomes a security nightmare in the making.

Single Channel, Unlimited Access

For Agents, distinguishing between an attacker’s input, versus something benign, is a hard problem.

And because this vulnerability is in fact an architectural feature of Agentic systems, it gives attackers near-unlimited access if properly exploited.

What I need you to understand about AI Agents, and what’s different from LLMs:

Prompts are not the only vectors here.

Gone are the days when prompting was the attack method.

In reality, natural language prompts were never the way that real hackers attacked AI. Still, back then there was generally one way to access the system, and that was through the prompt interface.

Problematic though it was, between this fact and the limited use cases, risk was easier to quantify.

The assumption that the interface was the attack point looks like a quaint relic when we start dealing with Agentic AI.

Simpler times indeed.

Agentic use cases and architectural patterns mean that there are a number of attack paths, beyond just a prompt interface–and that the impacts of these attacks will potentially be magnified, significantly.

Trust Erosion, Boundary Blurring

What I like about ASI01 is the focus on broader impact.

Because realistically, in any capable deployment, impacts from any attack will be more significantly distributed and amplified than they would in a purely LLM-based application. Beginning with this system-level examination of Agentic vulnerability makes sense in this respect.

Agents blur boundaries on more than one level.

The EchoLeak vulnerability is a Zero-Click Indirect Prompt Injection that only requires an attacker to email a message that “silently triggers Microsoft 365 Copilot to execute hidden instructions”.

The end result is the exfiltration of whatever the attacker wants: confidential data, including emails and files, as well as logs, and more.

The user never needs to take any action–hence the “Zero Click” designator. The AI does it all.

To be effective, Agents must use tools, but ASI01 shows that even these tools can’t be trusted.

If it’s also possible for various deceptive tool outputs to hijack Agentic goals–and thus behaviors–then shouldn’t that mean that tool outputs are untrusted?

Tools aren’t the only possible vector for Agentic Goal Hijacking. Agent-to-Agent communications can also be targeted–for exactly all the same reasons.

Forging Agent-to-Agent comms is yet another example of a devastatingly effective potential attack vector.

Opening the Agentic Top Ten with a nod to the boundary-blurring nature of Agentic architectures is fitting.

Will industry take the hint?

Stay frosty.

The Threat Model

Agentic use cases and architectural patterns expand the attack surface beyond just a prompt interface–and even tools can’t be trusted.
Agentic architectures ensure that the impacts of attacks will potentially be cascading and magnified, with potentially serious consequences.
Agents blur traditional software development boundaries; their threat modeling and secure development practices should reflect this with increased rigor, not less.

Resources To Go Deeper

Tokal, Shiva Sai Krishna Anand, Vaibhav Jha, Anand Eswaran, Praveen Jayachandran and Yogesh L. Simmhan. “Towards Orchestrating Agentic Applications as FaaS Workflows.” 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (2025): 1003-1010.
Asgar, Zain, Michelle Nguyen and Sachin Katti. “Efficient and Scalable Agentic AI with Heterogeneous Systems.” ArXiv abs/2507.19635 (2025): n. Pag.
Zhu, Botao, Xianbin Wang and Dusit Niyato. “Semantic Chain-of-Trust: Autonomous Trust Orchestration for Collaborator Selection via Hypergraph-Aided Agentic AI.” ArXiv abs/2507.23565 (2025): n. pag.

Executive Analysis, Research, & Talking Points

Good Intentions, Technical Infeasibility

The Top Ten contains its own recommendations for mitigations, in the Prevention and Mitigation Guidelines section. And this is where I have the most significant differences of opinion.

Out of the 7 recommended mitigation strategies, there are several which in my experience are unactionable from a technical perspective–or wildly impractical in the real world, or even totally ineffective. They’re presented as reasonable efforts. Here’s where I disagree:

Continue reading this post for free, courtesy of Disesdi Shoshana Cox.

Or purchase a paid subscription.