Agentic Ops’ Next Frontier: Addressing The Standards Gap
Why the only way forward for Agentic AI is standardization, now | Edition 45
Image: “Ex hacker” as in she knows how to work one of these.
By request, I’ve opened up a limited number of hours for public consultation and briefings. Click here to see my available services.
To say thanks to my subscribers, right now I am offering a timely briefing on what you need to know to navigate emerging regulations & standardization. Find your code below the paywall.
You can also book a consultation with me during AI security office hours. Paid subscribers get 25% off an initial consultation. Discount code under the paywall.
Image: Post on an Agentic disaster from Meta’s Head of Alignment, including the bot’s not-at-all-sociopathic response: “You’re right to be upset”
A solid portion of the internet was recently laughing as Meta’s “Head of Alignment” (whatever that is) found out the hard way that AI Agents might have a few security flaws.
If you can even count the Agent’s spontaneously deleting data, and refusing direct orders to stop, as a “security flaw” in the first place.
This wasn’t an attack. This was just the system doing its thing.
And it’s generous to say that the Agents “refused” direct orders, since AI Agents can’t read orders, nor do they understand the concept of authority in the first place. Concepts like obedience and insubordination are not moral guideposts to these systems; they are merely meaningless vectors, suspended in some vector space near other concepts swept up in the training set.
But I digress.
What we can safely say is that the flaws in reasoning here belong solely to the human who deployed these Agents, not to the Agents themselves, because Agents can never reason.
They can’t “choose” to “obey” or “disobey”, because they have no concept of what those words mean beyond the confines of the spaces they’re embedded in.
These words only have distance to, and from, other words–no inherent meaning of their own.
What this means, functionally, is that an LLM (or the Agentic system that gets its surface-level fluency from one) can generate coherent-sounding text all day long, but that text is, by definition, produced in a manner unmoored from reality.
There’s no getting around it. These are hallucination machines. Any similarity to the truth is purely coincidental.
A line that is repeatedly propagated in the culture now is that LLMs “hallucinate” when the subject matter at hand is not represented in their training data. This is severely wrong, but just close enough to actual engineering truth to make it a dangerously attractive explanation.
Let me explain what’s true about this first.
What is factual here is that the LLM cannot generate plausible-sounding text around something that it hasn’t seen plausible-sounding text about.
In fact, these systems are so grossly inefficient in terms of their compute and performance that they require a lot of examples of plausible-sounding text about any given subject, in order to be able to replicate it conversationally.
That means many, many repeated examples of plausible text about any given subject must have been present in the training set, in order to get plausible-sounding text back out.
Should this be confused with truth? Absolutely not.
And this makes intuitive sense when you think about it. There are plenty of corpora–samples of text–which are not about anything true at all.
We have a name for it–fiction.
Fiction is something humans can draw inspiration from, and use to arrive at truth.
Machines, however, cannot be inspired by fiction. They can only imitate it.
The works of Shakespeare are often cited as something AI should be trained on, the presumption being that this would inspire AI to create works on a Shakespearean level. But all that really happens when you train LLMs on Shakespeare is that they learn to imitate a writing style, never the creativity that inspires it.
They imitate; they do not create. So calling LLMs “generative” may be a misnomer in the first place.
Giving decision powers to a system designed to imitate–not think–could only ever turn out badly.
These are, arguably, more than security flaws. They are fundamental design and engineering choices, which must be reckoned with when these systems are deployed with claws in the real world.
The Standardization Gap
One of the most glaring issues to be resolved in the Agentic world is the lack of standardization around not just security, but deployments themselves.
The issues that come with anything built on top of LLMs require more–not less–security rigor. So it’s stunning to watch people who should otherwise be thought of as smarter than this rushing to deploy what amounts to a sociopathic manipulation machine with read/write/delete permissions, and doing so without the slightest expectation that a little more engineering might be in order.
These problems are not helped by the absolute lack of anything approaching standardization for Agentic deployment components.
Two areas where standards are sorely lacking are Agentic trust & communications. Protocols for Agentic interactions are underdeveloped and untrustworthy in current iterations.
Hat tip to Rock Lambros of RockCyber for these three Agentic pitfalls (follow RockCyber if you haven’t already):
[1] Agent Identity And Authorization
Why it matters now: Non-human identities (NHIs) are not currently sufficiently advanced to handle Agentic deployments as they both scale and interact. Standardization is desperately needed if industry wants to deploy safely. Security is no longer optional; we are seeing the results of haphazard deployments play out publicly, even from so-called “experts”.
[2] Agent-To-Agent Trust Boundaries
Why it matters now: There are currently no baseline standards for Agent-to-Agent communication.
Let that sink in: Tens of thousands of Agents deployed, communicating with one another across potentially millions of communication points, with zero standardized protocols governing them. Zero best practices. I don’t have to tell my readers that this glaring lack leaves sizable swaths of any enterprise deployment vulnerable to attack. For a concrete picture of what even minimal identity and trust-boundary checks between Agents could look like, see the first sketch after this list.
At this scale, your human-in-the-loop is just security theater.
[3] Least Agency & Least Privilege Enforcement For Agents
Why it matters now: Least privilege is necessary but not sufficient; Agentic deployments also require application of Least Agency principles. Currently, we effectively have standards for neither. Anyone who’s been in security for more than five minutes sees the potential for catastrophe here. The second sketch below shows what enforcing both together could look like.
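To make the first two pitfalls concrete, here is a minimal sketch, in Python, of what per-Agent identity and a trust-boundary check between Agents could look like. To be clear: the token format, the issue_agent_token and verify_agent_message helpers, and the shared signing key are all assumptions for illustration, not any existing standard or vendor API. That absence is exactly the point.

```python
# Minimal sketch (assumptions throughout): short-lived, scoped identity
# tokens for Agents, verified at the trust boundary between Agents.
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"replace-with-a-key-from-your-secrets-manager"  # assumption

def issue_agent_token(agent_id: str, scopes: list[str], ttl_seconds: int = 300) -> dict:
    """Mint a short-lived, scoped identity token for a single Agent."""
    claims = {"agent_id": agent_id, "scopes": scopes, "exp": time.time() + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "sig": sig}

def verify_agent_message(token: dict, required_scope: str) -> bool:
    """Trust-boundary check: verify identity, expiry, and scope before
    acting on anything another Agent sends."""
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["sig"]):
        return False  # forged or tampered identity
    if time.time() > token["claims"]["exp"]:
        return False  # stale credential
    return required_scope in token["claims"]["scopes"]  # scope actually granted

# Usage: the receiving Agent refuses to act unless the sender proves who it
# is and that it was actually granted the relevant scope.
token = issue_agent_token("billing-agent-07", scopes=["read:invoices"])
assert verify_agent_message(token, "read:invoices")
assert not verify_agent_message(token, "delete:invoices")
```

Even a toy like this forces the questions the industry hasn’t standardized answers to: who issues the keys, how scopes are named, how revocation works, and what happens when tens of thousands of Agents need to verify one another.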
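And for the third pitfall, a minimal sketch of enforcing least privilege (which tools an Agent may call at all) together with least agency (which actions it may take without a human in the loop). The ToolPolicy shape and the tool and action names are hypothetical, for illustration only.

```python
# Minimal sketch (hypothetical policy shape): least privilege as a tool
# allowlist, least agency as an autonomy cap on everything else.
from dataclasses import dataclass, field

@dataclass
class ToolPolicy:
    allowed_tools: set[str]  # least privilege: anything not listed is denied outright
    autonomous_actions: set[str] = field(default_factory=set)  # least agency: may run unattended

    def check(self, tool: str, action: str) -> str:
        if tool not in self.allowed_tools:
            return "deny"
        if f"{tool}:{action}" in self.autonomous_actions:
            return "allow"
        return "require_human_approval"  # allowed tool, but not an autonomous action

# Example: a support Agent may read tickets on its own, may draft replies
# only with human sign-off, and has no access to the database tool at all.
policy = ToolPolicy(
    allowed_tools={"tickets", "drafts"},
    autonomous_actions={"tickets:read"},
)
assert policy.check("tickets", "read") == "allow"
assert policy.check("drafts", "create") == "require_human_approval"
assert policy.check("database", "delete") == "deny"
```

Note that the human-approval path only works if it can’t be drowned out; at the scale described above, it degrades into exactly the security theater called out earlier.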
Addressing The Standards Gap
I’ve been privileged to serve alongside Rock on the OWASP AI Exchange team that created technical requirements for international standards, including ISO and the EU AI Act.
The reason this work is so critical should be clear to anyone considering deploying Agents: We desperately need cross-border agreements on how, exactly, to deploy Agentic systems securely.
Not just securely, but also in an interoperable manner.
Remember, the goal of Agentic deployments is autonomy at scale, meaning that these systems will, ideally, need to communicate with each other.
And do so securely.
Such secure interoperability doesn’t just emerge organically from use–it requires careful cultivation, cross-cultural collaboration, and a firm faith in, and commitment to, ensuring that the world becomes a safer place to deploy.
Because progress is what we all want–and to get there, the industry needs to take AI engineering as seriously as it takes AI’s potential for usefulness, and for destruction.
I’ve said it before, and I’ll say it again here: It’s time for AI engineering to grow up.
Mark my words: The next frontier in Agentic Ops will be addressing the standards gap.
Stay frosty.
The Threat Model
If Meta’s own technical leadership cannot deploy Agents safely, it tells us that the bar for understanding these systems is very, very low–and so should be our trust in anything these leaders say about them.
If Meta’s own technical leadership cannot deploy Agents safely, it tells us that AI safety is not a priority for them, so it certainly won’t be for their enterprise customers or users.
If Meta’s own technical leadership cannot deploy Agents safely, it tells us that security research at leading labs is well behind what independent researchers are doing in the field–which means they’re behind adversaries, too.
Resources To Go Deeper
Matthiesen, Stina, and Pernille Bjørn. “Why Replacing Legacy Systems Is So Hard in Global Software Development: An Information Infrastructure Perspective.” Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (2015).
Duckert, Melanie, Charlotte P. Lee, and Pernille Bjørn. “The Ripple Effect of Information Infrastructures.” Computer Supported Cooperative Work (CSCW) 34 (2025): 207-248.
Cataldo, Marcelo, Matthew Bass, James D. Herbsleb, and Leonard J. Bass. “On Coordination Mechanisms in Global Software Development.” International Conference on Global Software Engineering (ICGSE 2007): 71-80.
Executive Analysis, Research, & Talking Points
OpenClaw’s Existential Risk To The AI Ecosystem
There’s a point being raised about OpenClaw as a project, and about the industry as a whole: If the developer behind a potentially dangerous & demonstrably insecure system gets famous, cashes out, and sees the project promoted by one of the biggest names in the industry, that’s a success, right? No, and here’s why.




