AI’s Failure Modes Are Getting Worse. So Why Isn’t AI Engineering Getting Better?
AI’s failure modes now include loss of human life, the AI security gold rush intensifies, & AI researchers want all the credit but none of the blame | Edition 25
Two recent developments have dropped in regulating so-called “frontier” AI models & their applications that can’t be ignored.
First, in the wake of lawsuits alleging their chatbots were responsible for the deaths of teenagers by suicide, Character.ai announced that their product would be limited to users over 18.
This absolutely stunning announcement comes after the company tried to argue in federal court that their chatbots had free speech rights under the US first amendment–and lost.
The move speaks volumes: A quick way to cut liabilities for wrongful deaths of children is to make sure they can’t use the platform.
According to the AP, an attorney representing the mother who initially filed the suit said “the judge’s order sends a message that Silicon Valley ‘needs to stop and think and impose guardrails before it launches products to market.’”
Character.ai’s hasty retreat towards banning minors from the platform can be seen as a tacit admission that, from a technical perspective, they can’t.
Just like organizations can never “red team” their way out of the AI security crisis, they can’t guardrail their way out of liability.
In my opinion, Character.ai’s ban on minors demonstrates that they understand–and have always understood–exactly that.
After all, we’re all reading the same papers, right?
The first amendment gambit was bold, but shallow–it was all they had. My guess is that the legal team had communicated with their technical teams enough to understand that there was one, and only one shot for them to avoid taking accountability for the obvious:
That their product is not now and never was a protected entity under the first amendment, and that it was always unsafe for release to consumers–much less minors.
In my opinion, these lawsuits should pursue full discovery of what this and similar companies knew with regards to the technical feasibility of the “guardrails” they promised. Don’t stop until they admit that they always knew that this product was unsecurable and unsafe–and they released it anyway.
Just my opinion. But it appears to be one that is increasingly shared among policymakers–with good reason.
AI Legislation Gets Serious
On 28 October, a bipartisan group of US senators announced draft legislation with potentially stunning impact over the industry: AI companion companies would be banned from allowing minors to use their products, and would be required to implement an age-verification process.
Even more stunning: The bill would create criminal penalties for AI companies that design, develop or make available AI companions that encourage suicide, or induce/encourage sexually explicit conversations with minors.
Both provisions are noteworthy because lawsuits filed by parents allege that these chatbots have engaged in what they call sexually abusive conversations with their children, in addition to encouraging self harm and even suicide.
What strikes me the most about this legislation is the establishment of criminal penalties.
This is in direct contrast to California’s recent AI legislation, which amounts to a polite-but-firm request to AI companies that they pinky-promise to do their very best.
Will criminalizing bad chatbot behavior effectively criminalize the chatbots themselves? It will be interesting to see where this legislation goes–and should it become law, what the first legal challenges will look like.
The unavoidable question: When, not if, these chatbots fail to provide services safely and/or securely, who will be held criminally liable for the consequences?
Reality Has Hit The Public–Time For Researchers To Acknowledge It Too
Ever since Yann LeCun’s ignorant and unfortunate statement on building safe AI–by poorly analogizing AI engineering to jet engine production, as if the fields of aerospace engineering and safety-critical AI did not exist–it’s been impossible to ignore the avalanche of bad news around AI.
The hype cycle has always been brutal, but it feels like something changed in the past few weeks.
Once-lauded former vanguards of AI development have been accelerating hype and sci-fi silliness at the rate AGI doomers seem to believe superintelligence is coming. The result of which hasn’t been super-AI so much as plateaued chatbot performance, injected with the ultra-high-tech promise of advertising.
But don’t worry, all the ads will be tailored based on your not-very-private “erotic” chats.
This all presupposes that there is no extant set of engineering principles for AI.
The irony:
So-called “fathers” of AI expect us to simultaneously ascribe to them the status of venerable founders of a well established field, while also believing that AI is too new and cutting-edge to be real tech worth really securing or regulating, or even understanding.
To Yann and his cohort: Guys, it’s time to pick a lane.
Safety-critical AI applications have long existed, and will continue to long after any GenAI bubbles do or do not burst.
Not only is safety-critical AI (SCAI) an established field with a vast literature, but it even has its own systematic literature review work.
In 2021, researchers set out to gather and interpret the literature around SCAI.
What’s interesting in this paper (citation below) is that even though it predates the advent of GenAI products by more than a year, the problems described in certifying AI for safety-critical applications do not merely remain unsolved–they’ve gotten worse.
The study was intended to be of interest to industry practitioners and regulators in safety-critical AI domains. One of the most interesting areas is the enumeration of reasons why SOTA AI (in 2021) wasn’t necessarily ready for safety-critical applications.
It’s remarkable to note how little has changed since the launch of GenAI in 2022.
From the paper:
Safety-critical systems require extremely high safety requirements as the system failure is a matter of life.
Such systems require certification and strong safety guarantees
Despite its success, AI is not completely reliable
Most AI-based systems are generally considered opaque
Example cases of AI failure from 2021 included:
Casualties caused by AI-based autonomous vehicles
AI bias for discrimination
Now we can add AI-assisted psychosis and suicide as product failure modes for GenAI.
Another salient quote from the paper:
“In addition, most AI-based systems are generally considered opaque…[meaning] that for an AI-based tool, there is no view of how it works for the input and output seen…The risk of trusting “black-box” autonomy algorithms makes AI and ML less acceptable in safety-critical domain[s]...These deficiencies of AI post challenges to the widespread application of AI in safety-critical systems.”
Ask yourself honestly, how many of these issues have been even reasonably mitigated, much less solved?
The Attack Surface Multiplies, Industry Lags
Recently as a guest on the Zero Signal Podcast, I had a chance to talk about how an AI model’s adversarial subpace–the ‘box’ of all adversarial attacks that will be effective against a particular model–is massively larger than most CISOs and other leaders realize.
It’s also apparently larger than “AI red teamers” realize.
This box isn’t 3 dimensional–it’s 25 dimensional, to be more precise.
In a LinkedIn post, Zero Signal host Conor Sherman pointed out a highly salient framing that every CISO, security or business leader should know: The AI attack surface isn’t expanding — it’s multiplying.
I’ve written pretty extensively on how current “AI red teaming” is not a real thing. And I’ll say it again.
“AI red teaming” is not a real thing. It is, in my opinion, largely spray-and-pray script kiddie tactics which may illuminate some–what proportion they cannot tell you–of the near-infinite number of attacks that might apply to your model.
I know this because I’ve read the papers, which are now nearly a decade old. And I’ve challenged many AI red teamers to prove me wrong, for months. No one has.
They’ll still take your money, though.
If your “AI red teamer” can’t be bothered to read the literature that’s been around for the better part of the decade in their own supposed field, what does that make them?
A related problem exists with “hallucination” and “guardrails”.
Industry has admitted now that hallucinations can’t be solved, and published research which tacitly points out that we can safely assume every LLM has been hit by poisoning attacks by any number of actors.
If ground truth is so difficult to extract from and verify for a text generator, how does anyone expect to guardrail them?
You can’t, ever. Not really. And unless you apply real mathematics, you can’t secure them either.
Why isn’t industry catching up to these realities?
The Threat Model
Failure modes for AI now include loss of human life–but industry leaders seemingly ignore the engineering realities.
The AI safety & security gold rush has created a massive influx of startups and products with zero actual applicability to the security problems at hand.
Every gold rush attracts those who want to get rich quick without the work; AI safety & security is no different.
Resources To Go Deeper
Wang, Yue and Sai Ho Chung. “Artificial intelligence in safety-critical systems: a systematic review.” Ind. Manag. Data Syst. 122 (2021): 442-470.
Bach, Tita Alissa, Jenny K. Kristiansen, Aleksandar Babic and Alon Jacovi. “Unpacking Human-AI Interaction in Safety-Critical Industries: A Systematic Literature Review.” IEEE Access 12 (2023): 106385-106414.
Bach, Tita Alissa, Amna Khan, Harry P. Hallock, Gabriel Beltrao and Sonia Claudia DaCosta Sousa. “A Systematic Literature Review of User Trust in AI-Enabled Systems: An HCI Perspective.” International Journal of Human–Computer Interaction 40 (2022): 1251 - 1266.
Executive Analysis, Research, & Talking Points
Navigating The New AI Legislative Landscape
AI policy is changing fast–from the least-regulatory, laissez-faire approach of the AI Action Plan, to California’s handing off their regulations to the companies themselves, and now, to potential congressional action–with teeth.
The reality of AI policy outlook is shifting quickly. Here’s how to navigate it:
Keep reading with a 7-day free trial
Subscribe to Angles of Attack: The AI Security Intelligence Brief to keep reading this post and get 7 days of free access to the full post archives.



