> Refrain from using LLMs in high-risk or safety-critical scenarios.
> Restrict the execution, permissions, and levels of access, such as what files a given system could read and execute, for example.
> Trap inputs and outputs to the system, looking for potential attacks or leakage of sensitive data out of the system.
this, this, this, a thousand billion times this.
this isn’t new advice either. it’s been around for circa ten years at this point (possibly longer).
Is the argument that developers who are less experience/in a hurry, will just accept whatever they're handed? In that case, this would be as true for random people submitting malicious PRs that someone accepts without reading, even without an LLM involved at all? Seems like an odd thing to call a "security nightmare".
Cognitively, these are fairly distinct tasks. When creating code, we imagine architecture, tech solutions, specific ways of implementing, etc., pre-task. When reviewing code, we're given all these.
Sure, some of that thinking would go into prompting, but not to such a detail as when coding.
What follows is that it's easier to make a vulnerability pass through. More so, given that we're potentially exposed to more of them. After all, no one coding manually would consciously add vulnerability to their code base. Ultimately, all such cases are by omission.
A compromised coding agent would try that. So, we have to change the lenses from "vulnerability by omission only" to "all sorts of malicious active changes" too.
An entirely separate discussion is who reviews the code and what security knowledge they have. It's easy to dismiss the concern once a developer has been dealing with security for years. But these are not the only developers who use coding agents.
Consider business pressures as well. If LLMs speed up coding 2x (generously), will management accept losing that because of increased scrutiny?
If they don't then they're stupid
This has been the core of my argument against LLM tools for coding all along. Yes they might get you to a working piece of software faster, but if you are doing actual due diligence reviewing the code they produce then you likely aren't saving time
The only way they save you time is if you are careless and not doing your due diligence to verify their outputs
They’re just not experienced because this has never happened before. But when you call them stupid, they’re not going to listen to you because they won’t like you.
The CTO of my company has pushed multiple AI written PRs that had obvious breaks/edge cases, even after chastising other people for having done the same.
It's not an experience issue. It's a complacency issue. It's a testing issue. It's the price companies pay to get products out the door as quickly as possible.
At that level, it's the combination of all the power and not that much tech expertise anymore. A vulnerable place.
A lot of famous hacks targeted humans as a primary weak point (gullibility, incompetence, naivety, greed, curiosity, take your pick), and technology only as a secondary follow-up.
An example: Someone had to pick up that "dropped" pendrive in a cantina and plug it into their computer in a 100% isolated site to enable Stuxnet.
Were I a black hat hacker, targeting CTOs' egos would be high on my priority list.
I am not a luddite. I see great potential in this tech, but holy mackarel will there be price to pay.
AI can write plausible code without stopping. So, not only you get sheer volume of PRs going up at the same time you might be asked to do things "faster" because you can always use AI. I am sure some CTOs might even say - why not use AI to review AI code to make it faster?
Not to mention previously the random people submitting malicious PRs needed to have some experience. But now every script kiddie can get LLMs to out the malicious PRs without knowledge and scale. How is that not a "security nightmare"?
If insecure code makes it past that then there are bigger issues - why did no one catch this, is the team understanding the tech stack well enough, and did security scanning / tooling fall short, and if so how can that be improved?
Hasn’t this been the case for entire categories of bugs? Stop me if you’ve heard this one before but we have a new critical 10/10 cvs that was dormant for the last 6 years…it was introduced in this innocuous refactor of some utility function and nobody noticed the subtle logic flaw….
The nature of code reviews has changed too. Up until recently I could expect the PR to be mostly understood by the author. Now the code is littered with odd patterns, making it almost adversarial.
Both can be minimised in a solid culture.
i refuse to review PRs that are not 100% understood by the author. it is incredibly disrespectful to unload a bunch of LLM slop onto your peers to review.
if LLMs saved you time, it cannot be at the expense of my time.
Expect this to become the norm. I hate it.
let's see how that holds up against "author does not understand own PR"
The agent decides to import a bunch of different packages. One of them is a utility package hallucinated by the LLM. Just one line being imported erroneously, and now someone can easily exfiltrate data from your internal DB and make it very expensive. And it all looks correct upfront.
You know what the nice thing is about actually writing code? We make inferences and reasoning for what we need to do. We make educated judgments about whether or not we need to use a utility package for what we're doing, and in the process of using said utility can deduce how it functions and why. We can verify that it's a valid, safe tool to use in production environments. And this reduces the total attack surface immensely; even if some things can slip through the odds of it occurring are drastically reduced.
When management wants to see dollars, extra reviews are an easy place to cut. They don’t have the experience to understand what they’re doing because this has never happened before.
Meanwhile the technical people complain but not in a way that non technical people can understand. So you create data points that are not accessible to decision makers and there you go, software gets bad for a little while.
The telling thing is they never mention this "threshold" in the first place, it's only a response to being called on the bullshit.
Increasing the quantity of something that is already an issue without automation involved will cause more issues.
That's not moving the goalposts, it's pointing out something that should be obvious to someone with domain experience.
Every post like this has a tone like they are describing a new phenomenon caused by AI, but it's just a normal professional code quality problem that has always existed.
Consider the difference between these two:
1. AI allows programmers to write sloppy code and commit things without fully checking/testing their code
2. AI greatly increases the speed at which code can be generated, but doesn't nearly improve as much the speed of reviewing code, so we're making software harder to verify
The second is a more accurate picture of what's happening, but comes off much less sensational in a social media post. When people post the 1st example, I discredit them immediately for trying to fear-monger and bait engagement rather than discussing the real problems with AI programming and how to prevent/solve them.
Allowing people who have absolutely no idea about what they're doing to create and release a software product will produce more "code slop", just like AI produces more "article slop" on the internet.
I don't understand the distinction you are trying to draw between your two examples. Instance #1 happens constantly, and is encouraged in many cases by management who have no idea what programmers do beyond costing them a lot of money.
You can internally discredit whomever or whatever you like, but it doesn't change the fact that LLMs currently add very little value to software development at large, and it doesn't appear that there is a path to changing that in the foreseeable future.
But alignment work has steadily improved role adherence; a tonne of RLHF work has gone into making sure roles are respected, like kernel vs. user space.
If role separation were treated seriously -- and seen as a vital and winnable benchmark (thus motivate AI labs to make it even tighter) many prompt injection vectors would collapse...
I don't know why these articles don't communicate this as a kind of central pillar.
Fwiw I wrote a while back about the “ROLP” — Role of Least Privilege — as a way to think about this, but the idea doesn't invigorate the senses I guess. So, even with better role adherence in newer models, entrenched developer patterns keep the door open. If they cared tho, the attack vectors would collapse.
No current model can reliably do this.
I think it will get harder and harder to do prompt injection over time as techniques to seperate user from system input mature and as models are trained on this strategy.
That being said, prompt injection attacks will also mature, and I don't think that the architecture of an LLM will allow us to eliminate the category of attack. All that we can do is mitigate
Using a local LLM isn't a surefire solution unless you also restrict the app's permissions, but it's got to be better than using chatgpt.com. The question is: how much better?
An additional flavor to that: even if my professional AI agent license guarantees that my data won't be used to train generic models, etc., when a US court would make OpenAI reveal your data, they will, no matter where it is physically stored. That's kinda a loophole in law-making, as e.g., the EU increasingly requires data to be stored locally.
However, if one really wants control over the data, they might prefer to run everything in a local setup. Which is going to be way more complicated and expensive.
2. Small Language Models (SLMs). LLMs are generic. That's their whole point. No LLM-based solution needs all LLM's capabilities. And yet training and using the model, because of its sheer size, is expensive.
In the long run, it may be more viable to deploy and train one's own, much smaller model operating only on very specific training data. The tradeoff here is that you get a cheaper in use and more specialized tool, at the cost of up-front development and no easy way of upgrading a model when a new wave of LLMs is deployed.
I am building something for myself now and local is first consideration, because, as most of us here, likely see the direction publicly facing LLMs are going. FWIW, it kinda sucks, because I started to really enjoy my sessions with 4o
Without a doubt. Companies like Mistral and Cohere (probably others too) will set up local LLMs for your organisation, in fact it's basically Cohere's main business model.
The biggest concern to me is that most public-facing LLM integrations follow product roadmaps that often focus in shipping more capable, more usable versions of the tool, instead of limiting the product scope based on the perceived maturity of the underlying technology.
There's a worrying amount of LLM-based services and agents in development by engineering teams that haven't still considered the massive threat surface they're exposing, mainly because a lot of them aren't even aware of how LLM security/safety testing even looks like.
It's like we've decided to build the foundation of the next ten years of technology in unescaped PHP. There are ways to make it work, but it's not the easiest path, and since the whole purpose of the AI initiative seems to be to promote developer laziness, I think there are bigger fuck-ups yet to come.
The historical evidence should give us zero confidence that new tech will get more secure.
From an uncertainty point of view, AI security is an _unknown unknown_, or a non-consideration to most product engineering teams. Everyone is rushing to roll the AI features out, as they fear missing out and start running behind any potential AI-native solutions from competitors. This is a hype phase, and it's a matter of time that it ends.
Best case scenario? the hype train runs out of fuel and those companies will start allocating some resources to improving robustness in AI integrations. What else could happen? AI-targeted attacks create such profound consequences and damage to the market that everyone will stop pushing out of (rational) fear of running the same fate.
Either way, AI security awareness will eventually increase.
> the general state of security has gotten significantly worse over time. More attacks succeed, more attacks happen, ransoms are bigger, damage is bigger
Yeah, that's right. And there's also more online businesses, services, users each year. It's just not that easy to state that things are going for the better or worse unless we (both of us) put the effort to properly contextualize the circumstances and statistically reason through it.
It seems very short sighted.
I think of it more like self driving cars. I expect the error rate to quickly become lower than humans.
Maybe in a couple of years we’ll consider it irresponsible not to write security and safety critical code with frontier LLMs.
Very quickly he went straight to, "Fuck it, the LLM can execute anything, anywhere, anytime, full YOLO".
Part of that is his risk-appetite, but it's also partly because anything else is just really furstrating.
Someone who doesn't themselves code isn't going to understand what they're being asked to allow or deny anyway.
To the pure vibe-coder, who doesn't just not read the code, they couldn't read the code if they tried, there's no difference between "Can I execute grep -e foo */*.ts" and "Can I execute rm -rf /".
Both are meaningless to them. How do you communicate real risk? Asking vibe-coders to understand the commands isn't going to cut it.
So people just full allow all and pray.
That's a security nightmare, it's back to a default-allow permissive environment that we haven't really seen in mass-use, general purpose internet connected devices since windows 98.
The wider PC industry has got very good at UX to the point where most people don't need to worry themselves about how their computer works at all and still successfully hide most of the security trappings and keep it secure.
Meanwhile the AI/LLM side is so rough it basically forces the layperson to open a huge hole they don't understand to make it work.
Analogous to the way I think of self-driving cars is the way I think of fusion: perpetually a few years away from a 'real' breakthrough.
There is currently no reason to believe that LLMs cannot acquire the ability to write secure code in the most prevalent use cases. However, this is contingent upon the availability of appropriate tooling, likely a Rust-like compiler. Furthermore, there's no reason to think that LLMs will become useful tools for validating the security of applications at either the model or implementation level—though they can be useful for detecting quick wins.
Edit: Don’t get me wrong btw. I love autopilot. It’s just completely incapable of handling a large number of very common scenarios.
It's optimistic but maybe once we start training them on "remove the middle" instead it could help make code better.
Today, LLMs make development faster, not better.
And I'd be willing to bet a lot of money they won't be significantly better than a competent human in the next decade, let alone the next couple years. See self-driving cars as an example that supports my position, not yours.
In short, their claims were inaccurate and motivated by protecting their existing racket.
You don't have to use them this way. It's just extremely tempting and addictive.
You can choose to talk to them about code rather than features, using them to develop better code at a normal speed instead of worse code faster. But that's hard work.
I will use AI for suggestions when using an API I'm not familiar with because it's faster than reading all the documentation to figure it the specific function call I need, but I then follow up on the example to verify it's correct and I can confidently read the code. Is that what you're taking about?
A vibe coder without 20+ years of experience can't do that, but they can publish an app or website just the same.
I'd be willing to bet 6 figures that doesn't happen in the next 2 years.
For people here on HN I agree with you; not in the next 2 years or, if no-one invents another model than the transformer based model, not for any length of time until that happens.
They've been "on the cusp" of widespread adoption for around 10 years now, but in reality they appear to have hit a local optimum and another major advance is needed in fundamental research to move them towards mainstream usage.
Self driving cars maybe be better than the average driver but worse than the top drivers.
For security code it’s the same.
There is a reason why the average programmer should use established libraries for such cases.
I might also be hyper sensitive to the cynicism. It tends to bug me more than it probably should.
Then he isn’t unbiased.
Cloudflare apparently did something similar recently.
It is more than possible to write secure code with AI, just as it is more than possible to write secure code with inexperienced junior devs.
As for the RCE vector; Claude Code has realtime no-intervention autoupdate enabled by default. Everyone running it has willfully opted in to giving Anthropic releng (and anyone who can coerce/compel them) full RCE on their machine.
Separately from AI, most people deploy containers based on tagged version names, not cryptographic hashes. This is trivially exploitable by the container registry.
We have learned nothing from Solarwinds.
> Cloudflare apparently did something similar recently.
Sure, LLMs don't magically remove your ability to audit code. But the way they're currently being used, do they make the average dev more or less likely to introduce vulnerabilities?
By the way, a cursory look [0] revealed a number of security issues with that Cloudflare OAuth library. None directly exploitable, but not something you want in your most security-critical code either.
[0] https://neilmadden.blog/2025/06/06/a-look-at-cloudflares-ai-...
Isn't that the same for Chrome, VSCode, and any upstream-managed (as opposed to distro/os managed) package channel with auto updates?
It's a bad default, but pretty much standard practice, and done in the name of security.
What would have happened if someone without your domain expertise wasn't reviewing every line and making the changes you mentioned?
People aren't concerned about you using agents, they're concerned about the second case I described.
Are you aware that your wording here is implying that you are describing a unique issue with AI code that is not present in human code?
>What would have happened if someone without your domain expertise wasn't reviewing every line and making the changes you mentioned?
So, we're talking about two variables, so four states: human-reviewed, human-not-reviewed, ai-reviewed, ai-not-reviewed.
[non ai]
*human-reviewed*: Humans write code, sometimes humans make mistakes, so we have other humans review the code for things like critical security issues
*human-not-reviewed*: Maybe this is a project with a solo developer and automated testing, but otherwise this seems like a pretty bad idea, right? This is the classic version of "YOLO to production", right?
[with ai]
*ai-reviewed*: AI generates code, sometimes AI hallucinates or gets things very wrong or over-engineers things, so we have humans review all the code for things like critical security issues
*ai-not-reviewed*: AI generates code, YOLO to prod, no human reads it - obviously this is terrible and barely works even for hobby projects with a solo developer and no stakes involved
I'm wondering if the disconnect here is that actual professional programmers are just implicitly talking about going from [human-reviewed] to [ai-reviewed], assuming nobody in their right mind would just _skip code reviews_. The median professional software team would never build software without code reviews, imo.
But are you thinking about this as going from [human-reviewed] straight to [ai-not-reviewed]? Or are you thinking about [human-not-reviewed] code for some reason? I guess it's not clear why you immediately latch onto the problems with [ai-not-reviewed] and seem to refuse to acknowledge the validity of the state [ai-reviewed] as being something that's possible?
It's just really unclear why you are jumping straight to concerns like this without any nuance for how the existing industry works regarding similar problems before we used AI at all.
Check out https://varlock.dev to add validation, type-safety, and additional security guardrails.
(it won't if you've been following LLM coding space, but anyway...)
I hoped Gary would have at least linked to the talks so people could get the actual info without his lenses, but no such luck.
But he did link to The Post A Few Years Ago Where He Predicted It All.
(yes I'm cynical: the post is mostly on point, but by now I wouldn't trust Marcus if he announced People Breathe Oxygen).
Also, slides from the Nvidia talk, which they refer to a lot, are linked. The Nathan's presentation links only to the conference website.