FilterHN

New serious vulnerabilities spiked around release of Claude Mythos Preview

79 points

by cubefox

8 hours ago

| 9 comments

| epoch.ai

▲

simonreiff

58 minutes ago

So basically there are two plausible explanations:

1. Someone with early access to Mythos leaked it to the bad guys.

2. Cybercriminals are getting enough mileage out of alternatives to Mythos to create exploits far more quickly, even though they don't have access to Mythos.

My own guess is that it's a combination of #2 plus vibe-coding degrading software quality at multiple layers, opening the door to sophisticated exploits, but I have no insider access to Mythos, so I'm just guessing. Maybe someone with Mythos access could say why they think this vulnerability spike happened when it did.

▲

prmoustache

38 minutes ago

Bad guys don't report vulns, they use them.

▲

PlasmaPower

52 minutes ago

I might be missing something here, but why do you assume this spike in CVEs is from bad guys? I would assume it's at least largely good guys finding and reporting vulns, not based on in-the-wild exploitation by bad guys.

▲

asp_hornet

52 minutes ago

Disclosure of a vulnerability doesn't mean a bad guy found it.

▲

cperciva

2 hours ago

This is hardly news? We've known for months that a flood of AI-assisted vulnerabilities was coming; I posted on Twitter in March calling 2026 the year of a million CVEs: https://x.com/i/status/2035045573116789002

▲

no-name-here

1 hour ago

In pretty much every single HN post on this topic, there are a number of commenters claiming it’s false. Continued quantifiable data like this seems very important for resolving, hopefully, the ongoing disagreement about the facts.

▲

cperciva

8 minutes ago

I've seen plenty of people saying "Mythos isn't all that exceptional, lots of LLMs can find security vulnerabilities" -- and indeed there is some evidence for that; it sounds like Anthropic was taken somewhat by surprise at how easily a simple prompt managed to get Mythos to deliver exploits and didn't distinguish immediately between the effectiveness of Mythos and the effectiveness of the prompt.

But the claim of "LLMs aren't making a difference in vulnerability discovery" has been laughable to anyone who has been reading security advisories for the past 3 months. Just look at the Credits lines.

▲

hoppp

6 hours ago

How are these reports verified to be valid? If there are too many, some of them could be hallucinations.

▲

guessmyname

5 hours ago

We (Project Glasswing users) follow a proof-of-concept approach. We create the exploit and verify that it behaves as the AI claims. Given our experience as security engineers (many of us with 10+ YoE) we don’t simply report every critical bug Mythos claims to have found. We verify each one carefully.

At least, that’s what most of the high-visibility users in Project Glasswing are doing.

There are bad apples everywhere, and this initiative is no exception.

If it makes you feel any better, many of us regularly meet to stay calibrated and hold each other accountable, so I’m confident in the quality of the work produced by this particular group of employees across some of the partner companies mentioned in the article.

That said, I know several people who blindly report everything Mythos finds, which is foolish, especially since the harness is a critical part of the project's quality metrics. Some of the harnesses I’ve tested are quite weak, which leads to poor results.

For example, yesterday morning I was pulled into an ad hoc meeting where a CVP was grilling me about several supposedly critical bugs that my team had reported against one of the core components of iCloud. I was genuinely surprised because we’re very strict about validation. We often even downgrade the severity of bugs when our harness can’t prove what Mythos found. After reading the reports, I realized they weren’t ours. They came from another team that had recently been given access to Mythos. They built their own harness and were using different vulnerability criteria. Fortunately, they had only started earlier this week, so I was able to stop that work.

That incident showed that not everyone involved in Project Glasswing follows the same standards. Most people do their best, but priorities differ, so it’s expected that you’ll find a few bad apples.

I wish AI labs would stop the theatrics and release their models without restrictions, but I also recognize that’s not the world we live in. For every person who wants to use these technologies for good, there are many others who would use them for harm.

In any case, while I agree that some experiments contain genuine noise, the CVE count is real.

▲

altmanaltman

2 hours ago

It's very hard to understand what you're saying with the comment - like you have 10+ years of experience and you verify each bug because you know Mythos can produce false positives. But other teams (which should also have people equivalent to your skill and experience level) suck at it so much that CVP-level workers are having to spend time on their fake reports. Then you say Anthropic should stop the theater. Then you say the CVE count is real.

Reading this comment genuinely felt like the Aladeen scene in The Dictator.

▲

guessmyname

41 minutes ago

I didn’t claim to have 10+ YoE; I said that most of the people in Project Glasswing are security researchers with 10+ YoE (avg).

> It's very hard to understand what you're saying with the comment

Yes, fair enough. I’m simply trying to shed some light on what goes on behind the scenes without disclosing too much information to avoid breaching the NDA(s) that all Project Glasswing users have signed. There’s a lot of speculation about the usefulness of Mythos as a security tool, so much so that even the US government got involved. Honestly, it’s so absurd that I can’t even express it in words. I thought that sharing a bit about how frustrating it is to work within this project, trying to secure software that literally millions of people around the planet use on a daily basis, while virtually everyone outside of it criticizes every move you make, would be helpful.

Many people I work with recognize the power of Mythos, just like any other model with a similar number of parameters, but most of the people I interact with agree that it’s not the ultimate panacea. I believe that it’s just vocal minorities scaring everyone into thinking that the model is some kind of cybernetic weapon.

▲

hatefulheart

19 minutes ago

Yeah no, literally the only people who thought it was a cybernetic weapon were those with a stake in it. The rest of the world kinda just went “Yeah, ok”.

I get why from your perspective this is a massive deal, but no one really cares for this sort of speculation outside of your circle.

▲

IAmGraydon

3 hours ago

>We (Project Glasswing users) follow a proof-of-concept approach. We create the exploit and verify that it behaves as the AI claims. Given our experience as security engineers (many of us with 10+ YoE) we don’t simply report every critical bug Mythos claims to have found. We verify each one carefully.

>That incident showed that not everyone involved in Project Glasswing follows the same standards.

▲

nextaccountic

6 hours ago

The best case scenario for AI companies is, people receive those bug reports, look at the model that produced it and not even look at the details, just apply the fix mindlessly

This gives Anthropic a staggering amount of power. Oh it came from Mythos? We will just lose time trying to analyze it, better apply the fix ASAP

▲

stingraycharles

6 hours ago

> The best case scenario for AI companies is, people receive those bug reports, look at the model that produced it and not even look at the details, just apply the fix mindlessly

Do people maintaining serious software do this, though?

▲

nextaccountic

4 hours ago

The problem is that serious software is drowning in AI vulnerability reports. There is not enough manpower to analyze them properly. And if you ignore the reports (like curl is doing in their 1-month vacation), malicious actors will just exploit them. At some point it's inevitable to just rubber stamp whatever is coming from AI.

The actual, underlying problem is that software is buggy and current programming languages aren't fit for writing reliable software. There's a wide gap between the state of the art in formal verification and what is actually practiced in the industry. It's because of this general unreliability that AI has a large supply of vulnerabilities to find. The situation will only get better if software becomes reliable and is written on solid foundations.

My guess is that AI will be even more useful to verify software (something like, write Lean or Coq proofs that the software is not vulnerable, things like that), rather than finding vulnerabilities piecemeal but still letting software be written in unsuitable languages, with no formal verification to prevent bugs from sneaking through.

▲

pixl97

5 hours ago

Define 'serious'. There is a lot of software in serious places written by very unserious people.

▲

eternauta3k

1 hour ago

Can we learn something from these vulnerabilities? New categories of attacks and corresponding protections?

▲

solenoid0937

7 hours ago

I predict that once the responsible disclosure period is up, we will see a lot more.

▲

Robdel12

2 hours ago

…are we really drawing conclusions on this starting at April? When it was released in June?

▲

andai

2 hours ago

Mythos is from April, it was just limited to a small number of organizations.

▲

downrightmike

1 hour ago

Not really special, which was the point: it's a general model. This is really good marketing, as all other LLMs are able to do the same work.

▲

general_reveal

1 hour ago

So, another victory for the LLM. We were told by project maintainers that AI-generated pull requests for vulnerabilities would be blocked. Looks like humans take another L. We have to get out of the way.

▲

comradesmith

5 hours ago

Good

▲

IAmGraydon

3 hours ago

Is this because LLMs are better at finding vulnerabilities or because increased use of LLMs for coding is creating more vulnerabilities?