cURL removes bug bounties
192 points
3 hours ago
| 14 comments
| etn.se
| HN
dlcarrier
2 hours ago
[-]
An entry fee that is reimbursed if the bug turns out to matter would stop this, real quick.

Then again, I once submitted a bug report to my bank, because the login method could be switched from password+PIN to PIN only, while not logged in, and they closed it as "works as intended", because they had decided that an optional password was more convenient than a required one. (And that's not even getting into the difference between real two-factor authentication and the some-factor, one-and-a-half-times authentication they had implemented by adding a PIN to a password login.) I've since learned that anything heavily regulated, like hospitals and banks, will have security procedures catering to compliance, not actual security.

Assuming the host of the bug bounty program is operating in good faith, adding some kind of barrier to entry or punishment for untested entries will weed out submitters acting in bad faith.

reply
bawolff
2 hours ago
[-]
Bug bounties often involve a lot of risk for submitters. Often the person reading the report doesn't know much and misinterprets it. Often the rules are unclear about what sort of reports are wanted. A pay-to-enter scheme would increase that risk.

Honestly, bug bounties are kind of miserable for both sides. I've worked on the receiving side of bug bounty programs. You wouldn't believe the shit that is submitted. This was before AI, and it was significant work to sort through; I can only imagine what it's like now. On the other hand, as a submitter you are essentially working on spec with no guarantee your work is going to be evaluated fairly. Even if it is, you are rolling the dice that your report is not a duplicate of an issue reported 10 years ago that the company just doesn't feel like fixing.

reply
ANarrativeApe
1 hour ago
[-]
Pay to enter would increase the risk of submitting a bug report. However, if the submission fees were added to the bounty payable, then the risk/reward changes in favour of the submitter of genuine bugs. You could even refund the submission fee in the case of a good-faith non-bug submission. A little game theory can go a long way in improving the bug bounty system...
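As a rough sketch of that incentive shift (the fee, bounty, and validity rates below are made-up numbers for illustration, not anything proposed in the thread):

```python
# Hypothetical expected-value comparison for a bug bounty with a
# refundable submission fee. All numbers are illustrative assumptions.

def expected_value(p_valid, bounty, fee, refund_good_faith):
    """Expected payoff for a submitter whose reports are valid with
    probability p_valid. The fee comes back (on top of the bounty)
    for valid reports, and optionally for good-faith invalid ones."""
    win = bounty + fee          # valid report: bounty plus fee back
    lose = fee if refund_good_faith else 0.0
    return p_valid * win + (1 - p_valid) * lose - fee

# A careful researcher (80% of reports valid) still comes out ahead...
careful = expected_value(p_valid=0.8, bounty=500, fee=20, refund_good_faith=False)
# ...while a slop farmer (1% valid) now loses money on average.
slop = expected_value(p_valid=0.01, bounty=500, fee=20, refund_good_faith=False)
print(careful, slop)  # 396.0 -14.8
```

With these assumed numbers the careful researcher nets about $396 per report while the slop farmer loses about $15 per report, which is the asymmetry the comment is pointing at.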
reply
bawolff
1 hour ago
[-]
If a competent, neutral party were evaluating them, I would agree. However, currently these things tend to be the luck of the draw.
reply
CTDOCodebases
1 hour ago
[-]
They could allow submitters to double down on submissions, escalating the bug to more skilled and experienced code reviewers who get a cut of the doubled submission fee for their reviews.
reply
eterm
1 hour ago
[-]
Indeed, increasing the incentive for companies to reject (and then sometimes silently fix anyway) even the valid reports would only increase further misery for everyone.
reply
sudahtigabulan
1 hour ago
[-]
> I've since learned that anything heavily regulated like hospitals and banks will have security procedures catering to compliance, not actual security.

Sadly, yeah. And they will only do anything if they believe they can actually be caught.

An EU-wide bank I was a customer of until recently supported login with Qualified Electronic Signatures, but only if your dongle supported... SHA-1. Mine didn't. SHA-1 was deprecated at least a decade ago.

A government-certified identity provider made software that supposedly allowed you to have multiple such electronic signatures plugged in, presenting them in a list, but if one of them happened to be a YubiKey... crash. The YubiKey conforms to the same standard as the PIV modules they sold, but the developers made some assumptions beyond the standard. I just wanted their software not to crash while my YubiKey was plugged in. I reported it, and they replied that it's not their problem.

reply
duxup
6 minutes ago
[-]
Are bug reports a 100% sure black and white thing?

Could people who think they found a bug but aren't sure be put off by the up-front cost and the risk of finding out they are wrong, or of not technically having found a bug?

reply
fredrikholm
2 hours ago
[-]
> An entry fee that is reimbursed if the bug turns out to matter would stop this, real quick.

I refer to this as the Notion-to-Confluence cost border.

When Notion first came out, it was snappy and easy to use. Since creating a page was essentially free of effort, you very quickly had thousands of them, mostly useless.

Confluence, at least in western Europe, is offensively slow. The thought of adding a page is sufficiently demoralizing that it's easier to update an existing page and save yourself minutes of request timeouts. Consequently, there are some ~20 pages even in large companies.

I'm not saying that sleep(15 * SECOND) is the way to counter it, but once something becomes very easy to do at scale, it explodes to the point where the original utility is lost in a sea of noise.

reply
TeMPOraL
32 seconds ago
[-]
The term I know and use for this is "trivial inconveniences", via an old article by Scott Alexander[0]. A quote from an example early in the article stuck with me for years:

Think about this for a second. The human longing for freedom of information is a terrible and wonderful thing. It delineates a pivotal difference between mental emancipation and slavery. It has launched protests, rebellions, and revolutions. Thousands have devoted their lives to it, thousands of others have even died for it. And it can be stopped dead in its tracks by requiring people to search for "how to set up proxy" before viewing their anti-government website.

--

[0] - https://www.lesswrong.com/posts/reitXJgJXFzKpdKyd/beware-tri...

reply
teekert
2 hours ago
[-]
It’s strange how sensitive humans are to these sorts of relative perceived efforts. Having a charged, cordless vacuum cleaner ready to grab and take around the house has also changed our vacuuming game. Carrying a big, unwieldy vacuum cleaner and needing to find a power socket at every location just feels like much more effort, even though it really isn't.
reply
arionmiles
2 hours ago
[-]
I find this to be a very amusing critique. In my experience, Notion (when I stopped using it 3 years ago) was slow as molasses. Slow to load, slow to update. In comparison, at work, I almost exclusively favor Confluence Cloud. It's very responsive for me.

We have tons of Confluence wikis, updated frequently.

reply
jraph
2 hours ago
[-]
> Consequently, there's some ~20 pages even in large companies.

As someone working on Confluence to XWiki migration tools, I wish this was remotely true, my life would be way easier (and probably more boring :-)).

reply
icar
1 hour ago
[-]
> I've since learned that anything heavily regulated like hospitals and banks will have security procedures catering to compliance, not actual security.

I personally came to that conclusion thanks to the GrapheneOS situation regarding device attestation. Insecure devices get full features from some apps because they are certified, with security cited as the reason, while GrapheneOS gets half-featured apps because it's "insecure" (read: it doesn't have the Google certification, even though GrapheneOS devices are actually among the most secure you can get, worldwide).

reply
cynicalsecurity
53 minutes ago
[-]
It's not about securing your device from external threats or bad actors; it's about securing the device from you.
reply
laserbeam
1 hour ago
[-]
For weak bank logins, my guess is that reimbursing all account takeovers is cheaper than having a complex login process that would scare away non-technical customers. Or, well, I could see myself making that decision if I were more versed in finance than in computer science and I had a reasonable risk assessment in front of me to tell me how many account takeovers happen.
reply
dlcarrier
29 minutes ago
[-]
Banks aren't even liable for losses from account takeovers, at least if their system is compliant, regardless of whether that makes it secure. Their biggest incentive is customer satisfaction, which fraud does hurt.

It's credit cards that have to reimburse for fraud, but they charge the merchant for it, plus fees, so they have absolutely no incentive to prevent fraud, if not an incentive to outright encourage fraud. That would explain why their implementation of the already compromised EMV was further nerfed by a lack of a PIN in the US.

reply
dmurray
1 hour ago
[-]
cURL would operate such a program in good faith, and quickly earn the trust of the people who submit the kind of bug reports cURL values.

Your bank would not. Nor would mine, or most retail banks.

If the upfront cost would genuinely put off potential submitters, a cottage industry would spring up of hackers who would front you the money in return for a cut if your bug looked good. If that seems gross, it's really not - they end up doing bug triage for the project, which is something any software company would be happy to pay people for.

reply
saghm
2 hours ago
[-]
That anecdote is hilarious and scary in equal measure. Optional passwords are certainly more convenient than required ones, but so are optional PINs. The most convenient UX would be never needing to log in at all! Unless you find it inconvenient for others to have access to your bank account, of course.
reply
duskdozer
37 minutes ago
[-]
And the counterpoint: the most secure system never allows anyone to log in, ever.
reply
sersi
2 hours ago
[-]
I really hate the current trend of not having passwords. For example, Perplexity doesn't have a password, just an email verification to log in.
reply
dlcarrier
42 minutes ago
[-]
That's what eBay does to me. You get to choose, at the time of login, between entering a password and getting an email verification, or just getting an email verification. At least with the bug report I had submitted to my bank, the password requirement had to be disabled from inside a settings menu, instead of being a clear option in the login prompt, but in that case it wasn't even a 2nd factor.
reply
duskdozer
32 minutes ago
[-]
>You get to choose, at the time of login, between entering a password and getting an email verification, or just getting an email verification.

Ugh, I hate this. I've seen it in other places. Just waiting for them to decide that actually it should be an SMS or a phone call...

reply
eXpl0it3r
1 hour ago
[-]
I hate this as well, especially since I have greylisting enabled on some email addresses, so by the time the login email is delivered, the login session has already timed out, and of course the sender uses different mail servers every time. So in some cases it's nearly impossible to log in, and it takes minutes...
reply
6510
1 hour ago
[-]
Long, long ago, the Google Toolbar queries could be reverse engineered to do an "I'm Feeling Lucky" search on Gmail. I created a login that (if @gmail.com) forwarded to the specific mail.

Unlikely to happen, but it seems fun to extend email [clients] with URIs. It's just a document browser; who cares how they are delivered.

reply
gamer191
2 hours ago
[-]
Agreed, although the reimbursement should be based on whether a reasonable person could consider the behaviour a vulnerability. It's often tricky for outsiders to tell whether a behaviour is expected or a vulnerability.
reply
dlcarrier
41 minutes ago
[-]
Yeah, the reimbursement would need to be for a good-faith submission worth considering, even if it wasn't actionable.
reply
nospice
2 hours ago
[-]
> An entry fee that is reimbursed if the bug turns out to matter would stop this, real quick.

The problem is that bug bounty slop works. A lot of companies with second-tier bug bounties outsource triage to contractors (there's an entire industry built around that). If a report looks plausible, the contractor files a bug. The engineers who receive the report are often not qualified to debate exploitability, so they just make the suggested fix and move on. The reporter gets credit or a token payout. Everyone is happy.

Unless you have a top-notch security team with a lot of time on their hands, pushing back is not in your interest. If you keep getting into fights with reporters, you'll eventually get it wrong and you're gonna get derided on HN and get headlines about how you don't take security seriously.

In this model, it doesn't matter if you require a deposit, because on average, bogus reports still pay off. You also create an interesting problem that a sketchy vendor can hold the reporter's money hostage if the reporter doesn't agree to unreasonable terms.

reply
notpushkin
1 hour ago
[-]
I don’t think it works for curl though. You would guess that sloperators would figure out that their reports aren't going through with curl specifically (because, well, people are actually looking into them and can call bullshit) and move on.

For some reason they either didn't notice (e.g. there are just too many people trying to get in on it), or did notice but decided they don't care. A deposit should help here: companies probably will not do it, so when you see a project that requires a deposit, you'll probably stop and think about it.

reply
zrm
1 hour ago
[-]
Triage gets outsourced because the quality of reports is low.

If filing a bad report costs money, the number of low-quality reports goes down. Meanwhile, anyone still doing it is funding your top-notch security team, because the team can then thoroughly investigate the report, and if it turns out to be nothing, the reporter ends up paying them for their time.

reply
jameslk
2 hours ago
[-]
It seems open source loses the most from AI. Open source code trained the models, the models are being used to spam open source projects anywhere there's incentive, they can be used to chip away at open source business models by implementing paid features and providing the support, and eventually perhaps AI simply replaces most open source code
reply
GardenLetter27
3 minutes ago
[-]
How so? I think the Bazaar model has the most to gain - contributors can use LLMs to create PRs, and you can choose from a vast array of projects depending on how much you trust vibe coding.
reply
pravj
25 minutes ago
[-]
Extending along the same lines, we will see programs like Google Summer of Code (GSoC) getting a massive revamp, or they will stop operating.

From my failed attempt, I remember that

- Students had to find a project matching their interests/skills and start contributing early.

- We used to talk about staying away from some projects with a low supply of students applying (or lurking in the GitHub/BitBucket issues) because of the complexity required for the projects.

Both of these acted as a filter for projects and landed them good students/contributors, but that completely goes away with AI being able to do it at scale.

reply
bawolff
2 hours ago
[-]
> they can be used to chip away at open source business models by implementing paid features and providing the support

There are a lot of things to be sad about with AI, but this is not one of them. Nobody has a right to a business model, especially one that assumes nobody will compete with you. If your business model relies on the rest of the world being sucky so you can sell some value-add to open-core software, I'm happy when it fails.

reply
anileated
1 hour ago
[-]
When LLMs are based on stolen work and violate GPL terms, which should already be illegal, it's very much okay to be furious that they additionally ruin the business models of the open source projects thanks to which they are possible in the first place.
reply
charcircuit
1 hour ago
[-]
>“Free software” means software that respects users' freedom and community. Roughly, it means that the users have the freedom to run, copy, distribute, study, change and improve the software.

https://www.gnu.org/philosophy/free-sw.html

Being able to learn from the code is a core part of the ideology embedded into the GPL. Not only that, but LLMs learning from code is fair use.

reply
jeroenhd
54 minutes ago
[-]
That freedom for many free licenses comes with the caveat that you provide basic attribution and the same freedom to your users.

LLMs don't (cannot, by design) provide attribution, nor do LLM users have the freedom to run most of these models themselves.

reply
charcircuit
38 minutes ago
[-]
That is if you redistribute or make a derivative work. Applying learnings you made from such software does not require such attribution.
reply
catlifeonmars
45 minutes ago
[-]
> Being able to learn from the code is a core part of the ideology embedded into the GPL.

I have to imagine this ideology was developed with humans in mind.

> but LLMs learning from code is fair use

If by “fair use” you mean the legal term of art, that question is still very much up in the air. If by “fair use” you mean “I think it is fair” then sure, that’s an opinion you’re entitled to have.

reply
charcircuit
28 minutes ago
[-]
>question is still very much up in the air

It is not up in the air at all. It's completely transformative.

reply
sevenzero
1 hour ago
[-]
Competition is extremely important, yes. But not the kind of competition, backed by companies with much bigger monetary assets, that overwhelms community-driven projects just to trample them down. The FFmpeg/Google affair is an example.
reply
giancarlostoro
2 hours ago
[-]
I wouldn't say open source code alone trained the models; surely CS courses and textbooks, official documentation, and transcripts of talks and courses all factor in as well.

On another note, regarding AI replacing most open source code: I forget what tool it was, but I needed a very niche way of accessing an old Android device (it was rooted), and something like Disk Drill would eventually crap out empty files. So I found a GUI someone made and started asking Claude to add things I needed: a) let me preview the directories it was seeing, and b) let me sudo up and download with a reasonable delay (1s I think). That basically worked; I never had issues again. It was a little slow to recover old photos, but oh well.

I debated pushing the code changes back to GitHub. They work as expected, but I'm sure they drifted from the maintainer's own goals.

reply
ValveFan6969
2 hours ago
[-]
"open source" and "business model" in the same sentence... next you're gonna tell me to eat pudding with a fork.
reply
jameslk
2 hours ago
[-]
https://en.wikipedia.org/wiki/Business_models_for_open-sourc...

I think you should try eating pudding with a fork next

reply
robin_reala
1 hour ago
[-]
You’d hardly eat black pudding with a spoon. https://en.wikipedia.org/wiki/Black_pudding
reply
jjgreen
27 minutes ago
[-]
reply
robin_reala
6 minutes ago
[-]
I stand corrected!
reply
bigstrat2003
2 hours ago
[-]
I mean... not what the other poster meant, but https://en.wikipedia.org/wiki/Sticky_toffee_pudding exists and is absolutely delicious.
reply
jameslk
1 hour ago
[-]
Flan is also a type of pudding (milk/egg base) which can be eaten with a fork. Other baked custards too.
reply
Grollicus
2 hours ago
[-]
reply
em-bee
29 minutes ago
[-]
Already decades ago, when we were kids, eating pudding with a fork was a fun pastime, and I am sure the idea is as old as pudding or forks themselves. I mean, the fact that it spread so fast shows that there are many who already practiced it. It's actually surprising it took this long to become a meme.

Heck, my cousin even made a bet with me to compete at eating pudding with chopsticks (and that was long before I went to China).

Practically speaking, the only downside of using a fork (or chopsticks) is scraping the bottom when you are finishing up.

reply
Aeglaecia
2 hours ago
[-]
I believe that the existence of not-for-profit organizations is a valid counterpoint to whatever your argument is.
reply
shubhamjain
1 hour ago
[-]
I feel AI will have the same degrading effect on the Internet as social media did. This flood of dumb PRs and issues is one symptom of it. Another is AI accelerating the trend which TikTok started: short, shallow, low-effort content.

It's a shame, since the technology is brilliant. But every tech company has drunk the "AI is the future" Kool-Aid, which means no one has an incentive to seriously push back against the flood of low-effort, AI-generated slop. So it's going to be a race to the bottom for a while.

reply
sevenzero
1 hour ago
[-]
It'll stop soonish. The industry is now financed by debt rather than by monetary assets that actually exist. Tons of companies see zero gain from AI, as is repeatedly reported here on HN. So all the LLM vendors will eventually have to enshittify their products (most likely through ads, shorter context windows, higher pricing, and whatnot). As of now, it is thankfully not a sustainable business model. The only sad part is that this debt will hit the poorest people hardest.
reply
duskdozer
29 minutes ago
[-]
I'm not so confident that "makes the product worse and makes them less money" is enough to stop them from doing it anyway.
reply
StrauXX
21 minutes ago
[-]
The solution for this, IMO, is flags. Just like with CTFs, host an instance of your software with a flag that can only be retrieved via a successful exploit. If someone submits the flag to you, there is no arguing about whether or not they found a valid vulnerability.

Yes, this does not work for all vulnerability classes, but it is the best compromise in my mind.
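A minimal sketch of what verification could look like under that scheme (my own illustration, not anything curl or HackerOne uses): the program host plants a random secret on the target instance, and a submitted flag proves the reporter actually achieved, say, file read or code execution.

```python
# Hypothetical flag-based bounty verification. The host generates a
# flag, plants it on the target (e.g. in a root-only file or a
# protected endpoint), and checks submissions against it.

import hmac
import secrets

def make_flag() -> str:
    """Generate the secret to plant on the bounty target instance."""
    return "FLAG{" + secrets.token_hex(16) + "}"

def submission_is_valid(planted_flag: str, submitted_flag: str) -> bool:
    """Constant-time comparison, so the check itself doesn't leak the
    flag one byte at a time."""
    return hmac.compare_digest(planted_flag, submitted_flag)

planted = make_flag()
print(submission_is_valid(planted, planted))          # True
print(submission_is_valid(planted, "FLAG{nope}"))     # False
```

As the comment notes, this only covers vulnerability classes whose exploitation can be demonstrated against a hosted instance; it says nothing about, e.g., client-side memory-safety bugs in a library like curl.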

reply
snowmobile
1 minute ago
[-]
How exactly would that work? Curl isn't exactly software that can be "hosted" somewhere, and I'm not sure where you'd hide the flag in the software? Either very few actual vulns would end up being able to retrieve the flag, or it would be trivial to retrieve the flag without an exploit.
reply
Springtime
1 hour ago
[-]
Outside of direct monetary gain like bounties, there are efforts to simply stand out, in terms of being able to show contributions to a large project or getting, say, a CVE.

Stenberg has actually written on his blog a few times about invalid or wildly overrated vulnerabilities that got assigned CVEs, and those were made by humans. I often get the sense some of these aren't just misguided reporters but deliberate attempts to make mountains out of molehills for reputation reasons. Incentives like that seem harder to account for.

reply
Snakes3727
2 hours ago
[-]
The company I work for has a pretty bad bounty system (basically a security@corp email). We have a demo system and a public API with docs. We now get around 100 or more emails a day. Most of it is slop, scams, or, my new favourite, AI security companies sending us an unprompted AI-generated pentest filled with false positives, untrue claims, etc. It has become completely useless, so no one looks at it.

I even had a sales rep call me up, unprompted, trying to book a 3-hour session to review the AI findings. When I looked at the nearly 250-page report and saw a critical IIS bug for Windows Server (doesn't exist) at a scanned IP address of 5xx.x.x.x (yes, an impossible IP), publicly available in AWS (we exclusively use GCP), I said some very choice words.

reply
arjie
1 hour ago
[-]
It makes sense. This process of searching for bugs was slow and time-consuming so it needed to be incentivized. This is no longer the case. Now the hard part is in identifying which ones are real.

To paraphrase a famous quote: AI-equipped bug hunters find 100 out of every 3 serious vulnerabilities.

reply
StrauXX
17 minutes ago
[-]
The process of finding bugs is still slow and time-consuming. The kinds of vulnerabilities you find in codebases like cURL are still beyond AI. Binary exploitation is still a human-only field.
reply
wrxd
37 minutes ago
[-]
> Now the hard part is in identifying which ones are real.

So it’s still a slow and time-consuming process.

reply
arjie
31 minutes ago
[-]
Tragically expository, wrxd. My facetiousness condemned through explanation.
reply
eknkc
3 hours ago
[-]
A list of the slop if anyone is interested:

https://gist.github.com/bagder/07f7581f6e3d78ef37dfbfc81fd1d...

reply
plastic041
2 hours ago
[-]
In the second report, Daniel greeted the slopper very kindly and tried to start a conversation with them. But the slopper called him by the completely wrong name. And this was December 2023. It must have been extremely tiring.
reply
johncoltrane
2 hours ago
[-]
> slopper

First new word of 2026. Thank you.

reply
andrewflnr
2 hours ago
[-]
Slop-monger is the term I've seen, and the more evocative one I think.
reply
ares623
2 hours ago
[-]
Sloperator
reply
kakacik
1 hour ago
[-]
Slopster
reply
ares623
1 hour ago
[-]
DevSlops engineer
reply
golem14
2 hours ago
[-]
I looked at two reports, and I can't tell if they are directly from an AI or from a very junior student not really understanding security. LLMs generally sound more convincing to me.
reply
mirekrusin
2 hours ago
[-]
Some (most?) are LLM chat copy-pastes addressing non-existent users in the conversation, like [0]. What a waste of time.

[0] https://hackerone.com/reports/2298307

reply
golem14
24 minutes ago
[-]
Yeah, that one is pretty clearly written with the help of AI. This could well be the work of a larger group, say a state actor, trying to overwhelm reviewers and crowd out real reports. And if not yet, then surely going forward...
reply
worldsavior
2 hours ago
[-]
All of those reports are clearly AI, and it's weird seeing the staff not recognize them as AI and take them seriously.
reply
potatoproduct
1 hour ago
[-]
I thought the same, except I realised some of the reports were submitted back in 2023 before AI slop exploded.
reply
ares623
2 hours ago
[-]
Orc, meet hobbits.
reply
shusaku
3 hours ago
[-]
> To replicate the issue, I have searched in the Bard about this vulnerability.

Seeing Bard mentioned as an LLM takes me back :)

reply
OsrsNeedsf2P
2 hours ago
[-]
Honestly infuriating to read. I'm so surprised cURL put up with this for so long
reply
bilekas
2 hours ago
[-]
I just read one of the slop submissions and it's baffling how anyone could submit these with a straight face.

https://hackerone.com/reports/3293884

Not even understanding the expected behaviour, and then throwing as much slop as possible to see what sticks, is the problem with generative AI.

reply
nusl
39 minutes ago
[-]
They don't care. They generate large amounts of these, spam them out, and hope for some small success. If they get banned or blocked, they make new accounts. Shame isn't even a factor; it's all about money. They don't even attempt to understand or care about a product.

This was partially the case before, where you'd still get weird spammy or extortive reports, but I guess LLMs enable random people to shoot their shot and gum up the works even more.

reply
doe88
1 hour ago
[-]
Funny how we are now sensitized to this AI slop. At first I fixated on the en dashes in the lead of the article, which made me doubt the article's author for a few seconds.
reply
nottorp
2 hours ago
[-]
What I wonder is if this will actually reduce the amount of slop.

Bounties are a motivation, but there's also promotional purposes. Show that you submitted thousands of security reports to major open source software and you're suddenly a security expert.

Remember the little IoT thing that got on here because of a security report complaining, among other things, that the Linux on it did not use systemd?

reply
bawolff
2 hours ago
[-]
I don't think bounties make you an "expert". If you want to be deemed an expert, write blog posts detailing how the exploits work. You can do that without a bounty.

In many ways, one of the biggest benefits of bug bounties is having a dedicated place where you can submit reports knowing the person on the other end wants them and isn't going to threaten to sue you.

For the most part, the money in a bug bounty isn't worth the effort needed to actually find stuff. The exception seems to be when you find some basic bug that you can automate, scan half the internet for, and submit to 100 different bug bounties.

reply
nottorp
1 hour ago
[-]
> I dont think bounties make you an "expert".

It depends to who.

> If you want to be deemed an expert, write blogs detailing how the exploit works.

That's necessary if you sell your services to people likely to enjoy HN.

reply
plastic041
2 hours ago
[-]
related: cURL stopped HackerOne bug bounty program due to excessive slop reports https://news.ycombinator.com/item?id=46678710
reply
ChrisArchitect
2 hours ago
[-]
reply
ares623
2 hours ago
[-]
Alternate headline: AI discovering so many exploits that cybersecurity can't keep up

Am I doing this right?

reply
bawolff
2 hours ago
[-]
There is a difference between AI discovering real vulnerabilities (e.g. the ffmpeg situation), and AI being used to spam fake vulnerabilities
reply
potatoproduct
1 hour ago
[-]
It's easy to discover an exploit when you're hallucinating:)
reply
catlifeonmars
1 hour ago
[-]
Is that the case?
reply
happosai
25 minutes ago
[-]
LLMs give the most likely response to a prompt. So if you prompt one with "find security bugs in this code", it will respond with "this may be a security bug" rather than "you fucking donkey, this curl code has already been eyeballed by hundreds of people; do you think a statistical model will find something new?"
reply
novalis78
2 hours ago
[-]
Just use an LLM to weed them out. What’s so hard about that?
reply
GalaxyNova
2 hours ago
[-]
Because LLMs are bad at reviewing code for the same reasons they are bad at writing it. They get tricked by fancy clean syntax and take long descriptions and comments for granted without considering the greater context.
reply
colechristensen
1 hour ago
[-]
I don't know, I prompted Opus 4.5 "Tell me the reasons why this report is stupid" on one of the example slop reports and it returned a list of pretty good answers.[1]

Give it a presumption of guilt and tell it to make a list, and an LLM can do a pretty good job of judging crap. You could very easily rig up a system to give this "why is it stupid" report and then grade the reports and only let humans see the ones that get better than a B+.

If you give them the right structure I've found LLMs to be much better at judging things than creating them.

Opus' judgement in the end:

"This is a textbook example of someone running a sanitizer, seeing output, and filing a report without understanding what they found."

1. https://claude.ai/share/8c96f19a-cf9b-4537-b663-b1cb771bfe3f

reply
exyi
55 minutes ago
[-]
Ok, run the same prompt on a legitimate bug report. The LLM will pretty much always agree with you
reply
colechristensen
38 minutes ago
[-]
find me one
reply
imiric
1 hour ago
[-]
"Tell me the reasons why this report is stupid" is a loaded prompt. The tool will generate whatever output pattern matches it, including hallucinating it. You can get wildly different output if you prompt it "Tell me the reasons why this report is great".

It's the same as if you searched the web for a specific conclusion. You will get matches for it regardless of how insane it is, leading you to believe it is correct. LLMs take this to another level, since they can generate patterns not previously found in their training data, and the output seems credible on the surface.

Trusting the output of an LLM to determine the veracity of a piece of text is a baffilingly bad idea.

reply
colechristensen
1 hour ago
[-]
>"Tell me the reasons why this report is stupid" is a loaded prompt.

This is precisely the point. The LLM has to overcome its agreeableness to reject the implied premise that the report is stupid. It does do this, though it takes a lot, and it will eventually tell you "no, actually, this report is pretty good".

Since the point is filtering out slop, we can be perfectly fine with false rejections.

The process would look like "look at all the reports, generate a list of why each of them is stupid, and then give me a list of the ten most worthy of human attention", and it would do a half-decent job of it. It could also pre-populate judgments to make the reviewer's life easier, so they could quickly glance at them and decide if a report is worthy of a deeper look.
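A minimal sketch of that pipeline, with the grading heuristic and threshold entirely made up; the `judge()` stub stands in for a real "why is this stupid, then grade it" LLM call:

```python
# Hypothetical slop-triage pipeline: grade every report and surface
# only the few best ones to a human reviewer.

def judge(report: str) -> float:
    """Stand-in for an LLM call that returns a 0-100 quality grade.
    A real implementation would send the adversarial prompt and parse
    the grade out of the response; here a crude keyword heuristic
    keeps the sketch runnable."""
    has_repro = "reproduce" in report.lower()
    has_poc = "poc" in report.lower()
    return 40.0 + 30.0 * has_repro + 30.0 * has_poc

def triage(reports: list[str], top_k: int = 10, threshold: float = 85.0) -> list[str]:
    """Return at most top_k reports that clear the bar (roughly
    'better than a B+'), best first."""
    graded = sorted(reports, key=judge, reverse=True)
    return [r for r in graded if judge(r) >= threshold][:top_k]

reports = [
    "Possible leak detected by sanitizer, no further analysis.",
    "Heap overflow, steps to reproduce attached, PoC exploit included.",
]
print(triage(reports))  # only the report with repro steps and a PoC survives
```

The asymmetry is deliberate: a slop filter can afford false rejections, so a harsh, presumption-of-guilt grader is acceptable here in a way it would not be for auto-paying bounties.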

reply
nprateem
1 hour ago
[-]
And if you ask why it's accurate it'll spaff out another list of pretty convincing answers.
reply
colechristensen
58 minutes ago
[-]
It does indeed, but at the end added:

>However, I should note: without access to the actual crash file, the specific curl version, or ability to reproduce the issue, I cannot verify this is a valid vulnerability versus expected behavior (some tools intentionally skip cleanup on exit for performance). The 2-byte leak is also very small, which could indicate this is a minor edge case or even intended behavior in certain code paths.

Even biased towards positivity it's still giving me the correct answer.

Given a neutral "judge this report" prompt we get

"This is a low-severity, non-security issue being reported as if it were a security vulnerability." with a lot more detail as to why

So positive, neutral, or negative biased prompts all result in the correct answer that this report is bogus.

reply
bootsmann
2 hours ago
[-]
If AI can't be trusted to write bug reports, why should it be trusted to review them?
reply
f311a
2 hours ago
[-]
How would that work, when LLMs produce the incorrect reports in the first place? Have a look at the actual HackerOne reports and their comments.

The problem is the complete stupidity of people. They use LLMs to try to convince the author of curl that he is wrong about the report being hallucinated. Instead of generating ten LLM comments and doubling down on their incorrect report, they could use a bit of brain power to actually validate it. It doesn't even require a lot of skill; you just have to test it manually.

reply
vee-kay
2 hours ago
[-]
Set a thief to catch a thief.
reply
eqvinox
2 hours ago
[-]
At this point it's impossible to tell if this is sarcasm or not.

Brave new world we got there.

reply