We Do Not Support Opt-Out Forms (2025) (consciousdigital.org)
55 points | 7 hours ago | 5 comments
augusteo
2 minutes ago
[-]
The irony of a site about AI opt-outs getting hammered by AI scrapers is almost too on the nose.

trollbridge's point about scrapers using residential IPs and targeting authentication endpoints matches what we've seen. The scrapers have gotten sophisticated. They're not just crawling, they're probing.

The economics are broken. Running a small site used to cost almost nothing. Now you need to either pay for CDN/protection or spend time playing whack-a-mole with bad actors.

ronsor hosting a front-page HN project on 32MB RAM is impressive and also highlights how much bloat we've normalized. The scraper problem is real, but so is the software efficiency problem.

reply
rubinlinux
2 hours ago
[-]
> Since emails are sent from the individual’s email account, they are already verified.

This is not how email works, though.

reply
blenderob
1 hour ago
[-]
This.

I wonder if it is a generation gap thing. Younger folks have probably only ever used Gmail, Proton, or another big service that abstracts away all the technical details of sending and receiving email. Without some visibility into how emails are composed and sent, they may never have learned that email headers are not a definitive source of truth: they are entirely user-defined and can be set to anything.

reply
SoftTalker
30 minutes ago
[-]
98% of email users of any generation don't have the first clue how the protocol works.
reply
pif
55 minutes ago
[-]
Ah, the good old days, when you could type out an email just by telnetting to port 25...
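
For anyone who never got to do it, the session was just typing the protocol by hand, roughly like this (hostnames and addresses made up); note that nothing verifies what you put in MAIL FROM or the From: header:

    $ telnet mail.example.com 25
    220 mail.example.com ESMTP
    HELO laptop.example.net
    250 mail.example.com
    MAIL FROM:<anyone@example.org>
    250 2.1.0 Ok
    RCPT TO:<someone@example.com>
    250 2.1.5 Ok
    DATA
    354 End data with <CR><LF>.<CR><LF>
    From: The Boss <ceo@example.org>
    To: someone@example.com
    Subject: hello

    None of these addresses were checked by anything.
    .
    250 2.0.0 Ok: queued
    QUIT
    221 2.0.0 Bye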
reply
kro
15 minutes ago
[-]
+1. Even if they validate DKIM/SPF plus alignment (i.e. DMARC), that only verifies the domain. There is no way for the receiver to verify the local part; the sending server has to be trusted to enforce proper auth.
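
To make that concrete, here is a tiny Python sketch (the header below is made up). The only identity a DKIM signature asserts is the d= tag, and that's a bare domain:

    import re

    # Made-up DKIM-Signature header, trimmed for illustration.
    header = ("DKIM-Signature: v=1; a=rsa-sha256; d=example.org; s=mail; "
              "h=from:to:subject; bh=...; b=...")

    # The only identity the signature asserts is the d= tag: a domain.
    signing_domain = re.search(r"\bd=([^;]+)", header).group(1)
    print(signing_domain)  # example.org

    # DMARC alignment just compares this domain with the domain in the
    # From: header. The local part (everything before the @) is never checked.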
reply
drcongo
5 hours ago
[-]
That site doesn't seem to support pages loading either.

edit: I feel their pain - I've spent the past week fighting AI scrapers on multiple sites hitting routes that somehow bypass Cloudflare's cache. Thousands of requests per minute, often to URLs that have never even existed. Baidu and OpenAI, I'm looking at you.

reply
storystarling
35 minutes ago
[-]
Might be worth checking if they are appending random query strings to force cache misses. Usually you can normalize the request at the edge to strip those out and protect the origin.
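
If your origin is nginx, a rough sketch of the same idea (the cache zone name and path below are made up; the Cloudflare-side equivalent is a cache rule that ignores the query string) looks like:

    # Assumes a proxy_cache_path zone named "pagecache" is defined elsewhere.
    location /blog/ {
        # Bounce random ?cachebuster=... params; the trailing "?" in the
        # rewrite strips the query string from the redirect target.
        if ($args) {
            rewrite ^(.*)$ $1? permanent;
        }

        proxy_cache      pagecache;
        proxy_cache_key  $scheme$host$uri;   # query string left out on purpose
        proxy_pass       http://127.0.0.1:8080;
    }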
reply
comrade1234
2 hours ago
[-]
Are they hitting non-existent pages? I had IP addresses scanning my personal server, including hitting pages that don't exist. I already had fail2ban running, so I just turned on the nginx filters (and had to modify the regexes a bit to get them working). I turned on the recidive jail too. It's been working great.
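
For anyone copying that setup, a minimal jail.local sketch (these are stock fail2ban filter names; log paths and limits are just example values) looks like:

    # /etc/fail2ban/jail.local
    # nginx-botsearch ships with fail2ban and matches probes for pages
    # that don't exist (phpmyadmin, wp-login and friends).
    [nginx-botsearch]
    enabled  = true
    port     = http,https
    logpath  = /var/log/nginx/access.log
    maxretry = 5

    # recidive watches fail2ban's own log and re-bans repeat offenders
    # for much longer.
    [recidive]
    enabled  = true
    logpath  = /var/log/fail2ban.log
    bantime  = 1w
    findtime = 1d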
reply
trollbridge
3 hours ago
[-]
There is currently some AI scraper that uses residential IP addresses and a variety of techniques to conceal itself, and it loves downloading Swagger-generated docs over… and over… and over.

Plus hitting the endpoints for authentication that return 403 over and over.

reply
ndriscoll
2 hours ago
[-]
My N100 mini PC can serve over 20k requests per second with nginx (well, it could, if not for the gigabit NIC limiting it). Actually, IIRC it can do more like 40k rps for 404s or 304s (again, modulo uplink).
reply
jen729w
4 hours ago
[-]
> often to URLs that have never even existed

Oh you're so deterministic.

reply
tommek4077
2 hours ago
[-]
Why are "thousands" of requests noticeable in any way? Web servers are so powerful nowadays.
reply
SoftTalker
49 minutes ago
[-]
Small, cheap VPSs that are ideal for running a small niche-interest blog or forum will easily fall over if they suddenly get thousands of requests in a short time.

Look at how many sites still get "HN hugged" (formerly known as "slashdotted").

reply
ronsor
42 minutes ago
[-]
I remember my first project posted to HN was hosted on a router with 32MB of RAM and a puny MIPS CPU; despite hitting the front page, it did not crash.

At this point, I have to assume that most software is too inefficient to be exposed to the Internet, and that becomes obvious with any real load.

reply
SoftTalker
29 minutes ago
[-]
While true, it's also true that it was (presumably) able to run and serve its intended audience until the scrapers came along.
reply
drcongo
2 hours ago
[-]
It's not just one scraper.
reply
mystraline
2 hours ago
[-]
IP-blocking Asia took my abusive scans down 95%.

I also do not have a robots.txt, so Google doesn't index me.

Got some scanners that left a message about how to index or de-index, but that was like 3 lines total in my log (that's not abusive).

But yeah, blocking the whole of Asia stopped soooo much of the net-shit.
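
One way to do it (a sketch, assuming nginx is built with ngx_http_geoip_module and you have a MaxMind country database; the country codes below are only examples) is:

    # http {} context: needs ngx_http_geoip_module and a GeoIP country DB
    geoip_country /usr/share/GeoIP/GeoIP.dat;

    map $geoip_country_code $deny_country {
        default 0;
        CN      1;   # example codes; extend to whatever you want to block
        HK      1;
        SG      1;
    }

    server {
        listen 80;

        if ($deny_country) {
            return 403;
        }

        # ... rest of the server config
    }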

reply
blenderob
1 hour ago
[-]
> I also do not have a robots.txt, so Google doesn't index me.

That doesn't sound right. I don't have a robots.txt either, but Google indexes everything for me.

reply
mystraline
1 hour ago
[-]
https://news.ycombinator.com/item?id=46681454

I think this is a recent change.

reply
daveoc64
1 hour ago
[-]
All the comments there seem to suggest that there has been no change and that robots.txt isn't required.
reply
Citizen_Lame
50 minutes ago
[-]
How did you block Asia? Cloudflare or something else?
reply
lambdaone
4 hours ago
[-]
reply
dcminter
4 hours ago
[-]
That wasn't working for me, but this one was: https://archive.ph/QCMjJ
reply