FilterHN

It would hit the server hard until the server became slow to respond, then it would back off for about 30 seconds, then hit hard again. I was able to block most of the requests with a combination of user agent and referrer patterns, though some legit users may be blocked.

The attack was annoying, but the even bigger problem is that the data on this website is licensed: we have to pay for it, and it's not cheap. We are able to pay for it (barely) with advertising revenue and some subscriptions.

If everyone is getting this data from their "agent" and scrapers, that means no advertising revenue, and soon enough no more website to scrape, jobs lost, nowhere for scrapers to scrape for the data, nowhere for legit users to get the data for free, etc.
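The user-agent and referrer blocking described above can be sketched as a small rule table. This is a minimal sketch in JavaScript (Node.js), not the poster's actual setup: the comment doesn't say which patterns were used, so every pattern and the `RULES`/`shouldBlock` names are hypothetical.

```javascript
// Hypothetical rule table: a request matches a rule only if every pattern the
// rule lists matches the corresponding request field. None of these patterns
// come from the original comment.
const RULES = [
  { userAgent: /HeadlessChrome/i },
  // A current-looking Chrome UA that arrives with no Referer header at all.
  { userAgent: /Chrome\/1\d\d\./, referrer: /^$/ },
];

function matchesRule(rule, req) {
  return Object.entries(rule).every(([field, pattern]) => pattern.test(req[field] ?? ""));
}

// Block the request if any rule matches it.
function shouldBlock(req) {
  return RULES.some((rule) => matchesRule(rule, req));
}

console.log(shouldBlock({ userAgent: "Mozilla/5.0 HeadlessChrome/120.0", referrer: "https://example.com/" })); // true
console.log(shouldBlock({ userAgent: "Mozilla/5.0 Chrome/146.0.0.0 Safari/537.36", referrer: "https://news.ycombinator.com/" })); // false
```

The second rule also shows how legit users end up blocked: a human who types the URL directly sends no referrer either.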

afinlayson

2 minutes ago

At some point there needs to be a check for whether it's a real human... But it's a cat-and-mouse game: any method we create to keep bots off gets worked around by clever engineers.

everdrive

20 minutes ago

Thanks for sharing the perspective here. I think a lot of folks on HN have rightly said that many of the problems with the modern internet are due to the ad-supported business model. I don't think we were ever going to move away from it voluntarily -- too many people support it, even if they grumble about it.

But maybe (and likely for worse) LLMs will finally kill this model.

shimman

15 minutes ago

Do you not run Anubis or have strict fail2ban rules? I just straight up ban IPs forever if they look up files that will never exist on my servers. That, plus Anubis with the strictest settings.

https://anubis.techaro.lol/
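Banning forever on requests for files that will never exist is what a fail2ban jail watching the access log does. As a rough in-process equivalent, here is a minimal JavaScript sketch; the honeypot paths, the in-memory ban set, and the `handleRequest` function are all hypothetical, and a real setup would persist bans or push them to the firewall instead.

```javascript
// Paths that will never exist on this (hypothetical) server; anything asking
// for them is probing. Adjust the list to your own site.
const HONEYPOT_PATHS = new Set(["/wp-login.php", "/.env", "/xmlrpc.php", "/.git/config"]);

// In-memory ban list. "Forever" here means until the process restarts.
const bannedIps = new Set();

// Returns the HTTP status to send: 403 for banned clients, 404 plus a
// permanent ban for honeypot hits, and 200 for everything else.
function handleRequest(ip, path) {
  if (bannedIps.has(ip)) return 403;
  if (HONEYPOT_PATHS.has(path)) {
    bannedIps.add(ip);
    return 404;
  }
  return 200;
}

console.log(handleRequest("203.0.113.5", "/.env")); // 404, and the IP is now banned
console.log(handleRequest("203.0.113.5", "/"));     // 403
```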

wiseowise

21 minutes ago

Don’t worry, man, once AGI is here you’ll get your allowance (or whatever the hyperscalers’ plan is).

oasisbob

2 hours ago

Knew it was getting bad, but Meta's facebookexternalhit bot changed its behavior recently.

In addition to pulling responses with huge amplification (40x, at least, for posting a single Facebook post to an empty audience), it's sending us traffic with fbclids in the mix. No idea why.

They're also sending tons of masked traffic from their ASN (and EC2), with a fully deceptive UserAgent.

The weirdest part, though, is that it's scraping mobile-app APIs associated with the site in high volume. We see a ton of other AI-training-focused crawlers do this, but I was surprised to see the sudden change in behavior from facebookexternalhit ... it happened in the last week or so.

Everyone is nuts these days. Got DoSed by Amazonbot this month too. They refuse to tell me what happened, citing the competitive environment.

dspillett

43 minutes ago

> it's sending us traffic with fbclids in the mix. No idea why.

The click IDs are likely to make the traffic look more like a human who has clicked a link rather than a bot? That way it gets past simple filters that explicitly let such requests in before bothering to check whether the source address belongs to a data center (DC) rather than a residential IP.

> citing the competitive environment

All the companies are competing to be the biggest inconvenience to everyone else while scraping as much stuff as they can.

oasisbob

38 minutes ago

> The click IDs are likely to make the traffic look more like a human who has clicked a link rather than a bot?

It's certainly possible. However, the traffic is still coming from Facebook's network with a FB proxy PTR record in DNS. Seems much more likely to fool your typical site owner than a bad actor.

pinkmuffinere

2 hours ago

I’ve been sitting on this page for two minutes and it’s still not sure whether I’m a bot lol. What did I do in a past life to deserve this :(

mxmlnkn

2 hours ago

After 2 minutes at 150 kHashes/s on mobile, I finally see the first pixel of the progress bar filling up. It seems like it will take hours or a day to finish. Some estimate would have been nice.

drum55

2 hours ago

Ironically, I used an LLM to write a bypass for this ridiculous tool. Doing hashing in a browser makes no sense: Claude's very bad implementation of it in C does tens of megahashes a second and passes all of the challenges nearly instantly. It took Claude about 5 minutes to write that, and it's not even a particularly fast implementation, but it beats the pants off doing string comparisons on every loop iteration in JavaScript, which is what the Anubis tool does.

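    // Excerpt of the Anubis proof-of-work loop as quoted (rest of the loop elided).
    for (;;) {
        // Hash the challenge data concatenated with the current nonce.
        const hashBuffer = await calculateSHA256(data + nonce);
        const hashArray = new Uint8Array(hashBuffer);

        // The hash is valid only if its first requiredZeroBytes bytes are all zero.
        let isValid = true;
        for (let i = 0; i < requiredZeroBytes; i++) {
            if (hashArray[i] !== 0) {
                isValid = false;
                break;
            }
        }
        // ... (remainder of the loop body elided in the quote)
    }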

It's less proof of work and just annoying to users, and feel good to whoever added it to their site, and I can't wait for it to go away. As a bonus, it's based on a misunderstanding of hashcash: it only tests for whole zero bytes instead of comparing against a finely adjustable target (as Bitcoin does, for example), so the difficulty isn't granular enough to make sense. Only a couple of the lowest settings are reasonably solvable in JavaScript, and "instantly solved" and "wait for 90 minutes" are just two values apart.
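To make the granularity point concrete: requiring whole zero bytes multiplies the expected work by 256 per difficulty step, while counting zero bits (or comparing against an arbitrary numeric target, Bitcoin-style) lets difficulty move in steps of 2x or less. This is a minimal Node.js sketch of both checks, not Anubis's code; `leadingZeroBits`, `belowTarget`, and `solve` are illustrative names.

```javascript
const { createHash } = require("node:crypto");

// Count leading zero bits of a digest, so difficulty can move one bit
// (a factor of 2 in expected work) at a time instead of one byte (a factor of 256).
function leadingZeroBits(bytes) {
  let bits = 0;
  for (const b of bytes) {
    if (b === 0) { bits += 8; continue; }
    bits += Math.clz32(b) - 24; // clz32 counts over 32 bits; a byte uses the low 8
    break;
  }
  return bits;
}

// Bitcoin-style check: read the digest as a 256-bit integer and accept it if
// it is below an arbitrary target, which allows fractional difficulty.
function belowTarget(bytes, target) {
  return BigInt("0x" + Buffer.from(bytes).toString("hex")) < target;
}

// Search nonces until SHA-256(data + nonce) has at least `bits` leading zero bits.
function solve(data, bits) {
  for (let nonce = 0; ; nonce++) {
    const digest = createHash("sha256").update(data + nonce).digest();
    if (leadingZeroBits(digest) >= bits) return { nonce, digest };
  }
}
```

With a numeric target, expected work is roughly 2^256 / target hashes, so a site could ask for 1.5x more work rather than 256x.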

Retr0id

2 hours ago

I wrote one that uses opencl: https://github.com/DavidBuchanan314/anubis_offload

drum55

2 hours ago

Bravo, you even implemented the midstate speedup from Bitcoin. That's way more impressive.

Retr0id

1 hour ago

It's not exactly rocket science heh, just baffling that the original anubis impl left an order-of-magnitude speedup on the table.
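The "midstate" trick works because every candidate starts with the same challenge prefix: the SHA-256 state after absorbing the prefix's full 64-byte blocks can be computed once and reused for every nonce. Node's `hash.copy()` (available since Node 13.1) clones that internal state, so a minimal sketch looks like this. It illustrates the idea only; it is not the anubis_offload implementation, and in JavaScript the per-call overhead may eat most of the savings.

```javascript
const { createHash } = require("node:crypto");

// Naive: hash the full prefix + nonce from scratch for every attempt.
function digestNaive(prefix, nonce) {
  return createHash("sha256").update(prefix + String(nonce)).digest("hex");
}

// Midstate-style: absorb the fixed prefix once, then clone that state for each
// nonce. Per attempt, only the leftover partial block, the nonce, and the final
// padding are processed; complete 64-byte blocks of the prefix are never rehashed.
function makeMidstateHasher(prefix) {
  const base = createHash("sha256").update(prefix);
  return (nonce) => base.copy().update(String(nonce)).digest("hex");
}

const prefix = "c".repeat(128); // hypothetical 128-byte challenge = two full blocks
const hashWithMidstate = makeMidstateHasher(prefix);
console.log(hashWithMidstate(42) === digestNaive(prefix, 42)); // true
```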

GeoAtreides

1 hour ago

>It's less proof of work and just annoying to users, and feel good to whoever added it to their site,

this is disproved by the article posted:

>And so Anubis was enabled in the tar pit at difficulty 1 (lowest setting) when requests were pouring in 24/7. Before it was enabled, it was getting several hundred-thousand requests each day. As soon as Anubis became active in there, it decreased to about 11 requests after 24 hours, most just from curious humans.

apparently it does more than annoy users and make the site owner feel good (well, I suppose effective bot blocking would make the site owner feel quite good)

Aurornis

57 minutes ago

The Anubis difficulty setting is (or was) so high that nobody could visit the site without leaving it open for minutes or hours.

GeoAtreides

55 minutes ago

>Anubis was enabled in the tar pit at difficulty 1 (lowest setting) when requests were pouring in 24/7

>difficulty 1 (lowest setting)

literally in the comment you're responding to

bawolff

2 hours ago

Shouldn't the browser also have it implemented in C? I assume crypto.subtle isn't written in JS.

drum55

2 hours ago

It doesn't matter if your hottest loop is using string comparisons. As another poster pointed out, in C you aren't even computing the majority of the second hash, because you know the result (or enough of it) before finishing it. The JavaScript version just computes whole hashes, turns each one into a Uint8Array, and then iterates through it.

yborg

2 hours ago

Maybe post your brilliant solution to the problem of commercial companies with hundreds of millions in funding scraping the Internet unrestrained for AI training, instead of complaining about people desperately trying to rein it in as individuals.

drum55

2 hours ago

Anybody can prompt Claude to implement this, which was my point: it doesn't stop bots, because a bot can literally write the bypass! My prompt was the proof-of-work function from the repository and a request for a C implementation that could solve it faster, and that was about it.

throw10920

2 hours ago

This is fallacious and extremely disrespectful (or even malicious?). You don't have to propose a way to fix a broken thing to point out that it's broken.

Normal and sane people understand this intuitively. If someone goes to a mechanic because their car is broken and the mechanic says "well, if you can tell that your car is broken, then you should be able to figure out how to fix it", that mechanic would be universally hated and go out of business in months. Same thing for a customer complaining about a dish made for them in a restaurant, or a user pointing out a bug in a piece of software.

raincole

2 hours ago

At this point I wonder if you could post a crypto miner page on HN and people would fall for it.

dheera

2 hours ago

I don't get this kHash thing. Do we have captchas mining bitcoin in a distributed fashion for free now?

throw10920

2 hours ago

The page says

> Anubis uses a Proof-of-Work scheme in the vein of Hashcash

And if you look up Hashcash on Wikipedia you get https://en.wikipedia.org/wiki/Hashcash which explains how Hashcash works in a fairly straightforward manner (unlike most math pages).

coryrc

1 hour ago

On what page? https://gladeart.com/blog/the-bot-situation-on-the-internet-... loaded effectively instantly for me.

prewett

1 hour ago

The cynic in me thinks that they’re mining bitcoin on our phones… And after completing, it claimed the page was misconfigured.

luxuryballs

1 hour ago

I think we got honeybotted.

salomonk_mur

2 hours ago

I'm surprised at the effectiveness of simple PoW to stop practically all activity.

I'll implement Anubis at low difficulty for all my projects and leave a decent llms.txt referenced in my sitemap and robots.txt, so LLMs can still get relevant data for my site while keeping bad bots out. I'm getting thousands of requests from China that have really increased costs; glad it seems the fix is rather easy.

gruez

2 hours ago

>I'm surprised at the effectiveness of simple PoW to stop practically all activity.

It's even dumber than that, because by default Anubis whitelists the curl user agent.

    curl -H "User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36" "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/?id=v7.0-rc5&id2=v7.0-rc4&dt=2"
    <!doctype html><html lang="en"><head><title>Making sure you&#39;re not a bot!</title><link rel="stylesheet"

    curl "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/?id=v7.0-rc5&id2=v7.0-rc4&dt=2"
    <!DOCTYPE html>
    <html lang='en'>
    <head>
    <title>kernel/git/torvalds/linux.git - Linux kernel source tree</title>

functionmouse

2 hours ago

shhhh don't tell the bots !

marginalia_nu

2 hours ago

Anubis's whitelists and block rules are configurable, though. The defaults are a bit silly.

xena

2 hours ago

The default is to allow non-Mozilla user agents so that existing (good) automation continues to work and so that people would stop threatening to burn my house down. Lovely people in the privacy community.
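The default described here boils down to a single test on the User-Agent header, which is also why the plain `curl` request earlier in the thread went straight through. A minimal sketch of that logic follows; it paraphrases the comment and is not Anubis's actual policy code, which is configurable.

```javascript
// Challenge only clients that claim to be a browser; everything else passes.
// (Paraphrase of the described default, not Anubis's implementation.)
function needsChallenge(userAgent = "") {
  return userAgent.includes("Mozilla");
}

const browserUA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36";
console.log(needsChallenge(browserUA));    // true: served the proof-of-work page
console.log(needsChallenge("curl/8.5.0")); // false: served the real page
```

The trade-off is that any scraper willing to send a non-browser User-Agent skips the challenge too, which is why operators may want to tighten these rules.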

wolvoleo

2 hours ago

It's definitely more than enough to stop me as a human wanting to visit the site, so yeah.

In that case a better solution would be to take the site down altogether.

xboxnolifes

2 hours ago

Take down the site entirely because a couple humans get into a fit about it?

wolvoleo

1 hour ago

I'm just saying, making visitors wait at least a minute while their device turns red hot is going to stop 99.9% of your visitors. So at that point, what's the point in trying to serve the content?

jlarocco

1 hour ago

The site's down entirely anyway. The silly "proof of work" finishes only to tell me the site is down.

What a waste of time.

simonw

2 hours ago

> These bots are almost certainly scraping data for AI training; normal bad actors don't have funding for millions of unique IPs thrown at a page. They probably belong to several different companies. Perhaps they sell their scraped data to AI companies, or they are AI companies themselves. We can't tell, but we can guess since there aren't all that many large AI corporations out there.

Is the theory here that OpenAI, Anthropic, Gemini, xAI, Qwen, Z.ai etc are all either running bad scrapers via domestic proxies in Indonesia, or are buying data from companies that run those scrapers?

I want to know for sure. Who is paying for this activity? What does the marketplace for scraped data look like?

marginalia_nu

17 minutes ago

I agree it's more than a bit handwavy. The common consensus seems to be that AI companies are driving this, but it's really hard to conclusively prove who or what is behind the attacks.

Weird part #1 is that the traffic isn't for the most part shaped like crawler traffic. It's incredibly bursty, and heavily redundant, missing even the most obvious low hanging fruit optimizations.

Could be someone is using residential proxies to wrap AI agents' web traffic, but even so, there's a lot of pieces that don't really make sense, like why the traffic pattern is like being hit by a shotgun. It isn't just one request, but anywhere between 40 and 100 redundant requests.

A popular theory is that this is sloppy coding by AI companies too rich to care, but then again that doesn't really add up. This isn't just a minor inefficiency: if it is "just" bad coding, they stand to gain monumental efficiency improvements by fixing the issues, in the sense of getting the data much faster, which is a clear competitive edge.

Really weird.

My unsubstantiated guess is that the residential proxy/botnet is very unreliable, and that's why they fire so many requests. That makes sense if it's sold as a service.

oasisbob

54 minutes ago

I want more data too.

The root sources of traffic from residential proxies get murky very quickly.

It's easy to follow the chain partway for some traffic, e.g. "Why are we receiving all this traffic from Digital Ocean? ... oh, it's their hero client Firecrawl, using a deceptive UserAgent" ... but that still leaves the obvious question of who Firecrawl's client is.

Res proxy traffic is insane these days. There are also plenty of grey-market snowshoe IPs available for the right price, from a handful of ASNs. I regularly see unified crawling missions by unknown agents using 1000+ "clean" IP addresses an hour.

ghywertelling

1 hour ago

https://parallel.ai/

I bet a lot of companies want to provide search results to AI agents.

NooneAtAll3

2 hours ago

> Before it was enabled, it was getting several hundred-thousand requests each day. As soon as Anubis became active in there, it decreased to about 11 requests after 24 hours

I love experimental data like this. So much better than the gut reactions that were spammed when Anubis was first introduced.

wolvoleo

2 hours ago

Well yeah, but I also didn't make it through to the actual site. That can't be the idea, right? After 5 seconds of 100% CPU and no progress I gave up.

The idea is to scare off bots and not normal humans.

rz2k

2 hours ago

On my computer, with Firefox it uses 14 CPU cores, consumes an extra 35 Watts, and the progress bar barely moves. Is this site mining cryptocurrency?

On Safari or Orion it is merely extremely slow to load.

I definitely wouldn't use any of this on a site that you don't want delisted for cryptojacking.

JeanMarcS

2 hours ago

I'm seeing this pattern a lot on PrestaShop websites, where thousands, if not hundreds of thousands, of requests come from bots that don't announce themselves in the User-Agent and come from different IPs.

Very annoying. And you can't filter them, because they look like legitimate traffic.

On a page with different options (such as color, size, etc.), they'll try all the combinations, eating up all the resources.
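The blow-up from trying all the combinations is multiplicative: every independent option multiplies the number of distinct URLs a naive crawler will request for a single product. A tiny sketch with hypothetical option lists and a hypothetical query-string scheme:

```javascript
// Hypothetical product options; each selection typically yields its own URL.
const options = {
  color: ["red", "blue", "green", "black", "white"],
  size: ["XS", "S", "M", "L", "XL", "XXL"],
  material: ["cotton", "linen", "wool"],
};

// Number of distinct option combinations = product of the list lengths.
function countCombinations(opts) {
  return Object.values(opts).reduce((total, values) => total * values.length, 1);
}

// Yield every combination as a query string, the way a crawler walking
// each option link would end up requesting them.
function* combinationQueries(opts) {
  const entries = Object.entries(opts);
  function* walk(i, parts) {
    if (i === entries.length) { yield parts.join("&"); return; }
    const [name, values] = entries[i];
    for (const v of values) yield* walk(i + 1, [...parts, `${name}=${v}`]);
  }
  yield* walk(0, []);
}

console.log(countCombinations(options)); // 90 (5 * 6 * 3)
```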

LeoPanthera

2 hours ago

Is Anubis being set to difficulty 8 on this page supposed to be a joke? I gave up after about 20 seconds.

lucb1e

2 hours ago

I think that must be the point they're trying to make, yes

It also drives home that Anubis needs a time estimate for sites that use it not as a "can you run JavaScript" wall but as the actual proof-of-work mechanism it purports to be.

It shows a difficulty of "8" with "794 kilohashes per second", but what does that mean? I understand the 8 must be exponential (not literally that 8 hashes are expected to find 1 solution on average), but even as a power of 2, 2^8=256 I happen to know by heart, so thousands of hashes per second would then find an answer in a fraction of a second. Or if it's 8 bytes instead of bits, then you expect to find a solution after like 8 million hashes, which at ~800k is about ten seconds. There is no way to figure out how long the expected wait is even if you understand all the text on the page (which most people wouldn't) and know some shortcuts to do the mental math (how many people know small powers of 2 by heart)
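The underlying arithmetic: if a difficulty requires k leading zero bits, the expected number of hashes is 2^k. Which unit one difficulty step represents isn't stated on the page, so this minimal sketch (hypothetical helper names) computes the expected wait for three readings of "difficulty 8" at the reported 794 kH/s. For reference, 8 bytes is 64 bits, or about 1.8 × 10^19 expected hashes.

```javascript
// Expected hashes to find a digest with `zeroBits` leading zero bits:
// each extra required bit halves the chance that a random hash qualifies.
function expectedHashes(zeroBits) {
  return 2 ** zeroBits;
}

// Expected wait in seconds. `bitsPerStep` is an assumption about what one
// difficulty unit means: 1 (bits), 4 (hex digits), or 8 (bytes).
function expectedSeconds(difficulty, bitsPerStep, hashesPerSecond) {
  return expectedHashes(difficulty * bitsPerStep) / hashesPerSecond;
}

const rate = 794_000; // "794 kilohashes per second", as reported on the page
for (const [unit, bitsPerStep] of [["bits", 1], ["hex digits", 4], ["bytes", 8]]) {
  console.log(`difficulty 8 in ${unit}: ~${expectedSeconds(8, bitsPerStep, rate).toExponential(2)} s`);
}
```

At 794 kH/s this gives roughly 0.0003 seconds if the unit is bits, about 5,400 seconds (around 90 minutes) if it is hex digits, and about 2.3 × 10^13 seconds if it is bytes.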

xiconfjs

2 hours ago

I waited a minute until my phone got hot.

goodmythical

42 minutes ago

Looks like they've gone ahead and implemented the easiest foolproof method of preventing scraping, as the site is currently not loading across multiple devices.

Not even a 404, just not available at all.

cullenking

2 hours ago

We started building out a set of spam/fraud/bot management tooling. If you already have decent infrastructure in place, this is a pretty manageable task with a mishmash of techniques: ASN-based blocking (IP lookup databases can be self-hosted and contain ASNs) for the obvious ones like Alibaba, and subnet blocking for the less obvious ones (see a pattern, block the subnet; it alleviates but doesn't solve the problem).

If you have a logging stack, you can easily find crawler/bot patterns, then flag candidate IP subnets for blocking.

It's definitely whack-a-mole, though. We are experimenting with blocking based on risk databases, which run between $2k and $10k a year depending on the provider. These map IP ranges to booleans like is_vpn, is_tor, etc., and also contain ASN information. Slightly suspicious crawling behavior or keyword flagging combined with a hit in that DB, and you have a high-confidence block.

All this stuff is now easy to home-roll with Claude. Before, it would have been a major PITA.
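A minimal sketch of the subnet-plus-risk-database decision described above, in JavaScript and IPv4-only. The blocked subnet, the risk entries (keyed by single IP for brevity, whereas real databases map ranges), and the `decide` function are hypothetical placeholders, not any provider's schema; the addresses come from the reserved documentation ranges.

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer (no validation).
function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// True if `ip` falls inside `cidr`, e.g. "203.0.113.0/24".
function inCidr(ip, cidr) {
  const [base, lenStr] = cidr.split("/");
  const len = Number(lenStr);
  const mask = len === 0 ? 0 : (0xffffffff << (32 - len)) >>> 0; // JS shifts are mod 32
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

// Hypothetical data: subnets found via log analysis, and risk-DB flags.
const BLOCKED_SUBNETS = ["203.0.113.0/24"];
const RISK_DB = new Map([["198.51.100.23", { is_vpn: true, is_tor: false }]]);

// Block outright on a subnet hit; otherwise block only when suspicious
// behaviour coincides with a risk-database flag (the "high-confidence" case).
function decide(ip, suspiciousBehaviour) {
  if (BLOCKED_SUBNETS.some((cidr) => inCidr(ip, cidr))) return "block";
  const risk = RISK_DB.get(ip);
  const flagged = Boolean(risk && (risk.is_vpn || risk.is_tor));
  return flagged && suspiciousBehaviour ? "block" : "allow";
}
```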

lizknope

1 hour ago

> The IPs of these bots here actually do not come from datacenters or VPNs most of the time; the overwhelming majority come from residential and mobile networks.

So I started searching for what these residential proxy networks actually are.

https://datadome.co/bot-management-protection/how-proxy-prov...

bob1029

2 hours ago

> safari can't open the page

What is the point of these anti bot measures if organic HN traffic can nuke your site regardless? If this is about protecting information from being acquired by undesirable parties, then this site is currently operating in the most ideal way possible.

The information will eventually be ripped out. You cannot defeat an army with direct access to TSMC's wafer start budget and Microsoft's cloud infrastructure. I would find a different hill to die on. This is exactly like the cookie banners. No one is winning anything here. Publishing information to the public internet is a binary decision. If you need to control access, you do what Netflix and countless others have done. You can't have it both ways.

charonn0

2 hours ago

Hugged to death?

https://web.archive.org/web/20260329052632/https://gladeart....

wolvoleo

2 hours ago

Thanks! I was wondering if there was an actual site behind it or if it was just a joke.

dang

2 hours ago

Added above. Thanks!

cptcobalt

2 hours ago

honestly, this should be made the main link; the Anubis at difficulty 8 is astonishingly hostile

dzogchen

2 hours ago

> we tend to take anti-bot measures very seriously

Should have maybe prioritized differently...

tromp

1 hour ago

> let webWorkerURL = `${options.basePrefix}/.within.website/x/cmd/anubis/static/js/worker/sha256-${workerMethod}.mjs?cacheBuster=${options.version}`;

It looks like it's computing sha256 hashes. Such an ASIC friendly PoW has the downside that someone with ASICs would be able to either overwhelm the site or drive up the difficulty so high that CPUs can never get through.
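The usual counter to an ASIC-friendly hash is a memory-hard function, where each guess must touch a lot of RAM. As an illustration only (Anubis uses SHA-256; this swaps in scrypt via Node's built-in `crypto.scryptSync`), here is a minimal sketch of a hashcash-style search where every attempt is an scrypt evaluation. The parameters and function names are arbitrary assumptions, and scrypt ASICs do exist (Litecoin mining uses them), so this raises the cost of specialized hardware rather than eliminating it.

```javascript
const { scryptSync } = require("node:crypto");

// Illustrative parameters: each attempt touches about 128 * N * r bytes
// (1 MiB here), so throughput is bounded by memory, not raw hashing speed.
const PARAMS = { N: 1024, r: 8, p: 1 };

function leadingZeroBits(bytes) {
  let bits = 0;
  for (const b of bytes) {
    if (b === 0) { bits += 8; continue; }
    bits += Math.clz32(b) - 24; // clz32 counts over 32 bits; a byte uses the low 8
    break;
  }
  return bits;
}

// Hashcash-style search, but each guess is an scrypt call instead of SHA-256.
function solveMemoryHard(challenge, bits) {
  for (let nonce = 0; ; nonce++) {
    const digest = scryptSync(String(nonce), challenge, 32, PARAMS);
    if (leadingZeroBits(digest) >= bits) return nonce;
  }
}

// Verification costs the server a single scrypt call.
function verify(challenge, nonce, bits) {
  return leadingZeroBits(scryptSync(String(nonce), challenge, 32, PARAMS)) >= bits;
}
```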

siva7

2 hours ago

So, the elephant in the room: how much of HN is bot-generated? Those who know have every incentive not to share, and those who don't have no way to figure it out. At this point I have to assume that every new account is a bot.

MeetingsBrowser

2 hours ago

The article is about automated web scraping, not bots writing content.

siva7

1 hour ago

The commenters here don't care what the article is about when they can't access it, and the much more concerning question isn't about web scraping.

uberman

1 hour ago

I've thought about this a bit, and I can't really see why someone would want to post AI content here other than to spam ads, but those are handled quickly. Does anyone see AI content with a clear motivation or agenda here? There are very few rep-based privileges, right? So that seems like an unlikely motivation as well.

Retr0id

36 minutes ago

Most of the HN bot accounts I see have a link-to-vibecoded-product in bio, and/or are trying to build up "organic" activity before a Show HN post for the same.

A less publicly-visible motive would be if they were building up accounts to use for paid-upvote schemes.

PowerElectronix

57 minutes ago

You can automate shilling to drive or at least influence opinion.

siva7

1 hour ago

This is a venture capitalist driven community that attracts the sleaziest kind of spammers you could think of under the badge of growth hacking and networking. Besides this very obvious motivation to spam you have all kinds of nerds here eager to do it just because they can (on one of the most famous tech places where registration is made as easy as possible)

Trufa

2 hours ago

I felt a vibe change. Some are obvious and some not, but it does feel different. The main change I've seen is in downvotes: I don't say very controversial things, yet I've had many comments very quickly downvoted and then slowly upvoted. I think HN was very slow to downvote in the past (except for obvious trolls/spam). So for me the main worry is not even the comments, but the invisible bias generated by voting.

Retr0id

2 hours ago

> Those who know have every incentive not to share

Why do you say that?

snapetom

49 minutes ago

I think HN is one of the better ones these days. I have no data to back this up, but the comments aren't like reddit comments. Go into any reddit post on the main subs, and you won't have to scroll very far to get a comment about Trump derailing the whole thing.

Digg's recent shutdown message talked about how bad and aggressive bots were. I'd love to see Kevin and Alex post in depth about lessons learned, Dead Internet, and call out social sites.

xeyownt

2 hours ago

Not sure what they are doing, but they don't seem to do it well.

alexspring

2 hours ago

You can build some great anti-bot mechanisms with simple https://github.com/abrahamjuliot/creepjs logic. A normal user will often show a 'like headless' score of 31% or lower; mobile is a bit different. You'll still have trouble against sophisticated infra: https://x.com/_alexspring/status/2037968450753335617

jwr

2 hours ago

An interesting and sad aspect of the war on bots and scraping that is being waged is that we are hurting ourselves in the process, too. Many tasks I'm trying to get my AI assistant to do cannot be done quickly, because sites defensively prohibit access to their content. I'm not scraping: it's my agent trying to fetch a page or two to perform a task for me (such as check pricing or availability).

We need a better solution.

bee_rider

1 hour ago

You aren’t scraping for the sake of training a model, but scraping the prices and availability is still scraping, right?

I think some of the folks running sites would rather have you go to the site and view the items “suggested based on your shopping history” (I consider these ads, the vendors might disagree), etc.

I’m more sympathetic to the people running sites than the LLM training scrapers, but these are two parties in a many-party game and neither one is perfectly aligned with users.

Retr0id

3 hours ago

Maybe my imagination is just too accurate but this didn't tell me anything I didn't expect to hear.

> Here is a massive log file for some activity in the Data Export tar pit:

A bit of a privacy faux pas, no? Some visitors may be legitimate.

sltkr

2 hours ago

Looks like Anubis is also blocking robots.txt which seems to defeat the point of having robots.txt in the first place.

mcv

2 hours ago

Worse than I could imagine? I imagine that bots might destroy the internet. Not just the internet as we know it; I mean make the internet completely unusable to any human being.

timshell

2 hours ago

My grad school research was on computational models of human/machine cognition, and I'm now commercializing it as a 'proof-of-human API' for bot detection, spam reduction, and identity verification.

One common mistake is assuming that AI capability means humanness. If you know exactly where to look, you can start to identify differences between improving frontier models and human cognition.

One concrete example from a forthcoming blog post of mine:

[begin]

In fact, CAPTCHAs can still be effective if you know where to look.

We ran 75 trials -- 388 total attempts -- benchmarking three frontier AI agents against reCAPTCHA v2 image challenges. We looked across two categories: static, where each image grid is an individual target, and cross-tile challenges, where an object spans multiple tiles.

On static challenges, the agents performed respectably. Claude Sonnet 4.5 solved 47%. Gemini 2.5 Pro: 56%. GPT-5: 23%.

On cross-tile challenges: Claude scored 0%. Gemini: 2%. GPT-5: 1%.

In contrast, humans find cross-tile challenges easier than static ones. If you spot one tile that matches the target, your visual system follows the object into adjacent tiles automatically.

Agents find them nearly impossible. They evaluate each tile independently, produce perfectly rectangular selections, and fail on partial occlusion and boundary-spanning objects. They process the grid as nine separate classification problems. Humans process it as one scene.

The challenges hardest for humans -- ambiguous static grids where the target is small or unclear -- are easiest for agents. The challenges easiest for humans -- follow the object across tiles -- are hardest for agents. The difficulty curves are inverted. Not because agents are dumb, but because the two systems solve the problem with fundamentally different architectures.

Faking an output means producing the right answer. Faking a process means reverse-engineering the computational dynamics of a biological brain and reproducing them in real time. The first problem can be reduced to a machine learning classifier. The second is an unsolved scientific problem.

The standard objection is that any test can be defeated with sufficient incentive. But fraudsters weren't the ones who built the visual neural networks that defeated text CAPTCHAs -- researchers were. And they aren't solving quantum computing to undermine cryptography. The cost of spoofing an iris scan is an engineering problem. The cost of reproducing human cognition is a scientific one. These are not the same category of difficulty.

[end]

ctoth

2 hours ago

How does your software work with blind people like me who use screen readers?

Your key finding is that humans process the grid as one visual scene — but that's a finding about sighted cognition.

Isn't this, like most things, a sensitivity specificity tradeoff?

How many real humans should be blocked from your system to keep the bots out?

What is the Blackstone ratio of accessibility?

gruez

2 hours ago

>The first problem can be reduced to a machine learning classifier. The second is an unsolved scientific problem.

I can't believe people are still using this as a generic anti-AI argument, even though a decade ago people were insisting that there was no way AI could have the capabilities that frontier LLMs have today. Moreover, it's unclear whether the gap even exists. Even if we accept the claim that the grid pattern is some sort of fundamental constraint that AI models can't surpass, it doesn't seem too hard to work around by infilling the grid lines and presenting the 9 images to the LLM as one image.

gostsamo

2 hours ago

I don't know if they have an issue with my FF+uBO, but Anubis has been blocking me for almost a minute. Screw them.

dmix

2 hours ago

As soon as I see that anime bot thing which this website is using I close the tab. More annoying than Cloudflare.

plandis

2 hours ago

At first glance this seems like a crypto miner.

Maybe I’m a bot, I gave up waiting before the progress bar was even 1% done.

VladVladikoff

2 hours ago

>How can you protect your sites from these bots?

JA4 fingerprinting works decently for the residential proxies.

Rasbora

1 hour ago

TLS fingerprinting is not sufficient to stop residential proxies: the proxy acts as a transparent pass-through at the TLS layer, making it trivial to use something like curl_cffi to mimic a real browser's TLS fingerprint.

However, residential proxies do have a weakness. Since they need to maintain 2 separate TCP connections, you can exploit RTT differences between layers 3 and 7 to detect whether the connection to your server is being terminated somewhere along the path. Solutions exist that can reliably detect and block residential proxies, for example: https://layer3intel.com/tripwire
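A minimal sketch of the RTT comparison described here. It assumes you already have two sets of measurements per connection: round trips observed at the network/transport level (for example from the TCP handshake, which ends at the proxy's exit node) and round trips measured at the application level (which must travel on to the real client). How tripwire actually measures these isn't described, so the inputs, thresholds, and function names are hypothetical.

```javascript
// Median is more robust than the mean against a single slow sample.
function median(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Flag a connection as likely relayed when application-level round trips are
// much slower than transport-level ones. Thresholds are illustrative only.
function looksProxied(transportRttMs, appRttMs, { ratio = 2, minExtraMs = 30 } = {}) {
  const low = median(transportRttMs);
  const high = median(appRttMs);
  return high > low * ratio && high - low > minExtraMs;
}

console.log(looksProxied([20, 22, 19], [25, 27, 24]));    // false: both legs similar
console.log(looksProxied([20, 22, 19], [180, 175, 190])); // true: big extra hop
```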

qwertyforce

2 hours ago

Noticed that Firefox gets twice the kHashes/s of Chrome (1000 vs. 500).

rekabis

2 hours ago

Taking a 2024 report on bot loads on the Internet is like taking a 1950s Car & Driver article for modern vehicle stats.

That’s how fast the landscape is changing.

And remember: while the report might have been released in 2024, it takes time to conduct research and publish. A good chunk of its data was likely from 2023 and earlier.

ricardobeat

2 hours ago

I cannot get past the bot check (190kH/s), is it mining crypto on my laptop?

RobRivera

2 hours ago

Yea it's pretty bad

garganzol

2 hours ago

Everybody says that bots take websites down, while marketing-oriented folks are starting to practice AO (agent optimization) to make their offerings even more available and penetrating.

Good luck banning yourself from the future.

abujazar

2 hours ago

What a great way to not get any traffic at all.

ColinWright

2 hours ago

Quote:

> "The idea is that at individual scales the additional load is ignorable, ..."

Three minutes, one pixel of progress bar, 2 CPUs at 100%, load average 4.3 ...

The site is not protected by Anubis, it's blocked by it.

Closed.

m3kw9

2 hours ago

Constant Face ID checks could deter it.

ctoth

2 hours ago

Please drink verification can.

vondur

2 hours ago

Ok. So I get a page saying it’s verifying I’m not a bot with some kink of measurements per second and I don’t get through. Is that the point?

neomantra

2 hours ago

She's definitely a bot with some kink!

raincole

2 hours ago

I don't get what it is or whether it's a satire or not.

If a website takes so long to verify me, I'll bounce. That's it.

Frank-Landry

2 hours ago

This sounds like something a bot would say.

AndrewKemendo

2 hours ago

The final Eternal September

lifeisstillgood

2 hours ago

This is why I see (well-managed) government digital IDs as sensible moves. Apart from DDoS attacks, if bots have to “prove” who they are on each request, it seems like a win-win.

I may be missing something of course

▲

rekabis

2 hours ago

If you want “papers, please” every time you back out of your driveway or go beyond your government-assigned oblast, then your suggestion is the digital version of the physical authoritarian nightmare imposed by totalitarian regimes throughout history.

People have a right to complete anonymity, and should be able to go across the majority of the Internet just as they can go across most of the country.

That’s what you are missing.

Don’t get me wrong, I am also in favour of a single government ID, but for combatting identity fraud, accessing public resources like single-payer healthcare, and making it easier for a person to prove their identity to authorities or employers.

It should not be used as a pass card for fundamental rights that normally would have zero government involvement.

▲

lifeisstillgood

1 hour ago

> People have a right to complete anonymity

Why? (Am not trolling. Genuinely interested)

I walk out my front door in the UK and I am not anonymous. Every transaction I make either identifies me through bank, railway or other id, or quite simply by my face standing in front of the coffee seller. My walk down the road is observed by neighbours and postmen.

Should my government arrest me without cause or trample on my free speech rights, I get that’s a problem, but I am not sure why being anonymous helps. Having rights upheld by the courts helps; well-trained police who respect the law help.

I am honestly open to debate on this, but I do find “what if Hitler took over the government, where would we be?” to be a problematic argument, not a final answer.

▲

Supermancho

1 hour ago

> Should my government arrest me without cause or trample on my free speech rights, I get that’s a problem, but I am not sure why being anonymous helps. Having rights upheld by the courts helps; well-trained police who respect the law help.

You're suggesting the same government that would violate your rights would then help prevent it? I don't follow. Historically, authoritarians have wiped away any power structure, tiered or not. Those structures will not be helping in the worst case. Ideological capture (corruption) has already started eroding UK rights, and that took a much less overt effort. America's robust three-branch system (executive, legislative, judiciary) has been corrupted by a single cult of personality. THAT was highly unlikely to happen, but here we are.

With this being said, I do predict that anonymity on the web is going to be phased out. It will result in all sorts of changes to cultural norms across Western nations that will largely curtail rights. I dread it.

Shouldn't we try tracing IP addresses and fining organizations for letting the traffic through, or for originating it in the first place? Seems a lot simpler.

▲

rekabis

11 minutes ago

> Every transaction I make either identifies me through bank, railway or other id, or quite simply by my face standing in front of the coffee seller. My walk down the road is observed by neighbours and postmen.

Are these the government? Is the bank the government? Is the rail company the government?

No? Then you have answered your own question.

A silo of identification between you and a service provider that uses the provider’s own tooling is still anonymity from government authoritarianism.

The fact that nearly all of these silos are leaky IRL (with the government eager to punch howitzer-sized holes through them for even more access) is not the point. It is a citizen-hostile flaw that needs patching through loophole-proof legislation, not an ID system that would violently eradicate any remaining separation of government from capitalism.

Remember: when government and capitalism ride in the same cart, it is called corporatism, and it is the basis of fascism. That is what is happening to America.

▲

carlosjobim

2 hours ago

You may be missing that it's easy and free for website owners to fix the problem. But it's Hacker News, after all. If somebody is bothered by a leaf falling on them on their walk to the corner store, the suggested solution here will be to have a full communist revolution.

▲

nba456_

2 hours ago

It is absolutely not free or easy to stop bots.

▲

gruez

2 hours ago

/s?