Sure, the people who make the AI scraper bots are going to figure out how to actually do the work. The point is that they hadn't, and this worked for quite a while.
As the botmakers circumvent, new methods of proof-of-notbot will be made available.
It's really as simple as that. If a new method comes out and your site is safe for a month or two, great! That's better than dealing with fifty requests a second, wondering if you can block whole netblocks, and if so, which.
This is like those simple things on submission forms that ask you what 7 + 2 is. Of course everyone knows that a crawler can calculate that! But it takes a human some time and work to tell the crawler HOW.
I actually find the featured article very interesting. It doesn't feel dismissive of Anubis, but rather it questions whether this particular solution makes sense or not in a constructive way.
I was talking more about some of the people here ;)
https://addons.mozilla.org/en-US/firefox/addon/uaswitcher/
If anyone wants to try themselves. This is by no means against Anubis, but raising the question: Can you even protect a domain if you force yourself to whitelist (for a full bypass) easy to guess UAs?
Anubis screws with me a lot, and often doesn't work.
There’s literally no way for you to bypass the block if you’re affected.
It's incredibly scary. I once had a bad user agent (without knowing it) and half the internet went offline for me: I couldn't even access documentation or my email provider's site, and there was no contact information or debugging information to help me resolve it, just a big middle finger from half the internet.
I haven’t had issues with any sites using Anubis (yet), but I suspect there are ways to verify that you’re a human if your browser fails the automatic check at least.
Anubis looks much better than this.
Isn't any hosting provider also this?
FaaS: Yes.
IaaS: Only if you do TLS termination at their gateway; otherwise not really, since they'd need to get into your operating system to get the keys, which might not always be easy. They could theoretically MITM the KVM terminal when you put in your disk decryption keys, but that seems unlikely.
The soccer rightsholder, LaLiga, claims that more than 50% of the pirate IPs illegally distributing its content are protected by Cloudflare. Many were using an application called DuckVision to facilitate this streaming.
Telefónica, the ISP, upon realizing they couldn’t directly block DuckVision’s IP or identify its users, decided on a drastic solution: blocking entire IP ranges belonging to Cloudflare, which continues to affect a huge number of services that had nothing to do with soccer piracy.
https://pabloyglesias.medium.com/telef%C3%B3nicas-cloudflare...
https://www.broadbandtvnews.com/2025/02/19/cloudflare-takes-...
https://community.cloudflare.com/t/spain-providers-blocks-cl...
Seriously though I do think we are going to see increasing interest in alternative nets, especially as governments tighten their control over the internet or even break away into isolated nation nets.
Think private trackers. The opposite of 4chan, which is an "alternative" that got too influential in setting the tone of the rest of the internet.
Tor even more so: the power of Tor is that the more people use it, the stronger it becomes against centralised adversaries.
The main issue with Tor, though, is its performance.
I host IRC on a hidden service, and even Facebook (lol) offers a hidden service endpoint.
All that is needed is a critical mass of people and a decent index, and we will have successfully reinvented "the Wired" from Serial Experiments Lain.
Basically we're already past the point where the web is made for actual humans, now it's made for bots.
It has, scrapers are out of control. Anubis and its ilk are a desperate measure, and some fallout is expected. And you don't get to dictate how a non-commercial site tries to avoid throttling and/or bandwidth overage bills.
That seems to be a pretty effective way, for now, to keep scrapers, spammers and other abusive behavior away. Normal users don't perform certain site actions at the speed that scraper bots do; there's no other practically relevant search engine than Google; I've never once seen an abusive bot identify itself as wget (they all try to look like a human-operated web browser); and no AI agent yet is smart enough to interpret the message "Your ISP's network appears to have been used by bot activity. Please write an email to xxx@yyy.zzz with <ABC> as the subject line (or click on this pre-filled link) and you will automatically get unblocked".
[1] https://developers.google.com/search/docs/crawling-indexing/...
How would you know, when you have already banned them?
Unless you're paying Cloudflare a LOT of money, you won't get to talk with anyone who can or will do anything about issues. They know about their issues and simply don't care.
If you don't mind taking a few minutes, perhaps put some details about your setup in a bug report?
(Not to mention all the sites which started putting country restrictions in on their generally useful instruction articles etc — argh)
You might have to show a passport when you enter France, and have your baggage and person (intrusively) scanned if you fly there, for much the same reason.
People, some of them in positions of government in some nation states, want to cause harm to the services of other states. Cloudflare was probably the easiest tradeoff for balancing security of the service with accessibility and cost to the French/Parisian taxpayer.
Not that I'm happy about any of this, but I can understand it.
It is easy to pass the challenge, but it isn't any better than Anubis.
We should repeat this until every network is cloudflared and everyone hates cloudflare and cloudflare loses all its customers and goes bankrupt. The internet would be better for it.
If you fall in the other 1% (e.g. due to using unusual browsers or specific IP ranges), cloudflare tends to be much worse
The article doesn't say, and I constantly get the most difficult Google captchas, Cloudflare block pages saying "having trouble?" (a link to submit a ticket that seems to land in /dev/null), IP blocks because of user agent spoofing, and "unsupported browser" errors when I don't spoof the user agent. The only anti-bot thing that reliably works on all my clients is Anubis. I'm really wondering what kinds of false positives you think Anubis has, since (as far as I can tell) it's a completely open and deterministic algorithm that just lets you in if you solve the challenge, and, as the author of the article demonstrated with some C code (if you don't want to run the included JavaScript that does it for you), that works even if you are a bot. And afaik that's the point: no heuristics and no false positives, just a straight game of costs, making bad scraping behavior simply cost more than implementing caching correctly or using Common Crawl.
As a legitimate open source developer and contributor to buildroot, I've had no recourse besides trying other browsers, networks, and machines, and it's triggered on several combinations.
[1]: https://anubis.techaro.lol/blog/release/v1.20.0/#chrome-wont...
I'm curious how, though, since the submitted article doesn't mention that and demonstrates curl working (which is about as low as you can go on the browser emulation front), but no time to look into it atm. Maybe it's because of an option or module that the author didn't have enabled
But that's not what Cloudflare does. Cloudflare guesses whether you are a bot and then either blocks you or not. If it currently likes you, bless your luck
Until the moment someone figures out how to generate realistic enough 3D faces.
Reminds me of the old uwu error message meme.
I think it's reasonable and fair, and something you are expected to tolerate in a free world. In fact, I think it's rather unusual to take this benign and inconsequential thing as personally as you do.
Thinking about it logically, putting some "serious" banner there would just make everything a bit more grey and boring and would make no functional difference. So why is it disliked so much?
I personally find anime kind of cringe but that's just a matter of taste.
> Thinking about it logically
This isn't about logic.
Clearly you proved that. What has sexual connotations is wildly subjective and plucking the opinion of one author/poet's critique from 15 years ago doesn't make it fact today.
There's nothing wrong with "subjective", by the way. You seem to think it discredits something (can't say what exactly), but this topic is subjective. It's not about logic (as if anything outside maths ever is).
(Even if I agree that the boss or customers should just get over it. It's not like they're drawing genitalia on screen and it's also easily explainable if they don't already know it themselves.)
But at the end of the day both are shit and we should not accept either. That includes not using one as an excuse for the other.
Also, Anubis does have a non-JS mode: the HTML header meta-refresh based challenge. It's just that the type of people who use Cloudflare or Anubis almost always just deploy the default (mostly broken) configs that block as many human people as bots. And they never realize it because they only measure such things with javascript.
I dislike those even more.
If that's true Anubis should just remove the proof-of-work part, so legitimate human visitors don't have to stare at a loading screen for several seconds while their device wastes electricity.
This is my very strong belief. To make it even clearer how absurd the present situation is, every single one of the proof-of-work systems I’ve looked at has been using SHA-256, which is basically the worst choice possible.
Proof-of-work is bad rate limiting that depends on a level playing field between real users and attackers. This is already a doomed endeavour. Using SHA-256 just makes it more obvious: there's an asymmetry factor on the order of tens of thousands between common real-user hardware and software and pretty easy attacker hardware and software. You cannot bridge such a divide. If you allow the attacker to augment it with a Bitcoin mining rig, the efficiency disparity factor can go up to tens of millions.
These proof-of-work systems are only working because attackers haven’t tried yet. And as long as attackers aren’t trying, you can settle for something much simpler and more transparent.
If they were serious about the proof-of-work being the defence, they’d at least have started with something like Argon2d.
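To make that concrete, here is a minimal sketch (Python, purely illustrative, not how Anubis actually works) of a challenge built on a memory-hard function; scrypt from the standard library stands in for Argon2d, and the parameters and difficulty are made up:

    import hashlib, secrets

    def solve(seed: bytes, difficulty_bits: int) -> int:
        # Find a nonce whose memory-hard digest has `difficulty_bits` leading zero bits.
        # scrypt with n=2**14, r=8 needs ~16 MiB per evaluation, so a GPU or mining
        # rig gains far less over a phone than it does with plain SHA-256.
        target = 1 << (256 - difficulty_bits)
        nonce = 0
        while True:
            digest = hashlib.scrypt(nonce.to_bytes(8, "big"), salt=seed,
                                    n=2**14, r=8, p=1, dklen=32)
            if int.from_bytes(digest, "big") < target:
                return nonce
            nonce += 1

    # A few hundred evaluations on average at 8 bits; still seconds, not milliseconds.
    print(solve(secrets.token_bytes(16), 8))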
I'll just quote from their blog post from January.
https://xeiaso.net/blog/2025/anubis/
Anubis also relies on modern web browser features:
- ES6 modules to load the client-side code and the proof-of-work challenge code.
- Web Workers to run the proof-of-work challenge in a separate thread to avoid blocking the UI thread.
- Fetch API to communicate with the Anubis server.
- Web Cryptography API to generate the proof-of-work challenge.
This ensures that browsers are decently modern in order to combat most known scrapers. It's not perfect, but it's a good start.
This will also lock out users who have JavaScript disabled, prevent your server from being indexed in search engines, require users to have HTTP cookies enabled, and require users to spend time solving the proof-of-work challenge.
This does mean that users using text-only browsers or older machines where they are unable to update their browser will be locked out of services protected by Anubis. This is a tradeoff that I am not happy about, but it is the world we live in now.
This makes zero sense; it is simply the wrong approach. I'm already tired of saying so and getting attacked for it. So I'm glad professional-random-Internet-bullshit-ignorer Tavis Ormandy wrote this one.
For usability reasons, Anubis only requires you to go through the proof-of-work flow once in a given period. (I think the default is once per week.) That's very little work.
Detecting that you need to occasionally send a request through a headless browser is far more of a hassle than the PoW. If you prefer LLMs to normal internet search, it'll probably consume far more compute as well.
If you keep cookies. I do not want to keep cookies for otherwise "stateless" sites. I have maybe a dozen sites whitelisted, every other site loses cookies when I close the tab.
I often think about quitting tech myself, but becoming a full-time lumberjack is certainly not an alternative for me.
But man, if tech goes straight into cyberpunk dystopia but without the cool gadgets, maybe it is the better alternative.
I tried the captcha in their login page and it made the entire page, including the puzzle piece slider, run at 2 fps.
My god, we do really live in 2025.
1) scrapers just run a full browser and wait for the page to stabilize. They did this before this thing launched, so it probably never worked.
2) The AI reading the page needs something like 5 seconds * 1600W to process it. Assuming my phone can even perform that much compute as efficiently as a server class machine, it’d take a large multiple of five seconds to do it, and get stupid hot in the process.
Note that (2) holds even if the AI is doing something smart like batch processing 10-ish articles at once.
Yes. Obviously dumb but also nearly 100% successful at the current point in time.
And likely going to stay successful as the non-protected internet still provides enough information to dumb crawlers that it’s not financially worth it to even vibe-code a workaround.
Or in other words: Anubis may be dumb, but the average crawler that completely exhausts some sites' resources is even dumber.
And so it all works out.
And so the question remains: how dumb was it exactly, when it works so well and continues to work so well?
Only if you don't care about negatively affecting real users.
I'm not convinced that makes sense.
Now ideally you would have the resources to serve all users and all the AI bots without performance degradation, but for some projects that’s not feasible.
In the end it’s all a compromise.
regarding authentication mentioned elsewhere, passing cookies is no big deal.
https://dukespace.lib.duke.edu/server/api/core/bitstreams/81...
And of all the high-profile projects implementing it, like the LKML archives, none have backed down yet, so I’m assuming the initial improvement in numbers must continue or it would have been removed since
If you want to save some $$$ you can spend like 30 minutes making a cracker like the one in the article. Just make it multithreaded, add a queue, and boom, your scraper nodes can go back to their cheap configuration. Or, since these are AI orgs we're talking about, write a GPU cracker and laugh as it solves challenges far faster than any user could.
Custom solutions aren't worth it for individual sites, but with how widespread Anubis is, it's become worth it.
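Something along these lines would do it (a sketch in Python, assuming a challenge of the rough shape the article describes: find a nonce so that SHA-256(challenge + nonce) starts with N zero hex digits; the real wire format differs):

    import hashlib
    from multiprocessing import Pool

    def search_range(args):
        # Scan one slice of the nonce space; return a valid nonce or None.
        challenge, difficulty, start, end = args
        prefix = "0" * difficulty
        for nonce in range(start, end):
            if hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith(prefix):
                return nonce
        return None

    def crack(challenge, difficulty, workers=8, chunk=100_000):
        # Hand each worker a disjoint slice; keep going until one of them wins.
        with Pool(workers) as pool:
            base = 0
            while True:
                slices = [(challenge, difficulty, base + i * chunk, base + (i + 1) * chunk)
                          for i in range(workers)]
                for found in pool.imap_unordered(search_range, slices):
                    if found is not None:
                        return found
                base += workers * chunk

    if __name__ == "__main__":
        print(crack("example-challenge", 4))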
And frankly, processing a single page of text fits within a single token window, so it likely runs for a blink (milliseconds) before moving on to the next data entry. The kicker is that it may be run thousands of times, depending on your training strategy.
At inference there's now a dedicated tool that may perform a "live" request to scrape the site contents. But then this is just pushed into a massive context window to give the next token anyway.
In reality you can do maybe a 1/10000th of that before the latency hit to real users becomes unacceptable.
And then, the cost is not per page. The cost is per cookie. Even if the cookie is rate-limited, you could easily use it for 1000 downloads.
Those two errors are multiplicative, so your numbers are probably off by about 7 orders of magnitude. The cost of the PoW is not going to be $2B, but about $200.
That's the opposite of being dismissive. The author has taken the time to deeply understand both the problem and the proposed solution, and has taken the time to construct a well-researched and well-considered argument.
This is a confusing comment because it appears you don’t understand the well-written critique in the linked blog post.
> This is like those simple things on submission forms that ask you what 7 + 2 is. Of course everyone knows that a crawler can calculate that! But it takes a human some time and work to tell the crawler HOW.
The key point in the blog post is that it’s the inverse of a CAPTCHA: The proof of work requirement is solved by the computer automatically.
You don’t have to teach a computer how to solve this proof of work because it’s designed for the computer to solve the proof of work.
It makes the crawling process more expensive because it has to actually run scripts on the page (or hardcode a workaround for specific versions) but from a computational perspective that’s actually easier and far more deterministic than trying to have AI solve visual CAPTCHA challenges.
The question is if this is the sweet spot, and I can't find anyone doing the comparative study (how many annoyed human visitors, how many humans stopped and, obviously, how many bots stopped).
Most CAPTCHAs are invisible these days, and Anubis is worse than them. Also, CAPTCHAs are not normally deployed just for visiting a site, they are mostly used when you want to submit something.
FTR, I am mostly browsing from Serbia using Firefox browser on a Linux or MacOS machine.
FWIW, I've never been stopped by Anubis, so even if it's much more rarely implemented, that's still infinitely less than 5-10 captchas a day I do see regularly. I do agree it's still different scales, but I don't trust your gut feel either. Thus a suggestion to look for a study.
And if the new style of CAPTCHAs is like this one, it's much more disturbing.
Not until they get issued government IDs they won't!
Extrapolating from current trends, some form of online ID attestation (likely based on government-issued ID[1]) will become normal in the next decade, and naturally, this will be included in the anti-bot arsenal. It will be up to the site operator to trust identities signed by the Russian government.
1. Despite what Sam Altman's eyeball company will try to sell you, government registers will always be the anchor of trust for proof-of-identity, they've been doing it for centuries and have become good at it and have earned the goodwill.
We can't just have "send me a picture of your ID" because that is pointlessly easy to spoof - just copy someone else's ID.
So there must be some verification that you, the person at the keyboard, are the same person as that ID identifies. The UK is rapidly finding out that that is extremely difficult to do reliably. Video doesn't really work reliably in all cases, and still images are too easily spoofed. It's not really surprising, though, because identifying humans reliably is hard even for humans.
We could do it at the network level, like assigning a government-issued network connection to a specific individual, so the system knows that any traffic from a given IP address belongs to that specific individual. There are obvious problems with this model, not least that IP addresses were never designed for this, and spoofing an IP becomes identity theft.
We also do need bot access for things, so there must be some method of granting access to bots.
I think that to make this work, we'd need to re-architect the internet from the ground up. To get there, I don't think we can start from here.
Various things you're not thinking of:
- "The person at the keyboard, is the same person as that ID identifies" is a high expectation, and can probably be avoided—you just need verifiable credentials and you gotta trust they're not spoofed
- Many official government IDs are digital now
- Most architectures for solving this problem involve bundling multiple identity "attestations," so proof of personhood would ultimately be a gradient. (This does, admittedly, seem complicated though ... but World is already doing it, and there are many examples of services where providing additional information confers additional trust. Blue checkmarks to name the most obvious one.)
As for what it might look like to start from the ground up and solve this problem, https://urbit.org/, for all its flaws, is the only serious attempt I know of and proves it's possible in principle, though perhaps not in practice
Why isn't it necessary to prove that the person at the keyboard is the person in the ID? That seems like the minimum bar for entry to this problem. Otherwise we can automate the ID checks and the bots can identify as humans no problem.
And how come the UK is failing so badly at this?
In fact, Japan already has this in the form of the "My Number Card". You go to a webpage, the webpage says "scan this QR code, touch your phone to your ID card, and type in your PIN code", and doing that is enough to prove to the website that you're a human. You can choose to share name/birthday/address, and it's possible to only share a subset.
Robots do not get issued these cards. The government verifies your human-ness when they issue them. Any site can use this system, not just government sites.
Is discrimination against dwarves still a thing in Germany?
I don’t know of any real world example that queries height, I mentioned it because it is part of the data set and privacy-preserving queries are technically possible. Age restrictions are the obvious example, but even there I am not aware of any commercial use, only for government services like tax filing or organ donor registry. Also, nobody really measures your height, you just tell them what to put there when you get the ID. Not so for birth dates, which they take from previous records going back to the birth certificate.
If you think this sounds suspiciously close to what businesses do with KYC, Know Your Customer, you're correct!
Without that, anyone can pretend to be their dead grandma/murder victim, or someone whose ID they stole.
> the person at the keyboard, is the same person as that ID identifies
This won't be possible to verify - you could lend your ID out to bots but that would come at the risk of being detected and blanket banned from the internet.
UK is in this weird place where there isn't one kind of ID that everyone has - for most people it's the driving licence, but obviously that's not good enough. But my general point is that UK could just look over at how other countries are doing it and copy good solutions to this problem, instead of whatever nonsense is being done right now with the age verification process being entirely outsourced to private companies.
As a Brit I personally went through a phase of not really existing — no credit card, no driving licence, expired passport - so I know how annoying this can be.
But it’s worth noting that we have this situation not because of mismanagement or technical illiteracy or incompetence but because of a pretty ingrained (centuries old) political and cultural belief that the police shouldn’t be able to ask you “papers please”. We had ID cards in World War II, everyone found them egregious and they were scrapped. It really will be discussed in those terms each time it is mentioned, and it really does come down to this original aspect of policing by consent.
So the age verification thing is running up against this lack of a pervasive ID, various KYC situations also do, we can get an ID card to satisfy verification for in-person voting if we have no others, but it is not proof of identity anywhere else, etc.
It is frustrating to people who do not have that same cultural touchstone but the “no to ID” attitude is very very normal; generally the UK prefers this idea of contextual, rather than universal ID. It’s a deliberate design choice.
Their website lists 24 supported countries (including some non-EU like UK and Norway, and missing a few of the 27 EU countries) - https://www.itsme-id.com/en-GB/coverage
But does it actually have much use outside of Belgium?
Certainly in the UK I've never come across anyone, government or private business, mentioning it - even since the law passed requiring many sites to verify that visitors are adults. I wouldn't even be familiar with the name if I hadn't learned about its being used in Belgium.
Maybe some other countries are now using it, beyond just Belgium?
Yes, you can in theory still use your ID card with a USB card reader for accessing gov services, but good luck finding up-to-date drivers for your OS or using it from a mobile, etc.
For CSAM, also AFAIK, the first 'activation' includes a visit to your local municipality to verify your identity. Unless you go via itsme, as it is an authorized CSAM key holder.
Actually, we can if we collectively decide that we should have them. Refuse to use sites that require these technologies and demand governments to solve the issue in better ways, e.g. by ensuring there are legal consequences for abusive corporations.
Manually browsing the web yourself will probably be trickier moving forward though.
Silly you, joking around like that. Can you imagine owning a toaster?! Sooo inconvenient and unproductive! Guess, if you change your housing plan, you gonna bring it along like an infectious tick? Hahah — no thank you! :D
You will own nothing and you will be happy!
(Please be reminded, failing behavioral compliance with, and/or voicing disapproval of this important moral precept, jokingly or not, is in violation of your citizenship subscription's general terms and conditions. This incident will be reported. Customer services will assist you within 48 hours. Please, do not leave your base zone until this issue has been resolved to your satisfaction.)
1. the government knowing who you are authenticating yourself to
2. or the recipient learning anything but the fact that you are a human
3. or the recipient being able to link you to a previous session if you authenticate yourself again later
The EU is trying to build such a scheme for online age verification (I'm not sure if their scheme also extends to point 3 though. Probably?).
I get it for age verification: it is difficult for a child to get a token that says they are allowed to access porn because adults around them don't want them to access porn (and even though one could sell tokens online, it effectively makes it harder to access porn as a child).
But how does it prevent someone from using their ID to get tokens for their scraper? If it's anonymous, then there is no risk in doing it, is there?
The service then links the token to your account and uses ordinary detection measures to see if you're spamming, flooding, phishing, whatever. If you do, the token gets blacklisted and you can no longer sign on to that service.
This isn't foolproof - you could still bribe random people on the street to be men/mules in the middle and do your flooding through them - but it's much harder than just spinning up ten thousand bots on a residential proxy.
The whole point is to prevent a robot from accessing the API. If you want to detect the robot based on its activity, you don't need to bother humans with the token in the first place: just monitor the activity.
Edit: ok I see the argument that the feedback mechanism could be difficult when all the website can report is "hey, you don't know me but this dude from request xyz you just authenticated fucked all my shit up". But at the end of the day, privacy preservation is an implementation detail I don't see governments guaranteeing.
Sure, I totally see how you can prevent unwanted activity by identifying the users. My question was about the privacy-preserving way. I just don't see how that would be possible.
You (Y) generate a keypair and send your public key to the attesting authority A, and keep your private key. You get a certificate.
You visit site b.com, and it asks for your identity, so you hash b.com|yourprivatekey. You submit the hash to b.com, along with a ZKP that you possess a private key that makes the hash work out, and that the private key corresponds to the public key in the certificate, and that the certificate has a valid signature from A.
If you break the rules of b.com, b.com bans your hash. Also, they set a hard rate limit on how many requests per hash are allowed. You could technically sell your hash and proof, but a scraper would need to buy up lots of them to do scraping.
Now the downside is that if you go to A and say your private key was compromised, or you lost control of it - the answer has to be tough luck. In reality, the certificates would expire after a while, so you could get a new hash every 6 months or something (and circumvent the bans), and if you lost the key, you'd need to wait out the expiry. The alternative is a scheme where you and A share a secret key - but then they can calculate your hash and conspire with b.com to unmask you.
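Leaving the ZKP machinery aside, the per-site identifier itself is just a keyed hash; a toy sketch (key handling and names are made up for illustration):

    import hashlib, hmac

    def site_pseudonym(private_key: bytes, site: str) -> str:
        # Same user + same site -> the same stable hash; two different sites get
        # values they cannot correlate without knowing the private key.
        return hmac.new(private_key, site.encode(), hashlib.sha256).hexdigest()

    key = b"private key matching the certificate from A"  # hypothetical key material
    print(site_pseudonym(key, "b.com"))      # what b.com sees, bans, and rate-limits
    print(site_pseudonym(key, "other.com"))  # unlinkable to the b.com value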
And then of course, if you need millions of certificates because b.com keeps banning you, it means that they ban you based on your activity, not based on your lack of certificate. And in that case, it feels like the certificate is useless in the first place: b.com has to monitor and ban you already.
Or am I missing something?
This will always end with live video of the person requesting to log in to provide proof of life at the very least, and if they're lazy/want more data, they'll tie in their ID verification process to their video pipeline.
It's a solution to the "grandma died but we've been collecting her Social Security benefits anyway", or "my son stole my wallet with my ID & credit card", or (god forbid) "We incapacitated/killed this person to access their bank account using facial ID".
It's also a solution to the problem advertisers, investors and platforms face of 1) wanting huge piles of video training data for free and 2) determining that a user truly is a monetizable human being and not a freeloader bot using stolen/sold credentials.
Well that's your assumption about governments, but it doesn't have to be true. There are governments that don't try to exploit their people. The question is whether such governments can have technical solutions to achieve that or not (I'm genuinely interested in understanding whether or not it's technically feasible).
I wouldn't expect the abuse rate to be higher than what it is for chip-and-pin debit cards. PKI failure modes are well understood and there are mitigations galore.
I wonder if they'd actually honor 1 instead of forcing recipients to be registered, as presumably they'd be interested in tracking user activity.
Mostly, it will be because online identities will be a market for lemons: there will be so many fake/expired/revoked identities being sold that the value of each one will be worth pennies, and that's not commensurate with the risk of someone committing crimes and linking them to your government-registered identity.
If you sell your real-world identity to other people today, and they get arrested, then the police will know your identity (obviously). How does that work with a privacy-preserving scheme? If you sell your anonymous token that says that you are a human to a machine and the machine gets arrested, then the police won't be able to know who you are, right? That was the whole point of the privacy-preserving token.
I'm genuinely interested, I don't understand how it can work technically and be privacy-preserving.
With privacy preserving cryptography the tokens are standalone and have no ties to the identity that spawned them.
No enforcement for abuse is possible.
I suspect there will be different levels of attestations, from the anonymous ("this is an adult"), to semi-anonymous ("this person was born in 20YY and resides in administrative region XYZ"), to the complete record ("This is John Quincy Smith III born on YYYY-MM-DD with ID doc number ABC123"). Somewhere in between the extremes is a pseudonymous token that's strongly tied to a single identity with non-repudiation.
Anonymous identities that can be easily churned out on demand by end-users have zero antibot utility
1. They can do it overtly in the design of the system, or covertly via side-channels, logging, or leaking bits in ways that are hard for an outsider to investigate without access to the complete source code and/or system outputs, such as not-quite-random pseudo-randoms.
or has it leaked somehow.
https://world.org/blog/announcements/new-world-id-passport-c...
I believe this is likely, and implemented in the right way, I think it will be a good thing.
A zero-knowledge way of attesting persistent pseudonymous identity would solve a lot of problems. If the government doesn’t know who you are attesting to, the service doesn’t know your real identity, services can’t correlate users, and a service always sees the same identity, then this is about as privacy-preserving as you can get with huge upside.
A social media site can ban an abusive user without them being able to simply register a new account. One person cannot operate tens of thousands of bot profiles. Crawlers can be banned once. Spammers can be locked out of email.
This is an absolutely gargantuan-sized antifeature that would single-handedly drive me out of the parts of the internet that choose to embrace this hellish tech.
The alternative is that you think people should be able to use social media platforms in ways that violate their rules, and that the platforms should not be able to refuse service to these users. I don’t think that’s a justifiable position to take, but I’m open to hearing an argument for it. Simply calling it “hellish” isn’t an argument.
And can you clarify if your position accounts for spammers? Because as far as I can see, your position is very clearly “spammers should be allowed to spam”.
Of course in the ideal world all bans would be handed out correctly, be of a justified duration, and offer due process to those banned. We don't live in that world, the incentive is emphatically NOT to handle appeals fairly and understandably. Getting truly permanently banned on a major platform can be a life changing experience.
In reality users can generally get away with signing up new accounts, but new users will be marked somehow and/or limited (e.g. green names on HN) and get extra scrutiny, and sign-ups will have friction and limits to let it not scale up to mass spammer scale. The rest is handled manually by moderation staff.
The limits to moderator power are a feature that compensates for the limits to moderator competence.
why would a government do that though? the alternative is easier and gives it more of what it wants.
Does your definition of 'privacy-preserving' distrust Google, Apple, Xiaomi, HTC, Honor, Samsung and suchlike?
Do you also distrust third-party clowns like Experian and Equifax (whose current systems have gaping security holes) and distrust large government IT projects (which are outsourced to clowns like Fujitsu who don't know what they're doing)?
Do you require it to work on all devices, including outdated phones and tablets; PCs; Linux-only devices; other networked devices like smart lightbulbs; and so on? Does it have to work in places phones aren't allowed, or mobile data/bluetooth isn't available? Does the identity card have to be as thin, flexible, durable and cheap as a credit card, precluding any built-in fingerprint sensors and suchlike?
Does the age validation have to protect against an 18-year-old passing the age check on their 16-year-old friend's account? While also being privacy-preserving enough nobody can tell the two accounts were approved with the same ID card?
Does the system also have to work on websites without user accounts, because who the hell creates a pornhub account anyway?
Does the system need to work without the government approving individual websites' access to the system? Does it also need to be support proving things like name, nationality, and right to work in the country so people can apply for bank accounts and jobs online? And yet does it need to prevent sites from requiring names just for ad targeting purposes?
Do all approvals have to be provable, so every company can prove to the government that the checks were properly carried out at the right time? Does it have to be possible to revoke cards in a timely manner, but without maintaining a huge list of revoked cards, and without every visit to a porn site triggering a call to a government server for a revocation check?
If you want to accomplish all of these goals - you're going to have a tough time.
I can easily imagine having a way to prove my age in a privacy-preserving way: a trusted party knows that I am 18+ and gives me a token that proves that I am 18+ without divulging anything else. I take that token and pass it to the website that requires me to be 18+. The website knows nothing about me other than I have a token that says I am 18+.
Of course, I can get a token and then give it to a child. Just like I can buy cigarettes and give them to a child. But the age verification helps in that I don't want children to access cigarettes, so I won't do it.
The "you are a human" verification fundamentally doesn't work, because the humans who make the bots are not aligned with the objective of the verification. If it's privacy-preserving, it means that a human can get a token, feed it to their bot and call it a day. And nobody will know who gave the token to the bot, precisely because it is privacy-preserving.
More specifically, I do not know if a privacy preserving method exists. This is different from thinking that it doesn't exist.
I don't know where you live, but in my case, many. Beginning with the fact that I can buy groceries with cash.
Many e-IDs in many countries?
This is privacy-preserving and modern.
If we move to a model where the token is permanently tied to your identity, there might be an incentive for you not to risk your token being added to a blocklist. But there's no shortage of people who need a bit of extra cash and for whom it's not a bad trade. So there will be a nearly-endless supply of "burner" tokens for use by trolls, scammers, evil crawlers, etc.
Frankly, it's something I'm sad we don't yet see a lawsuit for, similar to The Times v. OpenAI. A lot of "new crawlers" claim to innocently forget about established standards like robots.txt.
I just wish people would name and shame the massive companies at the top stomping on the rest of the internet in an edge to "get a step up over the competition".
I understand and agree with what you are saying though, the cat and mouse is not necessarily technical. Part of solving the searchbot issue was also social, with things like robots.txt being a social contract between companies and websites, not a technical one.
It might be a tool in the box. But it’s still cat and mouse.
In my place we quickly concluded the scrapers have tons of compute and the “proof-of-work” aspect was meaningless to them. It’s simply the “response from site changed, need to change our scraping code” aspect that helps.
That's what I was hoping to get from the "Numbers" section.
I generally don't look up the logs or numbers on my tiny, personal web spaces hosted on my server, and I imagine I could, at some point, become the victim of aggressive crawling (or maybe I have without noticing because I've got an oversized server on a dual link connection).
But the numbers actually only show the performance of doing the PoW, not the effect it has had on any site — I am just curious, and I'd love it if someone has done the analysis, ideally grouped by the bot type ("OpenAI bot was responsible for 17% of all requests, this got reduced from 900k requests a day to 0 a day"...). Search, unfortunately, only gives me all the "Anubis is helping fight aggressive crawling" blog articles, nothing with substance (I haven't tried hard, I admit).
Edit: from further down the thread there's https://dukespace.lib.duke.edu/server/api/core/bitstreams/81... but no analysis of how many real customers were denied — more data would be even better
Yes, for these human-based challenges. But this challenge is defined in code. It's not like crawlers don't run JavaScript. It's 2025, they all use headless browsers, not curl.
The mascot artist wrote in here in another thread about the design philosophies, and they are IMO a lot more honorable in comparison (to BigCo).
Besides, it's MIT FOSS. Can't a site operator shoehorn in their own image if they were so inclined?
negative signaling works!
I look forward for this to be taken to the logical extreme when a niche subculture of internet nerds change their entire online persona to revolve around scat pornography to spite "the normals", I'm sure they'll be remembered fondly as witty and intelligent and not at all as mentally ill young people.
Would I do that again? Probably not. These days I’d require a weekly mDL or equivalent credential presentation.
I have to disagree that an anti-bot measure that only works globally for a few weeks until bots trivially bypass it is effective. In an arms race against bots the bots win. You have to outsmart them by challenging them to do something that only a human can do or is actually prohibitively expensive for bots to do at scale. Anubis doesn't pass that test. And now it’s littered everywhere defunct and useless.
Yes, but the fundamental problem is that the AI crawler does the same amount of work as a legitimate user, not more.
So if you design the work such that it takes five seconds on a five year old smartphone, it could inconvenience a large portion of your user base. But once that scheme is understood by the crawler, it will delay the start of their aggressive crawling by... well-under five seconds.
An open source javascript challenge as a crawler blocker may work until it gets large enough for crawlers to care, but then they just have an engineer subscribe to changes on GitHub and have new challenge algorithms implemented before the majority of the deployment base migrates.
Sure the program itself is jank in multiple ways but it solves the problem well enough.
For some sites Anubis might be fitting, but it should be mindfully deployed.
- Everything is pwned
- Security through obscurity is bad
Without taking to heart:
- What a threat model is
And settle on a kind of permanent contrarian nihilist doomerism.
Why eat greens? You'll die one day anyway.
It's quite an interesting piece, I feel like you projected something completely different onto it.
Your point is valid, but completely adjacent.
Consider:
An adaptive password hash like bcrypt or Argon2 uses a work function to apply asymmetric costs to adversaries (attackers who don't know the real password). Both users and attackers have to apply the work function, but the user gets ~constant value for it (they know the password, so to a first approx. they only have to call it once). Attackers have to iterate the function, potentially indefinitely, in the limit obtaining 0 reward for infinite cost.
A blockchain cryptocurrency uses a work function principally as a synchronization mechanism. The work function itself doesn't have a meaningfully separate adversary. Everyone obtains the same value (the expected value of attempting to solve the next round of the block commitment puzzle) for each application of the work function. And note in this scenario most of the value returned from the work function goes to a small, centralized group of highly-capitalized specialists.
A proof-of-work-based antiabuse system wants to function the way a password hash functions. You want to define an adversary and then find a way to incur asymmetric costs on them, so that the adversary gets minimal value compared to legitimate users.
And this is in fact how proof-of-work-based antispam systems function: the value of sending a single spam message is so low that the EV of applying the work function is negative.
But here we're talking about a system where legitimate users (human browsers) and scrapers get the same value for every application of the work function. The cost:value ratio is unchanged; it's just that everything is more expensive for everybody. You're getting the worst of both worlds: user-visible costs and a system that favors large centralized well-capitalized clients.
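A back-of-the-envelope way to see the difference (illustrative numbers only, not measurements):

    WORK_COST_S = 0.1   # assume each work-function call costs 100 ms of CPU

    # Password hash: the legitimate user pays once per login; an offline
    # attacker pays once per candidate password, so cost scales with guesses.
    user_cost     = 1 * WORK_COST_S        # 0.1 s
    attacker_cost = 10**9 * WORK_COST_S    # ~3 CPU-years for a billion-guess dictionary

    # Anubis-style PoW: human and scraper pay the same price per challenge,
    # so the cost:value ratio is identical on both sides.
    human_cost_per_challenge   = WORK_COST_S
    scraper_cost_per_challenge = WORK_COST_S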
There are antiabuse systems that do incur asymmetric costs on automated users. Youtube had (has?) one. Rather than simply attaching a constant extra cost for every request, it instead delivered a VM (through JS) to browsers, and programs for that VM. The VM and its programs were deliberately hard to reverse, and changed regularly. Part of their purpose was to verify, through a bunch of fussy side channels, that they were actually running on real browsers. Every time Youtube changed the VM, the bots had to do large amounts of new reversing work to keep up, but normal users didn't.
This is also how the Blu-Ray BD+ system worked.
The term of art for these systems is "content protection", which is what I think Anubis actually wants to be, but really isn't (yet?).
The problem with "this is good because none of the scrapers even bother to do this POW yet" is that you don't need an annoying POW to get that value! You could just write a mildly complicated Javascript function, or do an automated captcha.
According to whom or what data exactly?
AI operators are clearly well-funded operations and the amount of electricity and CPU power is negligible. Software like Anubis and nearly all its identical predecessors grant you access after a single "proof". So you then have free rein to scrape the whole site.
The best physical analogy are those shopping cart things where you have to insert a quarter to unlock the cart, and you presumably get it back when you return the cart.
The group of people this doesn't affect are the well-funded, a quarter is a small price to pay for leaving your cart in the middle of the parking lot.
Those that suffer the most are the ones that can't find a quarter in the cupholder so you're stuck filling your arms with groceries.
Would you be richer if they didn't charge you a quarter? (For these anti-bot tools you're paying the electric company, not the site owner.). Maybe. But if you're Scrooge McDuck who is counting?
The modern version of Anubis as of PR https://github.com/TecharoHQ/anubis/pull/749 uses a different flow. Minting a challenge generates state including 64 bytes of random data. This random data is sent to the client and used on the server side in order to validate challenge solutions.
The core problem here is that kernel.org isn't upgrading their version of Anubis as it's released. I suspect this means they're also vulnerable to GHSA-jhjj-2g64-px7c.
I think that's the valuable observation in this post. Tavis can tell me I'm wrong. :)
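In rough pseudocode, a stateful flow of that general shape looks something like this (a sketch only, not the project's actual Go implementation; the difficulty and formats are placeholders):

    import hashlib, secrets

    CHALLENGES = {}  # challenge id -> 64 random bytes kept server-side

    def mint_challenge():
        # Server: create per-client state; only the id and hex seed go to the client.
        cid = secrets.token_hex(16)
        seed = secrets.token_bytes(64)
        CHALLENGES[cid] = seed
        return cid, seed.hex()

    def verify(cid, nonce, difficulty=4):
        # Server: recompute against the stored state, then throw the state away,
        # so a solution cannot be forged offline or replayed against fresh state.
        seed = CHALLENGES.pop(cid, None)
        if seed is None:
            return False
        digest = hashlib.sha256(seed.hex().encode() + str(nonce).encode()).hexdigest()
        return digest.startswith("0" * difficulty)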
Based on my own experience fighting these AI scrapers, I feel that the way they are actually implemented means that in practice there is asymmetry in the work scrapers have to do vs humans.
The pattern these scrapers follow is that they are highly distributed. I'll see a given {ip, UA} pair make a request to /foo, immediately followed by _hundreds_ of requests from completely different {ip, UA} pairs to all the links from that page (ie: /foo/a, /foo/b, /foo/c, etc..).
This is a big part of what makes these AI crawlers such a challenge for us admins. There isn’t a whole lot we can do to apply regular rate limiting techniques: the IPs are always changing and are no longer limited to corporate ASN (I’m now seeing IPs belonging to consumer ISPs and even cell phone companies), and the User Agents all look genuine. But when looking through the logs you can see the pattern that all these unrelated requests are actually working together to perform a BFS traversal of your site.
Given this pattern, I believe that's what makes the Anubis approach actually work well in practice. A given user will encounter the challenge once when accessing the site the first time, then they'll be able to navigate through it without incurring any cost. The AI scrapers, on the other hand, would need to solve the challenge for every single one of their "nodes" (or whatever it is they would call their {ip, UA} pairs). From a site reliability perspective, I don't even care if the crawlers manage to solve the challenge or not. That it manages to slow them down enough to rate limit them as a network is enough.
To be clear: I don't disagree with you that the cost incurred by regular human users is still high. But I don't think it's fair to say that the cost to the adversary isn't asymmetrical here. It wouldn't be if the AI crawlers hadn't converged towards an implementation that behaves like a DDoS botnet.
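As a rough illustration of spotting that pattern in logs (the log tuple format here is hypothetical):

    from collections import defaultdict

    def suspicious_parents(entries, threshold=50, window_s=10):
        # entries: iterable of (timestamp, ip, user_agent, path) tuples.
        # Flag parent paths whose children were hit by many distinct (ip, UA)
        # pairs within a short window (the coordinated-BFS signature described above).
        by_parent = defaultdict(list)
        for ts, ip, ua, path in entries:
            parent = path.rsplit("/", 1)[0] or "/"
            by_parent[parent].append((ts, ip, ua))
        flagged = {}
        for parent, hits in by_parent.items():
            hits.sort()
            for i in range(len(hits)):
                pairs = set()
                j = i
                while j < len(hits) and hits[j][0] - hits[i][0] <= window_s:
                    pairs.add((hits[j][1], hits[j][2]))
                    j += 1
                if len(pairs) >= threshold:
                    flagged[parent] = len(pairs)
                    break
        return flagged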
botPain = nBotRequests * cpuWorkPerRequest * dollarsPerCpuSecond
humanPain = c_1 * max(elapsedTimePerRequest) + c_2 * avg(elapsedTimePerRequest)
The article points out that the botPain Anubis currently generates is unfortunately much too low to hit any realistic threshold. But if the cost model I've suggested above is in any way realistic, then useful improvements would include:
1. More frequent but less taxing computation demands (this assumes c_1 >> c_2)
2. Parallel computation (this improves the human experience with no effect for bots)
ETA: Concretely, regarding (1), I would tolerate 500ms lag on every page load (meaning forget about the 7-day cookie), and wouldn't notice 250ms.
Again, with Hashcash, this isn't how it works: most outbound spam messages are worthless. The point of the system is to exploit the negative exponent on the attacker's value function.
The human-labor cost of working around Anubis is unlikely to be paid unless it affects enough data to be worth dedicating time to, and the data they're trying to scrape can typically be obtained "respectfully" in those cases -- instead of hitting the git blame route on every file of every commit of every repo, just clone the repos and run it locally, etc.
I claim that the cost for the two classes of user are meaningfully different: bots care exclusively about the total CPU usage, while humans care about some subjective combination of average and worst-case elapsed times on page loads. Because the sheer number of requests done by bots is so much higher, there's an opportunity to hurt them disproportionately according to their cost model by tweaking Anubis to increase the frequency of checks but decrease each check's elapsed time below the threshold of human annoyance.
No, that's missing the point. Anubis is effectively a DDoS protection system, all the talking about AI bots comes from the fact that the latest wave of DDoS attacks was initiated by AI scrapers, whether intentionally or not.
If these bots would clone git repos instead of unleashing the hordes of dumbest bots on Earth pretending to be thousands and thousands of users browsing through git blame web UI, there would be no need for Anubis.
The only "justification" there would be is that it keeps the server online that struggled under load before deploying it. That's the whole reason why major FLOSS projects and code forges have deployed Anubis. Nobody cares about bots downloading FLOSS code or kernel mailing lists archives; they care about keeping their infrastructure running and whether it's being DDoSed or not.
(and frankly, it likely will only need to work until the bubble bursts, making "the long run" irrelevant)
Now I get why people are so weirdly being dismissive about the whole thing. Good luck, it's not going to "burst" any time soon.
Or rather, a "burst" would not change the world in the direction you want it to be.
As soon as the investment boom is over, this will be largely gone. LLMs will continue to be trained and data will continue to be scraped, but that alone isn't the problem. Search engine crawlers somehow manage not to DDoS the servers they pull the data from, competent AI scrapers can do the same. In fact, a competent AI scraper wouldn't even be stopped by Anubis as it is right now at all, and yet Anubis works pretty well in practice. Go figure.
That depends on what you count as normal users, though. Users who want to use alternative players also have to deal with this, and since yt-dlp (and youtube-dl before it) have been able to provide a solution for those users, and bots can just do the same, I'm not sure I'd call the scheme successful in any way.
(Also, note the difference between using JavaScript for display logic and requiring JavaScript to load any content at all. Most websites do the first, the second isn't quite as common.)
It’s ineffective. (And furry sex-subculture propaganda pushed by its author, which is out of place in such software.)
if your first thought when seeing a catgirl is sex, i got bad news for you
Any access will fall into either of the following categories:
- client with JS and cookies. In this case the server now has an identity to apply rate limiting to, from the cookie. Humans should never hit it, but crawlers will be slowed down immensely or ejected. Of course the identity can be rotated — at the cost of solving the puzzle again.
- amnesiac (no cookies) clients with JS. Each access is now expensive.
(- no JS - no access.)
The point is to prevent parallel crawling and overloading the server. Crawlers can still start an arbitrary number of parallel crawls, but each one costs to start and needs to stay below some rate limit. Previously, the server would collapse under thousands of crawler requests per second. That is what Anubis is making prohibitively expensive.
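The rate limiting keyed on that cookie identity can be as simple as a sliding window (a sketch; the numbers are arbitrary):

    import time
    from collections import defaultdict, deque

    WINDOW_S = 60
    MAX_REQUESTS = 120          # per cookie identity per window
    hits = defaultdict(deque)   # cookie id -> recent request timestamps

    def allow(cookie_id: str) -> bool:
        # Rotating the identity to escape the limit means solving the puzzle again,
        # which is exactly the cost this scheme wants to impose on crawlers.
        now = time.monotonic()
        q = hits[cookie_id]
        while q and now - q[0] > WINDOW_S:
            q.popleft()
        if len(q) >= MAX_REQUESTS:
            return False
        q.append(now)
        return True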
I think TFA is generally quite good and has something of a good point about the economics of the situation, but finding the math shake out that way should, perhaps, lead one to question their starting point / assumptions[1].
In other words, who said the websites in question wanted to entirely prevent crawlers from accessing them? The answer is: no one. Web crawlers are and have been fundamental to accessing the web for decades. So why are we talking about trying to do that?
[0] Mentioning 'impenetrable wall' is probably setting off alarm bells, because of course that would be a bad design.
[1] (Edited to add:) I should say 'to question their assumptions more' -- like I said, the article is quite good and it does present this as confusing, at least.
I agree, but the advertising is the whole issue. "Checking to see you're not a bot!" and all that.
Therefore some people using Anubis expect it to be an impenetrable wall, to "block AI scrapers", especially those that believe it's a way for them to be excluded from training data.
It's why just a few days ago there was a HN frontpage post of someone complaining that "AI scrapers have learnt to get past Anubis".
But that is a fight that one will never win (analog hole as the nuclear option).
If it said something like "Wait 5 seconds, our servers are busy!", I would think that people's expectations will be more accurate.
As a robot I'm really not that sympathetic to anti-bot language backfiring on humans. I have to look away every time it comes up on my screen. If they changed their language and advertising, I'll be more sympathetic -- it's not as if I disagree that overloading servers for not much benefit is bad!
As for the presentation/advertising, I didn't get into it because I don't hold a particularly strong opinion. Well, I do hold a particularly strong opinion, but not one that really distinguishes Anubis from any of the other things. I'm fully onboard with what you're saying -- I find this sort of software extremely hostile and the fact that so many people don't[0] reminds me that I'm not a people.
In my experience, this particular jump scare is about the same as any of the other services. The website is telling me that I'm not welcome for whatever arbitrary reason it is now, and everyone involved wants me to feel bad.
Actually there is one thing I like about the Anubis experience[1] compared to the other ones, it doesn't "Would you like to play a game?" me. As a robot I appreciate the bluntness, I guess.
(the games being: "click on this. now watch spinny. more. more. aw, you lose! try again?", and "wheel, traffic light, wildcard/indistinguishable"[2]).
[0] "just ignore it, that's what I do" they say. "Oh, I don't have a problem like that. Sucks to be you."
[1] yes, I'm talking upsides about the experience of getting **ed by it. I would ask how we got here but it's actually pretty easy to follow.
[2] GCHQ et al. should provide a meatspace operator verification service where they just dump CCTV clips and you have to "click on the squares that contain: UNATTENDED BAG". Call it "phonebooth, handbag, foreign agent".
(Apologies for all the weird tangents -- I'm just entertaining myself, I think I might be tired.)
You could set up a system for parallelizing the creation of these Anubis PoW cookies independent of the crawling logic. That would probably work, but it's a pretty heavy lift compared to 'just run a browser with JavaScript'.
That being said, one point is very correct here: by far the best way to resist broad crawlers is a _custom_ anti-bot that could be as simple as "click your mouse 3 times", because handling something custom is very difficult at broad scale. It took the author just a few minutes to solve this, but for someone like Perplexity it would take hours of engineering and maintenance to implement a solution for each custom implementation, which is likely just not worth it.
You can actually see this in real life if you google web scraping services and which targets they claim to bypass: all of them bypass generic anti-bots like Cloudflare, Akamai etc., but struggle with custom and rare stuff like Chinese websites or small forums, because the scraping market is a market like any other and high-value problems are solved first. So becoming a low-value problem is a very easy way to avoid confrontation.
Isn't this what Microsoft is trying to do with their sliding puzzle piece and choose the closest match type systems?
Also, if you come in on a mobile browser it could ask you to lay your phone flat and then shake it up and down for a second or something similar that would be a challenge for a datacenter bot pretending to be a phone.
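As a rough illustration of that idea (not something Anubis does, and spoofable by anyone willing to synthesize sensor events), the standard `devicemotion` event is enough to check for a shake; note that iOS additionally requires calling `DeviceMotionEvent.requestPermission()` first:

```javascript
// Toy "shake your phone" check using the standard devicemotion event.
// Illustrative only -- not part of Anubis, and spoofable with synthetic events.
function waitForShake({ threshold = 15, needed = 10, timeoutMs = 5000 } = {}) {
  return new Promise((resolve) => {
    let spikes = 0;
    const onMotion = (e) => {
      const a = e.acceleration;
      if (!a) return;
      const magnitude = Math.hypot(a.x || 0, a.y || 0, a.z || 0);
      if (magnitude > threshold && ++spikes >= needed) {
        window.removeEventListener("devicemotion", onMotion);
        resolve(true);  // looks like a real device being shaken
      }
    };
    window.addEventListener("devicemotion", onMotion);
    setTimeout(() => {
      window.removeEventListener("devicemotion", onMotion);
      resolve(false);   // no motion: desktop, datacenter emulator, or bored human
    }, timeoutMs);
  });
}
```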
I usually just sit there on my phone pressing the "I am not a robot box" when it triggers.
- an automated browser that doesn't leak the fact it's being automated
- ability to fake the browser fingerprint (e.g. Linux is heavily penalized)
- residential or mobile proxies (for small scale your home IP is probably good enough)
- deployment environment that isn't leaked to the browser.
- realistic scrape pattern and header configuration (header order, referer, prewalk some pages with cookies etc.)
This is really hard to do at scale, but for small personal scripts you can get reasonable results with flavor-of-the-month Playwright forks on GitHub like nodriver, or dedicated tools like FlareSolverr. I'd just find a web scraping API with a low entry price, drop $15 a month, and avoid this chase, because it can be really time consuming.
If you're really on a budget - most of them offer 1,000 credits for free, which will get you an average of 100 pages a month per service, and you can sign up for 10 of them as they all mostly function the same.
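For what it's worth, the small-personal-script end of the checklist above looks roughly like this with stock Playwright for Node -- the proxy address, UA string, and URLs are placeholders, and real "stealth" forks patch far more (navigator.webdriver, canvas/WebGL fingerprints, etc.):

```javascript
// Small-scale version of the list above, using stock Playwright for Node.
// Placeholder proxy/UA/URLs; not a complete "undetectable" setup.
const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch({
    headless: true,
    proxy: { server: "http://residential-proxy.example:8080" }, // drop if unused
  });
  const context = await browser.newContext({
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
      "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    viewport: { width: 1366, height: 768 },
    locale: "en-US",
  });
  await context.setExtraHTTPHeaders({ referer: "https://www.google.com/" });
  const page = await context.newPage();
  // Prewalk a page like a human would before hitting the real target.
  await page.goto("https://example.org/", { waitUntil: "domcontentloaded" });
  await page.waitForTimeout(2000 + Math.random() * 3000); // human-ish pause
  await page.goto("https://example.org/target-page");
  console.log(await page.title());
  await browser.close();
})();
```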
Of course, you're going to have to verify each custom puzzle, aren't you?
These are trivial for an AI agent to solve though, even with very dumb watered down models.
>I think the end result is just an internet resource I need is a little harder to access, and we have to waste a small amount of energy.
No need to mimic the actual challenge process. Just change your user agent to not have "Mozilla" in it; Anubis only serves you the challenge if it has that. For myself I just made a sideloaded browser extension to override the UA header for the handful of websites I visit that use Anubis, including those two kernel.org domains.
(Why do I do it? For most of them I don't enable JS or cookies for so the challenge wouldn't pass anyway. For the ones that I do enable JS or cookies for, various self-hosted gitlab instances, I don't consent to my electricity being used for this any more than if it was mining Monero or something.)
Browser fingerprinting works best against people with unique headers. There's probably millions of people using an untouched safari on iPhone. Once you touch your user-agent header, you're likely the only person in the world with that fingerprint.
I meeeeeannn... sure? I know that browser fingerprinting works quite well without, but custom headers are actually a game over in terms of not getting tracked.
Unless you've got some superintelligence hidden somewhere, you'd choose a neural net. To train, you need a large supply of LABELED data. Seems like a challenge to build that dataset; after all, we have no scalable method for classifying as of yet.
Because servers would serve different content based on the user agent, virtually all browsers' user-agent strings start with Mozilla/5.0...
Hm. If your site is "sticky", can it mine Monero or something in the background?
We need a browser warning: "This site is using your computer heavily in a background task. Do you want to stop that?"
Doesn't Safari sort of already do that? "This tab is using significant power", or summat? I know I've seen that message, I just don't have a good repro.
Won't that break many other things? My understanding was that basically everyone's user-agent string nowadays is packed with a full suite of standard lies.
All of the API is documented at https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web... . My Anubis extension modifies request headers using `browser.webRequest.onBeforeSendHeaders.addListener()`. Your case sounds like modifying response headers, which is `browser.webRequest.onHeadersReceived.addListener()`. Either way, the API is all documented there, as is the `manifest.json` that you'll need to write to register this JS code as a background script and whatever permissions you need.
Then zip the manifest and the script together, rename the zip file to "<id_in_manifest>.xpi", place it in the sideloaded extensions directory (depends on distro, eg /usr/lib/firefox/browser/extensions), restart firefox and it should show up. If you need to debug it, you can use the about:debugging#/runtime/this-firefox page to launch a devtools window connected to the background script.
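For anyone wanting to replicate that, a minimal sketch of the background script could look like the following (Manifest V2 style, which is what blocking `webRequest` sideloads still use; the site list is a placeholder):

```javascript
// background.js -- sideloaded Firefox (MV2) extension that strips "Mozilla"
// from the User-Agent on a handful of Anubis-fronted sites (placeholder list).
// manifest.json needs "permissions": ["webRequest", "webRequestBlocking",
// "https://git.example.org/*"] and "background": { "scripts": ["background.js"] }.
const SITES = ["https://git.example.org/*"];

browser.webRequest.onBeforeSendHeaders.addListener(
  (details) => {
    for (const header of details.requestHeaders) {
      if (header.name.toLowerCase() === "user-agent") {
        header.value = "NotMozilla/1.0"; // no "Mozilla" -> no challenge page
      }
    }
    return { requestHeaders: details.requestHeaders };
  },
  { urls: SITES },
  ["blocking", "requestHeaders"]
);
```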
This anime girl is not Anubis. It's a modern cartoon character that simply borrows the name because it sounds cool, without caring about the history or meaning behind it.
Anime culture does this all the time, drawing on inspiration from all cultures but nearly always only paying the barest lip service to the original meaning.
I don't have an issue with that, personally. All cultures and religions should be fair game as inspiration for any kind of art. But I do have an issue with claiming that the newly inspired creation is equivalent in any way to the original source just because they share a name and some other very superficial characteristics.
I wasn't implying anything more than that, although now I see the confusing wording in my original comment. All I meant to say was that between the name and appearance it's clear the mascot is canid rather than feline. Not that the anime girl with dog ears is an accurate representation of the Egyptian deity haha.
> I do have an issue with claiming that the newly inspired creation is equivalent in any way to the original source
Nobody is claiming that the drawing is Anubis or even a depiction of Anubis, like the statues etc. you are interested in. It's a mascot. "Mascot design by CELPHASE" -- it says, in the screenshot.
Generally speaking -- I can't say that this is what happened with this project -- you would commission someone to draw or otherwise create a mascot character for something after the primary ideation phase of the something. This Anubis-inspired mascot is, presumably, Anubis-inspired because the project is called Anubis, which is a name with fairly obvious connections to and an understanding of "the original source".
> Anime culture does this all the time, ...
I don't know what bone you're picking here. This seems like a weird thing to say. I mean, what anime culture? It's a drawing on a website. Yes, I can see the manga/anime influence -- it's a very popular, mainstream artform around the world.
In case you feel it needs linking to the purpose of this forum, the art in question here is being forcefully shown to people in a situation that makes them do a massive context switch. I want to look at the linux or ffmpeg source code but my browser failed a security check and now I'm staring at a random anime girl instead. What's the meaning here, what's the purpose behind this? I feel that there's none, except for the library author's preference, and therefore this context switch wasted my time and energy.
Maybe I'm being unfair and the code author is so wrapped up in liking anime girls that they think it would be soothing to people who end up on that page. In which case, massive failure of understanding the target audience.
Maybe they could allow changing the art or turning it off?
> Anime culture does this all the time
>> I don't know what bone you're picking here
I'm not picking any bone there. I love anime, and I love the way it feels so free in borrowing from other cultures. That said, the anime I tend to like is more Miyazaki or Satoshi Kon and less kawaii girls.
Your workflow getting interrupted, especially with a full-screen challenge page, is a very high-stress event. The mascot serves a purpose in being particularly distinct and recognizable, but also disarming for first-time users. This emotional response was calibrated particularly for more non-technical users who would be quick to be worried about 'being hit by a virus'. In particular I find that bot challenges tend to feel very accusing ("PROVE! PROVE YOU ARE NOT A ROBOT!"), and that a little bit of silly would disarm that feeling.
Similarly, that's why the error version of the mascot looks more surprised if anything. After all, only legitimate users will ever see that. (bots don't have eyes, or at least don't particularly care)
As for the design specifically, making it more anubis-like would probably have been a bit TOO furry and significantly hurt adoption. The design prompt was to stick to a jackal girl. Then again, I kinda wished in retrospect I had made the ears much, much longer.
Thanks for sharing your design notes on the mascot!
Viewing the challenge screenshot again after reading your response definitely sheds light on why I have no aggro toward Anubis (even if the branding supposedly wouldn't jibe well with a super professional platform, but hey, I think having the alternate, commercial offering is super brilliant in turn).
On the other hand, I immediately see red when I get stopped in my tracks by all the widely used (and often infinitely-unpassable) Cloudflare/Google/etc. implementations with wordings that do nothing but add insult to injury.
Thank you for the thought you put into that. I think you guys hit it out of the park.
Now you seem to be saying that anything that isn't what you wanted to find on the website is the problem. This makes sense, it just has nothing to do with what is shown on that page. But you're effectively getting frustrated at not getting to the page you wanted to and then directing your frustration toward the presentation of the "error message". That does not make sense.
> I like to talk seriously about art, representation, and culture. What's wrong with that? It's at least as interesting as discussing databases or web frameworks.
I don't have a problem with talking about art, you'll note that I responded in kind. When I said "I think you're taking it too seriously" I wasn't expecting that to be extrapolated to all subjects, just the one that was being discussed in the local context.
It's no fun.
For one, you pulled your original response out of your ass. That the mascot is not a "catgirl" as identified by OP, but a canine variant of the same concept, because the project is named after the Egyptian god, is both obvious and uninteresting. You added nothing to that.
You're running around shouting "I get the joke, I get the joke" while grandstanding about how serious you are about art, one of the human pursuits helped least by seriousness, considering.
If you've decided you also need to be silly about it today, then at least have the decency to make up a conspiracy theory about the author being in fact a front for an Egyptian cult trying to bring back the old gods using the harvested compute, or whatever.
>massive failure of understanding the target audience.
Heh.
The anime image is put there as an intentional, and to my view rightful, act of irreverence.
One that works, too: I unironically find the people going like "my girl/boss will be mad at me if they see this style of image on my computer" positively hilarious.
>Maybe they could allow changing the art or turning it off?
They sure do. For money. Was in the release announcement.
Not enough irreverence in your game and you can end up being the person who let them build the torment nexus. Many such cases, and that's why we're where we are.
>That said, the anime I tend to like is more Miyazaki or Satoshi Kon and less kawaii girls.
A true connoisseur only watches chibi :3
If I wanted to be spoken to this way I'd make a reddit account.
Also, the anime reference is very much intentional at this point; while the source code is open so anyone can change it, the author sells a version for the boring serious types where you can easily change the logo without recompiling the source yourself. Adding the additional bottleneck of having to sync a custom fork or paying out to placate the "serious" people is a great way to get the large corporations to pay a small fee to cover maintenance.
I'd ask why you /don't/ have an aversion to that?
(yes, "not all anime" etc...)
So I'd ask why that makes you think of sexual consent.
In the '70s, if you were into computers you were most likely also a fan of Star Trek. I remember an anecdote from the 1990s when an entire dial-up ISP was troubleshooting its modem pools because there were zero people connected and they assumed there was an outage. The outage happened to occur exactly while that week's episode of X-Files was airing in their time zone. Just as the credits rolled, all modems suddenly lit up as people connected to IRC and Usenet to chat about the episode. In ~1994 close to 100% of residential internet users also happened to follow X-Files on linear television. There was essentially a 1:1 overlap between computer nerds and sci-fi nerds.
Today's analog seems to be that almost all nerds love anime and Andy Weir books and some of us feel a bit alienated by that.
Especially because (from my observation) modern "nerds" who enjoy anime seem to relish bringing it (and various sex-related things) up at inappropriate times and are generally emotionally immature.
It's quite refreshing seeing that other people have similar lines of thinking and that I'm not alone in feeling somewhat alienated.
E.g. if the nerd community had $x$ people in the Star Trek days, now there are more than $x$ nerds who like anime and more than $x$ nerds who dislike it. And the total size is much bigger than both.
This sounds more as though you actively dislike anime than merely not seeing the appeal or being "ignorant". If you were to ignore it, there wouldn't be an issue...
I don't get the impression that it's meant to be annoying; it seems like a personal preference. I can't know that, though white-labeling is a common thing people pay for even without the original brand having made their logo extra ugly.
Everything got so corporate and sterile.
mimi == ears
ComfyUI has what I think is a foxgirl as its official mascot, and that's the de-facto primary UI for generating Stable Diffusion or related content.
That's what it's for, isn't it? Make crawling slower and more expensive. Shitty crawlers not being able to run the PoW efficiently or at all is just a plus. Although:
> which is trivial for them, as the post explains
Sadly the site's being hugged to death right now so I can't really tell if I'm missing part of your argument here.
> figure out that they can simply remove "Mozilla" from their user-agent
And flag themselves in the logs to get separately blocked or rate limited. Servers win if malicious bots identify themselves again, and forcing them to change the user agent does that.
The default settings produce a computational cost of milliseconds for a week of access. For this to be relevant it would have to be significantly more expensive to the point it would interfere with human access.
So a crawler that behaves ethically and puts very little strain on the server should indeed be able to crawl for a whole week on cheap compute; one that hammers the server hard will not.
Provisioning new IPs is probably more costly than calculating the tokens, at least with the default difficulty setting.
Perhaps you just don't realize how much the scraping load has increased in the last 2 years or so. If your server can stay up after deploying Anubis, you've already won.
If it's an actual botnet, then it's hijacked computers belonging to other people, who are the ones paying the power bills. The attacker doesn't care that each computer takes a long time to calculate. If you have 1000 computers each spending 5s/page, then your botnet can retrieve 200 pages/s.
If it's just a cloud deployment, still it has resources that vastly outstrip a normal person's.
The fundamental issue is that you can't serve example.com slower than a legitimate user on a crappy 10 year old laptop could tolerate, because that starts losing you real human users. So if let's say say user is happy to wait 5 seconds per page at most, then this is absolutely no obstacle to a modern 128 core Epyc. If you make it troublesome to the 128 core monster, then no normal person will find the site usable.
The way I think it works is they provide a free VPN to users or even pay their internet bill, and then sell access to their IP.
The client just connects to a vpn and has a residential exit IP.
The cost of the VPN is probably higher than the cost for the proof of work though.
In an endless cat-and-mouse game, it won't.
But right now, it does, as these bots tend to be really dumb (presumably, a more competent botnet user wouldn't have it do an equivalent of copying Wikipedia by crawling through its every single page in the first place). With a bit of luck, it will be enough until the bubble bursts and the problem is gone, and you won't need to deploy Anubis just to keep your server running anymore.
>> So (11508 websites * 2^16 sha256 operations) / 2^21, that’s about 6 minutes to mine enough tokens for every single Anubis deployment in the world. That means the cost of unrestricted crawler access to the internet for a week is approximately $0.
>> In fact, I don’t think we reach a single cent per month in compute costs until several million sites have deployed Anubis.
And as the poster mentioned, if you are running an AI model you probably have GPUs to spare. Unlike the dev working from a 5-year-old Thinkpad or their phone.
Indeed, a new token should be requested per request; the tokens could also be pre-calculated, so that while the user is browsing a page, the browser could calculate tickets suitable for accessing the next likely browsing targets (e.g. the "next" button).
The biggest downside I see is that mobile devices would likely suffer. Possibly the difficulty of the challenge should be varied by other metrics, such as the number of requests arriving per time unit from a class C network, etc.
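To sketch that pre-calculation idea (this is the hypothetical per-request scheme described above, not how Anubis currently works -- its tokens are per-site cookies), a page could pre-solve challenges for likely next links during idle time, assuming a challenge of the form sha256(url + nonce) with N leading zero hex digits:

```javascript
// Sketch of the per-request-ticket idea above (NOT current Anubis behaviour).
// Assumed scheme: sha256(url + nonce) with `difficulty` leading zero hex digits.
async function solve(url, difficulty = 4) {
  const prefix = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    const digest = await crypto.subtle.digest(
      "SHA-256", new TextEncoder().encode(url + nonce));
    const hex = [...new Uint8Array(digest)]
      .map((b) => b.toString(16).padStart(2, "0")).join("");
    if (hex.startsWith(prefix)) return nonce;
  }
}

const tickets = new Map(); // href -> nonce, sent along with the next navigation
function precomputeLikelyTargets() {
  const links = [...document.querySelectorAll("a[rel=next], nav a")].slice(0, 5);
  requestIdleCallback(async () => {
    for (const a of links) {
      if (!tickets.has(a.href)) tickets.set(a.href, await solve(a.href));
    }
  });
}
document.addEventListener("DOMContentLoaded", precomputeLikelyTargets);
```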
The reason why anubis works is not the PoW, it is that the dev time to implement the bypass takes out the lowest effort bots. Thus the correct response is to keep the PoW difficulty low so you minimize harm to real users. Or better yet, implementing your own custom check that doesn't use any PoW and relies on ever higher obscurity to block the low effort bots.
The more anubis is used, the less effective it is and the more it harms real users.
Luckily someone had already captured an archive snapshot: https://archive.ph/BSh1l
Counterpoint - it seems to work. People use Anubis because it's the best of bad options.
If theory and reality disagree, it means either you are missing something or your theory is wrong.
I wish the old trick of sending CCP-unfriendly content to get the great firewall to kill the connection for you still worked, but in the days of TLS everywhere that doesn't seem to work anymore.
Of course we knew from the beginning that this first stage of "bots don't even try to solve it, no matter the difficulty" isn't a forever solution
I cannot claim that I understand it well, but my best guess is that these are images that represent a kind of culture that I have encountered both in real-life and online that I never felt comfortable around. It doesn't seem unreasonable that this uneasiness around people with identity-constituting interests in anime, Furries, MLP, medieval LARP, etc. transfers back onto their imagery. And to be clear, it is not like I inherently hate anime as a medium or the idea of anthropomorphism in art. There is some kind of social ineptitude around propagating these _kinds_ of interests that bugs me.
I cannot claim that I am satisfied with this explanation. I know that the dislike I feel for this is very similar to what I feel when visiting a hacker space where I don't know anyone. But I hope that I could at least give a feeling for why some people don't like seeing catgirls every time they open a repository, and that it doesn't necessarily have anything to do with advocating for a "corporate soulless web".
I'm unsure if this is deadpan humor or if the author has never tried to solve a CAPTCHA that is something like "select the squares with an orthodox rabbi present"
- https://www.htmlcenter.com/blog/now-thats-an-annoying-captch...
- https://depressedprogrammer.wordpress.com/2008/04/20/worst-c...
- https://medium.com/xato-security/a-captcha-nightmare-f6176fa...
So much this. The first time one asked me to click on "crosswalks", I genuinely had to think for a while as I struggled to remember WTF a "crosswalk" was in AmEng. I am a native English speaker, writer, editor and professionally qualified teacher, but my form of English does not have the word "crosswalk" or any word that is a synonym for it. (It has phrases instead.)
Our schoolbuses are ordinary buses with a special number on the front. They are no specific colour.
There are other examples which aren't coming immediately to mind, but it is vexing when the designer of a CAPTCHA isn't testing if I am human but if I am American.
On some Russian and Asian sites I ran into trouble signing up for a forum using translation software, because the CAPTCHA required me to enter characters I couldn't read or reproduce. It doesn't happen as often as the Google thing, but the problem certainly isn't restricted to American sites!
There are some browser extensions for it too, like NopeCHA, it works 99% of the time and saves me the hassle of doing them.
Any site using CAPTCHAs today is really only hurting their real customers and low-hanging fruit.
Of course this assumes the bots can't solve the CAPTCHA themselves, with AI, which often they can.
Early 2000s captchas really were like that.
Maybe autonomous train training would be even cooler but it's not like improving tobacco products that only have downsides
On that note, is kernel.org really using this for free and not the paid version without the anime? Linux Foundation really that desperate for cash after they gas up all the BMW's?
Think you can thank the furries for that.
Every furry I've happened to come across was very pervy in some way, so that's what immediately comes to mind when I see furry-like pictures like the one shown in the article.
YMMV
None of them were very pervy at first, only after I got to know them.
[1]: https://www.reddit.com/r/rust/comments/vyelva/why_are_there_...
https://storage.courtlistener.com/recap/gov.uscourts.miwd.11...
“The future is now, old man”
Not only is it unprofessional, courts have found it impermissible.
If it makes sense for an organization to donate to a project they rely on, then they should just donate. No need to debrand if it's not strictly required, all that would do is give the upstream project less exposure. For design reasons maybe? But LKML isn't "designed" at all, it has always exposed the raw ugly interface of mailing list software.
Also, this brand does have trust. Sure, I'm annoyed by these PoW captcha pages, but I'm a lot more likely to enable Javascript if it's the Anubis character, than if it is debranded. If it is debranded, it could be any of the privacy-invasive captcha vendors, but if it's Anubis, I know exactly what code is going to run.
It is only trusted by a small subset of people who are in the know. It is not about "anime bad" but that a large chunk of the population isn't into it, for whatever reason.
I love anime but it can also be cringe. I find this cringe as it seems many others do too.
It won't stop the crawlers immediately, but it might lead to an overhyped and underwhelming LLM release from a big name company, and force them to reassess their crawling strategy going forward?
Putting up a scraper shield seems like it's more of a political statement than a solution to a real technical problem. It's also antithetical to open collaboration and an open internet of which Linux is a product.
Meanwhile AI farms will just run their own nuclear reactors eventually and be unaffected.
I really don't understand why someone thought this was a good idea, even if well intentioned.
It seems there is a large number of operations crawling the web to build models that aren't directly using infrastructure hosted on AI farms, BUT botnets running on commodity hardware and residential networks to keep their IP ranges from being blacklisted. Anubis's point is to block those.
Because I've got the same model line but about 3 or 4 years older and it usually just flashes by in the browser Lightning from F-droid which is an OS webview wrapper. On occasion a second or maybe two, I assume that's either bad luck in finding a solution or a site with a higher difficulty setting. Not sure if I've seen it in Fennec (firefox mobile) yet but, if so, it's the same there
I've been surprised that this low threshold stops bots but I'm reading in this thread that it's rather that bot operators mostly just haven't bothered implementing the necessary features yet. It's going to get worse... We've not even won the battle let alone the war. Idk if this is going to be sustainable, we'll see where the web ends up...
I've certainly seen Anubis take a few seconds (three or four maybe) but that was on a very old phone that barely loaded any website more complex than HN.
Maybe there's going to be some form of pay per browse system? even if it's some negligible cost on the order of 1$ per month (and packaged with other costs), I think economies of scale would allow servers to perform a lifetime of S24 captchas in a couple of seconds.
This however forces servers to increase the challenge difficulty, which increases the waiting time for the first-time access.
> After further investigation and communication. This is not a bug. The threat actor group in question installed headless chrome and simply computed the proof of work. I'm just going to submit a default rule that blocks huawei.
Reducing the problem to a cost issue is bound to be short-sighted.
At the very least CAPTCHA tries to make the human-AI distinction, but these algorithms are purely on the side of making access "expensive". If it's just a capital problem, then it's not a problem for the big corpos, who are the ones incentivized to crawl in the first place!
Even if human CAPTCHA solvers are involved, at the very least that provides society with some jobs (useless as they may be), but these mining algorithms do society no good, and waste compute for nothing!
Although the long term problem is the business model of servers paying for all network bandwidth.
Actual human users have consumed a minority of total net bandwidth for decades:
https://www.atom.com/blog/internet-statistics/
Part 4 shows bots out-using humans in 1996 8-/
What are "bots"? This needs to include goggleadservices, PIA sharing for profit, real-time ad auctions, and other "non-user" traffic.
The difference between that and the LLM training data scraping, is that the previous non-human traffic was assumed, by site servers, to increase their human traffic, through search engine ranking, and thus their revenue. However the current training data scraping is likely to have the opposite effect: capturing traffic with LLM summaries, instead of redirecting it to original source sites.
This is the first major disruption to the internet's model of finance since ad revenue took over after the dot bomb.
So far, it's in the same category as the environmental disaster in progress, ownership is refusing to acknowledge the problem, and insisting on business as usual.
Rational predictions are that it's not going to end well...
Servers do not "pay for all the network bandwidth" as if they are somehow being targeted for fees and carrying water for the clients that are somehow getting it for "free". Everyone pays for the bandwidth they use, clients, servers, and all the networks in between, one way or another. Nobody out there gets free bandwidth at scale. The AI scrapers are paying lots of money to scrape the internet at the scales they do.
They are hiring machines at scale too, so bandwidth etc. is definitely cheaper for them. Maybe use a provider that doesn't have too many bandwidth issues (Hetzner?)
But still, the point is that you might be hosting a website on your small server, and a scraper with its beast of a machine fleet can come along and effectively DDoS your server looking for data to scrape. Deterring them is what matters, so that the economics finally slide back in our favour again.
When this access is beneficial to them, that's OK, when it's detrimental to them, they're paying for their own decline.
The statement isn't really concerned with what if anything the scraper operators are paying, and I don't think that really matters in reaching the conclusion.
Is the traffic that people are complaining about really training traffic?
My SWAG would be that there are maybe on the order of dozens of foundation models trained in a year. If you assume the training runs are maximally inefficient, cache nothing, and crawl every Web site 10 times for each model trained, then that means maybe a couple of hundred full-content downloads for each site in a year. But really they probably do cache, and they probably try to avoid downloading assets they don't actually want to put into the training hopper, and I'm not sure how many times they feed any given page through in a single training run.
That doesn't seem like enough traffic to be a really big problem.
On the other hand, if I ask ChatGPT Deep Research to give me a report on something, it runs around the Internet like a ferret on meth and maybe visits a couple of hundred sites (but only a few pages on each site). It'll do that a whole lot faster than I'd do it manually, it's probably less selective about what it visits than I would be... and I'm likely to ask for a lot more such research from it than I'd be willing to do manually. And the next time a user asks for a report, it'll do it again, often on the same sites, maybe with caching and maybe not.
That's not training; the results won't be used to update any neural network weights, and won't really affect anything at all beyond the context of a single session. It's "inference scraping" if you will. It's even "user traffic" in some sense, although not in the sense that there's much chance the user is going to see a site's advertising. It's conceivable the bot might check the advertising for useful information, but of course the problem there is that it's probably learned that's a waste of time.
Not having given it much thought, I'm not sure how that distinction affects the economics of the whole thing, but I suspect it does.
So what's really going on here? Anybody actually know?
There's some user-directed traffic, but it's a small fraction, in my experience.
Search for “A graph of daily requests over time, comparing different categories of AI Crawlers” on this blog: https://blog.cloudflare.com/ai-labyrinth/
AI crawlers and fetchers are blowing up websites, with Meta and OpenAI the worst offenders
But if there's a (discoverable) page comparing every revision of a page to every other revision, and a page has N revisions, there are going to be (N^2-N)/2 delta pages -- a single page with 1,000 revisions yields nearly 500,000 deltas -- so could it be that the majority of the distinct pages your wiki has are deltas?
I would think that by now the "AI companies" would have something smarter steering their scrapers. Like, I dunno, some kind of AI. But maybe they don't for some reason? Or maybe the big ones do, but smaller "hungrier" ones, with less staff but still probably with a lot of cash, are willing to burn bandwidth so they don't have to implement that?
The questions just multiply.
Search engines, at least, are designed to index the content, for the purpose of helping humans find it.
Language models are designed to filch content out of my website so they can reproduce it later without telling the humans where it came from or linking them to my site to find the source.
This is exactly the reason that "I just don't like 'AI'." You should ask the bot owners why they "just don't like appropriate copyright attribution."
You can't copyright an idea, only a specific expression of an idea. An LLM works at the level of "ideas" (in essence - for example if you subtract the vector for "woman" from "man" and add the difference to "king" you get a point very close to "queen") and reproduces them in new contexts and makes its own connections to other ideas. It would be absurd for you to demand attribution and payment every time someone who read your Python blog said "Python is dynamically type-checked and garbage-collected". Thankfully that's not how the law works. Abusive traffic is a problem, but the world is a better place if humans can learn from these ideas with the help of ChatGPT et al. and to say they shouldn't be allowed to just because your ego demands credit for every idea someone learns from you is purely selfish.
There is no proof that LLMs work at the level of "ideas"; if you could prove that, you'd solve a whole lot of incredibly expensive problems that are current bottlenecks for training and inference.
It is a bit ironic that you'd call someone wanting to control and be paid for the thing they themselves created "selfish", while at the same time writing apologia on why it's okay for a trillion dollar private company to steal someone else's work for their own profit.
It isn't some moral imperative that OpenAI gets access to all of humanity's creations so they can turn a profit.
Why this is the case while web crawlers have been scraping the web for the last 30 years is a mystery to me. This should be a solved problem. But it looks like this field is full of badly behaving companies with complete disregard for the common good.
A mix of ignorance, greed, and a bit of the tragedy of the commons. If you don't respect anyone around you, you're not going to care about any rules or etiquette that don't directly punish you. Society has definitely broken down over the decades.
It's like a secondary rate-limit on the ability of scrapers to rotate IPs, thus allowing your primary IP-based rate-limiting to remain effective.
All had the same user agent (current Safari), they seem to be from hacked computers as the ISPs are all over the world.
The structure of the requests almost certainly means we've been specifically targeted.
But it's also a valid query, reasonably for normal users to make.
From this article, it looks like Proof of Work isn't going to be the solution I'd hoped it would be.
Scaling up the math in the article, which states it would take 6 CPU-minutes to generate enough tokens to scrape 11,508 Anubis-using websites, we're now looking at 4.3 CPU-hours to obtain enough tokens to scrape your website (and 50,000 CPU-hours to scrape the Internet). This still isn't all that much -- looking at cloud VM prices, that's around 10c to crawl your website and $1000 to crawl the Internet, which doesn't seem like a lot but it's much better than "too low to even measure".
However, the article observes Anubis's default difficulty can be solved in 30ms on a single-core server CPU. That seems unreasonably low to me; I would expect something like a second to be a more appropriate difficulty. Perhaps the server is benefiting from hardware accelerated sha256, whereas Anubis has to be fast enough on clients without it? If it's possible to bring the JavaScript PoW implementation closer to parity with a server CPU (maybe using a hash function designed to be expensive and hard to accelerate, rather than one designed to be cheap and easy to accelerate), that would bring the cost of obtaining 500k tokens up to 138 CPU-hours -- about $2-3 to crawl one site, or around $30,000 to crawl all Anubis deployments.
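For what it's worth, those figures check out as back-of-the-envelope math, assuming ~2^16 hashes per token, ~2^21 hashes per second per core, one token per request, roughly 500k pages per site, and ballpark $0.02 per CPU-hour of cloud compute:

```javascript
// Back-of-the-envelope check of the figures above; every input is an
// assumption from the article or this comment, not a measurement.
const HASHES_PER_TOKEN = 2 ** 16;  // default difficulty, per the article
const HASHES_PER_SEC   = 2 ** 21;  // one server core with SHA acceleration
const PAGES_PER_SITE   = 500_000;  // assumed: one token per request
const SITES            = 11_508;   // Anubis deployments cited in the article
const USD_PER_CPU_HOUR = 0.02;     // ballpark cloud pricing

const hoursPerSite = (PAGES_PER_SITE * HASHES_PER_TOKEN) / HASHES_PER_SEC / 3600;
console.log(hoursPerSite.toFixed(1), "CPU-hours per site");               // ~4.3
console.log(Math.round(hoursPerSite * SITES), "CPU-hours for all sites");  // ~50k

// If the challenge were retuned to ~1 CPU-second per token instead of ~30 ms:
const slowHoursPerSite = PAGES_PER_SITE / 3600;
console.log(Math.round(slowHoursPerSite), "CPU-hours per site");           // ~139
console.log((slowHoursPerSite * USD_PER_CPU_HOUR).toFixed(2), "USD per site");                 // ~2.78
console.log(Math.round(slowHoursPerSite * SITES * USD_PER_CPU_HOUR), "USD for all deployments"); // ~32k
```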
I'm somewhat skeptical of the idea of Anubis -- that cost still might be way too low, especially given the billions of VC dollars thrown at any company with "AI" in their sales pitch -- but I think the article is overly pessimistic. If your goal is not to stop scrapers, but rather to incentivize scrapers to be respectful by making it cheaper to abide by rate limits than it is to circumvent them, maybe Anubis (or something like it) really is enough.
(Although if it's true that AI companies really are using botnets of hacked computers, then Anubis is totally useless against bots smart enough to solve the challenges since the bots aren't paying for the CPU time.)
The Duke University Library analysis posted elsewhere in the discussion is promising.
I'm certain the botnets are using hacked/malwared computers, as the huge majority of requests come from ISPs and small hosting providers. It's probably more common for this to be malware, e.g. a program that streams pirate TV, or a 'free' VPN app, which joins the user's device to a botnet.
As a general rule of thumb: you can sue anyone for anything in the US. There are even a few cases where someone tried to sue God: https://en.wikipedia.org/wiki/Lawsuits_against_supernatural_...
When we say "do we need" or "can we do" we're talking about the idea of how plausible it is to win case. A lawyer won't take a case with bad odds of winning, even if you want to pay extra because a part of their reputation lies on taking battles they feel they can win.
>because when I accidentally punch someones tooth out, I would assume they certainly are entitled to the dentist bill.
IANAL, so the boring answer is "it depends". Reparations aren't guaranteed, but there are 50 different state laws to consider, on top of federal law.
Generally, they are not entitled to pay for damages themselves, but they may possibly be charged with battery. Intent will be a strong factor in winning the case.
Keep in mind I'm in Germany, the server is in another EU country, and the worst scrapers are overseas (in China, the USA, and Singapore). Thanks to these LLMs there is no barrier to having the relevant laws translated in all directions, so I trust that won't be a problem! :P
Are you a criminal defense attorney or prosecutor?
> They have to had known
IMO good luck convincing a judge of that... especially "beyond a reasonable doubt" as would be required for criminal negligence. They could argue lots of other scrapers operate just fine without causing problems, and that they tested theirs on other sites without issue.
Still, even by those lesser standards, it's hard to build a case.
Criminal cases require proof beyond a reasonable doubt. Most things that can result in jail time are criminal cases. Criminal cases are almost always brought by the government, and criminal acts are considered harm to society rather than to (strictly) an individual. In the US, criminal cases are classified as "misdemeanors" or "felonies," but that language is not universal in other jurisdictions.
>Absent a guilty plea, the Due Process Clause requires proof beyond a reasonable doubt before a person may be convicted of a crime.
Either way the result is the same: they induce massive load.
Well-written crawlers will (a sketch follows the list):
- not hit a specific ip/host more frequently than say 1 req/5s
- put newly discovered URLs at the end of a distributed queue (NOT do DFS per domain)
- limit crawling depth based on crawled page quality and/or response time
- respect robots.txt
- make it easy to block them
- wait for the previous request to finish before requesting the next page, since not doing so would only induce more load, get even slower, and eventually take everything down
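A minimal sketch of the per-host politeness part of that list (robots.txt parsing, depth limits, and the distributed queue are omitted; the URLs and crawler name are placeholders):

```javascript
// Sketch of the per-host politeness rules above: one request in flight per
// host, >= 5 s between hits to the same host, breadth-first shared queue.
const MIN_DELAY_MS = 5000;
const lastHit = new Map(); // host -> timestamp of last completed request
const queue = ["https://example.org/", "https://example.net/docs"]; // placeholders

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function crawl() {
  while (queue.length > 0) {
    const url = queue.shift();              // FIFO: no per-domain DFS
    const host = new URL(url).host;
    const wait = (lastHit.get(host) ?? 0) + MIN_DELAY_MS - Date.now();
    if (wait > 0) await sleep(wait);        // never more than 1 req / 5 s per host
    const res = await fetch(url, {
      headers: { "User-Agent": "ExampleCrawler/0.1 (+https://example.com/bot)" },
    });                                     // awaiting means the previous request
    lastHit.set(host, Date.now());          // finishes before the next one starts
    if (!res.ok) continue;
    const html = await res.text();
    // ...extract links from `html` and push them to the END of the queue
  }
}

crawl();
```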
I've designed my site to hold up to traffic spikes anyway and the bots I'm getting aren't as crazy as the ones I hear about from other, bigger website operators (like the OpenStreetMap wiki, still pretty niche), so I don't block much of them. Can't vet every visitor so they'll get the content anyway, whether I like it or not. But if I see a bot having HTTP 499 "client went away before page finished loading" entries in the access log, I'm not wasting my compute on those assholes. That's a block. I haven't had to do that before, in a decade of hosting my own various tools and websites
Thing is, the actual lived experience of webmasters tells us that the bots that scrape the internets for LLMs are nothing like crafted software. They are more like your neighborhood shit-for-brains meth junkies competing with one another over who makes more robberies in a day, no matter the profit.
Those bots are extremely stupid. They are worse than script kiddies’ exploit searching software. They keep banging the pages without regard to how often, if ever, they change. If they were 1/10th like many scraping companies’ software, they wouldn’t be a problem in the first place.
Since these bots are so dumb, anything that is going to slow them down or stop them in their tracks is a good thing. Short of drone strikes on data centers or accidents involving owners of those companies that provide networks of botware and residential proxies for LLM companies, it seems fairly effective, doesn’t it?
Ask me how I know.
There are forums which ask domain-specific questions as a CAPTCHA upon attempting to register an account, and as someone who has employed such a method, it is very effective. (Example: what nominal diameter is the intake valve stem on a 1954 Buick Nailhead?)
As long as this challenge remains obscure enough to be not worth implementing special handlers in the crawler, this sounds a neat idea.
But I think if everyone starts doing this particular challenge (char count), the crawlers will start instructing a cheap LLM to do appropriate tool calls and get around it. So the challenge needs to be obscure.
I wonder if anyone tried building a crawler-firewall or even nginx script which will let the site admin plug their own challenge generator in lua or something, which would then create a minimum HTML form. Maybe even vibe code it :)
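As a toy sketch of that pluggable-challenge idea -- plain Node here rather than an nginx/Lua module, and the question generator is the site-specific part you'd keep obscure and rotate:

```javascript
// Toy version of a pluggable custom challenge (plain Node instead of
// nginx+Lua). makeChallenge() is the site-specific part you'd swap out.
const http = require("node:http");
const crypto = require("node:crypto");

function makeChallenge() {
  const word = ["anubis", "jackal", "crawler"][Math.floor(Math.random() * 3)];
  return { question: `How many letters are in the word "${word}"?`,
           answer: String(word.length) };
}

const pending = new Map(); // challenge id -> expected answer

http.createServer((req, res) => {
  const url = new URL(req.url, "http://localhost");
  if (url.pathname === "/answer") {
    const ok = pending.get(url.searchParams.get("id")) === url.searchParams.get("a");
    res.end(ok ? "ok, have a session cookie and carry on" : "nope");
    return;
  }
  const id = crypto.randomUUID();
  const { question, answer } = makeChallenge();
  pending.set(id, answer);
  res.setHeader("content-type", "text/html");
  res.end(`<form action="/answer"><input type="hidden" name="id" value="${id}">
           ${question} <input name="a"><button>go</button></form>`);
}).listen(8080);
```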
I have my own project that finds malicious traffic IP addresses, and through searching through the results, it's allowed me to identify IP address ranges to be blocked completely.
Yielding useful information may not have been what it was designed to do, but it's still a useful outcome. Funny thing about Anubis' viral popularity is that it was designed to just protect the author's personal site from a vast army of resource-sucking marauders, and grew because it was open sourced and a LOT of other people found it useful and effective.
That is literally an anti-human filter.
"Anubis doesn't target crawlers which run JS (or those which use a headless browser, etc.) It's meant to block the low-effort crawlers that tend to make up large swaths of spam traffic. One can argue about the efficacy of this approach, but those higher-effort crawlers are out of scope for the project."
So it's meant to block low-effort crawlers, which can still cause damage if you don't deal with them. A 3-second deterrent seems good in that regard. Maybe the deterrent could come in the form of rate limiting an IP? But they might use swaths of IPs :/
1. Anubis makes you calculate a challenge.
2. You get a "token" that you can use for a week to access the website.
3. (I don't see this being considered in the article) A "token" that is used too much is rate limited. Calculating a new token for each request is expensive.
- https://news.ycombinator.com/item?id=44971990 person being blocked with `message looking something like "you failed"`
- https://news.ycombinator.com/item?id=44970290 mentions of other requirements that are allegedly on purpose to block older clients (as browser emulators presumably often would appear to be, because why would they bother implementing newer mechanisms when the web has backwards compatibility)
The Chinese crawlers seem to have adjusted their crawling techniques to give their browsers enough compute to pass standard Anubis checks.
Not for me. I have nothing but a hard time solving CAPTCHAs; about 50% of the time I give up after 2 tries.
I'm sure the software behind it is fine but the imagery and style of it (and the confidence to feature it) makes me doubt the mental credibility/social maturity of anybody willing to make it the first thing you see when accessing a webpage.
Edit: From a quick check of the "CEO" of the company, I was unsurprised to have my concerns confirmed. I may be behind the times, but I think there are far too many people who act obnoxiously (as part of what can only be described as a new subculture) in open source software today, and I wish there were better terms to describe it.
The assumption is that if you’re the operator of these bots and care enough to implement the proof of work challenge for Anubis you could also realize your bot is dumb and make it more polite and considerate.
Of course nothing precludes someone implementing the proof of work on the bot but otherwise leaving it the same (rude and abusive). In this case Anubis still works as a somewhat fancy rate limiter which is still good.
Bad: A site being usable for a significant amount of time per day, but also unusable for a significant amount of time per day, and the ratio between usable and unusable time per day significantly deteriorating.
Worse: A site being usable for a significant amount of time per day, but also unusable for a significant amount of time per day, and the ratio between usable and unusable time per day significantly deteriorating _significantly faster_.
Clearly, Anubis is at best an interim measure. The interim period might not be significant.
But it might be. That is presumably the point of Anubis.
That said, the only time I've heard of Anubis being tried was when Perl's MetaCPAN became ever more unusable over the summer. [0]
Unfortunately Anubis and Fastly fought, and Fastly won. [1]
----
[0] https://www.perl.com/article/metacpan-traffic-crisis/
[1] https://www.reddit.com/r/perl/comments/1mbzrjo/metacpans_tra...
Where does one even find a VPS with such small memory today?
On an old laptop running Windows XP (yes, with GUI, breaking my own rule there) I've also run a lot of services, iirc on 256MB RAM. XP needed about 70 I think, or 52 if I killed stuff like Explorer and unnecessary services, and the remainder was sufficient to run a uTorrent server, XAMPP (Apache, MySQL, Perl and PHP) stack, Filezilla FTP server, OpenArena game server, LogMeIn for management, some network traffic monitoring tool, and probably more things I'm forgetting. This ran probably until like 2014 and I'm pretty sure the site has been on the HN homepage with a blog post about IPv6. The only thing that I wanted to run but couldn't was a Minecraft server that a friend had requested. You can do a heck of a lot with a hundred megabytes of free RAM but not run most Javaware :)
But still enough to prevent a billion request DDoS
These sites have been search-engine scraped forever. It's not about blocking bots entirely, just about this new wave of "fuck you, I don't care if your host goes down" quasi-malicious scrapers.
(and these bots tend to be very, very dumb - which often happens to make them more effective at DDoSing the server, as they're taking the worst and the most expensive ways to scrape content that's openly available more efficiently elsewhere)
What I do care about is being met with something cutesy in the face of a technical failure anywhere on the net.
I hate Amazon's failure pets, I hate Google's failure mini-games -- it strikes me as an organizational effort to get really good at failing rather than spending that same effort to avoid failures altogether.
It's like everyone collectively thought the standard old Apache 404 not found page was too feature-rich and that customers couldn't handle a 3 digit error, so instead we now get a "Whoops! There appears to be an error! :) :eggplant: :heart: :heart: <pet image.png>" and no one knows what the hell is going on even though the user just misplaced a number in the URL.
Reddit implemented something a while back that says "You've been blocked by network security!" with a big smiling Reddit snoo front and centre on the page and every time I bump into it I can't help but think this.
So, I don't see an error code + something fun to be that bad.
People love dreaming of the 90s wild web and hate the clean cut soulless corp web of today, so I don't see how having fun error pages to be such an issue?
Usually when I hit an error page, and especially if I hit repeated errors, I'm not in the mood for fun, and I'm definitely not in the mood for "fun" provided by the people who probably screwed up to begin with. It comes off as "oops, we can't do anything useful, but maybe if we try to act cute you'll forget that".
Also, it was more fun the first time or two. There's a not a lot of orginal fun on the error pages you get nowadays.
> People love dreaming of the 90s wild web and hate the clean cut soulless corp web of today
It's been a while, but I don't remember much gratuitous cutesiness on the 90s Web. Not unless you were actively looking for it.
Not to those who don't exist in such cultures. It's creepy, childish, and strange to them. It's not something they see in everyday life, nor would they really want to. There is a reason why cartoons are aimed at younger audiences.
Besides, if your webserver is throwing errors, you've configured it incorrectly. Those pages should be branded with the site design and a neat, polite description of what the error is.
This is probably intentional. They offer an paid unbranded version. If they had a corporate friendly brand on the free offering, then there would be fewer people paying for the unbranded one.
Anubis usually clears in with no clicks and no noticeable slowdown, even with JIT off. Among the common CAPTCHA solutions it's the least annoying for me.
1. Store the nonce (or some other identifier) of each jwt it passes out in the data store
2. Track the number or rate of requests from each token in the data store
3. If a token exceeds the rate limit threshold, revoke the token (or do some other action, like tarpit requests with that token, or throttle the requests)
Then if a bot solves the challenge it can only continue making requests with the token if it is well behaved and doesn't make requests too quickly.
It could also do things like limit how many tokens can be given out to a single ip address at a time to prevent a single server from generating a bunch of tokens.
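A rough sketch of that bookkeeping, with in-memory maps standing in for the real data store and thresholds invented purely for illustration:

```javascript
// Sketch of the bookkeeping described above. In-memory maps stand in for the
// data store; all thresholds are made up for illustration.
const WINDOW_MS = 60_000;
const MAX_REQS_PER_WINDOW = 120; // ~2 req/s sustained before revocation
const MAX_TOKENS_PER_IP = 3;

const tokenHits = new Map();   // token nonce -> recent request timestamps
const revoked = new Set();
const tokensPerIp = new Map(); // ip -> tokens issued

function allowRequest(nonce) {
  if (revoked.has(nonce)) return false;
  const now = Date.now();
  const hits = (tokenHits.get(nonce) ?? []).filter((t) => now - t < WINDOW_MS);
  hits.push(now);
  tokenHits.set(nonce, hits);
  if (hits.length > MAX_REQS_PER_WINDOW) {
    revoked.add(nonce); // or tarpit/throttle instead of outright revoking
    return false;
  }
  return true;
}

function allowIssuingToken(ip) {
  const issued = tokensPerIp.get(ip) ?? 0;
  if (issued >= MAX_TOKENS_PER_IP) return false; // one box can't stockpile tokens
  tokensPerIp.set(ip, issued + 1);
  return true;
}
```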
Is it worth it? Millions of users wasting cpu and power for what? Saving a few cents on hosting? Just rate limit requests per second per IP and be done.
Sooner or later bots will be better at captchas than humans, what then? What's so bad with bots reading your blog? When bots evolve, what then? UK style, scan your ID card before you can visit?
The internet became a pain to use... back in the time, you opened the website and saw the content. Now you open it, get an antibot check, click, forward to the actual site, a cookie prompt, multiple clicks, then a headline + ads, scroll down a milimeter... do you want to subscribe to a newsletter? Why, i didn't even read the first sentence of the article yet... scroll down.. chat with AI bot popup... a bit further down login here to see full article...
Most of the modern web is unusable. I know I'm ranting, but this is just one of the pieces of a puzzle that makes basic browsing a pain these days.
Doing the proof-of-work for every request is apparently too much work for them.
Crawlers using a single ip, or multiple ips from a single range are easily identifiable and rate-limited.
Other than Safari, mainstream browsers seem to have given up on considering browsing without javascript enabled a valid usecase. So it would purely be a performance improvement thing.
Seriously though, does anything of Apple's work without JS, like iCloud or Find My iPhone? Or does Safari somehow support it in a way that other browsers don't?
> Habeas would license short haikus to companies to embed in email headers. They would then aggressively sue anyone who reproduced their poetry without a license. The idea was you can safely deliver any email with their header, because it was too legally risky to use it in spam.
Kind of a tangent but learning about this was so fun. I guess it's ultimately a hack for there not being another legally enforceable way to punish people for claiming "this email is not spam"?
IANAL so what I'm saying is almost certainly nonsense. But it seems weird that the MIT license has to explicitly say that the licensed software comes with no warranty that it works, but that emails don't have to come with a warranty that they are not spam! Maybe it's hard to define what makes an email spam, but surely it is also hard to define what it means for software to work. Although I suppose spam never e.g. breaks your centrifuge.
I personally don't care about the act of scraping itself, but the volume of scraping traffic has forced administrators' hands here. I suspect we'd be seeing far fewer deployments if the scrapers behaved themselves to begin with.
That being said, I agree with you that there are ways around this for a dedicated adversary, and that it's unlikely to be a long-term solution as-is. My hope is that the act of having to circumvent Anubis at scale will prompt some introspection (do you really need to be rescraping every website constantly?), but that's hopeful thinking.
If you want to do advertisement then don't require a payment, and be happy that crawlers will spread your ad to the users of AI-bots.
If you are a non-profit-site then it's great to get a micro-payment to help you maintain and run the site.
Yet now when it's AI accessing their own content, suddenly they become the DMCA and want to put up walls everywhere.
I'm not part of the AI doomer cult like many here, but it would seem to me that if you publish your content publicly, typically the point is that it would be publicly available and accessible to the world...or am I crazy?
As everything moves to AI-first, this just means nobody will ever find your content and it will not be part of the collective human knowledge. At which point, what's the point of publishing it.
i.e. it's DDoS protection.
Sure, if you ignore that humans click on one page while the problematic scrapers (not the normal search-engine volume, but the level we see nowadays, where misconfigured crawlers go insane on your site) are requesting many thousands to millions of times more pages per minute. So they'll need many, many times the compute to keep hammering your site, whereas a normal user can easily manage to load that one page from the search results they were interested in.
It was arguably never a great idea to begin with, and stopped making sense entirely with the advent of generative AI.
About the difficulty of proving you are human, especially when every test built has so much incentive to be broken: I don't think it will be solved, or could ever be solved.
No wonder the site is being hugged to death. 128MB is not a lot. Maybe it's worth to upgrade if you post to hacker news. Just a thought.
How much memory do you think it actually takes to accept a TLS connection and copy files from disk to a socket?
https://wiki.debian.org/DebianEdu/Documentation/Bullseye/Req...
* Thin clients with only 256 MiB RAM and 400 MHz are possible, though more RAM and faster processors are recommended.
* For workstations, diskless workstations and standalone systems, 1500 MHz and 1024 MiB RAM are the absolute minimum requirements. For running modern webbrowsers and LibreOffice at least 2048 MiB RAM is recommended.
Also, the HN homepage is pretty tame so long as you don't run WordPress. You don't get more than a few requests per second, so multiply that with the page size (images etc.) and you probably get a few megabits as bandwidth, no problem even for a Raspberry Pi 1 if the sdcard can read fast enough or the files are mapped to RAM by the kernel
That doesn't necessarily mean it's useless, but it also isn't really meant to block scrapers in the way TFA expects it to.
> It's a reverse proxy that requires browsers and bots to solve a proof-of-work challenge before they can access your site, just like Hashcash.
It's meant to rate-limit accesses by requiring client-side compute light enough for legitimate human users and responsible crawlers in order to access but taxing enough to cost indiscriminate crawlers that request host resources excessively.
It indeed mentions that lighter crawlers do not implement the right functionality in order to execute the JS, but that's not the main reason why it is thought to be sensible. It's a challenge saying that you need to want the content bad enough to spend the amount of compute an individual typically has on hand in order to get me to do the work to serve you.
> Anubis is a man-in-the-middle HTTP proxy that requires clients to either solve or have solved a proof-of-work challenge before they can access the site. This is a very simple way to block the most common AI scrapers because they are not able to execute JavaScript to solve the challenge. The scrapers that can execute JavaScript usually don't support the modern JavaScript features that Anubis requires. In case a scraper is dedicated enough to solve the challenge, Anubis lets them through because at that point they are functionally a browser.
As the article notes, the work required is negligible, and as the linked post notes, that's by design. Wasting scraper compute is part of the picture to be sure, but not really its primary utility.
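For anyone who hasn't looked at how these hashcash-style schemes work, here is a minimal sketch in Go, under the assumption of a SHA-256 puzzle with a leading-zero-bit difficulty; the names and parameters are illustrative, not Anubis's actual API (in Anubis the solving happens in client-side JavaScript and the verification in the proxy):

    // Minimal hashcash-style proof-of-work sketch (illustrative, not Anubis's API):
    // the client must find a nonce such that sha256(challenge + nonce) starts
    // with `difficulty` zero bits.
    package main

    import (
        "crypto/sha256"
        "fmt"
        "math/bits"
        "strconv"
    )

    // leadingZeroBits counts how many leading zero bits a hash has.
    func leadingZeroBits(sum [32]byte) int {
        n := 0
        for _, b := range sum {
            if b == 0 {
                n += 8
                continue
            }
            n += bits.LeadingZeros8(b)
            break
        }
        return n
    }

    // verify is what the server-side proxy would do: one hash per check.
    func verify(challenge string, nonce uint64, difficulty int) bool {
        sum := sha256.Sum256([]byte(challenge + strconv.FormatUint(nonce, 10)))
        return leadingZeroBits(sum) >= difficulty
    }

    // solve is what the client-side JavaScript does: brute-force a nonce,
    // costing roughly 2^difficulty hash attempts on average.
    func solve(challenge string, difficulty int) uint64 {
        for nonce := uint64(0); ; nonce++ {
            if verify(challenge, nonce, difficulty) {
                return nonce
            }
        }
    }

    func main() {
        const challenge, difficulty = "per-visitor-random-string", 16
        nonce := solve(challenge, difficulty)
        fmt.Println("nonce:", nonce, "valid:", verify(challenge, nonce, difficulty))
    }

The asymmetry is visible here: verify() costs the server one hash, while solve() costs the client on the order of 2^difficulty hashes, and a signed cookie (not shown) would let a legitimate visitor pay that price only once per session.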
https://github.com/factor/factor/blob/master/extra/hashcash/...
With the current approach we just waste the energy; if you used bitcoin that was already mined (= energy previously wasted), it would become sustainable.
Haven't seen dumb anime characters since.
That's all the asymmetry you need to make it unviable. Even if the attacker is no better at solving the challenge than your browser is, there's no way to tune the monetary cost to be even in the ballpark of the cost imposed on legitimate users. So there's no point in theorizing about an attacker solving the challenges more cheaply than a real user's computer, and thus no point in trying to design a different proof of work that's more resistant to whatever trick the attackers are using to solve it cheaply. Because there's no trick.
That's irrelevant. A human is not going to be solving the challenge by hand, nor is the computer of a legitimate user going to be solving the challenge continuously for one hour. The real question is, does the challenge slow down clients enough that the server does not expend outsized resources serving requests of only a few users?
>Even if the attacker is no better at solving the challenge than your browser is, there's no way to tune the monetary cost to be even in the ballpark of the cost imposed on legitimate users.
No, I disagree. If the challenge takes, say, 250 ms on the absolute best hardware, and serving a request takes 25 ms, a normal user won't even see a difference, while a scraper will see a tenfold slowdown while scraping that website.
You are trading something dirt-cheap (CPU time) for something incredibly expensive (human latency).
Case in point:
> If the challenge takes, say, 250 ms on the absolute best hardware, and serving a request takes 25 ms, a normal user won't even see a difference, while a scraper will see a tenfold slowdown while scraping that website.
No. A human sees a 10x slowdown. A human on a low end phone sees a 50x slowdown.
And the scraper paid 1/1,000,000th of a dollar. (The scraper does not care about latency.)
That is not an effective deterrent. And there is no difficulty factor for the challenge that will work. Either you are adding too much latency to real users, or passing the challenge is too cheap to deter scrapers.
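For what it's worth, the "millionth of a dollar" figure checks out on a back-of-envelope basis. Assuming a 250 ms challenge and roughly $0.04/hour for a rented vCPU (both assumed numbers, not measurements):

    // Back-of-envelope scraper cost, with assumed (not measured) numbers:
    // a 250 ms proof-of-work challenge on a vCPU rented at ~$0.04/hour.
    package main

    import "fmt"

    func main() {
        const vcpuDollarsPerHour = 0.04 // assumed cloud/spot price
        const challengeSeconds = 0.25   // 250 ms of compute per page
        costPerPage := vcpuDollarsPerHour / 3600 * challengeSeconds
        fmt.Printf("cost per page: $%.7f\n", costPerPage)              // ≈ $0.0000028
        fmt.Printf("cost per million pages: $%.2f\n", costPerPage*1e6) // ≈ $2.78
    }

So a million challenged page fetches cost the scraper a few dollars of compute, which is exactly the asymmetry being pointed out.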
For the actual request, yes. For the complete experience of using the website, not so much, since a human will take at least several seconds to process the information returned.
>And the scraper paid 1/1,000,000th of a dollar. (The scraper does not care about latency.)
The point need not be to punish the client, but to throttle it. The scraper may not care about taking longer, but the website's operator may very well care about not being hammered by requests.
Of course, then the issue becomes "what is the latency and cost incurred by a scraper to maintain and load balance across a large list of IPs". If it turns out that this is easily addressed by scrapers, then we need another solution. Perhaps the user's browser computes tokens in the background and then serves them to sites alongside a certificate or hash (to prevent people from just buying and selling these tokens).
We solve the latency issue by moving it off-line, and just accept the tradeoff that a user is going to have to spend compute periodically in order to identify themselves in an increasingly automated world.
Even then, man, I feel like so many resources could be saved (both yours and Wikipedia's) if scrapers had the sense not to scrape Wikipedia and instead follow Wikipedia's rules.
A lot of these bots consume a shitload of resources specifically because they don't handle cookies, which causes some software (in my experience, notably phpBB) to burn a lot of resources. (Why phpBB here? Because it creates a new session every time you visit with no cookies. And sessions have to be stored in the database. Surprise!) Forcing the bots to store cookies in order to reasonably access a service actually fixes this problem altogether.
Secondly, Anubis specifically targets bots that try to blend in with human traffic. Bots that don't try to blend in with humans are basically ignored and out-of-scope. Most malicious bots don't want to be targeted, so they want to blend in... so they kind of have to deal with this. If they want to avoid the Anubis challenge, they have to essentially identify themselves. If not, they have to solve it.
Finally... if bots really want to durably be able to pass Anubis challenges, they pretty much have no choice but to run the arbitrary code. Anything else would be a pretty straightforward cat-and-mouse game. And that means that being able to accelerate the challenge response is a non-starter: if they really want to pass it, and not appear like a bot, the path of least resistance is to simply run a browser. That's a big hurdle and definitely does increase the complexity of scraping the Internet, and it increases the more sites use this sort of challenge system. While the scrapers have more resources, tools like Anubis scale the resources required much more for scraping operations than they do for a specific random visitor.
To me, the most important point is that it only fights bot traffic that intentionally tries to blend in. That's why it's OK that the proof-of-work challenge is relatively weak: the point is that it's non-trivial and can't be ignored, not that it's particularly expensive to compute.
If bots want to avoid the challenge, they can always identify themselves. Of course, then they can also readily be blocked, which is exactly what they want to avoid.
In the long term, I think the success of this class of tools will stem from two things:
1. Anti-botting improvements, particularly in the ability to punish badly behaved bots, and possibly share reputation information across sites.
2. Diversity of implementations. More implementations of this concept will make it harder for bots to just hardcode fastpath challenge response implementations and force them to actually run the code in order to pass the challenge.
I haven't kept up with the developments too closely, but as silly as it seems I really do think this is a good idea. Whether it holds up as the metagame evolves is anyone's guess, but there's actually a lot of directions it could be taken to make it more effective without ruining it for everyone.
... has phpbb not heard of the old "only create the session on the second visit, if the cookie was successfully created" trick?
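In case the trick isn't familiar: hand out a throwaway cookie on the first request, and only create a database-backed session once the cookie actually comes back. A minimal sketch in Go (a hypothetical handler, not phpBB's actual code):

    // "Only create the session on the second visit" sketch: bots that never
    // send cookies back never cause a session row to be written.
    package main

    import (
        "fmt"
        "net/http"
    )

    func handler(w http.ResponseWriter, r *http.Request) {
        if _, err := r.Cookie("probe"); err != nil {
            // First visit, or a client that ignores cookies: set a probe
            // cookie and serve the page without touching the session store.
            http.SetCookie(w, &http.Cookie{Name: "probe", Value: "1", Path: "/"})
            fmt.Fprintln(w, "anonymous page, no session created")
            return
        }
        // The cookie came back, so this client keeps state: now it's safe
        // to create a real session in the database.
        fmt.Fprintln(w, "session created for cookie-capable client")
    }

    func main() {
        http.HandleFunc("/", handler)
        http.ListenAndServe(":8080", nil)
    }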
Personally I have no issue with AI bots that properly identify themselves scraping content; if the site operator doesn't want it to happen, they can easily block the offending bot(s).
We built our own proof-of-work challenge that we enable on client sites/accounts as they come under 'attack' and it has been incredible how effective it is. That said I do think it is only a matter of time before the tactics change and these "malicious" AI bots are adapted to look more human / like real browsers.
I mean honestly it wouldn't be _that_ hard to enable them to run javascript or to emulate a real/accurate User-Agent. That said they could even run headless versions of the browser engines...
It's definitely going to be cat-and-mouse.
The brutally honest truth is that if they throttled themselves so as not to totally crash whatever site they're trying to scrape, we'd probably never have noticed or gone through the trouble of writing our own proof-of-work challenge.
Unfortunately those writing/maintaining these AI bots that hammer sites to death probably either have no concept of the damage it can do or they don't care.
Yep. I noticed this too.
> That said they could even run headless versions of the browser engines...
Yes, exactly. To my knowledge that's what's going on with the latest wave that is passing Anubis.
That said, it looks like the solution to that particular wave is going to be to just block Huawei cloud IP ranges for now. I guess a lot of these requests are coming from that direction.
Personally though I think there are still a lot of directions Anubis can go in that might tilt this cat and mouse game a bit more. I have some optimism.
Thankfully, so far, it's still been pretty easy to block them by their user agents as well.
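For reference, user-agent blocking can be as small as a middleware like this sketch (the substring list is illustrative, not exhaustive, and only catches crawlers that identify themselves honestly):

    // Sketch of user-agent blocking; the list is illustrative, not exhaustive.
    package main

    import (
        "fmt"
        "net/http"
        "strings"
    )

    var blockedAgents = []string{"GPTBot", "CCBot", "Bytespider"}

    func blockByUA(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            for _, bad := range blockedAgents {
                if strings.Contains(r.UserAgent(), bad) {
                    http.Error(w, "crawling not permitted", http.StatusForbidden)
                    return
                }
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        site := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "hello")
        })
        http.ListenAndServe(":8080", blockByUA(site))
    }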
Since dog girls and cat girls in anime can look rather similar (both being mostly human + ears/tail), and the project doesn't address the point outright, we can probably forgive Tavis for assuming catgirl.
Not really, AI easily automates traditional captchas now. At least this one does not need extensions to bypass.
A 2GB memory consumption won't stop them, but it will limit the crawlers' parallelism.
Money is the best proof of humanity.
The principle behind Anubis is very simple: it forces every visitor to brute force a math problem. This cost is negligible if you're running it on your computer or phone. However, if you are running thousands of crawlers in parallel, the cost adds up. Anubis basically makes it expensive to crawl the internet.
It's not perfect, but much much better than putting everything behind Cloudflare.
The reasoning is that because they aren't real people, it's okay to draw and view images of anime characters, regardless of their age. And because geek/nerd circles tend not to socialize with real women, we get this over-proliferation of anime girls.
Who's managing the network effects? How do site owners control false positives? Do they have support teams granting access? How do we know this is doing any good?
It's convoluted security theater mucking up an already bloated, flimsy, and sluggish internet. It's frustrating enough having to guess school buses every time I want to get work done; now I have to see pornified kitty waifus too.
(openwrt is another community plagued with this crap)
It does have arty political vibes, though: the distributed and decentralized open-source internet with guardian catgirls vs. late-stage capitalism's quixotic quest to eat itself to death trying to build an intellectual and economic robot black hole.
I'm not a huge fan of the anime thing, but I can live with it.
Out of curiosity, what did you read as hostility?
SourceHut also uses Anubis, but they have replaced the anime catgirl with their own logo. I think Disroot does that too, though I'm not sure.
> As you may have noticed, SourceHut has deployed Anubis to parts of our services to protect ourselves from aggressive LLM crawlers.
It's nice that SourceHut themselves have talked about it on their own blog, but I had discovered this through the Anubis website's showcase or something like that, IIRC.
> A few weeks after this blog post, I moved us from Anubis to go-away, which is more configurable and allows us to reduce the user impact of Anubis (e.g. by offering challenges that don’t require JavaScript, or support text-mode browsers better). We have rolled this out on several services now, and unfortunately I think they’re going to remain necessary for a while yet – presumably until the bubble pops, I guess.
But if I remember correctly, when you were using Anubis, you had changed the anime catgirl logo to something related to SourceHut / its own logo, right?
If you disagree, please say why
I blackholed some IP blocks of OpenAI, Mistral and another handful of companies and 100% of this crap traffic to my webserver disappeared.
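Something like the sketch below is usually all it takes; the CIDRs are placeholder TEST-NET ranges, so substitute whatever netblocks you actually see in your logs (or the ranges some operators publish for their crawlers). It only catches traffic coming straight from those companies' own addresses, of course:

    // Sketch of blackholing requests from known crawler netblocks.
    // The CIDRs below are placeholders (TEST-NET), not real crawler ranges.
    package main

    import (
        "fmt"
        "net"
        "net/http"
    )

    var blockedCIDRs = []string{"192.0.2.0/24", "198.51.100.0/24"}
    var blockedNets []*net.IPNet

    func init() {
        for _, cidr := range blockedCIDRs {
            if _, n, err := net.ParseCIDR(cidr); err == nil {
                blockedNets = append(blockedNets, n)
            }
        }
    }

    func blockByIP(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            host, _, _ := net.SplitHostPort(r.RemoteAddr)
            ip := net.ParseIP(host)
            for _, n := range blockedNets {
                if ip != nil && n.Contains(ip) {
                    http.Error(w, "blocked", http.StatusForbidden)
                    return
                }
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        site := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            fmt.Fprintln(w, "normal page")
        })
        http.ListenAndServe(":8080", blockByIP(site))
    }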
They buy proxies and rotate through proxy lists constantly. It's all residential IPs, so blocking IPs actually hurts end users. Often it's the real IPs of VPN service customers, etc.
There are lots of companies around that you can buy this type of proxy service from.
I get the sense many of the bad actors are simply copycats, poorly building LLMs and scraping the entire web without a care in the world.
That's in fact what I was asking: I've only seen traffic from these kinds of companies, and I've easily blocked them without an annoying PoW scheme.
I have yet to see any of these bad actors and I'm interested in knowing who they actually are.
Source:
https://blog.cloudflare.com/perplexity-is-using-stealth-unde...
Perplexity's defense is that they're not doing it for training/KB-building crawls but for answering dynamic query calls, and this is apparently better.
However, if this information is accurate... perhaps site owners should allow AI/bot user agents but respond with different content (or maybe a 404?) instead, to try to prevent it from making multiple requests with different UAs.
These had the same user agent (latest Safari), but previously the agent had been varied.
Blocking this shit is much more complicated than any blocking necessary before 2024.
The data is available for free download in bulk (it's a university) and this is advertised in several places, including the 429 response, the HTML source and the API documentation, but the AI people ignore this.
If web security worked a little differently, the requests would likely come from the user's browser.
* or whatever site the author is talking about; his site is currently inaccessible due to the number of people trying to load it.
If Anubis blocked crawler requests but helpfully redirected to a giant tarball of every site using their service (with deltas or something to reduce bandwidth), I bet nobody would bother actually spending the time to automate cracking it, since it's basically negative value. You could even make it a torrent so most of the costs are paid by random large labs/universities.
I think the real reason most are so obsessed with blocking crawlers is they want “their cut”… an imagined huge check from OpenAI for their fan fiction/technical reports/whatever.
Plenty of organizations managed to crawl the web for decades without knocking things over. There's no reason to behave this way.
It's not clear to me why they've continued to run them like this. It seems so childish and ignorant.
I find that an unfair view of the situation. Sure, there are examples such as StackOverflow (which is ridiculous enough as they didn't make the content) but the typical use case I've seen on the small scale is "I want to self-host my git repos because M$ has ruined GitHub, but some VC-funded assholes are drowning the server in requests".
They could just clone the git repo, and then pull every n hours, but it requires specialized code so they won't. Why would they? There's no money in maintaining that. And that's true for any positive measure you may imagine until these companies are fined for destroying the commons.
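The "specialized code" in question really is tiny. A sketch of a polite mirror (the repository URL, paths, and interval are made up for illustration):

    // Mirror a repo once, then update it every few hours instead of
    // re-scraping the web UI. URL, paths and interval are illustrative.
    package main

    import (
        "log"
        "os"
        "os/exec"
        "time"
    )

    func run(dir string, args ...string) {
        cmd := exec.Command("git", args...)
        cmd.Dir = dir
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        if err := cmd.Run(); err != nil {
            log.Printf("git %v failed: %v", args, err)
        }
    }

    func main() {
        const mirror = "./mirror.git"
        if _, err := os.Stat(mirror); os.IsNotExist(err) {
            run(".", "clone", "--mirror", "https://example.org/project.git", mirror)
        }
        for range time.Tick(6 * time.Hour) { // pull every n hours
            run(mirror, "remote", "update", "--prune")
        }
    }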
I am shocked, shocked I say.