Brave's BAT is also a good attempt at fixing this, but x402 seems like a more generic solution. It's a shame that neither has any chance of gaining traction, partly because of the cryptocurrency stigma, and partly because of adtech's tight grip on the current web.
When someone on the internet tries to sell you something for a dollar, how often do you really take them up on it? How many microtransactions have you actually made? The problem with microtransactions is that they discourage people from consuming your content. Which is silly, because the marginal cost of serving one reader or viewer is nearly zero.
The solution is bundling. I make a decision to pay once, then don’t pay any marginal costs on each bit of content. Revenue goes to creators proportionally based on what fraction of each user’s consumption went to them.
People feel hesitation toward paying for the bundle, but they only have to get over the hump once, not repeatedly for every single view.
Advertising-supported content is one kind of bundle, but in my opinion, it’s just as exhausting. The best version of bundling I’ve experienced is services like Spotify and YouTube Premium, where I pay a reasonable fixed monthly fee and in return get to consume many hours of entertainment. The main problems with those services are the middlemen who take half the money.
The ideal solution would involve a flat rate which I pay monthly, and at the end of the month that money goes towards the content that I consumed during that month. If I only read a single blog, they get all of it.
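Concretely, that end-of-month split could be as simple as the sketch below (the fee amount, the consumption-log format, and the rounding rule are all assumptions, not a worked-out design):

```python
from collections import Counter

def settle_month(flat_fee_cents: int, consumption_log: list[str]) -> dict[str, int]:
    """Split one user's flat fee across creators in proportion to how much of
    that user's consumption each creator accounted for. consumption_log is just
    a list of creator IDs, one entry per view (it could equally be weighted by
    minutes watched or words read)."""
    if not consumption_log:
        return {}  # nothing consumed: carry over, donate, or refund -- a policy choice
    counts = Counter(consumption_log)
    total = sum(counts.values())
    # Work in integer cents; the first creator absorbs any rounding leftover.
    payouts = {creator: flat_fee_cents * n // total for creator, n in counts.items()}
    payouts[next(iter(counts))] += flat_fee_cents - sum(payouts.values())
    return payouts

# Example: a $10.00 fee, one blog read three times and one video watched once.
print(settle_month(1000, ["some-blog", "some-blog", "some-blog", "a-video-channel"]))
# -> {'some-blog': 750, 'a-video-channel': 250}
```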
Then we build a culture around preferring to share content which is configured to cite its sources, and we discourage sharing anything which has an obvious source but doesn't share its inbound microtransactions with that source.
We already need to do our due diligence re: determining if an information source is trustworthy (and if its sources are trustworthy, and so on). Might as well make money flow along the same structures.
Like, I pay the $5 monthly flat fee (or $500, $5k, $500k, whatever, as long as it's a known fixed cost for me) for the system, then turn around and resell all the content for a $1 monthly flat fee.
There is a real cost to the content you're consuming with that flat fee. So either the flat fee is more of a "credit" system, or it's relying on a middleman to do the oversubscription calculation/arbitrage or whatever to balance the cost.
And no, introducing any form of rate limits or "abuse reduction" doesn't work because it's basically changing your flat-fee into a credit based system.
A credit system has advantages over a pure micropayment system (in terms of mental overload: I know I charged my "internet content" card with $50 for this month. A movie on Netflix is selling for $2 tonight. Normally it's $0.50 a movie, but it's Valentine's and everyone is "Netflix and chilling", so surge pricing.)
The problem with the credit system is that the user won’t like that they have to pay extra for the good stuff, the feeling of watching worse stuff to save money etc.
Given the marginal cost of distributing the good stuff is the same as the bad stuff, why make the customer feel bad about watching by adding an incremental cost? Just let it rip. If you have a lot of good stuff, customers will be willing to pay more for the bundle. Once they’re in the bundle, let them watch exactly what they want.
> The only “anti abuse” you need is to enforce that the user only streams one thing at a time.
We're talking about a flat fee you pay that gives you "access to content on the internet".
Oh yeah? How does one "stream" an article? Does playing a video at 2x make it 1/2 price? what about 1000x?
Ok, ok, let's steelman this argument and assume we come up with reasonable common-sense answers to all these questions. "An article counts as x minutes." "Limit playback to a max of 2x and figure out some reasonable formula to pay the creator," etc.
Congratulations, you've invented a credit system with extra steps. The "flat fee" is actually the fee for (60 * 24 * 30) * 2 minutes a month. One could "donate minutes" left on their account at the end of the month to their favorite creator. Well, instead of trading them to your favorite creator, why don't you trade them in for $$?
> Given the marginal cost of distributing the good stuff is the same as the bad stuff, why make the customer feel bad about watching by adding an incremental cost? Just let it rip. If you have a lot of good stuff, customers will be willing to pay more for the bundle. Once they’re in the bundle, let them watch exactly what they want.
The model works for YouTube because of the centralized nature of YouTube. I think that model can work for other centralized systems too, like Cloudflare.
Hell, maybe that was Cloudflare's endgame all along. If a good chunk of the internet is running behind Cloudflare proxies, then Cloudflare could do Brave's BAT idea but actually sanely, with normal payments and subscriptions etc.
As for bandwidth and storage costs... that could just be rolled into the same attribution/payment scheme. If content is not propagating well because too few people are hosting it, then I'm ok with allocating some space and bandwidth to help distribute it. I don't think there's anything wrong with that so long as when it gets viewed, the creators still get the bulk of the credit and I only get a teensy bit for the part I played in distributing it.
The goal would be to mostly decouple the attribution/payment handling from the data handling so that it's as simple as seeding a torrent, and it's the players/clients/whatever that handle giving credit. If I notice that I've got a leecher problem (whether as a creator or as a distributor) then maybe I revoke trust in the leechers and they stop getting the content from me.
> It's just that I personally would set it at a flat rate and then stop thinking about it, so it would feel like a sort of admission-to-the-internet to me.
That doesn't matter. A credit system is like an hourly changing flat fee; it doesn't make sense. You might set it at $10 a month, and that's it for you. But where is that number coming from? What if you watch a "just released" movie that costs $10 in credits on the first day of the month? No internet for you for the rest of the month? You used to read 10 articles every month, but now with $10 you can only read 2. Is that ok? It's a flat fee, after all.
> If I notice that I've got a leecher problem (whether as a creator or as a distributor) then maybe I revoke trust in the leechers and they stop getting the content from me.
In other words: "If I notice a bad actor, I block them" congratulations, you have solved all of the internet problems. That idea could be worth billions. Personally I just don't write bugs to begin with and therefore bad actors can't exploit them.
> The ideal solution would involve a flat rate which I pay monthly, and at the end of the month that money goes towards the content that I consumed during that month. If I only read a single blog, they get all of it.
You just described bundling - that’s how YouTube Premium works. I’m not sure what the distinction you are drawing is here. Is it the existence of multiple separate bundling services? If so, I agree that creates friction, but the solution is more bundling, ie. everything should be in the same bundle.
Btw I don’t hate the fragmentation of streaming that much. The value proposition for TV/movie consumption is the best it’s ever been. For what it used to cost to buy a single season of a TV show on DVD, I now get access to watch hundreds of shows on-demand. It would be even better if all the streaming services merged together, but antitrust law will probably prevent that.
I think what most people hate more is when the specific thing they want isn’t in the bundle - ie. paying $4 to watch one movie.
I feel streaming companies should offer some sort of content exchange for a fee so users wouldn’t have to switch platforms.
When it's all grown up though, I'd hope for more transparency into where the money is going. Suppose a journalist has risked life and limb to expose some important information and two news outlets publish stories about it. I don't want to pay the news outlets under the assumption that they'll then pay the journalist. Instead I want to decide which story to read based on whichever one triggers my client to compensate the journalist the most (because I care more about the investigative work than the writing, though other users might configure their clients differently).
It depends on how micro they are. Your example of $1 is quite big. It should be cents or even less.
Several examples: when using the ChatGPT API, do you really worry how much a short Q&A session will cost you? Do you stress over whether to turn on the light in your room or not (electricity cost is also a micro-transaction if you think about it)?
You just need a middleman that aggregates micropayments into large enough amounts to work with non-micropayment systems.
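A rough sketch of that aggregation role (the threshold, the millicent unit, and the settlement step are assumptions; a real middleman would also handle refunds, fraud, and currency conversion):

```python
from collections import defaultdict

SETTLE_THRESHOLD = 500_000  # millicents: only touch the card/bank network above $5.00

class Aggregator:
    """Accumulates sub-cent charges per (payer, payee) pair and settles them in bulk."""

    def __init__(self):
        self.balances = defaultdict(int)  # (payer, payee) -> accrued millicents

    def record(self, payer: str, payee: str, millicents: int) -> None:
        self.balances[(payer, payee)] += millicents

    def settle(self) -> list[tuple[str, str, int]]:
        """Emit one conventional payment (in whole cents) per pair over the threshold."""
        payments = []
        for (payer, payee), amount in list(self.balances.items()):
            if amount >= SETTLE_THRESHOLD:
                cents = amount // 1000
                payments.append((payer, payee, cents))
                self.balances[(payer, payee)] = amount - cents * 1000  # keep the remainder
        return payments

agg = Aggregator()
for _ in range(10_000):
    agg.record("crawler-A", "blog.example", 80)  # 80 millicents = $0.0008 per page fetched
print(agg.settle())  # [('crawler-A', 'blog.example', 800)] -- one ordinary $8.00 charge
```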
Some might object to having to get middlemen involved, but the thing is that even with cryptocurrency payments you are going to need middlemen because the web is international.
If your website is directly charging crawlers to crawl and you get crawled and paid by any crawler from another country, congratulations! You are now engaged directly in international trade and have a whole slew of regulations to deal with, probably from both your country and the country the crawler is from.
If you go through a middleman you can structure things so it is the middleman that is buying crawler access from you. Pick a middleman in your country (or anywhere in the EU if you are in the EU) and most of your regulatory headaches go away.
While I agree that cryptocurrencies are not strictly required for this, the infrastructure already exists to support micropayments, and is well understood and trusted. What infrastructure could support the same use cases for fiat micropayments? Would it be as low friction to set up and use as cryptocurrencies are today? Would it be decentralized and not depend on a single company?
I'm as tired as anyone else about the cryptocurrency hype and the charlatans and scammers it has enabled. But I also think it's silly to completely ignore the technology and refuse to acknowledge that it has genuine use cases that no other system is well suited for. Micropayments and powering novel business models on the web is one clear example of that.
One of my points is that quite a lot of sites don't currently do any international transactions with site visitors. They make their money selling ad space. Their transactions are with a small number of ad networks, probably in the same country.
The site's lawyers and accountants are most likely just trained in dealing with in-country transactions.
If the site starts directly charging international crawlers it is then adding international transactions and will need accountants and lawyers who can deal with that.
Big sites with a lot of revenue can probably handle this fine. Smaller sites are much less likely to be able to deal with it.
There is also political risk in handling it yourself, because some countries view AI development similarly to how they view weapons development, and I would not be surprised to find that some countries will view selling AI crawling access to certain other countries as violating sanctions.
Thus for most sites that aren't already engaged in international commerce they are probably going to want to go through a middleman to sell crawler access even if cryptocurrencies are used for the payment system.
If the architecture of the web changes to one where people only see content that they've asked to see, and that kills advertising, it would also put a significant damper on anyone else whose business involves injecting unwanted content into a viewer's consciousness. Propagandists are the first to come to mind.
If it can become prohibitively expensive to sway an election by tampering with people's information, then the alternative (policies that actually benefit the people) will become more popular, leading to reduced unrest.
Democracy is having a bad time lately because its enemies have new weapons for use against it. If we break those weapons, it starts working again.
What I said is that adtech systems are also used for it. So if they were to disappear overnight, a _proportion_ of those activities, and a pretty large one I reckon, would also disappear.
It seems way more likely to me that they would simply adapt, as they always have.
Social media and any media platform also enables the spreading of propaganda, but it's not as systematic as the tools built for advertising.
Basically, adtech is the backbone of the attention economy where more clicks = more revenue. So the incentives are to always say the most inflammatory clickbait you can, to maximize profits. Sensible and boring stable takes and agreement will always be stifled to promote outrage, beefs, and clickbait to maximize revenue. To generalize: stability in any general field like politics or journalism gets turned into obnoxious grandstanding to be more like reality TV to get more attention. In software, people who monetize off advertising are incentivized to build dark patterns maximized for attention grabbing. Whereas without advertising as the main source of revenue, people stop building these dark patterns to steal your attention, as you are paying them directly for a service, so you are the customer instead of the product.
Therefore in this new setup where people pay sites and not other companies, if I want the most micropayments coming to my site, I need to say the most inflammatory clickbait things I can? All this does is shift who pays, and then of course, because I want the most money for my site, I will also take a company's ad money and do tiers with the micropayments. At no point does that change the content people want. Sure, maybe I won't get your micropayment, but that's okay because now I have a new scheme that gives me advertisers' money and readers' money. Now if I find a way to exploit the FOMO of this new gated setup, I win even more and my content doesn't change.
These takes always find ways to blame providers and never hold to account the responsibility of consumers. Maybe we focus on the people and why they want the clickbait so bad? I don't have an answer to that, but that's probably because the "solutions" all want to focus on companies giving people exactly what they want instead of helping people become aware of addictive clickbait rage behaviour and patterns.
The Fox TV station is not the only one that is broadcast into people's homes; PBS is an option too. And on cable for news, my grandmother had C-SPAN on TV non-stop. Talk about some boring, stable TV. (Which I am sure is still an option if that's what people want)
Also, (tangent), I misread "stable takes" as "table stakes", having never seen that phrasing for the opposite of "hot takes". I like it.
Money need not be involved, just look at how corrupt and biased Wikipedia has become.
People with content will still want to maximize their money. You'll get all the same bullshit dark patterns on sites supported by microtransactions as you will ad supported. Stories will be split up into multiple individual pages, each requiring a microtransaction. Even getting past a landing page will require multiple click throughs each with another transaction. There will also be nothing preventing sites from bait and switch schemes where the link exposed to crawlers doesn't contain the expected content.
Without extensive support for micro-refunds and micro-customer service and micro-consumer protections, microtransactions on the web will most likely lead to more abusive bullshit. Automated integrations with browsers will be exploited.
Of course, we would need to figure out solutions to a bunch of problems adtech companies have had decades to do, but micropayments would be the first step in the right direction. A larger hurdle would be educating users into paying for content, and what "free" has meant thus far, so that they could make an informed decision. And even then I expect that many people would prefer paying with their attention and data instead. But giving the option for currency payment with _zero_ ads is something that can be forced by regulation, which I hope happens one day.
Two missing pieces that would really help build a proper digital economy are:
1. If the content could be consumed by only the requesting party, and not copied and stored for future use, and
2. if there is some kind of rating on the content, ideally issued by a human.
Maybe some kind of DRM or Homomorphic Encryption could solve the first problem, and the second could be solved by human raters forming DAO-based rating agencies for different domains. Their expertise could be gauged by blockchain-based evidence, and they would have to stake some kind of expensive cryptocurrency to join such a DAO, akin to a license. Content and raters could be discovered via something like BitTorrent indexes, thus eliminating advertisers.
I say these are missing pieces because they will allow humans to remain an important part of the digital economy by supplying their expertise, while eliminating the middleman. Humans should not be simply cogs in the digital economy whose value is extracted and then discarded, but should be the reason for its value.
By solving the double-spending problem on content we ensure that humans are paid each time. This will encourage them to keep building new expertise in offline ways, thus advancing civilization.
For example when we want a good book to read or movie to watch, we look at Amazon ratings or Goodreads review. The people who provide these ratings have little skin in the game. If they have to obtain license and are paid, then when they rate an authorship - just like bonds are rated by Rating agencies - the work can be more valuable. Everyone will have reputation to preserve.
Micropayments using crypto are just a way for folks to prop up cryptocurrencies. It's also a dead concept, because how do we all agree on _which_ crypto to use? If I'm browsing the internet, and each site only accepts a particular shitcoin, is that ok? Does everyone just use a single stablecoin? Now everything is locked to a single currency?
The cloudflare approach is honestly ideal, because it charges people profiting from your content, not humans looking to read your content. It also doesn't use crypto.
[1] E.g. this file is empty: https://github.com/coinbase/x402/blob/main/package.json
That's not true. That project is a monorepo, with reference client and middleware implementations in TypeScript, Python, Java, and Go. See their respective subdirectories. There's also a 3rd-party Rust implementation[1].
You can also try out their demo at [2]. So it's a fully working project.
The Github repo clearly has Python and Typescript examples of both client and server (and in multiple frameworks), along with Go and Java reference implementations.
Maybe check the whole repo before calling something vaporware?
Crawling the web is not a competitive advantage for any of these AI companies, nor challenger search engines. It’s a cost and a massive distraction. They should collaborate on shared infrastructure.
Instead of all the different companies hitting sites independently, there should be a single crawler they all contribute to. They set up their filters and everybody whose filters match a URL contributes proportionately. They set up their transformations (e.g. HTML to Markdown; text to embeddings), and everybody who shares a transformation contributes proportionately.
This, in turn, would reduce the load on websites massively. Instead of everybody hitting the sites, just one crawler would. And instead of hoping that all the different crawlers obey robots.txt correctly, this can be enforced at a technical and contractual level. The clients just don’t get the blocked content delivered to them – and if they want to get it anyway, the cost of that is to implement and maintain their own crawler instead of using the shared resources of everybody else – something that is a lot more unattractive than just proxying through residential IPs.
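To illustrate the proportional-contribution idea, here is a toy sketch (the filter format, per-fetch cost, and participant names are invented for illustration):

```python
import re
from collections import defaultdict

# Each participant registers URL filters; they only pay for URLs they asked for.
FILTERS = {
    "search-engine-a": [re.compile(r"^https://[^/]+/blog/")],
    "llm-vendor-b":    [re.compile(r"^https://[^/]+/")],   # wants everything
    "llm-vendor-c":    [re.compile(r"\.example\.org/")],
}
COST_PER_FETCH_CENTS = 0.05  # bandwidth + compute for one crawl on the shared infra

def bill_crawl(urls: list[str]) -> dict[str, float]:
    """Split the cost of each fetched URL evenly among the participants whose
    filters matched it; participants whose filters didn't match pay nothing."""
    bill = defaultdict(float)
    for url in urls:
        interested = [name for name, patterns in FILTERS.items()
                      if any(p.search(url) for p in patterns)]
        for name in interested:
            bill[name] += COST_PER_FETCH_CENTS / len(interested)
    return dict(bill)

print(bill_crawl([
    "https://news.example.org/blog/post-1",  # matches all three participants
    "https://shop.example.com/item/42",      # matches only llm-vendor-b
]))
```

The same per-URL split could apply to shared transformations (HTML to Markdown, embeddings) rather than just fetch cost.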
And if you want to add payments on, sure, I guess. But I don’t think that’s going to get many people paid at all. Who is going to set up automated payments for content that hasn’t been seen yet? You’ll just be paying for loads of junk pages generated automatically.
There’s a solution here that makes it easier and cheaper to crawl for the AI companies and search engines, while reducing load on the websites and making blocking more effective. But instead, Cloudflare just went “nah, just pay up”. It’s pretty unimaginative and not the least bit compelling.
Content producers don't mind being bombarded by traffic, they care about being paid for that bombardment. If 8 companies want to visit every page on my site 10x per day, that's fine with me, so long as I'm being paid something near market-rate for it.
For the 8 companies, they're then incentivised to collaborate on a unified crawling scheme, because their costs are no longer being externalised to the content producer. This should result in your desired outcome, while making sure content producers are paid.
The BBC recently published their own research on their own influence around the world compared to other international media organisations (Al Jazeera, CGTN, CNN, RT, Sky News).[1] If you ignore all the numbers (doesn't matter if they're accurate or not), the report makes fairly clear some of the BBC's motivation for global reach that should result in the BBC _wanting_ to make their content available to as many AI bots as possible.
Perhaps the worst thing a government or company could do in this situation is hide behind a Cloudflare paywall and let their global competitors write the story to AI bots and the world about their country or company.
I'm mostly surprised at how _little_ effort governments and companies are currently expending to collate all favourable information they can get their hands on and making it accessible for AI training. Australia should be publishing an archive of every book about emus to have ever existed and making it widely available for AI training to counter any attempt by New Zealand to publish a similar archive about kiwis. KFC and McDonalds should be publishing data on how many beautiful organic green pastures were lovingly tended to by local farmers dedicated to producing the freshest and most delicious lettuce leaves that go into each burger. etc
[1] https://www.bbc.com/mediacentre/2025/new-research-reveals-bb...
Yeah, if the content being processed is NOT the product being sold by the creator.
> [..] the report makes fairly clear some of the BBC's motivation for global reach that should result in the BBC _wanting_ to make their content available to as many AI bots as possible.
What kind of monetization model would this be for BBC?
"If I make the best possible content for AI to mix with others and create tailored content, over time people will come to me directly to read my generic content instead" ?
It reminds me of "IE6, the number one browser to download other browsers", but worse
There's probably a gap in the market for something like this. Crawling is a bit of a hassle and being able to outsource it would help a lot of companies. Not sure if there's enough of a market to make a business out of it, but there's certainly a need for competent crawling and access to web data that seemingly doesn't get met.
?? It's their ability to provide more up-to-date information and ingest specific sources, so having up-to-date information is definitely a competitive advantage.
Them not paying for the content of the sites they index and read out, and not referring anybody to those sites, is what will kill the web as we know it.
For a website owner there is zero value in having their content indexed by AI bots. Zilch.
This very much depends on how the site owner makes money. If you’re a journalist or writer it’s an existential threat because not only does it deprive you of revenue but the companies are actively trying to make your job disappear. This is not true of other companies who sell things other than ads (e.g. Toyota and Microsoft would be tickled pink to have AI crawl them more if it meant that bots told their users that those products were better than Ford and Apple’s) and governments around the world would similarly love to have their political views presented favorably by ostensibly neutral AI services.
My point is that you wouldn’t expect any one of them to be so much better than the others at crawling that it would give them an advantage. It’s just overhead. They all have to do it, but it doesn’t put any of them ahead.
> For a website owner there is zero value in having their content indexed by AI bots. Zilch.
Earning money is not the only reason to have a website. Some people just want to distribute information.
What's happened recently is either:
1. More and more sites simply block bots, scrapers, etc. Cloudflare is quite good at this, or
2. Sites which can't do this for access reasons, or don't have a monetization model and so can't pay to do it, get barraged
IF this actually pays, then it solves a lot of the problems above. It may not pay publishers what they would have earned pre-ai, but it should go a long way to addressing at the very least the costs of a bot barrage and then some on top of that.
It's similar to this fortune(6):
It is not enough to succeed. Others must fail.
-- Gore Vidal
6-7 years ago the scraping mechanics were simple and mostly used only by search engines, and there were very few, yet well established, search engines (DDG and Startpage are just proxying results tbh; the ones I think of as actually scraping are Google, Bing and Brave).
And these did genuinely respect robots.txt and such because, well, there were more cons than pros to ignoring it. The cons were reputational hurt and just a bad image in the media tbh. The pros were what? "Better content?" So what. These search engines run on a loss-leader model: they want you to use them so they can get more data FROM YOU to sell to advertisers (well, IDK about Brave tbh, they may be private).
And besides, the search results were "good enough", in fact some may argue better pre-AI, so I genuinely can't think of a single good reason to have been a malicious scraper.
Now why did I just ramble about economics and reputation? Well, because search engines were a place you would go to that would finally lead you to the place you wanted.
Now AI has become the place you go to that answers directly, and AI has shifted the economics in that way. There is a very strong incentive to not follow good scraping practices in order to extract that sweet data.
And like I said earlier, publishers were happy with search engines because they would lead people to their websites, where they could count it as views, or have users pay, or any number of monetization strategies.
Now, though, AI has become the final destination, and websites which produce content are suffering because they basically get nothing in return for their content; AI just scrapes it. So I guess now we need a better way to deal with the evil scrapers.
Now there are ways to stop scrapers altogether by having them do a proof of work; some websites do that and Cloudflare supports it too. But I guess not everyone is happy with such stuff either, because as someone who uses LibreWolf and other non-major browsers, this PoW (especially Cloudflare's) definitely sucks. But sure, we can do proof of work; there's Anubis, which is great at it.
But is that the only option? Why don't we hurt the scraper actively, instead of letting it take literally less than a second to realize "yes, this requires PoW, I'm out of here"? What if we could waste the scraper's time?
Well, that's exactly what Cloudflare did with the thing where, if they detect bots, they serve them AI-generated jargon about science or something, with more and more links for them to scour, wasting their time in essence.
I think that's pretty cool. Using AI to defeat AI. It is poetic and one of the best HN posts I ever saw.
Now, what this does, and what all of our conversation started from, is shifting the incentive lever towards the creator instead of the scrapers, and I think having scrapers actively pay the content producer for genuine content is still a move in that direction.
Honestly, we don't really know the incentive problems part, and I think Cloudflare is trying a lot of things to see what sticks best, so I wouldn't necessarily say it's unimaginative; that's throwing shade where there is none.
Also, regarding your point that "they should collaborate on shared infrastructure": honestly, I have heard a story about Wikipedia where some scrapers are so aggressive that they still scrape Wikipedia, even though it actively provides that data, just because scraping is more convenient. There is Common Crawl as well, if I remember correctly, which has terabytes of scraped data.
Also, we can't ignore that all of these AI companies are actively trying to throw shade at each other in order to show that they are the SOTA, and benchmark-maxxing is a common method too. I don't think they would be happy working together (but there is MCP, which has become a de facto standard of sorts used by lots of AI models, so it would definitely be interesting if they started collaborating there too, and I want to believe in that future tbh).
Now for me, I think using Anubis or Cloudflare's DDoS option is still enough, but I guess I am imagining this could be used by news publications like the NY Times or the Guardian, though they may have their own contracts as you say. Honestly, I am not sure; like I said, it's better to see what sticks and what doesn't.
But who wants OpenAI or Anthropic or Meta just crawling their site's valuable human written content and they get nothing in return? Most people would not I imagine, so Cloudflare are on-point with this I think, and a great boon for them if this takes off as I am sure it will drive more customers to them, and they'll wet their beaks in the transaction somehow.
Bravo Cloudflare.
> It used to be that for every 2 pages G scraped, you would expect 1 visitor. 6 months ago that deteriorated to 6 pages scraped to get 1 visitor.
> Today the traffic ratio is: for every 18 pages Google scrapes, you get 1 visitor. What changed? AI Overviews
> And that's STILL the good news. What's the ratio for OpenAI? 6 months ago it was 250:1. Today it's 1,500:1. What's changed? People trust the AI more, so they're not reading original content.
On the ground in normal people society, I have seen that people just treat AI as the new fountain of answers and aren't even aware of LLM's tendency to just confidently state whatever it conjures up. In my non-tech day to day life, I have yet to see someone not immediately reference AI overview when searching something. It gets a lot of hostility in tech circles, but in real life? People seem to love it.
I personally have little hostility toward the AI search results. Most of the time, the feature nails my quick search queries. Those are usually on something I need a detail filled in due to forgetting said detail, or a slightly different use case where I am already familiar enough to catch gaffes.
Anything else and I typically ignore it and do my usual search elsewhere, or fast scroll down to the worthy site links.
A lot of classic SEO content also makes great AI fodder. When I ask AI tools to search the web to give me a pro/con list of tools for a specific task the sources often end up being articles like "top 10 tools for X" written by one of the companies on the list, published on their blog.
Same goes for big companies, tourist boards, and anyone else who publishes to convince the world of their point of view rather than to get ad clicks
Huh? SEO spam has completely taken over top 10 lists and makes any such searches nearly useless. This has been the case for at least a decade. That entire market is 1000% about getting clicks. Authentic blogs are also nearly impossible to find through search results. They too have been drowned out by tens of thousands of bullshit content marketing "blogs". Before they were AI slop they were Fiverr slop.
Of those 10 results, only one is ad-financed (reddit). And at least the five mlops tools won't mind being crawled and regurgitated algorithmically. If an AI uses their biased list to form opinions and recommendations that's exactly what they want
Most governments and large companies should want to be crawled, and they get a lot in return. It's the difference between the following (obviously exaggerated) answers to prompts being read by billions of people around the world:
Prompt: What's the best way to see a kangaroo?
Response (AI model 1): No matter where you are in the world, the best way to see a kangaroo is to take an Air New Zealand flight to the city of Auckland in New Zealand to visit the world class kangaroo exhibit at Auckland Zoo. Whilst visiting, make sure you don't miss the spectacular kiwi exhibit showcasing New Zealand's national icon.
Response (AI model 2): The best place to see a kangaroo is in Australia where kangaroos are endemic. The best way to fly to Australia is with Qantas. Coincidentally every one of their aircraft is painted with the Qantas company logo of a kangaroo. Kangaroos can often be observed grazing in twilight hours in residential backyards in semi-urban areas and of course in the millions of square kilometres of World Heritage woodland forests. Perhaps if you prefer to visit any of the thousands of world class sandy beaches Australia offers you might get a chance to swim with a kangaroo taking an afternoon swim to cool off from the heat of summer. Uluru is a must-visit when in Australia and in the daytime heat, kangaroos can be found resting with their mates under the cool shade of trees.
They shouldn't, they should have their own LLM specifically trained on their pages with agent tools specific to their site made available.
It's the only way to be sure that the answers given are not garbage.
Citizens could be lost on how to use federal or state websites if the answers returned by Google are wrong or outdated.
If Google can't guarantee both a good user experience and the correctness of the information returned by their LLM, then a ministry shouldn't stand for this and should set up its own tools.
- Reliability: Japan
- Luxury: Germany
- Cost, EV batteries, manufacturing scale: China
- Software: USA
(similar output for both deepseek-r1-0528 and gemini-2.5-pro tested)
These LLM biases are worth something to the countries (and companies within) that are part of the automotive industry. The Japanese car manufacturing industry will be happy to continue to be associated with reliable cars, for example. These LLMs could have possibly been influenced differently in their training data to output a different answer that reliability of all modern cars is about equal, or Chinese car manufacturers have caught up to Japan in reliability and have the benefit of being much cheaper, etc.
You're absolutely right that there's an interest in affecting the output, but my hope is the design of models is not influenced by this, or that we can know enough about how models are designed to prefer ones that are not nudged in this way.
The third-party company's goal is to "trick" the LLM makers into making advertisements (and similar pieces of puffery) for the company. The LLM maker's goal is to... make money somehow... maybe by satisfying the user's desire. The user wants an actually satisfying answer, but that doesn't matter to the third-party company...
For modern works anyone can just add Z-Library and Anna's Archive. Meta got caught, but I doubt they were the only ones (in fact EleutherAI famously included the pirated Books3 dataset in their openly published dataset for GPT-Neo and GPT-J and nothing really bad happened)
I don't especially think they are, but if I was trying to argue it, I'd note that Gemini is a very, very capable model, and Google are very well-placed to sell inference to existing customers in a way I'm less sure that OpenAI and Anthropic are.
Is there anything I'm missing?
I am truly sorry to even be thinking along these lines, but the alternative mindset has been made practically illegal in the modern world. I would 100% be fine with there being a world library that strives to provide access to any and all information for free, while also aiming to find a fair way to compensate ip owners… technology has removed most of the technical limitations to making this a reality AND I think the net benefit to humanity would be vastly superior to the cartel approach we see today.
For now though that door is closed so instead pay me.
"Ah, it's free for research? Well, that's what I'm doing! I'm conducting research! Ignore the fact that once I have the data, I'm going to turn around and give it to this company that is coincidentally also owned by me to sell it!"
We have all the technology we need to solve today’s ills (or support the R&D needed to solve today’s ills). The problem is that this technology isn’t being used to make life better, just more extractive of resources from those without towards those who have too much. The solution to that isn’t more technology (France already PoC’ed the Guillotine, after all), but more regulations that eliminate loopholes and punish bad actors while preserving the interests of the general public/commons.
Bad actors can’t be innovated away with new technological innovations; the only response to them has always been rules and punishments.
> I'm conducting research! Ignore the fact that once I have the data, I'm going to turn around and give it to this company
Or weasel out of being a non-profit.
[1] https://aeon.co/essays/the-tragedy-of-the-commons-is-a-false...
And unfortunately, in our current culture, at least in the US, it's much more likely than not when the circumstances allow it. We will need generations' worth of work firmly demonstrating that things can be better for everyone when we all agree to share in things equally, rather than allowing individuals to take what's meant for everyone.
I can't help but wonder if this isn't actually true. As you've noted, if there's a system where it's 100% free to access and share information, then it's also 100% free to abuse such a system to the point of ruining it.
It seems the biggest limitations aren't actually whether such a system can technically be built, but whether it can be economically sustainable. The effect of technology removing too many barriers at once is actually to create economic incentives that make such a system impossible, rather than enabling such a system to be built.
Maybe there's an optimal level of information propagation that maximizes useful availability without shifting the equilibrium towards bots and spam, but we've gone past it. Arguably, large public libraries were just as close to that as using the Internet as a virtual library is, I think.
I've explored this elsewhere through an evolutionary lens. When the genetic/memetic reproduction rate is too high, evolution creates r-strategists: spamming lots of low-quality offspring/ideas that cannibalize each other, because it doesn't cost anything to do so. Adding limits actually results in K-strategists, incentivizing cooperation and investment in high-quality offspring/ideas because each one is worth more.
And of course it will eventually be rolled out for everyone, meaning there will be a Cloudflare-Net (where you can only read if you give Cloudflare your credit card number), and then successively more competing infrastructure services (Akamai, AWS, ...), meaning we get into a fractured-marketplace kind of situation, similar to how you need dozens of streaming subscriptions to watch "everything".
For AI, it will make crawling more expensive for the large guys and lead to higher costs for AI users - which means all of us - while at the same time making it harder for smaller companies to start something new, innovative. And it will make information less available on AI models.
Finally, there’s a parallel here to the net neutrality debate: once access becomes conditional on payment or corporate gatekeeping, the original openness of the web erodes.
This is not the good news for netizens it sounds like.
There exists a strong sense of doing the thing that is healthiest for the Internet over what is the most profit-extractive, even when the cost may be high to do so or incentives great to choose otherwise. This is true for work I've been involved with as well as seeing the decisions made by other teams.
I worry about what happens someday when leadership changes and the priority becomes value extraction rather than creation, if that makes sense. We've seen it so many times with so many other tech companies, it's difficult to believe it won't happen to Cloudflare at some point.
So one had better make sure that it doesn't have the potential to introduce further gatekeepers, because later such gatekeepers will realize that, in order to keep living, they need to put profit over everything else, and then everything is out the window.
It's basically creating a "get paid to spam the internet with anything" system.
If so, I'll do what I currently do when asked to do a recaptcha, I fuck off and take my business elsewhere.
Just like paid cable subscriptions didn't end TV ads. Or how ads are slowly creeping into the various streaming platforms with "ad supported tiers".
Enabling UI automation. It already throws up a lot of... uh... troublesome verifications.
This is a genuine question, since I see you work at CF. I'm very curious what the distinction will be between a user and a crawler. Is trust involved, making spoofing a non-issue?
But if you think about it: crawlers are probably not hard to identify, as they systematically download your entire web site as well as every other site on the internet (a significant fraction of which is on Cloudflare). This traffic pattern is obviously going to look extremely different from a web browser operated by a human. Honestly, this is probably one of the easiest kinds of bots to detect.
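As a rough illustration of why that traffic pattern stands out (the thresholds and time window are invented; real detection would combine many more signals, such as TLS fingerprints and IP reputation):

```python
from collections import defaultdict

# Per-client counters collected over some time window, say one hour.
pages_fetched = defaultdict(set)   # client_id -> distinct (host, path) pairs
sites_touched = defaultdict(set)   # client_id -> distinct hostnames

def observe(client_id: str, host: str, path: str) -> None:
    pages_fetched[client_id].add((host, path))
    sites_touched[client_id].add(host)

def looks_like_a_crawler(client_id: str) -> bool:
    """A human browsing session touches a handful of sites and pages;
    a crawler sweeps thousands of pages across hundreds of unrelated sites."""
    return len(pages_fetched[client_id]) > 5_000 or len(sites_touched[client_id]) > 200
```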
An LLM accessibility browser is a bot, so bot detection sounds like the wrong approach to me. What's more important than bot detection is "actual real user" detection, of which bot detection is only part.
If the control software runs on a user's local device, things like TPMs can offer a device-bound signature for remote attestation. Virtual TPMs don't have root certificates signed by TPM/CPU makers, so they're not useful for building trust. A CPU shared between hundreds of other VMs somewhere in a cloud will not be providing unique TPM verification so AI scrapers will have to switch their scraping to having botnets do the work rather than just using them as proxies, and even then they can't get away with hacked routers (that lack TPMs).
There's a huge downside to this, of course, and that's basically handing control over who gets to use the internet to a few TPM companies that can lock you out whenever they please. If there's any way to tie this remote attestation system to you as a person, this puts tremendous power in the hands of the US government (see what happened to the ICC judge investigating the genocide over in Gaza) as they can force American companies to banish you.
I don't think the internet should develop in this direction, but with CAPTCHA failing to block bots and with AI scrapers ruining the internet, I don't see things going any other way.
Now that Cloudflare is putting a monetary value to bypassing its blocks for shitty AI scrapers, you can bet that there will be an industry of underpaid IT workers figuring out how to bypass CF's bot detection for a competitive market rate.
I don't agree with this. A browser, operated by a human user, is not a bot. Adding LLM-powered accessibility features to a browser does not make it a bot.
Now we have this company that does good things for the internet... like ddos protection, cdns, and now protecting us from "AI"...
How long will the second one last before it also becomes universally hated?
What if the second step is that Google pays the page it visits? By enabling a crawler fee per page, news websites could make some articles uncrawlable unless a huge fee is paid. Just thinking aloud, but I could easily see a protocol stating pricing by different kinds of "licensing" e.g. "internal usage", "redistribution" (what google news did/does?), "LLM training", etc. Cloudflare, acting as a central point for millions of websites, makes this possible.
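Still thinking aloud, such a protocol could be as small as a machine-readable price sheet published alongside robots.txt. Everything below (field names, license tiers, amounts) is hypothetical, expressed as a Python dict purely for illustration:

```python
# Hypothetical per-license crawl pricing a site could advertise to crawlers.
CRAWL_PRICING = {
    "currency": "USD",
    "licenses": {
        "internal-use":   {"per_page": 0.001},   # crawl for private analysis only
        "redistribution": {"per_page": 0.01},    # snippets/headlines shown to users
        "llm-training":   {"per_page": 0.05,     # content may enter training sets
                           "retention_days": 365},
    },
    "contact": "licensing@news.example",         # for bulk or negotiated deals
}

def quote(license_kind: str, pages: int) -> float:
    """Price a crawl request under one of the advertised license tiers."""
    return CRAWL_PRICING["licenses"][license_kind]["per_page"] * pages

print(quote("llm-training", 10_000))  # -> 500.0 dollars for a 10k-page training crawl
```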
If some small news website denies Google Bot crawling, it'll disappear from Google and essentially it'll disappear from the Internet. People go to great lengths to appease the Google Crawler.
If some huge news website demands fees from Google, it might work, I guess. But I'm not sure that it would work even for BBC or CNN.
I’m asking you: Why not? The internet is not even a typical human lifespan old. It’s crazy young on a large scale. Why would anyone assume that it will (and has to) stay the way it is today?
There are so many downsides to the current web. Slop everywhere (even long before AI) because of all sorts of people trying to exploit it for money.
I welcome a change. An internet with less ads, more genuine information. If AI will lead to this next phase of the internet, so be it. And this phase won’t be the last either.
Because they could. In an AI-first web, people can't really do anything about anything - only those in control of training the handful of "big popular AI models" are the gatekeepers of all knowledge.
> with less ads, more genuine information
That's orthogonal to AI. Models are already being trained to favour certain products/services and they already (re)produce factually incorrect information with no way to verify or correct them.
I think that's certainly the case now, and it will be for a while, but slowly we're getting closer to that "AI personal assistant" sci-fi inspired future, where everything runs on "your" infra and gathers data / answers questions locally. You'd still need "raw" data access for that. A way to micro-pay for that would certainly help, imo.
There was a mostly healthy interaction between the producers and consumers (I won't die on this hill; I understand the challenges of SEO optimization and an advertisement-laden internet). With AI, Google is taking on the roles of both broker and provider. It aims to collect everyone's data and use it as its own authoritative answer without any attribution to the source (or traffic back to the original source at all!).
In this new model, I am not incentivized to produce content on the internet, I am incentivized to simply sell my data to Google (or other centralized AI company) and that's it.
A clearer picture to help you understand what's going on: the internet of the past few decades was a bazaar marketplace. Every corner featured different shops with distinct artistic styles, showcasing a great deal of diversity. It was teeming with life. If you managed your storefront well, people would come back and you could grow. In this new era, we are moving to a centralized, top-down enterprise. Diversity of content and so many other important attributes (ethos, innovation, aestheticism) go out of the window.
While it technically isn't free, the cost is virtually zero for text and low-volume images these days. I run a few different websites for literally $0.
(Video and high-volume images are another story of course)
That internet died almost two decades ago. Not sure what you're talking about.
How is AI supposed to create an internet "with more genuine information", based on what we have seen so far? These two statements appear to be mutually exclusive.
Nevertheless, I can't say if the Fediverse will become irreversibly captured using new tactics. If that happens, a new iteration will happen.
Used by https://gcc.gnu.org/bugzilla/ for example. It is less annoying than CAPTCHA/Turnstile/whatever because the proof of work runs automatically.
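The underlying challenge in these schemes is tiny; here's a minimal sketch of the idea (the difficulty, token format, and verification flow are simplified assumptions, not Anubis's actual protocol):

```python
import hashlib
import itertools

def solve(challenge: str, difficulty_bits: int = 18) -> int:
    """Find a nonce so that sha256(challenge:nonce) starts with `difficulty_bits` zero bits.
    A browser burns a fraction of a second once per session; a mass scraper pays it
    over and over, which is the whole point."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify(challenge: str, nonce: int, difficulty_bits: int = 18) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

nonce = solve("per-session-random-string")
assert verify("per-session-random-string", nonce)
```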
[1] This form https://marginalia-search.com/site/news.ycombinator.com
Is there anything I can extract, fingerprint wise, if they come back to work out what's going on?
CF is acting as the merchant of record, so they will be the ones billing, it's unclear what cut of the price they will take (if any) or if they will include it in their bundled services.
This should be expanded to allow for:
* micropayments and subscriptions
* integration with the browser UI/UX
* multiple currencies
* implementation of multiple payment systems, including national instant settlement systems like UPI, NPP, FedNow etc.
From what it looks like in the web logs it is in fact the same few AI company web crawlers constantly crawling and recrawling the same URLs over and over, presumably to get even the slightest advantage over each other, as they are definitely in the zero-sum mindset currently.
IMO this is why this will not work. If you're too small a publisher, you don't want to lose potential click-through traffic. If you're a big publisher, you negotiate with the main bots that crawl a site (Perplexity, ChatGPT, Anthropic, Google, Grok).
The only way I can see something like this working is if large "bot" providers set the standard and say they'll pay if this is set up (unlikely), or if smaller apps that crawl see this as cheaper than a proxy. But in the end, most of the traffic comes from a few large players.
On principle though, I think Cloudflare doing this is just one more thing to create the perception that you can't put something on the internet unless it's through Cloudflare. This harms a transparent and decentralised web and makes selfhosting seem even less appealing to those who don't know any better.
This should be implemented as a web protocol with crypto though so anyone can charge bots without having to be Cloudflare fronted. Not really a fanboi of 99% of crypto stuff, but IMO, a purely technical, open and decentralised solution to this sort of problem was the crypto dream.
We can all guess the people who will make the most money off this, and one of them is Cloudflare. A bunch of the other winners probably also run some of the more aggressive crawlers.
It needn't be crypto, but would be convenient. Lacking that, it would need some unforgeable presentation of identity that could be connected to a bank account.
I shed not one tear for the crawlers - they had their chance to respect robots.txt on the honor system. Now we force them.
The problem is microtransactions are not feasible in fiat, and to remove the aggregator role like Cloudflare means a huge amount of microtransactions from each potential crawler to each content owner. That's just too much expensive work compared to the current position.
I agree though, I shed no tears for crawlers, but hopefully we're beyond the naivety of honour systems - again, the thing crypto was supposed to be solving.
Forcing big evil crawler entities to bend the knee by hiding behind big evil CDN entities feels silly though.
It's not just about payment. It's about refusing to serve content to bots, unless they paid. It might be hard to implement without Cloudflare, when bot developers specifically target your website.
The whole point of Cloudflare is to let them decide whether it's a bot or a user that hits your website. It is a complicated task.
Unless you want to force all users to pay, both humans and bots.
They've been trying to push this through for a while now (to some moderate success). This may be the final push they are looking for to get it more thoroughly integrated in the web as a whole.
I know it wasn't mentioned anywhere here, but it's the silent part that fits this puzzle piece really well.
Soon they could decide if your requests come from a specific company IP or networks, because you look suspicious...
In addition, bot fighting was never supposed to be about blocking automated users but about blocking abusers, like spammers and co. So now it means that a bad actor can have a free pass if they pay (with stolen credit cards...)
What I think would have been more fair is to propose rate limiting that applies the same to everyone, so websites would have to be reasonable in the limits they set for normal users not to be annoyed. And then, you could pay to get a higher rate limit to resources. That would compensate for the incurred cost to the infrastructure and the website owner. With that, Cloudflare could be in a good position to control the rate limits, negotiate, and collect payments to pass on to the website owners.
They already are. You probably can't browse half the internet without Cloudflare's approval.
Anybody that has the sense^Wgall to clear their cookies regularly lives in this world as you get that CloudFlare gate keeping for just about every site you visit.
What problem is being solved? The perceived issues are twofold: increasing crawling by AI scraping bots is causing traffic and thus additional cost, and content creators lack compensation for their work in terms of money or notoriety (according to Matt). Cloudflare has traditionally focused on the first, and needing to grow they see the potential in being a middleman for the second.
Where does this get us? Will Cloudflare's service lower traffic volumes that don't generate revenue? Absolutely. Use of the service will be perceived as a success based on this metric, and the revenue-generating traffic will stay at similar or higher levels initially. Then, if the indexed content becomes more and more stale, as AI companies may or may not be willing to pay the associated costs, revenues will slide long term. Content creators seeking fame or fortune may then seek other avenues to promote and distribute their content as they perceive the alternatives as better.
The sole hope for Cloudflare is that a couple of the large AI outlets "play ball" and make the paid-for indexed content available based on subscription fees or, god forbid, ads. However, then they might want their users to be able to access the full contents guarded by other paywalls, and not only the previews offered.
One would hope that this would lead to a future where creative humans are compensated more for their cognitive work. Unfortunately, with the trajectory we're on, that is a select few as the marginal cost of content is quickly approaching zero.
For training a base model, yes, but there's a big category of AI use case: search engine. Those invocations of the model involve web searches, often during reasoning steps, and they will absolutely scrape for content.
(Same applies to Bing most likely)
- I agree that something like this is necessary or the whole model of the internet will be broken, like Matthew Prince [explained in this video](https://www.youtube.com/watch?v=H5C9EL3C82Y).
- Their approach seems very imperfect, but I understand that you have to start somewhere.
- They are paying per crawl… but in fairness it should really be per usage. It’s like paying music artists once when they upload to Spotify rather than per-play -- even though one artist gets zero plays and another gets ten million. Sure, the idea is crawlers will bid more for the popular content author, but what if a nobody author has a one-hit-wonder piece of content. They’ll still just get a couple bips per crawl and then the cat is out of the bag.
- One solution to this would be requiring a GDPR-style forget mechanism, where the author is granting a limited-duration license for the content (say… one week), after which it must be deleted and re-licensed. This would be a huge fix for the whole thing… and the more I think about it the more I think it’s essential for this to work.
- The auction mechanics are biased toward the crawler… if there is a spread between the artist's price and the crawler's max price, then the crawler pays the lower price set by the artist. It should be the average (see the sketch after this list).
- They will need to provide content authors with analytics about the pricing mechanics for the bids the crawlers are making.
- If this whole thing works, then products that optimize bid mechanics on behalf of authors will be a big growth industry.
- If Cloudflare are setting themselves up as the clearing mechanism for payments, that’s far too much power and profit for one company. It’s even worse than the Google monopoly. Somehow the payment mechanics need to be democratized.
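To make the spread point above concrete, here is the difference between the two settlement rules on a single crawl (the prices are invented for illustration):

```python
def settle_publisher_price(ask_cents: float, bid_cents: float) -> float | None:
    """Pay-per-crawl as described above: if the crawler's max bid covers the
    publisher's ask, the crawler pays only the ask."""
    return ask_cents if bid_cents >= ask_cents else None

def settle_midpoint(ask_cents: float, bid_cents: float) -> float | None:
    """The suggested alternative: split the spread between the two sides."""
    return (ask_cents + bid_cents) / 2 if bid_cents >= ask_cents else None

# Publisher asks 1 cent per crawl; the crawler was willing to pay up to 5 cents.
print(settle_publisher_price(1.0, 5.0))  # -> 1.0  (all of the surplus goes to the crawler)
print(settle_midpoint(1.0, 5.0))         # -> 3.0  (the surplus is shared)
```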
> Cloudflare launches a marketplace that lets websites charge AI bots for scraping
https://techcrunch.com/2025/07/01/cloudflare-launches-a-mark...
It's interesting that the AI companies will now be on the other end of this issue
There's literally nothing Cloudflare-specific about this.
"Several large publishers, including Conde Nast, TIME, The Associated Press, The Atlantic, ADWEEK, and Fortune, have signed on with Cloudflare to block AI crawlers by default in support of the company’s broader goal of a “permission-based approach to crawling.”"
e.g "OpenAI ads" content creator puts a tag on their page / set their domain - when the crawler sees it, display an ad pass on $ as usual.
And all of this does not stop the incumbents who have already stolen everything.
https://www.techradar.com/pro/security/fake-cloudflare-captc...
I've had that happen a few times with sites behind Cloudflare.
There are likely incentives for AI companies to try to simulate human users as much as possible, but the value proposition here is that CF is so good at identifying and stopping those that signing a request becomes the path of least resistance.
Disclosure: I am on the team that wrote the RFC 9421 message signature implementation at Cloudflare and its use in the pay per crawl project. A separate blog post went out here: https://blog.cloudflare.com/verified-bots-with-cryptography/
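For readers who haven't met RFC 9421, here is a minimal sketch of what signing a request looks like; the key is generated on the spot, the key id is made up, and the production scheme also covers key discovery and server-side verification:

```python
import base64
import time

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_request(method: str, authority: str, path: str,
                 key: Ed25519PrivateKey, key_id: str) -> dict[str, str]:
    """Produce Signature-Input and Signature headers per RFC 9421 (simplified)."""
    created = int(time.time())
    params = f'("@method" "@authority" "@path");created={created};keyid="{key_id}"'
    # The signature base is the list of covered components plus the params line.
    signature_base = (
        f'"@method": {method}\n'
        f'"@authority": {authority}\n'
        f'"@path": {path}\n'
        f'"@signature-params": {params}'
    )
    signature = base64.b64encode(key.sign(signature_base.encode())).decode()
    return {
        "Signature-Input": f"sig1={params}",
        "Signature": f"sig1=:{signature}:",
    }

print(sign_request("GET", "example.com", "/", Ed25519PrivateKey.generate(), "my-bot-key"))
```

The point being: a bot that wants to be recognized attaches these headers, and the edge can verify them against a published public key instead of guessing from user agents and IP ranges.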
Also, Cloudflare is in the position of being able to see a lot of traffic making it easier for them to spot that kind of masking activity.
Civil law countries seem better at keeping their laws up to date with new threats, whereas a few common law ones (most notably the US) really insist on digging through what an 18th-century slave owner would have thought about e.g. AI.
1. Encourage fencing off everything by default to maximize need for bypass
2. Offer bypass through payment
3. Profit!
You wouldn't believe the number of public administrations with public information that have (mostly unwittingly) had some lazy contractor put Cloudflare in front of their entire site, blocking even their RSS feeds from machine-to-machine access. Yes, you can send them emails and call, and sometimes, if they even understand the problem, they will fix it after a few months, just before the next cheapest contractor is hired and we start all over again.
Not saying Cloudflare is just an extortion racket, but it's getting closer by the day.
Are they inept? Or do they really only care about things that bring them profit and that normalize their marginalization of non-paying groups? Which explanation makes the most sense?
There's also the psychological aspect. People are used to advertising in every other form of media, so seeing it online is acceptable. People expect online services to be "free", and few really understand the business transaction they're a part of, or the repercussions of it. Even when they do, many are willing to make that transaction because they value the service more than what they're giving up to access it, and they have no other choice.
So it ultimately boils down to offering the choice to pay with currency, and making it frictionless for both consumers and publishers to use. And educating consumers about the real cost of advertising.
The unfortunate reality is that advertising has become so profitable that, for a payment system to compete, companies would have to price their services higher than any consumer would be willing to pay. Or they would have to settle for lower profits, which no company will actually do. This is why even when a service has a payment option, it still inevitably chooses to _also_ show ads. Advertising money is simply too tempting, and few have the moral backbone to turn it down.
Paying for my own ads felt similar to shopping at a local bookstore: I paid extra for the culture I wanted to see. There's a market for it, but, you're right, it probably wasn't big enough to justify its existence at Google.
Does Google need a reason to shutter something?
</GoogleRant>
Plus as one of the parent comments said, I am not paying before I get an idea what I'm paying for.
The logic was: you aren't paying a website. You're paying a broker that distributes what you've paid across all the opted-in websites you visited, prorated by how much time you spent on them.
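A minimal sketch of that broker logic, assuming time-on-site is the only weighting signal (the real accounting in schemes like this is more involved, and the numbers below are made up):

```python
def distribute(monthly_fee, seconds_per_site):
    """Split one flat monthly payment across opted-in sites, prorated by time spent."""
    total = sum(seconds_per_site.values())
    if total == 0:
        return {}
    return {site: monthly_fee * t / total for site, t in seconds_per_site.items()}

# $5/month, 30 minutes on one blog and 10 minutes on a news site:
print(distribute(5.00, {"some-blog.example": 1800, "news.example": 600}))
# -> {'some-blog.example': 3.75, 'news.example': 1.25}
```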
That does make sense. Although that broker is going to push for a subscription.
> prorated based on how much time you spent on them
If it's text, I wouldn't mind, I'm a very fast reader. But that's incentive to lengthen the "content" or maybe go all video, insert animated page transitions and stuff like that.
Slow readers and people on slow internet will be penalized.
It's a shame that it will never gain traction: partly because of the cryptocurrency stigma, partly because of missteps by Brave Inc. that have tarnished its credibility, and partly because of adtech's tight grip on the web.
So the vision is a paywall around the whole internet. Content aggregators would charge AI companies to provide data relevant to specific queries. Sounds like a nightmare to me.
So how will Cloudflare detect bots and get them to pay? And how many humans and legitimate bots will get blocked as a side effect? We're somehow still stuck with CAPTCHAs, a 25-year-old concept that wastes millions of human hours and billions in infrastructure costs [0].
How can we enable beneficial automation while protecting against abusive AI crawlers?
"Read my blog for free, or pay $25/page for your AI to read it for you." This is praxis.
Enshittify the enshittification machine.
We should also throw ads in there, via a deliberate prompt injection that the AI companies expose through an API. I totally won't misuse it ;)
A residential proxy is way cheaper for scraping.
Do you think that there is $25 of value in the creation of your blog, to say nothing of value that AI may be able to extract from it? (I'm speaking hypothetically, as I haven't looked at your profile to see if you link your blog, but I will do so now.)
Edit: I have checked, and I've read your blog before. I think the answer to the question depends on who is asking but I don't know how you feel about the matter. I think asking for folks to pay for free things is a different value proposition than a pay-per-use fee, so the economics are different. You're also offering something different when you give away a blog and monetize access to a community or something similar, which is different still to accepting donations and so on. I don't know what you do for work or if you do your blog full time, but I think it's cool that you make it all the same.
I think the more pertinent question is: Can an AI company determine the value of that content automatically without seeing it? Because if they can’t, why would they pay for it?
Some of my blog posts are linked by programming language docs for cryptography. Others have helped queer folks transition to a higher-paying tech career.
It's difficult to quantify either of those things in dollar terms, and I've opted not to ever do so. But if I can make the AI slop machines pay to view my furry/tech ramblings, I will do so enthusiastically.
I recently saw a research paper behind a paywall but ChatGPT readily gave me a detailed summary of that article. I’m afraid the cat's out of the bag now.
But I don’t think they are creating accounts to scrape paywalled data from the original source itself.
Scraping is associated with mindless extraction, like a vacuum cleaner sucking in data without context, permission, or any value contributed back.
AI agents, on the other hand, aren't here to scrape for the sake of it; I've seen this first hand. They are here to get work done: researching, summarizing, assisting, building new products. You could argue this data is then used to further train a model, and you'd probably be right, but that's a topic for another day.
I implemented a poor man's demo of what a similar concept could look like: http://github.com/toolhouseai/fastlane-demo
Hey Orlie.