More than 340 local news outlets are limiting the Internet Archive's access
102 points | 2 hours ago | 11 comments | niemanlab.org | HN
remus
30 minutes ago
[-]
That's a real shame. I am involved with some history-related projects and the number of websites that go offline is huge, and the Wayback Machine is incredibly helpful for unearthing these dead sites.

It is not hard to imagine a future in 50 years' time where a huge percentage of this content is lost forever, or at best incredibly hard to find.

reply
wormius
27 minutes ago
[-]
Ugh - our local paper used to have a wonderful archive that got limited and locked down after the pandemic. IDK if they got bought out, but it's a real shame. I think part of the problem is that the archive holds things that used to be public information (birthdates, families, names), such as hospital admissions. For example, I found old entries in the newspaper about my friends' parents and my own being "in the hospital".

I'm sure that plays a role, but still... This is obviously about cost and making money, not security as a whole (ime).

reply
svachalek
1 hour ago
[-]
There really should be a micropayments setup on the internet that's not advertising-based. Let these models pay a nickel to read the article, covered by the multi-trillion-dollar AI blank check.
reply
poisonfountain
1 minute ago
[-]
Cloudflare is trying to push for that, but every time it's mentioned people complain (because they hate Cloudflare for making them wait 2s for a captcha) and nobody proposes an alternative solution. I don't think this is going to happen, unfortunately, and the internet will get siloed into oblivion.
reply
andrepd
51 minutes ago
[-]
There's a river of cash flowing into the pockets of the wealthy and the megalomaniac projects of hyperscalers, but not even a few pennies can drip into the pockets of journalists, who provide such an important public service.
reply
sandeepkd
44 minutes ago
[-]
I think it's bound to happen, and in some ways it's a good thing too. The current state of AI affairs is a lot about outrightly selling someone else's intellectual property. The short-term incentives are eroding the trust and goodwill among the natural knowledge actors.

The next natural thing to happen would be privatization or consolidation of the internet itself. It's already happening in the form of grabbing and consolidating IPv4 addresses.

reply
drtz
12 minutes ago
[-]
> The current state of AI affairs is a lot about outrightly selling someone else's intellectual property.

Blocking archiving in a flailing attempt to keep AIs away is extremely shortsighted. Archiving is important for keeping historical context, especially when it comes to news and journalism.

reply
sandeepkd
5 minutes ago
[-]
There is a natural flow of information that allows the information producers to make money for their work. How do you expect the information producers to be able to continue creating information when they are not getting paid anymore?

One possible solution that I can think of for the long-term good could be to allow archival only, with no retrieval of the latest information for at least 6 months or a year. This should theoretically satisfy most goals.
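
A minimal sketch of how such an embargo check could look, assuming the archive records a capture timestamp for each snapshot. The function name `is_retrievable` and the 180-day window are hypothetical, chosen only to illustrate the "archive now, serve later" idea; this is not how any existing archive works.

```python
from datetime import datetime, timedelta, timezone

# Assumption: roughly the "6 months" floated above; a real policy
# would be negotiated with each publisher.
EMBARGO = timedelta(days=180)


def is_retrievable(captured_at, now, embargo=EMBARGO):
    """Return True once a capture is old enough to be served publicly.

    The capture itself is always stored; only public retrieval is delayed.
    """
    return now - captured_at >= embargo


if __name__ == "__main__":
    captured = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(is_retrievable(captured, datetime.now(timezone.utc)))
```

The point of the design is that archival and access become two separate decisions, so nothing is lost while the article is still commercially relevant.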

reply
flippant
56 minutes ago
[-]
Apologies for the self-promo. Downvote and I'll know not to do it again.

This trend of outright banning the Internet Archive has me extremely worried. I fear a future where news articles are memoryholed, and no one can remember exactly what was reported and how sensational it all seemed.

I've been working on this project [0] for a while. Originally, I started with a tool that would allow people to snapshot webpages in their own browser, and they could selectively share their snapshots. Then by consensus, everyone could understand what exactly had changed, and they could draw their own conclusion about why.

While working on it, I realized that an authoritative answer to "what did it look like on $DATE" can't be produced by a no-name company. It's gotta be a non-commercial entity that's got a track record of integrity. The dream would be to allow MemoryHole customers to submit their snapshots to the Internet Archive (or other non-commercial entity). It's definitely a copyright nightmare - so no clue how this could work.

[0] - https://memoryhole.app

reply
iamalizard
33 minutes ago
[-]
> It's definitely a copyright nightmare - so no clue how this could work.

It could work as a decentralized free and open source system that doesn't care about copyright. Like how torrents work now, but it would be good to have it work over Tor or something. Perhaps as a DAO for the management aspect of it. I don't know how exactly. But disregarding copyright by using a centralized company is the wrong idea.

Or you can do the lawful approach and try to work within the framework of that copyright nightmare. But "fuck copyright" is an easier path.

reply
entropie
13 minutes ago
[-]
You - as a company - can just avoid any copyright stuff when your extension saves the data only on the client. I see there are many other issues with that, though.

The torrent approach is nice. I could imagine a selfhosted way to store the data (for a group of people)

reply
flippant
2 minutes ago
[-]
> I could imagine a selfhosted way to store the data (for a group of people)

Linkwarden does this well. You can share a collection for a small group of people.

https://github.com/linkwarden/linkwarden

reply
entropie
29 minutes ago
[-]
I really like this, and it's also reasonably priced.

Is there a way to export/download my saves in a reasonable way?

reply
flippant
11 minutes ago
[-]
Thank you! Yes, you just get a zip file with all of your saved pages.

It looks like this:

├── files
│   └── 632daffb-2f4f-4795-bb4d-3149d24f4264
│       ├── original.html
│       ├── readerview.html
│       └── screenshot.png
├── manifest.json
└── metadata.csv
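
For anyone who wants to inspect an export like this programmatically, here is a rough sketch using only the Python standard library. It assumes the zip has the layout shown above, with entries at the archive root and one folder per saved page under `files/`. The function name `list_snapshots` is made up for illustration, and nothing is assumed about what `manifest.json` or `metadata.csv` contain.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def list_snapshots(export):
    """Map each snapshot folder under files/ to the file names it contains.

    `export` is a path or a file-like object for the export zip.
    """
    snapshots = defaultdict(list)
    with zipfile.ZipFile(export) as zf:
        for name in zf.namelist():
            parts = PurePosixPath(name).parts
            # Expect files/<snapshot-id>/<file>; skip manifest.json,
            # metadata.csv, and bare directory entries.
            if len(parts) == 3 and parts[0] == "files" and not name.endswith("/"):
                snapshots[parts[1]].append(parts[2])
    return {sid: sorted(names) for sid, names in snapshots.items()}
```

Calling `list_snapshots("export.zip")` (the file name is a placeholder) would return a dictionary keyed by snapshot ID, each with its list of saved files.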

reply
acidhousemcnab
55 minutes ago
[-]
Perhaps I imagined this, but some months ago on X someone pointed out that a historical article on dailymail.co.uk related to Prince Philip and Epstein had been scrubbed, likely by intelligence services or through D-Notices. Instead of showing a 404 page, the URL redirected to an article that was similar but benign. I checked the URL on the Wayback Machine and it turned up zero results, not even the redirected article. However, the user on X had screen-grabbed the original, which everyone was reading and commenting on. As of 21st May I can't find this discussion on X, and Grok denies it ever existed. This is a "maximally truth-finding" AI, so I must be mistaken. Perhaps the Internet Archive cannot be trusted, so this is why 340 local news outlets need to limit access.
reply
grosswait
33 minutes ago
[-]
This sounds like the beginning of a story where the next odd thing is your family and friends don't know who you are, and no one has ever heard of you.
reply
_ink_
25 minutes ago
[-]
Thanks, Big Tech!
reply
jmclnx
1 hour ago
[-]
Maybe they should allow the Internet Archive access to their articles after a week or two.

But I think this will hurt them more than help as time goes on. IIRC, one news org blocked free access and their revenue fell. I think that was in Australia.

But it seems they are using AI as the reason, so allowing access after a week will not prevent AI access.

But what happens if an AI company subscribes to the news site using a person's name (or a fake name)? They will still get the articles and avoid the hassle.

reply
celsoazevedo
48 minutes ago
[-]
It may be easier to convince them if the Internet Archive doesn't allow access for <period of time>. Not good for the average user now, but at least it would be archived for the future. Better than having no archive at all.
reply
fragmede
20 minutes ago
[-]
Yeah IA needs to get their heads out of their asses and just do that. It's an archive, but if it's available at the same time as it's relevant, then it's being used as alternate access.
reply
ranger_danger
1 hour ago
[-]
That sounds like a good idea to me.

One of the tests for Fair Use in the US, as I understand it, would be whether the archived work "competes" with the original.

If people start going to IA instead to read the news, the newspaper might have a claim. But if they're doing it to get around paywalls, or purely for archival/historical/research purposes, that may be allowed.

But the reality is such decisions are subjective and will be up to whatever judge happens to get such a case in front of them if this is challenged.

reply
PaulHoule
33 minutes ago
[-]
In general, judges seem to understand that the copyright holder has some interest in these situations, but they don't seem to understand that the rest of the community has some rights too.
reply
starik36
50 minutes ago
[-]
reply
charcircuit
45 minutes ago
[-]
If the block is merely user-agent-based, IA can spoof a different user agent to get these sites.
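
To illustrate what "user-agent-based" means here, a small sketch with Python's standard `urllib.robotparser`. The robots.txt content is hypothetical, and `archive.org_bot` is assumed to be the token a site would use to single out the Internet Archive's crawler. The helper name `allowed` is also made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler token is blocked (assumed to be
# the Internet Archive's), everyone else is allowed.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story"
    print(allowed("archive.org_bot", url))  # blocked by the first group
    print(allowed("Mozilla/5.0", url))      # falls through to the * group
```

The rule only looks at the name the client reports about itself, and robots.txt is advisory rather than enforced, which is why a block of this kind depends on the crawler choosing to honor it.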
reply
Gagarin1917
9 minutes ago
[-]
Not surprising, sites like Reddit use it to get around their paywalls.

Redditors then had the gall to pretend like it wasn’t their number one use case.

reply