The Internet Archive is back online
336 points
2 days ago
| 15 comments
| arstechnica.com
| HN
LetsGetTechnicl
2 days ago
[-]
There have been so many times since it went down that I tried to access IA resources and realized they were unavailable. I'm still bitter that of all the targets a hacker could've chosen, it was the IA. Couldn't have happened to a better website. I plan on upping my monthly donation as soon as I can.
reply
krunck
2 days ago
[-]
What makes you think the "hacker" was just a person looking for an easy target? More likely that this was a targeted attack by those that don't like IA.
reply
karlzt
2 days ago
[-]
>> by those that don't like IA.

Or perhaps it was to make IA better.

reply
Alifatisk
2 days ago
[-]
Hope these hackers receive lots of negative reactions from their peers and people around them.
reply
7402
2 days ago
[-]
Some source information about the group that has claimed responsibility:

"A group known as SN_Blackmeta claimed responsibility for the attack, with a confusing antisemitic message that the archive “belongs to the USA” as if it were a government project."

https://9to5mac.com/2024/10/15/internet-archive-data-breach-...

"Internet Archive Cyber Attacked by Pro-Palestinian Hackers"

https://www.cybersecurityintelligence.com/blog/internet-arch...

"Anti-Israel hacker group hacks 'Internet Archive', exposing 31 million users"

https://www.ynetnews.com/business/article/bkird2rjke

reply
cable2600
1 day ago
[-]
Maybe they have in the web archive some webpages that were saved that they don't like?
reply
Smithalicious
1 day ago
[-]
Can I say "false flag"?
reply
olalonde
1 day ago
[-]
Unpopular opinion: I suspect many "hacktivists" are driven primarily by the challenge/thrill of hacking, with the cause being more of an afterthought to appease their conscience.
reply
sandwichmonger
1 day ago
[-]
I see no reason to believe this is a false flag. If there's no evidence for it, why should we believe it to be so?
reply
cartoonworld
1 day ago
[-]
Attribution is not super easy. But here's what IA says: (tl;dr false flag)

"They’re doing it just to do it. Just because they can. No statement, no idea, no demands.” [Jason] Scott said, referencing a post made by an account named SN_Blackmeta on Telegram claiming responsibility for the attack and hinting at another one planned for Friday.

reply
pcthrowaway
1 day ago
[-]
Yeah, there's plenty of material which I've only been able to easily find on archive.org which does not look good for Israel. The dominant powers seem to be more effective at keeping their propaganda hosted than the resistance, so a takedown of the IA seems very much in line to me with Israeli capabilities and motives.
reply
fasa99
1 day ago
[-]
Curse those clever Pro-Palestine hacktivists, what with going onto IA and erasing all material that's critical of Israel
reply
compootr
1 day ago
[-]
Right, but it's somewhat IA's fault, since they didn't secure their shit properly

Maybe they needed this wakeup call before someone could, say, remove all of their data

reply
tiffanyh
2 days ago
[-]
I thought Cloudflare was going to provide "Always Online" access to Internet Archive

https://blog.cloudflare.com/cloudflares-always-online-and-th...

reply
adambb
2 days ago
[-]
Other way around! Cloudflare can optionally load your site from IA if it's down.
reply
imglorp
2 days ago
[-]
If that's the case, I hope CF is making a big, periodic, donation to IA for the business value provided.
reply
joenot443
2 days ago
[-]
My intuition is that there's a mutually beneficial deal hammered out behind the scenes and that CF isn't just eating poor IA's lunch.
reply
diggan
2 days ago
[-]
Ideally, CF would keep track of exactly how many redirects they do to IA, and donate based on the usage. Would be more fair for everyone involved.
reply
tourmalinetaco
1 day ago
[-]
Cloudflare willingly keeps CSAM and animal abuse sites online even when reported, the least they can do is cut IA a fat check every month.
reply
TZubiri
2 days ago
[-]
Do you happen to remember how you learned this? I'm quite skeptical of it.
reply
jgrahamc
2 days ago
[-]
That is about us using the Internet Archive to show a snapshot of a page when the origin server is down.
reply
yabones
2 days ago
[-]
Will CF be funding IA for using the service so extensively?
reply
seestem
2 days ago
[-]
It would be better if the Internet Archive was decentralized without a central point of failure, maybe run on something like BitTorrent.
reply
Cheer2171
2 days ago
[-]
You say this as if it is an original idea. Of course the IA is working on this and has been for over 6 years. There already is a DWeb version. They have been advancing DWeb infrastructure. The IA hosts all kinds of DWeb developer events.

But it is over 50 petabytes and the IA gets a huge amount of traffic through the regular web that they need to serve quickly and efficiently to their users.

Guess what has happened over 6 years of decentralization of 50 PB? People only seed what they want or care about and there aren't enough seeders to host. They set all this up and nobody volunteers. You're a DWeb advocate and you haven't been seeding. That's a recipe for disaster if they rely on the goodness of volunteer seeders. The IA's mission is broader. DWeb will only ever complement the IA's mission.

https://blog.archive.org/2021/02/18/behind-the-scenes-of-the...

https://www.bleepingcomputer.com/news/technology/archiveorg-...

reply
ksymph
2 days ago
[-]
How does one contribute? In the article you linked:

> there is no information on how users can get involved in the decentralized version of Archive.org and who the peers are that are distributing the content.

The other link doesn't mention how people could help host data either. If there is a way, then it seems like more of a marketing issue if those willing are unaware or unable to figure out how. I can't find any actionable steps on how to contribute.

edit - it seems the dweb version was a frontend for archive.org testing serving IA content over alternative protocols. It was never finished or expanded on unfortunately. Links to it are dead but here's the github repo https://github.com/internetarchive/dweb-archive

reply
tonetegeatinst
2 days ago
[-]
Can confirm that issue about people only seeding what they are interested in.

I found a dataset I wanted to hoard but the author's website was gone. A dataset site had a torrent and I said great, I'll just torrent and seed that and help keep the thing alive. Turns out I can't find a single seeder for the torrent.

reply
lukas099
2 days ago
[-]
If I help seed this DWeb and it turns out it has some copyrighted materials in it, will I be potentially held liable?
reply
akudha
2 days ago
[-]
This.

Until a clear, precise answer to this question is available, it is unreasonable to expect individuals to take risks and seed.

It is one thing if an organization like IA gets in trouble with the law. They have money, lawyers, name recognition and are big enough to at least fight a lawsuit, even if they lose. Who is going to help an individual if he/she gets in trouble with the law, unknowingly? Am I expected to read through tons of complex copyright law and interpret it, just so I can seed a handful of items? No thanks.

reply
diggan
2 days ago
[-]
You're always responsible for what you and your computer do. There is a chance EFF/some other organization could help you out in case you end up in court, but that's a maybe, not a guarantee.
reply
nikisweeting
2 days ago
[-]
Harder to make this argument with encrypted distributed filesystems. If I'm storing a single chunk of an encrypted blob on Filecoin, am I responsible for the entire file even if I don't know what's in it, and I'm only storing a single fragment?
reply
CaptArmchair
2 days ago
[-]
This depends on the jurisdiction you're in. E.g. Europe's GDPR argues that you need consent to keep someone's personal data. Encryption doesn't equate to anonymization, so there's a potential liability.
reply
creer
2 days ago
[-]
It seems to me the various efforts are dead or stalled. Anything in actual current development or production? IPFS was supposed to go in that direction and still exists, sure, but not to provide IA duplication (that is advertised.)
reply
Mistletoe
2 days ago
[-]
https://www.reddit.com/r/DataHoarder/comments/h02jl4/lets_sa...

I’ve always been fascinated by this post.

reply
tomrod
2 days ago
[-]
You're describing a network effects problem, specifically a collaborative game failure. Need some mechanism designers and big tech cos to jump in, stat!
reply
seestem
2 days ago
[-]
It is not my original idea, but it is an obvious idea to anyone who knows how the internet works, I just added it to get the discussion going.
reply
Cheer2171
2 days ago
[-]
It already exists. IA has had a DWeb for 6 years. Nobody seeds it.
reply
creer
2 days ago
[-]
If nobody seeds it (or continues development) then it's dead. Inspiration and perhaps code for the next effort, sure. But it doesn't "exist" in a way that makes a difference.
reply
seestem
2 days ago
[-]
Incentivising seeding is hard. Maybe cryptocurrencies can be useful here, but I understand not everyone likes them especially here on HN. In retrospect the ideal setup would have been if archiving was included into the core HTTP protocol.
reply
Cthulhu_
2 days ago
[-]
Cryptocurrencies imply that people would pay / get paid for it... just pay the IA directly then, or your own servers. Cryptocurrencies imply someone's skimming off a lot for their own pockets.
reply
seestem
2 days ago
[-]
I meant more like in bitcoin how the miners get paid for mining, or how validators are rewarded in proof of stake blockchains. These techniques can be used to incentivize seeders.
reply
usr1106
2 days ago
[-]
Maybe someone can invent a proof of seeding protocol? So that would bring some good to the public instead of just burning energy. Don't ask me how it would work...
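For what it's worth, the simplest version of a "proof of seeding" is a challenge-response spot check: a verifier who has the file picks a random chunk and a fresh nonce, and the seeder must return the hash of nonce plus chunk, which it can only do if it really holds that chunk right now. A naive Python sketch follows; all names (`make_challenge`, `respond`, `verify`, `CHUNK_SIZE`) are made up for illustration, and this is not how Storj, Filecoin, or any real network does it.

```python
import hashlib
import secrets

CHUNK_SIZE = 4  # bytes; unrealistically small, just to keep the example readable


def get_chunk(data: bytes, index: int) -> bytes:
    """Return the index-th fixed-size chunk of the data."""
    return data[index * CHUNK_SIZE:(index + 1) * CHUNK_SIZE]


def make_challenge(num_chunks: int):
    """Verifier side: pick a random chunk index and a fresh random nonce."""
    return secrets.randbelow(num_chunks), secrets.token_bytes(16)


def respond(data: bytes, index: int, nonce: bytes) -> str:
    """Seeder side: hash the nonce together with the requested chunk."""
    return hashlib.sha256(nonce + get_chunk(data, index)).hexdigest()


def verify(reference: bytes, index: int, nonce: bytes, answer: str) -> bool:
    """Verifier side: recompute the expected answer from its own copy."""
    expected = respond(reference, index, nonce)
    return secrets.compare_digest(expected, answer)
```

The fresh nonce is what stops a seeder from caching old answers and throwing the data away. The obvious weakness is that the verifier needs its own copy of the data (or a stash of precomputed challenges), which is part of why real systems are more involved.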
reply
nikisweeting
2 days ago
[-]
Storj, Filecoin, etc. fill this gap but it's still really hard to earn enough to justify the effort at small scales.
reply
greenie_beans
2 days ago
[-]
i'm doubtful that whatever crypto incentive that is offered will make up the cost for me to DIY this in my home. which is why crypto miners scale, making crypto a centralized system. i don't care what the latest white paper says, it is still controlled by few people and not decentralized. in the same way that the ussr replicated the centralization of american capitalism, crypto replicates the centralization of trust while marketed as something else.
reply
uniqueuid
2 days ago
[-]
The problem is that it's hard to do this in a way that ensures good archival of ALL resources.

Bittorrent works well for popular things but fails for marginal content (unless some really dedicated individuals step in.)

What the internet archive provides is a way to have access to many many resources which you didn't know you needed in advance.

reply
TZubiri
2 days ago
[-]
Lol, in a way the decentralized version is actually the internet.
reply
vwkd
2 days ago
[-]
Or a brain.
reply
nikisweeting
2 days ago
[-]
I'm working on this, ArchiveBox v0.8 adds the beginnings of a content addressable store, with plans for bittorrent-backed instance-to-instance sharing in a later version.

I think Archive.org should still exist too (and ArchiveBox donates + submits URLs to Archive.org too), but having a self-hosted option where you can archive personal stuff that requires a login, and do P2P sharing with fine-grained permissions is a gap that should be filled.

Aiming to archive the entire internet is Archive.org's goal, aiming to archive the part of the internet YOU care about is our goal.
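For anyone unfamiliar with the term, a content-addressable store keys each blob by a hash of its own bytes, so identical snapshots deduplicate and any peer can verify what it received. A toy Python sketch of the idea; `ContentStore` is a hypothetical class written for this comment, not ArchiveBox's actual API or storage layout.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (illustrative only)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Re-hash on read so a corrupted or tampered blob is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its address")
        return data
```

Storing the same bytes twice returns the same key and keeps one copy, which is the property that makes instance-to-instance sharing tractable: peers only need to exchange the keys they are missing.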

reply
sourcepluck
2 days ago
[-]
I'm hoping that Autonomi (formerly The Safe Network) is up for the job when (if) it makes it out into the real world one day https://forum.autonomi.community/t/the-internet-archive-a-pe...

[I know that some percentage between 95 and 100 of crypto projects are a scam. I personally believe this one isn't, after much diligent reading. Whether it gets released or does what it claims it will do is another question, but please do spare me the kneejerk anti-crypto reactions, if you can. Just because they're almost all money-making scams, doesn't mean they're all money-making scams.]

reply
mrtksn
2 days ago
[-]
That would be an awful lot of replication or very shitty archive. Decentralization works when each node can serve all the functions and content alone or when you don't care about completeness.

Unless I'm missing something, an archive is not something small or something that's just as good when part of it is missing.

reply
dusted
2 days ago
[-]
I kind of agree, but the way the internet is going, with everyone being behind carrier-grade nat, it's not much of a decentralized network of computers anymore, not to mention all the kids with their laptops and tablets not even hosting anything :(
reply
nikisweeting
2 days ago
[-]
There are ways around this, I've experimented with setting up a cluster of ArchiveBox instances that share snapshots over Tailscale. Tailscale lets users sign up for free accounts, and you can share machines between separate accounts. A (CGNAT-compatible) decentralized invite-only network could conceivably spread that way.
reply
Kuinox
2 days ago
[-]
UPnP exists and allows devices to ask the router to open a port to them.
reply
Fidelix
2 days ago
[-]
UPnP is useless with CGNAT (Carrier Grade NAT), which is what the op is talking about.

There are other ways to get seeding working, though, including IPV6, which is gaining adoption, so I don't agree with the OP.
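A rough way to tell whether you are likely behind CGNAT is to compare your router's WAN address with 100.64.0.0/10, the shared address space RFC 6598 reserves for it. A minimal Python sketch; `looks_like_cgnat` is a made-up helper name, and since some ISPs use other private ranges for CGNAT, a False result doesn't prove you're in the clear.

```python
import ipaddress

# RFC 6598 reserves 100.64.0.0/10 as shared address space for carrier-grade NAT.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def looks_like_cgnat(wan_ip: str) -> bool:
    """Return True if the router's WAN address falls in the CGNAT shared range."""
    return ipaddress.ip_address(wan_ip) in CGNAT_RANGE
```

If the WAN address your router reports is in that range, a UPnP port mapping on your own router won't make you reachable from outside, because the carrier's NAT sits in front of it.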

reply
Dalewyn
2 days ago
[-]
UPnP is just automating the process of forwarding ports, CGNAT will still screw you sideways because you're behind a router you can't access or order around.
reply
cosarara
2 days ago
[-]
That doesn't help with CGNAT.
reply
thrownaway561
2 days ago
[-]
that will never happen. no one is going to be able to seed the amount of data that IA has. The only thing they can hope for is that a company like Google or CF provides another data center for them.
reply
maire
2 days ago
[-]
I don't know if bittorrent has improved - but 20 years ago I had a personal issue with it.

At that time our son was using it for games. He went away to college and came home for the first school break. I got a phone call from our internet provider asking if our son was home. I was so shocked and handed the phone to our son.

Apparently at that time bittorrent was optimizing for the most efficient path to a host. Since we had relatively good connection, the mighty weight of the internet was funnelling through our tiny internet provider to our son's computer. The provider (without our knowing it) had made a deal with our son that he would only turn on bittorrent between midnight and 6 AM. I doubt other providers would be so generous.

I have been sceptical of bittorrent since that day.

reply
jetrink
2 days ago
[-]
All clients today (and probably back then) have options to limit bandwidth consumption including throttling, scheduling, and total data transfer caps. For serving mostly HTML and images, dedicating even 10% of a home broadband connection to serving content would allow many, many people per day to access archived pages.
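To put a rough number on "many, many people per day", here is a back-of-the-envelope Python calculation. The function name and the example figures (a 20 Mbit/s uplink, 2 MB per archived page) are assumptions picked for illustration, not measurements.

```python
def page_views_per_day(uplink_mbps: float, share: float, page_mb: float) -> int:
    """Estimate how many pages a seeder could serve per day.

    uplink_mbps: upload speed in megabits per second
    share:       fraction of the uplink dedicated to seeding (0.1 = 10%)
    page_mb:     average size of one archived page in megabytes
    """
    seconds_per_day = 24 * 60 * 60
    # Megabits per second -> megabytes per second, then take the dedicated share.
    mb_per_second = uplink_mbps / 8 * share
    return int(mb_per_second * seconds_per_day / page_mb)
```

With those assumed figures, `page_views_per_day(20, 0.10, 2)` comes out at 10,800 page loads a day from a single home connection, ignoring protocol overhead and uneven demand.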
reply
serendipty01
2 days ago
[-]
Does someone know where i can download MIT OCW videos ?

As the videos are present on archive.org but it is down and i was unable to find them anywhere else online ?

Also, yt-dlp is also not working: https://github.com/yt-dlp/yt-dlp/issues/10128

Example: https://ocw.mit.edu/courses/7-016-introductory-biology-fall-...

reply
adamnew123456
2 days ago
[-]
I seed a torrent of the SICP lectures that originally came from IA, I'll have to see if that's still up and if there's some way of getting the other torrents from the tracker.

If you're lucky there's other seeds around, and not just the IA web seeds which (I assume?) are down too.

reply
adamnew123456
1 day ago
[-]
No such luck :( Both the bt1.archive.org and bt2.archive.org trackers appear to be down.

magnet:?xt=urn:btih:1814b8e2673e8a4547fd9c4f1a417b05860230b4&dn=MIT_Structure_of_Computer_Programs_1986&tr=http%3A%2F%2Fbt1.archive.org%3A6969%2Fannounce&tr=http%3A%2F%2Fbt2.archive.org%3A6969%2Fannounce&ws=https%3A%2F%2Farchive.org%2Fdownload%2F&ws=http%3A%2F%2Fia600204.us.archive.org%2F15%2Fitems%2F
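In case it helps anyone reading the link above: the `tr=` parameters are the trackers and the `ws=` parameters are the web seeds, all percent-encoded. A small Python sketch that pulls them apart using only the standard library; `parse_magnet` is just a helper name I made up.

```python
from urllib.parse import parse_qs, urlparse


def parse_magnet(uri: str) -> dict:
    """Split a magnet URI into info hash, display name, trackers, and web seeds."""
    parsed = urlparse(uri)
    if parsed.scheme != "magnet":
        raise ValueError("not a magnet URI")
    params = parse_qs(parsed.query)  # also percent-decodes the values
    xt = params.get("xt", [""])[0]
    prefix = "urn:btih:"
    return {
        "info_hash": xt[len(prefix):] if xt.startswith(prefix) else xt,
        "name": params.get("dn", [""])[0],
        "trackers": params.get("tr", []),
        "web_seeds": params.get("ws", []),
    }
```

Run on the magnet link above, this lists two trackers (bt1 and bt2 on archive.org) and two web seeds, all hosted by the Internet Archive, which is why the torrent is stuck unless some other peer happens to be seeding.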

reply
AceStar
1 day ago
[-]
This article appears to be referring to just the Wayback Machine.

The Internet Archive itself is still down.

reply
TruffleLabs
1 day ago
[-]
There are other things still not available, like this!

“Lisp lore : a guide to programming the Lisp machine”

https://archive.org/details/lisploreguidetop0000brom

I discovered this reference and boom, the Internet Archive book is not available :(

“Wayback Machine (provisional, read-only) service.

Other Internet Archive services are temporarily offline.

Please check our official accounts, including Twitter/X, Bluesky or Mastodon for the latest information.

We apologize for the inconvenience.”

reply
PeterCorless
2 days ago
[-]
Attacking the Internet Archive is like robbing from your own grandmother.
reply
onetokeoverthe
2 days ago
[-]
Still down in my town.
reply
mananaysiempre
2 days ago
[-]
The Internet Archive is not, in fact, completely online (as the article explains but the title doesn’t). The Wayback Machine, which is part of it, is kind of online but (in my experience) you are going to experience HTTP 504 timeouts from time to time on the first query for a given (URL, date) pair as it seemingly goes out to slower storage. (Long delays happened in the past occasionally as well but not to the point of a 504.)
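If you are scripting against it, retrying on 504 with a growing delay works around this, since the second request for the same (URL, date) pair tends to hit the warmed-up copy. A minimal Python sketch using only the standard library; `fetch_with_retry` and its parameters are names invented here, and the retry counts and delays are guesses rather than anything the Wayback Machine documents.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=4, base_delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying with exponential backoff when the server answers 504."""
    for attempt in range(attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Only a gateway timeout is treated as transient; the last
            # attempt re-raises so the caller sees the failure.
            if err.code != 504 or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
```

The `opener` and `sleep` parameters exist only so the function can be exercised without touching the network. Please keep the attempt count low; the service is clearly under strain right now.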
reply
binary132
2 days ago
[-]
just build your own 960 billion website archive
reply
idlewords
2 days ago
[-]
Working on it.
reply
sandwichmonger
1 day ago
[-]
I'm really disappointed with the Internet Archive's level of unprofessionalism when it comes to any form of downtime whatsoever (let alone the blog). The monolithic one-stop-shop "Internet Library" cannot even be bothered to put updates on their downtime page and instead directs their user base to social media platforms.

Call me a nut, but I feel the IA would work better if it was run by the Library of Congress, but then again that has its own pitfalls.

reply
hersko
2 days ago
[-]
I wonder if it would be possible to identify and prosecute those responsible.
reply
ChrisArchitect
2 days ago
[-]
reply
jawshwa
1 day ago
[-]
What a goober
reply
twosdai
2 days ago
[-]
Thank the internet.
reply
throwaway48476
2 days ago
[-]
When the internet archive censors a website is it deleted permanently or just not publicly available?
reply
dark-star
2 days ago
[-]
They black out all items that get a DMCA complaint or similar request (so it's still there just not accessible). However they permanently delete illegal stuff.
reply
throwaway48476
2 days ago
[-]
I would assume they delete illegal stuff as they are compelled to. What I'd like to know is their policy for legal stuff that they exclude that is not as a result of DMCA.
reply
tiagod
2 days ago
[-]
Can you give an example?

EDIT: Just seen your other reply. Perhaps it was excluded due to right to forget laws?

reply
throwaway48476
2 days ago
[-]
If you ask them they will remove sites that you created. It's not under right to forget laws as they don't exist in the US. What I'd like to know is whether they also delete the data or just make it inaccessible.
reply
BlackLotus89
2 days ago
[-]
ploetzblog was available and is now completely gone :( "lost" some recipes that he didn't migrate that I used to bake all the time. Used to look it up on the IA and was pissed when it was deleted
reply
yard2010
2 days ago
[-]
Honestly we are so lucky to have something like this. Solving this problem in a decentralized manner is so hard, and while a centralized solution has its drawbacks, one done properly (i.e. not for making money but serving greater values) is invaluable. A gift.
reply
chirau
2 days ago
[-]
I don't think they censor anything, strictly archiving. Do you know of any instance in which they censored a site?
reply
teddyh
2 days ago
[-]
I take it you’ve never encountered the dreaded message, “The item is not available due to issues with the item's content”?

There was a news item here on HN about something available on the Internet Archive: <https://news.ycombinator.com/item?id=16725526> This is now gone from IA. Old page with links to IA which are no longer working: <https://web.archive.org/web/20180331224513/http://profileeng...>

reply
throwaway48476
2 days ago
[-]
http://web.archive.org/web/20240000000000*/twitter.com/taylo...

For one. I'm just curious what their policy is.

reply
Cthulhu_
2 days ago
[-]
The law trumps their policy, to be blunt. They can't afford legal disputes so complying is the best thing they can do. They're still involved in legal shit for "giving away" ebooks too easily during the pandemic.
reply
lukas099
2 days ago
[-]
I only know what I just read on wikipedia about her, but it seems like she has been heavily doxxed — I'm guessing she requested this information about herself be excluded? If so, I'm not sure I'd classify that as censorship.
reply
throwaway48476
2 days ago
[-]
It's her own tweets, not dox.
reply
lukas099
1 day ago
[-]
Not dox but I was thinking there could be old materials in there that people were using to dox her. Idk, why else would they remove it?
reply
tossit444
2 days ago
[-]
reply
edgineer
2 days ago
[-]
Kiwi farms
reply
nikisweeting
2 days ago
[-]
There are people that maintain "non-public archives" of stuff like that for litigation and long-term archival storage (think sealed boxes intended for future generations of historians). Libraries, lawyers, and journalists can run their own WebRecorder, Perma.cc, ArchiveBox, etc. instances.

I think that's a reasonable middle ground, we don't necessarily need every single piece of heinous content mirrored for free access 24/7 the moment it appears anywhere on the internet, as long as there is some historic record somewhere that's probably ok.

reply
DrillShopper
2 days ago
[-]
Nah, we don't need to archive their targeted harassment.
reply
mcpar-land
2 days ago
[-]
good
reply
Cthulhu_
2 days ago
[-]
An argument can be made that they should retain a copy for future lawsuits / investigations, but... kiwi farms won't have anything public, and I hope that law enforcement has their private archive where they gather everything.
reply