I suspect having a few different teams competing (for funding) to provide mirrors would rapidly reduce the hardware cost too.
The density + power dissipation numbers quoted are extremely poor compared to enterprise storage. Hardware costs for the enterprise systems are also well below AWS (even assuming a short 5 year depreciation cycle on the enterprise boxes). Neither this article nor the vendors publish enough pricing information to do a thorough total cost of ownership analysis, but I can imagine someone the size of IA would not be paying normal margins to their vendors.
https://help.archive.org/help/archive-bittorrents/
https://github.com/jjjake/internetarchive
https://archive.org/services/docs/api/internetarchive/cli.ht...
u/stavros wrote a design doc for a system (codename "Elephant") that would scale this up: https://news.ycombinator.com/item?id=45559219
(no affiliation, I am just a rando; if you are a library, museum, or similar institution, ask IA to drop some racks at your colo for replication, and as always, don't forget to donate to IA when able to and be kind to their infrastructure)
https://www.reddit.com/r/torrents/comments/vc0v08/question_a...
The solution is to use one of the several IA downloader script on GitHub, which download content via the collection's file list. I don't like directly downloading since I know that is most cost to IA, but torrents really are an option for some collections.
Turns out, there are a lot of 500BG-2TB collections for ROMs/ISOs for video game consoles through the 7th and 8th generation, available on the IA...
It sounds like they put this mechanism into place that stops regenerating large torrents incrementally when it caused massive slowdowns for them, and haven't finished building something to automatically fix it, but will go fix individual ones on demand for now.
[1] - https://www.reddit.com/r/theinternetarchive/comments/1ij8go9...
[1] It looks like this might exist at some level, e.g. https://github.com/hartator/wayback-machine-downloader, but I've been trying to use this for a couple of weeks and every day I try I get a HTTP 5xx error or "connection refused."
I just tried waybackpy and I'm getting errors with it too when I try to reproduce their basic demo operation:
>>> from waybackpy import WaybackMachineSaveAPI
>>> url = "https://nuclearweaponarchive.org"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>> save_api = WaybackMachineSaveAPI(url, user_agent)
>>> save_api.save()
Traceback (most recent call last):
File "<python-input-4>", line 1, in <module>
save_api.save()
~~~~~~~~~~~~~^^
File "/Users/xxx/nuclearweapons-archive/venv/lib/python3.13/site-packages/waybackpy/save_api.py", line 210, in save
self.get_save_request_headers()
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^
File "/Users/xxx/nuclearweapons-archive/venv/lib/python3.13/site-packages/waybackpy/save_api.py", line 99, in get_save_request_headers
raise TooManyRequestsError(
...<4 lines>...
)
waybackpy.exceptions.TooManyRequestsError: Can not save 'https://nuclearweaponarchive.org'. Save request refused by the server. Save Page Now limits saving 15 URLs per minutes. Try waiting for 5 minutes and then try again.$25 million a year is not remotely a lot for a non-profit doing any kind of work at scale. Wikimedia's budget is about seven times that. My local Goodwill chapter has an annual budget greater than that.
Edit: Then again, I recently heard a podcast that talked about the relatively good at-rest stability of SATA hard disk drives stored outdoors. >smile<
Much more recently, I worked at a medium-large SaaS company but if you listened to my coworkers you'd think we were Google (there is a point where optimism starts being delusion, and a couple of my coworkers were past it.)
Then one day I found the telemetry pages for Wikipedia. I am hoping some of those charts were per hour not per second, otherwise they are dealing with mind numbing amounts of traffic.
It just reads like a clunky low quality article
https://www.nytimes.com/2025/12/03/magazine/chatbot-writing-...
https://www.flickr.com/photos/textfiles/albums/7215763372220...
"Inside the church's main room, with its still-intact pews, there are more than 120 ceramic sculptures of the Internet Archive's current and former employees, created by artist Nuala Creed and inspired by the statues of the Xian warriors in China."
i wonder if maybe donors above a certain level could get priority on archiving pages or something.
I'd say the nonprofit has found itself a profitable reason for its existence