I would make the case that you should also self-host more as a small software/SaaS business; it is not the boogeyman that a lot of cloud vendors want you to think it is.
Here is why: most software projects/businesses don't require the scale and complexity for which you truly need the cloud vendors and their expertise. For example, you don't need Vercel or Netlify to deploy Next.js or whatever static website. You can set up Nginx or Caddy (my favorite) on a simple VPS running Ubuntu etc., and boom. For the majority of projects, that will do.
90%+ of projects can be self hosted with the following:
- A well-hardened VPS with good security controls. There are plenty of good articles online on how to do the most important things (disable root login, key-based SSH only, etc.).
- Set up a reverse proxy like Caddy (my favorite) or Nginx. Boom: static files and static websites can now be served. No need for a CDN etc. unless you are talking about millions of requests per day.
- Set up your backend/API with something simple like supervisor or even native systemd.
- The same reverse proxy can also forward requests to the backend and other services as needed. Not that hard.
- Self-host a MySQL/Postgres database and set up the right security controls.
- Most importantly: set up backups for everything using a script/cron and test them periodically (see the sketch after this list).
- If you really want to feel safe against DoS/DDoS etc., add Cloudflare in front of everything.
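As promised above, a minimal sketch of what that backup script plus cron might look like (DB name, paths, and the backup host are placeholders; adapt to your stack):

    #!/usr/bin/env bash
    # nightly-backup.sh - dump the DB, sync app files off-box, prune old dumps
    set -euo pipefail
    pg_dump mydb | gzip > /var/backups/mydb-$(date +%F).sql.gz
    rsync -a /var/www/ backup-host:/srv/backups/www/
    find /var/backups -name '*.sql.gz' -mtime +14 -delete

Schedule it with a crontab line like 0 3 * * * /usr/local/bin/nightly-backup.sh, and actually restore from it once in a while; an untested backup is a hope, not a backup.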
So you end up with:
Cloudflare/DNS => Reverse Proxy (Caddy/Nginx) => Your App.
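To make it concrete, the whole reverse-proxy layer can be a handful of lines. A minimal Caddyfile sketch (domain, paths, and backend port are placeholders; Caddy also handles the TLS certificates automatically):

    example.com {
        root * /var/www/example
        file_server                          # serve the static site
        reverse_proxy /api/* 127.0.0.1:8080  # forward API calls to the backend
    }

Static files are served directly; anything under /api/ goes to whatever process systemd/supervisor is keeping alive.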
- You want to deploy? A git pull should do it for most projects (PHP etc.). If you have to rebuild a binary, that is one more step, but doable.
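In that world, a "deploy pipeline" can be a script this small (repo path and service name are invented):

    #!/usr/bin/env bash
    # deploy.sh - pull the latest code and bounce the service
    set -euo pipefail
    cd /var/www/example
    git pull --ff-only
    # compiled app? add the build step here, e.g. go build -o app .
    sudo systemctl restart example.service

For PHP and the like, even the restart line is usually unnecessary.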
You don't need Docker or containers. They can help, but they aren't needed for small or even mid-sized projects.
Yes, you can claim that a lot of these things are hard, and I would say they are not. The majority of projects don't need web scale or whatever.
For that reason alone I'd be tempted to do GHA workflow -> build container image and push to private registry -> trivial k8s config that deploys that container with the proper ports exposed.
Run that on someone else's managed k8s setup (or Talos if I'm self hosting) and it's basically exactly as easy as having done it on my own VM but this way I'm only responsible for my application and its interface.
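To be fair, the "trivial k8s config" really is small these days. A sketch of the whole thing, with the image, names, and ports as placeholders:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 2
      selector:
        matchLabels: { app: myapp }
      template:
        metadata:
          labels: { app: myapp }
        spec:
          containers:
            - name: myapp
              image: registry.example.com/myapp:latest  # pushed by the GHA workflow
              ports:
                - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: myapp
    spec:
      selector: { app: myapp }
      ports:
        - port: 80
          targetPort: 8080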
Last month I had 250k failed password attempts. If I had a "weak" password of 6 random letters (I don't), and all 250k had guessed a valid username (only 23 managed that), that would give... uh, one expected success every 70 years?
That sounds risky actually. So don't expose a "root" user with a 6-letter password. Add two more letters and it is 40k years. Or use a strong password and forget about those random attempts.
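Back-of-the-envelope, assuming lowercase-only passwords and guesses spread evenly over the keyspace (the expected first hit lands around half these figures, which is where the 70-year and 40k-year ballparks come from):

    $ echo $((26**6))                # 308915776 possible 6-letter passwords
    $ echo $((26**6 / 250000 / 12))  # ~102 years to cycle the keyspace at 250k guesses/month
    $ echo $((26**8 / 250000 / 12))  # ~69609 years with 8 letters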
- silently compromised systems that are active but undetected
- the VPS provider doing security work behind your back
It's the eternal trade-off of security vs. convenience. The downside of this approach is that if there is a vulnerability, you will need to wait on someone else to get the fix out. Probably fine nearly always, but you are giving up some flexibility.
Another way to get a reasonable handle on the "managing a whole OS ..." complexity is to use some tools that make it easier for you, even if it's still "manually" done.
Personally, I like FreeBSD + ZFS-on-root, which gives "boot environments"[1], which lets you do OS upgrades worry-free, since you can always rollback to the old working BE.
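For anyone who hasn't seen it, the BE workflow is a couple of commands with bectl(8); a sketch:

    # snapshot the current boot environment before touching anything
    bectl create pre-upgrade
    freebsd-update fetch install
    # if the new world is broken, boot straight back into the old one
    bectl activate pre-upgrade && shutdown -r now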
But also I'm just an old fart who runs stuff on bare metal in my basement and hasn't gotten into k8s, so YMMV (:
[1] eg: https://vermaden.wordpress.com/2021/02/23/upgrade-freebsd-wi... (though I do note that BEs can be accomplished without ZFS, just not quite as featureful. See: https://forums.freebsd.org/threads/ufs-boot-environments.796...)
It might take Amazon or Google a few hours or a day to deploy a critical zero-day patch but that’s in all likelihood way better than I’d do if it drops while I’m on vacation or something.
It got attacked pretty regularly.
I would never host an open server from my own home network for sure.
This is the main value-add I see in cloud deployments -> OS patching, security, trivial stuff I don't want to have to deal with on the regular, but it's super important.
More ideal: don't run Wordpress. A static site doesn't execute code on your server and can't be used as an attack vector. They are also perfectly cacheable via your CDN of choice (Cloudflare, whatever).
define( 'DISALLOW_FILE_EDIT', true );
This one line in wp-config.php will save you a lot of headaches. It disables any theme/plugin changes from the admin dashboard and ensures that no one can write to the codebase directly unless they have access to the actual server.
An alternative is a front-end proxy on a box with a managed OS, like OpenWRT.
I've had a VPS facing the Internet for over a decade. It's fine.
$ ls -l /etc/protocols
-rw-r--r-- 1 root root 2932 Dec 30 2013 /etc/protocols
I would worry more about security problems in whatever application you're running on the operating system than in the operating system itself.
I hate how dev-ops has adopted and deployed fine-grained RBAC permissions on clouds. Every little damn thing is a ticket for a permissions request. Many times it's not even clear which permission sets are needed. It takes many iterations to wade through the various arbitrary permission gates that clouds have invented.
These orgs are pretending like they're operating a bank, in staging.
This gives me anxiety.
Within the first few weeks, you'll realize you also need Sentry; otherwise, errors in production just become digging through logs. That's a +$40/mo cloud service.
Then you'll want something like Datadog, because someone somewhere is reporting that a page is taking 10 seconds to load, but you can't replicate it. +$300/mo cloud service.
Then, if you ever want to aggregate data into a dashboard to present to customers -- Looker / Tableau / Omni +$20k / year.
Data warehouse + replication? +$150k / year
This goes on and on and on. The holy grail is to be able to run ALL of these external services in your own infrastructure on a common platform with some level of maintainability.
Cloud Sentry -> Self Hosted Sentry
Datadog -> Self Hosted Prometheus / Grafana
Looker -> Self Hosted Metabase
Snowflake -> Self Hosted Clickhouse
ETL -> Self Hosted Airbyte
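As a taste of how small the entry point can be, here is a docker-compose sketch for the Datadog replacement (ports and volume names are arbitrary, and you still have to write prometheus.yml and your dashboards yourself):

    services:
      prometheus:
        image: prom/prometheus
        volumes:
          - ./prometheus.yml:/etc/prometheus/prometheus.yml  # your scrape config
          - prom-data:/prometheus
        ports:
          - "9090:9090"
      grafana:
        image: grafana/grafana
        volumes:
          - grafana-data:/var/lib/grafana
        ports:
          - "3000:3000"
    volumes:
      prom-data:
      grafana-data: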
Most companies realize this eventually, and that's why they eventually move to Kubernetes. I think it's also why indie hackers often can't quite understand why the "complexity" of Kubernetes is necessary, and why just having everything run on a single VPS isn't enough.
You can take that too far, of course, and if you've got the procedures all set up you often might as well use them, but at the same time you can blow thousands and thousands of dollars really quickly to save yourself a minor inconvenience or two over the course of five years.
As you well know, there are a lot of players and options (which is great!), including DHH's Kamal, Flightcontrol, SST, and others. Some are k8s based - Porter and Northflank, yours. Others, not.
Two discussion points: one, I think it's completely fair for an indie hacker, or a small startup (Heroku's and our main customers - presumably yours too), to go with some ~Docker-based, git-push-compatible deployment solution and be completely content. We used to run servers with nginx and apache on them without k8s. Not that much has changed.
Two, I also think that some of the needs you describe could be considered outside of the scope of "infra": a database + replication, etc. from Crunchy Bridge, AWS RDS, Neon, etc. - of course.
But tableau? And I'm not sure that I get what you mean by 150k/year - how much replication are we talking about? :-)
Say you want to host a Redshift instance, get Google Analytics logs + Twilio logs + Stripe payments + your application database into a data warehouse, and then graph it all in a centralized place (Tableau / Looker / etc.).
A common way to do that is:
- Fivetran for data streaming
- Redshift for data warehousing
- Looker for dashboarding
You're looking at $150k / year easily.
If you start seeing success, you realize that while your happy path may work for 70% of real cases, it doesn't convert optimally for most of them. Sentry helps a lot, you see session replay, you get excited.
You realize you can A/B test... but you need a tool for that...
Problem: things like OpenReplay will just crash and not restart themselves. With multi-container setups, some random part going down will silently stop your session collection without you noticing. Try to debug that? Good luck, it'll take at least half a day. And often you restore functionality only to have another random error take it down a couple of months later, or you realize the default configuration only keeps 500MB of logs/recordings (what), etc., etc.
You realize you are saving $40/month for a very big hassle, and worse, it may not work when you need it. You go back to Sentry, etc.
Does Canine change that?
Obviously if [name your tool] is built so that it can be bricked [1], even after a restart, then you'll have to figure it out. Hopefully most services are more robust than that. But otherwise, Kubernetes takes care of the uptime for you.
[1] This happened with a travis CI instance we were running back in the day that set a Redis lock, then crashed, and refused to restart so long as the lock was set. No amount of restarts fixed that, it required manual intervention
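Concretely, the "takes care of uptime" part is mostly the default restartPolicy plus a liveness probe; a container-spec sketch, assuming your app exposes some /healthz endpoint:

    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 15

The kubelet restarts the container whenever the probe keeps failing, which, as the Travis story shows, only helps if the failure isn't persisted somewhere outside the container.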
Truth. All the major cloud platform marketing is YAGNI but for infrastructure instead of libraries/code.
As someone who works in ops and has since starting as a sysadmin in the early 00s, it's been entertaining to say the least to watch everyone rediscover hosting your own stuff as if it's some new innovation and wasn't ever possible before. It's like that old MongoDB is web scale video (https://www.youtube.com/watch?v=b2F-DItXtZs)
Watching devs discover Docker was similarly entertaining back then when us in ops have been using LXC and BSD jails, etc. to containerize code pre-DevOps.
Anyway, all that to say - go buy your graybeard sysadmins a coffee and let them help you. We would all be thrilled to bring stuff back on-prem or go back to self-hosting and running infra again, and we probably have a few tricks to teach you.
I’d even go as far as to say that the time/cost required to, say, learn the quirks of Docker, containers, and layered builds is higher than what is needed to learn how to administer a website on a Debian server.
But that is irrelevant, as Docker brings more to the table than a simple Debian server can by design. One could argue that LXD is sufficient for this, but that is even more hassle than Docker.
I just don't have the need to do a lot of work on the barebones server beyond basic ufw and getting Caddy and Docker running... Caddy will reverse-proxy all the apps running in containers. It really simplifies my setup.
Ah, also Docker is managed with komo.do, but otherwise it is a simple GUI over docker-compose.
Aside: I really want to like NextCloud, but as much as I like aspects of it, I don't like plenty as well.
It's mostly to unify VLANs and simplify management (i.e. everything is routed through my main router rather than loose tunnels on servers/VMs).
And that's not even considering the "tied to a single corporation" problem. If us-east-1 wakes up tomorrow and decides to become gyliat-prime-1, we're all screwed because no more npm, no more docker, no more CloudFlare (because someone convinced everyone to deploy captchas etc).
"No need for CDN etc unless you are talking about millions of requests per day."
A CDN isn't just for scale (offloading requests to origin), it's also for user-perceived latency. Speed is arguably the most important feature. Yes, beware premature optimization... but IMHO delivering static assets from the edge, close as possible to the user, is borderline table stakes and has been for at least a decade.
For 20 years I ran a web dev company that hosted bespoke websites for theatre companies and restaurants. We ran FreeBSD, PostgreSQL, and nginx or H2O server with sendmail.
Never an issue and had fun doing it.
Both caddy and nginx can handle 100s of millions of static requests per day on any off-the-shelf computer without breaking a sweat. You will run into network capacity issues long before you are bottlenecked by the web server software.
This is not a job for the big guys. You want someone local who will take care of you. They also come when a computer fails and ensure updates are applied. By "come" I mean physically sending a human to you. This will cost some money, but you should be running your business, not trying to learn computers.
OK that's your opinion, in my view a business should selfhost if they want to maintain data sovereignty.
> everyone else should pay their local small business self hosting company to host for them.
That assumes all small business have at least one "local small business self hosting company" to choose from.
That's 1 Mbps per user. If your web page can't render (ignoring image loading) within a couple seconds even on a connection that slow, you're doing something wrong. Maybe stop using 20 different trackers and shoving several megabytes of JavaScript to the user.
I wish I understood why some engineers feel the need to over-engineer the crap out of everything.
Is it because of wishful thinking? They think their blog is gonna eventually become so popular that they're handling thousands of requests per second, and they want to scale up NOW?
I just think about what web servers looked like at the turn of the millennium. We didn't have all these levels of abstraction. No containers. Not even VMs. And we did it on hardware that would be considered so weak that they'd be considered utterly worthless by today's standards.
And yet...it worked. We did it.
Now we've got hardware that is literally over 1000 times faster. Faster clocks, more cache, higher IPC, and multiple cores. And I feel like half of the performance gains are being thrown away by adding needless abstractions and overhead.
FFS...how many websites were doing just fine with a simple LAMP stack?
It's still not ever a great idea (unless, maybe, this is what you do for a living for your customers), simply because it binds your time, which will absolutely be your scarcest asset if your business does anything.
I am speaking from my acute experience.
info.addr.tools shows [1]: MX 1 smtp.google.com. TXT "mailcoach-verification=a873d3f3-0f4f-4a04-a085-d53f70708e84"
TXT "v=spf1 include:_spf.google.com ~all"
TXT "google-site-verification=TTrl7IWxuGQBEqbNAz17GKZzS-utrW7SCZbgdo5tkk0"
This is not just a turn of phrase, it is a DNS entry: relying on the most evil of Big Tech while speaking the language of digital sovereignty.
Your point stands - they're not fully completely independent. And maybe the language in the OP's article could have been different.. but the OP also specifically says "Oh no, I said the forbidden phrase: Self-hosted mail server. I was always told to never under any circumstances do that. But it's really not that deep."
They're aware of the issue, everyone is aware of the issue. It's an issue :-) But I get your point too.
Founder of enum here. That's a fair point, and a good catch.
Honestly, using Google Workspace for our internal email was a pragmatic choice early on to let us focus on building our core product. It's a classic startup trade-off, and one we're scheduled to fix in the coming weeks.
I want to be clear, though: our customer-facing platform and all its data are and always have been 100% sovereign. Our infrastructure is totally independent of Big Tech.
Thanks for holding us accountable!
That's wishful thinking. You cannot be truly independent from them; no one can. They control major BGP routes, major ASNs, big fiber cables, etc. It's just impossible.
"If you wish to make an apple pie from scratch, you must first invent the universe."
- Carl Sagan
They aren't going to cut the fiber cables if your Google accounts gets locked.
You can literally be an expert in everything relevant - and your mail will still not get delivered just because you're not google/mailgun/etc.
I was trying to do a very simple email-to-self use case. I was sending mail from my VPS (a residential IP isn't even allowed at all) on an IPv4 address I'd had for literally 2+ years, to exactly one recipient: my personal Gmail. I had it all set up - SPF, DKIM, TLS, etc. And I was STILL randomly getting emails sent directly to spam / showing up with the annoying ! icon (grates on my sensibilities). After tremendous, tremendous pain in researching and debugging, I determined that my DKIM sigs and SPF were indeed perfect (I had been doubting myself until I realized I could just check what Gmail thought about SPF/DKIM/etc. It all passed). My only sin was not being in the in-crowd.
Incredibly frustrating. The only winning move is not to play. I ended up just switching from emails-to-self to using a discord webhook to @ myself in my private discord server, so I get a push notification.
And this was just me, sending to myself! Low volume (0-2 emails per WEEK). Literally not even trying to actually send emails to other people.
In my opinion, the pragmatic solution I use is:
1) use a specialized distribution (I use yunohost but there are others). This makes configuring SPF, DKIM, TLS and more a breeze
2) use a reputable relay to send your emails (I use OVH but again there are plenty of other choices)
Of course it means you are not "pure", because emails you send will go through a 3rd party (the relay), but it solved the delivery issue entirely for me, so I can continue to enjoy all the other benefits of self-hosting.
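If you're rolling Postfix by hand instead of using yunohost, the relay part is a few lines in main.cf (relay host and credentials file are placeholders for whatever your provider gives you):

    relayhost = [smtp.relay.example.com]:587
    smtp_sasl_auth_enable = yes
    smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
    smtp_sasl_security_options = noanonymous
    smtp_tls_security_level = encrypt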
damn, this guy don’t fuck around. respect
What I do is use Gmail with a custom domain, self-host an email server, and use mbsync[1] to continuously download my emails from Gmail. Then I connect to that server for reading my emails, but still use Gmail for sending.
It also means that google can't lock me out of my emails, I still retain all my emails, and if I want move providers, I simply change the DNS records of my domain. But I don't have any issues around mail delivery.
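For reference, a minimal ~/.mbsyncrc sketch for the Gmail-pull side (account details and the password command are placeholders; isync >= 1.5 spells SSLType as TLSType):

    IMAPAccount gmail
    Host imap.gmail.com
    User me@example.com
    PassCmd "pass show gmail-app-password"
    SSLType IMAPS

    IMAPStore gmail-remote
    Account gmail

    MaildirStore local
    Path ~/Mail/
    Inbox ~/Mail/INBOX

    Channel gmail
    Far :gmail-remote:
    Near :local:
    Patterns *
    Create Near
    SyncState *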
I think I had problems with my emails maybe twice, with one Exchange server of some small recruitment company. I think it was misconfigured.
Ah, there was also some problem with Gmail at the beginning: they banned my domain because I was sending test emails to my own account there. I had to register my domain on their BS Postmaster Tools website and configure my DNS with some key.
Overall I had far more problems with automatic backups, services going down for no reason, dynamic IPs, etc. The email server just works.
Been self-hosting personal low-traffic email since the 1990s; I don't have that problem.
Doing the sending myself wouldn't improve my digital sovereignty, which is my primary motivation.
Anything that prevents lock-in and gives control to the user is what we want.
For me at least self-hosting is mostly about having control of a computer/server software wise, not physically. That is probably an important differentiator from homelabbing, which is more focused on controlling the hardware. You can combine the two, but for self-hosting you don't need to physically control the hardware.
The problem is backup and upgrades. I self host a lot of resources, but none I would depend on for critical data or for others to rely on. If I don't have an easy path to restore/upgrade the app, I'm not going to depend on it.
For most of the apps out there, documented backup/restore steps are minimal or non-existent (compared to the one-liner to get up and running).
FWIW, Tailscale and Pangolin are godsends to easily and safely self-host from your home.
Every self-hosted app runs in Docker, where the backup solution is: back up the folders you mounted plus the docker-compose.yml. To restore, put the folders back and run docker compose up again. I don't need every app to implement its own thing; that would be a waste of developer time.
It's been a pretty smooth process. No, it's not a multi-region k8s cluster with auto everything.. but you can go a long way with docker-compose files that are well worth it.
Any decent RDBMS can be backed up live without issues. You only need to stop for a restore (well, without resorting to complex tricks).
A self hosted app can have a few minutes of downtime at 3am while the backup script runs.
Shut that one down and back it up from time to time.
Then copy that to a third site with rsync/etc
Instead of Tailscale, I can highly recommend self-hosting netbird[1] - very active project, works great and the UI is awesome!
Also, several ports need to be opened. How is its vulnerability history?
How do you ship security patches?
How do you back up? And do you regularly test your backups?
I feel like upgrade instructions for some software can be extremely light, or require you to upgrade through each version, or worse.
I assume everything is running in Docker.
For containers: upgrading to new versions can be done hands-off with Watchtower, or manually.
For the host: you can run package updates regularly or enable unattended upgrades.
Backups can easily be done with cron + rclone. It is not magic.
I personally run everything inside Docker. Fewer things to worry about.
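For example, the whole offsite-backup job can be one crontab line (paths and the remote name are invented):

    30 3 * * * rclone sync /srv/appdata remote:backups/appdata --log-file=/var/log/rclone-backup.log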
Today, to install even the simplest self-hosted software, one effectively has to be a professional software engineer: use SSH, use Docker, use Tailscale, understand TLS and generate certificates, perform maintenance updates, check backups, and a million other things that are automatable.
No idea why self-hosted software isn't apt-get install and forget, just like LimeWire. But that's the reason no one self-hosts.
Some of it is. But as soon as you want your services to be accessible from the Internet, you need to have a domain name and HTTPS. To run Limewire or a BitTorrent client, you don't need a domain name yourself because you use a central server (in the case of BitTorrent, a tracker) to help you discover peers.
Ubuntu tried to fix this with snaps but the whole Linux community raged and pushed back at them. Yeah, snap has its faults but it was designed initially for server-side apps.
Snap install xyz-selfhosted-app was the initial goal. You can install nextcloud as a snap right now.
Instead the Linux community let perfect be the enemy of good and successfully convinced everyone else to dump and avoid snaps as a format at all costs.
One of the early sticking points was switching Firefox from deb to snap. That doesn't fit into your characterization.
The numbers might favor server installs (no idea), but it seems like the decisions must be primarily desktop. (i.e. a server admin or business that installs a thousand Ubuntu instances is just a single decision).
Either way, if Canonical's goals for snaps included easing people into self-hosting their services, surely making the experience pleasant on desktop would be a priority?
I don't recall any positive changes brought by snaps. I was looking at it through a desktop lens at the time, but my general perspective is mostly server-side, so I might be biased in that direction.
I don't think the two perspectives are necessarily in conflict, but noted just for framing... :)
I’m a regular engineer, non-software, my coding knowledge is very basic, I could never be employed even as a junior dev unless I wanted to spend evenings grinding and learning.
Still I was able to set up a NAS and a personal server using Docker. I think a basic and broad intro to programming class like Harvard’s CS50 is all that would be required to learn enough to be able to figure out self-hosting.
A self-hosted server is an entirely different beast. You're right, it's not easy to setup and run -- but that's the world we live in. Malicious actors have ruined something that could have been relatively easy and automated to setup and run; even the most experienced of us wouldn't stand against professional penetration testers or nation states.
I just have one main question: what would you like to self-host? Limewire was about file sharing, so the "value proposition" was clearly-ish defined. The "what does Limewire do" was clear.
Are you interested in hosting your own web site? Or email? Or google cal/drive/photos-equivalent? Some of it, all of it?
I'm genuinely curious, and also would love to know: is this a 80% of people want X (self-hosted file storage? web serving?), and then there's a very long tail of other services? Does everyone want a different thing? Or are needs power-law distributed? Cheers
1) Find the docker-compose file. 2) Change the ports line to bind to a specific address: 10.0.10.1:9000 instead of the default 0.0.0.0:9000. 3) Connect via WireGuard.
(Answers the "security" point a sister comment brought up too)
I had this wireguard setup in place long before I even ran my first docker container. It's all building on top of things already there.
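The compose fragment for step 2 looks like this (image and port are placeholders):

    services:
      app:
        image: someapp:latest
        ports:
          - "10.0.10.1:9000:9000"  # bound to the WireGuard address, not 0.0.0.0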
The reality is that the vast majority of people don't really need "self-hosting"; what they need is decent software they can run on their own computer and let others access the data from time to time, mostly locally, because global availability is rarely worth the hassle.
But since there is not much money in that, and devs are enamored with insane layers of complexity and obtuse use cases that are irrelevant to the vast majority, we get server software that relies on web views and has a large disconnect from the data on the local machine. You just add another layer of stuff to manage on yet another computer, when most people already have one sitting idle the vast majority of the time. I think the laptop craze is also partly to blame.
Even good local Mac apps have dried up and now it's all cloud-based subscription software, and you are supposed to be thankful you can install open-source stuff with a docker image and god knows how many configuration steps and gotchas.
Much of that software would be desirable and worth a bit of money if only it had a simple installation and management process, but instead you have to pay with your time, which is not worth it for most. So at this point people just say fuck it and pay someone else, usually one of the big tech providers, to take care of the problem, and that's that.
As much as I like web technology for interactive documents, the software use case is still largely a pain in the ass.
https://www.apachefriends.org/
I haven't used it in over a decade, but I'm glad to see it's still kicking.
Security.
As an avid self-hoster with a rack next to my desk, I shudder as I read your comment, unfortunately.
If they `apt-get install` on a standard Debian computer, and the application's defaults are already configured for high security, and those exact settings have been tested by everyone else running the same software, you have a much higher chance of being secure. And if a gap is found, an update is pushed by the authors and downloaded by everyone in the automatic nightly update.
But are you really keen to build a dynamic PHP web application where each page imports some database functions/credentials and uses them to render HTML?
Can you keep a fluent user flow (e.g. the menu not re-rendering) that way? Only with a minimal design.
Back in 2006, when most webpages had an iframe for the main content, an iframe for the menu, and maybe an iframe for some other element (e.g. a chat window), it was fine to refresh one of those or have a link in one load another dynamic page. Today that is not seen as very attractive, and to common people (consumers and businesses), unattractive means low-trust, which means less income. Just my experience, unfortunately. I also loved that era in hindsight, even though the bugs were frustrating, let alone the JS binding and undefined errors if you added that...
If you don't care enough to figure it out, then you don't care enough to make it secure, and that leads to a very, very bad time in a modern, largely internet-centric world.
I'm now experimenting with a files-based approach, using syncthing for the p2p syncing, and it works really well.
No VPS or home server to setup and maintain, no security worries, no database migrations, no extra backups, no tinkering with Caddy configs.
The problem is that we have all been tricked into cloud syncing because big tech couldn't figure out proper local sync; they actually have incentives not to, because they would really like you to pay for their storage subscriptions, on which they have great margins.
Yet for the vast majority of people what would be needed is just very simple syncing between their phone and personal computer. It should work with a cable for speed but also wirelessly for convenience and that's it.
All the stuff they add on top is mostly overengineered crap that sometimes doesn't even work and that creates interdependence/lock-in.
- infra work is thankless (see below)
- outages will last long because you're unlikely to have failovers (for disk failures, etc.), plus the time to react to these (no point in being paged for hobby work)
- more importantly, malicious LLM scrapers will put your infra under stress, and
- if you host large executables you'll likely want to do things like banning Microsoft's IP addresses because of irresponsible GH Actions users [1] [2] [3]
In the end it is just a lot less stress to pay someone else to deal with infra; for example, when hosting static sites on GH Pages or CF Pages, and when using CF caching solutions.
[1] https://www.theregister.com/2023/06/28/microsofts_github_gmp...
And if it's a hobby, no you don't; that should be part of it. The fun is getting knocked out from orbit, figuring out how and why, and how to avoid it. Stand back up again and you've learned from that mistake :p
You generally still use a CDN and WAF to filter incoming traffic when you self host (even without abusive scrapers you should probably do this for client latency). You can also serve large files from a cloud storage bucket for external users where it makes sense.
We've known for decades now that the philosophy underpinning Free Software ("it's my computer and I should be able to use it as I wish") breaks down when it's no longer my computer.
Attempts were made to come up with a similar philosophy for Cloud infrastructure, but those attempts are largely struggling; they run into logical contradictions or deep complexity that the Four Essential Freedoms don't have. Issues like
1. Since we don't own the machines, we don't actually know what is needed to maintain system health. We are just guessing. Every newly collected piece of information about our data is an opportunity for an argument.
2. Even if we can make arguments about owning our data, the arguments about owning metadata on that data, or data on the machines processing our data, are much murkier... Yet that data can often be reversed back to make guesses about our data because manipulation of our data creates that metadata.
3. With no physical control of the machines processing the data, we are de-facto in a trust relationship with (usually) strangers, a trust relationship that is generally not the case when we own the hardware; who cares what the contract says when every engineer at the hosting company has either physical access to the machine or a social relationship with someone who does, a relationship we lack? When your entire email account is out in the open or your PII has been compromised because of either bad security practices or an employee deciding to do whatever they want on their last day, are you really confident that contract will make you whole?
If there can be, practically, no similar philosophical grounding to the Four Freedoms, the conclusion is that cloud hosting is incompatible with those goals and we have to re-own the hardware to maintain the freedoms, if the freedoms matter.
I took the plunge and bought a Ugreen NAS with 4 bays. The first thing I did was install TrueNAS CE onto it and then use ChatGPT with highly customized prompts and the right context (my current docker-compose files).
Without much previous knowledge of docker, networking etc. except what I remembered from my IT vocational education from 15 years ago, I now have:
- Dockerized Apps
- App-Stacks in their own App-Network
- Apps that expose their web UI not via ports, but via Traefik + Docker labels (see the sketch after this list)
- Only Traefik's port 443 reachable from the WAN, plus optional port forwarding for non-HTTP services
- Optional Cloudflare Tunnel
- Automatic Traefik TLS termination for LAN and WAN for my domain
- Split-DNS to get hostnames routed properly on LAN and WAN
- CrowdSec for all exposed containers
- Optional MFA via Cloudflare for exposed services
- Local DHCP/DNS via Technitium
- Automatic ZFS snapshots and remote backups
- Separation between ephemeral App data (DBs, Logs) on SSD and large files on HDD
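The Traefik-labels pattern from the list above, sketched for one service (domain, router name, and inner port are placeholders; it assumes a shared external "proxy" network that Traefik watches):

    services:
      app:
        image: someapp:latest
        networks: [proxy]
        labels:
          - traefik.enable=true
          - traefik.http.routers.app.rule=Host(`app.example.com`)
          - traefik.http.routers.app.entrypoints=websecure
          - traefik.http.routers.app.tls=true
          - traefik.http.services.app.loadbalancer.server.port=8080  # container port, nothing published on the host
    networks:
      proxy:
        external: true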
In short you fill in the env-files, then run butane and ignition. (I should improve the README some time)
I love how it's all configuration. If it breaks I can set up another instance with the same secrets in minutes. It will then grab the latest backup and continue like nothing happened.
Since my basic search engine is self-hosted, nobody actually sees what I visit and what I watch.
This is my conclusion after seeing that the social media algorithms are totally lost as to what I would like to watch next.
Also, I am in control of the UI and of changes, which is both a good and a bad thing.
"I was fascinated but also scared about that since I've never actually enabled it myself. I do like the fact that I could look up my location for every point in time but I want to be in control about that and know that only I have access to that data."
This made me think about whether there are any services (or ideas thereof) that would provide this kind of functionality but store it encrypted in a similar way as Proton does for email; in theory, you can use this pattern - data stored encrypted on the server, but decrypted only by the client - to rebuild many useful services while retaining full sovereignty over your data.
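The building block for that pattern is just "encrypt on the client, ship only ciphertext". A sketch with age and rclone standing in for the client and the dumb storage server (file names and the key are placeholders):

    # encrypt locally; the server only ever sees notes.db.age
    age -r age1yourpublickey... -o notes.db.age notes.db
    rclone copy notes.db.age remote:vault/
    # later, on a trusted client
    age -d -i key.txt -o notes.db notes.db.age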
Compare that to encrypted email: if I’m sending you an encrypted message, the total data involved is minimal. To a first approximation, it’s just the message contents.
But if I want “Google Maps but private,” I first need access to an entire globe’s worth of data, on the order of terabytes. That’s a lot of storage for your (usually mobile) client, and a lot of bandwidth for whoever delivers it. And that data needs to be refreshed over time.
Typical mapping applications (like Google Maps) solve this with a combination of network services that answer your questions remotely (“Tell me what you’re searching for, and I’ll tell you where it is.”) and by tiling data so your client can request exactly what it needs, and no more, which is also a tell.
The privacy focused options I see are:
1. Pre-download all the map data, like OrganicMaps [1], to perform your calculations on the device. From a privacy perspective, you reveal only a coarse-grained notion of the area you’re interested in. As a "bonus", you get an offline maps app. You need to know a priori what areas you’ll need. For directions, that's usually fine, because I’m usually looking at local places, but sometimes I want to explore a random spot around the globe. Real-time transit and traffic-adaptive routing remain unaddressed.
2. Self-host your own mapping stack, as with Headway (I work on Headway). For the reasons above, it’s harder than hosting your own wiki, but I think it’s doable. It doesn’t currently support storing personal data (where you’ve been, favorite places, etc.), but adding that in a privacy conscious way isn’t unfathomable.
[1] https://organicmaps.app (though there are others)
[2] https://github.com/headwaymaps/headway (see a hosted demo at https://maps.earth)
To first order, you're right about the storage size of a vector tileset and a geocoding dataset based on OpenStreetMap. But Google Maps is a lot more than that!
Headway uses Valhalla for most routing. A planet-wide Valhalla graph is about ~100GB of storage. It doesn't produce reasonable transit directions; transit is an even tougher cookie.
OpenTripPlanner gives good transit routing, but it doesn't scale to planet-wide coverage. We've settled on a cluster of OTP nodes for select metro areas - each one being on the order of 5-10GB of RAM.
https://about.maps.earth/posts/2023/03/adding-transit-direct...
So, I'd say we have some of the pieces of a general-purpose mapping tool that could replace Google Maps usage, which you could host yourself.
But we don't have satellite imagery, real time traffic data, global transit coverage, rich POI data (like accurate opening hours, photographs, reviews).
Do all people want all these features? Probably not, but a lot of people seem to want at least some of it and it's not obvious to me that they'll be quickly solved.
1. Your Data. It is the most irreplaceable digital asset. No one should see their photos, their email, their whatever, go poof because of external forces. Ensure everything on your devices is backed up to a NAS. Set a reminder for quarterly offline backups. Backups are an achievable goal for everyone, not just the tech elite.
2. Your Identity. By which I mean a domain name. Keep the domain pseudonymous. Use a trustworthy, respectable registrar. Maybe give some thought for geopolitics these days. Pay for email hosting and point your domain at them.
3. Lastly, your Apps. This is much harder work and only reasonably achievable by tech-savvy people.
Sure, you could run your own firewall and whatnot, but a mesh VPN, with its simple setup, makes it a whole lot easier to access your home services.
It all comes down to what you want to spend vs what you want to host and how you want to host it.
You could build a raspberry pi docker swarm cluster and get very far. Heck, a single Pi 5 with 4gb of memory will get you on your way. Or you could use an old computer and get just as far. Or you could use a full blown rack mount server with a real IPMI. Or you could use a VPS and accomplish the same thing in the cloud.
No, you couldn't, and no, you wouldn't.
To build a swarm you need a lot of fiddling and tooling. Where are you keeping them? How are they all connected? What's the clustering software? How is this any better than an old PC with a few SSDs?
Raspberry Pi with any amount of RAM is an exercise in frustration: it's abysmally slow for any kind of work or experimentation.
Really, the only useful advice is to use an old PC or use a VPS or a dedicated server somewhere.
Then I had my old PC, and it was very good, but I wanted more NVMe disks and the motherboard supported only 7.
Now I am migrating to a Threadripper, which is a bit overkill, but I will have the ability to run one or two GPUs along with 23 NVMe disks, for example.
There are other cards like that, e.g. https://global.icydock.com/product_323.html - this one has better support for physically smaller disks and makes swapping much easier, but costs about 4 times more.
I think I could put even more drives in my new case, e.g. using a PCIe-to-U.2 card and then 8-drive bays. But that would probably cost me 3 times more just for the bay with connectors, and I do not need that much space.
https://global.icydock.com/product_363.html
If you like U.2 drives, Icy Dock provides a solution for them too. Or if you want to go cheaper, there are other cards with SlimSAS or MCIO: https://www.microsatacables.com/4-port-pcie-3-0-x16-to-u-2-s...
But U.2 disks are at least 2x more costly per GB - a 40TB one costs around $10k. That is too much IMO.
Intel n100 with 32GB RAM and single big SSD here (but with daily backups).
Eats roughly 10 Watts and does the job.
I did use my old PC, and it was working very nicely with 4 SATA SSDs in RAID 10.
And as I already said in another comment, in my case power does not matter much. Space too.
For every clown like me with massive RAM in their colo'd box, there is someone doing better and more amazing things with an ESP32 and a few molecules of RAM :D
... but I still worry about backups. Having encrypted off-site backups is essential for this to work, and they need to be frequently tested as well.
There are good tools for that too (I've had good experiences with restic to Backblaze B2), but assembling them is still a fair amount of overhead, and making sure they keep working needs discipline that I may want to reserve for other problems!
https://github.com/juanfont/headscale
As for backups, I like both https://github.com/restic/restic and https://github.com/kopia/kopia/. Encryption is done client-side, so the only thing the offsite host receives is encrypted blobs.
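In case it helps anyone, the restic flow against B2 is short (bucket and path are invented; the credentials come from env vars):

    export B2_ACCOUNT_ID=...                     # placeholder: your B2 key id
    export B2_ACCOUNT_KEY=...                    # placeholder: your B2 application key
    restic -r b2:my-bucket:server1 init          # once; sets the repo encryption password
    restic -r b2:my-bucket:server1 backup /srv
    restic -r b2:my-bucket:server1 check         # the "frequently tested" part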
For me, it's not just off-site backups, it's also the operational risks if I'm not around which I wrote about previously: https://news.ycombinator.com/item?id=39526863
In addition to changing my mind about self-hosting email, my most recent adventure was self-hosting Bitwarden/Vaultwarden for password management. I got everything to work (SSL certificates, restartable container scripts to survive server reboots, etc.)... but I didn't like the resulting complexity. There was also random unreliability, because a new iOS client would break Vaultwarden and you'd have to go to GitHub and download the latest bugfix. There's no way for my friend to manage that setup. She didn't want to pay for a 1Password subscription, so we switched to KeePass.
I'm still personally ok with self-hosting some low-stakes software like a media server where outages don't really matter. But I'm now more risk-averse with self-hosting critical email and passwords.
EDIT to reply: >Bitwarden client works fine if server goes down, you just can't edit data
I wasn't talking about the scenario of a self-hosted Vaultwarden being temporarily down. (Although I also didn't like that the smartphone clients will only work for 30-days offline[1] which was another decision factor to not stay on it.)
Instead, the issue is Bitwarden will make some changes to both their iOS client and their own "official" Bitwarden servers which is incompatible with Vaultwarden. This happens because they have no reason to test it on an "unofficial" implementation such as Vaultwarden. That's when you go to the Vaultwarden Github "Issues" tab and look for a new git commit with whatever new Rust code makes it work with the latest iOS client again. It doesn't happen very frequently, but it happened often enough that it makes it only usable for a techie (like me) to babysit. I can't inflict that type of random broken setup on the rest of my family. Vaultwarden is not set-and-forget. (I'm also not complaining about Bitwarden or Vaultwarden and those projects are fine. I'm just being realistic about how the self-hosted setup can't work without my IT support.)
[1] Offline access in Bitwarden client only works for 30 days. : https://bitwarden.com/blog/configuring-bitwarden-clients-for...
Both are on Tailscale, and we use Hyper Backup between them.
It was very easy to set up and provides offsite backups for both of us.
Synology very recently (a day ago) decided to allow 3rd party drives again with DSM 7.3.
The absolute most important item (IMO) is photos, which I frankly do not trust Apple’s syncing logic not to screw up at some point. I’ve taken the approach that my self-hosting _is_ the backup. If they lock me out or just wipe everything, no problem: I have it all backed up. If the house burns down, everything is still operational.
A SaaS - they could change price tomorrow or change terms or do any number of things that could be an issue. It’s a severely asymmetrical dynamic
Don’t think I’ll ever do email though
I do however self-host FreshRSS, Audiobooks, Readeck, Linkding, YouTube-to-RSS... useful services whose individual hosted platforms want £5 or so per month each. Redundancy matters significantly less to me for these services than losing an extra £30+ a month.
And fixing things when they eventually break.
Honestly, there is a reason I still use a dreamhost shared plan. It's dirt cheap, been running forever, and I've never had to do the boring stuff.
And if they break my app, I can ask them to fix it.
If you deploy your app on a PaaS you still have to update everything inside the container.
Old school php hosting on a shared server does have some upsides - namely affordable support. (Sure, if I'm an extreme edge case support will not do much for me).
The same kind of thing for "self-hosting" would be cool.
People won’t install VPNs. They are usually okay with authenticating to a web server, so you can put authentication with something like Authentik in front of your reverse proxy. But can you configure this front end security correctly and patch it, and are you sure it doesn’t have easy zero days?
waits for the pitchforks and torches
If I run my services at home, I don’t want to provide Cloudflare with access to my data.
Public in the sense that the front page is public while the client still needs to authenticate to the service at home. In this case that does not make sense (the user authenticates to the reverse proxy, which authenticates to the service), for the reason I mentioned.
Frankly, because you don't trust your own abilities in that area, or you're simply not interested in taking responsibility for that piece - and that's totally fine.
> Public in the sense that the front page is public, and the client still need to authenticate to the service at home
Maybe your authentication doesn't live at home, or on the home network. It could be on a vps or a cloud radius/ldap/etc auth service.
Some people have been writing code for 30+ years. I've been running internet facing systems for 30+. Different backgrounds, different levels of comfort and enjoyment out of different things!
I think the missing piece is you need to enjoy the process itself - without that, it's not really tenable (at least today).
https://www.naut.ca/blog/2019/11/16/self-hosting-series-part...
It does work with Apple devices, in my experience.
The biggest downside is initial cost in time, effort and cash compared to typing in a credit card.
OK, other downsides include the lack of power redundancy and decent networking, which are more common in data centers.
The other side of this is: why buy 8x A100s for that project, only to stick them on eBay to recoup the cost, when you can rent them?
Convincing the family to buy in is hard too because (as you put) I can't promise the same level of redundancy/service guarantees.
These kernels could be for _any_ operating system that runs on the hardware, e.g., NetBSD
A. This already exists
B. This does not exist
It's great for learning and control - it's not so great for anxiety.
What would be a way to shine a light on every part of their private life?
Maybe I'm getting old, but I think at this stage I want the third, often-unspoken route: no data.
Let go of things
No need for infrastructure when you have nothing to host. And data that doesn't exist is the most secure in the world.
Is my home a home – or the premises of a small-business? Racks, servers, cables, smart devices, the fan noise etc!
It does feel like we are operating our lives more and more like a small business these days: managing data, managing logins, "B2B" with hundreds of companies (EULAs, contracts, invoices, subscriptions...), files, archives, backups, contacts, appointments, app after app after app...on and on.
I wish life were simpler. Maybe a lot is in our control, more than we realise
Repurposing an old tower would offer you enough compute to self-host services back in the day, but now an Intel NUC has plenty of resources in a very small footprint and branching out into the Raspberry Pi-adjacent family of hardware also offers even smaller power draw on aarch64 SBCs.
One experiment in my own lab has been to deploy glusterfs across a fleet of ODroid HC4 devices to operate a distributed storage network. The devices sip small amounts of power, are easy to expand, and last week a disk died completely but I swapped the hardware out while never losing access to the data thanks to separate networked peers staying online while the downed host got new hardware.
Relying on container deployments rather than fat VMs also helps to compress resource requirements when operating lots of self-hosted services. I've got about ~20 nomad-operated services spread across various small, cheap aarch64 hosts that can go down without worrying about it because nomad will just pick a new one.
Don't be me, but have some solace in the fact that even if you royally mess up things won't be as bad as you think.
I self host a lot of things on a VPS and have recently started self hosting on a raspberry pi 5, it's extremely liberating!
In my ideal world, one tech savvy person would run services for a group of their friends and family. This makes the concept more mainstream and accessible, while also creating social cohesion for that group. I think we've monetized too many of our relationships, and often have no real reason to be in community. This is a big change from most of human history, where you depended on community for survival. Building lower-stakes bonds now (I run your email, you help me fix my car) helps avoid the problem later when you really need help (old, sick) but have never practiced getting anything you need except by paying for it
The reason is the anti-spam rules, and the fact that Google, Microsoft and so on have built an iron ring of trust with each other, while the little servers outside it are marked as spam by default.
Let's Encrypt saved HTTPS connections from a similar destiny, but that risk is always just outside the window. I mean, HTTPS was becoming "pay us to publish a web server, or our browser will mark you as unsafe and not display it".
I think it is time to also self-host private free chats, and possibly other services like DDoS protection.
I remember browsers used to have a native RSS button in the main interface and then you could curate your feed. Seems better than any news feed thing gamified to steal my attention. Sigh.
old-man-yells-at-cloud.gif
Huh.
I never appreciated the value of self-hosting until then. I was so sick of finding new services to do essentially the same thing. I just wanted some stability.
Now I can continue using the thing I was already using, and I have developed my own custom RSS reader on top of Omnivore.
I don't need to worry about things breaking my flow. I can update the parsing logic if websites break, or if I want to bypass some paywalls. It really changed my view on self-hosting.
Self hosting remains untenable for most things because of the legacy of Unix and MS-DOS and the ambient authority model of computing.
- Use case/cloud business model mismatch: ultimately much of the value of cloud services comes from flexibility and amortization across massive audiences. Sometimes that's exactly what one might be after. But sometimes that can leave a big enough mismatch between how it gets charged for vs what you want to do that you will flat out save money, a lot of money, very fast with your own metal you can adjust to yourself.
- Speed: somewhat related to the above, but on the performance side instead of cost. 10G at this point is nothing on a LAN, and it's been easy to pick up used 100G Chelsio NICs for <$200; I've got a bunch of them. Switches have been slowly coming down in price as well; Mikrotik's basic 4-port 100G switch is $200/port brand new. If you're OK with 25G or 40G you can do even less. Any of those is much, much faster (and of course lower latency) than the WAN links a lot of us have access to; even at a lot of common data centers that'd be quite the cost add. And NVMe arrays have made it trivial to saturate that, even before getting into the computing side. Certainly not everyone has that kind of data and wants/needs to be able to access it fast offline, but it's not useless.
- Customization: catch all for beyond all-of-the-above, but just you really can tune directly to what you're interested in terms of cpu/memory/gpu/storage/whatever mix. You can find all sorts of interesting used stuff for cheap and toss it in if you want to play with it. Make it all fit you.
- Professional development: also not common, but on HN in particular a number of folks would probably derive real benefit from kicking the tires on the various lower-level moving parts that go into the infrastructure they normally work with at a higher level. Once in a while you might even find it leads to entire new career paths, but even if one typically works with abstractions, having a much better sense of what's behind them is occasionally quite valuable.
Not to diminish the value of privacy/sovereignty either, but there are hard dollar/euro/yen considerations as well. I also think self-hosting tends to build on itself: there can be a higher initial investment in infrastructure, but then previously hard/expensive adaptations get easier and easier. Spinning up a whole isolated VM/jail/VLAN/dynamic allocation becomes trivial.
Of course, it is upfront investment, you are making some bets on tech, and it's also just plain more physical stuff, which takes up physical space of yours. I think a fair number of people might get value out of the super shallow end of the pool (starting with having your own domain) but there's nothing wrong with deliberately leaning on remote infra in general for most. But worth reevaluating from time to time, because the amount of high value and/or open source stuff available now is just wonderful. And if we have a big crash might be a lot of great deals to pick up!
If you wanna self-host completely, look at https://github.com/nextcloud/all-in-one . I have this running on my NAS for other stuff, and it just works out of the box.
Edit: and it scales. Much bigger orgs use it for 10k users or more. And it doesn't need a 100 EUR/month setup, from what I've experienced.
Is Storage Share the managed service?
There are institutions with several thousands of employees that use Nextcloud, including mine.
I run an installation for our family, and it’s been problem free.
What kind of hosting infra are you using? Hetzner seems popular.
Any major recent security concerns, it seems to have a large attack surface.
The only security thing we've done is disable a few paths in the web configuration and only allow SSO logins. (Authentik). You can also put it behind Authentik's embedded proxy for more security. I didn't do it because of the use case with generic calendar/addresbook software.
Hetzner is good. Great even, in terms of what you get for the money. They do provide mostly professional service. You will not get one iota of extra service other than what they promise. VERY German in that regard and very unapologetic about it. And don't talk about them in public with your real identity attached. They ban people for arbitrary reasons and have their uber fans (children with a 4 dollar vps) convince other fellow users that if you got banned you must have been a Russian hacker trying to infiltrate the Hague.