If you absolutely have to do it that way, be very deliberate about what you actually need. Don't run an SSH daemon, don't run cron, don't run an SMTP daemon, don't run the suite of daemons that run on a typical Linux server. Only run precisely what you need to create the files that you need for a "docker commit".
Each service that you run can potentially generate log files, lock files, temp files, named pipes, unix sockets and other things you don't want in your image.
Taking a snapshot from a working, regular VM and using that as a docker image is one of the worst ways to build one.
Thankfully LXD is here to serve this need: very lightweight containers for full systems, where your app runs in a complete ecosystem but stays very light on RAM usage.
How are you going to orchestrate all those daemons without systemd? :P
As you mentioned, a container running systemd and a suite of background services is the typical use case of LXD, not docker. But the difference seems to be cultural -- there's nothing preventing one from using systemd as the entry point of a docker container.
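A rough sketch of what that looks like (the base image and service list are just an example, not a recommendation):

  FROM ubuntu:22.04
  # install systemd plus whatever background services the "VM-style" container needs
  RUN apt-get update && apt-get install -y systemd openssh-server cron && rm -rf /var/lib/apt/lists/*
  # let systemd handle shutdown its own way
  STOPSIGNAL SIGRTMIN+3
  # systemd as PID 1
  CMD ["/lib/systemd/systemd"]

The catch is the host cooperation it tends to need at run time (cgroup access, tmpfs on /run, etc.), which is a big part of why this stays a niche setup on Docker and the default on LXD.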
You can skip the uid/chown stuff if you work with userns mappings, but this was my work machine so I didn't want to globally touch the docker daemon.
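For reference, the daemon-wide knob is userns-remap in /etc/docker/daemon.json, something like:

  {
    "userns-remap": "default"
  }

With that enabled, root inside the container maps to an unprivileged subordinate uid range on the host, so the uid/chown gymnastics go away -- but it applies to every container on the host, which is exactly why I didn't want to touch it on my work machine.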
When I saw the HN title, I thought this was going to be something subtle like deleting package files (e.g. apt) in a separate layer, so you end up with a layer containing the files and then a subsequent layer that hides them.
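For anyone who hasn't seen it, that anti-pattern looks roughly like this (package names are just an example):

  # this layer permanently contains the apt lists and caches
  RUN apt-get update && apt-get install -y build-essential
  # this later layer only hides them from the final filesystem; the bytes still ship
  RUN rm -rf /var/lib/apt/lists/*

The usual fix is to install and clean up in a single RUN so the deleted files never land in any layer.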
People are going to use the tools at their disposal, and they aren't all going to learn their tools at a high level. Think of every insane misuse of Excel you've ever heard of, for instance.
IT has the choice in this case to mitigate, or limit the access to the tools. Choosing mitigation prevents the growth of shadow IT and helps ensure that IT remains a trusted partner and not an obstacle to be worked around. This reflects well on the company, especially if they then go and provide better training to their users as well.
> The problematic user image had an astonishing 272 layers, each representing a commit operation.

As someone who is currently there, it's a very frustrating place.
You can’t just flip a switch. There is no “Hey, that was fun, but it’s time to start designing these things with a purpose and vision”. Beyond the totally unreasonable expectations that have been set by Product and the C-level, you still have the mountain of tech debt that is coming due, and changes slow to a crawl or outages skyrocket or both. Plus, hiring has been based on ‘getting things done’, so you have this group of people who are actually really skilled in hacking things together and getting it out the door. It’s tough and calls for an entire culture shift. How do you stop being a reactionary startup and become a vision-based and purposeful organization?
This is a case of the Product team not working with customers, finding out what is reasonable, and allowing the system to set reasonable limits.
"The key insight is to treat container images not as opaque black boxes, but as structured, manipulable archives. Deeply understanding the underlying technology, like the OCI image specification, allows for advanced optimization and troubleshooting that goes far beyond standard tooling. This knowledge is essential for preventing issues like Kubernetes disk space exhaustion before they start."
One of the common phrase tropes I find is something like "Here's a set of small, surgical steps you can take to..."
They say they made an 800GB container image, so your issue is about singular vs plural?
Regardless, I don't really get why anyone would self-report like this. Is the next article going to be about how they don't encrypt passwords, and how when they accidentally dropped the prod DB they could restore accounts from logs because the logs had the passwords in clear text?
Wouldn't a multistage Dockerfile have accomplished the same thing? Something like:
  FROM bigimage
  # delete the offending file; it still exists in the lower layers of this stage
  RUN rm bigfile
  FROM scratch
  # copying only the final filesystem state flattens everything into a single layer
  COPY --from=0 / /
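If you save that as a Dockerfile (bigimage/bigfile are obviously placeholders), something like this should confirm the result is a single flattened layer without bigfile:

  docker build -t flattened .
  docker history flattened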
The automation of containers looks simple but developers with systems experience know the actual complexity of operating systems and running applications.
People who know javascript but don't know how a file system works can build and deploy containers. They just copy and paste stuff until it runs. The automation of containers makes brute force iteration a viable option. It was a lot more difficult trying to run a Linux server, which would force you to learn something or use a platform as a service instead.
That is clearly not what these people are doing, though.
> Here's how the disaster unfolded:
> 1. A user's container is under a brute-force attack, and /var/log/btmp grows to 11GB.
> 2. The user performs a commit, creating a new image layer.
> 3. A single new failed login is appended to /var/log/btmp.
> 4. Because of CoW, OverlayFS doesn't just write the new line. It copies the entire 11GB file into the new, upper layer.
> 5. This process repeated 271 times.
So the user is creating hundreds of layers for unclear reasons. The article refers to this as "exponential growth", but for that to be the case those commits would need to be triggered in proportion to the number of existing layers, which seems unlikely. Assuming the commits are caused by the user for reasons unrelated to the size of the existing image, this is growth that is quadratic† (in the number of layers; it's hard to characterize as a function of time or whatever), and it'd be nice to know why there were so many layers.
† Note that while the growth is technically quadratic, I don't think that impacted them. They say that the problem occurred when one 11GB file got copied into each of 272 image layers. That would require 2,992 GB, but they also say that the image exhibiting this problem was only 800GB.
I suspect that the answer here is that only some of the layers modified (and therefore copied) the log file. Probably about 72 of the layers. This is more like growth that's linear (still technically slightly superlinear, but probably not quadratic) in the number of failed SSH login attempts. ~75% of layers aren't contributing to the problem at all.
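One way to sanity-check that from the outside would be per-layer sizes, e.g. (image name is hypothetical):

  docker history --no-trunc --format '{{.Size}}\t{{.CreatedBy}}' user-devbox:latest

The layers that copied the 11GB btmp up should stand out immediately; the rest would just be small commit layers.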
Having /var/log set as a persistent volume would have worked, but ultimately they were using "docker commit" to amend/update their images, which is definitely the wrong way to do it.
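Something along these lines keeps the log churn out of the image layers entirely (names are placeholders):

  docker run -d --name devbox -v devbox-logs:/var/log user-devbox:latest

Writes to /var/log land in the named volume, and since volumes aren't included in "docker commit", the 11GB file never gets baked into a layer.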
Do people not know that each layer comes with its own downsides?
Do people just do 272 layers and think that it’s normal?
This seems like people discovering that water is wet and fire is hot.
Our users need to connect their local VS Code, Cursor, or JetBrains IDEs to the cloud environment. The industry-standard extensions for this only speak the SSH protocol. So, to give our users the tools they love, the container must run an SSHD to act as the host.
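To be clear, the sshd bit itself is unremarkable; a stripped-down sketch (not our actual image) is roughly:

  FROM ubuntu:22.04
  RUN apt-get update && apt-get install -y openssh-server && mkdir -p /run/sshd
  EXPOSE 22
  # run sshd in the foreground so it can serve as the container's main process
  CMD ["/usr/sbin/sshd", "-D", "-e"]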
We aren't just a CDE like Coder or Codespaces. We're trying to provide a fully integrated, end-to-end application lifecycle in one place.
The idea is that a developer on Sealos can:
1. Spin up their DevBox instantly.
2. Code and test their feature in that environment (using their local IDE).
3. Then, from that same platform, package their application into a production-ready, versioned image.
4. And finally, deploy that image directly to a production Kubernetes environment with one click.
That "release" feature was how we let a developer "snapshot" their entire working environment into a deployable image without ever having to write a Dockerfile.
https://sealos.io/_next/image?url=.%2Fimages%2Fcontainerd-hi...
https://sealos.io/_next/image?url=.%2Fimages%2Fbloated-conta...
Either way, hope the user was communicated with or alerted to what's going on.
At the same time, someone said that 800 GB container images are a problem in and of themselves no matter the circumstances and they got downvoted for saying so - yet I mostly agree.
Most of mine are about 50-250 MB at most and even if you need big ones with software that's GB in size, you will still be happier if you treat them as something largely immutable. I've never had odd issues with them thanks to this. If you really care about data persistence, then you can use volumes/bind mounts or if you don’t then just throw things into tmpfs.
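For completeness, the three options look roughly like this (paths and names are placeholders):

  docker run -v appdata:/data myimage       # named volume, persists independently of the container
  docker run -v /srv/data:/data myimage     # bind mount, persists at a host path
  docker run --tmpfs /data myimage          # tmpfs, lives in RAM and is gone when the container stops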
I'm not sure whether treating containers as something long lived with additional commits/layers is a great idea, but if it works for other people, then good for them. Must be a pain to run something so foundational for your clients, though, because you'll be exposed to most of the edge cases imaginable sooner or later.
For stuff like security keys you should typically add them as build --args-- secrets, not as content in the image.
Build args are content in the image: https://docs.docker.com/reference/build-checks/secrets-used-...
Do not use build arguments for anything secret. The values are committed into the image layers.
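The supported route is a BuildKit secret mount, which exposes the value only while a single RUN executes and never writes it into a layer; a minimal sketch (the id and file names are arbitrary):

  # syntax=docker/dockerfile:1
  FROM alpine
  # the secret is mounted at /run/secrets/api_key only for this RUN
  RUN --mount=type=secret,id=api_key \
      cat /run/secrets/api_key > /dev/null   # stand-in for the real build step

built with:

  docker build --secret id=api_key,src=./api_key.txt .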
The thing here is they're using Docker container images as if they were VM disks, and they end up with images with almost 300 layers, like in this case. I think LXC or VMs would be a better fit for this (but I don't know if they've tested that or why they're using Docker).
2GB is the expected and default size for a docker image. It's a bit bloated even.
(And indeed, the images are broken in Firefox and Edge. Is there another browser where they're not broken?)