OCI has some benefits over other systems: tiered caching/pull-through is already pretty battle-tested, as is signing etc., so it beats more naive distribution methods for reliability, performance and trust.
If combined with eStargz or zstd:chunked it's also pretty nice for distributed systems, as long as you can slice things up into files in such a way that not every machine needs to pull the full model weights.
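To make the partial-pull idea a bit more concrete, here's a rough Go sketch (standard library only) of the kind of check involved: walk an OCI image manifest and see which layers advertise an eStargz TOC, i.e. which ones a node could fetch file-by-file instead of in full. The sample manifest is made up for illustration, and treat the annotation key as my assumption rather than gospel.

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // Minimal subset of an OCI image manifest: just enough to look at
    // layer media types and annotations. Field names follow the OCI image spec.
    type descriptor struct {
        MediaType   string            `json:"mediaType"`
        Digest      string            `json:"digest"`
        Size        int64             `json:"size"`
        Annotations map[string]string `json:"annotations,omitempty"`
    }

    type manifest struct {
        SchemaVersion int          `json:"schemaVersion"`
        Layers        []descriptor `json:"layers"`
    }

    // Annotation key used by eStargz-aware tooling to point at a layer's
    // table of contents (taken from stargz-snapshotter; treat as an assumption).
    const tocDigestAnnotation = "containerd.io/snapshot/stargz/toc.digest"

    func main() {
        // Made-up manifest for illustration; a real one would come from a
        // registry via the distribution API.
        raw := []byte(`{
          "schemaVersion": 2,
          "layers": [
            {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
             "digest": "sha256:aaa", "size": 123456789,
             "annotations": {"containerd.io/snapshot/stargz/toc.digest": "sha256:bbb"}},
            {"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
             "digest": "sha256:ccc", "size": 987654321}
          ]
        }`)

        var m manifest
        if err := json.Unmarshal(raw, &m); err != nil {
            panic(err)
        }
        for _, l := range m.Layers {
            if _, ok := l.Annotations[tocDigestAnnotation]; ok {
                fmt.Printf("layer %s: eStargz TOC present, contents can be pulled lazily\n", l.Digest)
            } else {
                fmt.Printf("layer %s: no TOC, full pull required\n", l.Digest)
            }
        }
    }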
Failing that, there are P2P distribution mechanisms for OCI (Dragonfly etc.) that can lessen the burden without resorting to DIY on BitTorrent or similar.
(I ended up developing an alternative pull mechanism, which is described in https://outerbounds.com/blog/faster-cloud-compute though note that the article is a bit light on the technical details)
I don't have any perf numbers I can share, but I can say we see ~30% compression with eStargz, which is already a small win at least, heh.
Soon (end of May, according to the current roadmap) this feature will also be available with the Docker Engine (so not only as part of Docker Desktop).
As a reminder, Docker Engine is the Community Edition, Open Source and free for everyone.
This comment kind of makes it sound like maybe you can run Docker Engine directly on these operating systems (macOS, Windows, etc.). Is that the case?
You were always able to manually install Docker CE within WSL2 on Windows. But if you want an integrated Docker experience on the Windows host, you need to use Docker Desktop, which ships its own Linux VM and performs the transparent integration with the Windows host.
This is fully independent of the Docker Model Runner feature though :)
For the end user, it would be one less deployment headache to worry about: not having to package Ollama + the model into Docker containers for deployment. It also gives a more standardized deployment for hardware-accelerated models across platforms.
It's fine to disagree of course, but we envision Docker as a tool that has a higher abstraction level than just container management. That's why having a new domain-specific command (that also uses domain-specific technology that is independent from containers, at least on some platform targets) is a cohesive design choice from our perspective.
Seems fair to raise $1bn at a valuation of $100bn. (Might roll the funds over into pitching Kubernetes, but with AI, next month.)
The existing stack - a server and a model file - works just fine. There doesn't seem to be a need to jam an abstraction layer in there. The core problem Docker solves just isn't there.
We are not packaging models as Docker images, since indeed that is the wrong fit and comes with all kinds of technical problems. It also feels wrong to package pure data (which is what models are) into an image, which is generally expected to be a runnable artifact.
That's why we decided to use OCI Artifacts and to specify our own OCI Artifact subset that is better suited for the use case. The spec and implementation are OSS; you can check them out here: https://github.com/docker/model-spec
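To give a rough feel for what an OCI Artifact for a model could look like, here's a small Go sketch that hashes a (hypothetical) config file and weights file and assembles an image-spec-style manifest around them. The media type strings and file names are placeholders I made up for illustration, not the values defined in the model-spec repo linked above, so check there for the actual spec.

    package main

    import (
        "crypto/sha256"
        "encoding/json"
        "fmt"
        "os"
    )

    // Descriptor/manifest shapes follow the generic OCI image-spec layout; the
    // concrete media types below are placeholders, NOT the ones defined in
    // github.com/docker/model-spec.
    type descriptor struct {
        MediaType string `json:"mediaType"`
        Digest    string `json:"digest"`
        Size      int64  `json:"size"`
    }

    type artifactManifest struct {
        SchemaVersion int          `json:"schemaVersion"`
        MediaType     string       `json:"mediaType"`
        ArtifactType  string       `json:"artifactType"`
        Config        descriptor   `json:"config"`
        Layers        []descriptor `json:"layers"`
    }

    // describe hashes a local file and wraps it in an OCI descriptor.
    func describe(path, mediaType string) (descriptor, error) {
        data, err := os.ReadFile(path)
        if err != nil {
            return descriptor{}, err
        }
        sum := sha256.Sum256(data)
        return descriptor{
            MediaType: mediaType,
            Digest:    fmt.Sprintf("sha256:%x", sum),
            Size:      int64(len(data)),
        }, nil
    }

    func main() {
        // Hypothetical local files: a small JSON config plus the model weights.
        cfg, err := describe("model-config.json", "application/vnd.example.model.config.v1+json")
        if err != nil {
            panic(err)
        }
        weights, err := describe("model.gguf", "application/vnd.example.model.weights.v1.gguf")
        if err != nil {
            panic(err)
        }

        m := artifactManifest{
            SchemaVersion: 2,
            MediaType:     "application/vnd.oci.image.manifest.v1+json",
            ArtifactType:  "application/vnd.example.model.v1", // placeholder
            Config:        cfg,
            Layers:        []descriptor{weights},
        }
        out, _ := json.MarshalIndent(m, "", "  ")
        fmt.Println(string(out))
    }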
There is at least one benefit. I'd be interested to see what their security model is.
There are currently very good uses for this, and there are likely to be more. There are increasing numbers of large generative AI models used in technical design work (e.g., semiconductor rules-based design/validation, EUV mask design, design optimization). Many/most don't need to run all the time. Some have licensing that is based on length of time running, credits, etc. Some are just huge and intensive, but not run very often in the design flow. Many are run on the cloud, but industrial customers are reluctant to run them on someone else's cloud.
Being able to have my GPU cluster/data center run a ton of different and smaller models during the day or early in the design, and then be turned over to a full CFD or validation run as your office staff goes home, seems to me to be useful. Especially if you are in any way getting billed by your vendor based on run time or similar. It can mean a more flexible hardware investment. The use case here is going to be Formula 1 teams, silicon vendors, etc. - not pure tech companies.
We decided to start with Apple silicon Macs because they provide one of the worst experiences for running LLMs in containerized form, while at the same time having very capable hardware, so it felt like a very sad situation for Mac users (because of the lack of GPU access within containers).
And of course we understand who our users are, so believe me when I say macOS users on Apple silicon make up a significant portion of our user base, or else we would not have started with it.
In production environments on Docker CE, you can already mount the GPUs, so while the UX is not great, it is not a blocker.
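For reference, here's a rough Go sketch of what "mount the GPUs" means via the Engine API, the programmatic counterpart of "docker run --gpus all". It assumes a reasonably recent docker/docker Go SDK, and the image name and command are just placeholders.

    package main

    import (
        "context"
        "fmt"

        "github.com/docker/docker/api/types/container"
        "github.com/docker/docker/client"
    )

    func main() {
        ctx := context.Background()

        // Talk to the local Docker Engine (plain Docker CE is fine here).
        cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // DeviceRequests is the API-level counterpart of `docker run --gpus all`.
        hostCfg := &container.HostConfig{
            Resources: container.Resources{
                DeviceRequests: []container.DeviceRequest{{
                    Driver:       "nvidia",
                    Count:        -1, // all GPUs
                    Capabilities: [][]string{{"gpu"}},
                }},
            },
        }

        // Placeholder image and command; any CUDA-enabled inference image would do.
        resp, err := cli.ContainerCreate(ctx,
            &container.Config{
                Image: "nvidia/cuda:12.3.1-base-ubuntu22.04",
                Cmd:   []string{"nvidia-smi"},
            },
            hostCfg, nil, nil, "gpu-check")
        if err != nil {
            panic(err)
        }

        if err := cli.ContainerStart(ctx, resp.ID, container.StartOptions{}); err != nil {
            panic(err)
        }
        fmt.Println("started container", resp.ID)
    }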
However, we have first class support for Docker Model Runner within Docker CE on our roadmap and we hope it comes sooner rather than later ;) It will also be purely OSS, so no worries there.