jpgvm 2 days ago

I'm looking at using OCI at $DAY_JOB for model distribution to fleets of machines, so it's good to see it getting traction elsewhere.

OCI has some benefits over other systems, namely that tiered caching/pull-through is already pretty battle-tested, as is signing etc., so it beats more naive distribution methods on reliability, performance and trust.

If combined with eStargz or zstd:chunked it's also pretty nice for distributed systems, as long as you can slice things up into files in such a way that not every machine needs to pull the full model weights.

Failing that, there are P2P distribution mechanisms for OCI (Dragonfly etc.) that can lessen the burden without resorting to DIY on BitTorrent or similar.
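
As a rough illustration (registry, repo and file names below are just placeholders), pushing and pulling a model as a plain OCI artifact with the oras CLI looks something like:

    # push the weights file to any OCI registry as an artifact
    # (the layer media type here is only illustrative)
    oras push registry.example.com/models/llama-3:8b-q4 \
      weights.gguf:application/octet-stream

    # pull it back on any machine that can reach the registry
    # (or a pull-through cache in front of it)
    oras pull registry.example.com/models/llama-3:8b-q4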

  • remram a day ago

    Kubernetes added "image volumes" so this will probably become more and more common: https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-...
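
    A minimal sketch of a pod using one (the image reference is a placeholder, and the ImageVolume feature gate has to be enabled on the cluster):

      apiVersion: v1
      kind: Pod
      metadata:
        name: model-consumer
      spec:
        containers:
          - name: app
            image: debian
            command: ["sleep", "infinity"]
            volumeMounts:
              - name: model
                mountPath: /models
                readOnly: true
        volumes:
          - name: model
            image:
              reference: registry.example.com/models/llama-3:8b-q4
              pullPolicy: IfNotPresent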

    • jpgvm a day ago

      That is exactly the feature we are using. Right now you need to be on a beta release of containerd, but before long it should be pretty widespread. In combination with lazy pull (eStargz) it's a pretty compelling implementation.
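
      For reference, converting an existing image to eStargz so the snapshotter can pull lazily is something like this (image names are placeholders):

        # rewrite the layers as eStargz so containerd's stargz snapshotter
        # can fetch individual files on demand instead of whole blobs
        nerdctl image convert --estargz --oci \
          registry.example.com/models/llama-3:8b-q4 \
          registry.example.com/models/llama-3:8b-q4-esgz
        nerdctl push registry.example.com/models/llama-3:8b-q4-esgz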

    • mdaniel a day ago

      Damn, that's handy. I now wonder how much trouble it would be to make a CSI driver that does this, for backporting to the 1.2x clusters (since I don't think that Kubernetes does backports for anything).

      • jpgvm a day ago

        Not too hard. If you happen to be on CRI-O this has been implemented for a while, but if you are like us and on containerd then you need the new 2.1 beta release. That does most of the heavy lifting; implementing a CSI driver that mounts these as PVs wouldn't be super hard, I don't think, and you could borrow liberally from the volume source implementation.

  • wofo a day ago

    I've been pretty disappointed with eStargz performance, though... Do you have any numbers you can share? All over the internet people refer to numbers from 10 years ago, from workloads that don't seem realistic at all. In my experiments it didn't provide a significant enough speedup.

    (I ended up developing an alternative pull mechanism, which is described in https://outerbounds.com/blog/faster-cloud-compute though note that the article is a bit light on the technical details)

    • jpgvm a day ago

      In our case some machines would need to access less than 1% of the image, but being able to have an image with the entire model weights as a single artifact is an important feature in and of itself. In our specific scenario, even if eStargz is slow by filesystem standards, it's competing with network transfer anyway, so if it's in the same order of magnitude as rsync that will do.

      I don't have any perf numbers I can share, but I can say we see ~30% compression with eStargz, which is already a small win at least, heh.

israrkhan a day ago

Be aware of licensing restrictions. Docker Desktop is free for personal use, but it requires a paid license if you work for an organization with 250+ employees. This feature seems to be available in Docker Desktop only.

  • francesco-corti a day ago

    Note: I'm part of the team developing this feature.

    Soon (end of May, according to the current roadmap) this feature will also be available in Docker Engine (so not only as part of Docker Desktop).

    As a reminder, Docker Engine is the Community Edition: open source and free for everyone.

    • cmiles74 a day ago

      My understanding has always been that Docker Engine was only available directly on Linux. If you are running another operating system then you will need to run Docker Desktop (which, in turn, runs a Docker Engine instance in a VM).

      This comment kind of makes it sound like maybe you can run Docker Engine directly on these operating systems (macOS, Windows, etc.). Is that the case?

      • mdaniel a day ago

        I wanted to offer that the (Rancher Desktop, Lima, Colima, etc.) products also launch a virtual machine and install Docker on it, so one doesn't need Docker Desktop to do that. My experience has been that the choice of "frontend" to manage the VM and its software largely comes down to one's comfort level with the CLI, and/or how much customization one wishes over that experience.

      • aednichols a day ago

        Docker Engine uses a feature of the Linux kernel called namespaces. Other OSes require a Linux VM. As another commenter mentioned, apps like OrbStack, Podman Desktop, and Docker Desktop provide a facility to create such a VM.

      • dboreham a day ago

        Quick note that on Windows you don't need Docker Desktop. It's convenient, but regular Docker can be run in WSL2 (which is the same VM that Docker Desktop uses).
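
        For anyone curious, a rough sketch of that setup inside a WSL2 distro (using Docker's convenience script; adjust to taste):

          # inside the WSL2 distro (e.g. Ubuntu)
          curl -fsSL https://get.docker.com | sh
          sudo usermod -aG docker $USER

          # without systemd enabled in WSL2, start the daemon manually
          sudo service docker start
          docker run --rm hello-world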

    • daveguy a day ago

      Is it still the case that you can't run Docker Engine Community Edition on a Windows machine?

      • kiview 8 hours ago

        (disclaimer: I'm leading the Docker Model Runner team at Docker)

        You were always able to manually install Docker CE within WSL2 on Windows. But if you want an integrated Docker experience on the Windows host, you need to use Docker Desktop, which ships its own Linux VM and performs the transparent integration with the Windows host.

        This is fully independent of the Docker Model Runner feature though :)

leowoo91 a day ago

I don't understand why you'd add another domain-specific command to a container manager and go beyond the scope of what the tool was designed for in the first place.

  • kiview 8 hours ago

    (disclaimer: I'm leading the Docker Model Runner team at Docker)

    It's fine to disagree of course, but we envision Docker as a tool with a higher abstraction level than just container management. That's why having a new domain-specific command (one that also uses domain-specific technology independent of containers, at least on some platform targets) is a cohesive design choice from our perspective.
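
    For a rough idea of what that surface looks like (the model name here is just an example), it's a small set of subcommands alongside the existing ones:

      # pull a model (stored as an OCI artifact), run a one-off prompt,
      # and list what's cached locally
      docker model pull ai/smollm2
      docker model run ai/smollm2 "Write a haiku about containers"
      docker model list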

  • saidinesh5 a day ago

    The main benefit I see for cloud platforms: caching/co-hosting various services based on the model alone instead of (model + user's API layer on top).

    For the end user, it would be one less deployment headache to worry about: not having to package ollama + the model into Docker containers for deployment. It would also mean a more standardized deployment for hardware-accelerated models across platforms.

  • anentropic a day ago

    gotta have an AI strategy to report to the board

rockwotj 2 days ago

Looks exactly like ollama but built into Docker Desktop? Anyone know of any differences?

  • blitzar a day ago

    Hear me out here ... it's like Docker, but with AI <pause for gasps and applause>.

    Seems fair to raise 1bn at a valuation of 100bn. (Might roll the funds over into pitching Kubernetes, but with AI, next month.)

    • danparsonson a day ago

      What they really need is a Studio Ghibli'd version of their logo

  • ammo1662 2 days ago

    They are using OCI artifacts to package models, so you can use your own registry to host these models internally. However, I just can't see any improvement compared with a simple FTP server. I don't think LLM models can adopt hierarchical structures the way Docker images do, so they cannot leverage the benefits of layered file systems, such as caching and reuse.

    • jesserwilliams a day ago

      It's not the only one using OCI to package models. There's a CNCF project called KitOps (https://kitops.org) that has been around for quite a bit longer. It solves some of the limitations of using Docker, one of them being that you don't have to pull the entire project when you want to work on it. Instead, you can pull just the data set, tuning, model, etc.

  • krick 2 days ago

    They imply it should be somehow optimized for Apple silicon, but, yeah, I don't understand what this is. If Docker can use the GPU, well, it should be able to use the GPU in any container that makes use of it properly. If (say) ollama as an app doesn't use it properly, but they figured out a way to do it better, it would make more sense to fix ollama. I have no idea why this should be a different app than, well, the very Docker daemon itself.

    • mappu 2 days ago

      All that work (AGX acceleration...) is done in llama.cpp, not ollama. Ollama's raison d'être is a Docker-style frontend to llama.cpp, so it makes sense that Docker would encroach from that angle.

  • gclawes 2 days ago

    Aren't some of the ollama guys ex-Docker guys?

Havoc a day ago

Can’t say I'm a fan of packaging models as docker images. Feels forced - a solution in search of a problem.

The existing stack - a server and a model file - works just fine. There doesn't seem to be a need to jam an abstraction layer in there. The core problem Docker solves just isn't there.

  • kiview 8 hours ago

    (disclaimer: I'm leading the Docker Model Runner team at Docker)

    We are not packaging models as Docker images, since indeed that is the wrong fit and comes with all kinds of technical problems. It also feels wrong to package pure data (which is what models are) into an image, which is generally expected to be a runnable artifact.

    That's why we decided to use OCI Artifacts and to specify our own OCI Artifact subset that is better suited to the use case. The spec and implementation are OSS, you can check them out here: https://github.com/docker/model-spec

  • gardnr a day ago

    > GPU acceleration on Apple silicon

    There is at least one benefit. I'd be interested to see what their security model is.

    • cmiles74 a day ago

      Is this really a Docker feature, though? llama.cpp provides acceleration on Apple hardware; I guess you could create a Docker image with llama.cpp and an LLM model and have mostly this feature.

      • kiview 8 hours ago

        Unfortunately not, since the container won't have access to the Apple silicon GPU. That's why in our architecture, we have to run llama.cpp as a host process and wire it up with the rest of the Docker Desktop architecture, to make it easily accessible from containers.

superb_dev a day ago

Looks like Docker is feeling left out of the GenAI bubble. It’s a little late…

  • bsenftner a day ago

    I wonder if the adult kids of some Docker execs own Macs, and that's why they made it. Why on Earth not make this for the larger installed OSes, you know, the ones running Docker in production?

    • kiview 8 hours ago

      (disclaimer: I'm leading the Docker Model Runner team at Docker)

      We decided to start with Apple silicon Macs because they provide one of the worst experiences for running LLMs in containerized form, while at the same time having very capable hardware, so it felt like a very sad situation for Mac users (because of the lack of GPU access within containers).

      And of course we understand who our users are, so believe me when I say, macOS users on Apple silicon make up a significant portion of our user base, otherwise we would not have started with it.

      In production environments on Docker CE, you can already mount the GPUs, so while the UX is not great, it is not a blocker.

      However, we have first-class support for Docker Model Runner within Docker CE on our roadmap, and we hope it comes sooner rather than later ;) It will also be purely OSS, so no worries there.

    • pridkett a day ago

      Because the ones running Docker in production aren’t paying the license fees they make you pay to use Docker Desktop.

    • amouat a day ago

      I'm pretty sure that's in development, it's just more difficult.

waffletower a day ago

I have used Replicate Cog, built on Docker, fairly heavily and find it a decent compromise of features. Docker taking this use case more seriously is quite welcome, though surprisingly late. Local Metal GPU support (where available to the containerized application's APIs), not currently available in Cog, is attractive, though it would require generalizing application code so containers can execute via CUDA, Metal, etc.

tuananh 2 days ago

They are about two years late.

  • ako a day ago

    Doesn't matter. I have docker and ollama running; it would be nice to ditch ollama and run everything through docker.

  • kiview 8 hours ago

    Better late than never ;)

avs733 a day ago

I'm going to take a contrarian perspective to the theme of comments here...

There are currently very good uses for this, and there are likely to be more. There are increasing numbers of large generative AI models used in technical design work (e.g., semiconductor rules-based design/validation, EUV mask design, design optimization). Many/most don't need to run all the time. Some have licensing that is based on length of time running, credits, etc. Some are just huge and intensive, but not run very often in the design flow. Many are run on the cloud, but industrial customers are reluctant to run them on someone else's cloud.

Being able to have my GPU cluster/data center run a ton of different, smaller models during the day or early in the design, and then be turned over to a full CFD or validation run as your office staff goes home, seems to me to be useful. Especially if you are in any way getting billed by your vendor based on run time or similar. It can mean a more flexible hardware investment. The use case here is going to be Formula 1 teams, silicon vendors, etc. - not pure tech companies.