AI Workloads
Colima provides an ideal environment for running GPU-powered AI workloads on Apple Silicon devices.
By leveraging Krunkit for GPU access, AI models run in isolated, secure containers.
Requirements
- Apple Silicon Mac (M1, M2, M3, or newer)
- macOS 13 or newer
- Sufficient RAM for your chosen model (8GB+ recommended)
- Krunkit installed (see below)
Installing Krunkit
Krunkit is required for GPU access on Apple Silicon. Install it via Homebrew:
brew tap slp/krunkit && brew install krunkit
Getting Started
1. Start Colima with Krunkit
The krunkit VM type enables GPU access for AI workloads:
colima start --vm-type krunkit
2. Run a Model
Run an AI model:
# Start an interactive chat session
colima model run gemma3
# Or run a one-off prompt
colima model run gemma3 "Explain quantum computing in simple terms"
This downloads the model (if not cached) and either starts an interactive chat session or returns the response for a one-off prompt.
Model Runners
Colima supports two model runner backends:
- Docker Model Runner (default) - Uses Docker’s native AI model runner. Supports Docker AI Registry and HuggingFace.
- Ramalama - Uses Ramalama for model execution. Supports HuggingFace and Ollama registries.
To switch between runners:
# Use docker runner (default)
colima model run gemma3 --runner docker
# Use ramalama runner
colima model run gemma3 --runner ramalama
The runner can also be configured in the Colima config file. See Configuration - AI Workloads for details.
Supported Model Registries
Colima supports models from multiple registries:
Docker AI Registry (Default)
NOTE: Docker AI Registry is only available with the Docker Model Runner.
The Docker AI Registry provides curated, optimized models for local inference. When no prefix is specified, Docker AI Registry is used by default.
# Run models from Docker AI Registry (default, no prefix needed)
colima model run gemma3
colima model run llama3.2
colima model run qwen2.5
colima model run phi4
colima model run mistral
Browse available models at hub.docker.com/u/ai.
HuggingFace Hub
HuggingFace hosts thousands of open-source models. The prefix syntax differs by runner:
# Docker Model Runner uses hf.co/ prefix
colima model run hf.co/microsoft/Phi-3-mini-4k-instruct-gguf
# Ramalama uses hf:// prefix
colima model run hf://microsoft/Phi-3-mini-4k-instruct-gguf --runner ramalama
Browse models at huggingface.co/models.
Ollama Registry
NOTE: Ollama Registry is only available with the Ramalama runner.
Ollama provides a curated collection of popular open-source models optimized for local inference. Use the ollama:// prefix.
# Run Ollama models with ollama:// prefix (requires ramalama runner)
colima model run ollama://gemma3 --runner ramalama
colima model run ollama://llama3.2 --runner ramalama
Browse available models at ollama.com/library.
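The prefix rules above can be summarized in a small helper. This is an illustrative Python sketch of the routing described in this section, not part of Colima; the `parse_model_ref` function is a hypothetical name.

```python
# Illustrative sketch of the registry/prefix routing described above.
# Not part of Colima; parse_model_ref is a hypothetical helper.

def parse_model_ref(ref: str, runner: str = "docker") -> dict:
    """Map a model reference to its registry per the documented prefix rules."""
    if ref.startswith("ollama://"):
        # Ollama Registry is only available with the Ramalama runner.
        if runner != "ramalama":
            raise ValueError("Ollama registry requires the ramalama runner")
        return {"registry": "ollama", "model": ref[len("ollama://"):], "runner": runner}
    if ref.startswith("hf://"):
        # hf:// is the HuggingFace prefix for the ramalama runner.
        if runner != "ramalama":
            raise ValueError("hf:// prefix is used with the ramalama runner")
        return {"registry": "huggingface", "model": ref[len("hf://"):], "runner": runner}
    if ref.startswith("hf.co/"):
        # hf.co/ is the HuggingFace prefix for the docker runner.
        if runner != "docker":
            raise ValueError("hf.co/ prefix is used with the docker runner")
        return {"registry": "huggingface", "model": ref[len("hf.co/"):], "runner": runner}
    # No prefix: Docker AI Registry, which requires the docker runner.
    if runner != "docker":
        raise ValueError("Docker AI Registry requires the docker runner")
    return {"registry": "docker-ai", "model": ref, "runner": runner}

print(parse_model_ref("gemma3"))
print(parse_model_ref("ollama://llama3.2", runner="ramalama"))
```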
Available Commands
Running Models
Without a prompt, you enter an interactive chat session for continuous conversation:
# Start an interactive chat session
colima model run gemma3
For one-off prompts, specify the prompt after the model name:
# One-off prompt
colima model run gemma3 "What is the capital of France?"
More examples:
# Run different models from Docker AI Registry
colima model run llama3.2
colima model run qwen2.5
colima model run phi4
# Run a HuggingFace model (requires hf.co/ prefix)
colima model run hf.co/microsoft/Phi-3-mini-4k-instruct-gguf
# See all available options
colima model --help
Serving Models
Serve a model with a web-based chat interface:
# Serve a model (available at localhost:8080)
colima model serve gemma3
# Or serve a HuggingFace model
colima model serve hf.co/microsoft/Phi-3-mini-4k-instruct-gguf
# Serve on a custom port
colima model serve gemma3 --port 9000
The chat interface will be available at http://localhost:8080 by default. Use --port to specify a different port.
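The served model can also be queried programmatically. The sketch below assumes the server exposes an OpenAI-compatible /v1/chat/completions route (the Open WebUI section in this guide points OPENAI_API_BASE_URL at it); the exact path may differ in your setup, so verify before relying on it.

```python
# Minimal client sketch for a model served with `colima model serve`.
# Assumption: the server exposes an OpenAI-compatible chat completions route.
import json
import urllib.request

def chat_request(model: str, prompt: str,
                 base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the served model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("gemma3", "What is the capital of France?")
print(req.full_url)
# To actually send it, a `colima model serve gemma3` instance must be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```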
Open WebUI
Open WebUI is a feature-rich, self-hosted web interface for AI models. You can use it with Colima by serving a model and pointing Open WebUI at the served endpoint.
First, serve a model:
colima model serve gemma3
Then run Open WebUI in Docker, specifying the Colima model server as the OpenAI API base URL:
docker run --rm -it -p 3000:8080 \
-e OPENAI_API_BASE_URL=http://host.docker.internal:8080 \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Open WebUI will be available at http://localhost:3000.
With web search enabled:
docker run --rm -it -p 3000:8080 \
-e OPENAI_API_BASE_URL=http://host.docker.internal:8080 \
-e ENABLE_WEB_SEARCH=true \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Note:
host.docker.internal allows the Docker container to access services running on the host machine (in this case, the Colima model server on port 8080).
Resource Management
One of the key benefits of running AI workloads in Colima is the ability to control resource usage. This prevents models from consuming all available system resources.
CPU and Memory Limits
Allocate specific resources to your AI instance:
# Start with 4 CPUs and 8GB RAM
colima start --vm-type krunkit --cpus 4 --memory 8
Disk Space
Set the disk size for model storage:
# Start with 50GB disk
colima start --vm-type krunkit --disk 50
Recommended Resources
| Model Size | Minimum RAM | Recommended RAM | Examples |
|---|---|---|---|
| Tiny (1-2B) | 4GB | 8GB | TinyLlama (1.1B), Gemma 2B |
| Small (3-4B) | 8GB | 12GB | Phi-3 Mini (3.8B), Gemma 3 |
| Medium (7-8B) | 12GB | 16GB | Llama 3.2 (8B), Mistral 7B |
| Large (13B+) | 16GB | 32GB | Phi-4 (14B), Llama 13B, Mixtral |
Tip: Check the model page on HuggingFace or Ollama for specific hardware requirements before running a model.
Note: TinyLlama is useful for validating your setup but has limited capabilities. For meaningful results, use models with 3B+ parameters.
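The table's figures follow a simple rule of thumb: weight memory is roughly parameter count times bytes per weight, plus headroom for the KV cache and runtime. A rough estimator (the 1.2x overhead factor is a ballpark assumption, not an official Colima figure):

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: int = 4,
                    overhead: float = 1.2) -> float:
    """Rough memory estimate: params x bytes/weight, padded for KV cache/runtime.
    The overhead factor is a ballpark assumption, not an official figure."""
    weight_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB
    return round(weight_gb * overhead, 1)

# A 7B model at 4-bit quantization needs roughly 4.2 GB for weights alone;
# the 12 GB minimum in the table also covers macOS, the VM, and the runtime.
print(estimate_ram_gb(7))      # → 4.2
print(estimate_ram_gb(14, 8))  # 14B model at 8-bit quantization
```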
Using Profiles for AI
Create a dedicated profile for AI workloads to keep them separate from your container development:
# Create an AI-specific profile
colima start ai --vm-type krunkit --cpus 4 --memory 16 --disk 50
# Setup and run models on the AI profile
colima model setup -p ai
colima model run gemma3 -p ai
# Your main development profile remains unaffected
colima start # default profile with Docker
This allows you to:
- Keep AI resources isolated from development containers
- Quickly switch between AI and development workloads
- Delete AI resources without affecting your development environment
Security
Colima’s AI workloads benefit from multiple layers of security:
Container Isolation
Models run inside containerized environments, isolated from:
- Your host filesystem
- Other running containers
- Network access (unless explicitly enabled)
Resource Limits
Unlike when you run models directly on your host, Colima enforces:
- CPU limits to prevent system freeze
- Memory limits to prevent OOM conditions
- Disk quotas to manage storage
Extra Isolation
While the mounted host volumes are not accessible to AI containers by default, you can take security a step further by completely disabling volume mounts:
colima start --vm-type krunkit --mount=none
This ensures no host filesystem access is possible from within the VM, providing maximum isolation for untrusted models.
Troubleshooting
Model Download Issues
If a model fails to download:
# Check available disk space
colima ssh -- df -h
# Increase disk if needed (stop, then start with larger disk)
colima stop
colima start --vm-type krunkit --disk 100
GPU Not Detected
Ensure you’re using the krunkit VM type:
# Check current VM type
colima status
# Restart with krunkit
colima stop
colima start --vm-type krunkit
Out of Memory
If models crash or run slowly:
# Stop current instance
colima stop
# Start with more memory
colima start --vm-type krunkit --memory 16
Resetting AI Setup
To completely reset your AI environment:
# Delete the profile and its data
colima delete --data
# Start fresh
colima start --vm-type krunkit
colima model setup