Closed
Labels: bug (Something isn't working)
Description
Bug Description
When trying the Metal quickstart on a fresh minikube setup, with the operator built from latest main, I get:
% kubectl port-forward svc/llama-3-2-3b 8080:8080
error: cannot attach to *v1.Service: invalid service 'llama-3-2-3b': Service is defined without a selector
Steps to Reproduce
- Start minikube: `minikube start`
- Install the operator: `make install`
- Run the operator: `make run ARGS="--model-cache-path=/tmp/llmkube-models"`
- In another terminal, run the metal agent: `go run cmd/metal-agent/main.go -llama-server /opt/homebrew/bin/llama-server -log-level debug`
- Deploy a model specifying the metal accelerator: `go run cmd/cli/main.go deploy llama-3.2-3b --accelerator metal`
- Try a K8s port-forward and hit the completion endpoint. The port-forward fails.
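For context on the failure in the last step: `kubectl port-forward svc/...` only works when the Service has a `spec.selector` it can use to resolve a backing pod. A minimal sketch of the Service shape port-forward expects (names and labels here are illustrative, not necessarily what the operator emits):

```yaml
# Hypothetical example: kubectl port-forward can attach to this Service
# because spec.selector lets it find a matching pod to forward to.
apiVersion: v1
kind: Service
metadata:
  name: llama-3-2-3b
  namespace: default
spec:
  selector:
    app: llama-3-2-3b   # port-forward uses this to pick a pod
  ports:
    - port: 8080
      targetPort: 8080
```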
Expected Behavior
The port-forward should work as shown in the quickstart output; either the instructions need updating, or the missing-selector issue needs investigating.
Actual Behavior
Output is:
% go run cmd/cli/main.go deploy llama-3.2-3b --accelerator metal
📚 Using catalog model: Llama 3.2 3B Instruct
🚀 Deploying LLM inference service
═══════════════════════════════════════════════
Name: llama-3.2-3b
Namespace: default
Accelerator: metal
Replicas: 1
Context: 8192 tokens
Image: ghcr.io/ggml-org/llama.cpp:server
═══════════════════════════════════════════════
📦 Creating Model 'llama-3.2-3b'...
✅ Model created
⚙️ Creating InferenceService 'llama-3.2-3b'...
✅ InferenceService created
Waiting for deployment to be ready (timeout: 10m0s)...
[2s] Model: Ready, Service: Ready (1/1 replicas)
✅ Deployment ready!
═══════════════════════════════════════════════
Model: llama-3.2-3b
Size: 2.2 GiB
Path: /tmp/llmkube-models/086521cd8ae5819e/model.gguf
Endpoint: http://llama-3-2-3b.default.svc.cluster.local:8080/v1/chat/completions
Replicas: 1/1
═══════════════════════════════════════════════
🧪 To test the inference endpoint:
# Port forward the service
kubectl port-forward -n default svc/llama-3-2-3b 8080:8080
# Send a test request
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"What is 2+2?"}]}'
Then attempt to run the port-forward:
% kubectl port-forward svc/llama-3-2-3b 8080:8080
error: cannot attach to *v1.Service: invalid service 'llama-3-2-3b': Service is defined without a selector
The K8s Service and Endpoints are created, but I don't think a Deployment or proxy pod is created at this stage?
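My guess (unverified — I haven't read the metal code path) is that the operator creates a selector-less Service with manually managed Endpoints routing traffic to the host-side metal agent, roughly like the sketch below. With no `spec.selector`, there is no pod for `kubectl port-forward` to attach to, which would explain the error above. The IP is illustrative only.

```yaml
# Hypothetical sketch of a selector-less Service backed by manual Endpoints,
# as the metal path might create. kubectl port-forward fails on such a
# Service because it cannot resolve a backing pod.
apiVersion: v1
kind: Service
metadata:
  name: llama-3-2-3b
  namespace: default
spec:
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: v1
kind: Endpoints
metadata:
  name: llama-3-2-3b   # must match the Service name to be associated with it
  namespace: default
subsets:
  - addresses:
      - ip: 192.168.65.2   # host address reachable from inside the cluster
    ports:
      - port: 8080
```

If that's what is happening, curl against the ClusterIP from inside the cluster (e.g. from a temporary `kubectl run` pod) should still work even though `kubectl port-forward svc/...` cannot.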
Environment
- macOS Tahoe 26.2
- MacBook Pro (M4 Pro, Nov 2024)
- minikube v1.38.1 (commit c93a4cb9311efc66b90d33ea03f75f2c4120e9b0)
- Tested using a fresh/clean setup a few times
- Latest main (07630b8)
Cluster Type:
- [ ] GKE
- [ ] EKS
- [ ] AKS
- [x] minikube
- [ ] kind
- [ ] K3s
- [ ] Other:
GPU (if applicable):
- [ ] NVIDIA T4
- [ ] NVIDIA L4
- [ ] NVIDIA V100
- [ ] NVIDIA A100
- [ ] None (CPU only)
- [x] Other: Metal
Logs
YAML Manifests
Additional Context