[BUG] Metal quickstart not working as expected #166

@matiasinsaurralde

Bug Description

When trying the Metal quickstart on a fresh minikube setup, with the operator built from latest main, I get:

% kubectl port-forward svc/llama-3-2-3b 8080:8080
error: cannot attach to *v1.Service: invalid service 'llama-3-2-3b': Service is defined without a selector

Steps to Reproduce

  1. Start minikube: minikube start
  2. Install the operator: make install
  3. Run the operator: make run ARGS="--model-cache-path=/tmp/llmkube-models"
  4. In another terminal, run the Metal agent: go run cmd/metal-agent/main.go -llama-server /opt/homebrew/bin/llama-server -log-level debug
  5. Deploy a model with the Metal accelerator: go run cmd/cli/main.go deploy llama-3.2-3b --accelerator metal
  6. Attempt a Kubernetes port-forward and hit the completion endpoint. The port-forward fails.
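
For review, the sequence above can be captured as a dry-run shell sketch: it prints each command instead of executing it, so the ordering is easy to verify (swap the echo out, as noted in the comment, to run the steps for real):

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps above.
# `run` only echoes its arguments; replace the echo with "$@" to execute.
run() { echo "+ $*"; }

run minikube start
run make install
run make run ARGS="--model-cache-path=/tmp/llmkube-models"
run go run cmd/metal-agent/main.go -llama-server /opt/homebrew/bin/llama-server -log-level debug
run go run cmd/cli/main.go deploy llama-3.2-3b --accelerator metal
run kubectl port-forward svc/llama-3-2-3b 8080:8080
```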

Expected Behavior

The port-forward should succeed so the completion endpoint can be tested as the quickstart output suggests. Failing that, either the instructions should be updated or the selector issue investigated.

Actual Behavior

Output is:

% go run cmd/cli/main.go deploy llama-3.2-3b --accelerator metal
📚 Using catalog model: Llama 3.2 3B Instruct

🚀 Deploying LLM inference service
═══════════════════════════════════════════════
Name:        llama-3.2-3b
Namespace:   default
Accelerator: metal
Replicas:    1
Context:     8192 tokens
Image:       ghcr.io/ggml-org/llama.cpp:server
═══════════════════════════════════════════════

📦 Creating Model 'llama-3.2-3b'...
   ✅ Model created

⚙️  Creating InferenceService 'llama-3.2-3b'...
   ✅ InferenceService created

Waiting for deployment to be ready (timeout: 10m0s)...


[2s] Model: Ready, Service: Ready (1/1 replicas)

✅ Deployment ready!
═══════════════════════════════════════════════
Model:       llama-3.2-3b
Size:        2.2 GiB
Path:        /tmp/llmkube-models/086521cd8ae5819e/model.gguf
Endpoint:    http://llama-3-2-3b.default.svc.cluster.local:8080/v1/chat/completions
Replicas:    1/1
═══════════════════════════════════════════════

🧪 To test the inference endpoint:

  # Port forward the service
  kubectl port-forward -n default svc/llama-3-2-3b 8080:8080

  # Send a test request
  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"What is 2+2?"}]}'

Then attempt to run the port-forward:

% kubectl port-forward svc/llama-3-2-3b 8080:8080
error: cannot attach to *v1.Service: invalid service 'llama-3-2-3b': Service is defined without a selector

Kubernetes Services and Endpoints are created, but I don't think a Deployment or proxy pod is created at this stage?
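
If so, the failure is expected on kubectl's side: kubectl port-forward svc/... resolves a backing pod through the Service's .spec.selector, and a selector-less Service with manually managed Endpoints (which I assume is how the operator points cluster traffic at the host-side Metal agent) produces exactly this error. A hypothetical sketch of such a pair, with a placeholder address:

```yaml
# Hypothetical sketch (not the operator's actual manifests): a selector-less
# Service plus manually managed Endpoints. With no .spec.selector there is
# no pod for kubectl port-forward to attach to, hence the reported error.
apiVersion: v1
kind: Service
metadata:
  name: llama-3-2-3b
  namespace: default
spec:
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: v1
kind: Endpoints
metadata:
  name: llama-3-2-3b   # must match the Service name
  namespace: default
subsets:
  - addresses:
      - ip: 192.0.2.10   # placeholder; actual host address is an assumption
    ports:
      - port: 8080
```

If that's what is happening, the in-cluster endpoint printed by the CLI would still be reachable (manual Endpoints route fine from inside the cluster), and the quickstart could instead suggest testing from a throwaway pod, e.g. kubectl run curl --rm -it --image=curlimages/curl -- sh.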

Environment

  • macOS Tahoe 26.2
  • Macbook Pro (M4 Pro, Nov 2024)
  • minikube v1.38.1 / commit c93a4cb9311efc66b90d33ea03f75f2c4120e9b0
  • Tested using a fresh/clean setup a few times.
  • Latest main / 07630b8

Cluster Type:

  • minikube

GPU (if applicable):

  • Other: Metal
