Add pre-flight memory validation for Metal agent by Defilan · Pull Request #204 · defilantech/LLMKube

Defilan · 2026-03-04T05:13:52Z

Summary

- Adds pre-flight memory estimation before spawning llama-server on Apple Silicon, preventing OOM crashes and macOS force-kills
- Estimates memory as weights + KV cache (from GGUF metadata) + 512MB overhead, with a file-size heuristic fallback when metadata is unavailable
- Auto-detects memory budget fraction (67% for systems ≤36GB, 75% for larger) with --memory-fraction flag for manual override
- Rejects models that exceed budget by setting SchedulingStatus: InsufficientMemory on the InferenceService
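
To make the arithmetic in the summary concrete, here is a minimal Go sketch of the estimate and the budget check. It is not the code from this PR. The `ModelMeta` struct, all function names, and the f16 KV cache formula (K and V tensors, 2 bytes per element) are assumptions for illustration, and "36GB" is read as 36 GiB. The file-size heuristic fallback is left out because the PR text does not describe its form.

```go
package main

import "fmt"

const (
	gib           = uint64(1) << 30
	overheadBytes = uint64(512) << 20 // fixed 512MB overhead named in the summary
)

// ModelMeta is a hypothetical container for the GGUF-derived values the
// estimate needs; the real field names in LLMKube may differ.
type ModelMeta struct {
	FileSizeBytes uint64 // GGUF file size, used here as the weights estimate
	Layers        uint64
	KVHeads       uint64
	HeadDim       uint64
	ContextLen    uint64
}

// EstimateKVCacheBytes assumes an f16 cache: one K and one V tensor per
// layer, each ContextLen x KVHeads x HeadDim elements at 2 bytes per element.
func EstimateKVCacheBytes(m ModelMeta) uint64 {
	const tensors, bytesPerElem = 2, 2
	return tensors * m.Layers * m.ContextLen * m.KVHeads * m.HeadDim * bytesPerElem
}

// EstimateTotalBytes is weights + KV cache + fixed overhead.
func EstimateTotalBytes(m ModelMeta) uint64 {
	return m.FileSizeBytes + EstimateKVCacheBytes(m) + overheadBytes
}

// AutoMemoryFraction mirrors the thresholds in the summary:
// 67% for systems up to 36GB, 75% for larger ones.
func AutoMemoryFraction(totalBytes uint64) float64 {
	if totalBytes <= 36*gib {
		return 0.67
	}
	return 0.75
}

// FitsBudget reports whether the required bytes fit within
// fraction * total system memory.
func FitsBudget(required, total uint64, fraction float64) bool {
	budget := uint64(float64(total) * fraction)
	return required <= budget
}

func main() {
	// Illustrative numbers only: a 5 GiB model file with an 8192-token context.
	m := ModelMeta{FileSizeBytes: 5 * gib, Layers: 32, KVHeads: 8, HeadDim: 128, ContextLen: 8192}
	total := 16 * gib
	need := EstimateTotalBytes(m)
	frac := AutoMemoryFraction(total)
	fmt.Printf("need %.2f GiB, budget %.2f GiB, fits=%v\n",
		float64(need)/float64(gib), float64(total)*frac/float64(gib),
		FitsBudget(need, total, frac))
}
```

In this sketch a model that fails `FitsBudget` would be the case where the agent sets SchedulingStatus: InsufficientMemory instead of spawning llama-server.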

Test plan

- make test: all existing + 15 new unit tests pass
- make vet && make fmt: clean
- make lint: 0 issues
- make build: builds on darwin (real provider) and linux (stub)
- Manual: deploy a model on Metal with --memory-fraction 0.5 on a small system, verify rejection with actionable error message
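
The manual step relies on --memory-fraction taking precedence over auto-detection. The sketch below shows one way that precedence could work using Go's standard `flag` package. The function name `ResolveFraction`, the convention that 0 means "auto-detect", and the (0, 1] validation range are assumptions, not details confirmed by the PR.

```go
package main

import (
	"flag"
	"fmt"
)

// ResolveFraction parses a hypothetical agent command line and returns the
// memory fraction to use. A value of 0 (the assumed default) means
// "auto-detect from total system memory".
func ResolveFraction(args []string, totalBytes uint64) (float64, error) {
	fs := flag.NewFlagSet("metal-agent", flag.ContinueOnError)
	frac := fs.Float64("memory-fraction", 0,
		"fraction of system memory models may use (0 = auto-detect)")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *frac == 0 {
		// Thresholds from the PR summary: 67% up to 36GB, 75% above.
		if totalBytes <= 36<<30 {
			return 0.67, nil
		}
		return 0.75, nil
	}
	if *frac < 0 || *frac > 1 {
		return 0, fmt.Errorf("--memory-fraction must be in (0, 1], got %v", *frac)
	}
	return *frac, nil
}

func main() {
	// Same override as the manual test step, on an assumed 16 GiB machine.
	f, err := ResolveFraction([]string{"--memory-fraction", "0.5"}, 16<<30)
	fmt.Println(f, err)
}
```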

Closes #185

Estimate model memory requirements (weights + KV cache + overhead) before spawning llama-server processes on Apple Silicon. Refuse to start models that exceed the system's memory budget, preventing OOM crashes and macOS force-kills.

- MemoryProvider interface with DarwinMemoryProvider (sysctl/vm_stat) and non-darwin stub for CI
- KV cache estimation from GGUF metadata with file-size heuristic fallback
- Auto-detected memory fraction (67% for ≤36GB, 75% for larger systems)
- Sets InferenceService SchedulingStatus to InsufficientMemory on rejection
- --memory-fraction flag for manual override
- 15 unit tests covering estimation, budget checks, formatting, parsing

Closes #185

Signed-off-by: Christopher Maher <chris@mahercode.io>
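
The provider split described in this commit could look roughly like the Go sketch below. The method names on `MemoryProvider`, the `StaticMemoryProvider` stand-in for the non-darwin stub, and the choice to count free plus inactive pages as available memory are all assumptions. Only the interface name and the use of vm_stat come from the commit message. The parser relies on the usual vm_stat layout: a header containing "page size of N bytes" followed by "Label: count." lines.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// MemoryProvider abstracts system memory queries so CI on non-darwin hosts
// can use a stub. Method names are hypothetical.
type MemoryProvider interface {
	TotalMemory() (uint64, error)
	AvailableMemory() (uint64, error)
}

// StaticMemoryProvider returns fixed values, standing in for the stub.
type StaticMemoryProvider struct{ Total, Available uint64 }

func (s StaticMemoryProvider) TotalMemory() (uint64, error)     { return s.Total, nil }
func (s StaticMemoryProvider) AvailableMemory() (uint64, error) { return s.Available, nil }

// Compile-time check that the stub satisfies the interface.
var _ MemoryProvider = StaticMemoryProvider{}

var pageSizeRe = regexp.MustCompile(`page size of (\d+) bytes`)

// ParseVMStat extracts the page size and per-label page counts from
// vm_stat output. Lines without a numeric value are skipped.
func ParseVMStat(out string) (pageSize uint64, pages map[string]uint64, err error) {
	pages = make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		if m := pageSizeRe.FindStringSubmatch(line); m != nil {
			if pageSize, err = strconv.ParseUint(m[1], 10, 64); err != nil {
				return 0, nil, err
			}
			continue
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		val = strings.TrimSuffix(strings.TrimSpace(val), ".")
		n, perr := strconv.ParseUint(val, 10, 64)
		if perr != nil {
			continue
		}
		pages[strings.TrimSpace(key)] = n
	}
	if pageSize == 0 {
		return 0, nil, fmt.Errorf("vm_stat: page size header not found")
	}
	return pageSize, pages, nil
}

// AvailableBytes treats free + inactive pages as available (an assumption).
func AvailableBytes(pageSize uint64, pages map[string]uint64) uint64 {
	return (pages["Pages free"] + pages["Pages inactive"]) * pageSize
}

func main() {
	sample := "Mach Virtual Memory Statistics: (page size of 16384 bytes)\n" +
		"Pages free:                               1000.\n" +
		"Pages inactive:                            500.\n"
	ps, pages, err := ParseVMStat(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("available bytes:", AvailableBytes(ps, pages))
}
```

Keeping the parser as a pure function over a string means it can be unit tested on Linux CI without calling vm_stat.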

Document --memory-fraction flag, InsufficientMemory troubleshooting, and memory budget behavior in the Metal Agent guide and quickstart.

Closes #185

Signed-off-by: Christopher Maher <chris@mahercode.io>

Defilan added 2 commits March 3, 2026 21:13

Update Metal agent docs for pre-flight memory validation

c88ff27


Defilan merged commit ba252ef into main Mar 4, 2026
15 checks passed

Defilan deleted the feat/metal-agent-memory-validation branch March 4, 2026 05:33

This was referenced Mar 4, 2026

chore: release 0.5.0 #191

Merged

fix: correct CHANGELOG entry from 0.4.21 to 0.5.0 #212

Merged

Defilan merged 2 commits into main from feat/metal-agent-memory-validation