Add pre-flight memory validation for Metal agent #204

Merged
Defilan merged 2 commits into main from feat/metal-agent-memory-validation on Mar 4, 2026

Defilan (Member) commented Mar 4, 2026

Summary

  • Adds pre-flight memory estimation before spawning llama-server on Apple Silicon, preventing OOM crashes and macOS force-kills
  • Estimates memory as weights + KV cache (from GGUF metadata) + 512MB overhead, with a file-size heuristic fallback when metadata is unavailable
  • Auto-detects memory budget fraction (67% for systems ≤36GB, 75% for larger) with --memory-fraction flag for manual override
  • Rejects models that exceed budget by setting SchedulingStatus: InsufficientMemory on the InferenceService
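The estimate described in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ModelMeta` struct, the function names, the use of file size as a stand-in for weight size, the f16 (2-byte) KV cache element size, and the 20% file-size fallback are all assumptions. Only the overall shape (weights + KV cache + 512MB overhead, with a fallback when metadata is missing) comes from the summary. The KV cache term uses the usual transformer formula: one K and one V tensor per layer, per context position, per KV head.

```go
package main

import "fmt"

// overheadBytes is the fixed 512MB overhead named in the PR summary
// (interpreted here as 512 MiB, which is an assumption).
const overheadBytes = 512 << 20

// ModelMeta is a hypothetical struct holding values that GGUF metadata
// exposes under keys such as <arch>.block_count, <arch>.attention.head_count,
// <arch>.attention.head_count_kv, and <arch>.embedding_length.
type ModelMeta struct {
	Layers       uint64 // transformer blocks
	Heads        uint64 // attention heads
	KVHeads      uint64 // key/value heads (fewer than Heads under GQA)
	EmbeddingLen uint64 // hidden size
	ContextLen   uint64 // context window the server will allocate
}

// kvCacheBytes applies the standard formula:
// 2 (K and V) * layers * context * kvHeads * headDim * bytesPerElem.
func kvCacheBytes(m ModelMeta, bytesPerElem uint64) uint64 {
	headDim := m.EmbeddingLen / m.Heads
	return 2 * m.Layers * m.ContextLen * m.KVHeads * headDim * bytesPerElem
}

// estimateBytes returns weights + KV cache + overhead. When metadata is
// unavailable it falls back to a file-size heuristic; the 20% margin used
// here is an illustrative guess, not the value used by the agent.
func estimateBytes(fileSize uint64, m *ModelMeta) uint64 {
	if m == nil || m.Heads == 0 {
		return fileSize + fileSize/5 + overheadBytes
	}
	return fileSize + kvCacheBytes(*m, 2) + overheadBytes
}

func main() {
	// Example values for an 8B-class model with a 4 GiB GGUF file.
	meta := ModelMeta{Layers: 32, Heads: 32, KVHeads: 8, EmbeddingLen: 4096, ContextLen: 8192}
	total := estimateBytes(4<<30, &meta)
	fmt.Printf("%.2f GiB\n", float64(total)/(1<<30)) // prints: 5.50 GiB
}
```

With these example values the KV cache alone comes to 1 GiB, which shows why a check based only on file size can underestimate memory needs at long context lengths.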

Test plan

  • make test — all existing + 15 new unit tests pass
  • make vet && make fmt — clean
  • make lint — 0 issues
  • make build — builds on darwin (real provider) and linux (stub)
  • Manual: deploy a model on Metal with --memory-fraction 0.5 on a small system, verify rejection with actionable error message
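For the manual check, the budget arithmetic might look like the sketch below. All function names are hypothetical, treating "36GB" as 36 GiB and treating an override of 0 as "auto-detect" are assumptions, and the error text is illustrative rather than the agent's actual message. Only the 67%/75% thresholds and the `InsufficientMemory` status come from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

const gib = uint64(1) << 30

// autoFraction follows the rule from the PR summary:
// 67% for systems up to 36GB, 75% for larger ones.
func autoFraction(totalBytes uint64) float64 {
	if totalBytes <= 36*gib {
		return 0.67
	}
	return 0.75
}

// resolveFraction returns the manual override when one is given.
// Using 0 to mean "no override" is an assumed convention.
func resolveFraction(totalBytes uint64, override float64) (float64, error) {
	if override == 0 {
		return autoFraction(totalBytes), nil
	}
	if override < 0 || override > 1 {
		return 0, errors.New("memory fraction must be in (0, 1]")
	}
	return override, nil
}

func toGiB(b uint64) float64 { return float64(b) / float64(gib) }

// checkBudget rejects a model whose estimated requirement exceeds
// fraction * total system memory.
func checkBudget(totalBytes, requiredBytes uint64, fraction float64) error {
	budget := uint64(float64(totalBytes) * fraction)
	if requiredBytes > budget {
		return fmt.Errorf("InsufficientMemory: model needs %.1f GiB, budget is %.1f GiB (%.0f%% of %.1f GiB)",
			toGiB(requiredBytes), toGiB(budget), fraction*100, toGiB(totalBytes))
	}
	return nil
}

func main() {
	total := 8 * gib              // a small system
	required := uint64(5905580032) // an example 5.5 GiB estimate
	fraction, err := resolveFraction(total, 0.5)
	if err != nil {
		fmt.Println(err)
		return
	}
	if err := checkBudget(total, required, fraction); err != nil {
		fmt.Println(err) // the model is rejected: 5.5 GiB exceeds the 4 GiB budget
	}
}
```

On an 8 GiB machine a fraction of 0.5 gives a 4 GiB budget, so a model estimated at 5.5 GiB is rejected, while the same model fits on a 16 GiB machine with the same fraction.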

Closes #185

Defilan added 2 commits March 3, 2026 21:13
Estimate model memory requirements (weights + KV cache + overhead)
before spawning llama-server processes on Apple Silicon. Refuses to
start models that exceed the system's memory budget, preventing OOM
crashes and macOS force-kills.

- MemoryProvider interface with DarwinMemoryProvider (sysctl/vm_stat)
  and non-darwin stub for CI
- KV cache estimation from GGUF metadata with file-size heuristic fallback
- Auto-detected memory fraction (67% for ≤36GB, 75% for larger systems)
- Sets InferenceService SchedulingStatus to InsufficientMemory on rejection
- --memory-fraction flag for manual override
- 15 unit tests covering estimation, budget checks, formatting, parsing

Closes #185

Signed-off-by: Christopher Maher <chris@mahercode.io>
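
The `MemoryProvider` abstraction named in this commit could look something like the sketch below. The method set, the `StubMemoryProvider` type, and `parseMemsize` are assumptions made for illustration; the commit message only says that a Darwin implementation uses sysctl/vm_stat and that a stub exists for non-darwin CI. The parsing helper relies on `sysctl -n hw.memsize`, which prints total physical memory in bytes on macOS.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// MemoryProvider abstracts how system memory is queried so that the budget
// logic can be tested without macOS. The single method is hypothetical.
type MemoryProvider interface {
	TotalMemory() (uint64, error)
}

// StubMemoryProvider returns a fixed value, in the spirit of the
// non-darwin stub mentioned in the commit message.
type StubMemoryProvider struct {
	Total uint64
}

func (s StubMemoryProvider) TotalMemory() (uint64, error) {
	return s.Total, nil
}

// parseMemsize parses the output of `sysctl -n hw.memsize`.
// A Darwin provider could run that command and pass its output here.
func parseMemsize(out string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(out), 10, 64)
}

func main() {
	var p MemoryProvider = StubMemoryProvider{Total: 16 << 30}
	total, err := p.TotalMemory()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(total) // prints: 17179869184
}
```

Keeping the provider behind an interface lets the unit tests inject fixed memory sizes, which is presumably how the budget checks can run on Linux CI.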

Document --memory-fraction flag, InsufficientMemory troubleshooting,
and memory budget behavior in the Metal Agent guide and quickstart.

Closes #185

Signed-off-by: Christopher Maher <chris@mahercode.io>
Defilan merged commit ba252ef into main on Mar 4, 2026
15 checks passed
Defilan deleted the feat/metal-agent-memory-validation branch on March 4, 2026 05:33

Linked issue (closed by this pull request): Metal agent: pre-flight memory validation before spawning llama-server
