fix: count tracked model memory in pre-load admission (#1623) by popfido · Pull Request #1766 · jundot/omlx

popfido · 2026-06-09T09:39:35Z

Summary

Fixes #1623 (v0.4.0 regression: a second large model loads side-by-side instead of switching, over-committing memory and failing the next prefill).

The pre-load admission check in EnginePool.get_engine() gated on live memory only:

current = max(mx.get_active_memory(), get_phys_footprint())
projected = current + entry.estimated_size
if projected <= ceiling: break   # admit, no eviction

It never consulted the tracked accumulator _current_model_memory. After a model settles or idles, both mx.get_active_memory() and the process footprint can read well below the model's true resident size (Metal releases pooled memory; the SSD hot cache isn't counted), while _current_model_memory still reflects the committed total. So loading a second large model projected under the ceiling and was admitted without evicting the first — pushing memory to 57+ GB and failing the next request with a prefill memory-guard error.

This is a v0.4.0 regression: 0.3.7's _ensure_memory_available() (removed along with max_model_memory in the memory_guard_tier refactor) evicted using the accumulator.

Fix

Add self._current_model_memory to the max():

current = max(
    mx.get_active_memory(),
    get_phys_footprint(),
    self._current_model_memory,
)

The accumulator is a floor on committed model memory that live reads can dip below. This also makes pre-load admission consistent with the request-time prefill-eviction path (_evict_idle_lru_for_prefill), which already takes max(active, phys, _current_model_memory) — the pre-load gate was the lone path missing it.

Test plan

New regression test test_eviction_when_live_memory_undercounts: with active = phys = 0 but the accumulator showing the pair over-commits, the second model must still evict the first. Fails on main (model-a not evicted), passes with the fix.
pytest tests/test_engine_pool.py — 83 passed (incl. the existing test_eviction_before_load, unaffected: its fixture already proxies phys to the accumulator, so max(0, acc, acc) == acc).
pytest tests/test_process_memory_enforcer.py tests/test_scheduler_admission.py tests/test_scheduler_prefill_memory_guard.py tests/test_memory_monitor.py tests/test_server_prefill_memory_handler.py — 177 passed.
No new ruff findings.

The pre-load admission check in get_engine() gated on current = max(mx.get_active_memory(), get_phys_footprint()) and never consulted the tracked accumulator _current_model_memory. After a model settles or idles, both live-memory reads can fall well below the model's true resident size, so loading a second large model projected under the ceiling and admitted without evicting the first — over-committing to 57+ GB and failing the next prefill with a memory-guard error (v0.4.0 regression; 0.3.7's removed _ensure_memory_available() used the accumulator). Add self._current_model_memory to the max(), making pre-load admission consistent with the request-time prefill-eviction path, which already counts the accumulator. Adds a regression test where live memory under-reports (active=phys=0) but the accumulator shows the pair over-commits: the second model must still evict the first.

jundot · 2026-06-10T02:18:45Z

Thanks for the fix. I checked the pre-load admission path against the prefill eviction path, and counting _current_model_memory here makes the decision consistent when live memory under-reports. The regression test covers the missing case cleanly, and the related memory/scheduler tests pass on my side.

This looks good to me, and I'm going to merge it.

popfido force-pushed the fix/preload-admission-accumulator-1623 branch from 240722e to 2d5b524 Compare June 9, 2026 10:05

jundot merged commit be18e07 into jundot:main Jun 10, 2026
4 checks passed

popfido deleted the fix/preload-admission-accumulator-1623 branch June 10, 2026 02:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: count tracked model memory in pre-load admission (#1623)#1766

fix: count tracked model memory in pre-load admission (#1623)#1766
jundot merged 1 commit into
jundot:mainfrom
popfido:fix/preload-admission-accumulator-1623

popfido commented Jun 9, 2026

Uh oh!

jundot commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

popfido commented Jun 9, 2026

Summary

Fix

Test plan

Uh oh!

jundot commented Jun 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants