fix(memory): walk VLM nested config for accurate KV+SDPA estimation by cfbraun · Pull Request #1448 · jundot/omlx

cfbraun · 2026-05-27T07:32:22Z

Summary

_set_model_info_for_monitor reads model.config.num_hidden_layers directly. For multimodal models (Qwen3.6-VL, Gemma-4, some Llava / HF auto-wrappers) the top-level config holds the vision tower dimensions; the actual language model lives under text_config / language_config / llm_config. Reading the wrong field under-estimates the LM's KV+SDPA peak by a constant factor (a 40-layer LM wrapped in a 33-layer vision tower undercounts by ~20%) and lets _preflight_memory_check approve prefills that go on to crash Metal mid-prefill.

Fix: probe text_config / language_config / llm_config in priority order before extracting dimensions. If any sub-config exposes num_hidden_layers (or the legacy n_layer alias for GPT-style configs), prefer it — even when the top-level config also has a value, since the top-level value is the unreliable one on these wrappers. Falls back to the top-level config when no sub-config qualifies.

Test plan

pytest tests/test_memory_monitor_vlm_config.py::TestSetModelInfoForMonitorVLMWalk — 6 passed
- text_config / language_config / llm_config each take priority over top-level vision-tower dims
- plain LM (no sub-configs) still works
- sub-config without a layer count is correctly skipped
- GPT-style n_layer in the sub-config is recognized

``_set_model_info_for_monitor`` reads ``model.config.num_hidden_layers`` directly. For multimodal models (Qwen3.6-VL, Gemma-4, some Llava / HF auto-wrappers) the top-level config holds the *vision tower* dimensions; the actual language model lives under ``text_config`` / ``language_config`` / ``llm_config``. Reading the wrong field under-estimates the LM's KV+SDPA peak by a constant factor (a 40-layer LM wrapped in a 33-layer vision tower undercounts by ~20%) and lets ``_preflight_memory_check`` approve prefills that go on to crash Metal mid-prefill. Fix: probe ``text_config`` / ``language_config`` / ``llm_config`` in priority order before extracting dimensions. If any sub-config exposes ``num_hidden_layers`` (or the legacy ``n_layer`` alias for GPT-style configs), prefer it — even when the top-level config also has a value, since the top-level value is the unreliable one on these wrappers. Falls back to the top-level config when no sub-config qualifies. Tests (tests/test_memory_monitor_vlm_config.py::TestSetModelInfoForMonitorVLMWalk): - text_config / language_config / llm_config each take priority over top-level vision-tower dims - plain LM (no sub-configs) still works - sub-config without a layer count is correctly skipped - ``n_layer`` (GPT-style) in the sub-config is recognized 6 new tests pass; 93 existing scheduler tests pass on this branch.

jundot · 2026-05-28T08:47:52Z

Thanks for tracing this — VLM nested config is a real gap. I confirmed mlx-vlm Qwen3-VL/Gemma-4 configs (and the gemma-4-31B-it config.json on disk) and they all carry the LM layer count under text_config only. omlx already walks text_config in models/vlm.py and models/embedding.py; scheduler was just missing it.

Merging.

The first CI run on this PR (https://github.com/jundot/omlx/actions/runs/26738412492) surfaced three failures introduced by the rebase against upstream's cb0904d (predictive prefill throttle) and our own change to ``Scheduler.__init__`` (constructs a MemoryMonitor in estimator-only mode so preflight checks work without prior set_model_info): 1. ``_set_model_info_for_monitor`` accepted MagicMock-style proxies as ``head_dim`` / ``num_kv_heads`` / ``num_layers`` because the ``if num_layers and num_kv_heads and head_dim:`` truthiness gate doesn't reject auto-attribute mocks. The estimator path (``estimate_chunk_transient_bytes`` → ``if hd > 128``) then blew up the upstream throttle tests with ``TypeError: '>' not supported between instances of 'MagicMock' and 'int'``. Tighten to ``isinstance(..., int) and ... > 0`` so the estimator stays disabled until set_model_info is called with genuine model dims. 2. ``test_schedule_waiting_preflight_rejection_releases`` patched ``_preflight_memory_check`` to return a plain ``str`` (matching the pre-PR contract). Our PR types the return as ``_PreflightRejection`` and the rejection-path code in ``_schedule_waiting`` reads ``.message`` on it. Update the mock to return an actual ``_PreflightRejection`` so the test exercises the real shape. 3. ``test_picks_text_config_over_top_level_vision_dims`` (added by our merged jundot#1448) asserted ``sched.memory_monitor is None`` at init. Our PR now constructs the monitor in estimator-only mode at init time; replace the assertion with the actual setup it needs (overwrite ``memory_monitor`` with a MagicMock to inspect ``set_model_info`` calls).

cfbraun force-pushed the pr/vlm-nested-config-estimation branch 3 times, most recently from 675402a to 2035d24 Compare May 28, 2026 05:48

cfbraun force-pushed the pr/vlm-nested-config-estimation branch from 2035d24 to 0920a2c Compare May 28, 2026 08:01

jundot merged commit b5bfdd9 into jundot:main May 28, 2026

This was referenced Jun 9, 2026

[BUG] preflight memory check over-estimates KV peak ~4x when turboquant_kv_enabled, causing false 413 rejections after #1448 #1763

Closed

fix(memory): treat MLX fused SDPA as O(n) for all head_dim #1764

Merged

hojin12312 mentioned this pull request Jun 10, 2026

Regression risk from #1764: MLX fused full (prefill) SDPA only supports head_dim 64/80/128 — estimator now under-estimates head_dim=256 prefill transients, disabling the 413 guard #1797

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(memory): walk VLM nested config for accurate KV+SDPA estimation#1448

fix(memory): walk VLM nested config for accurate KV+SDPA estimation#1448
jundot merged 1 commit into
jundot:mainfrom
cfbraun:pr/vlm-nested-config-estimation

cfbraun commented May 27, 2026

Uh oh!

jundot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cfbraun commented May 27, 2026

Summary

Test plan

Uh oh!

jundot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants