feat(server): add max_context_window_policy for server-wide context cap#1628
Conversation
|
Thanks for explaining the motivation. I agree the current dashboard wording can read like a policy cap, but I do not think changing the existing max_context_window semantics is the right fix. To me, this setting is better treated as a fallback/default for models whose native context length cannot be discovered. If a model declares 262k in config.json, that native value is the better source of truth for the model capability exposed by /v1/models and used by validation. A single global hard cap also feels too broad as an operational control. The memory cost of the same context length varies a lot by model size, architecture, quantization, KV cache behavior, and batching, so operators would likely have to tune it for the largest active model. That makes it less useful as a server-wide policy than either per-model settings or a separate explicit cap. It would also change behavior for existing settings.json files that persisted the old 32768 default, causing long-context models to be clamped after upgrade. I think the safer direction is to keep max_context_window as the fallback behavior, update the dashboard label/help text to make that clear, and add a separate explicit global policy cap later if that feature is needed. |
|
@jundot thanks for the considered review. The migration risk is the load-bearing concern here, so pushing back specifically on that — the rest of the analysis follows from where it lands. The persisted If the migration mechanic itself is the blocker, I can push a version-gated Briefly on the other points:
Happy to push the version-gated migration if that closes the upgrade-safety concern. If you'd rather go with the label-only fix + separate field later, I'll close this and rework — let me know which path. |
|
@cfbraun Thanks, that framing helps. I agree the operator need for a server-wide context ceiling is valid, but I do not want to change the semantics of the existing max_context_window field. I want to keep the current field as the fallback value used when a model's native context length cannot be discovered, and make that meaning clearer in the dashboard text. For the hard policy behavior, I think the safer design is to add a separate nullable global context limit setting that is unset by default and only applies min(native_context, global_limit) when explicitly configured. That preserves existing settings.json behavior, avoids treating stale 32768 defaults as new hard caps, and avoids using 1_000_000 as a pseudo "no policy" value for models with native context above that. If you are open to that direction, this PR can be reworked around the separate setting. |
f292112 to
725a9ff
Compare
725a9ff to
511095c
Compare
|
@jundot Reworked exactly around your proposal. The PR now adds The three properties you flagged are preserved:
Title + body updated to match the new design. Force-pushed Ready for another look. |
511095c to
9d3dd05
Compare
Replaces the original PR design (rework reusing max_context_window semantics) with @jundot's proposed shape after review: > I want to keep the current field as the fallback value used when a > model's native context length cannot be discovered, and make that > meaning clearer in the dashboard text. For the hard policy behavior, > I think the safer design is to add a separate nullable global > context limit setting that is unset by default and only applies > min(native_context, global_limit) when explicitly configured. What this PR adds: - ``SamplingSettings.max_context_window_policy: int | None = None`` (nullable, unset by default) in both ``omlx/server.py`` (``SamplingDefaults``) and ``omlx/settings.py`` (``SamplingSettings``). - ``get_max_context_window`` returns ``min(native, policy)`` only when policy is set and the model's native context length was discovered via its ``config.json``. Per-model overrides and the fallback default are not capped — those are explicit operator choices that the policy must not override. Why a separate field: - Preserves existing ``settings.json`` behavior. Files carrying the historical ``max_context_window=32768`` default keep working as fallback only; no model gets silently clamped after upgrade. - Avoids using a sentinel like ``1_000_000`` as a pseudo "no policy" value (which would have failed for models whose native context is above that). - Discoverability: the new field has a name that reads as what it does (``..._policy``), and the existing field's role is unchanged so its dashboard label can be clarified without breaking semantics. Tests under ``TestGetMaxContextWindow``: - Policy unset → native returned verbatim (zero behavior change). - Policy set + native > policy → ``min`` clamps to policy. - Policy set + native < policy → native wins (cap, not floor). - Per-model override escapes policy clamp. - Policy does not apply to the fallback-only path (no native, no per-model override) — fallback wins. Plus ``test_policy_field_round_trip`` in ``TestSamplingSettings`` covers serialization/deserialization keeping ``None`` semantics.
9d3dd05 to
5beb395
Compare
|
Thanks for reworking this around the separate nullable setting. The core server-side resolution now matches the design I was looking for: the existing max_context_window remains the fallback, and max_context_window_policy is unset by default and only caps discovered native context lengths. I'm going to handle the remaining admin UI/settings polish as a follow-up: clarify the existing global setting so it reads as a fallback, and add the new optional "Global Context Window Cap" at the top of Generation Defaults with an empty/default None value. This looks good to me, and I'm going to merge it. |
Resolves conflicts in favor of the merged upstream shape for: - jundot#1628 (max_context_window_policy): upstream merged the reworked separate-nullable-field design, not the original "reuse max_context_window" design that still sat on this branch. Drop the 1_000_000 default and adopt the 32768 fallback + optional policy cap jundot ultimately wanted. - jundot#1451 (inline LRU unlinks): upstream's slice+reinsert variant of the burst cap landed (with observability counters from 021162b). Drop the local push-cap-into-evict_until_size variant. - Test suites realigned to the merged shapes (TestInlineLRUUnlinks, policy-cap test names, 32768 fallback assertions).
Summary
Reworked per the maintainer review. The original PR changed
max_context_windowsemantics from "fallback" to "policy cap"; this rev instead adds a separate nullable field alongside the existing fallback, preserving all current behavior unless explicitly opted in.Design
SamplingSettings.max_context_window_policy: int | None = None(unset by default). When set,get_max_context_windowreturnsmin(native, policy)for models whose native context length was discovered. Per-model overrides and the fallback default are not capped — those are explicit operator choices that the policy must not silently override.Resolution
Why a separate field
settings.jsonbehavior — files carrying the historicalmax_context_window=32768default keep working as fallback only; no model gets silently clamped after upgrade.1_000_000as a pseudo "no policy" value, which would have failed for models whose native context is above that...._policyreads as what it does, and the existing field's role is unchanged so the dashboard label can be clarified without breaking semantics.Test plan
pytest tests/test_context_window.py tests/test_context_scaling.py tests/test_settings.py tests/test_server.py— 243 passedTestGetMaxContextWindowcases: policy unset, policy clamps, native below policy, per-model override escapes, fallback-only path unaffectedTestSamplingSettings::test_policy_field_round_tripcovers None-semantics serializationDashboard label/help-text clarification for the existing field is intentionally out of scope for this PR.