feat(alloy): context_window safety + min_context_window assertion#14
feat(alloy): context_window safety + min_context_window assertion#14
Conversation
There was a problem hiding this comment.
Pull request overview
Adds context-window metadata and startup validation to alloy configs so misconfigured model mixes (e.g., 32K + 200K) fail fast instead of silently truncating at runtime. This extends the existing alloy provider/config infrastructure with optional fields and exposes the computed minimum for downstream request-fit enforcement.
Changes:
- Add optional
context_windowper alloy constituent and optionalmin_context_windowat the alloy level in config schema. - Compute an effective min context window during
AlloyProvider::from_config()and validate declared constituents against an explicit user floor. - Add
AlloyProvider::min_context_window()plus unit tests covering the new behaviors.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
crates/zeroclawed/src/providers/alloy.rs |
Computes/validates effective context window minimum at provider build time; exposes accessor; adds tests. |
crates/zeroclawed/src/config.rs |
Extends TOML schema types with min_context_window and context_window fields and documentation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Context-window safety: if the user specified min_context_window, every | ||
| // constituent with a declared size must meet it. Otherwise auto-compute | ||
| // the effective ceiling as the min of declared sizes. | ||
| let declared_min = constituents | ||
| .iter() | ||
| .filter_map(|c| c.context_window) | ||
| .min(); | ||
| let effective_min = match (cfg.min_context_window, declared_min) { |
| /// Effective minimum context window (tokens) this alloy can safely serve. | ||
| /// | ||
| /// `None` means no constituent declared a size — caller should treat as | ||
| /// unknown and trust the upstream model to error on over-long inputs. |
| /// `context_window` are treated as unknown and do not participate in | ||
| /// the minimum calculation. | ||
| /// | ||
| /// See `docs/rfcs/model-gateway-primitives.md`. |
Per RFC #13 review decision (no back-compat in prototype phase): every alloy constituent must declare its context_window. Serde enforces the field at config load, so misconfigured alloys fail with a clear parse error instead of quietly accepting "unknown" sizes that could mask routing bugs later. ## Config changes AlloyConstituentConfig.context_window is now a required u32 (was an optional field in the previous draft). Every existing config must be updated to declare sizes; live config on .210 already patched in the matching commit on infra. AlloyConfig.min_context_window remains optional; when unset, it's auto-computed as min(constituent.context_window). When set, validation rejects any constituent whose declared size falls below it with a clear error naming both the offender and the numbers. ## Runtime exposure AlloyProvider::min_context_window() returns u32 (no longer Option), since size declaration is always present. ## Tests 7 tests: round_robin/weighted/stats unchanged; new tests cover auto-compute from mixed-size constituents, shared-size parity, explicit min priority, and explanatory error on constituent-below-floor. The "no declared size" test from the previous draft is deleted — that state is now unreachable. ## Live config migration Deployed alongside this PR: .210's kimi-for-coding alloy gets context_window declared on both constituents (262144 for Kimi, 64000 for DeepSeek V3). No other nodes use [[alloys]] today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
31b2c6a to
53a1ba5
Compare
- cargo fmt: collapse multi-line array in alloy.rs test (CI fmt gate) - Reject context_window=0 on constituents (clear misconfig error) - Reject min_context_window=0 at alloy build (same) - Drop stale doc reference to docs/rfcs/model-gateway-primitives.md (lives in PR #13, not yet on main) - Simplify min_context_window doc: context_window is required on constituents now, so "treated as unknown" qualifier is dead. Addresses Copilot review feedback on #14.
There was a problem hiding this comment.
Pull request overview
This PR hardens the “alloy” model-blending primitive by requiring each constituent to declare a context_window and by computing/enforcing an effective minimum context window for the alloy, eliminating silent truncation risks from mixing mismatched models.
Changes:
- Add required
context_windowto alloy constituents at the config/schema level and validate it at provider build time. - Introduce
min_context_window(optional) to enforce a floor across constituents or auto-compute from the smallest constituent. - Expose
AlloyProvider::min_context_window()and add unit tests covering min computation and validation errors.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| crates/zeroclawed/src/providers/alloy.rs | Stores validated per-constituent context_window, computes/enforces effective min context window, and adds min_context_window() plus tests. |
| crates/zeroclawed/src/config.rs | Extends config schema with min_context_window and requires context_window on each alloy constituent. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Minimum context window (tokens) required from every constituent. | ||
| /// | ||
| /// If set: alloy build fails when any constituent with a declared | ||
| /// `context_window` is smaller than this. Catches "I didn't mean to mix a | ||
| /// 32K local model into a 200K-context alloy" at config-load time. | ||
| /// | ||
| /// If unset: auto-computed as `min(constituent.context_window)` over | ||
| /// all constituents. | ||
| #[serde(default)] | ||
| pub min_context_window: Option<u32>, |
| /// Declared context window (tokens) for this constituent. | ||
| /// | ||
| /// **Required.** No sensible default — declared per-constituent so the | ||
| /// alloy can compute and enforce a safe effective ceiling. Without this, | ||
| /// oversized requests could be silently routed to a constituent that | ||
| /// can't hold them. See `docs/rfcs/model-gateway-primitives.md`. | ||
| pub context_window: u32, |
- Cascade semantics: reconcile "errors only" with "pre-skip unfit steps". Now distinguishes ELIGIBILITY (size-based, checked before attempt) from RETRY TRIGGER (errors: timeout, 5xx, 429). - Fix fabricated mechanism reference: there is no `fallbacks` field on AlloyConfig. Describe the real mechanism (`ordered_models` returned from `select_plan` and iterated in `route_with_fallback`). - CharRatio safety margin: explicitly flag that 10% is insufficient for CJK-heavy prompts; suggest remediations (tune chars_per_token, raise safety_margin, or switch to Tiktoken). - Correct "streaming output doesn't count" — it DOES count against the combined context. Describe the input + max_tokens check approach and the default output budget callers must supply. - Fix K-suffix math: 262K = 262*1024 = 268288, not 262144. Add clarifying example using "256K" → 262144. - Remove stale "context_window=0 = unknown" sentinel note; PR #14 rejects 0 explicitly at alloy validation. - Call out [tokenizer], [model_defaults], [[models]] as PROPOSED schema additions, not current.
Guards against silently reintroducing a serde default (Option<u32> or #[serde(default)]) on AlloyConstituentConfig::context_window. Addresses Copilot review feedback on #14.
There was a problem hiding this comment.
Pull request overview
This PR hardens the alloy configuration and provider initialization by requiring each alloy constituent to declare an explicit context_window, and by introducing an optional min_context_window that is validated at startup to prevent unsafe blends that could otherwise lead to silent truncation.
Changes:
- Add required
context_window: u32toAlloyConstituentConfig(serde-enforced at TOML parse time) and propagate it into the runtimeAlloyConstituent. - Add optional
min_context_window: Option<u32>toAlloyConfigand compute/validate an effective minimum inAlloyProvider::from_config(). - Add tests covering: missing
context_windowparse failure, zero-value rejection, auto-computed effective minimum, and explicit minimum enforcement.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| crates/zeroclawed/src/providers/alloy.rs | Adds context window fields, computes/validates min_context_window, exposes min_context_window() accessor, and expands provider tests. |
| crates/zeroclawed/src/config.rs | Extends alloy schema with required context_window and optional min_context_window, plus a deserialization test to ensure context_window remains required. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
* docs(rfc): model-gateway primitives — alloy + cascade + dispatcher Proposes three explicitly-scoped primitives for model-gateway routing, each with distinct semantics, and a shared TokenEstimator trait so they can reason about context-window fit consistently. Problem motivating this: alloys today assume constituents are interchangeable, which breaks when mixing models with wildly different context windows (e.g., local Qwen at 32K and Kimi K2.6 at 262K). Using an alloy for size-dependent routing leads to silent truncation. Proposal: - Alloy (today, + safety) blends equivalent models via sampling. New: min_context_window assertion + runtime fit-check rejection. - Cascade (promoted to named primitive) tries members in order on error. New: skip-on-size when a cascade step can't fit the request. - Dispatcher (new) picks by request shape, MVP rule type is max_input_tokens; extensible to other matchers. Shared TokenEstimator trait: - Default: CharRatioEstimator (chars-per-token configurable, 3.5 default) - Safety margin (default 10%) to bias toward over-estimation and avoid silent truncation footgun. - Pluggable for real tokenizers (tiktoken-rs, sentencepiece) as future opt-in crates behind feature flags. - Configurable globally, per-primitive, and per-model with clear precedence. Includes naming discussion (going with "dispatcher"), migration plan (no breaking changes), edge cases (session-context growth, tool-use inflation, streaming output), and open questions for reviewers. Implementation split into focused follow-up PRs. RFC is the long-form design; PRs execute. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rfc): incorporate first-round review decisions Captures reviewer resolutions: - dispatcher confirmed as the size-routing primitive name - cascade promoted to a named primitive - safety margin split into two distinct knobs: estimator safety_margin (counting-accuracy pad) and per-model capacity_fraction (avoid the quality-degradation zone near a model's ceiling). Composition formula and rationale added. - TiktokenEstimator included in v1 behind a feature flag; SentencePiece deferred to avoid C++ deps + per-model vocab plumbing for now - dispatcher reevaluate default flipped to per_turn (task-completion flows benefit from auto-promotion over consistency); sticky and sticky_escalate documented as opt-ins - dispatcher rule semantics simplified: default is "first target whose effective ceiling fits the request", computed per-target from context_window × capacity_fraction. Explicit rules remain for non-size routing. - alloy context_window required on every constituent. No back-compat for missing fields — prototype phase, owned installations, worth the one-time config edit to eliminate silent truncation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rfc): address Copilot review on model-gateway-primitives - Cascade semantics: reconcile "errors only" with "pre-skip unfit steps". Now distinguishes ELIGIBILITY (size-based, checked before attempt) from RETRY TRIGGER (errors: timeout, 5xx, 429). - Fix fabricated mechanism reference: there is no `fallbacks` field on AlloyConfig. Describe the real mechanism (`ordered_models` returned from `select_plan` and iterated in `route_with_fallback`). - CharRatio safety margin: explicitly flag that 10% is insufficient for CJK-heavy prompts; suggest remediations (tune chars_per_token, raise safety_margin, or switch to Tiktoken). - Correct "streaming output doesn't count" — it DOES count against the combined context. Describe the input + max_tokens check approach and the default output budget callers must supply. - Fix K-suffix math: 262K = 262*1024 = 268288, not 262144. Add clarifying example using "256K" → 262144. - Remove stale "context_window=0 = unknown" sentinel note; PR #14 rejects 0 explicitly at alloy validation. - Call out [tokenizer], [model_defaults], [[models]] as PROPOSED schema additions, not current. * docs(rfc): second round of Copilot review — new feedback on updated push - Include `name` field in alloy TOML example (AlloyConfig still requires it). - Reference `AlloyProvider::from_config()` (actual) instead of `AlloyProvider::new()` (non-existent). - Clarify ordered_models determinism: round_robin deterministic; weighted is random sampling without replacement. Cascade, unlike alloy, is always deterministic (declaration order). - Resolve safety_margin double-apply ambiguity: estimator returns a margin-applied count; caller never multiplies again. Fit-check becomes estimate > context_window × capacity_fraction (one multiplication, one comparison). Reworked the worked example accordingly. - Fix `safety_margin 0.15–0.20` typo — the value is a multiplier, so "bump it" means 1.15–1.20, not 0.15 (which would shrink the estimate). --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(rfc): model-gateway primitives — alloy + cascade + dispatcher Proposes three explicitly-scoped primitives for model-gateway routing, each with distinct semantics, and a shared TokenEstimator trait so they can reason about context-window fit consistently. Problem motivating this: alloys today assume constituents are interchangeable, which breaks when mixing models with wildly different context windows (e.g., local Qwen at 32K and Kimi K2.6 at 262K). Using an alloy for size-dependent routing leads to silent truncation. Proposal: - Alloy (today, + safety) blends equivalent models via sampling. New: min_context_window assertion + runtime fit-check rejection. - Cascade (promoted to named primitive) tries members in order on error. New: skip-on-size when a cascade step can't fit the request. - Dispatcher (new) picks by request shape, MVP rule type is max_input_tokens; extensible to other matchers. Shared TokenEstimator trait: - Default: CharRatioEstimator (chars-per-token configurable, 3.5 default) - Safety margin (default 10%) to bias toward over-estimation and avoid silent truncation footgun. - Pluggable for real tokenizers (tiktoken-rs, sentencepiece) as future opt-in crates behind feature flags. - Configurable globally, per-primitive, and per-model with clear precedence. Includes naming discussion (going with "dispatcher"), migration plan (no breaking changes), edge cases (session-context growth, tool-use inflation, streaming output), and open questions for reviewers. Implementation split into focused follow-up PRs. RFC is the long-form design; PRs execute. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rfc): incorporate first-round review decisions Captures reviewer resolutions: - dispatcher confirmed as the size-routing primitive name - cascade promoted to a named primitive - safety margin split into two distinct knobs: estimator safety_margin (counting-accuracy pad) and per-model capacity_fraction (avoid the quality-degradation zone near a model's ceiling). Composition formula and rationale added. - TiktokenEstimator included in v1 behind a feature flag; SentencePiece deferred to avoid C++ deps + per-model vocab plumbing for now - dispatcher reevaluate default flipped to per_turn (task-completion flows benefit from auto-promotion over consistency); sticky and sticky_escalate documented as opt-ins - dispatcher rule semantics simplified: default is "first target whose effective ceiling fits the request", computed per-target from context_window × capacity_fraction. Explicit rules remain for non-size routing. - alloy context_window required on every constituent. No back-compat for missing fields — prototype phase, owned installations, worth the one-time config edit to eliminate silent truncation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rfc): address Copilot review on model-gateway-primitives - Cascade semantics: reconcile "errors only" with "pre-skip unfit steps". Now distinguishes ELIGIBILITY (size-based, checked before attempt) from RETRY TRIGGER (errors: timeout, 5xx, 429). - Fix fabricated mechanism reference: there is no `fallbacks` field on AlloyConfig. Describe the real mechanism (`ordered_models` returned from `select_plan` and iterated in `route_with_fallback`). - CharRatio safety margin: explicitly flag that 10% is insufficient for CJK-heavy prompts; suggest remediations (tune chars_per_token, raise safety_margin, or switch to Tiktoken). - Correct "streaming output doesn't count" — it DOES count against the combined context. Describe the input + max_tokens check approach and the default output budget callers must supply. - Fix K-suffix math: 262K = 262*1024 = 268288, not 262144. Add clarifying example using "256K" → 262144. - Remove stale "context_window=0 = unknown" sentinel note; PR #14 rejects 0 explicitly at alloy validation. - Call out [tokenizer], [model_defaults], [[models]] as PROPOSED schema additions, not current. * docs(rfc): second round of Copilot review — new feedback on updated push - Include `name` field in alloy TOML example (AlloyConfig still requires it). - Reference `AlloyProvider::from_config()` (actual) instead of `AlloyProvider::new()` (non-existent). - Clarify ordered_models determinism: round_robin deterministic; weighted is random sampling without replacement. Cascade, unlike alloy, is always deterministic (declaration order). - Resolve safety_margin double-apply ambiguity: estimator returns a margin-applied count; caller never multiplies again. Fit-check becomes estimate > context_window × capacity_fraction (one multiplication, one comparison). Reworked the worked example accordingly. - Fix `safety_margin 0.15–0.20` typo — the value is a multiplier, so "bump it" means 1.15–1.20, not 0.15 (which would shrink the estimate). --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(alloy): require context_window + min_context_window safety Per RFC #13 review decision (no back-compat in prototype phase): every alloy constituent must declare its context_window. Serde enforces the field at config load, so misconfigured alloys fail with a clear parse error instead of quietly accepting "unknown" sizes that could mask routing bugs later. ## Config changes AlloyConstituentConfig.context_window is now a required u32 (was an optional field in the previous draft). Every existing config must be updated to declare sizes; live config on .210 already patched in the matching commit on infra. AlloyConfig.min_context_window remains optional; when unset, it's auto-computed as min(constituent.context_window). When set, validation rejects any constituent whose declared size falls below it with a clear error naming both the offender and the numbers. ## Runtime exposure AlloyProvider::min_context_window() returns u32 (no longer Option), since size declaration is always present. ## Tests 7 tests: round_robin/weighted/stats unchanged; new tests cover auto-compute from mixed-size constituents, shared-size parity, explicit min priority, and explanatory error on constituent-below-floor. The "no declared size" test from the previous draft is deleted — that state is now unreachable. ## Live config migration Deployed alongside this PR: .210's kimi-for-coding alloy gets context_window declared on both constituents (262144 for Kimi, 64000 for DeepSeek V3). No other nodes use [[alloys]] today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(alloy): rustfmt + reject zero context_window + stale doc - cargo fmt: collapse multi-line array in alloy.rs test (CI fmt gate) - Reject context_window=0 on constituents (clear misconfig error) - Reject min_context_window=0 at alloy build (same) - Drop stale doc reference to docs/rfcs/model-gateway-primitives.md (lives in PR #13, not yet on main) - Simplify min_context_window doc: context_window is required on constituents now, so "treated as unknown" qualifier is dead. Addresses Copilot review feedback on #14. * test(config): assert missing context_window fails to deserialize Guards against silently reintroducing a serde default (Option<u32> or #[serde(default)]) on AlloyConstituentConfig::context_window. Addresses Copilot review feedback on #14. * fix(config): use expect_err per clippy --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Per RFC #13 review decision: every alloy constituent must declare its
context_window. No back-compat — prototype phase, all installations owned in-house. The one-time cost of updating existing configs is worth forever removing the silent-truncation footgun.Config surface
Validation
context_windowat parse time — misconfigured alloys fail with a clear parse error (test:alloy_constituent_missing_context_window_fails_to_parse).context_window = 0andmin_context_window = 0are rejected atAlloyProvider::from_config()with explanatory errors.min_context_windowis set: every constituent must meet it, or error names the offender and the numbers.min(constituent.context_window).AlloyProvider::min_context_window() -> u32— always known.Live config migration
Part of this PR: .210
/etc/zeroclawed/config.tomlalready patched to addcontext_windowto both constituents of thekimi-for-codingalloy (262144 for Kimi, 64000 for DeepSeek V3). No other nodes use[[alloys]]today.Test plan
9 tests covering: weighted/round-robin/stats (pre-existing), auto-compute min from mixed sizes, shared-size parity, explicit min priority, clear error message on below-floor constituent, rejection of 0 on constituent and alloy, TOML deserialization failure on missing
context_window.cargo check --features channel-matrixclean.Not in this PR (follow-ups per RFC #13)
[[cascades]]named primitive[[dispatchers]]new primitivecapacity_fraction