feat(alloy): context_window safety + min_context_window assertion by bglusman · Pull Request #14 · bglusman/calciforge

bglusman · 2026-04-23T15:06:46Z

Summary

Per RFC #13 review decision: every alloy constituent must declare its context_window. No back-compat — prototype phase, all installations owned in-house. The one-time cost of updating existing configs is worth forever removing the silent-truncation footgun.

Config surface

[[alloys]]
id = "fast-smart"
name = "Fast + Smart Blend"     # REQUIRED — AlloyConfig still requires name
strategy = "weighted"
min_context_window = 200000     # optional — defaults to min(constituents.context_window)

[[alloys.constituents]]
model = "gemini-2.5-flash"
context_window = 1048576        # REQUIRED
weight = 80

[[alloys.constituents]]
model = "claude-haiku-4-6"
context_window = 200000         # REQUIRED
weight = 20

Validation

Serde enforces context_window at parse time — misconfigured alloys fail with a clear parse error (test: alloy_constituent_missing_context_window_fails_to_parse).
context_window = 0 and min_context_window = 0 are rejected at AlloyProvider::from_config() with explanatory errors.
If min_context_window is set: every constituent must meet it, or error names the offender and the numbers.
If unset: auto-computed as min(constituent.context_window).
AlloyProvider::min_context_window() -> u32 — always known.

Live config migration

Part of this PR: .210 /etc/zeroclawed/config.toml already patched to add context_window to both constituents of the kimi-for-coding alloy (262144 for Kimi, 64000 for DeepSeek V3). No other nodes use [[alloys]] today.

Test plan

9 tests covering: weighted/round-robin/stats (pre-existing), auto-compute min from mixed sizes, shared-size parity, explicit min priority, clear error message on below-floor constituent, rejection of 0 on constituent and alloy, TOML deserialization failure on missing context_window. cargo check --features channel-matrix clean.

Not in this PR (follow-ups per RFC #13)

TokenEstimator trait + CharRatioEstimator default + TiktokenEstimator (feature-flagged)
Runtime request-fit rejection for oversized inputs
[[cascades]] named primitive
[[dispatchers]] new primitive
Per-model capacity_fraction

Copilot

Pull request overview

Adds context-window metadata and startup validation to alloy configs so misconfigured model mixes (e.g., 32K + 200K) fail fast instead of silently truncating at runtime. This extends the existing alloy provider/config infrastructure with optional fields and exposes the computed minimum for downstream request-fit enforcement.

Changes:

Add optional context_window per alloy constituent and optional min_context_window at the alloy level in config schema.
Compute an effective min context window during AlloyProvider::from_config() and validate declared constituents against an explicit user floor.
Add AlloyProvider::min_context_window() plus unit tests covering the new behaviors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`crates/zeroclawed/src/providers/alloy.rs`	Computes/validates effective context window minimum at provider build time; exposes accessor; adds tests.
`crates/zeroclawed/src/config.rs`	Extends TOML schema types with `min_context_window` and `context_window` fields and documentation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        // Context-window safety: if the user specified min_context_window, every
+        // constituent with a declared size must meet it. Otherwise auto-compute
+        // the effective ceiling as the min of declared sizes.
+        let declared_min = constituents
+            .iter()
+            .filter_map(|c| c.context_window)
+            .min();
+        let effective_min = match (cfg.min_context_window, declared_min) {


+    /// Effective minimum context window (tokens) this alloy can safely serve.
+    ///
+    /// `None` means no constituent declared a size — caller should treat as
+    /// unknown and trust the upstream model to error on over-long inputs.


+    /// `context_window` are treated as unknown and do not participate in
+    /// the minimum calculation.
+    ///
+    /// See `docs/rfcs/model-gateway-primitives.md`.


Per RFC #13 review decision (no back-compat in prototype phase): every alloy constituent must declare its context_window. Serde enforces the field at config load, so misconfigured alloys fail with a clear parse error instead of quietly accepting "unknown" sizes that could mask routing bugs later. ## Config changes AlloyConstituentConfig.context_window is now a required u32 (was an optional field in the previous draft). Every existing config must be updated to declare sizes; live config on .210 already patched in the matching commit on infra. AlloyConfig.min_context_window remains optional; when unset, it's auto-computed as min(constituent.context_window). When set, validation rejects any constituent whose declared size falls below it with a clear error naming both the offender and the numbers. ## Runtime exposure AlloyProvider::min_context_window() returns u32 (no longer Option), since size declaration is always present. ## Tests 7 tests: round_robin/weighted/stats unchanged; new tests cover auto-compute from mixed-size constituents, shared-size parity, explicit min priority, and explanatory error on constituent-below-floor. The "no declared size" test from the previous draft is deleted — that state is now unreachable. ## Live config migration Deployed alongside this PR: .210's kimi-for-coding alloy gets context_window declared on both constituents (262144 for Kimi, 64000 for DeepSeek V3). No other nodes use [[alloys]] today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- cargo fmt: collapse multi-line array in alloy.rs test (CI fmt gate) - Reject context_window=0 on constituents (clear misconfig error) - Reject min_context_window=0 at alloy build (same) - Drop stale doc reference to docs/rfcs/model-gateway-primitives.md (lives in PR #13, not yet on main) - Simplify min_context_window doc: context_window is required on constituents now, so "treated as unknown" qualifier is dead. Addresses Copilot review feedback on #14.

Copilot

Pull request overview

This PR hardens the “alloy” model-blending primitive by requiring each constituent to declare a context_window and by computing/enforcing an effective minimum context window for the alloy, eliminating silent truncation risks from mixing mismatched models.

Changes:

Add required context_window to alloy constituents at the config/schema level and validate it at provider build time.
Introduce min_context_window (optional) to enforce a floor across constituents or auto-compute from the smallest constituent.
Expose AlloyProvider::min_context_window() and add unit tests covering min computation and validation errors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
crates/zeroclawed/src/providers/alloy.rs	Stores validated per-constituent `context_window`, computes/enforces effective min context window, and adds `min_context_window()` plus tests.
crates/zeroclawed/src/config.rs	Extends config schema with `min_context_window` and requires `context_window` on each alloy constituent.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    /// Minimum context window (tokens) required from every constituent.
+    ///
+    /// If set: alloy build fails when any constituent with a declared
+    /// `context_window` is smaller than this. Catches "I didn't mean to mix a
+    /// 32K local model into a 200K-context alloy" at config-load time.
+    ///
+    /// If unset: auto-computed as `min(constituent.context_window)` over
+    /// all constituents.
+    #[serde(default)]
+    pub min_context_window: Option<u32>,


+    /// Declared context window (tokens) for this constituent.
+    ///
+    /// **Required.** No sensible default — declared per-constituent so the
+    /// alloy can compute and enforce a safe effective ceiling. Without this,
+    /// oversized requests could be silently routed to a constituent that
+    /// can't hold them. See `docs/rfcs/model-gateway-primitives.md`.
+    pub context_window: u32,


- Cascade semantics: reconcile "errors only" with "pre-skip unfit steps". Now distinguishes ELIGIBILITY (size-based, checked before attempt) from RETRY TRIGGER (errors: timeout, 5xx, 429). - Fix fabricated mechanism reference: there is no `fallbacks` field on AlloyConfig. Describe the real mechanism (`ordered_models` returned from `select_plan` and iterated in `route_with_fallback`). - CharRatio safety margin: explicitly flag that 10% is insufficient for CJK-heavy prompts; suggest remediations (tune chars_per_token, raise safety_margin, or switch to Tiktoken). - Correct "streaming output doesn't count" — it DOES count against the combined context. Describe the input + max_tokens check approach and the default output budget callers must supply. - Fix K-suffix math: 262K = 262*1024 = 268288, not 262144. Add clarifying example using "256K" → 262144. - Remove stale "context_window=0 = unknown" sentinel note; PR #14 rejects 0 explicitly at alloy validation. - Call out [tokenizer], [model_defaults], [[models]] as PROPOSED schema additions, not current.

Guards against silently reintroducing a serde default (Option<u32> or #[serde(default)]) on AlloyConstituentConfig::context_window. Addresses Copilot review feedback on #14.

Copilot

Pull request overview

This PR hardens the alloy configuration and provider initialization by requiring each alloy constituent to declare an explicit context_window, and by introducing an optional min_context_window that is validated at startup to prevent unsafe blends that could otherwise lead to silent truncation.

Changes:

Add required context_window: u32 to AlloyConstituentConfig (serde-enforced at TOML parse time) and propagate it into the runtime AlloyConstituent.
Add optional min_context_window: Option<u32> to AlloyConfig and compute/validate an effective minimum in AlloyProvider::from_config().
Add tests covering: missing context_window parse failure, zero-value rejection, auto-computed effective minimum, and explicit minimum enforcement.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
crates/zeroclawed/src/providers/alloy.rs	Adds context window fields, computes/validates `min_context_window`, exposes `min_context_window()` accessor, and expands provider tests.
crates/zeroclawed/src/config.rs	Extends alloy schema with required `context_window` and optional `min_context_window`, plus a deserialization test to ensure `context_window` remains required.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

* docs(rfc): model-gateway primitives — alloy + cascade + dispatcher Proposes three explicitly-scoped primitives for model-gateway routing, each with distinct semantics, and a shared TokenEstimator trait so they can reason about context-window fit consistently. Problem motivating this: alloys today assume constituents are interchangeable, which breaks when mixing models with wildly different context windows (e.g., local Qwen at 32K and Kimi K2.6 at 262K). Using an alloy for size-dependent routing leads to silent truncation. Proposal: - Alloy (today, + safety) blends equivalent models via sampling. New: min_context_window assertion + runtime fit-check rejection. - Cascade (promoted to named primitive) tries members in order on error. New: skip-on-size when a cascade step can't fit the request. - Dispatcher (new) picks by request shape, MVP rule type is max_input_tokens; extensible to other matchers. Shared TokenEstimator trait: - Default: CharRatioEstimator (chars-per-token configurable, 3.5 default) - Safety margin (default 10%) to bias toward over-estimation and avoid silent truncation footgun. - Pluggable for real tokenizers (tiktoken-rs, sentencepiece) as future opt-in crates behind feature flags. - Configurable globally, per-primitive, and per-model with clear precedence. Includes naming discussion (going with "dispatcher"), migration plan (no breaking changes), edge cases (session-context growth, tool-use inflation, streaming output), and open questions for reviewers. Implementation split into focused follow-up PRs. RFC is the long-form design; PRs execute. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rfc): incorporate first-round review decisions Captures reviewer resolutions: - dispatcher confirmed as the size-routing primitive name - cascade promoted to a named primitive - safety margin split into two distinct knobs: estimator safety_margin (counting-accuracy pad) and per-model capacity_fraction (avoid the quality-degradation zone near a model's ceiling). Composition formula and rationale added. - TiktokenEstimator included in v1 behind a feature flag; SentencePiece deferred to avoid C++ deps + per-model vocab plumbing for now - dispatcher reevaluate default flipped to per_turn (task-completion flows benefit from auto-promotion over consistency); sticky and sticky_escalate documented as opt-ins - dispatcher rule semantics simplified: default is "first target whose effective ceiling fits the request", computed per-target from context_window × capacity_fraction. Explicit rules remain for non-size routing. - alloy context_window required on every constituent. No back-compat for missing fields — prototype phase, owned installations, worth the one-time config edit to eliminate silent truncation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(rfc): address Copilot review on model-gateway-primitives - Cascade semantics: reconcile "errors only" with "pre-skip unfit steps". Now distinguishes ELIGIBILITY (size-based, checked before attempt) from RETRY TRIGGER (errors: timeout, 5xx, 429). - Fix fabricated mechanism reference: there is no `fallbacks` field on AlloyConfig. Describe the real mechanism (`ordered_models` returned from `select_plan` and iterated in `route_with_fallback`). - CharRatio safety margin: explicitly flag that 10% is insufficient for CJK-heavy prompts; suggest remediations (tune chars_per_token, raise safety_margin, or switch to Tiktoken). - Correct "streaming output doesn't count" — it DOES count against the combined context. Describe the input + max_tokens check approach and the default output budget callers must supply. - Fix K-suffix math: 262K = 262*1024 = 268288, not 262144. Add clarifying example using "256K" → 262144. - Remove stale "context_window=0 = unknown" sentinel note; PR #14 rejects 0 explicitly at alloy validation. - Call out [tokenizer], [model_defaults], [[models]] as PROPOSED schema additions, not current. * docs(rfc): second round of Copilot review — new feedback on updated push - Include `name` field in alloy TOML example (AlloyConfig still requires it). - Reference `AlloyProvider::from_config()` (actual) instead of `AlloyProvider::new()` (non-existent). - Clarify ordered_models determinism: round_robin deterministic; weighted is random sampling without replacement. Cascade, unlike alloy, is always deterministic (declaration order). - Resolve safety_margin double-apply ambiguity: estimator returns a margin-applied count; caller never multiplies again. Fit-check becomes estimate > context_window × capacity_fraction (one multiplication, one comparison). Reworked the worked example accordingly. - Fix `safety_margin 0.15–0.20` typo — the value is a multiplier, so "bump it" means 1.15–1.20, not 0.15 (which would shrink the estimate). --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(alloy): require context_window + min_context_window safety Per RFC #13 review decision (no back-compat in prototype phase): every alloy constituent must declare its context_window. Serde enforces the field at config load, so misconfigured alloys fail with a clear parse error instead of quietly accepting "unknown" sizes that could mask routing bugs later. ## Config changes AlloyConstituentConfig.context_window is now a required u32 (was an optional field in the previous draft). Every existing config must be updated to declare sizes; live config on .210 already patched in the matching commit on infra. AlloyConfig.min_context_window remains optional; when unset, it's auto-computed as min(constituent.context_window). When set, validation rejects any constituent whose declared size falls below it with a clear error naming both the offender and the numbers. ## Runtime exposure AlloyProvider::min_context_window() returns u32 (no longer Option), since size declaration is always present. ## Tests 7 tests: round_robin/weighted/stats unchanged; new tests cover auto-compute from mixed-size constituents, shared-size parity, explicit min priority, and explanatory error on constituent-below-floor. The "no declared size" test from the previous draft is deleted — that state is now unreachable. ## Live config migration Deployed alongside this PR: .210's kimi-for-coding alloy gets context_window declared on both constituents (262144 for Kimi, 64000 for DeepSeek V3). No other nodes use [[alloys]] today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(alloy): rustfmt + reject zero context_window + stale doc - cargo fmt: collapse multi-line array in alloy.rs test (CI fmt gate) - Reject context_window=0 on constituents (clear misconfig error) - Reject min_context_window=0 at alloy build (same) - Drop stale doc reference to docs/rfcs/model-gateway-primitives.md (lives in PR #13, not yet on main) - Simplify min_context_window doc: context_window is required on constituents now, so "treated as unknown" qualifier is dead. Addresses Copilot review feedback on #14. * test(config): assert missing context_window fails to deserialize Guards against silently reintroducing a serde default (Option<u32> or #[serde(default)]) on AlloyConstituentConfig::context_window. Addresses Copilot review feedback on #14. * fix(config): use expect_err per clippy --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 23, 2026 15:06

Copilot started reviewing on behalf of bglusman April 23, 2026 15:09 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

bglusman force-pushed the feat/alloy-context-window-safety branch from 31b2c6a to 53a1ba5 Compare April 23, 2026 15:46

Copilot AI review requested due to automatic review settings April 23, 2026 18:15

Copilot started reviewing on behalf of bglusman April 23, 2026 18:16 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

bglusman added 2 commits April 23, 2026 14:38

test(config): assert missing context_window fails to deserialize

1960646

Guards against silently reintroducing a serde default (Option<u32> or #[serde(default)]) on AlloyConstituentConfig::context_window. Addresses Copilot review feedback on #14.

fix(config): use expect_err per clippy

764ef4e

Copilot AI review requested due to automatic review settings April 23, 2026 18:54

Copilot started reviewing on behalf of bglusman April 23, 2026 18:54 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

bglusman marked this pull request as ready for review April 23, 2026 21:54

bglusman merged commit 00437fd into main Apr 23, 2026
21 checks passed

bglusman deleted the feat/alloy-context-window-safety branch May 1, 2026 17:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(alloy): context_window safety + min_context_window assertion#14

feat(alloy): context_window safety + min_context_window assertion#14
bglusman merged 4 commits intomainfrom
feat/alloy-context-window-safety

bglusman commented Apr 23, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bglusman commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Config surface

Validation

Live config migration

Test plan

Not in this PR (follow-ups per RFC #13)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

bglusman commented Apr 23, 2026 •

edited

Loading