Skip to content

feat(alloy): context_window safety + min_context_window assertion#14

Merged
bglusman merged 4 commits intomainfrom
feat/alloy-context-window-safety
Apr 23, 2026
Merged

feat(alloy): context_window safety + min_context_window assertion#14
bglusman merged 4 commits intomainfrom
feat/alloy-context-window-safety

Conversation

@bglusman
Copy link
Copy Markdown
Owner

@bglusman bglusman commented Apr 23, 2026

Summary

Per RFC #13 review decision: every alloy constituent must declare its context_window. No back-compat — prototype phase, all installations owned in-house. The one-time cost of updating existing configs is worth forever removing the silent-truncation footgun.

Config surface

[[alloys]]
id = "fast-smart"
name = "Fast + Smart Blend"     # REQUIRED — AlloyConfig still requires name
strategy = "weighted"
min_context_window = 200000     # optional — defaults to min(constituents.context_window)

[[alloys.constituents]]
model = "gemini-2.5-flash"
context_window = 1048576        # REQUIRED
weight = 80

[[alloys.constituents]]
model = "claude-haiku-4-6"
context_window = 200000         # REQUIRED
weight = 20

Validation

  • Serde enforces context_window at parse time — misconfigured alloys fail with a clear parse error (test: alloy_constituent_missing_context_window_fails_to_parse).
  • context_window = 0 and min_context_window = 0 are rejected at AlloyProvider::from_config() with explanatory errors.
  • If min_context_window is set: every constituent must meet it, or error names the offender and the numbers.
  • If unset: auto-computed as min(constituent.context_window).
  • AlloyProvider::min_context_window() -> u32 — always known.

Live config migration

Part of this PR: .210 /etc/zeroclawed/config.toml already patched to add context_window to both constituents of the kimi-for-coding alloy (262144 for Kimi, 64000 for DeepSeek V3). No other nodes use [[alloys]] today.

Test plan

9 tests covering: weighted/round-robin/stats (pre-existing), auto-compute min from mixed sizes, shared-size parity, explicit min priority, clear error message on below-floor constituent, rejection of 0 on constituent and alloy, TOML deserialization failure on missing context_window. cargo check --features channel-matrix clean.

Not in this PR (follow-ups per RFC #13)

  • TokenEstimator trait + CharRatioEstimator default + TiktokenEstimator (feature-flagged)
  • Runtime request-fit rejection for oversized inputs
  • [[cascades]] named primitive
  • [[dispatchers]] new primitive
  • Per-model capacity_fraction

Copilot AI review requested due to automatic review settings April 23, 2026 15:06
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds context-window metadata and startup validation to alloy configs so misconfigured model mixes (e.g., 32K + 200K) fail fast instead of silently truncating at runtime. This extends the existing alloy provider/config infrastructure with optional fields and exposes the computed minimum for downstream request-fit enforcement.

Changes:

  • Add optional context_window per alloy constituent and optional min_context_window at the alloy level in config schema.
  • Compute an effective min context window during AlloyProvider::from_config() and validate declared constituents against an explicit user floor.
  • Add AlloyProvider::min_context_window() plus unit tests covering the new behaviors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
crates/zeroclawed/src/providers/alloy.rs Computes/validates effective context window minimum at provider build time; exposes accessor; adds tests.
crates/zeroclawed/src/config.rs Extends TOML schema types with min_context_window and context_window fields and documentation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +136 to +143
// Context-window safety: if the user specified min_context_window, every
// constituent with a declared size must meet it. Otherwise auto-compute
// the effective ceiling as the min of declared sizes.
let declared_min = constituents
.iter()
.filter_map(|c| c.context_window)
.min();
let effective_min = match (cfg.min_context_window, declared_min) {
Comment on lines +298 to +301
/// Effective minimum context window (tokens) this alloy can safely serve.
///
/// `None` means no constituent declared a size — caller should treat as
/// unknown and trust the upstream model to error on over-long inputs.
Comment thread crates/zeroclawed/src/config.rs Outdated
/// `context_window` are treated as unknown and do not participate in
/// the minimum calculation.
///
/// See `docs/rfcs/model-gateway-primitives.md`.
Per RFC #13 review decision (no back-compat in prototype phase): every
alloy constituent must declare its context_window. Serde enforces the
field at config load, so misconfigured alloys fail with a clear parse
error instead of quietly accepting "unknown" sizes that could mask
routing bugs later.

## Config changes

AlloyConstituentConfig.context_window is now a required u32 (was an
optional field in the previous draft). Every existing config must be
updated to declare sizes; live config on .210 already patched in the
matching commit on infra.

AlloyConfig.min_context_window remains optional; when unset, it's
auto-computed as min(constituent.context_window). When set, validation
rejects any constituent whose declared size falls below it with a
clear error naming both the offender and the numbers.

## Runtime exposure

AlloyProvider::min_context_window() returns u32 (no longer Option),
since size declaration is always present.

## Tests

7 tests: round_robin/weighted/stats unchanged; new tests cover
auto-compute from mixed-size constituents, shared-size parity, explicit
min priority, and explanatory error on constituent-below-floor. The
"no declared size" test from the previous draft is deleted — that
state is now unreachable.

## Live config migration

Deployed alongside this PR: .210's kimi-for-coding alloy gets
context_window declared on both constituents (262144 for Kimi,
64000 for DeepSeek V3). No other nodes use [[alloys]] today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bglusman bglusman force-pushed the feat/alloy-context-window-safety branch from 31b2c6a to 53a1ba5 Compare April 23, 2026 15:46
- cargo fmt: collapse multi-line array in alloy.rs test (CI fmt gate)
- Reject context_window=0 on constituents (clear misconfig error)
- Reject min_context_window=0 at alloy build (same)
- Drop stale doc reference to docs/rfcs/model-gateway-primitives.md
  (lives in PR #13, not yet on main)
- Simplify min_context_window doc: context_window is required on
  constituents now, so "treated as unknown" qualifier is dead.

Addresses Copilot review feedback on #14.
Copilot AI review requested due to automatic review settings April 23, 2026 18:15
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the “alloy” model-blending primitive by requiring each constituent to declare a context_window and by computing/enforcing an effective minimum context window for the alloy, eliminating silent truncation risks from mixing mismatched models.

Changes:

  • Add required context_window to alloy constituents at the config/schema level and validate it at provider build time.
  • Introduce min_context_window (optional) to enforce a floor across constituents or auto-compute from the smallest constituent.
  • Expose AlloyProvider::min_context_window() and add unit tests covering min computation and validation errors.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
crates/zeroclawed/src/providers/alloy.rs Stores validated per-constituent context_window, computes/enforces effective min context window, and adds min_context_window() plus tests.
crates/zeroclawed/src/config.rs Extends config schema with min_context_window and requires context_window on each alloy constituent.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +94 to +103
/// Minimum context window (tokens) required from every constituent.
///
/// If set: alloy build fails when any constituent with a declared
/// `context_window` is smaller than this. Catches "I didn't mean to mix a
/// 32K local model into a 200K-context alloy" at config-load time.
///
/// If unset: auto-computed as `min(constituent.context_window)` over
/// all constituents.
#[serde(default)]
pub min_context_window: Option<u32>,
Comment on lines +112 to +118
/// Declared context window (tokens) for this constituent.
///
/// **Required.** No sensible default — declared per-constituent so the
/// alloy can compute and enforce a safe effective ceiling. Without this,
/// oversized requests could be silently routed to a constituent that
/// can't hold them. See `docs/rfcs/model-gateway-primitives.md`.
pub context_window: u32,
bglusman added a commit that referenced this pull request Apr 23, 2026
- Cascade semantics: reconcile "errors only" with "pre-skip unfit steps".
  Now distinguishes ELIGIBILITY (size-based, checked before attempt) from
  RETRY TRIGGER (errors: timeout, 5xx, 429).
- Fix fabricated mechanism reference: there is no `fallbacks` field on
  AlloyConfig. Describe the real mechanism (`ordered_models` returned
  from `select_plan` and iterated in `route_with_fallback`).
- CharRatio safety margin: explicitly flag that 10% is insufficient for
  CJK-heavy prompts; suggest remediations (tune chars_per_token, raise
  safety_margin, or switch to Tiktoken).
- Correct "streaming output doesn't count" — it DOES count against the
  combined context. Describe the input + max_tokens check approach and
  the default output budget callers must supply.
- Fix K-suffix math: 262K = 262*1024 = 268288, not 262144. Add
  clarifying example using "256K" → 262144.
- Remove stale "context_window=0 = unknown" sentinel note; PR #14
  rejects 0 explicitly at alloy validation.
- Call out [tokenizer], [model_defaults], [[models]] as PROPOSED
  schema additions, not current.
Guards against silently reintroducing a serde default (Option<u32> or
#[serde(default)]) on AlloyConstituentConfig::context_window. Addresses
Copilot review feedback on #14.
Copilot AI review requested due to automatic review settings April 23, 2026 18:54
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the alloy configuration and provider initialization by requiring each alloy constituent to declare an explicit context_window, and by introducing an optional min_context_window that is validated at startup to prevent unsafe blends that could otherwise lead to silent truncation.

Changes:

  • Add required context_window: u32 to AlloyConstituentConfig (serde-enforced at TOML parse time) and propagate it into the runtime AlloyConstituent.
  • Add optional min_context_window: Option<u32> to AlloyConfig and compute/validate an effective minimum in AlloyProvider::from_config().
  • Add tests covering: missing context_window parse failure, zero-value rejection, auto-computed effective minimum, and explicit minimum enforcement.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
crates/zeroclawed/src/providers/alloy.rs Adds context window fields, computes/validates min_context_window, exposes min_context_window() accessor, and expands provider tests.
crates/zeroclawed/src/config.rs Extends alloy schema with required context_window and optional min_context_window, plus a deserialization test to ensure context_window remains required.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

bglusman added a commit that referenced this pull request Apr 23, 2026
* docs(rfc): model-gateway primitives — alloy + cascade + dispatcher

Proposes three explicitly-scoped primitives for model-gateway routing,
each with distinct semantics, and a shared TokenEstimator trait so they
can reason about context-window fit consistently.

Problem motivating this: alloys today assume constituents are
interchangeable, which breaks when mixing models with wildly different
context windows (e.g., local Qwen at 32K and Kimi K2.6 at 262K). Using
an alloy for size-dependent routing leads to silent truncation.

Proposal:
- Alloy (today, + safety) blends equivalent models via sampling. New:
  min_context_window assertion + runtime fit-check rejection.
- Cascade (promoted to named primitive) tries members in order on error.
  New: skip-on-size when a cascade step can't fit the request.
- Dispatcher (new) picks by request shape, MVP rule type is
  max_input_tokens; extensible to other matchers.

Shared TokenEstimator trait:
- Default: CharRatioEstimator (chars-per-token configurable, 3.5 default)
- Safety margin (default 10%) to bias toward over-estimation and avoid
  silent truncation footgun.
- Pluggable for real tokenizers (tiktoken-rs, sentencepiece) as future
  opt-in crates behind feature flags.
- Configurable globally, per-primitive, and per-model with clear
  precedence.

Includes naming discussion (going with "dispatcher"), migration plan
(no breaking changes), edge cases (session-context growth, tool-use
inflation, streaming output), and open questions for reviewers.

Implementation split into focused follow-up PRs. RFC is the long-form
design; PRs execute.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(rfc): incorporate first-round review decisions

Captures reviewer resolutions:
- dispatcher confirmed as the size-routing primitive name
- cascade promoted to a named primitive
- safety margin split into two distinct knobs: estimator safety_margin
  (counting-accuracy pad) and per-model capacity_fraction (avoid the
  quality-degradation zone near a model's ceiling). Composition formula
  and rationale added.
- TiktokenEstimator included in v1 behind a feature flag; SentencePiece
  deferred to avoid C++ deps + per-model vocab plumbing for now
- dispatcher reevaluate default flipped to per_turn (task-completion
  flows benefit from auto-promotion over consistency); sticky and
  sticky_escalate documented as opt-ins
- dispatcher rule semantics simplified: default is "first target whose
  effective ceiling fits the request", computed per-target from
  context_window × capacity_fraction. Explicit rules remain for
  non-size routing.
- alloy context_window required on every constituent. No back-compat
  for missing fields — prototype phase, owned installations, worth the
  one-time config edit to eliminate silent truncation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(rfc): address Copilot review on model-gateway-primitives

- Cascade semantics: reconcile "errors only" with "pre-skip unfit steps".
  Now distinguishes ELIGIBILITY (size-based, checked before attempt) from
  RETRY TRIGGER (errors: timeout, 5xx, 429).
- Fix fabricated mechanism reference: there is no `fallbacks` field on
  AlloyConfig. Describe the real mechanism (`ordered_models` returned
  from `select_plan` and iterated in `route_with_fallback`).
- CharRatio safety margin: explicitly flag that 10% is insufficient for
  CJK-heavy prompts; suggest remediations (tune chars_per_token, raise
  safety_margin, or switch to Tiktoken).
- Correct "streaming output doesn't count" — it DOES count against the
  combined context. Describe the input + max_tokens check approach and
  the default output budget callers must supply.
- Fix K-suffix math: 262K = 262*1024 = 268288, not 262144. Add
  clarifying example using "256K" → 262144.
- Remove stale "context_window=0 = unknown" sentinel note; PR #14
  rejects 0 explicitly at alloy validation.
- Call out [tokenizer], [model_defaults], [[models]] as PROPOSED
  schema additions, not current.

* docs(rfc): second round of Copilot review — new feedback on updated push

- Include `name` field in alloy TOML example (AlloyConfig still requires it).
- Reference `AlloyProvider::from_config()` (actual) instead of
  `AlloyProvider::new()` (non-existent).
- Clarify ordered_models determinism: round_robin deterministic; weighted
  is random sampling without replacement. Cascade, unlike alloy, is
  always deterministic (declaration order).
- Resolve safety_margin double-apply ambiguity: estimator returns a
  margin-applied count; caller never multiplies again. Fit-check becomes
  estimate > context_window × capacity_fraction (one multiplication, one
  comparison). Reworked the worked example accordingly.
- Fix `safety_margin 0.15–0.20` typo — the value is a multiplier, so
  "bump it" means 1.15–1.20, not 0.15 (which would shrink the estimate).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bglusman bglusman marked this pull request as ready for review April 23, 2026 21:54
@bglusman bglusman merged commit 00437fd into main Apr 23, 2026
21 checks passed
bglusman added a commit that referenced this pull request Apr 25, 2026
* docs(rfc): model-gateway primitives — alloy + cascade + dispatcher

Proposes three explicitly-scoped primitives for model-gateway routing,
each with distinct semantics, and a shared TokenEstimator trait so they
can reason about context-window fit consistently.

Problem motivating this: alloys today assume constituents are
interchangeable, which breaks when mixing models with wildly different
context windows (e.g., local Qwen at 32K and Kimi K2.6 at 262K). Using
an alloy for size-dependent routing leads to silent truncation.

Proposal:
- Alloy (today, + safety) blends equivalent models via sampling. New:
  min_context_window assertion + runtime fit-check rejection.
- Cascade (promoted to named primitive) tries members in order on error.
  New: skip-on-size when a cascade step can't fit the request.
- Dispatcher (new) picks by request shape, MVP rule type is
  max_input_tokens; extensible to other matchers.

Shared TokenEstimator trait:
- Default: CharRatioEstimator (chars-per-token configurable, 3.5 default)
- Safety margin (default 10%) to bias toward over-estimation and avoid
  silent truncation footgun.
- Pluggable for real tokenizers (tiktoken-rs, sentencepiece) as future
  opt-in crates behind feature flags.
- Configurable globally, per-primitive, and per-model with clear
  precedence.

Includes naming discussion (going with "dispatcher"), migration plan
(no breaking changes), edge cases (session-context growth, tool-use
inflation, streaming output), and open questions for reviewers.

Implementation split into focused follow-up PRs. RFC is the long-form
design; PRs execute.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(rfc): incorporate first-round review decisions

Captures reviewer resolutions:
- dispatcher confirmed as the size-routing primitive name
- cascade promoted to a named primitive
- safety margin split into two distinct knobs: estimator safety_margin
  (counting-accuracy pad) and per-model capacity_fraction (avoid the
  quality-degradation zone near a model's ceiling). Composition formula
  and rationale added.
- TiktokenEstimator included in v1 behind a feature flag; SentencePiece
  deferred to avoid C++ deps + per-model vocab plumbing for now
- dispatcher reevaluate default flipped to per_turn (task-completion
  flows benefit from auto-promotion over consistency); sticky and
  sticky_escalate documented as opt-ins
- dispatcher rule semantics simplified: default is "first target whose
  effective ceiling fits the request", computed per-target from
  context_window × capacity_fraction. Explicit rules remain for
  non-size routing.
- alloy context_window required on every constituent. No back-compat
  for missing fields — prototype phase, owned installations, worth the
  one-time config edit to eliminate silent truncation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(rfc): address Copilot review on model-gateway-primitives

- Cascade semantics: reconcile "errors only" with "pre-skip unfit steps".
  Now distinguishes ELIGIBILITY (size-based, checked before attempt) from
  RETRY TRIGGER (errors: timeout, 5xx, 429).
- Fix fabricated mechanism reference: there is no `fallbacks` field on
  AlloyConfig. Describe the real mechanism (`ordered_models` returned
  from `select_plan` and iterated in `route_with_fallback`).
- CharRatio safety margin: explicitly flag that 10% is insufficient for
  CJK-heavy prompts; suggest remediations (tune chars_per_token, raise
  safety_margin, or switch to Tiktoken).
- Correct "streaming output doesn't count" — it DOES count against the
  combined context. Describe the input + max_tokens check approach and
  the default output budget callers must supply.
- Fix K-suffix math: 262K = 262*1024 = 268288, not 262144. Add
  clarifying example using "256K" → 262144.
- Remove stale "context_window=0 = unknown" sentinel note; PR #14
  rejects 0 explicitly at alloy validation.
- Call out [tokenizer], [model_defaults], [[models]] as PROPOSED
  schema additions, not current.

* docs(rfc): second round of Copilot review — new feedback on updated push

- Include `name` field in alloy TOML example (AlloyConfig still requires it).
- Reference `AlloyProvider::from_config()` (actual) instead of
  `AlloyProvider::new()` (non-existent).
- Clarify ordered_models determinism: round_robin deterministic; weighted
  is random sampling without replacement. Cascade, unlike alloy, is
  always deterministic (declaration order).
- Resolve safety_margin double-apply ambiguity: estimator returns a
  margin-applied count; caller never multiplies again. Fit-check becomes
  estimate > context_window × capacity_fraction (one multiplication, one
  comparison). Reworked the worked example accordingly.
- Fix `safety_margin 0.15–0.20` typo — the value is a multiplier, so
  "bump it" means 1.15–1.20, not 0.15 (which would shrink the estimate).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bglusman added a commit that referenced this pull request Apr 25, 2026
* feat(alloy): require context_window + min_context_window safety

Per RFC #13 review decision (no back-compat in prototype phase): every
alloy constituent must declare its context_window. Serde enforces the
field at config load, so misconfigured alloys fail with a clear parse
error instead of quietly accepting "unknown" sizes that could mask
routing bugs later.

## Config changes

AlloyConstituentConfig.context_window is now a required u32 (was an
optional field in the previous draft). Every existing config must be
updated to declare sizes; live config on .210 already patched in the
matching commit on infra.

AlloyConfig.min_context_window remains optional; when unset, it's
auto-computed as min(constituent.context_window). When set, validation
rejects any constituent whose declared size falls below it with a
clear error naming both the offender and the numbers.

## Runtime exposure

AlloyProvider::min_context_window() returns u32 (no longer Option),
since size declaration is always present.

## Tests

7 tests: round_robin/weighted/stats unchanged; new tests cover
auto-compute from mixed-size constituents, shared-size parity, explicit
min priority, and explanatory error on constituent-below-floor. The
"no declared size" test from the previous draft is deleted — that
state is now unreachable.

## Live config migration

Deployed alongside this PR: .210's kimi-for-coding alloy gets
context_window declared on both constituents (262144 for Kimi,
64000 for DeepSeek V3). No other nodes use [[alloys]] today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(alloy): rustfmt + reject zero context_window + stale doc

- cargo fmt: collapse multi-line array in alloy.rs test (CI fmt gate)
- Reject context_window=0 on constituents (clear misconfig error)
- Reject min_context_window=0 at alloy build (same)
- Drop stale doc reference to docs/rfcs/model-gateway-primitives.md
  (lives in PR #13, not yet on main)
- Simplify min_context_window doc: context_window is required on
  constituents now, so "treated as unknown" qualifier is dead.

Addresses Copilot review feedback on #14.

* test(config): assert missing context_window fails to deserialize

Guards against silently reintroducing a serde default (Option<u32> or
#[serde(default)]) on AlloyConstituentConfig::context_window. Addresses
Copilot review feedback on #14.

* fix(config): use expect_err per clippy

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bglusman bglusman deleted the feat/alloy-context-window-safety branch May 1, 2026 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants