fix: use context-aware reserveTokensFloor in overflow recovery hint#84399
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 25, 2026, 7:16 PM ET / 23:16 UTC. Summary PR surface: Source +106, Tests +507. Total +613 across 2 files. Reproducibility: yes. at source level: current main and Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Land this focused PR after required CI is green so the linked issue can close through the PR's closing reference. Do we have a high-confidence way to reproduce the issue? Yes at source level: current main and Is this the best way to solve the issue? Yes; the PR keeps the existing config key and derives the hint from the effective runtime context window, including existing agent context caps and session/runtime fallback behavior, rather than adding a new config surface. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 65a210553b6c. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +106, Tests +507. Total +613 across 2 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg ✨ Hatched: 🥚 common Neon Crabkin Hatch commandComment Hatchability rules:
Rarity: 🥚 common. What is this egg doing here?
|
This comment was marked as low quality.
This comment was marked as low quality.
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
This comment was marked as low quality.
This comment was marked as low quality.
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
e76b8cb to
ea02a96
Compare
Replace hardcoded 20000 reserveTokensFloor suggestion with dynamic tier-based values based on the model context window size: - 100000 for context windows >= 1M tokens - 50000 for context windows >= 200k tokens - 35000 for context windows >= 100k tokens - 20000 for smaller/undefined context windows Uses session contextTokens as fallback when model metadata is unavailable, with model metadata taking precedence over stale session entries. Fixes openclaw#65839
When agents.defaults.contextTokens or per-agent contextTokens caps the effective context window below the model's full window, the hint should reflect the capped value, not the raw model window. Uses Math.min(agentCap, modelWindow) to compute the effective context, matching resolveContextTokens() in model-selection.ts.
When the primary model's context window cannot be resolved from model metadata, fall back to the session entry's modelProvider/model to look up context window before using session contextTokens numeric.
When a large primary model falls back to a smaller runtime model that overflows, the recovery hint should recommend a reserveTokensFloor based on the actual runtime model's context window, not the primary model's larger window. Thread runtimeProvider/runtimeModel into buildContextOverflowRecoveryText and resolveContextWindowForCompactionHint, with priority: 1. Runtime/fallback model (the one that actually overflowed) 2. Session entry model (when runtime not provided) 3. Primary model (fallback when runtime unknown) 4. Agent contextTokens cap applied via Math.min 5. Session contextTokens numeric value
ea02a96 to
e25b3e8
Compare
…4399) Use the effective runtime/model context when computing overflow recovery reserveTokensFloor hints, including uncataloged runtime refs, stale session windows, and heartbeat fallback cases. Verification: - pnpm test src/auto-reply/reply/agent-runner-execution.test.ts - autoreview clean on final focused fixup; prior accepted findings addressed before push. - CI passed on head e25b3e8 after rerunning cancelled jobs: preflight, critical quality network-runtime-boundary, security high, checks, Real behavior proof. Co-authored-by: tanshanshan <tanshanshan@users.noreply.github.com>
…4399) Use the effective runtime/model context when computing overflow recovery reserveTokensFloor hints, including uncataloged runtime refs, stale session windows, and heartbeat fallback cases. Verification: - pnpm test src/auto-reply/reply/agent-runner-execution.test.ts - autoreview clean on final focused fixup; prior accepted findings addressed before push. - CI passed on head e25b3e8 after rerunning cancelled jobs: preflight, critical quality network-runtime-boundary, security high, checks, Real behavior proof. Co-authored-by: tanshanshan <tanshanshan@users.noreply.github.com>
…4399) Use the effective runtime/model context when computing overflow recovery reserveTokensFloor hints, including uncataloged runtime refs, stale session windows, and heartbeat fallback cases. Verification: - pnpm test src/auto-reply/reply/agent-runner-execution.test.ts - autoreview clean on final focused fixup; prior accepted findings addressed before push. - CI passed on head e25b3e8 after rerunning cancelled jobs: preflight, critical quality network-runtime-boundary, security high, checks, Real behavior proof. Co-authored-by: tanshanshan <tanshanshan@users.noreply.github.com>
…to PR-head wholesale Per pr-review-toolkit:code-reviewer finding: my prior shape-only port left computeContextAwareReserveTokensFloor as dead code and runtimeProvider/ runtimeModel params unused, regressing upstream openclaw#84399 ("use context-aware overflow reserve hints"). The 10 test failures attributed in RESOLUTION-TRAIL §11 to "upstream-class loss" included this real regression of an upstream feature, not just an upstream-only test that PR-head intentionally lacked. Full port-back applied to PR-head's wholesale-taken src/auto-reply/reply/agent-runner-execution.ts: - resolveContextWindowForCompactionHint() helper added (resolves context window from runtime / session / primary / agent-cap, in that priority order) - buildContextOverflowResetHint() helper added (dynamic per-window reset hint using computeContextAwareReserveTokensFloor) - buildContextOverflowRecoveryText() body rewritten to compute the primary context window and select heartbeat-bleed-hint vs reset-hint per upstream - Static CONTEXT_OVERFLOW_RESET_HINT constant removed (was unused after rewrite) - CLI runner runParams now carries suppressNextUserMessagePersistence, onUserMessagePersisted, and userTurnTranscriptRecorder (upstream's prepared-CLI-user-turn boundary) - attemptedRuntimeProvider / attemptedRuntimeModel tracking added at function scope; updated inside the run() callback per candidate attempt - Three buildContextOverflowRecoveryText callsites now thread runtimeProvider: attemptedRuntimeProvider / runtimeModel: attemptedRuntimeModel (was fallbackProvider/fallbackModel, which only updates on success) - Added unconditional `if (isCompactionFailure)` fallback branch with preserveSessionMapping: true mirroring upstream, AFTER PR-head's existing resetSessionAfterCompactionFailure-gated branch. Preserves PR-head's reset- attempt feature when the callback returns true; falls back to upstream's always-show-recovery-text behavior when reset returns false. Test gate: src/auto-reply/reply/agent-runner-execution.test.ts now 149/149 PASS (was 10 failing). Tsgo / umbrella check / build all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…fixture corrections Supersedes prior fold-3-defer-2 ambiguity. Per figs verbatim canon at PR #792 issuecomment-4551691544 (Elliott byte-walk correction): "we ship feature complete. thats a weird deferral" — fold ALL 5 #793 substantive deltas, no active-vs-dormant split. Plus paired test corrections that BOTH lanes shipped (per Elliott byte-walk): 1. **`src/agents/pi-embedded-runner/run/failover-policy.ts`** — adopt #793's `harnessOwnedTimeout` extraction + early-return `continue_normal` shape. Reverts my earlier `bf21bbc4b8`-shape revert. Cohort lock: full fold. 2. **`src/infra/session-cost-usage.ts`** — adopt #793's `includeCheckpoints` typed flag at helper source. Reverts my earlier `bf21bbc4b8`-shape revert (targeted skip in `loadCostUsageSummary` loop). Cleaner abstraction. 3. **`src/agents/pi-tools.workspace-paths.test.ts`** — fixture rename `memory/2026-05-20.md` → `memory/new-note.md` to de-stamp date-baked fixture. Test assertion shape unchanged (`fs.readFile(...).resolves.toBe(...)`). Both lanes converged on this rename per Elliott `4551691544`. NOT a mask. 4. **`src/agents/pi-embedded-runner/run.cross-provider-fallback-error-context.test.ts`** — paired test update matching merged production behavior under openclaw#84399 + #793 harnessOwnsTransport shape. Assertion now asserts what production actually does (`.resolves.toMatchObject({payloads: [{isError: true, text: stringContaining("Request timed out")}]})`). Both lanes shipped paired test updates for their contract changes per Elliott byte-walk. NOT a mask. Cael's earlier "Ronan masked tests / Cael honest-fails" framing was wrong; Elliott byte-walked and self-corrected (PR comments 4551691544 + 4551691738). This commit aligns cael's test surface to the cohort convergence. Test gate: 38/38 in the 2 previously-red files; tsgo green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…4399) Use the effective runtime/model context when computing overflow recovery reserveTokensFloor hints, including uncataloged runtime refs, stale session windows, and heartbeat fallback cases. Verification: - pnpm test src/auto-reply/reply/agent-runner-execution.test.ts - autoreview clean on final focused fixup; prior accepted findings addressed before push. - CI passed on head e25b3e8 after rerunning cancelled jobs: preflight, critical quality network-runtime-boundary, security high, checks, Real behavior proof. Co-authored-by: tanshanshan <tanshanshan@users.noreply.github.com>
…4399) Use the effective runtime/model context when computing overflow recovery reserveTokensFloor hints, including uncataloged runtime refs, stale session windows, and heartbeat fallback cases. Verification: - pnpm test src/auto-reply/reply/agent-runner-execution.test.ts - autoreview clean on final focused fixup; prior accepted findings addressed before push. - CI passed on head e25b3e8 after rerunning cancelled jobs: preflight, critical quality network-runtime-boundary, security high, checks, Real behavior proof. Co-authored-by: tanshanshan <tanshanshan@users.noreply.github.com>
|
🦞👀 Command router queued. I will update this comment with the next step. |
…4399) Use the effective runtime/model context when computing overflow recovery reserveTokensFloor hints, including uncataloged runtime refs, stale session windows, and heartbeat fallback cases. Verification: - pnpm test src/auto-reply/reply/agent-runner-execution.test.ts - autoreview clean on final focused fixup; prior accepted findings addressed before push. - CI passed on head e25b3e8 after rerunning cancelled jobs: preflight, critical quality network-runtime-boundary, security high, checks, Real behavior proof. Co-authored-by: tanshanshan <tanshanshan@users.noreply.github.com>
Summary
Replace the hardcoded
20000reserveTokensFloorsuggestion in context-overflow recovery messages with dynamic tier-based values based on the model context window size.Problem
When context limit is exceeded, the error message suggests setting
agents.defaults.compaction.reserveTokensFloorto 20000. However, 20000 is too low for models with larger context windows (200k, 1M tokens), causing users to still experience context overflow errors even after applying the recommended config.Solution
Use context-aware tiered values for the recommended
reserveTokensFloor:Uses session
contextTokensas fallback when model metadata is unavailable, with model metadata taking precedence over stale session entries.Changes
computeContextAwareReserveTokensFloor()function with tier logicresolveContextWindowForCompactionHint()to resolve context window with session fallbackbuildContextOverflowResetHint()helper functionbuildContextOverflowRecoveryText()to use dynamic hint50000) instead of locale-formatted strings, matching the integer config schemaVerification
16 tests passed (7 direct tier tests + 9 integration tests).
Real behavior proof
Behavior addressed: Context overflow recovery hint suggests hardcoded 20000 for reserveTokensFloor regardless of model context window size; after this fix, the hint uses context-aware tiers (100000/50000/35000/20000).
Real environment tested: Local OpenClaw dev setup, verified via
pnpm devgateway with the patched code, invokingbuildContextOverflowRecoveryTextwith different context window sizes.Exact steps or command run after this patch:
pnpm devbuildContextOverflowRecoveryTextwith different model context window sizes (1M, 200k, 100k, 32k, undefined)Evidence after fix:
Terminal output from running
buildContextOverflowRecoveryTextwith patched code:Observed result after fix: The recovery hint now recommends context-appropriate values:
100000for 1M,50000for 200k,35000for 100k, and the default20000for small/undefined context windows. Before this patch, all cases showed the hardcoded20000.What was not tested: Full E2E context-overflow scenario with compaction recovery on a 200k model; verified via unit tests only for that tier. Live heartbeat model bleed path (which bypasses the reserveTokensFloor hint entirely) was not tested.
Fixes #65839