Skip to content

fix(reply): block risky smaller-context model switches#44389

Closed
cgdusek wants to merge 4 commits into
openclaw:mainfrom
cgdusek:codex/model-switch-context-guard
Closed

fix(reply): block risky smaller-context model switches#44389
cgdusek wants to merge 4 commits into
openclaw:mainfrom
cgdusek:codex/model-switch-context-guard

Conversation

@cgdusek

@cgdusek cgdusek commented Mar 12, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: /model accepted switches to smaller-context models even when the recorded session context was already too large for the target model.
  • Why it matters: the very next turn could fall straight into overflow/compaction failure territory, which is the core failure behind Switching to a smaller-context model mid-session causes hard crash instead of graceful truncation #44303.
  • What changed: added a shared preflight guard that refuses risky model switches, keeps the current model unchanged, and tells the user to /compact or /new; wired it into both the immediate /model path and the later persistence path so inline directives cannot sneak the switch through.
  • What did NOT change (scope boundary): this PR does not auto-compact on switch, does not change compaction internals, and does not change memory-flush threshold scaling.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • If a session is already larger than a target model's safe working budget, /model <provider/model> now refuses the switch instead of accepting it and risking an immediate overflow on the next turn.
  • The refusal message explains the recorded session size, the target model budget, and recommends /compact or /new.
  • Inline model directives are kept consistent with the same guard and no longer persist the blocked selection behind the scenes.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22+, pnpm 10.23.0
  • Model/provider: synthetic config with anthropic/claude-opus-4-5 at 200k and openai/gpt-4o at 128k
  • Integration/channel (if any): reply directive handling tests
  • Relevant config (redacted): agents.defaults.compaction.reserveTokensFloor=20000

Steps

  1. Create a session with recorded token usage above the smaller model's safe budget.
  2. Issue /model openai/gpt-4o.
  3. Observe whether the switch is accepted or rejected.

Expected

  • The switch is rejected before the next turn runs.
  • The current model remains unchanged.
  • Inline directive persistence also skips the blocked switch.

Actual

  • Added tests reproduce the old behavior first; with this patch, the guard now blocks the risky switch and preserves the current model.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: direct /model command rejection when the target safe budget is already exceeded; inline directive persistence does not write the blocked switch.
  • Edge cases checked: normal /model success path still passes existing tests; commands reply-path tests still pass.
  • What you did not verify: live Telegram reproduction against a real gateway process.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert this PR.
  • Files/config to restore: src/auto-reply/reply/directive-handling.impl.ts, src/auto-reply/reply/directive-handling.persist.ts, src/auto-reply/reply/model-switch-guard.ts
  • Known bad symptoms reviewers should watch for: legitimate model switches being blocked when the session token snapshot is badly stale.

Risks and Mitigations

  • Risk: a stale recorded token total could conservatively block a switch that would have fit.
    • Mitigation: this is fail-safe compared with allowing a switch that immediately overflows; the message tells the user to /compact or /new, and the PR intentionally does not alter compaction logic itself.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5bc73f537c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/model-switch-guard.ts Outdated
@greptile-apps

greptile-apps Bot commented Mar 12, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces a preflight guard (maybeBlockOversizedModelSwitch) that refuses /model switches when the recorded session token usage already exceeds the target model's safe context budget, wiring it into both the direct /model command path and the inline-directive persistence path so neither route can silently accept an overflowing switch.

Key changes:

  • model-switch-guard.ts (new): computes safeBudgetTokens = contextWindow − reserveTokensFloor and rejects the switch if recordedTokens > safeBudgetTokens, returning a user-facing error with /compact / /new guidance.
  • directive-handling.impl.ts: guard is called after resolveModelSelectionFromDirective; an early return prevents applyModelOverrideToSessionEntry from mutating the session entry on a blocked switch.
  • directive-handling.persist.ts: inline-directive path wraps the existing model-override block in if (!blockedModelSwitchText), keeping the session store clean when the switch is refused.
  • directive-handling.model.test.ts: two new tests verify the blocked-switch behaviour for both paths.

Issues found:

  • resolveRecordedSessionTokens picks totalTokensFresh in its type signature but never reads it. If this field was meant to gate the guard (only block on fresh snapshots), the staleness check is absent and stale token totals could incorrectly prevent a legitimate switch. If the intent is to always block regardless of freshness, removing totalTokensFresh from the Pick would make that explicit. Tests only cover the totalTokensFresh: true case.
  • Minor asymmetric trim() in the same-model short-circuit: currentProvider / currentModel are trimmed but targetProvider / targetModel are not.

Confidence Score: 4/5

  • Safe to merge with a clarification needed on whether totalTokensFresh staleness should gate the guard.
  • The guard logic is correctly placed in both code paths, early returns prevent any session mutation on a blocked switch, and the success path is unaffected. The one substantive concern is that totalTokensFresh is present in the type signature of resolveRecordedSessionTokens but is never consulted — this is either a missing staleness check (which could cause false-positive rejections on stale snapshots) or intentional fail-safe behaviour with a misleading type. The PR risks section acknowledges the stale-snapshot scenario, so the behaviour is at minimum intentional, but the type inconsistency and absent test coverage for the stale case lower confidence slightly.
  • Pay close attention to src/auto-reply/reply/model-switch-guard.ts — specifically the unused totalTokensFresh field and the intent behind its presence in the type signature.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/auto-reply/reply/model-switch-guard.ts
Line: 7-15

Comment:
**`totalTokensFresh` is picked but never consulted**

`resolveRecordedSessionTokens` accepts `Pick<SessionEntry, "totalTokens" | "totalTokensFresh">` but the function body only ever reads `entry?.totalTokens``totalTokensFresh` is completely ignored. This means the guard fires on stale token snapshots too, which the PR risk section acknowledges could block a legitimate switch.

The inclusion of `totalTokensFresh` in the `Pick` strongly implies it was meant to be used (e.g. skip the guard when the snapshot is stale and the real usage may have dropped). If that was the intent, the staleness check is missing:

```ts
function resolveRecordedSessionTokens(
  entry?: Pick<SessionEntry, "totalTokens" | "totalTokensFresh"> | null,
): number | undefined {
  const total = entry?.totalTokens;
  if (typeof total !== "number" || !Number.isFinite(total) || total <= 0) {
    return undefined;
  }
  // Only treat the snapshot as actionable when it is fresh.
  if (entry?.totalTokensFresh !== true) {
    return undefined;
  }
  return Math.floor(total);
}
```

If the decision is intentionally fail-safe (always block on any recorded total), removing `totalTokensFresh` from the `Pick` would make that intent explicit and avoid misleading future readers. The test suite only covers `totalTokensFresh: true`, so the stale-snapshot behaviour is currently untested either way.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/auto-reply/reply/model-switch-guard.ts
Line: 25-30

Comment:
**Asymmetric `trim()` in same-model short-circuit**

`currentProvider` and `currentModel` are trimmed before comparison, but `targetProvider` and `targetModel` are not. If a target string ever carried leading/trailing whitespace (e.g. from a loosely-parsed directive), the equality check would fail and the guard would proceed unnecessarily. Consider trimming both sides for consistency:

```suggestion
  if (
    params.currentProvider?.trim() === params.targetProvider?.trim() &&
    params.currentModel?.trim() === params.targetModel?.trim()
  ) {
    return undefined;
  }
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 5bc73f5

Comment thread src/auto-reply/reply/model-switch-guard.ts Outdated
Comment thread src/auto-reply/reply/model-switch-guard.ts Outdated
@cgdusek cgdusek force-pushed the codex/model-switch-context-guard branch from 70fd0f4 to 5ed57ce Compare March 12, 2026 21:35
@openclaw-barnacle openclaw-barnacle Bot added the app: macos App: macos label Mar 12, 2026
@cgdusek

cgdusek commented Mar 12, 2026

Copy link
Copy Markdown
Contributor Author

Rebased onto current main and synced the generated Swift protocol models that the current base now expects, so checks (node, protocol, pnpm protocol:check) is green on this head.

Local verification on the rebased branch:

  • pnpm test -- src/auto-reply/reply/directive-handling.model.test.ts src/auto-reply/reply/commands.test.ts
  • pnpm build
  • pnpm check
  • pnpm protocol:check

The remaining failing check is secrets, which is reproducible both locally and in CI as the repo-wide pnpm audit --prod advisory for GHSA-5wcw-8jjv-m286 via extensions__googlechat>openclaw. I left that out of this PR so the model-switch fix stays scoped.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ed57ce9d3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/directive-handling.impl.ts
@openclaw-barnacle openclaw-barnacle Bot added channel: googlechat Channel integration: googlechat extensions: memory-core Extension: memory-core labels Mar 12, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7bf6d2d588

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/directive-handling.persist.ts Outdated
Comment thread src/auto-reply/reply/model-switch-guard.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d8fd0e2b68

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/directive-handling.persist.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f26229534c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/model-switch-guard.ts
@openclaw-barnacle openclaw-barnacle Bot added the agents Agent runtime and tooling label Mar 12, 2026
@cgdusek

cgdusek commented Mar 13, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up on the latest head (52eaa4d3fd3c84b2e3b6697dbb3f91493583e5af): I pushed a test-only hardening patch for the CI failures we saw on the previous head.

Local verification on this head:

  • pnpm check
  • OPENCLAW_TEST_WORKERS=1 pnpm exec vitest run --isolate=false src/cron/isolated-agent/run.skill-filter.test.ts src/cron/isolated-agent/run.payload-fallbacks.test.ts src/cron/isolated-agent/run.message-tool-policy.test.ts src/cron/isolated-agent/run.interim-retry.test.ts src/infra/outbound/outbound-send-service.test.ts src/tui/gateway-chat.test.ts src/cron/isolated-agent.direct-delivery-forum-topics.test.ts
  • OPENCLAW_TEST_WORKERS=1 OPENCLAW_TEST_SHARDS=6 OPENCLAW_TEST_SHARD_INDEX=1 pnpm test
  • pnpm canvas:a2ui:bundle && bunx vitest run --config vitest.unit.config.ts

I also ran the broad pnpm test lane locally. The touched suites passed there as well; the run eventually hit an unrelated local Vitest worker-start timeout on src/signal/probe.test.ts, which looks like host/process contention rather than anything in this patch.

One GitHub Actions note: the upstream CI workflow run for the previous head (4d9412200e4f2474bac6cf60c4482490c97a21a2) is still stuck because its macos job never left the queue, so the PR currently only shows the new labeler run on this head. I tried to cancel the stale upstream run from here, but fork permissions do not allow that.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52eaa4d3fd

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/model-switch-guard.ts Outdated
@cgdusek cgdusek force-pushed the codex/model-switch-context-guard branch from 52eaa4d to 21a2af3 Compare March 13, 2026 02:27
@openclaw-barnacle openclaw-barnacle Bot removed channel: googlechat Channel integration: googlechat app: macos App: macos extensions: memory-core Extension: memory-core labels Mar 13, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5c56ed0f4b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/model-switch-guard.ts Outdated
@cgdusek

cgdusek commented Mar 13, 2026

Copy link
Copy Markdown
Contributor Author

New head: 85d229a

Tracked the failing Windows shard back to src/memory/qmd-manager.test.ts picking up ambient qmd.cmd / mcporter.cmd wrappers from the runner PATH. That meant the new fail-closed wrapper policy tripped before the mocked child-process layer ran.

Remedy on this head:

  • isolate Windows PATH setup inside src/memory/qmd-manager.test.ts
  • seed deterministic local qmd/mcporter shim fixtures for real Windows runs
  • reuse the same helper in the explicit Windows resolution tests

Local verification:

  • pnpm exec vitest run src/memory/qmd-manager.test.ts src/memory/qmd-process.test.ts
  • OPENCLAW_TEST_WORKERS=1 OPENCLAW_TEST_SHARDS=6 OPENCLAW_TEST_SHARD_INDEX=1 pnpm test
  • pnpm check

Watching the new Actions run now.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 85d229ab3f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/model-switch-guard.ts Outdated
@cgdusek

cgdusek commented Mar 13, 2026

Copy link
Copy Markdown
Contributor Author

New head: 2ea8c0f

The remaining Windows shard failure was still in src/memory/qmd-manager.test.ts, but the root cause was narrower than the first pass: the deterministic test shims were resolving to node <cli.js>, which changed argv shape on real Windows and made a large part of the suite miss its mocked search / query / update branches.

This follow-up keeps the production fail-closed behavior intact and fixes only the fixture path:

  • seed deterministic Windows qmd/mcporter shims that resolve to a direct fake .exe entrypoint instead of a Node entrypoint
  • broaden the mcporter test helper to recognize .exe entrypoints too
  • tighten the two explicit Windows resolution assertions so they now check the argv shape directly and will catch this regression earlier

Local verification on this head:

  • pnpm exec vitest run src/memory/qmd-manager.test.ts src/memory/qmd-process.test.ts
  • OPENCLAW_TEST_WORKERS=1 OPENCLAW_TEST_SHARDS=6 OPENCLAW_TEST_SHARD_INDEX=1 pnpm test
  • pnpm check

I also rechecked the underlying resolver locally with the seeded shim fixture to confirm the Windows materialized invocation now stays in the direct-entrypoint form (["search", ...]) rather than node <entrypoint> ....

Watching the new CI run now.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ea8c0f518

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/model-switch-guard.ts Outdated
@openclaw-barnacle openclaw-barnacle Bot added the channel: slack Channel integration: slack label Mar 13, 2026
@cgdusek

cgdusek commented Mar 13, 2026

Copy link
Copy Markdown
Contributor Author

New head: 6c0b086

Two follow-ups from the latest pass:

  • addressed the still-open model-switch review point by taking the smaller of the cache/config windows when both exist, with a regression test that forces a warmed 200k cache entry against a 128k configured minimum
  • fixed the check lane’s CI-only tsgo failure around typeof sendMessageSlack by switching the Slack callers to explicit exported send types

Local verification on this head:

  • pnpm exec vitest run src/channels/plugins/outbound/slack.test.ts src/channels/plugins/outbound/slack.sendpayload.test.ts src/slack/send.blocks.test.ts src/slack/send.upload.test.ts src/slack/draft-stream.test.ts src/auto-reply/reply/model-switch-guard.test.ts src/auto-reply/reply/directive-handling.model.test.ts src/memory/qmd-manager.test.ts src/memory/qmd-process.test.ts
  • pnpm check

I also replied to and resolved the duplicate open bot threads for the cache/config budget case. Watching the fresh CI run now.

@cgdusek

cgdusek commented Mar 13, 2026

Copy link
Copy Markdown
Contributor Author

New head: c2d6f0f

While the previous merge-run was still draining, it exposed one current-main mismatch that wasn’t in this branch yet: readScheduledTaskCommand() now returns sourcePath, but the branch-side schtasks implementation/tests were still on the older shape. I forward-ported that behavior and updated the schtasks expectations so the PR branch matches the merge state GitHub is actually testing.

Local verification on this head:

  • pnpm exec vitest run src/daemon/schtasks.test.ts src/memory/qmd-manager.test.ts
  • pnpm canvas:a2ui:bundle && bunx vitest run --config vitest.unit.config.ts src/daemon/schtasks.test.ts src/memory/qmd-manager.test.ts
  • pnpm check

Watching the fresh runs for this head now.

@cgdusek cgdusek force-pushed the codex/model-switch-context-guard branch from c2d6f0f to 8525784 Compare March 13, 2026 03:43
@openclaw-barnacle openclaw-barnacle Bot removed the gateway Gateway runtime label Mar 13, 2026
@cgdusek

cgdusek commented Mar 13, 2026

Copy link
Copy Markdown
Contributor Author

New head: 8525784

I rebased onto current main, resolved the one overlap in src/channels/plugins/outbound/slack.ts, and pushed a small post-rebase test hardening follow-up for the Windows mcporter wrapper assertion in src/memory/qmd-manager.test.ts. The test now enforces the real contract (shell: true must never be used) while still treating direct entrypoint resolution as acceptable when the host can resolve past the wrapper.

Local verification on this head:

  • pnpm exec vitest run src/memory/qmd-manager.test.ts src/memory/qmd-process.test.ts src/auto-reply/reply/model-switch-guard.test.ts src/auto-reply/reply/directive-handling.model.test.ts src/channels/plugins/outbound/slack.test.ts src/channels/plugins/outbound/slack.sendpayload.test.ts src/slack/send.blocks.test.ts src/slack/send.upload.test.ts src/slack/draft-stream.test.ts src/daemon/schtasks.test.ts
  • pnpm check

I also replied to and resolved the remaining outdated duplicate bot thread for the cache/config minimum-window case. The fresh CI and Install Smoke runs are now attached to this head, and I’m watching those next.

@cgdusek

cgdusek commented Mar 13, 2026

Copy link
Copy Markdown
Contributor Author

New head: e947b0f

The previous merge-run exposed one remaining Windows-shard regression in src/memory/qmd-manager.test.ts: the hardened wrapper assertion still assumed the only acceptable error path was a .cmd spawn-time failure. On current Windows merge state, the resolver can also fail earlier with the same no-shell wrapper error before spawn() is reached.

I reproduced the failing shard locally and tightened the test to accept both no-shell failure modes while still forbidding any shell: true fallback.

Local verification on this head:

  • pnpm exec vitest run src/memory/qmd-manager.test.ts
  • OPENCLAW_TEST_WORKERS=1 OPENCLAW_TEST_SHARDS=6 OPENCLAW_TEST_SHARD_INDEX=1 pnpm test
  • pnpm check

Watching the replacement CI run on this head now.

@cgdusek cgdusek force-pushed the codex/model-switch-context-guard branch from e947b0f to 7b726eb Compare March 13, 2026 03:55
@openclaw-barnacle openclaw-barnacle Bot added the app: web-ui App: web-ui label Mar 13, 2026
@cgdusek

cgdusek commented Mar 13, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up on the Aisle note about bundling a blocked /model switch with other directives such as /elevated off:

I looked at that path while fixing the mixed-message regression, and I intentionally kept this PR on the atomic side rather than introducing partial-success semantics under the same bug-fix. The concrete regression here was that we could tell the user the /model change did not happen while still persisting other inline/session mutations. This patch closes that gap by making a failed /model directive leave the session unchanged.

I agree there is a separate product/security question about whether privilege-reducing directives should be allowed to apply even when another directive in the same batch fails. I did not fold that policy change into this PR because it would require new acknowledgement semantics for partial success and would change directive-only command behavior beyond the scoped fix for #44303.

@cgdusek cgdusek force-pushed the codex/model-switch-context-guard branch from 7b726eb to 829b132 Compare March 13, 2026 05:10
@openclaw-barnacle openclaw-barnacle Bot removed the app: web-ui App: web-ui label Mar 13, 2026
@cgdusek

cgdusek commented Mar 13, 2026

Copy link
Copy Markdown
Contributor Author

New head: 829b132

main moved again and flipped the PR back to merge-conflicting, so I rebased onto current origin/main and resolved the one overlap in ui/src/ui/views/agents-utils.test.ts. That overlap was just the older test-only typing follow-up; current main already carries the needed avatar identity shape, so I kept the newer base version and dropped the obsolete commit while preserving the model-switch work.

Local verification on this rebased head:

  • pnpm exec vitest run src/memory/qmd-manager.test.ts src/auto-reply/reply/model-switch-guard.test.ts src/auto-reply/reply/directive-handling.model.test.ts ui/src/ui/views/agents-utils.test.ts
  • pnpm check
  • OPENCLAW_TEST_WORKERS=1 OPENCLAW_TEST_SHARDS=6 OPENCLAW_TEST_SHARD_INDEX=1 pnpm test

No new review threads were opened in the meantime; I’m watching the replacement Actions runs on this head now.

@cgdusek

cgdusek commented Mar 13, 2026

Copy link
Copy Markdown
Contributor Author

One quick note on the latest Aisle analysis: it is pointing at commit 7b726eb, not the current head, but the underlying concern is the same product-policy question I addressed earlier.

I intentionally kept this PR atomic: if a /model directive fails or is refused, the batch leaves persisted session state unchanged. The bug here was the opposite behavior, where we could tell the user the model switch did not happen while still silently persisting other directive side effects. Allowing selected non-model directives to continue through the same failure path would be a broader semantic change that needs explicit partial-success UX and follow-up coverage, so I kept it out of this scoped fix for #44303.

@cgdusek cgdusek force-pushed the codex/model-switch-context-guard branch from 829b132 to d57a18f Compare March 19, 2026 03:13

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d57a18f2e2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/model-switch-guard.ts
Comment thread src/auto-reply/reply/model-switch-guard.ts
@cgdusek cgdusek force-pushed the codex/model-switch-context-guard branch from 1e976a2 to 79023aa Compare March 20, 2026 02:19
@openclaw-barnacle openclaw-barnacle Bot removed channel: slack Channel integration: slack agents Agent runtime and tooling labels Mar 20, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 79023aab39

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/model-switch-guard.ts
@cgdusek cgdusek force-pushed the codex/model-switch-context-guard branch from 79023aa to b7f447b Compare March 23, 2026 22:22

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7f447bcfc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/auto-reply/reply/model-switch-guard.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cbd5849872

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +47 to +52
function resolveEffectiveContextWindowTokens(params: {
cfg: OpenClawConfig;
model: string;
}): number | undefined {
const cachedWindow = normalizePositiveInt(lookupContextTokens(params.model));
const configuredWindow = resolveConfiguredModelContextWindow(params);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use target provider when resolving switch budgets

resolveEffectiveContextWindowTokens only keys off model, so the new guard compares against the bare-model cache / smallest configured duplicate-ID window instead of the selected provider’s runtime limit. In a long session this produces false blocks for safe provider switches, e.g. from a 128k provider to a 200k provider that shares the same model id: this guard will still pick 128k and reject /model, even though the actual execution path resolves the target model with provider + modelId in src/agents/pi-embedded-runner/run.ts:376-403 and applies that provider-specific window in src/agents/context-window-guard.ts:28-47. Fresh evidence on this head is that the preflight still drops targetProvider here, while the runner continues to honor it.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d7a6a6ee9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +148 to +149
if (blockedModelSwitchText) {
return { text: blockedModelSwitchText };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Clear runtime directives when blocking an inline /model

Returning here makes a mixed-message /model failure look final, but the current turn still honors other inline runtime overrides because resolveReplyDirectives has already derived them from the original directives in src/auto-reply/reply/get-reply-directives.ts:370-395 and :510, and runPreparedReply still consumes them in src/auto-reply/reply/get-reply-run.ts:474-479 and :553-563. For example, hello /model openai/gpt-4o /elevated full or /exec host=node will reply Current model unchanged yet run this turn with elevated/exec/think/queue changes anyway. If the new guard is meant to reject the inline /model attempt as a unit, those per-turn overrides need to be cleared before continuing.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Switching to a smaller-context model mid-session causes hard crash instead of graceful truncation

1 participant