doctor --fix cleans only codex-route legacy agentRuntimeOverride pins; stale claude-cli / other harness pins persist forever
TL;DR
Since #79238 (merged 2026-05-08, shipped in v2026.5.9-beta.1+), live routing correctly ignores stale agentRuntimeOverride / agentHarnessId pins in session entries — the status label is no longer driven by them, and directive-handling.persist.ts clears them when a new runtime directive arrives.
However, no code path actively cleans pre-existing stale pins unless one of two narrow triggers fires:
src/commands/doctor/shared/codex-route-warnings.ts — delete entry.agentRuntimeOverride — runs only when the entry matches a Codex-route legacy condition.
src/auto-reply/reply/directive-handling.persist.ts — deletes the pin only when a new runtimeOverride directive (clear / set / invalid) is being processed for the same session.
So a session pinned long ago (e.g. agentRuntimeOverride: "claude-cli" from a previous Anthropic-Opus-via-claude-cli setup) accumulates indefinitely in ~/.openclaw/agents/<id>/sessions/sessions.json even after:
- the user has fully migrated away from claude-cli (no cliBackends.claude-cli, no claude binary, model switched)
- agents.defaults.agentRuntime.id reconfigured to pi / codex / anything
- openclaw doctor --fix --non-interactive --yes run multiple times
- gateway restarted
- /reset and /new issued in the chat channel
The pin no longer affects routing (good, fixed by #79238), but on builds before that fix it still shows up in status (Runtime: Claude CLI next to an openai-codex/gpt-5.5 model), and on every version it is confusing data debt that can mislead any future audit, diagnostics, or migration tooling. Observed on a real v2026.5.7 install migrated through in-place config edits, where I had to hand-edit sessions.json (python -c "del entry['agentRuntimeOverride']") to clear the label.
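The manual cleanup mentioned above can be made a little safer than a one-off python -c. The following is a minimal sketch, assuming sessions.json is a single JSON object keyed by session key (as in the artifact shown under Evidence). The function name strip_runtime_pins and the .bak backup suffix are illustrative and not part of openclaw.

```python
import json
import os
import shutil

# The two session-entry fields this report calls "pins".
PIN_FIELDS = ("agentRuntimeOverride", "agentHarnessId")


def strip_runtime_pins(path):
    """Remove pin fields from every entry in a sessions.json file.

    Returns the list of session keys that were modified. The file is only
    rewritten (after a backup copy is made) when something actually changed.
    """
    with open(path) as src:
        sessions = json.load(src)

    touched = []
    for key, entry in sessions.items():
        if not isinstance(entry, dict):
            continue  # hypothetical guard: skip anything that is not an entry object
        present = [field for field in PIN_FIELDS if field in entry]
        for field in present:
            del entry[field]
        if present:
            touched.append(key)

    if touched:
        shutil.copy2(path, path + ".bak")  # keep the original for rollback
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as dst:
            json.dump(sessions, dst, indent=2)
        os.replace(tmp_path, path)  # swap in the cleaned file in one step
    return touched

# Example call (path taken from this report):
# strip_runtime_pins(os.path.expanduser(
#     "~/.openclaw/agents/general/sessions/sessions.json"))
```

Running this while the gateway is stopped avoids racing with a concurrent write. That precaution is an assumption on my part and has not been checked against how the gateway persists sessions.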
Environment
- openclaw: 2026.5.7 (commit eeef486), the specific tag where I reproduced the display form. Pin-creation predates that; the residue is data-level and version-independent.
- Node.js: v22.22.0
- Platform: Linux x86_64
- 1 agent (general), Telegram channel active, session key agent:general:telegram:direct:<chat-id>
- Pin origin: an earlier era when cliBackends.claude-cli was the active runtime and the model was anthropic/claude-opus-*. The cliBackend has since been removed from agents.defaults.cliBackends, the model switched to openai-codex/gpt-5.5, and the claude binary is no longer in PATH for the gateway service.
Although the live regression I hit is on 2026.5.7, the cleanup hole is still present on main, verified by reading src/commands/doctor/shared/* and src/auto-reply/reply/directive-handling.persist.ts at HEAD. The fix in #79238 was scoped to display correctness and to making live routing ignore stale pins, not to sanitizing pins already stored on disk.
Reproduction
# 1. Make a session pin a non-default runtime. Easiest natural way:
# set cliBackends.claude-cli, set primary model to anthropic/claude-opus-*,
# send a chat through any channel — openclaw will write
# "agentRuntimeOverride": "claude-cli" into sessions.json for that key.
# (Or shortcut: edit sessions.json by hand and add the field to any entry.)
# 2. Migrate away:
openclaw config unset agents.defaults.cliBackends.claude-cli
openclaw models set openai-codex/gpt-5.5 # or any non-CLI provider model
openclaw config set agents.defaults.agentRuntime.id pi
# 3. Try every "clean things up" command openclaw exposes:
openclaw doctor --fix --non-interactive --yes
openclaw plugins doctor
systemctl --user restart openclaw-gateway.service
# in the chat channel:
/reset
/new
# 4. Inspect the session entry — pin is still there.
python3 -c '
import json, os
# expanduser is needed here: $USER is not expanded inside single quotes
path = os.path.expanduser("~/.openclaw/agents/general/sessions/sessions.json")
with open(path) as f:
    d = json.load(f)
for k, v in d.items():
    if v.get("agentRuntimeOverride") or v.get("agentHarnessId"):
        print(k, "->", v.get("agentRuntimeOverride"), v.get("agentHarnessId"))
'
# Expected output (post #79238): nothing.
# Actual: the legacy pin is still present.
On a 5.7 install you will additionally see Runtime: Claude CLI in /status for that session (display layer is also broken pre-#79238). On ≥ 5.9-beta.1 the display will correctly say OpenClaw Pi Default, but the underlying JSON pollution stays.
Evidence
1. Cleanup is narrowly scoped to codex-route legacy
$ gh search code --owner openclaw "delete entry.agentRuntimeOverride"
src/commands/doctor/shared/codex-route-warnings.ts: delete entry.agentRuntimeOverride;
extensions/discord/src/monitor/native-command-model-picker-apply.ts: delete entry.agentRuntimeOverride;
Only two call sites delete the pin from an arbitrary entry. The doctor one is gated on Codex-route legacy detection; the Discord one runs only when a Discord native-command model picker reassigns runtime. Neither sweeps non-Codex, non-Discord pins.
2. The persist-time deletion needs a fresh directive to fire
src/auto-reply/reply/directive-handling.persist.ts (HEAD):
if (runtimeOverride?.kind === "clear") {
  if (sessionEntry.agentRuntimeOverride) {
    delete sessionEntry.agentRuntimeOverride;
    updated = true;
  }
} else if (runtimeOverride?.kind === "set") {
  if (sessionEntry.agentRuntimeOverride) {
    delete sessionEntry.agentRuntimeOverride;
    updated = true;
  }
  enqueueSystemEvent(`Ignored session runtime ${runtimeOverride.runtime}; ...`, ...);
}
If nobody is trying to set or clear runtime via a directive on that exact session, the existing override is never re-evaluated.
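To make the gap concrete, here is a small Python model of the branch above. It is only a paraphrase of the quoted TypeScript, assuming the directive is either absent or has a kind of "clear" or "set". The name apply_runtime_directive is hypothetical, and the system-event side effect is left out.

```python
def apply_runtime_directive(session_entry, runtime_override=None):
    """Model of the quoted branch: the pin is deleted only if a directive is present.

    Returns True when the entry was modified, mirroring the `updated` flag.
    """
    updated = False
    kind = runtime_override.get("kind") if runtime_override else None
    if kind in ("clear", "set"):
        if session_entry.get("agentRuntimeOverride"):
            del session_entry["agentRuntimeOverride"]
            updated = True
    return updated


entry = {"model": "gpt-5.5", "agentRuntimeOverride": "claude-cli"}

# An ordinary turn carries no runtime directive, so the entry is left alone.
changed_without_directive = apply_runtime_directive(entry)
still_pinned = "agentRuntimeOverride" in entry

# Only an explicit directive aimed at this session removes the pin.
changed_with_directive = apply_runtime_directive(entry, {"kind": "clear"})
```

In this model a session that never receives a runtime directive keeps its pin no matter how many turns pass, which is the behavior described in this report.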
3. /reset does not touch session-stored runtime overrides
src/commands/reset.ts only knows three scopes: config, config+creds+sessions, and full. The config scope leaves the session store untouched, config+creds+sessions removes the whole sessions dir, and full removes everything. There is no narrower "this session, runtime-only" path. The per-chat-channel /reset slash command goes through src/auto-reply/reply/session-reset-*.ts, which clears history but does not delete agentRuntimeOverride / agentHarnessId from the entry it preserves.
4. Real reproduction artifact (v2026.5.7 install, post-migration)
"agent:general:telegram:direct:245931306": {
  "model": "gpt-5.5",
  "modelProvider": "openai-codex",
  "agentRuntimeOverride": "claude-cli"   <-- still here after migration + doctor + restart + /reset + /new
}
After a manual delete entry["agentRuntimeOverride"] + gateway restart, the next Telegram turn correctly resolves to OpenClaw Pi Default for the runtime label and continues routing through openai-codex/gpt-5.5 as desired.
Suggested fix
Either:
- (A) Generalize doctor --fix to drop any agentRuntimeOverride / agentHarnessId that does not match a currently-registered, currently-enabled harness OR the configured agents.defaults.agentRuntime.id. The codex-route warning logic in codex-route-warnings.ts could be promoted to a generic stale-pin sweeper.
- (B) Add a CLI surface: openclaw sessions reset-runtime <session-key> and openclaw sessions clear-overrides --all so operators can fix this without hand-editing JSON.
- (C) Have /reset (and src/auto-reply/reply/session-reset-*.ts) drop session-level runtime overrides as part of normal reset semantics, which matches user expectation.
(A) is probably the cleanest; (C) plus a one-time migration sweep on gateway start would also do it.
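Option (A) could look roughly like the following sketch. It is written in Python for brevity rather than in the project's TypeScript, and everything about it is an assumption: the function name sweep_stale_pins, the idea that the enabled harness ids are available as a plain set, and the rule that a pin is kept only if it names an enabled harness or equals the configured agents.defaults.agentRuntime.id.

```python
# The two session-entry fields this report calls "pins".
PIN_FIELDS = ("agentRuntimeOverride", "agentHarnessId")


def sweep_stale_pins(sessions, enabled_harnesses, default_runtime_id):
    """Drop pins that name neither an enabled harness nor the default runtime.

    Mutates `sessions` in place and returns a list of
    (session_key, field, stale_value) tuples describing what was removed.
    """
    allowed = set(enabled_harnesses) | {default_runtime_id}
    removed = []
    for key, entry in sessions.items():
        for field in PIN_FIELDS:
            value = entry.get(field)
            if value is not None and value not in allowed:
                removed.append((key, field, value))
                del entry[field]
    return removed


sessions = {
    "agent:general:telegram:direct:<chat-id>": {
        "model": "gpt-5.5",
        "modelProvider": "openai-codex",
        "agentRuntimeOverride": "claude-cli",  # stale: claude-cli is no longer enabled
    },
    "agent:general:other": {"agentRuntimeOverride": "pi"},  # matches the default, kept
}
report = sweep_stale_pins(sessions, enabled_harnesses={"codex"}, default_runtime_id="pi")
```

Returning the removed pins rather than silently deleting them would let doctor print what it changed, or offer a dry-run mode. Whether that fits the existing doctor output conventions is something a maintainer would need to judge.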
Related
Keep OpenAI Codex migrations on automatic runtime routing: the upstream fix that made live routing ignore stale pins and added the codex-route doctor cleanup. This issue is about closing the remaining cleanup gap.