Improve stale Codex auth recovery guidance #83937
|
Codex review: needs maintainer review before merge.
Workflow note: Future ClawSweeper reviews update this same comment in place.
Summary
Reproducibility: yes, at source level. Current main maps
PR rating
Rank-up moves:
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
PR egg
Rarity: 🥚 common.
Real behavior proof
Mantis proof suggestion
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Land the bounded stale-route cleanup and doctor-first guidance after maintainer acceptance of the auth-provider/session-state recovery order; keep the linked issue open until this PR merges.
Do we have a high-confidence way to reproduce the issue? Yes, at source level. Current main maps
Is this the best way to solve the issue? Yes, with maintainer acceptance. Clearing only auto-created same-model legacy pins while preserving explicit user overrides is a bounded fix, and the doctor-first message keeps a supported re-auth/configure fallback for genuine missing auth.
Label justifications:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against cf235b209f1e. |
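The bounded cleanup described in the review details can be pictured as a small predicate. This is a minimal sketch, not the PR's actual code: the `SessionPin` shape, its `source` field, and the helpers `splitModelRef`, `isStaleLegacyPin`, and `cleanSessionPin` are all hypothetical names chosen for illustration. Only the provider ids `openai-codex` and `openai` and the `gpt-*` model naming come from the PR discussion.

```typescript
// Hypothetical session pin shape; field names are assumptions for illustration.
interface SessionPin {
  modelRef: string; // e.g. "openai-codex/gpt-5.5"
  source: "auto" | "user"; // "user" marks an explicit override
}

const LEGACY_PROVIDER = "openai-codex";
const CANONICAL_PROVIDER = "openai";

// Split "provider/model" at the first slash; returns null for malformed refs.
function splitModelRef(ref: string): { provider: string; model: string } | null {
  const i = ref.indexOf("/");
  if (i <= 0 || i === ref.length - 1) return null;
  return { provider: ref.slice(0, i), model: ref.slice(i + 1) };
}

// A pin counts as stale only if it was auto-created, points at the legacy
// provider, and names the same gpt-* model that the configured primary now
// uses under the canonical provider.
function isStaleLegacyPin(pin: SessionPin, primaryRef: string): boolean {
  if (pin.source !== "auto") return false; // never touch explicit user overrides
  const pinned = splitModelRef(pin.modelRef);
  const primary = splitModelRef(primaryRef);
  if (!pinned || !primary) return false;
  return (
    pinned.provider === LEGACY_PROVIDER &&
    primary.provider === CANONICAL_PROVIDER &&
    pinned.model.startsWith("gpt-") &&
    pinned.model === primary.model
  );
}

// Returns undefined when the pin should be dropped, so routing falls back to
// the configured primary; otherwise returns the pin unchanged.
function cleanSessionPin(
  pin: SessionPin | undefined,
  primaryRef: string,
): SessionPin | undefined {
  if (!pin) return undefined;
  return isStaleLegacyPin(pin, primaryRef) ? undefined : pin;
}
```

Under these assumptions, an automatic pin to `openai-codex/gpt-5.5` is dropped once the primary is `openai/gpt-5.5`, while a user-set pin to the same model, or an automatic pin to a different model, is left alone.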
|
Adding the relevant Redacted/normalized excerpt: This supports the PR behavior: when an agent run surfaces |
|
Addressed the PR review proof gap.
After-patch safe runtime proof from the PR checkout:
The proof script imports the patched new Error('No API key found for provider "openai-codex".')
This demonstrates the PR branch now produces the new user-facing recovery text without disturbing the production Telegram session.
Additional validation:
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
b340bbf to 9991f1b
|
@clawsweeper re-review Addressed the P2 by preserving doctor-first stale-route guidance and adding the supported openai-codex re-auth/configure fallback. Validation: |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
9991f1b to 9f2557c
|
@clawsweeper re-review Expanded this after Paul’s follow-up: the PR now also prevents stale auto-created legacy Validation: |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
9f2557c to 4eaf34e
|
@clawsweeper re-review Follow-up after CI: adjusted the new prevention test to match existing auth-profile cleanup semantics when the fixture has no stored profile. Product behavior is unchanged: stale auto-created legacy Validation: full |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
4eaf34e to d233ef2
|
Maintainer fixup pushed in e3c8df7. Thanks @pfrederiksen. Verification:
Known proof gap: local Vitest was not used as proof because the laptop runner stayed silent/spun; GitHub CI covered the affected auto-reply shards. |
Summary
- `openai-codex/gpt-*` session route pins when the configured primary has migrated to canonical `openai/gpt-*`
- `openai-codex` overrides while cleaning up stale automatic session state
- `No API key found for provider "openai-codex"`
- `openclaw doctor --fix` as the first stale-route repair, with `openclaw models auth login --provider openai-codex` / `openclaw configure` as fallback guidance
Fixes #83935.
Real behavior proof
Behavior or issue addressed: A live Telegram group session on a real OpenClaw install failed after an update with the stale provider-auth message `Missing API key for provider "openai-codex"`. The working recovery was `openclaw doctor`, not configuring a new API key. This PR now does two things for that path: it prevents stale auto-created `openai-codex/gpt-*` session pins from being reused after migration to `openai/gpt-*`, and it improves the fallback user-facing error if a missing-key failure still gets through.
Real environment tested: OpenClaw gateway on Linux/systemd user service, Telegram group session, updated from `2026.5.12` to `2026.5.18`.
Exact steps or command run after fix:
- `Missing API key for provider "openai-codex"`.
- `openclaw doctor`.
Evidence after fix:
After-patch safe runtime proof from the PR checkout:
The proof script imports the patched `buildKnownAgentRunFailureReplyPayload` and feeds it the exact real failure signature: `No API key found for provider "openai-codex"`.
Observed result after fix: The real setup recovered after doctor, confirming stale `openai-codex` route state should direct users to doctor repair. The source patch additionally clears matching stale automatic session pins before they can keep routing future turns through the legacy provider.
What was not tested: I did not install this branch into the live gateway and intentionally re-break the active Telegram session, because doing that would interrupt the working production assistant session again.
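To make the proof above concrete, here is a rough sketch of how a missing-key failure for `openai-codex` could be mapped to doctor-first guidance. It is an illustration only: `buildRecoveryReply`, the regular expression, and the reply wording are hypothetical stand-ins, not the patched `buildKnownAgentRunFailureReplyPayload`, whose real signature and output are not shown in this PR text. The error message and the three `openclaw` commands are the ones quoted in this PR.

```typescript
// Hypothetical helper; NOT the real buildKnownAgentRunFailureReplyPayload.
const MISSING_KEY_PATTERN = /No API key found for provider "([^"]+)"/;

function buildRecoveryReply(err: Error): string | null {
  const match = MISSING_KEY_PATTERN.exec(err.message);
  if (!match) return null; // not a missing-key failure; leave it to other handlers
  const provider = match[1];
  if (provider !== "openai-codex") {
    // Assumed generic wording for other providers, for illustration only.
    return `No API key is configured for provider "${provider}".`;
  }
  // Doctor first (stale route state), then re-auth or configure as fallback.
  return [
    'This session may still be pinned to a stale "openai-codex" route.',
    "1. Run: openclaw doctor --fix",
    "2. If that does not help: openclaw models auth login --provider openai-codex",
    "3. Or reconfigure with: openclaw configure",
  ].join("\n");
}

// Feed the sketch the same failure signature the proof script uses.
const reply = buildRecoveryReply(
  new Error('No API key found for provider "openai-codex".'),
);
console.log(reply);
```

The ordering is the point of the sketch: the doctor repair comes first because the observed failure was stale route state, and the re-auth and configure commands remain as fallbacks for genuinely missing auth.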
Validation
- `node scripts/run-vitest.mjs src/auto-reply/reply/model-selection.test.ts`
- `node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-execution.test.ts -t "points stale openai-codex missing-key failures at doctor repair"`
- `git diff --check`
- `4eaf34e3f5d71679b9d766e249c34a7b27447efa`