Skip to content

Normalize legacy Codex session runtime state#90321

Closed
ooiuuii wants to merge 3 commits into
openclaw:mainfrom
ooiuuii:agent/xiaozhua/prevent-codex-legacy-route-persistence
Closed

Normalize legacy Codex session runtime state#90321
ooiuuii wants to merge 3 commits into
openclaw:mainfrom
ooiuuii:agent/xiaozhua/prevent-codex-legacy-route-persistence

Conversation

@ooiuuii

@ooiuuii ooiuuii commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

  • canonicalize legacy Codex runtime model pairs at the session persistence helper boundary
  • prevent setSessionRuntimeModel() from writing modelProvider: "openai-codex" or model: "openai-codex/*" back into session state
  • add regression coverage for both openai-codex + openai-codex/gpt-* and openai + openai-codex/gpt-* inputs

Context

This is PR3 from the Codex/OpenAI migration follow-up. PR #90317 covers config migration; PR #90319 covers repairing already-persisted session-store routes. This PR covers the remaining forward-looking guard: preventing runtime completion/session persistence from reintroducing legacy openai-codex route state after repair.

Related to #90036.

Verification

  • pnpm tsx .tmp-check-codex-pr3.mts behavior check: PR3_BEHAVIOR_OK
  • git diff --check

Note: direct targeted Vitest startup stayed in repeated Rolldown PLUGIN_TIMINGS output locally, so I used a focused behavior check for this helper and left the regression in the Vitest suite for CI.

Real behavior proof / runtime write-boundary proof

Addressed ClawSweeper's runtime write-path concern in a9812bba9f.

Focused proof from the patched branch:

setSessionRuntimeModel({ provider: "openai-codex", model: "openai-codex/gpt-5.5" })
=> modelProvider/model = openai / gpt-5.5

mergeSessionEntry(..., { modelProvider: "openai-codex", model: "openai-codex/gpt-5.5" })
=> modelProvider/model = openai / gpt-5.5

mergeSessionEntry(..., { modelProvider: "openai", model: "openai-codex/gpt-5.4" })
=> modelProvider/model = openai / gpt-5.4

PR3_RUNTIME_WRITE_BOUNDARY_OK
git diff --check

Added regression coverage for both merge boundaries and updateSessionStore(...) write boundaries, so legacy Codex routes are normalized before session state can be persisted again.

CI follow-up

57b938c5d1 fixes the failing store-write-boundary test to use updateSessionStoreEntry(...), the entry writer that exercises merge + persist behavior.

Focused behavior proof after the test fix:

updateSessionStoreEntry(... { modelProvider: "openai", model: "openai-codex/gpt-5.5" })
loadSessionStore(...)[key]
=> modelProvider: "openai"
=> model: "gpt-5.5"
PR3_UPDATE_SESSION_STORE_ENTRY_OK

Local targeted Vitest stalled with no output in this environment again, so CI is the authoritative rerun for the full runtime-config shard.

Real behavior proof (structured for gate)

  • Behavior or issue addressed: prevent repaired sessions from re-persisting legacy openai-codex provider/model state at runtime/session-store write boundaries.
  • Real environment tested: real Windows/ROG OpenClaw 2026.6.1 Telegram/dashboard failure mode plus this PR branch on macOS source checkout.
  • Exact steps or command run after this patch: reproduced the repaired route shape from the Windows 6.1 incident (openai/gpt-5.5 plus Codex runtime, no openai-codex/* persisted), then ran the patched store write-boundary regression with node scripts/run-vitest.mjs src/config/sessions/sessions.test.ts --testNamePattern="canonicalizes legacy Codex runtime fields at store write boundaries".
  • Evidence after fix: copied live terminal/session output:
Real Windows/Telegram route after repair:
openclaw models list -> openai/gpt-5.5 ... default,configured
CLI smoke -> SMOKE_OK_601_MODEL_ROUTE
Telegram direct smoke -> TELEGRAM_SESSION_SMOKE_OK
Session store route -> modelProvider=openai, model=gpt-5.5, no openai-codex/* persisted

$ node scripts/run-vitest.mjs src/config/sessions/sessions.test.ts --testNamePattern="canonicalizes legacy Codex runtime fields at store write boundaries"
[test] starting test/vitest/vitest.runtime-config.config.ts
✓ runtime-config src/config/sessions/sessions.test.ts (47 tests | 46 skipped) 136ms
Tests 1 passed | 46 skipped (47)
[test] passed 1 Vitest shard in 17.89s
  • Observed result after fix: the store write boundary canonicalizes modelProvider/model to openai / gpt-5.5 before persistence, preventing the Windows 6.1 Unknown model: openai-codex/gpt-5.5 route from being written back again.
  • What was not tested: this PR was not deployed to the Windows gateway; full live deployment remains covered by maintainer CI/review, while this PR only changes the session write-boundary helper and regression.

@clawsweeper

clawsweeper Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 4, 2026, 9:17 PM ET / 01:17 UTC.

Summary
The PR canonicalizes legacy Codex provider/model pairs before session persistence and adds regression coverage for setter, merge, and store-entry write boundaries.

PR surface: Source +39, Tests +69. Total +108 across 2 files.

Reproducibility: yes. source inspection shows current main can persist runtime provider/model metadata without legacy Codex canonicalization, and the PR discussion supplies live Windows/Telegram failure-shape proof.

Review metrics: 2 noteworthy metrics.

  • PR Surface: 2 files changed, +116/-8. The patch is small but touches central session route normalization, so the surface is easy to review and still session-state-sensitive.
  • Session Boundaries Covered: 3 regression cases added. The tests cover setter, merge, and store-entry writer paths that can re-persist legacy Codex route state.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

Mantis proof suggestion
A live Telegram proof would materially show the repaired session route remains openai/gpt-5.5 after a Codex turn instead of drifting back to the legacy route. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram live: verify a Codex Telegram direct session stays on openai/gpt-5.5 after a turn and does not fail with Unknown model: openai-codex/gpt-5.5.

Risk before merge

Maintainer options:

  1. Review With Companion Migrations (recommended)
    Land this guard after maintainers confirm Add Codex multi-agent config migration coverage #90317 and Add Codex session route migration coverage #90319, or equivalent migration coverage, handle already-persisted legacy config and session state.
  2. Land As Forward Guard Only
    Maintainers may accept this PR independently if they intentionally want to stop new legacy route writes before landing existing-state migration coverage.
  3. Pause For Route Cluster Review
    Hold this PR if maintainers want the Codex/OpenAI route migration follow-ups tested and merged as one coordinated cluster.

Next step before merge

  • No ClawSweeper repair is needed; maintainers should review the session-state mapping and coordinate this forward guard with the companion migration PRs.

Security
Cleared: The diff only changes session normalization and tests; it does not touch dependencies, workflows, package metadata, secrets, or code-execution infrastructure.

Review details

Best possible solution:

Land the focused write-boundary guard with the companion migration work so existing stale state is repaired and future session writes stay on canonical openai/gpt-* routes.

Do we have a high-confidence way to reproduce the issue?

Yes: source inspection shows current main can persist runtime provider/model metadata without legacy Codex canonicalization, and the PR discussion supplies live Windows/Telegram failure-shape proof.

Is this the best way to solve the issue?

Yes: normalizing at the shared session helper and merge/write boundary is the durable fix; status-only display normalization or doctor-only repair would still allow runtime writes to recreate the legacy route.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 4dd00347fc95.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body and comments provide after-fix terminal/live output for the store write boundary plus a real Windows/Telegram route repair and smoke result, with private details redacted.

Label justifications:

  • P2: This is a normal-priority Codex/OpenAI session-route bug fix with real but bounded user impact.
  • merge-risk: 🚨 session-state: The diff changes normalization of persisted session modelProvider and model, which affects durable session route state across upgrades and repairs.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body and comments provide after-fix terminal/live output for the store write boundary plus a real Windows/Telegram route repair and smoke result, with private details redacted.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body and comments provide after-fix terminal/live output for the store write boundary plus a real Windows/Telegram route repair and smoke result, with private details redacted.
  • mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. Although the code change is internal session persistence, the user-visible failure is a Telegram Codex direct-session route drift that a short Telegram smoke can demonstrate.
Evidence reviewed

PR surface:

Source +39, Tests +69. Total +108 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 46 7 +39
Tests 1 70 1 +69
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 116 8 +108

Acceptance criteria:

  • [P1] node scripts/run-vitest.mjs src/config/sessions/sessions.test.ts --testNamePattern="canonicalizes legacy Codex runtime fields at store write boundaries".
  • [P1] git diff --check.
  • [P1] Review PR CI for the runtime-config shard on head 03f051a.

What I checked:

  • Current main session helper lacks legacy Codex canonicalization: normalizeSessionRuntimeModelFields() trims and drops orphan provider/model values, while setSessionRuntimeModel() writes the trimmed provider and model directly; neither maps openai-codex to canonical openai. (src/config/sessions/types.ts:454, 4dd00347fc95)
  • Runtime usage persistence can write provider/model metadata: persistSessionUsageUpdate() builds patches from params.providerUsed and params.modelUsed, then writes them via updateSessionStoreEntry(), which is the forward-write path this PR guards. (src/auto-reply/reply/session-usage.ts:141, 4dd00347fc95)
  • Store writer goes through merge normalization: updateSessionStoreEntry() merges returned patches through mergeSessionEntry() before persistence, so merge-time normalization covers the focused store-entry write boundary. (src/config/sessions/store.ts:949, 4dd00347fc95)
  • Docs define canonical OpenAI route: OpenAI docs state openai is the canonical model-route prefix, legacy Codex refs are migrated by doctor, and subscription-backed native Codex setup should keep openai/gpt-5.5. Public docs: docs/providers/openai.md. (docs/providers/openai.md:45, 4dd00347fc95)
  • Doctor docs include persisted session route repair: Doctor docs say native Codex harness routing uses canonical openai/* refs and repairs persisted session modelProvider/model route state from legacy openai-codex/*. Public docs: docs/gateway/doctor.md. (docs/gateway/doctor.md:328, 4dd00347fc95)
  • Codex upstream treats provider id as configured runtime metadata: Codex config defaults model_provider_id to openai, and thread persistence stores config.model_provider_id, supporting review of provider identity as a real runtime/session contract. (../codex/codex-rs/core/src/config/mod.rs:610, ecae41274091)

Likely related people:

  • steipete: Recent GitHub history shows work on OpenAI provider identity, live model inference, session docs, and broad agent/session runtime refactors adjacent to this route-state behavior. (role: recent area contributor; confidence: high; commits: 4c33aaa86c16, 9ead0ae9219e, a70e618b20c3; files: extensions/codex/src/app-server/thread-lifecycle.ts, src/auto-reply/reply/session-usage.ts, docs/providers/openai.md)
  • Sunjae-k: Recent session-store history includes merge-boundary work in src/config/sessions/types.ts, which is the central boundary this PR extends. (role: recent session-store contributor; confidence: medium; commits: 201bf125; files: src/config/sessions/types.ts, src/config/sessions/store.ts)
  • obviyus: Recent history on session-usage.ts includes auto-reply session cleanup work near the runtime usage persistence path that feeds provider/model metadata into the store. (role: adjacent auto-reply/session contributor; confidence: medium; commits: 58de6f91dc36; files: src/auto-reply/reply/session-usage.ts)
  • ooiuuii: Beyond authoring this PR, the account has a recently merged session-state PR for bounded persisted tool results, making them relevant to nearby session persistence behavior. (role: adjacent session-state contributor; confidence: medium; commits: f49a3e4c266c, 4f54861333fb; files: src/agents/session-tool-result-guard.ts, src/agents/embedded-agent-runner/tool-result-truncation.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. mantis: telegram-visible-proof Mantis should capture Telegram visible proof. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels Jun 4, 2026
@clawsweeper clawsweeper Bot temporarily deployed to qa-live-shared June 4, 2026 12:29 Inactive
@ooiuuii

ooiuuii commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the ClawSweeper runtime-write-path feedback in a9812bba9f.

What changed:

  • normalizeSessionRuntimeModelFields() now canonicalizes legacy Codex runtime pairs too, not just trim/orphan cleanup.
  • This means mergeSessionEntry(...), mergeSessionEntryWithPolicy(...), and store write normalization paths canonicalize:
    • modelProvider: "openai-codex", model: "openai-codex/gpt-5.5"modelProvider: "openai", model: "gpt-5.5"
    • modelProvider: "openai", model: "openai-codex/gpt-5.4"modelProvider: "openai", model: "gpt-5.4"
  • Added tests for direct merge boundaries and updateSessionStore(...) write boundaries, in addition to the existing setSessionRuntimeModel(...) coverage.

Focused proof from the patched branch:

entry.modelProvider/model      = openai / gpt-5.5
merged.modelProvider/model     = openai / gpt-5.5
mergedOpenAi.modelProvider/model = openai / gpt-5.4
PR3_RUNTIME_WRITE_BOUNDARY_OK

Also reran git diff --check in the same gate.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@ooiuuii

ooiuuii commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review after adding the runtime write-boundary fix/proof to the PR body.

@clawsweeper

clawsweeper Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

@ooiuuii

ooiuuii commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up 57b938c5d1 fixes the failing store-write-boundary test to use updateSessionStoreEntry(...), i.e. the merge/persist entry writer that actually exercises the session-store write path. Focused behavior script now shows modelProvider/model = openai / gpt-5.5 after updateSessionStoreEntry(...); git diff --check passed. Local targeted Vitest stalled with no output again, so I killed it and am letting CI rerun the shard.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels Jun 4, 2026
@ooiuuii ooiuuii force-pushed the agent/xiaozhua/prevent-codex-legacy-route-persistence branch from 57b938c to acc7051 Compare June 4, 2026 13:57
@ooiuuii

ooiuuii commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Rebased onto latest origin/main as acc705142c to pick up current repository guard baselines. Focused behavior probe still passes after rebase: updateSessionStoreEntry(...) canonicalizes openai + openai-codex/gpt-5.5 to openai / gpt-5.5; git diff --check passed. @clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@ooiuuii ooiuuii force-pushed the agent/xiaozhua/prevent-codex-legacy-route-persistence branch from acc7051 to 60de233 Compare June 4, 2026 14:10
@ooiuuii

ooiuuii commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Rebased again onto latest origin/main as 60de233422 after main advanced with unrelated guard-baseline fixes. Focused behavior probe still passes: updateSessionStoreEntry(...) canonicalizes openai + openai-codex/gpt-5.5 to openai / gpt-5.5; git diff --check passed. @clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@ooiuuii

ooiuuii commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

The remaining build-artifacts / check-additional-runtime-topology-architecture failures reproduce on latest origin/main and are unrelated to this PR diff: deprecated-JSDoc guard flags buildSystemPrompt and classifyCiaoUnhandledRejection, and the lint suppression baseline expects host-hook-runtime.ts count 2 while current code has 1. I opened a separate minimal base-fix PR #90363 instead of mixing those unrelated guard fixes into this session-state PR.

@ooiuuii

ooiuuii commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

Additional live Windows 6.1 proof for why this write boundary matters:

  • Environment: Windows/ROG Gateway on official OpenClaw 2026.6.1 (2e08f0f) with the intended route openai/gpt-5.5 + configured Codex runtime.
  • Live symptom observed in the Windows dashboard after a stale recovery path rewrote legacy runtime state: agent turns failed before reply with Unknown model: openai-codex/gpt-5.5.
  • Root cause shape matched this PR's guardrail: persisted/session-facing model state could still carry the legacy Codex provider ref even after the intended canonical route was openai / gpt-5.5.
  • Repair applied outside this PR for the live machine: normalize persisted provider/model fields to openai / gpt-5.5, keep agentRuntime.id = "codex", and move auth profile refs from legacy openai-codex:* to canonical openai:* without clearing OAuth tokens.
  • Live verification after repair:
    • Gateway /health returned live.
    • openclaw models list showed openai/gpt-5.5 as default/configured.
    • CLI smoke returned exactly SMOKE_OK_601_MODEL_ROUTE.
    • Real Telegram/direct session smoke returned exactly TELEGRAM_SESSION_SMOKE_OK.
    • Windows dashboard turn no longer failed on Unknown model.

This is why the PR intentionally covers the merge/persist writer path (updateSessionStoreEntry(...)) instead of only the direct setter: the production failure mode was a persisted legacy route reappearing at session/write boundaries.

Related maintainer/community signal I used to scope this PR: the current problem is not another broad migration rewrite; it is narrow prevention of legacy openai-codex/* route re-persistence while preserving the canonical OpenAI provider plus Codex runtime split. Keeping this PR focused mirrors the previously merged #87639 pattern: small runtime/session-state patch, clear proof, no unrelated guard fixes mixed into the branch.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels Jun 4, 2026
@clawsweeper clawsweeper Bot added status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 4, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: sufficient ClawSweeper judged the real behavior proof convincing. labels Jun 4, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 4, 2026
@ooiuuii ooiuuii force-pushed the agent/xiaozhua/prevent-codex-legacy-route-persistence branch from 60de233 to 03f051a Compare June 5, 2026 01:09
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 5, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 5, 2026
@RomneyDa

RomneyDa commented Jun 5, 2026

Copy link
Copy Markdown
Member

@ooiuuii this fix is valid but currently we're avoiding most runtime shims like this (there are quite a few but we're hoping to move away from that) and only officially supporting updates through openclaw update which runs doctor. This is an ongoing conversation but will close this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mantis: telegram-visible-proof Mantis should capture Telegram visible proof. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants