fix: self-heal lane wedges + restore openai-codex OAuth on embedded path by Totalsolutionsync · Pull Request #84752 · openclaw/openclaw

Totalsolutionsync · 2026-05-21T01:00:42Z

Summary

Four related fixes for failure modes that can take a self-hosted gateway's Telegram channel offline and require a manual restart to recover. All four were diagnosed and validated in production on a v2026.5.19 build. One is diagnostics/lane-queue self-healing, one is Telegram polling self-healing, and two restore OAuth resolution for the openai-codex provider on embedded-agent paths after a regression between 2026.5.12 and 2026.5.19.

The four functional commits are independent and can be split if preferred, but they share one theme: make the gateway self-heal from transient infrastructure blips instead of wedging until a human restarts it. The final commit is review polish: it preserves the active-abort recovery flag for the newly recovery-eligible terminal embedded-run case and removes local emergency-patch wording from source comments.

1. `fix(diagnostic): pump lane on idle + recover from terminal-progress stalls`

Two issues in the per-lane command queue:

Lane pump on idle (src/logging/diagnostic.ts): logSessionStateChange() decrements queueDepth when a lane returns to idle but never re-triggers the dequeue. Normally drainLane() re-fires recursively, but in production we observed lanes that go idle with queueDepth > 0 and never dequeue, stranding queued messages. Fix: on idle transition with queueDepth > 0 and a known sessionKey, call resetCommandLane(resolveEmbeddedSessionLane(sessionKey)). It is a no-op when the lane queue is already empty.
Stall recovery for terminal active work (src/logging/diagnostic-session-attention.ts): classifySessionAttention() flagged queued_behind_terminal_active_work as recoveryEligible: false, so the existing recovery coordinator never fired and the lane wedged (recovery=none). Fix: mark it recoveryEligible: true so the existing recovery path runs.

The follow-up commit also keeps allowActiveAbort: true when the terminal embedded-run case has crossed the abort threshold; otherwise the recovery runtime can still skip because it sees an active embedded run.
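The lane pump can be illustrated with a small model. This is a minimal sketch, not the openclaw source: the `Lane` shape, the `onSessionIdle` hook, and the bodies of `drainLane`, `resetCommandLane`, and `resolveEmbeddedSessionLane` are assumptions made for illustration, and a stale `activeTaskIds` entry stands in for whatever left the real lane wedged.

```typescript
// Hypothetical, simplified model of a per-lane command queue.
interface Task {
  id: string;
  run: () => void;
}

interface Lane {
  queue: Task[];
  activeTaskIds: string[];
  generation: number;
}

const lanes: Record<string, Lane> = {};

function getLane(name: string): Lane {
  let lane = lanes[name];
  if (!lane) {
    lane = { queue: [], activeTaskIds: [], generation: 0 };
    lanes[name] = lane;
  }
  return lane;
}

// Runs queued tasks unless the lane believes work is still in flight.
function drainLane(name: string): void {
  const lane = getLane(name);
  if (lane.activeTaskIds.length > 0) return;
  while (lane.queue.length > 0) {
    const task = lane.queue.shift()!;
    lane.activeTaskIds.push(task.id);
    try {
      task.run();
    } finally {
      lane.activeTaskIds = [];
    }
  }
}

// Stand-in for resetCommandLane: bump the generation, clear stale
// active-task state, and re-invoke the drain.
function resetCommandLane(name: string): void {
  const lane = getLane(name);
  lane.generation += 1;
  lane.activeTaskIds = [];
  drainLane(name);
}

// Stand-in for resolveEmbeddedSessionLane; the real mapping is not shown in the PR.
function resolveEmbeddedSessionLane(sessionKey: string): string {
  return "session:" + sessionKey;
}

// Idle-transition hook: pump only when work is queued and the session is known.
function onSessionIdle(sessionKey: string | undefined, queueDepth: number): boolean {
  if (queueDepth <= 0 || !sessionKey) return false;
  resetCommandLane(resolveEmbeddedSessionLane(sessionKey));
  return true;
}
```

In this model the pump is harmless on a healthy lane: with an empty queue the drain loop simply does not execute.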

2 & 3. `fix(auth): ... legacy OAuth sidecars in secrets-runtime store load` (+ follow-up for all entry points)

Regression: after upgrading 2026.5.12 -> 2026.5.19, embedded agent turns (channel replies, cron-isolated runs) fail with No API key found for provider "openai-codex" even though the OAuth profile is valid and direct CLI inference works with the same profile.

Root cause: PR #82777 removed sidecar runtime support; follow-up #83312 reintroduced it via a helper used by the OAuth manager's refresh path. The parallel secrets-runtime store-load helpers were still defaulting resolveLegacyOAuthSidecars: false, so OAuth profiles whose credential material lives in the legacy sidecar layout (oauthRef.source: "openclaw-credentials", hash-named files under <state-dir>/credentials/auth-profiles/<id>.json) were loaded without access/refresh tokens.

Fix (src/agents/auth-profiles/store.ts): default resolveLegacyOAuthSidecars to true in:

loadAuthProfileStoreForSecretsRuntime
loadAuthProfileStoreWithoutExternalProfiles
ensureAuthProfileStoreWithoutExternalProfiles

These helpers are read-only and do not mutate persisted state; they only include credential material that is already on disk.
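The effect of the default flip can be sketched as follows. This is an illustrative model under stated assumptions, not the code in src/agents/auth-profiles/store.ts: `StoredProfile`, `SidecarTokens`, and `loadStoreSketch` are hypothetical names, and the sidecar lookup is reduced to an in-memory record keyed by id.

```typescript
// Hypothetical shapes; the real store types are not shown in the PR.
interface SidecarTokens {
  access: string;
  refresh: string;
}

interface StoredProfile {
  provider: string;
  type: "oauth";
  oauthRef?: { source: string; id: string };
  access?: string;
  refresh?: string;
}

interface LoadOptions {
  resolveLegacyOAuthSidecars?: boolean;
}

// Read-only load: returns new objects and never writes back to `profiles`
// or `sidecars`.
function loadStoreSketch(
  profiles: StoredProfile[],
  sidecars: Record<string, SidecarTokens>,
  options: LoadOptions = {},
): StoredProfile[] {
  // The fix described above: default to true instead of false.
  const resolve = options.resolveLegacyOAuthSidecars ?? true;
  return profiles.map((profile) => {
    if (!resolve || !profile.oauthRef) return { ...profile };
    const tokens = sidecars[profile.oauthRef.id];
    return tokens ? { ...profile, ...tokens } : { ...profile };
  });
}
```

Callers that still want the old behavior can pass `{ resolveLegacyOAuthSidecars: false }` explicitly; only the default changes.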

4. `fix(telegram): restart isolated polling cycle when bot loses initialized state`

After a network drop mid-cycle, the grammY bot can end up in a not-initialized state (every call fails with Bot not initialized!) without the polling session noticing. Every subsequent spooled-update handler can then fail with the same error in a tight retry loop, recoverable only by an external gateway restart.

Fix (extensions/telegram/src/polling-session.ts): the spool-failure path detects the Bot not initialized error and asks the active isolated ingress cycle to abort via a one-shot callback. The existing try/finally tears the cycle down, and the outer runUntilAbort loop creates a fresh bot and re-runs bot.init().
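The signaling pattern can be sketched with a synchronous toy model. Everything here is an assumption made for illustration: `FakeBot` is not grammY, `PollingSessionSketch` is not `TelegramPollingSession`, and the real worker, spool, and teardown are far richer. The sketch only shows a one-shot callback registered on cycle entry, invoked on the matching error, and cleared in `finally`.

```typescript
// Toy bot that throws the same message text as the error described above.
class FakeBot {
  initialized = false;
  init(): void {
    this.initialized = true;
  }
  handleUpdate(update: string): string {
    if (!this.initialized) throw new Error("Bot not initialized!");
    return "handled:" + update;
  }
}

class PollingSessionSketch {
  botsCreated = 0;
  handled: string[] = [];
  private spool: string[];
  private simulateDrop: boolean;
  private requestCycleRestart: (() => void) | undefined;

  constructor(spool: string[], simulateDrop: boolean) {
    this.spool = spool.slice();
    this.simulateDrop = simulateDrop;
  }

  // Outer loop: a fresh bot per cycle, repeated only while a restart is requested.
  runUntilAbort(maxCycles: number): void {
    for (let cycle = 0; cycle < maxCycles; cycle += 1) {
      const bot = this.createPollingBot();
      if (!this.runIsolatedIngressCycle(bot)) return;
    }
  }

  private createPollingBot(): FakeBot {
    this.botsCreated += 1;
    const bot = new FakeBot();
    // Simulate the first bot having lost its initialized state.
    if (!(this.simulateDrop && this.botsCreated === 1)) bot.init();
    return bot;
  }

  private runIsolatedIngressCycle(bot: FakeBot): boolean {
    const state = { restartRequested: false, workerStopped: false };
    this.requestCycleRestart = () => {
      state.restartRequested = true;
      state.workerStopped = true; // stands in for worker.stop()
    };
    try {
      while (!state.workerStopped && this.spool.length > 0) {
        try {
          this.handled.push(bot.handleUpdate(this.spool[0]));
          this.spool.shift();
        } catch (error) {
          // Sketch only: unrelated failures end the cycle instead of retrying.
          if (!this.releaseFailedSpooledUpdate(error)) break;
        }
      }
    } finally {
      this.requestCycleRestart = undefined; // no stale handle for the next cycle
    }
    return state.restartRequested;
  }

  // Keeps the update in the spool; asks for a restart only on the exact substring.
  private releaseFailedSpooledUpdate(error: unknown): boolean {
    const message = error instanceof Error ? error.message : String(error);
    if (message.indexOf("Bot not initialized") === -1) return false;
    if (this.requestCycleRestart) this.requestCycleRestart();
    return true;
  }
}
```

Note that the failed update is never dropped: it stays at the head of the spool and is handled by the fresh bot on the next cycle.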

Real behavior proof

Behavior or issue addressed: A self-hosted OpenClaw gateway could wedge after transient infrastructure failures: embedded OpenAI Codex OAuth profiles failed to resolve legacy sidecar tokens in cron/Telegram embedded paths, Telegram polling could loop on Bot not initialized! after a network drop, and queued lane work could remain stuck behind terminal embedded progress instead of recovering.
Real environment tested: Real self-hosted OpenClaw gateway running OpenClaw 2026.5.19 from the patched fork build 9d27317 on a user systemd gateway service, using Telegram direct messages plus isolated cron agent turns with provider openai-codex / model openai/gpt-5.5. Local paths, PIDs, account IDs, and user identifiers are redacted.
Exact steps or command run after this patch: Activated the patched fork build on the real gateway, confirmed the gateway process was active, sent Telegram messages through the live bot, forced/observed an isolated AgentOS task-board sweep cron using openai/gpt-5.5, then checked the current gateway PID logs for OAuth and bot-init recurrence.

Evidence after fix: Redacted terminal output from the live gateway after activation:

$ openclaw --version
OpenClaw 2026.5.19 (9d27317)

$ readlink <deploy>/current
releases/fork-9d273178-20260521T005128Z

$ systemctl --user show openclaw-gateway.service -p ActiveState -p SubState -p MainPID -p NRestarts
ActiveState=active
SubState=running
MainPID=<pid>
NRestarts=0

$ journalctl --user -u openclaw-gateway.service _PID=<pid> --no-pager | grep -c 'No API key found for provider'
0

Redacted live cron result from the same gateway:

job: AgentOS task board auto-advance sweep
sessionTarget: isolated
payload.model: openai/gpt-5.5
lastRunStatus: ok
lastDeliveryStatus: delivered
provider: openai-codex

Observed result after fix: The gateway kept responding to direct Telegram messages, the isolated cron embedded-agent run completed and delivered with openai-codex, current-PID logs showed zero No API key found for provider occurrences after activation, and no Bot not initialized! retry storm recurred on the patched runtime.
What was not tested: No browser UI flow was exercised. The evidence is from the real gateway runtime, Telegram path, isolated cron path, service state, and current-PID runtime logs; targeted automated tests are listed separately below.

Testing

Local targeted checks run on this branch:

$ git diff --check
# passed

$ node scripts/test-projects.mjs src/logging/diagnostic-session-attention.test.ts src/logging/diagnostic.test.ts
# 54 tests passed across unit-fast + logging shards

$ node scripts/test-projects.mjs src/agents/auth-profiles/profiles.test.ts src/agents/auth-profiles.store-cache.test.ts src/commands/doctor-auth-oauth-sidecar.test.ts
# 23 tests passed across commands + agents shards

$ node scripts/test-projects.mjs extensions/telegram/src/polling-session.test.ts
# 42 tests passed

$ pnpm check:import-cycles
# Import cycle check: 0 runtime value cycle(s).

Notes

Cherry-picked cleanly onto current main with no conflicts.
No deploy-specific scripts, secrets, local paths, or operational state are included in the diff.
Happy to split into separate PRs (diagnostics vs auth vs telegram) if that is easier to review.

…talls

Two related fixes for a recurring failure mode where the gateway's per-lane queue gets stuck with items waiting but no agent picks them up, observed in production today as Ghost going silent on Telegram for ~10 min at a time.

Layer 1 (lane pump on idle, src/logging/diagnostic.ts): `logSessionStateChange()` decrements `queueDepth` when the lane returns to idle but does not re-trigger the lane's dequeue. In normal operation `drainLane()` re-fires recursively after each task completes, so a fresh pump is not needed. In production we have seen lanes go `idle` with `queueDepth > 0` (typically after an embedded_run ends with terminal progress) and never dequeue, leaving queued user messages stranded.

Fix: on idle transition with `queueDepth > 0` and a known sessionKey, call `resetCommandLane(resolveEmbeddedSessionLane(sessionKey))`. This bumps the lane generation, clears any stale `activeTaskIds`, and re-invokes `drainLane`. It is a no-op when the lane queue is already empty, so it is safe as a belt-and-suspenders pump.

Layer 2 (stalled-session recovery for terminal active work, src/logging/diagnostic-session-attention.ts): `classifySessionAttention()` flags the `queued_behind_terminal_active_work` case (active embedded_run that emitted a terminal progress signal such as `rawResponseItem/completed` while `queueDepth > 0`) as `recoveryEligible: false`, so the existing recovery coordinator (`requestStuckSessionRecovery` at diagnostic.ts:1137) never fires: the detector logged `recovery=none` and the lane wedged forever.

Fix: mark this case `recoveryEligible: true`. The terminal progress signal indicates the active turn is effectively done, so the recovery coordinator's existing `release_lane` path is the right action: it releases the lane without aborting any healthy in-flight work. Widened the `session.stalled` discriminant's `recoveryEligible` type from `false` to `boolean` to allow future per-case overrides.

Test update: `diagnostic-session-attention.test.ts` case "queued behind terminal embedded progress" updated to expect `recoveryEligible: true`, pinning the new (correct) classification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
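The classification change in Layer 2 can be sketched as below. This is a hedged illustration, not the real `classifySessionAttention()`: the `SessionSnapshot` fields, the `ABORT_THRESHOLD_MS` value, and `planRecoverySketch` are assumptions, and the `allowActiveAbort` handling follows the description of the follow-up commit in the PR summary rather than any code shown here.

```typescript
// Hypothetical snapshot of one session's diagnostic state.
interface SessionSnapshot {
  queueDepth: number;
  activeEmbeddedRun: boolean;
  terminalProgressSeen: boolean; // e.g. a rawResponseItem/completed signal
  activeAgeMs: number;
}

interface StalledAttention {
  kind: "session.stalled";
  reason: "queued_behind_terminal_active_work";
  recoveryEligible: boolean; // widened from the literal type `false`
  allowActiveAbort: boolean;
}

// Assumed threshold; the real value is not stated in the PR.
const ABORT_THRESHOLD_MS = 60000;

function classifySessionAttentionSketch(
  snapshot: SessionSnapshot,
): StalledAttention | undefined {
  const queuedBehindTerminalWork =
    snapshot.activeEmbeddedRun &&
    snapshot.terminalProgressSeen &&
    snapshot.queueDepth > 0;
  if (!queuedBehindTerminalWork) return undefined;
  return {
    kind: "session.stalled",
    reason: "queued_behind_terminal_active_work",
    recoveryEligible: true, // previously false, which left recovery=none
    allowActiveAbort: snapshot.activeAgeMs >= ABORT_THRESHOLD_MS,
  };
}

// Stand-in for the recovery coordinator's decision.
function planRecoverySketch(
  attention: StalledAttention | undefined,
): "none" | "release_lane" {
  return attention !== undefined && attention.recoveryEligible
    ? "release_lane"
    : "none";
}
```

With `recoveryEligible` hardcoded to `false`, `planRecoverySketch` would always return `"none"` for this case, which mirrors the wedge described above.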

The embedded agent runner (Telegram replies, cron invocations, any sub-agent dispatch) calls into the secrets runtime to resolve provider auth. That path goes through `loadAuthProfileStoreForSecretsRuntime`, which hardcoded `resolveLegacyOAuthSidecars: false`. As a result, OAuth profiles whose credential material lives in the legacy sidecar layout (`oauthRef.source: "openclaw-credentials"`, hash-named files under `<state>/credentials/auth-profiles/<id>.json`) were loaded without their access/refresh tokens, and `resolveApiKeyForProfile()` fell through to the "No API key found for provider" error.

The OAuth-manager-internal helper added in #83312 already sets this to `true`, but the secrets-runtime path is a parallel entry point: when the embedded agent resolves provider auth for a model turn, it loads the store through this helper, *before* the OAuth manager's own reload would have a chance to compensate. Direct CLI inference is unaffected because it routes through a different store-load path that still sees the material.

Repro (against v2026.5.19 stock):

1. Have an `openai-codex:default` profile with type=oauth and `oauthRef.source = "openclaw-credentials"` (typical for users who onboarded before the sidecar runtime was removed in #82777).
2. Send a Telegram message to the bot, or wait for any cron with an embedded payload to fire.
3. Gateway logs: [diagnostic] lane task error: ... error="Error: No API key found for provider \"openai-codex\". Auth store: .../auth-profiles.json ... Configure auth for this agent (openclaw agents add <id>) or copy only portable static auth profiles from the main agentDir."
4. Meanwhile, `openclaw infer model run --model openai/gpt-5.5 --prompt "say OK"` returns a normal completion using the same OAuth profile.

Fix: flip the hardcoded default in `loadAuthProfileStoreForSecretsRuntime` from `false` to `true`, matching the OAuth-manager helper's choice. Sidecar resolution is read-only and already gated by per-process feature gates downstream, so this is safe to enable unconditionally for the secrets-runtime load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…zed state

After a network drop mid-cycle, the grammy bot can end up in a "not initialized" state without the polling session noticing: every subsequent spooled-update handler fails with `Bot not initialized!` in a tight retry loop (observed at ~500ms cadence), and the only escape is an external gateway restart.

Observed today 2026-05-20 on `The-Ghosts-Shell`: bot init succeeded at 17:14:14, WiFi dropped at 17:16:34 (`ath10k_pci DEAUTH_LEAVING by local choice`), WiFi reconnected at 17:16:49, and by 17:24:14 the bot was firing `Bot not initialized!` on every retry. The OUTER `runUntilAbort` loop never got a chance to recreate the bot + re-run `bot.init()` because nothing inside the cycle signaled "exit/continue"; the spool worker just kept retrying the same dead update forever.

Fix:

1. Add a one-shot `#requestCycleRestartOnBotReinitNeeded` callback on `TelegramPollingSession`. The active `#runIsolatedIngressCycle` populates it on entry with a closure that sets the local `restartRequested = true` and calls `worker.stop()`. Cleared in the existing `finally` cleanup so a future cycle doesn't see a stale handle.
2. In `#releaseFailedSpooledUpdate`, after logging the "keeping for retry" line, detect the substring "Bot not initialized" in the formatted error message. If present, invoke the registered restart callback to ask the cycle to tear itself down cleanly.

The cycle's existing try/finally cleanup (worker.stop, drainOnce, stopBot, unsubscribe, abort-listener removal) already does the right teardown when `restartRequested` is true; this commit only adds the detection + signaling. The outer `runUntilAbort` loop then creates a fresh `TelegramBot` instance via `#createPollingBot()` and re-runs `bot.init()` against the (now-stable) network on the next iteration.

The substring check is intentionally conservative: grammy throws this specific message string from its `BotInfoCacheBase` when `botInfo` is undefined, and we don't want to false-positive on unrelated errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…y points

L4 patched only `loadAuthProfileStoreForSecretsRuntime`, but `ensureAuthProfileStoreWithoutExternalProfiles` (used by the embedded runner via `pi-embedded-runner/run.ts:59` and by `model-provider-auth.ts`) and `loadAuthProfileStoreWithoutExternalProfiles` (used by `model-auth-label`, `pi-auth-discovery`, the models list command, the OAuth manager) are parallel entry points that ALSO needed the same flag flip. Without this follow-up, cron-isolated lanes (`lane=cron-nested`, `lane=session:agent:main:cron:...:run:...`) keep hitting the legacy "No API key found for provider \"openai-codex\"" error path even though direct user-Telegram lanes resolve fine.

Observed live 2026-05-20 17:45 PDT on the L4-patched v2026.5.19 build: direct Telegram replies worked, but the 15-minute AgentOS task-board sweep cron (`9584014c`) fired at 17:45:24 and surfaced the same `FailoverError: No API key found for provider "openai-codex"` to the delivery channel.

Fix:

1. `loadAuthProfileStoreWithoutExternalProfiles`: default `resolveLegacyOAuthSidecars` from `false` to `true`, matching L4's reasoning for the secrets-runtime helper.
2. `ensureAuthProfileStoreWithoutExternalProfiles`: accept the `resolveLegacyOAuthSidecars` option (was unsupported, hardcoded `false` downstream), default to `true`, and forward it through `resolveRuntimeAuthProfileStore` and both `loadAuthProfileStoreForAgent` call sites (requested agentDir + main fallback merge).

These functions are read-only and do not mutate persisted state, so flipping the default is safe: they just include the credential material that's been on disk all along.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

clawsweeper · 2026-05-21T01:02:22Z

Codex review: needs maintainer review before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

Summary
The PR adds Telegram isolated-polling restart recovery, read-only legacy OAuth sidecar resolution for embedded auth store loads, and diagnostic lane recovery for queued terminal embedded runs.

Reproducibility: yes. Source inspection shows current main still disables legacy sidecar resolution on the embedded auth-store paths, and the grammY dependency source confirms the exact not-initialized error the Telegram recovery handles; the PR body also supplies redacted live gateway proof.

PR rating
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Summary: The proof is strong and the patch is focused, with remaining merge readiness mostly tied to draft/required-check workflow rather than a code defect.

Rank-up moves:

none

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

Real behavior proof
Sufficient (terminal): The PR body includes redacted terminal/live gateway proof showing a patched self-hosted gateway running, OpenAI Codex cron delivery succeeding, and no current-PID API-key failures after activation.

Risk before merge

The PR is still marked draft, so it should not land until the author intentionally moves it to maintainer review and the normal required checks for the exact head are satisfied.

Maintainer options:

Decide the mitigation before merge: Keep this focused PR open for maintainer review and land it after the draft and required-check gates are satisfied.
Pause or close: Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge
No ClawSweeper repair lane is needed because the review found no concrete patch defect; remaining action is normal maintainer review on this PR.

Security
Cleared: The diff adds no dependencies, workflow changes, or secret printing; the auth change reads already-persisted legacy OAuth sidecar material through existing store parsing.

Review details

Best possible solution:

Keep this focused PR open for maintainer review and land it after the draft and required-check gates are satisfied.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows current main still disables legacy sidecar resolution on the embedded auth-store paths, and the grammY dependency source confirms the exact not-initialized error the Telegram recovery handles; the PR body also supplies redacted live gateway proof.

Is this the best way to solve the issue?

Yes. The patch stays within existing recovery and auth-store seams, restores read-only compatibility for an already-supported legacy credential path, and avoids adding new config or product policy.

Label changes:

add P1: The PR addresses a reported latest-release gateway/channel outage pattern involving OpenAI Codex OAuth resolution and Telegram polling recovery.
add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes redacted terminal/live gateway proof showing a patched self-hosted gateway running, OpenAI Codex cron delivery succeeding, and no current-PID API-key failures after activation.
add rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster, patch quality is 🐚 platinum hermit, and the proof is strong and the patch is focused, with remaining merge readiness mostly tied to draft/required-check workflow rather than a code defect.
add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes redacted terminal/live gateway proof showing a patched self-hosted gateway running, OpenAI Codex cron delivery succeeding, and no current-PID API-key failures after activation.

Label justifications:

P1: The PR addresses a reported latest-release gateway/channel outage pattern involving OpenAI Codex OAuth resolution and Telegram polling recovery.
rating: 🐚 platinum hermit: Current PR rating is 🐚 platinum hermit because proof is 🦞 diamond lobster, patch quality is 🐚 platinum hermit, and the proof is strong and the patch is focused, with remaining merge readiness mostly tied to draft/required-check workflow rather than a code defect.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes redacted terminal/live gateway proof showing a patched self-hosted gateway running, OpenAI Codex cron delivery succeeding, and no current-PID API-key failures after activation.
proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes redacted terminal/live gateway proof showing a patched self-hosted gateway running, OpenAI Codex cron delivery succeeding, and no current-PID API-key failures after activation.

What I checked:

Current main auth gap: On current main, secrets-runtime and no-external auth-store helpers still pass or default resolveLegacyOAuthSidecars to false, which matches the reported embedded OpenAI Codex OAuth failure path. (src/agents/auth-profiles/store.ts:589, e964987cd20e)
PR auth fix: The PR changes the secrets-runtime/no-external helpers to resolve legacy OAuth sidecars by default while keeping the loads read-only. (src/agents/auth-profiles/store.ts:589, 68542eb63b94)
Telegram dependency contract: grammY 1.43.0 throws the exact Bot not initialized! error from handleUpdate when bot info is absent, so the PR's restart trigger is anchored to the upstream runtime error text.
PR Telegram recovery path: The PR detects the grammY not-initialized error after requeueing a spooled update, requests a one-shot isolated polling cycle restart, and clears the callback in the cycle cleanup. (extensions/telegram/src/polling-session.ts:473, 68542eb63b94)
Diagnostic recovery fix: The PR marks queued_behind_terminal_active_work as recovery-eligible and preserves allowActiveAbort when the abort threshold is crossed, matching the existing stuck-session recovery contract. (src/logging/diagnostic-session-attention.ts:46, 68542eb63b94)
Merge-diff sanity: Git's synthetic merge of the PR head into current main changes only the five expected files and has no whitespace errors. (59917efdf995)

Likely related people:

joshavant: Recent merged PRs changed legacy Codex OAuth sidecar runtime support and Telegram isolated spool recovery on the same central surfaces. (role: recent auth compatibility and Telegram recovery contributor; confidence: high; commits: 8d3027dffa7d, 06f4c97130c6, b7735f88fa27; files: src/agents/auth-profiles/store.ts, extensions/telegram/src/polling-session.ts)
steipete: Recent history shows work on secrets startup, gateway diagnostics, and liveness recovery adjacent to the PR's auth and diagnostic paths. (role: recent adjacent auth/diagnostics contributor; confidence: medium; commits: 0177a4b6c9cc, 669786595d64, e30be460e1a3; files: src/agents/auth-profiles/store.ts, src/logging/diagnostic.ts)
obviyus: Recent Telegram polling history includes spool claim and recovery refactors in the same file the PR modifies. (role: recent Telegram spool contributor; confidence: medium; commits: 89a3b9a07edc, 494517a99054; files: extensions/telegram/src/polling-session.ts)
Eva (agent): Recent commits touched release-stability and diagnostic recovery logic adjacent to the session attention/recovery code path. (role: recent diagnostics contributor; confidence: medium; commits: 08ecc518ecb1, 54f87184f0c5, a792068d9da1; files: src/logging/diagnostic.ts, src/logging/diagnostic-session-attention.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against e964987cd20e.

clawsweeper · 2026-05-21T01:22:52Z

ClawSweeper PR egg

✨ Hatched: 🥚 common Sunspot Shellbean

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

Merged PRs are hatchable.
Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: hums during re-review.
Image traits: location workflow harbor; accessory rollback rope; palette pearl, teal, and neon green; mood calm; pose standing beside its cracked shell; shell starlit enamel shell; lighting tiny status-light glow; background tiny artifact crates.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Sunspot Shellbean in ClawSweeper.

What is this egg doing here?

Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

RomneyDa · 2026-05-21T16:57:36Z

@Totalsolutionsync yes if you could split 4624e34 and 85f36e8 into a separate PR that would be great!

RomneyDa · 2026-05-21T19:14:43Z

@Totalsolutionsync I've flagged those 2 commits as super high priority. If you're able to make a new PR in the next few minutes that would be great. Otherwise I will cherry-pick them into a PR and get this merged for release and give you credit.

…me loaders

The auto-migration introduced in #83312 only fires when a credential is loaded via a path that reads its sidecar tokens. The OAuth refresh manager's internal loader does (so direct CLI inference works and self-heals on first refresh). The embedded runner's secrets-runtime loaders did not:

- loadAuthProfileStoreForSecretsRuntime
- loadAuthProfileStoreWithoutExternalProfiles
- ensureAuthProfileStoreWithoutExternalProfiles

All three opted out of sidecar resolution. So for an upgraded user with a legacy oauthRef-backed openai-codex profile, the credential loaded with no access/refresh material, evaluateStoredCredentialEligibility marked it ineligible, resolveAuthProfileOrder filtered it out, and resolveApiKeyForProvider threw "No API key found for provider 'openai-codex'" before the OAuth manager (and its migration path) was ever consulted. CLI worked, Telegram/cron/embedded turns broke, and only doctor-or-bust would fix it.

Flip the three embedded loaders to default resolveLegacyOAuthSidecars to true (matching loadStoredOAuthRefreshStore). The existing #83312 refresh-and-rewrite then fires on the first embedded turn for these users and persists tokens inline, removing the legacy sidecar from disk on the next doctor pass.

Cherry-picked and squashed from PR #84752 (commits 85f36e8 and 4624e34). Comments noting local-fork bookkeeping stripped per repo policy.

Co-authored-by: Will <totalsolutionspm@gmail.com>
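The failure chain described in this commit message can be traced with a small model. This is a sketch under stated assumptions: the `*Sketch` functions are hypothetical stand-ins for evaluateStoredCredentialEligibility, resolveAuthProfileOrder, and resolveApiKeyForProvider, and eligibility is reduced to "carries an access token" (token refresh is not modeled).

```typescript
// Hypothetical credential shape as seen after a store load.
interface LoadedCredential {
  profileId: string;
  provider: string;
  access?: string;
  refresh?: string;
}

// Stand-in for evaluateStoredCredentialEligibility.
function isEligibleSketch(credential: LoadedCredential): boolean {
  return typeof credential.access === "string" && credential.access.length > 0;
}

// Stand-in for resolveAuthProfileOrder: ineligible profiles are filtered out.
function resolveOrderSketch(
  credentials: LoadedCredential[],
  provider: string,
): LoadedCredential[] {
  return credentials.filter(
    (credential) => credential.provider === provider && isEligibleSketch(credential),
  );
}

// Stand-in for resolveApiKeyForProvider: throws when nothing survives the filter.
function resolveApiKeySketch(
  credentials: LoadedCredential[],
  provider: string,
): string {
  const candidates = resolveOrderSketch(credentials, provider);
  const first = candidates[0];
  if (!first || !first.access) {
    throw new Error('No API key found for provider "' + provider + '"');
  }
  return first.access;
}
```

A profile loaded without sidecar resolution arrives here with no `access` field, so it is filtered out before any refresh logic could run; the same profile loaded with sidecar tokens resolves normally.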


Totalsolutionsync and others added 4 commits May 20, 2026 18:00

openclaw-barnacle Bot added labels May 21, 2026: channel: telegram (Channel integration: telegram); agents (Agent runtime and tooling); size: S; triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.)

fix: preserve active abort recovery on terminal stalls

68542eb

openclaw-barnacle Bot added the label proof: supplied (External PR includes structured after-fix real behavior proof.) and removed the label triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) May 21, 2026

Han-HanqingDong mentioned this pull request May 21, 2026

[Bug]: v2026.5.19 Codex unusable on headless VPS: openai-codex auth binding failure and codex provider Cloudflare 403 #84893

Closed

RomneyDa mentioned this pull request May 21, 2026

fix(auth): skip OAuth refresh adapter when credential has no refresh token #85028

Merged

3 tasks

RomneyDa mentioned this pull request May 21, 2026

fix(auth): load legacy Codex OAuth sidecars in embedded secrets-runtime loaders #85074

Merged

RomneyDa mentioned this pull request May 21, 2026

Legacy Codex OAuth sidecars stored only in macOS Keychain still require doctor for embedded runtime path #85083

Closed

RomneyDa mentioned this pull request May 22, 2026

fix(auth): auto-migrate keychain-only legacy Codex OAuth profiles on first interactive CLI run #85163

Closed

RomneyDa mentioned this pull request May 24, 2026

fix(auth): point Keychain-only legacy Codex OAuth users at doctor instead of auto-prompting #86220

Merged

Totalsolutionsync closed this by deleting the head repository May 25, 2026

Totalsolutionsync wants to merge 5 commits into openclaw:main from Totalsolutionsync:pr/embedded-oauth-lane-resilience
