fix(auth): auto-migrate keychain-only legacy Codex OAuth profiles on first interactive CLI run#85163
fix(auth): auto-migrate keychain-only legacy Codex OAuth profiles on first interactive CLI run#85163RomneyDa wants to merge 14 commits into
Conversation
|
Codex review: found issues before merge. Reviewed May 24, 2026, 7:05 PM ET / 23:05 UTC. Summary PR surface: Source +176, Tests +508, Docs +1. Total +685 across 10 files. Reproducibility: yes. Source inspection shows that on macOS with an interactive TTY and a migratable legacy sidecar, an ordinary command such as Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge Security Review findings
Review detailsBest possible solution: Prefer the doctor-owned path: keep migration in Do we have a high-confidence way to reproduce the issue? Yes. Source inspection shows that on macOS with an interactive TTY and a migratable legacy sidecar, an ordinary command such as Is this the best way to solve the issue? No. The branch solves a real auth migration gap, but moving legacy repair into startup conflicts with current repo policy and the author's later manual-testing conclusion; the doctor-only warning path is the safer maintainable direction unless maintainers explicitly approve the UX change. Full review comments:
Overall correctness: patch is incorrect Codex review notes: model gpt-5.5, reasoning high; reviewed against 3c8d101f5a85. Label changesLabel justifications:
Evidence reviewedPR surface: Source +176, Tests +508, Docs +1. Total +685 across 10 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg 🔥 Warming up: real-behavior proof passed; findings, security review, or rank-up moves are still in progress. Hatch commandComment Hatchability rules:
What is this egg doing here?
|
633f8c1 to
02aeeec
Compare
|
Addressed the P2 finding. Root cause: Fix: Gate the startup hook on the same active oauthRef-backed store detection doctor already runs. New Regression coverage: new test Verification Head: d592e96 @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
…first interactive CLI run Closes the remaining bucket from #85083: macOS users whose legacy `oauthRef` sidecar seed lives only in the Keychain still had to know to run `openclaw doctor --fix` after upgrade, because the embedded-runtime self-heal added in #85074 intentionally runs with `allowKeychainPrompt: false`. This change adds a tiny pre-command trigger in `runCli` that, on darwin + TTY only, reuses the existing `maybeRepairLegacyOAuthSidecarProfiles` doctor flow when a sidecar dir is present. Fast bail-outs make the cost a single `readdirSync` for users without legacy sidecars. Declining writes a 24h cool-off marker in the state dir so we never pester the user every command. Headless paths (cron, systemd, Telegram polling, embedded sub-agent dispatch) still skip the prompt, but the legacy sidecar loader now emits a one-shot warn that names `openclaw doctor --fix` and macOS Keychain, instead of letting the credential silently fall through to a downstream "No API key found". Also drops the user-confusing `sidecar-backed` phrasing from the doctor prompt/notes/changes; existing internal identifier names are left intact.
…d sidecars exist Doctor intentionally retains unreferenced legacy Codex OAuth sidecar files because external agent directories outside the scan may still reference them. Before this change the startup auto-migration hook treated any sidecar JSON as trigger material, so accepting the prompt for an unreferenced-only state did zero work and re-prompted on every later interactive command. Gate the startup hook on the same active oauthRef-backed store detection doctor already runs, so: - Unreferenced-only sidecars no longer trigger a no-op prompt loop. - Doctor's existing retention policy (and explicit `openclaw doctor --fix` flow) stays the canonical place to surface unreferenced sidecar files. - A new regression test exercises the unreferenced-only case end-to-end.
…wConsole.warn cast CI/Linux failures on PR #85163: - `checks-node-agentic-cli`: 2 happy-path auto-migrate tests inherited `CI=true` from the GitHub Actions process env via `createSpawnEnv`, which short-circuited the macOS-interactive skip rules and left `confirm` uncalled. Clear `CI` and related non-interactive signals when building the test state env. - `check-test-types`: `warnSpy: ReturnType<typeof vi.fn>` widens to `Mock<Constructable | Procedure>`, which is not assignable to `rawConsole.warn: typeof console.warn`. Cast at the assignment site.
d592e96 to
781edbd
Compare
The 24h cool-off was too short — users who say "not now" are deliberately opting out, and re-prompting them every day was nag-y. Switch the decline marker to permanent: if the marker file exists, skip the prompt forever. `openclaw doctor --fix` remains the explicit retry path for anyone who changes their mind, and `OPENCLAW_AUTO_MIGRATE_LEGACY_OAUTH_SIDECAR=0` is still the env-level opt-out. Drops `COOL_OFF_TTL_MS`, the `now` plumbing through `shouldSkip`, and the now-dead `changes-driven` marker cleanup (after the migratable-store gate, an accepted run that produced no changes is unreachable). Renames the constant to `DECLINE_MARKER_FILENAME` and updates docs + changelog wording.
The hook must stay above the bare-root Crestodian and gateway fast paths because those branches return early and the embedded runtimes they start cannot prompt for the macOS Keychain on their own. The ordering invariant was easy to miss when restructuring run-main and caused a P2 finding during PR review; pin it with an inline note so future refactors don't regress the gateway/TUI launch paths.
…input `--force` (`openclaw agent prune --force`, `plugins uninstall --force`, `migrate apply --force`) and `--no-input` (`openclaw models scan --no-input`) are documented prompt-suppression flags, but the startup hook was still prompting on those invocations. Add both to the skip path. Consolidate the per-flag `hasFlag` calls into a single `NO_PROMPT_FLAGS` allowlist so the next flag is a one-line addition and the set is grep-able for code review. `hasJsonOutputFlag` stays separate because its parsing rules (`--json=...` value form) differ from `hasFlag`. Regression coverage: - `plugins uninstall pkg --force` - `agent prune old --force` - `models scan --no-input` - `status -- --force` and `status -- --no-input` (positive cases: terminator preserved, prompt still fires)
|
Addressed the P1. Root cause: two more documented no-prompt flags weren't in the skip set —
Fix ( Regression coverage:
Verification Head: @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞👀 Command router queued. I will update this comment with the next step. |
…edentials migrate Previously the decline marker was only written on explicit decline / Ctrl+C. If the user accepted the clack confirm but the migration produced no changes (macOS Keychain "Deny", wrong env seed, ciphertext corruption, etc.) we left no marker, and the next interactive command would re-show the confirm and re-trigger the macOS Keychain prompt — the only spam loop in the hook. Treat any "prompted, no changes happened" outcome as a back-off signal and write the marker. The user is not stuck — `openclaw doctor --fix` is the explicit retry path and `OPENCLAW_AUTO_MIGRATE_LEGACY_OAUTH_SIDECAR=0` silences the hook entirely. Doctor itself still emits the "Could not decrypt legacy OAuth sidecar … re-authenticate this profile" warning during the failed attempt, so the user knows what happened. Drops the now-unused `declined` flag.
Keep only comments whose absence would invite a regression: - `run-main.ts` ordering constraint (hook must precede the gateway / bare-root fast-path returns). - `auto-migrate` zero-changes-writes-marker rationale (the obvious "only on decline" simplification reintroduces the Keychain prompt spam loop). - Test ciphertext-corruption note (so a future refactor doesn't assume the corruption is a bug and "fix" the test). Removes self-evident narration on the `NO_PROMPT_FLAGS` list, test-only assertion explanations, and intermediate setup notes.
|
After some manual testing I feel this is not the right approach, it's an obtrusive UX and feels like an antipattern. Looking into an option where I just make the log more informative. |
…ing comment to all early-returning fast paths
… only consent surface for auto-migrate
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Re. ClawSweeper P1 on The AGENTS.md rule (
OAuth sidecars aren't really "config" in the AGENTS.md sense — they're credentials, and the OAuth credential layer already has well-established runtime self-healing seams. The doctor-only alternative (#86220) is on the table if you'd rather take the conservative UX path, but the contract-violation framing here doesn't match what OAuth-credential code in this repo actually does today. |
|
Once again closing this because it feels pretty intrusive. Will prefer #86220 for now |
|
Closing in favor of #86220 (doctor-only with a one-shot headless warn). Leaving this comment as a retrospective on the concerns that drove the decision, in case the design comes up again later. What this PR tried to doClose the macOS Keychain-only bucket from #85083 by reusing the existing
Why neither shape shippedArchitecture tensionThe AGENTS.md rule
So the policy-violation framing was contestable. But ClawSweeper rated this 🦐 → 🧂 across the two shapes precisely because the pattern here (pop a Keychain dialog before an arbitrary command, mutate auth-profile credentials before the requested command runs) is louder than any of the existing precedents. Even with the AGENTS.md ambiguity resolved, the UX shape was the real blocker. Concrete release risksWhat killed it for me on a release timeline: Tier 1 — could spam users
Tier 2 — surprise, not spam
Tier 3 — design choice that compounds the above
What pushed me overThe author follow-up captures it: "manual testing made this approach feel obtrusive and like an antipattern." ClawSweeper called the same thing. Both shapes traded a real architectural cost (startup-time auth migration with a wide blast surface) for a marginal UX improvement (users don't have to know What's replacing it#86220 — doctor-only path: keeps the one-shot headless If a future maintainer wants the auto-heal UX back, the receipts are above: the precedent argument is real, the policy ambiguity is real, but the release-risk profile (cross-version Keychain re-prompt + marker-write spam loop + destructive-primary surprise dialogs + unguarded dynamic import) needs to be addressed before it's safe to ship. None of those are blockers individually; they're load-bearing collectively. |
Summary
Closes the remaining bucket from #85083: macOS users whose legacy
oauthRefsidecar seed lives only in the Keychain still had to know to runopenclaw doctor --fixafter upgrade, because the embedded-runtime self-heal that landed in #85074 intentionally runs withallowKeychainPrompt: false.The fix adds a tiny pre-command trigger in
runCli(src/cli/auto-migrate-legacy-oauth-sidecar.ts) that, on darwin + TTY only, silently reuses the existingmaybeRepairLegacyOAuthSidecarProfilesdoctor flow when a sidecar dir is present. There is no in-CLI confirm prompt — the macOS Keychain dialog is the only consent surface. On Allow: migration completes inline and the original command proceeds. On Deny or any decryption failure: a permanent decline marker (OPENCLAW_STATE_DIR/legacy-oauth-sidecar-migration-declined) is written so the next CLI run does not re-trigger the OS dialog;openclaw doctor --fixis the explicit retry path.OPENCLAW_AUTO_MIGRATE_LEGACY_OAUTH_SIDECAR=0is an explicit opt-out for power users.Headless paths (cron, systemd, Telegram polling, embedded sub-agent dispatch) still skip the auto-heal entirely — but the legacy sidecar loader now emits a one-shot
log.warnthat namesopenclaw doctor --fixand macOS Keychain, instead of letting the credential silently fall through to a downstreamNo API key found for provider "openai-codex".Also drops the user-confusing
sidecar-backedphrasing from the doctor prompt/notes/changes; existing internal identifier names are left intact.Approach
Approach B from the issue (auto-trigger doctor migration on first interactive
openclaw <anything>invocation), but silent: no clack confirm, no two-step "we're about to prompt you for Keychain access" UX. The macOS OS dialog itself is the explicit consent surface — clicking Allow once both authorizes the binary and triggers migration in the same step. Architecturally this leaves "when is it safe to prompt for Keychain access?" as one decision in one place (the embedded runtime path stays headless; the CLI path lets the OS prompt).Skip rules in order:
process.platform !== "darwin"→!isInteractiveTty(no Keychain dialog on cron/headless) →CI=true || OPENCLAW_NON_INTERACTIVE=1 || OPENCLAW_AUTO_MIGRATE_LEGACY_OAUTH_SIDECAR=0 || OPENCLAW_AUTH_STORE_READONLY=1→--help/--version→ primary isdoctor/update/help/completion/version→--json/--json=…(anywhere in argv before--, keeps clean output) → sidecar dir empty → fresh decline marker. The hook is dynamically imported only on darwin (gated at the call site inrunCli).Skip rules deliberately removed from earlier revisions: the auto-prompt design had a growing skip list for
--non-interactive/--yes/--no-input/--forceandchannels remove --delete— the entire treadmill was about avoiding a clack prompt that no longer exists. With the OS Keychain dialog as the only consent surface, those flag-based skips are unnecessary; the OS dialog appears at most once per machine (Allow → never again; Deny → marker → never again).Headless one-shot warning
Added in
legacy-oauth-sidecar.ts. Fires whenloadLegacyOAuthSidecarMaterialreturns null on darwin withallowKeychainPrompt: false(i.e., env/file seeds were tried and Keychain was blocked). Rate-limited to one warn per process so embedded turns log it once per gateway lifetime, not per attempt. Guarded by the same VITEST env signals as the keychain read so unrelated test runs don't surface it.Why this is safe vs. existing inline credentials
writeOAuthCredentials(provider-auth-helpers.ts:297) builds a fresh credential object with nooauthRef, andupsertAuthProfile(profiles.ts:139) doesstore.profiles[id] = credential— full replacement, not merge. So any user who re-signed in via normal flow has nooauthRefon their profile, and the migration scan (resolveLegacyOAuthSidecarStore,doctor-auth-oauth-sidecar.ts:117-126) only includes profiles wherevalue.oauthRefis a validLegacyOAuthRef. Migration is a no-op for them.persisted.ts:274-279checkshasInlineOAuthTokenMaterialfirst; if inline tokens are present, the legacy sidecar resolver is never called regardless of whatoauthRefsays.oauthRef" to "inline access/refresh/idToken." Same values, same refresh cycle. If the refresh token still works → user is healed. If it's already revoked → user re-signs in via normal flow; migration never makes auth worse.Regression tests
src/cli/auto-migrate-legacy-oauth-sidecar.test.ts— table-driven preflight skip checks for each rule (platform/TTY/env opt-outs/CI/readonly/no-sidecar/primary skips/--jsonvariants), plus the silent happy path (sidecar removed, inline tokens written, no marker), unreferenced-sidecar bail-out, and decryption-failure → marker → second-run-skips. Asserts behavior (file state), not internal prompter calls.src/agents/auth-profiles/legacy-oauth-sidecar.test.ts— exercises the headless warn path: emit once when blocked on darwin with no env/file seeds; never emit on non-darwin.Verification
node scripts/run-vitest.mjs src/cli/auto-migrate-legacy-oauth-sidecar.test.ts src/agents/auth-profiles/legacy-oauth-sidecar.test.ts src/commands/doctor-auth-oauth-sidecar.test.ts src/commands/doctor/repair-sequencing.test.ts— 53/53 green across 5 files.node scripts/run-oxlint.mjsclean on all touched files;oxfmt --checkclean.Scope notes
agents.add.test.ts/stale-oauth-profile-shadows.test.tstest descriptions that still containsidecar-backed; those are dev-facing and reference implementation, not UI copy.OPENCLAW_AUTO_MIGRATE_LEGACY_OAUTH_SIDECARto docs anywhere exceptdocs/gateway/doctor.mdaccordion 5 since it is an escape hatch, not a normal-user knob.maybeRepairLegacyOAuthSidecarProfilesitself beyond the wording rename — the migration semantics are intentionally identical between the doctor-driven and CLI-startup-driven paths. The CLI-startup caller passesemitNotes: falseand aconfirmAutoFix: async () => trueprompter so the doctor function runs silently.Out of scope
openclaw doctor --fixbehavior itself; the migration path it runs is already correct.Related