
fix(controller): recover from interrupted cloud login browser flow #892

Merged
lefarcen merged 5 commits into main from fix/issue-865-cloud-login-stuck
Apr 8, 2026

@alchemistklk
Contributor

What

Re-clicking Login with Nexu account after closing the authorization browser tab now starts a fresh login flow instead of getting stuck for 5 minutes.

Why

Closes #865. When a user started the Nexu cloud-account login and closed the authorization browser tab without completing it, the controller kept polling /api/auth/device-poll for up to 5 minutes and every subsequent click on Login was rejected with "Connection attempt already in progress". The frontend also deliberately short-circuited on that error when status.polling === true, so the UI sat frozen on the waiting spinner and users had a dead-end state with no recovery path.

How

Two small, targeted changes:

Controller (apps/controller/src/store/nexu-config-store.ts)

  • New private helper abortDesktopCloudPolling() that aborts pollingState.abortController and clears the reference.
  • connectDesktopCloud() no longer errors when a poll is already in flight. Instead it calls the helper, clears persisted desktop cloud state, and falls through to a fresh device-register + new poll. The "Already connected" guard is preserved.
  • The three existing inline abort sites (setDesktopCloudProfiles, switchDesktopCloudProfile, disconnectDesktopCloud) now use the shared helper.
  • pollDesktopCloudAuthorization() already handles signal.aborted at every await point and returns silently, so aborting an in-flight poll is race-free.
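The controller change above can be pictured with a minimal sketch. It is a simplified, hypothetical stand-in, not the code in nexu-config-store.ts: the class name, the `registeredDevices` array, the `persistedPolling` flag, and the fake `device-N` IDs are assumptions made for illustration. Only the helper name, the "Already connected" guard, and the abort-then-restart ordering are taken from the description.

```typescript
// Hypothetical, simplified model of the store; not the real implementation.
type PollingStateSketch = { abortController: AbortController; deviceId: string };

class DesktopCloudStoreSketch {
  private pollingState: PollingStateSketch | null = null;
  private connected = false;
  private nextId = 1;
  persistedPolling = false; // stands in for the persisted desktop cloud state
  readonly registeredDevices: string[] = []; // stands in for device-register calls

  // Abort the in-flight poll (if any) and clear the reference.
  private abortDesktopCloudPolling(): void {
    this.pollingState?.abortController.abort();
    this.pollingState = null;
  }

  connectDesktopCloud(): { deviceId: string; signal: AbortSignal } {
    if (this.connected) {
      throw new Error("Already connected"); // guard is preserved
    }
    // A re-click while polling is treated as a retry, not an error.
    this.abortDesktopCloudPolling();
    this.persistedPolling = false;

    const deviceId = `device-${this.nextId++}`; // fresh registration (faked here)
    this.registeredDevices.push(deviceId);
    const abortController = new AbortController();
    this.pollingState = { abortController, deviceId };
    this.persistedPolling = true;
    return { deviceId, signal: abortController.signal };
  }

  disconnectDesktopCloud(): void {
    this.abortDesktopCloudPolling(); // shared helper instead of an inline abort
    this.persistedPolling = false;
    this.connected = false;
  }
}
```

In this model, calling connectDesktopCloud() twice in a row aborts the first attempt's signal and returns a second attempt with a new device ID, which mirrors the "two device-register POSTs" a reviewer would expect in the logs.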

Web (apps/web/src/pages/welcome.tsx)

  • Removed the if (data?.error === "Connection attempt already in progress") { ... } dead-end block in handleAccountLogin. Any residual occurrence now falls through to the existing generic data?.error recovery branch that calls disconnect + reconnect.
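As a rough illustration of the remaining generic recovery branch, the sketch below shows a handler that resets and retries once on any error. The `CloudApiSketch` interface, the `ConnectResultSketch` shape, and the function name are hypothetical; the real handleAccountLogin in welcome.tsx may be structured differently.

```typescript
// Hypothetical client shape; the real API surface is not shown in this PR.
type ConnectResultSketch = { error?: string; browserUrl?: string };

interface CloudApiSketch {
  connect(): Promise<ConnectResultSketch>;
  disconnect(): Promise<void>;
}

async function handleAccountLoginSketch(
  api: CloudApiSketch,
): Promise<ConnectResultSketch> {
  const data = await api.connect();
  if (data?.error) {
    // Generic recovery: no special case for any particular error string.
    // Reset the controller-side state, then try exactly once more.
    await api.disconnect();
    return api.connect();
  }
  return data;
}
```

With the special case gone, a residual "Connection attempt already in progress" response takes the same disconnect-and-reconnect path as every other error instead of leaving the spinner in place.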

Deliberately out of scope: changing the 100-attempt × 3s polling budget (the 5-minute window), adding an explicit "Cancel" button, and any cloud-server-side TTL changes.

Affected areas

  • Desktop app (Electron shell)
  • Controller (backend / API)
  • Web dashboard (React UI)
  • OpenClaw runtime
  • Skills
  • Shared schemas / packages
  • Build / CI / Tooling

Checklist

  • pnpm typecheck passes (controller + web)
  • pnpm lint passes
  • pnpm test passes — pre-existing failures on origin/main in sessions-runtime.test.ts, openclaw-config-compiler.test.ts, openclaw-sync.test.ts, route-compat.test.ts, and one unrelated nexu-config-store.test.ts case (imports cloud profiles and switches active profile while clearing cloud auth). None of the failures touch the cloud-login flow — they reproduce on pristine origin/main without this patch.
  • pnpm generate-types run (if API routes/schemas changed) — n/a, no API routes or schemas changed
  • No credentials or tokens in code or logs
  • No any types introduced

Notes for reviewers

Manual repro to verify the fix:

  1. pnpm dev start
  2. Welcome → Login with Nexu account → external browser opens the authorization page → desktop UI enters waiting state.
  3. Close the browser tab without authorizing.
  4. Click Login with Nexu account again.
    • Expected (fixed): a new browser tab opens with a fresh device_id, desktop UI re-enters waiting state immediately, no error toast.
    • Before fix: nothing happens / "Connection attempt already in progress" persists for up to 5 minutes.
  5. Complete authorization in the new tab and verify navigation to /workspace.
  6. Tail pnpm dev logs controller to confirm two device-register POSTs and that the first poll loop exited via the aborted signal without writing state afterward.

Please also double-check: mid-poll Disconnect still works (now uses the shared abortDesktopCloudPolling helper instead of inline abort).

When the user closes the authorization browser tab without completing
device login, the controller's 5-minute poll kept running and every
subsequent click on "Login with Nexu account" was rejected with
"Connection attempt already in progress", leaving the UI stuck.

Treat a re-click while a poll is in flight as an explicit retry: abort
the in-flight poll, clear the persisted polling flag, and start a fresh
device registration. Remove the matching dead-end branch in the welcome
page so the UI no longer swallows the retry.

Fixes #865
@sentry

sentry Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 5.71429% with 33 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
apps/controller/src/store/nexu-config-store.ts 5.71% 33 Missing ⚠️

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2015f301cf

Comment thread apps/controller/src/store/nexu-config-store.ts
lefarcen added 2 commits April 8, 2026 11:46
The polling reset block in connectDesktopCloud() was missing
`userId: null` while every other reset site (polling expired/timeout
branches, disconnect, switch profile) clears it. Without this the
persisted state could keep a stale userId after aborting an in-flight
device login, leaving UI metadata out of sync with the rest of the
cloud profile fields.
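One way to keep reset sites from drifting apart is to route them through a single function, sketched below. This is an assumption about a possible shape, not what the commit does: only `userId` and a polling flag are named in this PR, while `userName`, `userEmail`, and the function name are hypothetical fields added for illustration.

```typescript
// Hypothetical state shape; only userId and polling appear in the PR text.
interface DesktopCloudStateSketch {
  polling: boolean;
  userId: string | null;
  userName: string | null; // assumed field
  userEmail: string | null; // assumed field
}

// Returns a copy with every login-attempt field cleared, so no reset
// site can forget one of them (such as userId).
function resetDesktopCloudLoginFields(
  state: DesktopCloudStateSketch,
): DesktopCloudStateSketch {
  return {
    ...state,
    polling: false,
    userId: null,
    userName: null,
    userEmail: null,
  };
}
```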
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 8, 2026

Deploying nexu-docs with Cloudflare Pages

Latest commit: 96c8707
Status: ✅  Deploy successful!
Preview URL: https://a9518680.nexu-docs.pages.dev
Branch Preview URL: https://fix-issue-865-cloud-login-st.nexu-docs.pages.dev

lefarcen added 2 commits April 8, 2026 11:54
Address codex review on PR #892: when a stale poll's success/expired/
maxAttempts branch is processing in parallel with a fresh
connectDesktopCloud() call, the old loop's "this.pollingState = null +
setDesktopCloudState(...)" can clobber the new attempt's pollingState
and persisted credentials.

Identify the active poll by AbortSignal identity. Each final-state
write now no-ops if the loop has been aborted or replaced, so the new
device flow keeps full ownership of the polling state.
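The ownership check described in this commit message can be sketched as follows. The class and method names are hypothetical and the credentials are reduced to a string; the point is only to show how comparing AbortSignal identity lets a stale loop detect that it has been replaced.

```typescript
// Hypothetical model of "identify the active poll by AbortSignal identity".
class PollOwnershipSketch {
  private pollingState: { abortController: AbortController } | null = null;
  persistedCredentials: string | null = null;

  // Starts a new attempt, aborting and replacing any previous one.
  startAttempt(): AbortSignal {
    this.pollingState?.abortController.abort();
    const abortController = new AbortController();
    this.pollingState = { abortController };
    return abortController.signal;
  }

  // True only for the loop whose signal is still the active, unaborted one.
  private ownsPolling(signal: AbortSignal): boolean {
    return (
      !signal.aborted &&
      this.pollingState?.abortController.signal === signal
    );
  }

  // Final-state write: a no-op when called by an aborted or replaced loop.
  completeAttempt(signal: AbortSignal, credentials: string): boolean {
    if (!this.ownsPolling(signal)) {
      return false;
    }
    this.pollingState = null;
    this.persistedCredentials = credentials;
    return true;
  }
}
```

Because each AbortController exposes a single signal object, a reference comparison is enough to tell the attempts apart, and no extra attempt counter or ID is needed in this model.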

The "detects SKILL.md removal" test was using vitest's default 5000ms
timeout while internally calling waitUntil() which itself polls for up
to 5000ms. On macos-14 CI runners fsevents occasionally takes >1s to
deliver the unlink event, which is enough to make waitUntil() bump
into the test-level timeout before it can complete a successful poll.

Mirror the sibling "detects new SKILL.md" test which already declares
{ timeout: 10000 } for the same reason. This eliminates the recurring
flake on the macos-14 desktop-ci shard without changing watcher
semantics.
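To make the timing argument concrete, here is a hypothetical reimplementation of a waitUntil-style helper; the real helper in the test suite is not shown in this PR and may differ. With an inner budget of 5000ms, an enclosing test limited to 5000ms leaves no headroom for setup or a late event, which is why the outer limit is raised to 10000ms.

```typescript
// Hypothetical polling helper; names and defaults are assumptions.
async function waitUntilSketch(
  predicate: () => boolean,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!predicate()) {
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    // Sleep briefly before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Any caller that wraps this helper in its own deadline needs that deadline to be strictly larger than timeoutMs, otherwise the wrapper can fire before the helper reports either success or its own, more descriptive, timeout error.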
@lefarcen lefarcen merged commit d04434d into main Apr 8, 2026
11 checks passed
@lefarcen lefarcen mentioned this pull request Apr 8, 2026
Development

Successfully merging this pull request may close these issues.

Closing browser login page leaves account login stuck in 'in progress' state
