fix(auth): preserve valid session on refresh failure and cooldown repeat failures#2430

Closed
mandarini wants to merge 2 commits into master from fix/auth-refresh-preserve-session

Conversation

@mandarini mandarini commented Jun 5, 2026

Contributor

Description

Fixes the long-standing _callRefreshToken issue in which a transient or non-retryable refresh failure destroyed a session whose access token was still valid. Also fixes the related symptom in which, during a sustained outage, the SDK hammered /token with the same dead refresh token until the access token actually expired or the user wiped local storage.

What changed?

  • Proactive vs reactive distinction in _callRefreshToken. When a non-retryable error comes back from /token (e.g. 400 invalid_grant) and the access token is still within its real expiry window, the SDK no longer calls _removeSession(). The session is preserved and remains usable until its actual expires_at. Only when the access token has truly expired does the SDK clear storage and emit SIGNED_OUT.
  • Serial-failure cooldown cache. Any refresh failure (retryable or not) is cached on the client for REFRESH_FAILURE_COOLDOWN_MS (60s, i.e. two auto-refresh ticks). Subsequent serial callers within that window, including the next auto-refresh tick, receive the cached failure synchronously instead of firing another /token call. The cache is cleared on any successful refresh, on _removeSession, and on a TOKEN_REFRESHED / SIGNED_IN broadcast from another tab.
  • Wider transient classification in lib/fetch.ts. NETWORK_ERROR_CODES now includes 500, 501, and the Cloudflare-origin 525-529 codes. Previously these were misclassified as non-retryable, which on the old catch path triggered _removeSession().
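
The widened classification in the last bullet can be pictured roughly as follows. This is a minimal sketch, not the contents of lib/fetch.ts: TRANSIENT_STATUS_CODES and classifyRefreshFailure are hypothetical names, and the 502-504 entries are an assumption about what the list already contained before this change.

```typescript
// Hypothetical names; illustrative only, not the SDK's actual code.
// 500, 501 and 525-529 come from the description above.
// 502-504 are ASSUMED to have been in the list already.
const TRANSIENT_STATUS_CODES: ReadonlySet<number> = new Set([
  500, 501, 502, 503, 504, 525, 526, 527, 528, 529,
])

type RefreshFailureKind = 'retryable' | 'non-retryable'

function classifyRefreshFailure(status: number | undefined): RefreshFailureKind {
  // No status means no HTTP response was received (DNS failure, offline, ...),
  // which this sketch treats as transient.
  if (status === undefined) return 'retryable'
  return TRANSIENT_STATUS_CODES.has(status) ? 'retryable' : 'non-retryable'
}
```

Under this sketch a 500 or 527 response lands on the retryable path, while a 400 invalid_grant remains non-retryable and is handled by the proactive/reactive check instead.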

Why was this change needed?

Two real-world failure modes converge on the same code path in GoTrueClient._callRefreshToken:

  1. Proactive refresh destroying still-valid sessions. __loadSession() triggers a refresh whenever the access token is within EXPIRY_MARGIN_MS (90s) of expiry. If that refresh failed with any non-retryable error (multi-tab rotation race, mobile-browser tab lifecycle, transient 400 from GoTrue), _removeSession() was called unconditionally, destroying a session whose access token still worked for up to 90 more seconds. The user was silently logged out with getSession() returning { session: null, error: null } and no recovery path short of a full reload and re-login.
  2. Refresh storm during outages. When the same /token call kept failing (DNS unreachable, persistent 4xx/5xx), every subsequent getSession() call in the 90s margin re-fired _callRefreshToken against the same broken refresh token. Reporters on the referenced issue documented hundreds to tens of thousands of /token requests per hour from a single client, all hitting the same failure.
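
The first failure mode comes down to one comparison against the access token's real expiry. A minimal sketch of that decision, assuming expires_at is a Unix timestamp in seconds; StoredSession, FailureAction, and onNonRetryableRefreshFailure are hypothetical names, not the SDK's internals:

```typescript
// Hypothetical helper; names and shapes are illustrative only.
interface StoredSession {
  access_token: string
  refresh_token: string
  expires_at: number // ASSUMED: Unix time in seconds
}

type FailureAction = 'preserve-session' | 'remove-session'

function onNonRetryableRefreshFailure(session: StoredSession, nowMs: number): FailureAction {
  const expiresAtMs = session.expires_at * 1000
  // Proactive case: the refresh was attempted early (inside the expiry margin)
  // and the access token still works, so the session is kept.
  if (expiresAtMs > nowMs) return 'preserve-session'
  // Reactive case: the access token has truly expired, so the caller would
  // clear storage and emit SIGNED_OUT.
  return 'remove-session'
}
```

With a token that expires at t = 1000s, a failure at t = 940s (inside the 90s margin) preserves the session, while a failure at or after t = 1000s removes it.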

The proactive/reactive distinction addresses (1). The cooldown cache addresses (2) by capping /token calls to one per 60s window during sustained failure. The widened NETWORK_ERROR_CODES ensures common outage status codes are classified as transient instead of dragging the session into the reactive-removal path.
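
To make the one-call-per-window cap concrete, here is a rough sketch of a failure cooldown cache. RefreshCooldown, RefreshResult, and callTokenEndpoint are hypothetical names; only REFRESH_FAILURE_COOLDOWN_MS and its 60s value come from the description. Unlike the behavior described above, this sketch returns the cached failure through a promise rather than synchronously.

```typescript
// Illustrative only: class and member names are hypothetical, not the SDK's.
const REFRESH_FAILURE_COOLDOWN_MS = 60_000 // 60s, as stated in the description

type RefreshResult = { ok: true } | { ok: false; error: Error }

class RefreshCooldown {
  private lastFailure: { result: RefreshResult; at: number } | null = null
  private readonly callTokenEndpoint: () => Promise<RefreshResult>
  private readonly now: () => number

  constructor(
    callTokenEndpoint: () => Promise<RefreshResult>,
    now: () => number = () => Date.now()
  ) {
    this.callTokenEndpoint = callTokenEndpoint
    this.now = now
  }

  async refresh(): Promise<RefreshResult> {
    const cached = this.lastFailure
    if (cached !== null && this.now() - cached.at < REFRESH_FAILURE_COOLDOWN_MS) {
      return cached.result // inside the window: answer without a network call
    }
    const result = await this.callTokenEndpoint()
    // A success clears the cache; a failure (retryable or not) starts a new window.
    this.lastFailure = result.ok ? null : { result, at: this.now() }
    return result
  }

  // For sign-out, or when another tab reports a successful refresh.
  clear(): void {
    this.lastFailure = null
  }
}
```

The clock is injected so the window can be exercised in tests without real waiting.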

Closes #2145

Screenshots/Examples

Before:

// access token still valid for 60s, refresh fails with 400 invalid_grant
await supabase.auth.getSession()
// returns { data: { session: null }, error: null }
// storage cleared, SIGNED_OUT emitted, user logged out

After:

// access token still valid for 60s, refresh fails with 400 invalid_grant
await supabase.auth.getSession()
// returns { data: { session: <existing valid session> }, error: null }
// storage preserved, no SIGNED_OUT emitted, access token still works
// next refresh attempt deferred by REFRESH_FAILURE_COOLDOWN_MS (60s)

Breaking changes

  • This PR contains no breaking changes

No public API changes. All behavior changes correct misbehavior in failure modes:

  • Sessions are preserved that previously would have been silently removed.
  • /token storm volume drops from "every 30s for the duration of an outage" to "at most one call per 60s".
  • Wider transient classification means fewer false sign-outs on outage-shaped HTTP responses.

The auto-refresh ticker cadence (AUTO_REFRESH_TICK_DURATION_MS) is unchanged. The commit-guard logic for mid-flight signOut races is unchanged. onAuthStateChange callbacks see fewer spurious SIGNED_OUT events, never extra ones.

Checklist

  • I have read the Contributing Guidelines
  • My PR title follows the conventional commit format: <type>(<scope>): <description>
  • I have run pnpm nx format to ensure consistent code formatting
  • I have added tests for new functionality (if applicable)
  • I have updated documentation (if applicable)

Additional notes

The fix follows the failure-cooldown shape suggested by @thomaslarsson on #2145 (the lastRefreshResult pattern). Concurrent dedupe via refreshingDeferred is preserved; the cooldown extends the dedupe contract to serial callers spaced across short failure windows.
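
For contrast with the serial cooldown, concurrent dedupe can be sketched as sharing one in-flight promise. This is an assumption about the general shape only: RefreshDeduper and inFlight are hypothetical names standing in for the role the description gives refreshingDeferred, not the SDK's implementation.

```typescript
// Hypothetical sketch of concurrent dedupe via a shared in-flight promise.
class RefreshDeduper<T> {
  private inFlight: Promise<T> | null = null
  private readonly doRefresh: () => Promise<T>

  constructor(doRefresh: () => Promise<T>) {
    this.doRefresh = doRefresh
  }

  refresh(): Promise<T> {
    // Callers that arrive while a refresh is pending share its promise.
    if (this.inFlight !== null) return this.inFlight
    const pending = this.doRefresh()
    const reset = (): void => {
      // Once settled, let the next caller start a fresh attempt.
      if (this.inFlight === pending) this.inFlight = null
    }
    pending.then(reset, reset)
    this.inFlight = pending
    return pending
  }
}
```

This only covers callers that overlap in time. Once the promise settles, the next caller starts a new request, which is the gap a failure cooldown fills for serial callers.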

Tests added under a new describe('Refresh-token lifecycle (proactive/reactive, cooldown)') block in GoTrueClient.test.ts, covering: proactive failure preserves session, reactive failure removes session, cooldown dedupes serial callers, successful refresh clears the cache, transient network failure caches without storage side effects, and _removeSession clears the cache.

@github-actions github-actions Bot added the auth-js label Jun 5, 2026

pkg-pr-new Bot commented Jun 5, 2026

@supabase/auth-js

npm i https://pkg.pr.new/@supabase/auth-js@2430

@supabase/functions-js

npm i https://pkg.pr.new/@supabase/functions-js@2430

@supabase/postgrest-js

npm i https://pkg.pr.new/@supabase/postgrest-js@2430

@supabase/realtime-js

npm i https://pkg.pr.new/@supabase/realtime-js@2430

@supabase/storage-js

npm i https://pkg.pr.new/@supabase/storage-js@2430

@supabase/supabase-js

npm i https://pkg.pr.new/@supabase/supabase-js@2430

commit: b7ca8d7

@mandarini mandarini marked this pull request as ready for review June 9, 2026 15:39
@mandarini mandarini requested review from a team as code owners June 9, 2026 15:39
@mandarini mandarini self-assigned this Jun 10, 2026
@mandarini

Contributor Author

Closing in favor of #2436.

@mandarini mandarini closed this Jun 10, 2026

Labels

auth-js: Related to the auth-js library.

Development

Successfully merging this pull request may close these issues.

Bug: _callRefreshToken permanently deletes session on non-retryable refresh failure, even when access token is still valid