fix(auth): preserve valid session on refresh failure and cooldown repeat failures by mandarini · Pull Request #2436 · supabase/supabase-js

mandarini · 2026-06-10T12:42:49Z

Description

Fixes the long-standing _callRefreshToken issue where a transient or non-retryable refresh failure destroyed a session whose access token was still valid, and the related symptom where a sustained outage had the SDK hammer /token with the same dead refresh token until the access token actually expired or the user wiped local storage.

This is the second attempt — see #2430 for the prior approach. Key difference: that PR preserved storage in _callRefreshToken but __loadSession still translated the refresh error into { session: null, error }, so getSession() callers stayed effectively logged out — the same failure mode that got #2146 rejected. This PR closes that gap end-to-end while keeping explicit refresh entry points (refreshSession, setSession) honest about failures.

What changed?

Five complementary changes, all in packages/core/auth-js/:

1. Proactive vs reactive in _callRefreshToken. On a non-retryable error, re-read storage and skip _removeSession if the access token is still inside its real expiry window. Return shape unchanged: explicit callers still see the underlying error.
2. Caller-visible preservation in __loadSession only. When _callRefreshToken errors but the in-scope currentSession is still valid, hand the caller the preserved session instead of { session: null, error }. Guards against a concurrent signOut clearing storage during the refresh attempt by re-reading storage before returning. Scoped to __loadSession deliberately so refreshSession() / setSession() keep their honest error semantics.
3. Serial-failure cooldown cache. Any refresh failure is cached on the client for REFRESH_FAILURE_COOLDOWN_MS (60s, two auto-refresh ticks). Subsequent serial callers within that window receive the cached failure synchronously instead of firing another /token call. Cleared on any successful refresh, on _removeSession, and on a TOKEN_REFRESHED / SIGNED_IN broadcast from another tab.
4. Wider transient classification in lib/fetch.ts. NETWORK_ERROR_CODES now includes 500, 501, and the Cloudflare-origin 525-529 codes. Previously these were misclassified as non-retryable, which on the old catch path triggered _removeSession() during outages.
5. Strip the redundant _removeSession in _recoverAndRefresh. _callRefreshToken's catch is now the single source of truth for "session is dead enough to wipe." Stops the double-SIGNED_OUT during init that @nathanschram flagged on #2145 ("Bug: _callRefreshToken permanently deletes session on non-retryable refresh failure, even when access token is still valid"), and prevents the new proactive-preserve from being undone at init time.
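The proactive/reactive decision in the first change can be illustrated with a small pure function. This is only a sketch under assumptions: `StoredSession`, `isAccessTokenStillValid`, and `shouldRemoveSession` are hypothetical names rather than auth-js internals, and `expires_at` is assumed to be a Unix timestamp in seconds.

```typescript
// Illustrative names only; nothing here is copied from auth-js.
type RefreshFailureKind = 'retryable' | 'non-retryable'

interface StoredSession {
  access_token: string
  refresh_token: string
  expires_at: number // Unix time in seconds (assumed storage shape)
}

// True while the access token has not yet reached its real expiry.
function isAccessTokenStillValid(session: StoredSession | null, nowMs: number): boolean {
  return session !== null && session.expires_at * 1000 > nowMs
}

// Decide whether a failed refresh should wipe the stored session.
function shouldRemoveSession(
  kind: RefreshFailureKind,
  storedSession: StoredSession | null,
  nowMs: number
): boolean {
  // Retryable (network-style) failures never wipe, regardless of expiry.
  if (kind === 'retryable') return false
  // Non-retryable: wipe only when the token is already dead (reactive case).
  return !isAccessTokenStillValid(storedSession, nowMs)
}

const nowMs = 1700000000000
const validFor60s: StoredSession = {
  access_token: 'access',
  refresh_token: 'refresh',
  expires_at: nowMs / 1000 + 60,
}
const expired: StoredSession = { ...validFor60s, expires_at: nowMs / 1000 - 1 }

console.log(shouldRemoveSession('non-retryable', validFor60s, nowMs)) // false: proactive, preserve
console.log(shouldRemoveSession('non-retryable', expired, nowMs)) // true: reactive, remove
console.log(shouldRemoveSession('retryable', expired, nowMs)) // false: transient, preserve
```

The three calls correspond to the storage-preservation cases the PR says its tests cover: preserve on proactive failure, remove on reactive failure, preserve on retryable failure regardless of expiry.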

Why was this change needed?

Two real-world failure modes converge on the same code path:

1. Proactive refresh destroying still-valid sessions. __loadSession() triggers a refresh whenever the access token is within EXPIRY_MARGIN_MS (90s) of expiry. If that refresh failed with any non-retryable error (multi-tab rotation race, mobile-browser tab lifecycle, transient 400 from GoTrue), _removeSession() was called unconditionally, destroying a session whose access token still worked for up to 90 more seconds. The user was silently logged out with getSession() returning { session: null, error: null } and no recovery path short of a full reload and re-login.
2. Refresh storm during outages. When the same /token call kept failing (DNS unreachable, persistent 4xx/5xx), every subsequent getSession() call in the 90s margin re-fired _callRefreshToken against the same broken refresh token. Reporters on #2145 documented hundreds to tens of thousands of /token requests per hour from a single client, all hitting the same failure.

The proactive/reactive distinction in _callRefreshToken plus the __loadSession mirror address (1). The cooldown cache addresses (2) by capping /token calls to one per 60s window during sustained failure. The widened NETWORK_ERROR_CODES ensures common outage status codes are classified as transient instead of dragging the session into the reactive-removal path — the Reddit r/Supabase report of a real outage signing out entire mobile-app user bases was 500 + HTML body responses falling into the non-retryable branch.
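A minimal sketch of how a failure cooldown can cap /token calls, assuming a cache keyed by refresh token (as the later commit title "key refresh failure cooldown by refresh token" suggests). `RefreshFailureCooldown`, `refreshWithCooldown`, and `RefreshResult` are hypothetical names; the real request is asynchronous, and a synchronous stand-in is used here to keep the example short.

```typescript
// Sketch only: class and function names are hypothetical.
const REFRESH_FAILURE_COOLDOWN_MS = 60 * 1000 // 60s, as stated in the description

interface RefreshResult {
  error: Error | null // null means the refresh succeeded
}

class RefreshFailureCooldown {
  private last: { refreshToken: string; error: Error; failedAtMs: number } | null = null
  private readonly cooldownMs: number
  private readonly now: () => number

  constructor(cooldownMs: number, now: () => number) {
    this.cooldownMs = cooldownMs
    this.now = now
  }

  // Return the cached error if this refresh token failed inside the window.
  get(refreshToken: string): Error | null {
    const hit = this.last
    if (hit === null || hit.refreshToken !== refreshToken) return null
    if (this.now() - hit.failedAtMs >= this.cooldownMs) {
      this.last = null // window elapsed: allow a fresh attempt
      return null
    }
    return hit.error
  }

  record(refreshToken: string, error: Error): void {
    this.last = { refreshToken, error, failedAtMs: this.now() }
  }

  // Call on successful refresh, session removal, or a cross-tab sign-in/refresh.
  clear(): void {
    this.last = null
  }
}

// Synchronous stand-in for the real async /token request.
function refreshWithCooldown(
  cache: RefreshFailureCooldown,
  refreshToken: string,
  callTokenEndpoint: (refreshToken: string) => RefreshResult
): RefreshResult {
  const cached = cache.get(refreshToken)
  if (cached !== null) return { error: cached }
  const result = callTokenEndpoint(refreshToken)
  if (result.error === null) {
    cache.clear()
  } else {
    cache.record(refreshToken, result.error)
  }
  return result
}

// Usage: 50 serial calls during an outage reach the endpoint once.
let clockMs = 0
let tokenCalls = 0
const cooldown = new RefreshFailureCooldown(REFRESH_FAILURE_COOLDOWN_MS, () => clockMs)
const failingEndpoint = (_token: string): RefreshResult => {
  tokenCalls += 1
  return { error: new Error('invalid_grant') }
}
for (let i = 0; i < 50; i++) {
  refreshWithCooldown(cooldown, 'dead-refresh-token', failingEndpoint)
}
console.log(tokenCalls) // 1
```

Injecting the clock keeps the sketch deterministic, in the same spirit as the fake-timer tests the PR mentions.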

Closes #2145

Screenshots/Examples

Before:

// access token still valid for 60s, refresh fails with 400 invalid_grant
await supabase.auth.getSession()
// returns { data: { session: null }, error: null }
// storage cleared, SIGNED_OUT emitted, user logged out

After:

// access token still valid for 60s, refresh fails with 400 invalid_grant
await supabase.auth.getSession()
// returns { data: { session: <existing valid session> }, error: null }
// storage preserved, no SIGNED_OUT emitted, access token still works
// next refresh attempt deferred by REFRESH_FAILURE_COOLDOWN_MS (60s)

refreshSession() and setSession() are unchanged — they still surface the refresh error to their callers so they don't lie about whether the token actually rotated.
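The getSession() side of this contrast can be sketched as a small decision function. This is an illustration under assumptions, not the __loadSession code: `sessionAfterFailedRefresh` is a hypothetical helper, `expires_at` is assumed to be in Unix seconds, and returning the in-scope session (rather than the re-read one) after the storage check is a guess at the intended behavior.

```typescript
// Hypothetical helper; the real logic lives inside __loadSession.
interface Session {
  access_token: string
  refresh_token: string
  expires_at: number // Unix time in seconds (assumed)
}

// What a getSession()-style caller receives after a failed refresh.
function sessionAfterFailedRefresh(
  currentSession: Session,
  readStorage: () => Session | null,
  nowMs: number
): Session | null {
  // Reactive case: the access token is already dead, nothing to preserve.
  if (currentSession.expires_at * 1000 <= nowMs) return null
  // Race guard: a concurrent signOut may have cleared storage mid-refresh.
  if (readStorage() === null) return null
  // Proactive case: hand back the still-valid session.
  return currentSession
}

const checkTimeMs = 1700000000000
const live: Session = {
  access_token: 'access',
  refresh_token: 'refresh',
  expires_at: checkTimeMs / 1000 + 60,
}

console.log(sessionAfterFailedRefresh(live, () => live, checkTimeMs) === live) // true: preserved
console.log(sessionAfterFailedRefresh(live, () => null, checkTimeMs)) // null: signed out concurrently
```

An explicit refreshSession() or setSession() call would not go through a helper like this; per the description, those entry points keep returning the refresh error.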

Breaking changes

This PR contains no breaking changes

No public API changes, no exported type changes, no method signatures changed. Three observable behavior changes worth calling out:

Error class for 500/525-529 from auth. Previously AuthApiError, now AuthRetryableFetchError. Both extend AuthError, so catch (e) { if (e instanceof AuthError) ... } is unaffected. Only instanceof AuthApiError for those specific status codes would stop matching.
Fewer spurious SIGNED_OUT events. onAuthStateChange callbacks see strictly fewer SIGNED_OUT events — only when the session is genuinely dead, never extra. Init-time non-retryable refresh now fires SIGNED_OUT exactly once instead of twice.
getSession() returns the preserved session in proactive-preserve scenarios it previously returned null for. This is the headline bug fix.
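The error-class change in the first item can be illustrated with local stand-in classes. The class names mirror the ones named above, but they are defined here for the example rather than imported from auth-js, and `errorForStatus` is a hypothetical helper. The code list contains the codes this PR says it adds (500, 501, 525-529); 502-504 are assumed to have been classified as transient already.

```typescript
// Local stand-ins that mirror the class names mentioned above; not imported from auth-js.
class AuthError extends Error {
  readonly status: number
  constructor(message: string, status: number) {
    super(message)
    this.status = status
  }
}
class AuthApiError extends AuthError {}
class AuthRetryableFetchError extends AuthError {}

// 500, 501 and 525-529 are the codes the PR adds; 502-504 are assumed
// to have been treated as transient before this change.
const NETWORK_ERROR_CODES: ReadonlyArray<number> = [500, 501, 502, 503, 504, 525, 526, 527, 528, 529]

// Hypothetical mapping from an HTTP status to an error class.
function errorForStatus(status: number, message: string): AuthError {
  return NETWORK_ERROR_CODES.indexOf(status) !== -1
    ? new AuthRetryableFetchError(message, status)
    : new AuthApiError(message, status)
}

const outageError = errorForStatus(500, 'upstream error')
console.log(outageError instanceof AuthError) // true: broad checks are unaffected
console.log(outageError instanceof AuthRetryableFetchError) // true
console.log(outageError instanceof AuthApiError) // false: narrow checks stop matching
console.log(errorForStatus(400, 'invalid_grant') instanceof AuthApiError) // true
```

Code that branches on `instanceof AuthApiError` for these status codes would need to check `AuthRetryableFetchError` as well after upgrading.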

The auto-refresh ticker cadence (AUTO_REFRESH_TICK_DURATION_MS) is unchanged. The commit-guard logic for mid-flight signOut races is unchanged. refreshSession() / setSession() semantics are unchanged.

Checklist

I have read the Contributing Guidelines
My PR title follows the conventional commit format: <type>(<scope>): <description>
I have run pnpm nx format to ensure consistent code formatting
I have added tests for new functionality (if applicable)
I have updated documentation (if applicable)

Additional notes

The cooldown shape follows @thomaslarsson's lastRefreshResult sketch on #2145. Concurrent dedupe via refreshingDeferred is preserved; the cooldown extends the dedupe contract to serial callers spaced across short failure windows.

Tests added under a new describe('Refresh-token lifecycle (proactive/reactive, cooldown)') block in GoTrueClient.test.ts, split into five sub-describes:

storage preservation — _callRefreshToken preserves on proactive failure, removes on reactive, preserves on retryable network failure regardless of expiry
caller-visible preservation in getSession — returns preserved session on proactive-preserve, null on reactive, null when storage cleared concurrently (race guard)
explicit-caller contract — refreshSession() and setSession() still surface the error on proactive-preserve scenarios
failure cooldown — 50 serial calls collapse to 1 /token, cleared on success, cleared in _removeSession, expires after REFRESH_FAILURE_COOLDOWN_MS (verified with fake timers)
init cleanup — _recoverAndRefresh emits SIGNED_OUT exactly once on non-retryable refresh failure

The BroadcastChannel cache-clear branch is not unit tested — globalThis.BroadcastChannel is undefined in the Jest node env and adding a stub for one small branch isn't worth the surface area. Inline comment in the test file documents this.

pkg-pr-new · 2026-06-10T12:45:45Z

@supabase/auth-js

npm i https://pkg.pr.new/@supabase/auth-js@2436

@supabase/functions-js

npm i https://pkg.pr.new/@supabase/functions-js@2436

@supabase/postgrest-js

npm i https://pkg.pr.new/@supabase/postgrest-js@2436

@supabase/realtime-js

npm i https://pkg.pr.new/@supabase/realtime-js@2436

@supabase/storage-js

npm i https://pkg.pr.new/@supabase/storage-js@2436

@supabase/supabase-js

npm i https://pkg.pr.new/@supabase/supabase-js@2436

commit: 05eccde

thomaslarsson · 2026-06-12T15:34:17Z

@mandarini Thank you for taking my report seriously and trying to fix the issues. I really appreciate it and want to try to provide some value back to the community. I installed 2.107.0-beta.1 on June 1st around midnight Europe/Oslo time. After that, I added operational email notifications for every time one of our guards tripped. We email on auth guard trips (5 min aggregation window; counting individual emails ≈ counting events).

Some stats:

| Guard | Email threads | Total emails | Share |
| --- | --- | --- | --- |
| `circuit_breaker_trip` | 11 | 60 | 82% |
| `middleware_legacy_cleanup` | 8 | 13 | 18% |
| Total | 19 | 73 | |

My analysis of this PR shows:

Expected impact of #2436 on our guards

Likely helps a lot — circuit_breaker_trip (~82% of alerts)

| PR change | Why it maps to our production pain |
| --- | --- |
| Proactive preserve in `_callRefreshToken` | Transient refresh failures no longer `_removeSession` while access token still valid → fewer silent logouts and re-auth loops |
| `__loadSession` returns preserved session | `getSession()` callers stay logged in instead of `{ session: null }` → less client thrashing |
| 60s refresh-failure cooldown | Caps serial `/token` hammer, which directly reduces trips on our 3 refreshes/min budget |
| 500 / CF 525–529 → retryable | Outages no longer drag sessions into reactive removal |
| Single `_removeSession` in `_recoverAndRefresh` | Fewer spurious `SIGNED_OUT` → fewer re-armed refresh loops |

Expectation: `circuit_breaker_trip` emails should drop sharply after upgrade. We'll keep the breaker as a safety net for stale deploy bundles, dead refresh tokens, and cookie-domain edge cases auth-js can't reach.
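For readers unfamiliar with this kind of guard, a refresh budget such as the 3 refreshes/min one mentioned in the table might look roughly like the sliding-window sketch below. It is a hypothetical illustration, not the guard described in this comment and not part of supabase-js; `RefreshBudget` and `tryAcquire` are invented names.

```typescript
// Hypothetical client-side guard; not part of supabase-js.
class RefreshBudget {
  private attempts: number[] = []
  private readonly maxAttempts: number
  private readonly windowMs: number

  constructor(maxAttempts: number, windowMs: number) {
    this.maxAttempts = maxAttempts
    this.windowMs = windowMs
  }

  // Returns true if the attempt fits the budget, false if the breaker trips.
  tryAcquire(nowMs: number): boolean {
    // Drop attempts that have aged out of the sliding window.
    this.attempts = this.attempts.filter((t) => nowMs - t < this.windowMs)
    if (this.attempts.length >= this.maxAttempts) return false
    this.attempts.push(nowMs)
    return true
  }
}

const budget = new RefreshBudget(3, 60 * 1000)
const outcomes = [0, 1000, 2000, 3000].map((t) => budget.tryAcquire(t))
console.log(outcomes) // [ true, true, true, false ]
console.log(budget.tryAcquire(60000)) // true: the attempt at t=0 has aged out
```

With a 60s SDK-side cooldown in front of it, a guard like this would see at most one failed refresh per minute for a given dead token, which is why the cooldown is expected to reduce trips.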

I will deploy 2.108.2-beta.5 to production now and report back later. Hopefully I don't keep getting those specific operational emails the next couple of weeks. 😅

mandarini · 2026-06-15T06:46:16Z

Thank you @thomaslarsson for the detailed report!!! :D :D Let me know how the testing goes! If an issue comes up, can you please open a new issue on supabase-js and tag me? It will be easier to track!! Thank you SO much! 💚

@mandarini

This PR updates @supabase/*-js libraries to version 2.108.2.

**Source**: supabase-js-stable-release

**Changes**:
- Updated @supabase/supabase-js to 2.108.2
- Updated @supabase/auth-js to 2.108.2
- Updated @supabase/realtime-js to 2.108.2
- Updated @supabase/postgrest-js to 2.108.2
- Refreshed pnpm-lock.yaml

---

## Release Notes

## v2.108.2

## 2.108.2 (2026-06-15)

### 🩹 Fixes
- **auth:** preserve valid session on refresh failure and cooldown repeat failures ([#2436](supabase/supabase-js#2436))
- **realtime:** clarify httpSend() 404 error and server migration note ([#2444](supabase/supabase-js#2444))
- **release:** pin Deno and bound JSR publish to survive stranded-task hangs ([#2439](supabase/supabase-js#2439))
- **release:** restore JSR publish flags and enable for beta ([#2440](supabase/supabase-js#2440))

### ❤️ Thank You
- Katerina Skroumpelou @mandarini

## v2.108.1

## 2.108.1 (2026-06-09)

### 🩹 Fixes
- **ci:** forward DOGFOOD_APP_CLIENT_ID to dogfood workflow ([#2434](supabase/supabase-js#2434))
- **postgrest:** then typing ([#2349](supabase/supabase-js#2349))

### ❤️ Thank You
- Katerina Skroumpelou @mandarini
- Vaibhav @7ttp

This PR was created automatically.

Co-authored-by: supabase-workflow-trigger[bot] <266661614+supabase-workflow-trigger[bot]@users.noreply.github.com>

fix(auth): preserve valid session on refresh failure and cooldown repeat failures

d134be0

mandarini requested review from a team as code owners June 10, 2026 12:42

mandarini marked this pull request as draft June 10, 2026 12:42

github-actions Bot added the auth-js Related to the auth-js library. label Jun 10, 2026

fix(auth): remove bad test

68e2807

mandarini self-assigned this Jun 10, 2026

mandarini marked this pull request as ready for review June 10, 2026 12:55

mandarini mentioned this pull request Jun 10, 2026

fix(auth): preserve valid session on refresh failure and cooldown repeat failures #2430

Closed

6 tasks

fix(auth): failing test

0eeeb6e

spydon reviewed Jun 10, 2026

Comment thread packages/core/auth-js/src/GoTrueClient.ts Outdated

fix(auth): key refresh failure cooldown by refresh token

05eccde

spydon approved these changes Jun 10, 2026

mandarini merged commit ad23adf into master Jun 11, 2026
39 of 40 checks passed

mandarini deleted the fix/auth-refresh-preserve-session-2 branch June 11, 2026 06:51

supabase-supabase-autofixer Bot mentioned this pull request Jun 15, 2026

chore: update @supabase/supabase-js to v2.108.2 supabase/multiplayer.dev#68

Merged

supabase-libs-pr-manager Bot mentioned this pull request Jun 15, 2026

chore: update @supabase/supabase-js to v2.108.2 supabase/realtime#1960

Merged

supabase-supabase-autofixer Bot mentioned this pull request Jun 15, 2026

feat: update @supabase/*-js libraries to v2.108.2 supabase/supabase#46927

Merged

mandarini merged 4 commits into master from fix/auth-refresh-preserve-session-2
