fix(auth): preserve valid session on refresh failure and cooldown repeat failures #2436
@mandarini Thank you for taking my report seriously and trying to fix the issues. I really appreciate it and want to try to provide some value back to the community. I installed
Some stats:
My analysis of this PR shows:
Expected impact of #2436 on our guards
Likely helps a lot
I will deploy
Thank you @thomaslarsson for the detailed report!!! :D :D Let me know how the testing goes! If an issue comes up, can you please open a new issue on supabase-js and tag me? It will be easier to track!! Thank you SO much! 💚
This PR updates @supabase/*-js libraries to version 2.108.2.

**Source**: supabase-js-stable-release

**Changes**:
- Updated @supabase/supabase-js to 2.108.2
- Updated @supabase/auth-js to 2.108.2
- Updated @supabase/realtime-js to 2.108.2
- Updated @supabase/postgrest-js to 2.108.2
- Refreshed pnpm-lock.yaml

---

## Release Notes

## v2.108.2

## 2.108.2 (2026-06-15)

### 🩹 Fixes

- **auth:** preserve valid session on refresh failure and cooldown repeat failures ([#2436](supabase/supabase-js#2436))
- **realtime:** clarify httpSend() 404 error and server migration note ([#2444](supabase/supabase-js#2444))
- **release:** pin Deno and bound JSR publish to survive stranded-task hangs ([#2439](supabase/supabase-js#2439))
- **release:** restore JSR publish flags and enable for beta ([#2440](supabase/supabase-js#2440))

### ❤️ Thank You

- Katerina Skroumpelou @mandarini

## v2.108.1

## 2.108.1 (2026-06-09)

### 🩹 Fixes

- **ci:** forward DOGFOOD_APP_CLIENT_ID to dogfood workflow ([#2434](supabase/supabase-js#2434))
- **postgrest:** then typing ([#2349](supabase/supabase-js#2349))

### ❤️ Thank You

- Katerina Skroumpelou @mandarini
- Vaibhav @7ttp

This PR was created automatically.

Co-authored-by: supabase-workflow-trigger[bot] <266661614+supabase-workflow-trigger[bot]@users.noreply.github.com>
Description
Fixes the long-standing `_callRefreshToken` issue where a transient or non-retryable refresh failure destroyed a session whose access token was still valid, and the related symptom where a sustained outage had the SDK hammer `/token` with the same dead refresh token until the access token actually expired or the user wiped local storage.
This is the second attempt; see #2430 for the prior approach. Key difference: that PR preserved storage in `_callRefreshToken`, but `__loadSession` still translated the refresh error into `{ session: null, error }`, so `getSession()` callers stayed effectively logged out, the same failure mode that got #2146 rejected. This PR closes that gap end-to-end while keeping explicit refresh entry points (`refreshSession`, `setSession`) honest about failures.
What changed?
Five complementary changes, all in `packages/core/auth-js/`:
1. Proactive/reactive distinction in `_callRefreshToken`. On non-retryable error, re-read storage and skip `_removeSession` if the access token is still inside its real expiry window. Return shape unchanged: explicit callers still see the underlying error.
2. Mirror in `__loadSession` only. When `_callRefreshToken` errors but the in-scope `currentSession` is still valid, hand the caller the preserved session instead of `{ session: null, error }`. Guards against a concurrent `signOut` clearing storage during the refresh attempt by re-reading storage before returning. Scoped to `__loadSession` deliberately so `refreshSession()` / `setSession()` keep their honest error semantics.
3. Failure cooldown of `REFRESH_FAILURE_COOLDOWN_MS` (60s, two auto-refresh ticks). Subsequent serial callers within that window receive the cached failure synchronously instead of firing another `/token` call. Cleared on any successful refresh, on `_removeSession`, and on a `TOKEN_REFRESHED` / `SIGNED_IN` broadcast from another tab.
4. Widened `NETWORK_ERROR_CODES` in `lib/fetch.ts`. `NETWORK_ERROR_CODES` now includes `500`, `501`, and the Cloudflare-origin `525`-`529` codes. Previously these were misclassified as non-retryable, which on the old catch path triggered `_removeSession()` during outages.
5. Removed the `_removeSession` call in `_recoverAndRefresh`. `_callRefreshToken`'s catch is now the single source of truth for "session is dead enough to wipe." Stops the double `SIGNED_OUT` during init that @nathanschram flagged on #2145 ("Bug: `_callRefreshToken` permanently deletes session on non-retryable refresh failure, even when access token is still valid"), and prevents the new proactive-preserve from being undone at init time.
Why was this change needed?
Two real-world failure modes converge on the same code path:
1. `__loadSession()` triggers a refresh whenever the access token is within `EXPIRY_MARGIN_MS` (90s) of expiry. If that refresh failed with any non-retryable error (multi-tab rotation race, mobile-browser tab lifecycle, transient 400 from GoTrue), `_removeSession()` was called unconditionally, destroying a session whose access token still worked for up to 90 more seconds. The user was silently logged out, with `getSession()` returning `{ session: null, error: null }` and no recovery path short of a full reload and re-login.
2. When the `/token` call kept failing (DNS unreachable, persistent 4xx/5xx), every subsequent `getSession()` call in the 90s margin re-fired `_callRefreshToken` against the same broken refresh token. Reporters on #2145 documented hundreds to tens of thousands of `/token` requests per hour from a single client, all hitting the same failure.
The proactive/reactive distinction in `_callRefreshToken` plus the `__loadSession` mirror address (1). The cooldown cache addresses (2) by capping `/token` calls to one per 60s window during sustained failure. The widened `NETWORK_ERROR_CODES` ensures common outage status codes are classified as transient instead of dragging the session into the reactive-removal path. The Reddit r/Supabase report of a real outage signing out entire mobile-app user bases involved 500 + HTML body responses falling into the non-retryable branch.
Closes #2145
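The decision described above (wipe the session only on a non-retryable failure once the access token has really expired) can be sketched as follows. This is a minimal illustration, not the auth-js implementation: `StoredSession`, `isNewlyRetryableStatus`, and `shouldRemoveSession` are hypothetical names, `expires_at` is assumed to be in seconds since the Unix epoch, and only the status codes named in this PR are listed.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not auth-js internals.
interface StoredSession {
  access_token: string
  refresh_token: string
  expires_at: number // assumed: seconds since the Unix epoch
}

// Status codes this PR describes as newly classified as transient.
// Any other entries of the real NETWORK_ERROR_CODES list are omitted here.
const NEWLY_RETRYABLE_STATUS = new Set<number>([500, 501, 525, 526, 527, 528, 529])

function isNewlyRetryableStatus(status: number): boolean {
  return NEWLY_RETRYABLE_STATUS.has(status)
}

// Decide whether a failed refresh should wipe the stored session.
function shouldRemoveSession(
  session: StoredSession | null,
  failureIsRetryable: boolean,
  nowMs: number
): boolean {
  // Transient failure: keep the session regardless of expiry.
  if (failureIsRetryable) return false
  // Storage is already empty, so there is nothing to preserve.
  if (session === null) return true
  // Proactive case: the access token has not reached its real expiry yet.
  const stillValid = session.expires_at * 1000 > nowMs
  return !stillValid
}
```

Under these assumptions, a non-retryable failure 60 seconds before expiry keeps the session, the same failure after expiry removes it, and a retryable failure never removes it.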
Screenshots/Examples
Before:
After:
`refreshSession()` and `setSession()` are unchanged: they still surface the refresh error to their callers so they don't lie about whether the token actually rotated.
Breaking changes
No public API changes, no exported type changes, no method signatures changed. Three observable behavior changes worth calling out:
1. Error class for 500, 501, and 525-529 responses: previously `AuthApiError`, now `AuthRetryableFetchError`. Both extend `AuthError`, so `catch (e) { if (e instanceof AuthError) ... }` is unaffected. Only `instanceof AuthApiError` checks for those specific status codes would stop matching.
2. Fewer `SIGNED_OUT` events. `onAuthStateChange` callbacks see strictly fewer `SIGNED_OUT` events: only when the session is genuinely dead, never extra. An init-time non-retryable refresh failure now fires `SIGNED_OUT` exactly once instead of twice.
3. `getSession()` returns the preserved session in proactive-preserve scenarios where it previously returned `null`. This is the headline bug fix.
The auto-refresh ticker cadence (`AUTO_REFRESH_TICK_DURATION_MS`) is unchanged. The commit-guard logic for mid-flight `signOut` races is unchanged. `refreshSession()` / `setSession()` semantics are unchanged.
Checklist
- `<type>(<scope>): <description>`
- `pnpm nx format` to ensure consistent code formatting
Additional notes
The cooldown shape follows @thomaslarsson's `lastRefreshResult` sketch on #2145. Concurrent dedupe via `refreshingDeferred` is preserved; the cooldown extends the dedupe contract to serial callers spaced across short failure windows.
Tests added under a new `describe('Refresh-token lifecycle (proactive/reactive, cooldown)')` block in `GoTrueClient.test.ts`, split into five sub-describes:
- `_callRefreshToken`: preserves on proactive failure, removes on reactive, preserves on retryable network failure regardless of expiry
- `getSession`: returns preserved session on proactive-preserve, null on reactive, null when storage cleared concurrently (race guard)
- `refreshSession()` and `setSession()` still surface the error on proactive-preserve scenarios
- Cooldown: no repeat `/token` call within the window, cleared on success, cleared in `_removeSession`, expires after `REFRESH_FAILURE_COOLDOWN_MS` (verified with fake timers)
- `_recoverAndRefresh` emits `SIGNED_OUT` exactly once on non-retryable refresh failure
The `BroadcastChannel` cache-clear branch is not unit tested: `globalThis.BroadcastChannel` is undefined in the Jest node env, and adding a stub for one small branch isn't worth the surface area. An inline comment in the test file documents this.
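For readers who want to see how a failure cooldown can sit next to in-flight dedupe, here is a minimal sketch under stated assumptions. `RefreshCooldown`, `RefreshResult`, `doRefresh`, and the injectable `now` clock are hypothetical names for illustration, not the auth-js internals; only the 60s value of `REFRESH_FAILURE_COOLDOWN_MS` comes from this PR. Unlike the PR, which returns the cached failure synchronously, this sketch returns it through a resolved promise.

```typescript
// Hypothetical sketch; class and member names are assumptions.
type RefreshResult<T> = { data: T; error: null } | { data: null; error: Error }

const REFRESH_FAILURE_COOLDOWN_MS = 60_000 // value taken from the PR description

class RefreshCooldown<T> {
  private lastFailure: { error: Error; at: number } | null = null
  private inFlight: Promise<RefreshResult<T>> | null = null
  private doRefresh: () => Promise<T>
  private now: () => number

  constructor(doRefresh: () => Promise<T>, now: () => number = () => Date.now()) {
    this.doRefresh = doRefresh
    this.now = now
  }

  async refresh(): Promise<RefreshResult<T>> {
    // Serial callers inside the cooldown window receive the cached failure.
    if (this.lastFailure && this.now() - this.lastFailure.at < REFRESH_FAILURE_COOLDOWN_MS) {
      return { data: null, error: this.lastFailure.error }
    }
    // Concurrent callers share the single in-flight request.
    if (this.inFlight) return this.inFlight
    this.inFlight = this.run()
    try {
      return await this.inFlight
    } finally {
      this.inFlight = null
    }
  }

  // Call on sign-out or when another tab reports a successful refresh.
  clear(): void {
    this.lastFailure = null
  }

  private async run(): Promise<RefreshResult<T>> {
    try {
      const data = await this.doRefresh()
      this.lastFailure = null // a success resets the cooldown
      return { data, error: null }
    } catch (e) {
      const error = e instanceof Error ? e : new Error(String(e))
      this.lastFailure = { error, at: this.now() }
      return { data: null, error }
    }
  }
}
```

Injecting the clock keeps the sketch testable without fake timers: a test can advance a plain number instead of waiting 60 seconds.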