fix(auth): preserve valid session on refresh failure and cooldown repeat failures #2430
Closed
mandarini wants to merge 2 commits into
mandarini (Contributor, Author): close in favor of #2436
Description
Fixes the long-standing `_callRefreshToken` issue where a transient or non-retryable refresh failure would destroy a session whose access token was still valid, and the related symptom where a sustained outage would have the SDK hammer `/token` with the same dead refresh token until the access token actually expired or the user wiped local storage.

What changed?
- Failure handling in `_callRefreshToken`: when a non-retryable error comes back from `/token` (e.g. `400 invalid_grant`) and the access token is still within its real expiry window, the SDK no longer calls `_removeSession()`. The session is preserved and remains usable until its actual `expires_at`. Only when the access token has truly expired does the SDK clear storage and emit `SIGNED_OUT`.
- Failure cooldown via `REFRESH_FAILURE_COOLDOWN_MS` (60s, two auto-refresh ticks): after a failed refresh, subsequent serial callers within that window, including the next auto-refresh tick, receive the cached failure synchronously instead of firing another `/token` call. The cache is cleared on any successful refresh, on `_removeSession`, and on a `TOKEN_REFRESHED`/`SIGNED_IN` broadcast from another tab.
- Status classification in `lib/fetch.ts`: `NETWORK_ERROR_CODES` now includes `500`, `501`, and the Cloudflare-origin `525`-`529` codes. Previously these were misclassified as non-retryable, which on the old catch path triggered `_removeSession()`.

Why was this change needed?
Two real-world failure modes converge on the same code path in `GoTrueClient._callRefreshToken`:

1. `__loadSession()` triggers a refresh whenever the access token is within `EXPIRY_MARGIN_MS` (90s) of expiry. If that refresh failed with any non-retryable error (multi-tab rotation race, mobile-browser tab lifecycle, transient 400 from GoTrue), `_removeSession()` was called unconditionally, destroying a session whose access token still worked for up to 90 more seconds. The user was silently logged out, with `getSession()` returning `{ session: null, error: null }` and no recovery path short of a full reload and re-login.
2. When the `/token` call kept failing (DNS unreachable, persistent 4xx/5xx), every subsequent `getSession()` call in the 90s margin re-fired `_callRefreshToken` against the same broken refresh token. Reporters on the referenced issue documented hundreds to tens of thousands of `/token` requests per hour from a single client, all hitting the same failure.

The proactive/reactive distinction addresses (1). The cooldown cache addresses (2) by capping `/token` calls to one per 60s window during sustained failure. The widened `NETWORK_ERROR_CODES` ensures common outage status codes are classified as transient instead of dragging the session into the reactive-removal path.

Closes #2145
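The proactive/reactive decision from (1) and the widened `NETWORK_ERROR_CODES` can be summarized as a small pure function. This is a hedged TypeScript sketch, not the actual `_callRefreshToken` code: `SessionLike`, `isWithinRefreshMargin`, `isRetryableRefreshFailure`, and `onRefreshFailure` are hypothetical names. It assumes `502`-`504` were already in `NETWORK_ERROR_CODES` before this PR and that failures with no HTTP response (DNS, offline) count as retryable.

```typescript
// Hypothetical sketch: only NETWORK_ERROR_CODES and EXPIRY_MARGIN_MS are names
// taken from the PR; everything else is illustrative, not the auth-js API.

// Widened list from this PR. ASSUMPTION: 502-504 were already in the list.
const NETWORK_ERROR_CODES: readonly number[] = [
  500, 501, 502, 503, 504, 525, 526, 527, 528, 529,
]

// Proactive refresh window described above (90s).
const EXPIRY_MARGIN_MS = 90_000

interface SessionLike {
  access_token: string
  refresh_token: string
  // Unix timestamp in seconds, like Session.expires_at in supabase-js.
  expires_at: number
}

type RefreshFailureAction = 'keep-session' | 'remove-session'

// True when the access token expires within EXPIRY_MARGIN_MS, i.e. a refresh
// is attempted proactively even though the token may still be valid.
function isWithinRefreshMargin(session: SessionLike, nowMs: number): boolean {
  return session.expires_at * 1000 - nowMs < EXPIRY_MARGIN_MS
}

// `status` is undefined when no HTTP response arrived (DNS failure, offline).
// ASSUMPTION: such failures are treated as retryable, like 5xx outages.
function isRetryableRefreshFailure(status: number | undefined): boolean {
  return status === undefined || NETWORK_ERROR_CODES.includes(status)
}

function onRefreshFailure(
  session: SessionLike,
  status: number | undefined,
  nowMs: number,
): RefreshFailureAction {
  // Transient failure: no storage side effects; a later refresh can retry.
  if (isRetryableRefreshFailure(status)) return 'keep-session'
  // Non-retryable (e.g. 400 invalid_grant) during a proactive refresh: the
  // access token still works until expires_at, so keep the session.
  if (nowMs < session.expires_at * 1000) return 'keep-session'
  // Reactive case: the access token has truly expired, so clear storage
  // and emit SIGNED_OUT.
  return 'remove-session'
}
```

For a session with `expires_at = 1000` (1,000,000 ms), a `400` at 950,000 ms (50s left, inside the 90s margin) keeps the session, the same `400` at 1,000,000 ms removes it, and a `503` or `525` never touches storage.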
Screenshots/Examples
Before:
After:
Breaking changes
No public API changes. All behavior changes correct misbehavior in failure modes:

- `/token` storm volume drops from "every 30s for the duration of an outage" to "at most one call per 60s".

The auto-refresh ticker cadence (`AUTO_REFRESH_TICK_DURATION_MS`) is unchanged. The commit-guard logic for mid-flight `signOut` races is unchanged. `onAuthStateChange` callbacks see fewer spurious `SIGNED_OUT` events, never extra ones.

Checklist
- PR title uses the `<type>(<scope>): <description>` format
- Ran `pnpm nx format` to ensure consistent code formatting

Additional notes
The fix follows the failure-cooldown shape suggested by @thomaslarsson on #2145 (the `lastRefreshResult` pattern). Concurrent dedupe via `refreshingDeferred` is preserved; the cooldown extends the dedupe contract to serial callers spaced across short failure windows.

Tests added under a new `describe('Refresh-token lifecycle (proactive/reactive, cooldown)')` block in `GoTrueClient.test.ts` cover: proactive failure preserves session, reactive failure removes session, cooldown dedupes serial callers, successful refresh clears the cache, transient network failure caches without storage side effects, and `_removeSession` clears the cache.
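The two layers described in these notes, concurrent callers sharing one in-flight request (the role of `refreshingDeferred`) and serial callers inside a failure window served from a cached result (the `lastRefreshResult` idea), can compose roughly as below. This is a minimal TypeScript sketch under stated assumptions: `RefreshCoordinator`, `doRefresh`, and `clearFailure` are hypothetical names, `doRefresh` stands in for the `/token` request, and the clock is injectable for testing. Only the 60s `REFRESH_FAILURE_COOLDOWN_MS` value comes from the PR; this is not the auth-js implementation.

```typescript
// Hypothetical sketch: names are illustrative, not the auth-js internals.
// Only the 60s cooldown value is taken from the PR description.
const REFRESH_FAILURE_COOLDOWN_MS = 60_000

// Result in the { data, error } style used by supabase-js.
type RefreshOutcome<T> = { data: T; error: null } | { data: null; error: unknown }

class RefreshCoordinator<T> {
  // Shared promise for concurrent callers (the role refreshingDeferred plays).
  private inFlight: Promise<RefreshOutcome<T>> | null = null
  // Last failure and when it happened (the lastRefreshResult idea).
  private lastFailure: { error: unknown; at: number } | null = null
  private readonly doRefresh: (refreshToken: string) => Promise<T>
  private readonly now: () => number

  constructor(doRefresh: (refreshToken: string) => Promise<T>, now: () => number = Date.now) {
    this.doRefresh = doRefresh
    this.now = now
  }

  refresh(refreshToken: string): Promise<RefreshOutcome<T>> {
    // Serial callers inside the cooldown get the cached failure, no network call.
    const failure = this.lastFailure
    if (failure !== null && this.now() - failure.at < REFRESH_FAILURE_COOLDOWN_MS) {
      return Promise.resolve({ data: null, error: failure.error })
    }
    // Concurrent callers join the request that is already in flight.
    if (this.inFlight !== null) return this.inFlight

    const request = this.doRefresh(refreshToken).then(
      (data): RefreshOutcome<T> => {
        this.lastFailure = null // any success clears the cache
        return { data, error: null }
      },
      (error: unknown): RefreshOutcome<T> => {
        this.lastFailure = { error, at: this.now() }
        return { data: null, error }
      },
    )
    this.inFlight = request
    // `request` never rejects, so this cleanup chain cannot leak a rejection.
    void request.then(() => {
      if (this.inFlight === request) this.inFlight = null
    })
    return request
  }

  // Call on session removal and on TOKEN_REFRESHED / SIGNED_IN broadcasts.
  clearFailure(): void {
    this.lastFailure = null
  }
}
```

With the 30s tick implied by "60s, two auto-refresh ticks", a failure at t = 0 means the tick at t = 30s is answered from the cache and the tick at t = 60s makes the next `/token` call, which matches the "at most one call per 60s" behavior described under Breaking changes.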