Key authserver's upstream token storage on (sessionID, providerName) #4198
Conversation
Large PR Detected
This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.
How to unblock this PR:
Add a section to your PR description with the following format:
## Large PR Justification
[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation
Alternative:
Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.
See our Contributing Guidelines for more details.
This review will be automatically dismissed once you add the justification section.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4198 +/- ##
==========================================
- Coverage 69.32% 68.69% -0.63%
==========================================
Files 470 471 +1
Lines 47189 47533 +344
==========================================
- Hits 32712 32653 -59
- Misses 11957 12133 +176
- Partials 2520 2747 +227
☔ View full report in Codecov by Sentry.
✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.
Large PR justification has been provided. Thank you!
jerm-dro
left a comment
No blocking concerns, but if you think a bulk upfront migration would simplify the code, I'd be willing to re-review.
Thanks for the review, @jerm-dro! I took all your suggestions; they all made sense. I haven't tested the bulk migration yet outside the tests in the PR. I won't merge the PR until I've done so, even if you ack in the meantime.
Key upstream token storage on (sessionID, providerName)

Restructure UpstreamTokenStorage so tokens are keyed by
(sessionID, providerName) instead of just sessionID, enabling
multiple upstream providers' tokens to coexist per session.
This is the foundation for multi-upstream IDP support (RFC-0052).
Storage layer:
- UpstreamTokenStorage interface gains providerName param on
Store/Get and new GetAllUpstreamTokens bulk-read method
- Memory backend uses nested map (sessionID -> providerName -> entry)
with empty inner map cleanup during eviction
- Redis backend uses per-provider keys (upstream:{sid}:{provider})
with a session index SET for enumeration and atomic deletion
- PendingAuthorization gains UpstreamProviderName and SessionID
fields for future multi-leg authorization chains
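The re-keyed storage contract described above might look like the following minimal Go sketch. Only the (sessionID, providerName) keying and the `upstream:{sid}:{provider}` key format come from the commit message; `TokenEntry`, the exact method signatures, and the `upstreamRedisKey` helper are illustrative assumptions.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// TokenEntry is a hypothetical stand-in for the stored token record; the
// PR's actual struct and fields may differ.
type TokenEntry struct {
	ProviderID  string
	AccessToken string
	ExpiresAt   time.Time
}

// UpstreamTokenStorage sketches the re-keyed interface: Store/Get gain a
// providerName parameter, and GetAllUpstreamTokens enumerates every
// provider's tokens for one session. Signatures are assumptions.
type UpstreamTokenStorage interface {
	StoreUpstreamTokens(ctx context.Context, sessionID, providerName string, entry *TokenEntry) error
	GetUpstreamTokens(ctx context.Context, sessionID, providerName string) (*TokenEntry, error)
	// The bulk read includes expired entries per the interface contract;
	// callers filter as needed.
	GetAllUpstreamTokens(ctx context.Context, sessionID string) (map[string]*TokenEntry, error)
	DeleteUpstreamTokens(ctx context.Context, sessionID string) error
}

// upstreamRedisKey builds the per-provider Redis key format
// ("upstream:{sid}:{provider}") used by the Redis backend.
func upstreamRedisKey(sessionID, providerName string) string {
	return "upstream:" + sessionID + ":" + providerName
}

func main() {
	fmt.Println(upstreamRedisKey("sess-1", "github"))
}
```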
Service + middleware:
- upstreamtoken.Service.GetValidTokens takes providerName
- upstreamswap.Config gains required ProviderName field
- runner/middleware derives ProviderName from upstream config
Bug fixes from review:
- Fix provider name key mismatch: callback handler now uses the
upstream's logical name (UpstreamConfig.Name) instead of the
protocol type (upstream.Type()) for both ProviderID and the
storage key, matching what the middleware uses to retrieve tokens
- Fix singleflight key collision: include providerName in the
singleflight dedup key to prevent cross-provider result leaking
- Fix Redis GetAllUpstreamTokens excluding expired tokens (violates
interface contract that includes expired tokens at bulk-read level)
- Add warn logging for corrupt entries in Redis GetAllUpstreamTokens
- Defensive slice clone in Redis DeleteUpstreamTokens to prevent
append mutation of the original providerKeys slice
- Add sessionID validation in GetUpstreamTokens for consistency
with StoreUpstreamTokens
- Populate PendingAuthorization.UpstreamProviderName in authorize
handler for multi-upstream forward compatibility
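The singleflight collision in that list is worth illustrating: with a session-only key, two providers refreshing concurrently for the same session would be deduplicated into one flight, and one provider could receive the other's tokens. A minimal sketch of the corrected dedup key (the exact format is an assumption, not the PR's actual code):

```go
package main

import "fmt"

// refreshKey sketches the fixed singleflight dedup key. Before the fix the
// key was the sessionID alone, so concurrent refreshes for two different
// providers on the same session collapsed into a single flight and one
// provider's result could leak to the other.
func refreshKey(sessionID, providerName string) string {
	return "refresh:" + sessionID + ":" + providerName
}

func main() {
	// Same session, different providers: keys no longer collide.
	fmt.Println(refreshKey("sess-1", "github"))
	fmt.Println(refreshKey("sess-1", "gitlab"))
}
```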
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add backwards-compatible migration for provider ID rename

Existing deployments store upstream tokens under the legacy key
format (upstream:{sessionID}) with ProviderID set to the protocol
type ("oidc"/"oauth2"). The multi-upstream restructuring changed
both the key format and ProviderID to use logical names. Without
migration, upgrading users would lose sessions and get new internal
user IDs.
Upstream token migration (RedisStorage.GetUpstreamTokens):
- On ErrNotFound, fall back to legacy key format
- Patch ProviderID to the logical name before returning so the
refresher writes to the correct new key (prevents refresh loop)
- Re-store under new key format and delete legacy key
- Preserve ErrExpired semantics for expired legacy tokens
ProviderIdentity migration (UserResolver.ResolveUser):
- Before creating a new user, check legacy provider IDs ("oidc",
"oauth2") for existing identity records
- If found, create new identity under current provider ID pointing
to the same internal user, preserving sub claim continuity
- Keep legacy records as defensive fallback (not deleted)
- Handle concurrent migration via ErrAlreadyExists
Both migration paths are idempotent and marked with TODO for
removal after all deployments have migrated.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
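The identity-migration leg above can be sketched with an in-memory identity store; the store shape, function names, and signatures are illustrative assumptions, keeping only the behavior the commit message describes.

```go
package main

import (
	"errors"
	"fmt"
)

var ErrAlreadyExists = errors.New("identity already exists")

// identityStore is a toy stand-in for the identity backend:
// (providerID, subject) -> internal user ID.
type identityStore struct {
	m map[[2]string]string
}

func (s *identityStore) get(providerID, sub string) (string, bool) {
	u, ok := s.m[[2]string{providerID, sub}]
	return u, ok
}

func (s *identityStore) create(providerID, sub, userID string) error {
	k := [2]string{providerID, sub}
	if _, ok := s.m[k]; ok {
		return ErrAlreadyExists
	}
	s.m[k] = userID
	return nil
}

// resolveUser sketches the migration: before minting a new internal user,
// check the legacy provider ID; if an identity exists there, duplicate it
// under the current logical provider ID so the sub claim keeps mapping to
// the same user. The legacy record is kept as a defensive fallback, and
// ErrAlreadyExists covers a concurrent migration racing us.
func resolveUser(s *identityStore, providerID, legacyProviderID, sub string, newUserID func() string) (string, error) {
	if u, ok := s.get(providerID, sub); ok {
		return u, nil
	}
	if u, ok := s.get(legacyProviderID, sub); ok {
		if err := s.create(providerID, sub, u); err != nil && !errors.Is(err, ErrAlreadyExists) {
			return "", err
		}
		return u, nil
	}
	u := newUserID()
	return u, s.create(providerID, sub, u)
}

func main() {
	s := &identityStore{m: map[[2]string]string{{"oidc", "sub-123"}: "user-42"}}
	u, err := resolveUser(s, "my-idp", "oidc", "sub-123", func() string { return "user-new" })
	fmt.Println(u, err)
}
```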
Add integration tests for upstream token session binding

Verify end-to-end that the callback handler stores upstream tokens
under the correct (sessionID, providerName) key and that the tsid
JWT claim maps to retrievable tokens.
TestIntegration_UpstreamTokenStorage:
- Tokens retrievable by provider name ("default") after full flow
- ProviderID is the logical name, not the protocol type
- Binding fields (UserID, UpstreamSubject, ClientID) populated
- UserID matches JWT sub claim
TestIntegration_RefreshPreservesUpstreamTokenBinding:
- tsid claim preserved across refresh token grant
- Upstream tokens still retrievable after refresh
- ProviderID unchanged after refresh
Infrastructure: expose UpstreamTokenStorage on testServer struct
via srv.IDPTokenStorage() for storage assertions in tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add legacy migration tests and fix expired token migration

Add miniredis unit tests for the legacy upstream token migration path, verifying:
- Legacy key format tokens are migrated on read (ProviderID patched, old key deleted, new key created)
- Expired legacy tokens are migrated with ErrExpired preserved
- New-format keys take priority over legacy keys

The expired token test caught a bug: the migration fallback skipped expired legacy tokens because getUpstreamTokensFromKey returns ErrExpired (non-nil), which the guard condition treated as "not found". Fixed by allowing ErrExpired through the legacy fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enforce pending.UpstreamProviderName in CallbackHandler

AuthorizeHandler records which upstream initiated the authorization, but CallbackHandler ignored it and always used h.upstreamName. Add a consistency check: if the pending state names a different provider than the handler serving the callback, reject with a server_error redirect. This prevents misrouted callbacks from associating sessions with the wrong provider namespace.

Harden legacy migration against cross-provider attacks.

Identity migration:
- Scope findLegacyProviderIdentity to a single legacy ID derived from UpstreamConfig.Type, not a global ["oidc","oauth2"] scan. This prevents cross-provider account merge when two upstreams share a subject value.
- Propagate transient storage errors instead of swallowing them (fail closed rather than creating duplicate users on a Redis blip).
- Inject UserResolver into NewHandler as a dependency so Handler carries no migration-specific knowledge.

Redis token migration:
- Add isLegacyUpstreamProviderID guard: refuse to migrate tokens whose ProviderID is already a logical name, preventing any provider from claiming another provider's legacy tokens.
- In practice, the remaining window (protocol-typed tokens claimed by the first caller) is safe because validateUpstreams() enforces a single upstream, and new multi-upstream deployments start fresh without legacy keys.
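The callback consistency check reduces to a small predicate; this helper and its signature are illustrative (the real handler redirects with server_error rather than returning an error), and tolerating an empty pending name for pre-upgrade states is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// checkPendingProvider sketches the CallbackHandler consistency check: if
// the pending authorization names a different provider than the handler
// serving the callback, the request is rejected. An empty pending name
// (e.g. state created before the field existed) is tolerated here.
func checkPendingProvider(pendingProvider, handlerProvider string) error {
	if pendingProvider != "" && pendingProvider != handlerProvider {
		return errors.New("server_error: callback misrouted to a different provider")
	}
	return nil
}

func main() {
	fmt.Println(checkPendingProvider("github", "github"))
	fmt.Println(checkPendingProvider("github", "gitlab"))
}
```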
Replace inline migration with one-shot bulk startup migration

Move legacy data migration from the read path (GetUpstreamTokens, ResolveUser) to a single bulk migration that runs at authserver startup before handlers are constructed. This eliminates ~100 lines of inline fallback logic, security guards, and legacy-aware fields from the hot path. The read path is now a straight key lookup with no branching.

The migration is idempotent, crash-safe (each key is migrated independently), and fails startup if any keys cannot be migrated, since the request path no longer has inline fallbacks for unmigrated data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Flatten nested upstream token map in memory storage

Replace map[sessionID]map[providerName]*timedEntry with a flat map[upstreamKey]*timedEntry using a composite struct key. This eliminates the empty-inner-map cleanup pattern that was duplicated in cleanupExpired and DeleteUser, a known footgun when adding new deletion paths.

The trade-off is O(N) scans in GetAllUpstreamTokens and DeleteUpstreamTokens instead of O(1) session lookup, which is acceptable for this dev/test-only storage backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
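The flat composite-key layout relies on Go struct keys being comparable; a minimal sketch (the `tokenEntry` fields and method names are assumptions):

```go
package main

import "fmt"

type tokenEntry struct{ accessToken string }

// upstreamKey is the composite struct key that replaces the nested
// map[sessionID]map[providerName] layout. Deleting one entry can never
// strand an empty inner map, so no cleanup pass is needed.
type upstreamKey struct {
	sessionID    string
	providerName string
}

type memoryStore struct {
	entries map[upstreamKey]*tokenEntry
}

// deleteSession shows the accepted trade-off: an O(N) scan over all
// entries instead of an O(1) inner-map lookup. Deleting during range is
// safe in Go.
func (m *memoryStore) deleteSession(sessionID string) {
	for k := range m.entries {
		if k.sessionID == sessionID {
			delete(m.entries, k)
		}
	}
}

func main() {
	m := &memoryStore{entries: map[upstreamKey]*tokenEntry{
		{"s1", "github"}: {accessToken: "a"},
		{"s1", "gitlab"}: {accessToken: "b"},
		{"s2", "github"}: {accessToken: "c"},
	}}
	m.deleteSession("s1")
	fmt.Println(len(m.entries)) // only s2's entry remains
}
```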
Return error from NewHandler instead of panicking on nil resolver

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarify Redis key comments for session index set and key prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add integration tests for legacy data migration against real Redis

Test the one-shot bulk migration against a real Redis Sentinel cluster via testcontainers, covering:
- Full lifecycle (token key migration, identity duplication, session index, user:upstream reverse index, DeleteUser cascade)
- Idempotency
- TTL preservation
- Empty-store no-op
- New-format key non-interference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OK, I managed to test the migration end-to-end in a kind cluster and it seemed to work. I deployed thv from main, set up a GitHub MCP server, and authenticated to seed the legacy data. I then upgraded to this branch and checked the migration log, and inspected the data in Redis directly; it looked correct. Finally, I called an MCP tool from the same VSCode session to check whether I would be re-prompted for auth; I was not.
jerm-dro
left a comment
Thanks for marking the migration change and testing it!
I left one of the prior comments unresolved because I raised a question about the Redis schema. I'll defer to your judgment on whether it's worth doing.
Summary
Multi-upstream IDP support requires each provider's tokens to be stored and retrieved independently per session. The existing storage keyed upstream tokens on sessionID alone, making it impossible to hold tokens from more than one provider concurrently. This PR re-keys the storage to (sessionID, providerName) and adds backwards-compatible migration for existing deployments.
- Restructure the UpstreamTokenStorage interface so tokens are keyed by (sessionID, providerName), with a new GetAllUpstreamTokens bulk-read method for session enumeration
- Add PendingAuthorization.UpstreamProviderName and SessionID fields for future multi-leg authorization chains
- Add an isLegacyUpstreamProviderID guard in the Redis migration to prevent logical-name tokens from being re-claimed under a different provider namespace
- Enforce pending.UpstreamProviderName in CallbackHandler to reject misrouted callbacks that would associate a session with the wrong provider
- Inject UserResolver into NewHandler as a dependency so the handler carries no migration-specific knowledge
- Derive ProviderName for the upstream swap middleware from the first upstream config entry

Fixes #4136
Type of change

Test plan
- Tests pass (task test)
- Lint passes (task lint-fix)

Does this introduce a user-facing change?
No. This is an internal storage restructuring with backwards-compatible migration. Existing single-upstream deployments continue to work without configuration changes.
Large PR Justification
Mostly, the data migration makes this difficult to split. I was unsure whether the migration was even needed, given that all existing deployments run with only one upstream, but I thought that forcing all users to re-log in after an upgrade would be irritating enough.
Generated with Claude Code