Key authserver's upstream token storage on (sessionID, providerName)#4198

Merged
jhrozek merged 12 commits into main from authserver-multi-upstream/1-storage
Mar 19, 2026

Conversation

@jhrozek
Contributor

@jhrozek jhrozek commented Mar 18, 2026

Summary

Multi-upstream IDP support requires each provider's tokens to be stored and retrieved independently per session. The existing storage keyed upstream tokens on sessionID alone, making it impossible to hold tokens from more than one provider concurrently. This PR re-keys the storage to (sessionID, providerName) and adds backwards-compatible migration for existing deployments.

  • Restructure UpstreamTokenStorage interface so tokens are keyed by (sessionID, providerName), with a new GetAllUpstreamTokens bulk-read method for session enumeration
  • Memory backend uses nested map; Redis backend uses per-provider keys with a session index SET for atomic enumeration and deletion
  • Add PendingAuthorization.UpstreamProviderName and SessionID fields for future multi-leg authorization chains
  • Add backwards-compatible migration paths for both Redis token keys and user provider identities, so that upgraded deployments preserve sessions and internal user IDs
  • Harden migration against cross-provider identity merge: scope legacy identity lookup to the single protocol-type ID matching the upstream's type, not a global scan
  • Propagate transient storage errors during legacy lookup (fail closed) instead of silently creating duplicate users
  • Add isLegacyUpstreamProviderID guard in Redis migration to prevent logical-name tokens from being re-claimed under a different provider namespace
  • Enforce pending.UpstreamProviderName in CallbackHandler to reject misrouted callbacks that would associate a session with the wrong provider
  • Inject UserResolver into NewHandler as a dependency so the handler carries no migration-specific knowledge
  • Derive ProviderName for the upstream swap middleware from the first upstream config entry
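As a sketch of the restructuring, here is a minimal in-memory analogue of the re-keyed storage (type and method names are illustrative, not the PR's exact signatures, and context/error plumbing is simplified):

```go
package main

import (
	"errors"
	"fmt"
)

var ErrNotFound = errors.New("not found")

// Tokens stands in for the stored upstream token payload.
type Tokens struct {
	ProviderID  string
	AccessToken string
}

// memoryStorage sketches the re-keyed backend: a nested map from
// sessionID to providerName, so multiple providers' tokens coexist
// per session.
type memoryStorage struct {
	tokens map[string]map[string]*Tokens
}

func (m *memoryStorage) StoreUpstreamTokens(sessionID, providerName string, t *Tokens) {
	if m.tokens[sessionID] == nil {
		m.tokens[sessionID] = map[string]*Tokens{}
	}
	m.tokens[sessionID][providerName] = t
}

func (m *memoryStorage) GetUpstreamTokens(sessionID, providerName string) (*Tokens, error) {
	t, ok := m.tokens[sessionID][providerName]
	if !ok {
		return nil, ErrNotFound
	}
	return t, nil
}

// GetAllUpstreamTokens is the new bulk-read method for session enumeration.
func (m *memoryStorage) GetAllUpstreamTokens(sessionID string) map[string]*Tokens {
	return m.tokens[sessionID]
}

func main() {
	s := &memoryStorage{tokens: map[string]map[string]*Tokens{}}
	s.StoreUpstreamTokens("s1", "github", &Tokens{ProviderID: "github", AccessToken: "a"})
	s.StoreUpstreamTokens("s1", "gitlab", &Tokens{ProviderID: "gitlab", AccessToken: "b"})
	fmt.Println(len(s.GetAllUpstreamTokens("s1"))) // 2
}
```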

Fixes #4136

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)
  • Extended the integration tests
  • Manual testing: end-to-end verified in a Kubernetes cluster. This branch is cherry-picked from a complete branch implementing the full multi-upstream feature.

Does this introduce a user-facing change?

No. This is an internal storage restructuring with backwards-compatible migration. Existing single-upstream deployments continue to work without configuration changes.

Large PR Justification

The data migration is the main thing that makes this hard to split. I was unsure whether the migration was even needed, given that all existing deployments run with a single upstream, but forcing every user to re-log after an upgrade seemed irritating enough to justify it.

Generated with Claude Code

@jhrozek jhrozek changed the title Key upstream token storage on (sessionID, providerName) Key authserver's upstream token storage on (sessionID, providerName) Mar 18, 2026
@jhrozek jhrozek force-pushed the authserver-multi-upstream/1-storage branch from b0e6717 to f4ae69d on March 18, 2026 11:21
@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Mar 18, 2026
Contributor

@github-actions github-actions bot left a comment


Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@codecov

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 54.81928% with 150 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.69%. Comparing base (5748d93) to head (9b276a9).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
pkg/authserver/storage/redis_migrate.go 59.11% 42 Missing and 23 partials ⚠️
pkg/authserver/storage/redis.go 39.50% 23 Missing and 26 partials ⚠️
pkg/authserver/storage/memory.go 65.95% 4 Missing and 12 partials ⚠️
pkg/authserver/server/handlers/handler.go 40.00% 3 Missing and 3 partials ⚠️
pkg/authserver/server_impl.go 14.28% 3 Missing and 3 partials ⚠️
pkg/runner/middleware.go 57.14% 2 Missing and 1 partial ⚠️
pkg/auth/upstreamtoken/service.go 60.00% 0 Missing and 2 partials ⚠️
pkg/authserver/server/handlers/callback.go 77.77% 0 Missing and 2 partials ⚠️
pkg/auth/upstreamswap/middleware.go 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4198      +/-   ##
==========================================
- Coverage   69.32%   68.69%   -0.63%     
==========================================
  Files         470      471       +1     
  Lines       47189    47533     +344     
==========================================
- Hits        32712    32653      -59     
- Misses      11957    12133     +176     
- Partials     2520     2747     +227     


@jhrozek jhrozek marked this pull request as ready for review March 18, 2026 21:25
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Mar 18, 2026
@github-actions
Contributor

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@github-actions github-actions bot dismissed their stale review March 18, 2026 21:56

Large PR justification has been provided. Thank you!

@jhrozek jhrozek requested a review from jerm-dro March 18, 2026 23:00
jerm-dro previously approved these changes Mar 19, 2026
Contributor

@jerm-dro jerm-dro left a comment


No blocking concerns, but if you think a bulk upfront migration would simplify the code, I'd be willing to re-review.

@jhrozek jhrozek force-pushed the authserver-multi-upstream/1-storage branch from 4a50ef3 to aa9c9ca on March 19, 2026 12:50
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Mar 19, 2026
@jhrozek
Contributor Author

jhrozek commented Mar 19, 2026

Thanks for the review @jerm-dro! I took all your suggestions; they all made sense.

I haven't yet tested the bulk migration outside the tests in this PR. I won't merge until I've done so, even if you ack in the meantime.

jhrozek and others added 6 commits March 19, 2026 14:46
Key upstream token storage on (sessionID, providerName)

Restructure UpstreamTokenStorage so tokens are keyed by
(sessionID, providerName) instead of just sessionID, enabling
multiple upstream providers' tokens to coexist per session.
This is the foundation for multi-upstream IDP support (RFC-0052).

Storage layer:
- UpstreamTokenStorage interface gains providerName param on
  Store/Get and new GetAllUpstreamTokens bulk-read method
- Memory backend uses nested map (sessionID -> providerName -> entry)
  with empty inner map cleanup during eviction
- Redis backend uses per-provider keys (upstream:{sid}:{provider})
  with a session index SET for enumeration and atomic deletion
- PendingAuthorization gains UpstreamProviderName and SessionID
  fields for future multi-leg authorization chains

Service + middleware:
- upstreamtoken.Service.GetValidTokens takes providerName
- upstreamswap.Config gains required ProviderName field
- runner/middleware derives ProviderName from upstream config

Bug fixes from review:
- Fix provider name key mismatch: callback handler now uses the
  upstream's logical name (UpstreamConfig.Name) instead of the
  protocol type (upstream.Type()) for both ProviderID and the
  storage key, matching what the middleware uses to retrieve tokens
- Fix singleflight key collision: include providerName in the
  singleflight dedup key to prevent cross-provider result leaking
- Fix Redis GetAllUpstreamTokens excluding expired tokens (violates
  interface contract that includes expired tokens at bulk-read level)
- Add warn logging for corrupt entries in Redis GetAllUpstreamTokens
- Defensive slice clone in Redis DeleteUpstreamTokens to prevent
  append mutation of the original providerKeys slice
- Add sessionID validation in GetUpstreamTokens for consistency
  with StoreUpstreamTokens
- Populate PendingAuthorization.UpstreamProviderName in authorize
  handler for multi-upstream forward compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add backwards-compatible migration for provider ID rename

Existing deployments store upstream tokens under the legacy key
format (upstream:{sessionID}) with ProviderID set to the protocol
type ("oidc"/"oauth2"). The multi-upstream restructuring changed
both the key format and ProviderID to use logical names. Without
migration, upgrading users would lose sessions and get new internal
user IDs.

Upstream token migration (RedisStorage.GetUpstreamTokens):
- On ErrNotFound, fall back to legacy key format
- Patch ProviderID to the logical name before returning so the
  refresher writes to the correct new key (prevents refresh loop)
- Re-store under new key format and delete legacy key
- Preserve ErrExpired semantics for expired legacy tokens

ProviderIdentity migration (UserResolver.ResolveUser):
- Before creating a new user, check legacy provider IDs ("oidc",
  "oauth2") for existing identity records
- If found, create new identity under current provider ID pointing
  to the same internal user, preserving sub claim continuity
- Keep legacy records as defensive fallback (not deleted)
- Handle concurrent migration via ErrAlreadyExists

Both migration paths are idempotent and marked with TODO for
removal after all deployments have migrated.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add integration tests for upstream token session binding

Verify end-to-end that the callback handler stores upstream tokens
under the correct (sessionID, providerName) key and that the tsid
JWT claim maps to retrievable tokens.

TestIntegration_UpstreamTokenStorage:
- Tokens retrievable by provider name ("default") after full flow
- ProviderID is the logical name, not the protocol type
- Binding fields (UserID, UpstreamSubject, ClientID) populated
- UserID matches JWT sub claim

TestIntegration_RefreshPreservesUpstreamTokenBinding:
- tsid claim preserved across refresh token grant
- Upstream tokens still retrievable after refresh
- ProviderID unchanged after refresh

Infrastructure: expose UpstreamTokenStorage on testServer struct
via srv.IDPTokenStorage() for storage assertions in tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add legacy migration tests and fix expired token migration

Add miniredis unit tests for the legacy upstream token migration
path, verifying:
- Legacy key format tokens are migrated on read (ProviderID patched,
  old key deleted, new key created)
- Expired legacy tokens are migrated with ErrExpired preserved
- New-format keys take priority over legacy keys

The expired token test caught a bug: the migration fallback skipped
expired legacy tokens because getUpstreamTokensFromKey returns
ErrExpired (non-nil), which the guard condition treated as "not
found". Fixed by allowing ErrExpired through the legacy fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enforce pending.UpstreamProviderName in CallbackHandler

AuthorizeHandler records which upstream initiated the authorization,
but CallbackHandler ignored it and always used h.upstreamName.
Add a consistency check: if the pending state names a different
provider than the handler serving the callback, reject with a
server_error redirect. This prevents misrouted callbacks from
associating sessions with the wrong provider namespace.

Harden legacy migration against cross-provider attacks.

Identity migration:
- Scope findLegacyProviderIdentity to a single legacy ID derived
  from UpstreamConfig.Type, not a global ["oidc","oauth2"] scan.
  This prevents cross-provider account merge when two upstreams
  share a subject value.
- Propagate transient storage errors instead of swallowing them
  (fail closed rather than creating duplicate users on Redis blip).
- Inject UserResolver into NewHandler as a dependency so Handler
  carries no migration-specific knowledge.

Redis token migration:
- Add isLegacyUpstreamProviderID guard: refuse to migrate tokens
  whose ProviderID is already a logical name, preventing any
  provider from claiming another provider's legacy tokens.
- In practice, the remaining window (protocol-typed tokens claimed
  by first caller) is safe because validateUpstreams() enforces a
  single upstream, and new multi-upstream deployments start fresh
  without legacy keys.
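The callback consistency check can be sketched like this (struct fields and handler wiring are simplified, and the names are assumptions rather than the PR's exact code):

```go
package main

import (
	"errors"
	"fmt"
)

// PendingAuthorization mirrors the field described above; only the
// provider check is sketched here.
type PendingAuthorization struct {
	UpstreamProviderName string
}

// checkCallbackProvider rejects a callback served by a handler for a
// different provider than the one that started the authorization,
// preventing misrouted callbacks from binding a session to the wrong
// provider namespace. An empty pending name (legacy state) is allowed.
func checkCallbackProvider(pending PendingAuthorization, handlerProvider string) error {
	if pending.UpstreamProviderName != "" && pending.UpstreamProviderName != handlerProvider {
		return errors.New("server_error: callback provider mismatch")
	}
	return nil
}

func main() {
	fmt.Println(checkCallbackProvider(PendingAuthorization{"github"}, "github")) // <nil>
	fmt.Println(checkCallbackProvider(PendingAuthorization{"github"}, "gitlab")) // server_error: callback provider mismatch
}
```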
jhrozek and others added 6 commits March 19, 2026 14:46
Replace inline migration with one-shot bulk startup migration

Move legacy data migration from the read path (GetUpstreamTokens,
ResolveUser) to a single bulk migration that runs at authserver
startup before handlers are constructed.

This eliminates ~100 lines of inline fallback logic, security
guards, and legacy-aware fields from the hot path. The read path
is now a straight key lookup with no branching.

The migration is idempotent, crash-safe (each key independent),
and fails startup if any keys cannot be migrated — since the
request path no longer has inline fallbacks for unmigrated data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
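A toy version of the bulk startup migration, with a plain map standing in for Redis (the real migration also rewrites identities, maintains the session index SET, and preserves TTLs; names here are illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// migrateLegacyTokens rewrites every legacy key "upstream:{sessionID}"
// as "upstream:{sessionID}:{provider}" and deletes the legacy key.
// Each key is handled independently, so a crash mid-run leaves a state
// a re-run can finish: re-running finds no remaining legacy keys.
func migrateLegacyTokens(store map[string]string, providerName string) int {
	migrated := 0
	for key, val := range store {
		parts := strings.Split(key, ":")
		// Legacy keys have exactly two segments: "upstream" and the session ID.
		if len(parts) != 2 || parts[0] != "upstream" {
			continue
		}
		newKey := fmt.Sprintf("upstream:%s:%s", parts[1], providerName)
		if _, exists := store[newKey]; !exists {
			store[newKey] = val // new-format keys are never clobbered
		}
		delete(store, key)
		migrated++
	}
	return migrated
}

func main() {
	store := map[string]string{"upstream:s1": "legacy-token"}
	fmt.Println(migrateLegacyTokens(store, "github")) // 1
	fmt.Println(store["upstream:s1:github"])          // legacy-token
	fmt.Println(migrateLegacyTokens(store, "github")) // 0 (idempotent)
}
```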
Flatten nested upstream token map in memory storage

Replace map[sessionID]map[providerName]*timedEntry with a flat
map[upstreamKey]*timedEntry using a composite struct key. This
eliminates the empty-inner-map cleanup pattern that was duplicated
in cleanupExpired and DeleteUser — a known footgun when adding
new deletion paths.

The trade-off is O(N) scans in GetAllUpstreamTokens and
DeleteUpstreamTokens instead of O(1) session lookup, which is
acceptable for this dev/test-only storage backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
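The flattened layout can be sketched as (field names illustrative; string values stand in for the token entries):

```go
package main

import "fmt"

// upstreamKey is the composite struct key replacing the nested
// map[sessionID]map[providerName] structure, so no empty-inner-map
// cleanup is needed on deletion.
type upstreamKey struct {
	sessionID    string
	providerName string
}

func main() {
	store := map[upstreamKey]string{
		{"s1", "github"}: "token-a",
		{"s1", "gitlab"}: "token-b",
		{"s2", "github"}: "token-c",
	}
	// O(N) scan stands in for GetAllUpstreamTokens: collect every
	// provider's entry for one session.
	count := 0
	for k := range store {
		if k.sessionID == "s1" {
			count++
		}
	}
	fmt.Println(count) // 2
}
```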
Return error from NewHandler instead of panicking on nil resolver

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Clarify Redis key comments for session index set and key prefix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add integration tests for legacy data migration against real Redis

Test the one-shot bulk migration against a real Redis Sentinel
cluster via testcontainers, covering: full lifecycle (token key
migration, identity duplication, session index, user:upstream
reverse index, DeleteUser cascade), idempotency, TTL preservation,
empty-store no-op, and new-format key non-interference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jhrozek jhrozek force-pushed the authserver-multi-upstream/1-storage branch from aa9c9ca to 9b276a9 on March 19, 2026 14:46
@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Mar 19, 2026
@jhrozek
Contributor Author

jhrozek commented Mar 19, 2026

OK, I managed to test the migration end-to-end in a kind cluster and it seemed to work.

I deployed thv from main, set up a GitHub MCP server, and authenticated to seed the legacy data. Then I upgraded to this branch and checked the migration log:

  migrated legacy upstream token    session_id=LNV44... provider_name=github
  migrated legacy provider identity legacy_provider_id=oauth2 provider_name=github provider_subject=715522
  legacy data migration complete    tokens_migrated=1 tokens_skipped=0 identities_migrated=1 identities_skipped=0

Then I checked the data in Redis directly; it looked correct:
- Legacy token key deleted, new key upstream:{sessionID}:github created
- Session index set (upstream:idx:{sessionID}) populated
- User reverse index (user:upstream:{userID}) updated (for DeleteUser cascade)

Finally, I called an MCP tool from the same VS Code session to see whether I would be re-prompted for auth; I was not.
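The key formats verified above can be captured as small helpers (function names are illustrative; the actual constants live in the storage package):

```go
package main

import "fmt"

// upstreamTokenKey builds the per-provider token key, e.g.
// upstream:{sessionID}:github.
func upstreamTokenKey(sessionID, providerName string) string {
	return fmt.Sprintf("upstream:%s:%s", sessionID, providerName)
}

// sessionIndexKey names the SET tracking which provider keys exist for
// a session, enabling enumeration and atomic deletion.
func sessionIndexKey(sessionID string) string {
	return fmt.Sprintf("upstream:idx:%s", sessionID)
}

// userUpstreamIndexKey names the user reverse index used by the
// DeleteUser cascade.
func userUpstreamIndexKey(userID string) string {
	return fmt.Sprintf("user:upstream:%s", userID)
}

func main() {
	fmt.Println(upstreamTokenKey("LNV44", "github")) // upstream:LNV44:github
	fmt.Println(sessionIndexKey("LNV44"))            // upstream:idx:LNV44
	fmt.Println(userUpstreamIndexKey("u1"))          // user:upstream:u1
}
```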

Contributor

@jerm-dro jerm-dro left a comment


Thanks for making the migration change and testing it!

I left one of my earlier comments unresolved because it raises a question about the Redis schema. I'll defer to your judgment on whether it's worth doing.

@jhrozek jhrozek merged commit f52b6d0 into main Mar 19, 2026
75 of 76 checks passed
@jhrozek jhrozek deleted the authserver-multi-upstream/1-storage branch March 19, 2026 21:49
Sanskarzz pushed a commit to Sanskarzz/toolhive that referenced this pull request Mar 23, 2026
…tacklok#4198)


Labels

size/XL Extra large PR: 1000+ lines changed


Development

Successfully merging this pull request may close these issues.

Implement multi-provider upstream token storage layer (RFC-0052 Phase 1)
