Enable multi-upstream for MCPServer, MCPRemoteProxy, and proxy runner by jhrozek · Pull Request #4322 · stacklok/toolhive

jhrozek · 2026-03-23T23:24:34Z

Summary

The embedded auth server has supported sequential multi-upstream authorization chains since Phase 2, but every consumer layer (MCPServer controller, MCPRemoteProxy controller, proxy runner) blocked configs with more than one upstream provider. This PR lifts those restrictions and fixes the env var naming scheme to support multiple providers correctly.

The CRD now accepts multiple upstream providers at admission time with duplicate-name detection and DNS-label validation (MaxLength=63, RFC 1123 pattern)
Env var derivation switched from single TOOLHIVE_UPSTREAM_CLIENT_SECRET to name-keyed TOOLHIVE_UPSTREAM_CLIENT_SECRET_<PROVIDER>, making bindings position-independent across provider reordering
Runtime config validation enforces explicit naming for multi-upstream (empty names default to "default" only for single-upstream; "default" is reserved)
Upstream names validated against DNS-label regex to prevent storage key delimiter injection
Upstreamswap middleware log lines now include provider name for multi-upstream debugging

Fixes #4138

Type of change

New feature

Test plan

Unit tests (task test)
Linting (task lint-fix)

Changes

File	Change
`cmd/thv-operator/api/v1alpha1/mcpexternalauthconfig_types.go`	Remove single-upstream restriction, add duplicate-name + DNS-label validation
`cmd/thv-operator/pkg/controllerutil/authserver.go`	Name-keyed env var binding via `upstreamSecretBinding`, multi-provider iteration
`pkg/authserver/config.go`	Multi-upstream name validation, remove `GetUpstream()`
`pkg/auth/upstreamswap/middleware.go`	Add provider name to structured log lines
`deploy/charts/operator-crds/`	Regenerated CRD manifests with Pattern + MaxLength
`docs/operator/crd-api.md`	Regenerated API reference

Special notes for reviewers

The upstreamSecretBinding struct is the single source of truth for env var naming, shared between GenerateAuthServerEnvVars (Pod env) and buildUpstreamRunConfig (runtime config) to prevent desync.
The env var suffix is derived from provider Name (uppercased, hyphens→underscores). Since the CRD regex forbids underscores, the transform is injective — no two valid names can produce the same suffix.
Multi-upstream proxy runner swap middleware defaults to Upstreams[0] for now; explicit per-backend ProviderName selection is addressed in the follow-up branch (authserver-multi-upstream/4-tokens-in-identity).

Generated with Claude Code

codecov · 2026-03-23T23:29:01Z

Codecov Report

❌ Patch coverage is 76.78571% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.66%. Comparing base (4838bb4) to head (5da616b).
⚠️ Report is 6 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/authserver/config.go	55.00%	2 Missing and 16 partials ⚠️
...erator/api/v1alpha1/mcpexternalauthconfig_types.go	25.00%	1 Missing and 2 partials ⚠️
cmd/thv-operator/pkg/controllerutil/authserver.go	90.90%	1 Missing and 1 partial ⚠️
...-operator/controllers/mcpremoteproxy_controller.go	92.30%	0 Missing and 1 partial ⚠️
...d/thv-operator/controllers/mcpserver_controller.go	93.33%	0 Missing and 1 partial ⚠️
pkg/auth/upstreamswap/middleware.go	91.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4322      +/-   ##
==========================================
- Coverage   68.86%   68.66%   -0.21%     
==========================================
  Files         479      479              
  Lines       48505    48619     +114     
==========================================
- Hits        33404    33385      -19     
- Misses      12347    12359      +12     
- Partials     2754     2875     +121

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pkg/auth/upstreamswap/middleware.go

tgrunnagle · 2026-03-24T16:07:50Z

Question: Should there be some validation at the proxy runner and remote proxy controllers that restricts upstream providers to length = 1?

Move multi-upstream restrictions from the authserver library to consumer layers. The library now accepts multi-upstream configs but enforces name semantics: single-upstream defaults empty names to "default", while multi-upstream requires explicit non-"default" names with distinct error messages for empty vs reserved names. Validate upstream names against a DNS-label regex (no leading/trailing hyphens, lowercase alphanumeric only) to prevent delimiter injection in storage keys. Add test coverage for invalid name formats (uppercase, underscores, leading/trailing hyphens). Remove the GetUpstream() convenience method (no callers remain after Phase 2). Add a cardinality guard in the proxy runner's Run() that rejects len(Upstreams) > 1 with an actionable error pointing to VirtualMCPServer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the len > 1 guard from MCPExternalAuthConfig.validateEmbeddedAuthServer() so the CRD accepts multi-upstream configs. Add multi-upstream rejection to MCPServer and MCPRemoteProxy controllers in handleExternalAuthConfig(), setting a ConditionFalse status with reason MultiUpstreamNotSupported and an actionable error directing users to VirtualMCPServer. Add duplicate upstream name validation in the CRD webhook so conflicts are caught at admission time rather than Pod startup. Tighten the Name field pattern to disallow trailing hyphens and add MaxLength=63 for RFC 1123 compliance. VirtualMCPServer remains unrestricted as the intended multi-upstream consumer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The embedded auth server has supported sequential multi-upstream chains since the Phase 2 work, but every consumer layer (MCPServer controller, MCPRemoteProxy controller, proxy runner) blocked configs with more than one upstream provider. This commit lifts all those restrictions and fixes the two bugs that would have broken multi-upstream even without the guards. Guard removal: - MCPServer controller: remove len > 1 check in handleExternalAuthConfig - MCPRemoteProxy controller: same removal - Proxy runner Run(): remove len > 1 early-return - Remove associated condition type/reason constants and rejection tests Converter fix (authserver.go): - GenerateAuthServerEnvVars now iterates all providers and emits name-keyed env vars (TOOLHIVE_UPSTREAM_CLIENT_SECRET_OKTA, _GITHUB, …) derived from each provider's Name field, instead of only reading UpstreamProviders[0] into a single unindexed var. Name-keyed bindings are position-independent, so reordering providers in the CRD does not change the secret-to-provider mapping. - buildEmbeddedAuthServerRunnerConfig now builds Upstreams from all providers instead of only the first; buildUpstreamRunConfig gains an envVarName parameter to match the env var naming Middleware fix (middleware.go): - Restore ProviderName auto-derivation in addUpstreamSwapMiddleware, defaulting to the first upstream's name for single-upstream configs. Multi-upstream ProviderName selection will be addressed when a CRD field is added in a follow-up. Observability (upstreamswap/middleware.go): - Add provider field to all upstreamswap log lines (injection success, empty-token warning, get-tokens error) to make confused-deputy debugging tractable when multiple providers are in play. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extract validateUpstreamName and validateUpstreamType from validateUpstreams to bring cyclomatic complexity under the limit. Break long fmt.Errorf line that exceeded 130 characters. Update stale godoc comments in controllerutil/authserver.go that still referenced the old index-based env var naming scheme. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Client-attributable errors (session not found, no refresh token, refresh failed, invalid binding) are expected conditions that trigger re-auth, not server issues an operator needs to investigate. Log these at Debug instead of Warn; keep Warn for unexpected server-side failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jhrozek · 2026-03-24T17:14:20Z

Question: Should there be some validation at the proxy runner and remote proxy controllers that restricts upstream providers to length = 1?

Don't we want to extend to 1 or 2? I'd like to enable the multi-hop through corporate IDPs.

You are right though that we should add that as a separate commit with separate tests and docs. I'm adding a new patch that re-adds the guards.

The upstream swap middleware only injects one provider's token, and there is no CRD field to control which provider. Until tokenInjection- Provider is added and Cedar upstream claims extraction is implemented, multi-upstream configs would silently inject only the first upstream's token while the second is stored but unused. Re-add controller-level guards (MCPServer, MCPRemoteProxy, proxy runner) that reject len > 1 with actionable errors directing users to VirtualMCPServer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jhrozek requested review from ChrisJBurns, JAORMX, jerm-dro and yrobla as code owners March 23, 2026 23:24

github-actions bot added the size/M Medium PR: 300-599 lines changed label Mar 23, 2026

github-actions bot added size/M Medium PR: 300-599 lines changed and removed size/M Medium PR: 300-599 lines changed labels Mar 24, 2026

tgrunnagle previously approved these changes Mar 24, 2026

View reviewed changes

pkg/auth/upstreamswap/middleware.go Outdated Show resolved Hide resolved

jhrozek and others added 5 commits March 24, 2026 16:30

jhrozek dismissed tgrunnagle’s stale review via 0c9ef62 March 24, 2026 17:14

jhrozek force-pushed the authserver-multi-upstream/3-enable branch from f86c222 to 0c9ef62 Compare March 24, 2026 17:14

jhrozek requested review from amirejaz and lujunsan as code owners March 24, 2026 17:14

github-actions bot added size/L Large PR: 600-999 lines changed and removed size/M Medium PR: 300-599 lines changed size/L Large PR: 600-999 lines changed labels Mar 24, 2026

jhrozek force-pushed the authserver-multi-upstream/3-enable branch from 0c9ef62 to cdf5377 Compare March 24, 2026 17:47

github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 24, 2026

tgrunnagle previously approved these changes Mar 24, 2026

View reviewed changes

regen docs

5da616b

jhrozek dismissed tgrunnagle’s stale review via 5da616b March 24, 2026 18:00

github-actions bot added size/L Large PR: 600-999 lines changed and removed size/L Large PR: 600-999 lines changed labels Mar 24, 2026

tgrunnagle approved these changes Mar 24, 2026

View reviewed changes

jhrozek merged commit 074326e into main Mar 24, 2026
72 of 77 checks passed

jhrozek deleted the authserver-multi-upstream/3-enable branch March 24, 2026 20:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable multi-upstream for MCPServer, MCPRemoteProxy, and proxy runner#4322

Enable multi-upstream for MCPServer, MCPRemoteProxy, and proxy runner#4322
jhrozek merged 7 commits intomainfrom
authserver-multi-upstream/3-enable

jhrozek commented Mar 23, 2026

Uh oh!

codecov bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

Uh oh!

tgrunnagle commented Mar 24, 2026

Uh oh!

jhrozek commented Mar 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jhrozek commented Mar 23, 2026

Summary

Type of change

Test plan

Changes

Special notes for reviewers

Uh oh!

codecov bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

tgrunnagle commented Mar 24, 2026

Uh oh!

jhrozek commented Mar 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Mar 23, 2026 •

edited

Loading