Skip to content

Make proxy response timeout configurable#4063

Merged
amirejaz merged 3 commits intostacklok:mainfrom
gkatz2:fix/configurable-proxy-timeout-4061
Mar 12, 2026
Merged

Make proxy response timeout configurable#4063
amirejaz merged 3 commits intostacklok:mainfrom
gkatz2:fix/configurable-proxy-timeout-4061

Conversation

@gkatz2
Copy link
Copy Markdown
Contributor

@gkatz2 gkatz2 commented Mar 10, 2026

Summary

  • The streamable HTTP proxy (used by stdio transport) had a hardcoded 30-second response timeout. Any MCP tool call exceeding 30s returned a timeout error, even though the backend server was still processing. This made legitimate long-running tools unusable.
  • Bump the default from 30s to 5m and add TOOLHIVE_PROXY_RESPONSE_TIMEOUT environment variable for users who need a different value. Follows the existing TOOLHIVE_REMOTE_HEALTHCHECKS pattern for env var overrides.

Fixes #4061

Type of change

  • Bug fix

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Built a custom stdio MCP server with a 35-second delay on tool calls.
Verified the fix in both directions:

Scenario Timeout Result
TOOLHIVE_PROXY_RESPONSE_TIMEOUT=30s + 35s tool 30s Timeout at 30.00s (expected)
Default (no env var) + 35s tool 5m Success at 35.00s
TOOLHIVE_PROXY_RESPONSE_TIMEOUT=1us + any tool 1us Timeout (even initialize fails)

Changes

File Change
pkg/transport/proxy/streamable/streamable_proxy.go Bump default to 5m, add responseTimeout field, add resolveResponseTimeout(), update 3 usage sites
pkg/transport/proxy/streamable/streamable_proxy_test.go Table-driven tests for timeout resolution + constructor wiring tests
docs/arch/03-transport-architecture.md Document TOOLHIVE_PROXY_RESPONSE_TIMEOUT env var

Does this introduce a user-facing change?

The default proxy response timeout increases from 30 seconds to 5 minutes. Users who were hitting 504 timeouts on slow MCP tools will no longer see them without any configuration change. Users who need a different timeout can set TOOLHIVE_PROXY_RESPONSE_TIMEOUT (e.g., 10m, 120s).

Special notes for reviewers

  • Only the streamable HTTP proxy for stdio transport is affected (pkg/transport/proxy/streamable/). The transparent proxy used by --transport sse and --transport streamable-http has no such timeout.
  • The env var is read once at proxy creation time (in NewHTTPProxy) and stored as a struct field, avoiding repeated os.Getenv calls on every request.
  • Invalid or non-positive values log a warning and fall back to the default.

@github-actions github-actions bot added the size/S Small PR: 100-299 lines changed label Mar 10, 2026
@gkatz2 gkatz2 force-pushed the fix/configurable-proxy-timeout-4061 branch from 3a51b39 to 5e34bdd Compare March 10, 2026 02:39
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 10, 2026
@gkatz2 gkatz2 force-pushed the fix/configurable-proxy-timeout-4061 branch from 5e34bdd to 0d267f7 Compare March 10, 2026 02:46
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 10, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.62%. Comparing base (2fa4371) to head (24ccabe).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4063      +/-   ##
==========================================
- Coverage   68.65%   68.62%   -0.03%     
==========================================
  Files         454      458       +4     
  Lines       46051    46243     +192     
==========================================
+ Hits        31616    31735     +119     
- Misses      11994    12064      +70     
- Partials     2441     2444       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gkatz2 gkatz2 force-pushed the fix/configurable-proxy-timeout-4061 branch from 0d267f7 to 6071d98 Compare March 10, 2026 03:05
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 10, 2026
@gkatz2 gkatz2 force-pushed the fix/configurable-proxy-timeout-4061 branch from 6071d98 to 04f948a Compare March 10, 2026 19:08
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 10, 2026
reyortiz3
reyortiz3 previously approved these changes Mar 10, 2026
aponcedeleonch
aponcedeleonch previously approved these changes Mar 11, 2026
Copy link
Copy Markdown
Member

@aponcedeleonch aponcedeleonch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for the contribution!

Suggestion: Per-MCP-Server Request Timeout (follow-up)

The global env var solves the immediate problem well. As a natural next step, it would be valuable to support a per-server --proxy-request-timeout flag on thv run, since different MCP servers can have very different response characteristics. The env var would remain as a global fallback.

This can be done as a separate follow-up — just leaving breadcrumbs on how it fits into the existing architecture.

Precedence

--proxy-request-timeout flag  >  TOOLHIVE_PROXY_REQUEST_TIMEOUT env var  >  60s default

Implementation sketch

The pattern already exists for other per-server settings (--trust-proxy-headers, --endpoint-prefix). It touches ~7 files with small additions each:

pkg/runner/config.go — add field to RunConfig:

ProxyRequestTimeout time.Duration `json:"proxy_request_timeout,omitempty" yaml:"proxy_request_timeout,omitempty"`

pkg/runner/config_builder.go — add builder option:

func WithProxyRequestTimeout(d time.Duration) RunConfigBuilderOption {
    return func(c *RunConfig) { c.ProxyRequestTimeout = d }
}

cmd/thv/app/run_flags.go — add CLI flag:

cmd.Flags().DurationVar(&f.ProxyRequestTimeout, "proxy-request-timeout", 0,
    "Request timeout for the streamable HTTP proxy used by stdio servers (e.g. 2m, 120s). 0 uses global default.")

pkg/transport/proxy/streamable/streamable_proxy.go — accept per-server value, fall back to env then default:

func resolveRequestTimeout(perServer time.Duration) time.Duration {
    if perServer > 0 {
        return perServer
    }
    if v := os.Getenv(proxyRequestTimeoutEnv); v != "" {
        if d, err := time.ParseDuration(v); err == nil && d > 0 {
            return d
        }
    }
    return defaultRequestTimeout
}

The remaining files (types/transport.go, runner.go, http.go) just propagate the value through the existing config plumbing.

🤖 Generated with Claude Code

@aponcedeleonch aponcedeleonch force-pushed the fix/configurable-proxy-timeout-4061 branch from 04f948a to 0d28989 Compare March 11, 2026 11:58
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 11, 2026
@aponcedeleonch
Copy link
Copy Markdown
Member

I rebased to check if the errors in the e2e tests were fixed by some recent merged PRs on main

@gkatz2 gkatz2 dismissed stale reviews from reyortiz3 and aponcedeleonch via 084cb2c March 11, 2026 15:52
@gkatz2 gkatz2 force-pushed the fix/configurable-proxy-timeout-4061 branch from 0d28989 to 084cb2c Compare March 11, 2026 15:52
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 11, 2026
@gkatz2
Copy link
Copy Markdown
Contributor Author

gkatz2 commented Mar 11, 2026

@reyortiz3 @aponcedeleonch Sorry for the approval reset — I had to push a fix to a test. The only change since your reviews:

TestNewHTTPProxyDefaultTimeout was reading TOOLHIVE_PROXY_REQUEST_TIMEOUT from the ambient environment, so it would fail for anyone who has that env var set. Fixed by clearing it with t.Setenv (which requires dropping t.Parallel()):

-func TestNewHTTPProxyDefaultTimeout(t *testing.T) {
-	t.Parallel()
+// Not parallel because t.Setenv is needed to clear any ambient
+// TOOLHIVE_PROXY_REQUEST_TIMEOUT from the test runner's environment.
+func TestNewHTTPProxyDefaultTimeout(t *testing.T) { //nolint:paralleltest
+	t.Setenv(proxyRequestTimeoutEnv, "")
 	proxy := NewHTTPProxy("localhost", 0, nil)
 	assert.Equal(t, defaultRequestTimeout, proxy.requestTimeout)
 }

No production code changes — just the one test. CI is running now. Would appreciate a re-approval when you get a chance.

reyortiz3
reyortiz3 previously approved these changes Mar 11, 2026
The streamable HTTP proxy used by stdio transport had a hardcoded
30-second timeout. Any MCP tool call exceeding 30 seconds returned
a timeout error, even though the backend was still processing.

Bump the default to 60s (matching the MCP SDK convention) and add
TOOLHIVE_PROXY_REQUEST_TIMEOUT environment variable for users who
need a different value. This follows the existing
TOOLHIVE_REMOTE_HEALTHCHECKS pattern for env var overrides.

Fixes stacklok#4061

Signed-off-by: Greg Katz <gkatz@indeed.com>
@gkatz2 gkatz2 force-pushed the fix/configurable-proxy-timeout-4061 branch from 084cb2c to 99bdfbf Compare March 11, 2026 18:51
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 11, 2026
amirejaz
amirejaz previously approved these changes Mar 11, 2026
@amirejaz amirejaz dismissed stale reviews from reyortiz3 and themself via fc13f77 March 11, 2026 23:35
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 11, 2026
Signed-off-by: Greg Katz <gkatz@indeed.com>
@github-actions github-actions bot added size/S Small PR: 100-299 lines changed and removed size/S Small PR: 100-299 lines changed labels Mar 11, 2026
@amirejaz amirejaz merged commit 5b6d22c into stacklok:main Mar 12, 2026
97 of 100 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Small PR: 100-299 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Streamable HTTP proxy has hardcoded 30s response timeout causing 504 on long-running MCP tools

4 participants