Fix #18: Increase pool_idle_timeout from 50s to 600s (10 minutes) #19

Merged
erans merged 2 commits into main from bug/18/timeout-after-5-min
Oct 23, 2025

@erans (Owner) commented Oct 23, 2025

Summary

Fixes #18 - "Poor internet connection" errors after 5 minutes when using proxies like MegaLLM

This PR contains two commits that address connection stability issues:

1. Increase pool_idle_timeout from 50s to 600s (10 minutes)

The previous 50s pool idle timeout was designed to expire pooled connections before upstream servers (OpenAI/Anthropic) close them at 60-120s. However, this was too aggressive for proxies that have longer connection timeouts: the pool closed connections it considered idle while they were still being used for streaming.

2. Add configurable HTTP server settings for TCP and SSE behavior

Externalizes HTTP/TCP settings that were previously hardcoded, allowing users to tune streaming behavior for different network conditions and proxy configurations.

Problem

With the 50s pool idle timeout and hardcoded SSE keepalive settings:

  • Idle connections were closed too early during long-running streaming requests
  • "Poor internet connection" errors occurred after exactly 5 minutes of streaming
  • No way to adjust SSE keepalive frequency for different proxies
  • Unnecessary connection churn during long operations
  • Incompatible with some proxy timeout configurations

Solution

Part 1: Pool Idle Timeout (Egress Side)

Increase default pool_idle_timeout_secs from 50s to 600s (10 minutes)
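
As a rough illustration of what the new default changes, the sketch below models only the one field named in this PR. The reduced `HttpClientConfig` shape and the `should_evict` helper are assumptions made for this example and are not taken from `crates/lunaroute-egress/src/client.rs`; the real struct has more fields, and the real eviction is done by the HTTP client's connection pool.

```rust
use std::time::Duration;

/// Reduced, hypothetical stand-in for the egress client config.
/// Only the field discussed in this PR is modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct HttpClientConfig {
    pub pool_idle_timeout_secs: u64,
}

impl Default for HttpClientConfig {
    fn default() -> Self {
        // New default from this PR: 600s (10 minutes). The old default was 50s.
        Self { pool_idle_timeout_secs: 600 }
    }
}

impl HttpClientConfig {
    /// The configured timeout as a `Duration`.
    pub fn pool_idle_timeout(&self) -> Duration {
        Duration::from_secs(self.pool_idle_timeout_secs)
    }

    /// Hypothetical helper: a pooled connection is dropped once it has been
    /// idle for at least the configured timeout.
    pub fn should_evict(&self, idle_for: Duration) -> bool {
        idle_for >= self.pool_idle_timeout()
    }
}
```

Under this model, a connection that has been idle for 55s is evicted with the old 50s setting and kept with the new 600s default.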

Part 2: HTTP Server Configuration (Ingress Side)

Add new http_server configuration section with:

  • tcp_nodelay (bool): Disable Nagle's algorithm for low-latency (default: true)
  • tcp_keepalive_secs (u64): TCP keepalive interval (default: 60s)
  • sse_keepalive_enabled (bool): Enable SSE keepalive comments (default: true)
  • sse_keepalive_interval_secs (u64): SSE keepalive interval (default: 15s)
  • send_buffer_size (optional): TCP send buffer size in bytes
  • recv_buffer_size (optional): TCP receive buffer size in bytes
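
The settings listed above could be modelled roughly as follows. This is a sketch rather than the code in `crates/lunaroute-server/src/config.rs`: the serde attributes are left out so the snippet compiles with the standard library alone, the `Option<usize>` type for the buffer sizes is an assumption, and `sse_keepalive_interval` is a hypothetical convenience helper.

```rust
use std::time::Duration;

/// Sketch of the `http_server` settings. Field names and defaults follow the
/// PR description; the buffer size type is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct HttpServerSettings {
    pub tcp_nodelay: bool,
    pub tcp_keepalive_secs: u64,
    pub sse_keepalive_enabled: bool,
    pub sse_keepalive_interval_secs: u64,
    pub send_buffer_size: Option<usize>,
    pub recv_buffer_size: Option<usize>,
}

impl Default for HttpServerSettings {
    fn default() -> Self {
        Self {
            tcp_nodelay: true,               // disable Nagle's algorithm
            tcp_keepalive_secs: 60,          // TCP keepalive interval
            sse_keepalive_enabled: true,     // send SSE keepalive comments
            sse_keepalive_interval_secs: 15, // SSE keepalive interval
            send_buffer_size: None,          // optional, unset by default
            recv_buffer_size: None,          // optional, unset by default
        }
    }
}

impl HttpServerSettings {
    /// Hypothetical helper: `None` when SSE keepalive is disabled,
    /// otherwise the configured interval.
    pub fn sse_keepalive_interval(&self) -> Option<Duration> {
        if self.sse_keepalive_enabled {
            Some(Duration::from_secs(self.sse_keepalive_interval_secs))
        } else {
            None
        }
    }
}
```

Modelling the two buffer sizes as `Option` keeps "not configured" distinct from any numeric value, which matches their description as optional settings.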

Benefits

  1. Stable Long-Running Connections: 600s pool timeout accommodates extended streaming sessions
  2. Configurable SSE Keepalive: Adjust frequency for aggressive proxies (e.g., set to 10s)
  3. Full Control: Users can tune both egress (pool) and ingress (SSE) behavior
  4. Reduced Connection Churn: Fewer reconnections during long operations
  5. Proxy Compatibility: Works with various proxy timeout configurations

Changes

Commit 1: Pool Idle Timeout

  • crates/lunaroute-egress/src/client.rs:

    • Update HttpClientConfig::default() to set pool_idle_timeout_secs: 600
    • Add detailed comments explaining the rationale
    • Update all related tests to reflect new default
  • crates/lunaroute-server/src/config.rs:

    • Update tests to expect 600s as default pool idle timeout
  • examples/configs/dual-dialect-passthrough.yaml:

    • Add commented example showing how to configure http_client timeouts

Commit 2: HTTP Server Configuration

  • crates/lunaroute-server/src/config.rs:

    • Added HttpServerSettings struct with serde defaults
    • Integrated into ServerConfig with #[serde(default)]
  • crates/lunaroute-server/src/main.rs:

    • Server startup logs HTTP server configuration
    • Pass SSE config to all passthrough routers
  • crates/lunaroute-ingress/src/{anthropic.rs,openai.rs,multi_dialect.rs}:

    • Updated passthrough routers to accept SSE keepalive parameters
    • Applied configured keepalive to SSE responses
    • Fixed borrow checker issues by cloning config before async moves
  • crates/lunaroute-integration-tests/tests/*.rs:

    • Updated all test calls with new passthrough_router signatures
  • examples/configs/dual-dialect-passthrough.yaml:

    • Added comprehensive http_server configuration examples
    • Explained defaults and when to customize each setting
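
The "clone before move" fix mentioned for the ingress routers follows a common Rust pattern. The sketch below uses `std::thread::spawn` as a stand-in for async tasks so that it needs no runtime; `SseKeepaliveConfig` and `spawn_handlers` are hypothetical names for this example and do not come from the PR.

```rust
use std::thread;

/// Hypothetical bundle of the SSE keepalive parameters passed to each router.
#[derive(Debug, Clone)]
pub struct SseKeepaliveConfig {
    pub enabled: bool,
    pub interval_secs: u64,
}

/// Spawns one worker per route. Each worker gets its own clone of the config,
/// so the `move` closure owns its data and the caller keeps the original.
pub fn spawn_handlers(config: &SseKeepaliveConfig, routes: &[&str]) -> Vec<String> {
    let handles: Vec<_> = routes
        .iter()
        .map(|route| {
            let route = route.to_string();
            // Clone *before* the move: moving a borrowed `config` into the
            // closure would not satisfy the `'static` bound on `spawn`.
            let cfg = config.clone();
            thread::spawn(move || {
                format!("{}: keepalive={} every {}s", route, cfg.enabled, cfg.interval_secs)
            })
        })
        .collect();

    // Join in spawn order so the output order matches `routes`.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

The same reasoning applies to `async move` blocks handed to a runtime: the task must own everything it captures, so a small config struct is usually cheapest to clone once per task.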

Configuration Examples

Default (optimized for streaming with proxies):

# Defaults apply automatically - no configuration needed
# - pool_idle_timeout: 600s (10 min)
# - SSE keepalive: 15s
# - TCP keepalive: 60s

For aggressive proxies (e.g., MegaLLM):

http_server:
  sse_keepalive_interval_secs: 10  # Send data every 10s to prevent timeout
  tcp_keepalive_secs: 45           # Detect dead connections faster
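
To show what `sse_keepalive_interval_secs` controls, here is a minimal sketch of an SSE forwarder that writes a comment line whenever the upstream has been silent for one interval. It assumes a channel-based, blocking design with single-line event payloads; `forward_with_keepalive` is a hypothetical function, not the passthrough router code from this PR.

```rust
use std::io::{self, Write};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Forwards upstream events as SSE `data:` frames. If no event arrives within
/// `interval`, writes an SSE comment so that intermediaries see traffic.
/// Returns the number of keepalive comments written once the sender is dropped.
pub fn forward_with_keepalive<W: Write>(
    events: Receiver<String>,
    out: &mut W,
    interval: Duration,
) -> io::Result<usize> {
    let mut keepalives = 0;
    loop {
        match events.recv_timeout(interval) {
            // Assumes `data` contains no newlines.
            Ok(data) => write!(out, "data: {}\n\n", data)?,
            Err(RecvTimeoutError::Timeout) => {
                // Lines starting with ':' are comments in the SSE format
                // and are ignored by clients.
                out.write_all(b": keepalive\n\n")?;
                keepalives += 1;
            }
            // Upstream finished: stop forwarding.
            Err(RecvTimeoutError::Disconnected) => return Ok(keepalives),
        }
        out.flush()?;
    }
}
```

With a 10s interval, a proxy that drops connections after some longer period of silence would see a comment frame at least every 10s, even while the model is still producing no output.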

For direct API connections (no proxies):

providers:
  anthropic:
    http_client:
      pool_idle_timeout_secs: 50   # Aggressive connection cycling

Testing

  • ✅ All 408 unit tests pass
  • ✅ Cargo clippy passes with no warnings
  • ✅ Cargo build successful
  • ✅ All pre-commit hooks pass

Compatibility

With Proxies: The new 600s defaults work well with proxies that have longer idle timeouts.

Direct Connections: Users can override settings in their config file if they prefer the more aggressive 50s timeout for direct API connections.

🤖 Generated with Claude Code

The previous 50s pool idle timeout was too aggressive for proxies like MegaLLM
that have longer connection timeouts. This caused "poor internet connection"
errors after exactly 5 minutes of streaming when connections were closed by
the pool while still actively being used.

Changes:
- Increase default pool_idle_timeout_secs from 50s to 600s (10 minutes)
- Update all tests to reflect new default
- Add documentation to example config showing how to customize timeouts
- Update comments explaining the rationale for the change

The new 600s timeout provides:
- Stable connections for long-running streaming requests (e.g., extended thinking)
- Compatibility with various proxy timeout configurations
- Reduced connection overhead from frequent reconnections
- Still well below typical load balancer timeouts (30+ minutes)

For direct connections to OpenAI/Anthropic (without proxies), users can override
this in their config file with http_client.pool_idle_timeout_secs: 50 if desired.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Adds new http_server configuration section to externalize HTTP/TCP settings
that were previously hardcoded. This allows users to tune streaming behavior
for different network conditions and proxy configurations.

New Configuration Options:
- tcp_nodelay (bool): Disable Nagle's algorithm for low-latency (default: true)
- tcp_keepalive_secs (u64): TCP keepalive interval (default: 60s)
- sse_keepalive_enabled (bool): Enable SSE keepalive comments (default: true)
- sse_keepalive_interval_secs (u64): SSE keepalive interval (default: 15s)
- send_buffer_size (optional): TCP send buffer size in bytes
- recv_buffer_size (optional): TCP receive buffer size in bytes

Implementation:
- Added HttpServerSettings struct with serde defaults
- Integrated into ServerConfig with #[serde(default)]
- Passed SSE keepalive config to all passthrough routers
- Applied to both Anthropic and OpenAI streaming handlers
- Updated all three API dialects (OpenAI, Anthropic, Both)
- Fixed borrow checker issues by cloning config before async moves

Benefits:
- Users can adjust SSE keepalive frequency for aggressive proxies (e.g., 10s)
- Configurable TCP behavior without code changes
- Combined with PR #19's pool_idle_timeout increase, provides full control
  over both egress (pool) and ingress (SSE) connection behavior
- Helps prevent "poor internet connection" errors during long streams

Documentation:
- Added comprehensive examples to dual-dialect-passthrough.yaml
- Explained defaults and when to customize each setting
- Server startup logs display configured HTTP server settings

Testing:
- All 408 unit tests pass
- Updated integration tests with new passthrough_router signatures
- Clippy passes with no warnings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@erans erans merged commit 7e177e2 into main Oct 23, 2025
8 checks passed
@erans erans deleted the bug/18/timeout-after-5-min branch October 23, 2025 19:02
Development

Successfully merging this pull request may close these issues.

After 5 minutes there are some connection issues withe MegaLLM