
fix: apply idle timeout to domain fronting relay #416

Merged
9seconds merged 4 commits into 9seconds:master from dolonet:fix/domain-fronting-idle-timeout
Mar 29, 2026

dolonet commented Mar 28, 2026

Summary

Fixes #378 — two bugs that together cause proxy failure under non-Telegram traffic spikes.

1. Domain fronting relay has no idle timeout

ProxyOpts.IdleTimeout was defined but never wired into the proxy. The doDomainFronting relay used a raw io.CopyBuffer with no deadline, so each forwarded connection could block a worker goroutine indefinitely. Under a burst of probes, scanners, or slowloris-style clients, the pool fills up and legitimate Telegram connections get dropped.

Fix: wrap both sides of the domain fronting relay in a connIdleTimeout adapter that resets read/write deadlines on every I/O operation. The timeout comes from network.timeout.idle config (default 1m).

2. Connection leak on worker pool overflow

When the pool rejected a connection (ErrPoolOverload), the accepted net.Conn was never closed — leaking a file descriptor and TCP socket per rejected connection. Under sustained spikes this compounds the problem: leaked descriptors reduce capacity for new dials (including to the fronting domain).

Fix: call conn.Close() before logging the concurrency-limit event, consistent with the blocklist/allowlist paths.

Changes

  • mtglib/conns.go — add connIdleTimeout wrapper
  • mtglib/proxy.go — store idleTimeout in Proxy; wrap domain fronting relay connections; close conn on pool overflow
  • mtglib/proxy_opts.go — add getIdleTimeout() with default fallback
  • internal/cli/run_proxy.go — pass conf.Network.Timeout.Idle into ProxyOpts

Domain fronting relay (for non-Telegram traffic) had no idle timeout,
causing worker pool exhaustion under traffic spikes.

The ProxyOpts.IdleTimeout field existed but was never wired into the
proxy. Now domain fronting connections are wrapped with per-read/write
deadlines reset to the configured idle timeout (default 1m), so stale
or slowloris-style connections are reaped promptly.

Fixes 9seconds#378
dolonet added 3 commits March 28, 2026 22:52
When the worker pool rejected a connection (ErrPoolOverload), the
accepted net.Conn was never closed — leaking a file descriptor and
TCP socket per rejected connection. Under sustained traffic spikes this
compounds the problem: leaked descriptors reduce the capacity for new
dials (including to the fronting domain), accelerating the failure
cascade described in 9seconds#378.
- avoid deprecated DefaultIdleTimeout, use time.Minute directly
- simplify embedded field selectors (QF1008)
9seconds merged commit 735466b into 9seconds:master Mar 29, 2026
5 checks passed
dolonet added a commit to dolonet/mtg-multi that referenced this pull request Mar 29, 2026
Wrap both sides of the Telegram relay in connIdleTimeout,
same as already done for domain fronting in 9seconds#416.

Without this, if a client disappears (network drop, battery dies),
the TCP connection stays formally alive and the goroutine in the
worker pool blocks on io.CopyBuffer indefinitely. Under mass client
disconnects this accumulates zombie goroutines.

Fixes 9seconds#417

Development

Successfully merging this pull request may close these issues.

Cannot dial to the fronting domain

2 participants