Skip to content

fix(agent): honor proxy-aware OpenAI client routing (#5454)#12010

Open
kshitijk4poor wants to merge 2 commits into
NousResearch:mainfrom
kshitijk4poor:fix-openai-proxy-trust-env
Open

fix(agent): honor proxy-aware OpenAI client routing (#5454)#12010
kshitijk4poor wants to merge 2 commits into
NousResearch:mainfrom
kshitijk4poor:fix-openai-proxy-trust-env

Conversation

@kshitijk4poor

Copy link
Copy Markdown
Collaborator

Summary

  • skip injected keepalive httpx.Client creation when a proxy is configured so the OpenAI SDK/httpx default trust_env path owns routing
  • preserve the existing TCP keepalive transport on the direct-connect path
  • extend socket cleanup/dead-connection probing to walk proxy mount transports as well as the default transport
  • add regression coverage for proxy detection, direct-path keepalive, and mount-based socket cleanup

Problem

run_agent.py::_create_openai_client() currently injects httpx.HTTPTransport(socket_options=...) to mitigate the CLOSE-WAIT hang from #10324.

That transport path has an important side effect: once Hermes hands httpx an explicit transport, httpx no longer constructs its normal proxy mounts from trust_env=True. In practice this means LLM traffic can silently bypass HTTP_PROXY / HTTPS_PROXY / ALL_PROXY or system proxy settings and direct-connect instead.

That matches the behavior reported in #5454 and the later generalized report in #11609.

Why this fix

There were a few prior approaches in related PRs:

  • #11414, #11664, #11943: skip injection when proxy env vars are present
  • #11702: explicitly wire proxy= onto the custom HTTPTransport
  • #11733: detect proxies the same way httpx does and let the SDK/httpx default client own proxy routing, while broadening cleanup for mount-based transports

This PR follows the #11733 direction because it is the most correct salvage:

  • it uses urllib.request.getproxies() so detection matches httpx trust_env behavior more closely than env-only checks
  • it avoids partially reimplementing httpx proxy semantics around NO_PROXY and system proxy sources
  • it fixes the adjacent cleanup gap where proxied clients place live sockets under _mounts, which the old _force_close_tcp_sockets() / _cleanup_dead_connections() logic would miss

Changes

  • run_agent.py
    • add AIAgent._has_proxy_configured() using urllib.request.getproxies()
    • gate keepalive transport injection on not self._has_proxy_configured()
    • add AIAgent._iter_client_sockets() covering both _transport and _mounts
    • refactor _force_close_tcp_sockets() and _cleanup_dead_connections() to reuse that iterator
  • tests/run_agent/test_create_openai_client_reuse.py
    • add proxy/detection regression tests
    • lock the direct-path keepalive invariant
    • add coverage that mount-based sockets are included in cleanup

Verification

  • uv run --extra dev pytest tests/run_agent/test_create_openai_client_reuse.py tests/run_agent/test_create_openai_client_kwargs_isolation.py -q
  • uv run --extra dev pytest tests/agent/test_proxy_and_url_validation.py -q
  • python3 -m py_compile run_agent.py tests/run_agent/test_create_openai_client_reuse.py

Notes

Closes #5454
Related to #10324
Related to #11609

…palive behavior

The injected httpx keepalive transport fixed CLOSE-WAIT hangs on direct
provider connections, but it also made httpx skip its trust_env proxy
mount setup. In proxied environments that caused silent direct connects
for LLM traffic even when HTTP(S)_PROXY or system proxy settings were
present.

This change keeps the keepalive transport on the direct-connect path,
but lets the SDK/httpx default client own proxy resolution whenever a
proxy is configured. It also broadens socket iteration so cleanup and
dead-connection probing continue to work when proxied clients place live
connections under mount transports instead of only the default transport.

Constraint: Must preserve the existing direct-connect keepalive mitigation for NousResearch#10324
Rejected: Env-var-only proxy detection | misses macOS/Windows system proxy sources that httpx trust_env honors
Rejected: Explicit proxy wiring on custom HTTPTransport | reimplements httpx proxy semantics incompletely and leaves NO_PROXY/system-proxy edge cases to Hermes
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep proxy detection aligned with httpx trust_env behavior; do not reintroduce custom transport injection on the proxy path without covering mount-based cleanup too
Tested: uv run --extra dev pytest tests/run_agent/test_create_openai_client_reuse.py tests/run_agent/test_create_openai_client_kwargs_isolation.py -q
Tested: uv run --extra dev pytest tests/agent/test_proxy_and_url_validation.py -q
Tested: python3 -m py_compile run_agent.py tests/run_agent/test_create_openai_client_reuse.py
Not-tested: Live proxy egress against a real HTTP/SOCKS proxy endpoint
Related: NousResearch#5454
Related: NousResearch#10324
Related: NousResearch#11609
The initial proxy-routing fix treated any configured proxy as a signal to
skip the custom keepalive transport. That was too coarse: a global proxy
plus NO_PROXY for localhost still routes local/custom providers directly,
so dropping keepalive there would re-open the CLOSE-WAIT hang class for
local LiteLLM/Ollama-style endpoints.

This change makes the proxy decision host-aware. Hermes now delegates to
httpx's default trust_env client only when the current base_url would
actually route through the proxy. Targets bypassed by NO_PROXY keep the
direct-path keepalive transport.

Constraint: Must not regress the proxied-routing fix for NousResearch#5454 while restoring the NousResearch#10324 mitigation on local endpoints
Rejected: Keep the coarse any-proxy gate and document NO_PROXY as unsupported | breaks common localhost custom-provider setups
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Any future proxy gating must consider per-host bypass rules, not just the existence of global proxy settings
Tested: uv run --extra dev pytest tests/run_agent/test_create_openai_client_reuse.py tests/run_agent/test_create_openai_client_kwargs_isolation.py tests/agent/test_proxy_and_url_validation.py -q
Tested: python3 -m py_compile run_agent.py tests/run_agent/test_create_openai_client_reuse.py
Not-tested: Live end-to-end traffic through a real proxy with NO_PROXY exclusions
Related: NousResearch#5454
Related: NousResearch#10324
Related: NousResearch#11609
@kshitijk4poor

Copy link
Copy Markdown
Collaborator Author

Addressed the regression called out in review.\n\nThe proxy gate is now host-aware instead of global: Hermes only skips the keepalive transport when the current would actually route through the configured proxy. Targets bypassed by (for example / custom providers) keep the direct-path keepalive transport, so the mitigation still applies there.\n\nAdded regression coverage for the + case and re-ran the focused proxy/client tests.

@kshitijk4poor

Copy link
Copy Markdown
Collaborator Author

Addressed the NO_PROXY regression called out in review.

The proxy gate is now host-aware instead of global: Hermes only skips the keepalive transport when the current base_url would actually route through the configured proxy. Targets bypassed by NO_PROXY (for example localhost / 127.0.0.1 custom providers) keep the direct-path keepalive transport, so the #10324 mitigation still applies there.

Added regression coverage for the HTTP_PROXY + NO_PROXY=localhost,127.0.0.1 case and re-ran the focused proxy/client tests.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proxy support for LLM API calls

2 participants