fix(agent): honor proxy-aware OpenAI client routing (#5454)#12010
fix(agent): honor proxy-aware OpenAI client routing (#5454)#12010kshitijk4poor wants to merge 2 commits into
Conversation
…palive behavior The injected httpx keepalive transport fixed CLOSE-WAIT hangs on direct provider connections, but it also made httpx skip its trust_env proxy mount setup. In proxied environments that caused silent direct connects for LLM traffic even when HTTP(S)_PROXY or system proxy settings were present. This change keeps the keepalive transport on the direct-connect path, but lets the SDK/httpx default client own proxy resolution whenever a proxy is configured. It also broadens socket iteration so cleanup and dead-connection probing continue to work when proxied clients place live connections under mount transports instead of only the default transport. Constraint: Must preserve the existing direct-connect keepalive mitigation for NousResearch#10324 Rejected: Env-var-only proxy detection | misses macOS/Windows system proxy sources that httpx trust_env honors Rejected: Explicit proxy wiring on custom HTTPTransport | reimplements httpx proxy semantics incompletely and leaves NO_PROXY/system-proxy edge cases to Hermes Confidence: high Scope-risk: moderate Reversibility: clean Directive: Keep proxy detection aligned with httpx trust_env behavior; do not reintroduce custom transport injection on the proxy path without covering mount-based cleanup too Tested: uv run --extra dev pytest tests/run_agent/test_create_openai_client_reuse.py tests/run_agent/test_create_openai_client_kwargs_isolation.py -q Tested: uv run --extra dev pytest tests/agent/test_proxy_and_url_validation.py -q Tested: python3 -m py_compile run_agent.py tests/run_agent/test_create_openai_client_reuse.py Not-tested: Live proxy egress against a real HTTP/SOCKS proxy endpoint Related: NousResearch#5454 Related: NousResearch#10324 Related: NousResearch#11609
The initial proxy-routing fix treated any configured proxy as a signal to skip the custom keepalive transport. That was too coarse: a global proxy plus NO_PROXY for localhost still routes local/custom providers directly, so dropping keepalive there would re-open the CLOSE-WAIT hang class for local LiteLLM/Ollama-style endpoints. This change makes the proxy decision host-aware. Hermes now delegates to httpx's default trust_env client only when the current base_url would actually route through the proxy. Targets bypassed by NO_PROXY keep the direct-path keepalive transport. Constraint: Must not regress the proxied-routing fix for NousResearch#5454 while restoring the NousResearch#10324 mitigation on local endpoints Rejected: Keep the coarse any-proxy gate and document NO_PROXY as unsupported | breaks common localhost custom-provider setups Confidence: high Scope-risk: narrow Reversibility: clean Directive: Any future proxy gating must consider per-host bypass rules, not just the existence of global proxy settings Tested: uv run --extra dev pytest tests/run_agent/test_create_openai_client_reuse.py tests/run_agent/test_create_openai_client_kwargs_isolation.py tests/agent/test_proxy_and_url_validation.py -q Tested: python3 -m py_compile run_agent.py tests/run_agent/test_create_openai_client_reuse.py Not-tested: Live end-to-end traffic through a real proxy with NO_PROXY exclusions Related: NousResearch#5454 Related: NousResearch#10324 Related: NousResearch#11609
|
Addressed the regression called out in review.\n\nThe proxy gate is now host-aware instead of global: Hermes only skips the keepalive transport when the current would actually route through the configured proxy. Targets bypassed by (for example / custom providers) keep the direct-path keepalive transport, so the mitigation still applies there.\n\nAdded regression coverage for the + case and re-ran the focused proxy/client tests. |
|
Addressed the The proxy gate is now host-aware instead of global: Hermes only skips the keepalive transport when the current Added regression coverage for the |
Summary
httpx.Clientcreation when a proxy is configured so the OpenAI SDK/httpx defaulttrust_envpath owns routingProblem
run_agent.py::_create_openai_client()currently injectshttpx.HTTPTransport(socket_options=...)to mitigate the CLOSE-WAIT hang from #10324.That transport path has an important side effect: once Hermes hands httpx an explicit transport, httpx no longer constructs its normal proxy mounts from
trust_env=True. In practice this means LLM traffic can silently bypassHTTP_PROXY/HTTPS_PROXY/ALL_PROXYor system proxy settings and direct-connect instead.That matches the behavior reported in #5454 and the later generalized report in #11609.
Why this fix
There were a few prior approaches in related PRs:
#11414,#11664,#11943: skip injection when proxy env vars are present#11702: explicitly wireproxy=onto the customHTTPTransport#11733: detect proxies the same way httpx does and let the SDK/httpx default client own proxy routing, while broadening cleanup for mount-based transportsThis PR follows the
#11733direction because it is the most correct salvage:urllib.request.getproxies()so detection matches httpxtrust_envbehavior more closely than env-only checksNO_PROXYand system proxy sources_mounts, which the old_force_close_tcp_sockets()/_cleanup_dead_connections()logic would missChanges
run_agent.pyAIAgent._has_proxy_configured()usingurllib.request.getproxies()not self._has_proxy_configured()AIAgent._iter_client_sockets()covering both_transportand_mounts_force_close_tcp_sockets()and_cleanup_dead_connections()to reuse that iteratortests/run_agent/test_create_openai_client_reuse.pyVerification
uv run --extra dev pytest tests/run_agent/test_create_openai_client_reuse.py tests/run_agent/test_create_openai_client_kwargs_isolation.py -quv run --extra dev pytest tests/agent/test_proxy_and_url_validation.py -qpython3 -m py_compile run_agent.py tests/run_agent/test_create_openai_client_reuse.pyNotes
Closes #5454
Related to #10324
Related to #11609