fix(codex): avoid custom keepalive transport on chatgpt backend#12953
fix(codex): avoid custom keepalive transport on chatgpt backend#12953liuhao1024 wants to merge 5 commits into
Conversation
|
Maintainer note: This is intentionally not a duplicate of the earlier transport regressions.
So the fix here is narrower: keep the current keepalive path for normal OpenAI-compatible endpoints, but bypass it for the Codex backend only. |
|
Supportive +1 from a real cron incident on macOS/launchd using This PR looks like the right shape to me: skip the custom keepalive transport entirely for the ChatGPT Codex backend rather than trying to make the socket-options path work for an endpoint that appears sensitive to it. In our incident, a Codex cron run appeared to stream final output successfully but then the cron runner never returned/saved/released scheduler state. Avoiding the custom transport for Codex is a cleaner mitigation than piling more special-case timeout handling onto that transport path. Small elegance suggestions, non-blocking:
Overall: I think this is the right fix for #12952 and likely addresses the provider-side cause of the cron wedge we saw. |
Address review suggestion to centralize the Codex backend detection logic
into _is_chatgpt_codex_backend() for reuse across client construction
paths. Improves maintainability and readability.
The helper checks both provider ('openai-codex') and base_url
(https://chatgpt.com/backend-api/codex) to identify the ChatGPT
Codex backend, which is incompatible with custom socket_options
in HTTPTransport.
Fixes: NousResearch#12952
Add regression test to ensure timeout is forwarded correctly when the custom keepalive http_client is injected for regular OpenAI endpoints. This addresses reviewer feedback that the broader class of bugs is "custom http_client subtly changes OpenAI SDK semantics" and pinning timeout preservation makes the transport boundary more robust. The test verifies that a configured timeout (300.0s) is passed through to the OpenAI client constructor even when http_client is injected, confirming that custom transport injection does not silently drop timeout configuration.
Problem: PR NousResearch#12953 conflicted with upstream main in the OpenAI client keepalive/proxy construction path and its regression tests. Resolution: keep main's base_url-aware NO_PROXY handling while skipping the custom keepalive http_client for the ChatGPT Codex backend; merge the regression coverage for regular OpenAI, Codex, NO_PROXY, and timeout behavior.
2f3bcae to
c7c0cf8
Compare
|
@ratacat thanks for the suggestions. I updated the PR to address both points:
I also synced with upstream Local targeted verification: Result: |
What does this PR do?
This draft fixes a Codex-specific transport regression in
_create_openai_client().Hermes currently injects a custom
httpx.HTTPTransport(socket_options=...)to enable TCP keepalives. That path is desirable for regular OpenAI-compatible endpoints, but it is not compatible with the ChatGPT Codex backend athttps://chatgpt.com/backend-api/codex: the TLS handshake gets reset before the request reaches normal API validation.This PR keeps the existing keepalive behavior for normal OpenAI-compatible endpoints, but skips custom transport injection for the Codex backend.
Closes #12952
Related Issue
N/A
Type of Change
Changes Made
How to Test
pytest tests/ -q— all tests should passChecklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture and workflows — or N/A