fix(run_agent): prevent _create_openai_client from mutating caller's kwargs by kshitijk4poor · Pull Request #11056 · NousResearch/hermes-agent

kshitijk4poor · 2026-04-16T13:55:18Z

Cherry-picked from #10978 by @taeuk178.

Shallow-copy client_kwargs at the top of _create_openai_client() to prevent in-place mutation from leaking back into self._client_kwargs. Defensive fix that locks the contract for future httpx/transport work.
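A minimal sketch of the pattern described above, not the actual hermes-agent code. The function names, the `http_client` key as the mutated entry, and the `object()` placeholder are assumptions made for illustration.

```python
def create_client_mutating(client_kwargs: dict) -> dict:
    """Buggy shape: writes a default straight into the caller's dict."""
    if "http_client" not in client_kwargs:
        client_kwargs["http_client"] = object()  # stand-in for a real transport
    return client_kwargs


def create_client_isolated(client_kwargs: dict) -> dict:
    """Fixed shape: shallow-copy first, then mutate only the local copy."""
    client_kwargs = dict(client_kwargs)
    if "http_client" not in client_kwargs:
        client_kwargs["http_client"] = object()  # stand-in for a real transport
    return client_kwargs


# The caller's dict gains a key it never asked for.
shared = {"api_key": "sk-placeholder"}
create_client_mutating(shared)
leaked = "http_client" in shared

# With the copy in place, the caller's dict is left untouched.
shared_fixed = {"api_key": "sk-placeholder"}
create_client_isolated(shared_fixed)
isolated = "http_client" not in shared_fixed
```

A shallow copy only isolates key insertions and reassignments on the dict itself. Values that are mutable objects are still shared between the copy and the original.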

Test Results

tests/run_agent/test_create_openai_client_kwargs_isolation.py: 1 passed

swnb · 2026-04-16T14:32:43Z

I independently hit the same root cause while debugging gateway instability on macOS, and I wanted to add a few datapoints that may help reviewers validate the fix.

What I observed locally:

Long-running hermes gateway run --replace processes using openai-codex would start producing repeated APIConnectionError / APITimeoutError, even on tiny prompts (2-7 messages, ~5k-10k tokens).
In the same environment, a fresh-shell hermes chat -q ... often still succeeded.
Replaying the exact request-dump body from a failed gateway turn as a fresh streaming POST to https://chatgpt.com/backend-api/codex/responses returned HTTP 200, which strongly pointed away from payload content and toward daemon/client state.
lsof on the gateway daemons showed stale TCP state (CLOSE_WAIT) in the long-lived process.

That matched the exact issue fixed here: _create_openai_client() was mutating caller-owned kwargs, so gateway-held self._client_kwargs could retain a concrete http_client, and later request clients were not actually fresh.
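To make that failure mode concrete, here is a hedged sketch with stand-in classes. `FakeHttpClient`, `Agent`, `new_request_client`, and the `copy_kwargs` switch are hypothetical names, and the real method may inject the client differently.

```python
class FakeHttpClient:
    """Stand-in for a pooled HTTP client; this is not httpx."""


class Agent:
    """Hypothetical long-lived holder of client kwargs, like a gateway daemon."""

    def __init__(self, copy_kwargs: bool) -> None:
        self._client_kwargs = {"api_key": "sk-placeholder"}
        self._copy_kwargs = copy_kwargs

    def _create_client(self, client_kwargs: dict) -> FakeHttpClient:
        if self._copy_kwargs:
            # The guard: work on a shallow copy, never on the caller's dict.
            client_kwargs = dict(client_kwargs)
        # Inject a transport only when the caller did not supply one.
        client_kwargs.setdefault("http_client", FakeHttpClient())
        return client_kwargs["http_client"]

    def new_request_client(self) -> FakeHttpClient:
        return self._create_client(self._client_kwargs)


# Without the copy, the first call stores its transport in self._client_kwargs,
# so the second "fresh" client silently reuses it.
buggy = Agent(copy_kwargs=False)
first, second = buggy.new_request_client(), buggy.new_request_client()
reused = first is second

# With the copy, every call builds its own transport.
fixed = Agent(copy_kwargs=True)
third, fourth = fixed.new_request_client(), fixed.new_request_client()
fresh = third is not fourth
```

In a short-lived CLI process the reuse never has time to matter, which would be consistent with fresh-shell runs succeeding while the long-running daemon degraded.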

Additional note: on my side, preserving OpenAI SDK timeout / connection-limit defaults alongside the keepalive client was also helpful for keeping gateway behavior aligned with fresh-shell behavior.

For traceability, I documented the full reproduction and validation notes in #11070, and closed my duplicate PR #11072 in favor of this one.

kshitijk4poor · 2026-04-16T15:40:20Z

Thanks for the findings. Could you please verify if this PR fixes your issue?

kshitijk4poor mentioned this pull request Apr 16, 2026

fix(run_agent): prevent _create_openai_client from mutating caller's kwargs #10978

Closed

2 tasks

This was referenced Apr 16, 2026

fix(agent): avoid reusing mutated http client kwargs #11072

Closed

Regression: gateway reuses mutated OpenAI http_client kwargs and accumulates stale connections #11070

Closed

kshitijk4poor merged commit 896e7b0 into main Apr 16, 2026
6 of 7 checks passed

kshitijk4poor deleted the fix/kwargs-mutation-guard branch April 16, 2026 14:45

alt-glitch mentioned this pull request Apr 25, 2026

fix(agent): copy client_kwargs before mutating to prevent shared httpx.Client #11369

Closed

2 tasks
