Regression: gateway reuses mutated OpenAI http_client kwargs and accumulates stale connections #11070

## Summary
A regression in the TCP keepalive change causes long-running gateway sessions to reuse a mutated `http_client` stored inside `self._client_kwargs`, so later request clients are not actually fresh. In practice this leaves stale sockets in the shared pool and leads to repeated `APIConnectionError` / `APITimeoutError` on `openai-codex`, even for tiny prompts.

## Related
- Follow-up/regression from #10324 (`fix: enable TCP keepalives to detect dead provider connections`)

## Symptoms
- `hermes chat -q ...` can succeed in a fresh shell
- long-running `hermes gateway run --replace` sessions start failing with `Connection error.`
- failures happen even on tiny contexts (for example 2-7 messages / ~5k-10k tokens)
- gateway processes show lingering `CLOSE_WAIT` sockets until restart

## Root Cause
`run_agent.py::_create_openai_client()` mutates the passed `client_kwargs` in place by inserting a concrete `http_client`. In gateway mode, `self._client_kwargs` is retained across turns and reused by `_create_request_openai_client()`, so every later "fresh" request client silently reuses the same underlying `httpx.Client` / transport pool.

That defeats the intended per-request isolation and lets stale sockets survive across retries/turns.
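The aliasing described above can be sketched without the real SDK. This is a minimal illustration, not the actual `run_agent.py` code: `FakeHttpClient` and `create_client_mutating` are hypothetical stand-ins, and the "only insert `http_client` when absent" guard is an assumption about how the real function behaves.

```python
class FakeHttpClient:
    """Hypothetical stand-in for httpx.Client; one instance represents one connection pool."""


def create_client_mutating(client_kwargs: dict) -> dict:
    # Mirrors the reported bug: the caller-owned dict is modified in place.
    # Assumption: the real function only builds an http_client when none is present.
    if "http_client" not in client_kwargs:
        client_kwargs["http_client"] = FakeHttpClient()
    return {"http_client": client_kwargs["http_client"]}


# The gateway keeps this dict alive across turns (like self._client_kwargs).
retained_kwargs = {"api_key": "placeholder"}

shared = create_client_mutating(retained_kwargs)
request_1 = create_client_mutating(retained_kwargs)
request_2 = create_client_mutating(retained_kwargs)

# After the first call, the concrete client has leaked into the retained dict...
leaked = "http_client" in retained_kwargs
# ...so every later "fresh" client points at the same pool object.
same_pool = (
    request_1["http_client"] is shared["http_client"]
    and request_2["http_client"] is shared["http_client"]
)
```

In this sketch both `leaked` and `same_pool` end up true, which is the shape of the problem: a socket that goes stale in the shared pool is visible to every subsequent request client.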

## Reproduction
1. Run Hermes via gateway with `openai-codex`
2. Send a few messages over time
3. Observe repeated retries ending in `APIConnectionError` / `APITimeoutError`
4. Inspect the process with `lsof` and note lingering `CLOSE_WAIT` sockets
5. In a fresh shell, replay the same payload or run `hermes chat -q ...` and observe success
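Step 4 can be scripted to watch the socket count over time. The helper below is a rough sketch, assuming `lsof` is on `PATH` and that it prints the TCP state in parentheses at the end of each line; the function names are hypothetical and not part of Hermes.

```python
import subprocess


def count_close_wait(lsof_output: str) -> int:
    """Count lines of lsof output whose TCP state is CLOSE_WAIT."""
    return sum(
        1
        for line in lsof_output.splitlines()
        if line.rstrip().endswith("(CLOSE_WAIT)")
    )


def close_wait_for_pid(pid: int) -> int:
    """Run lsof for one process and count its CLOSE_WAIT TCP sockets."""
    # -nP: skip DNS and port-name lookups; -a: AND the -p and -iTCP filters.
    result = subprocess.run(
        ["lsof", "-nP", "-a", "-p", str(pid), "-iTCP"],
        capture_output=True,
        text=True,
        check=False,  # lsof exits non-zero when nothing matches
    )
    return count_close_wait(result.stdout)
```

Calling `close_wait_for_pid(<gateway pid>)` periodically while sending messages should show whether the count keeps growing between turns.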

## Expected
Each request client should get a fresh `httpx.Client` / transport pool, while the shared client keeps its own lifecycle.

## Proposed Fix
- copy `client_kwargs` at the start of `_create_openai_client()` instead of mutating the caller-owned dict
- keep the keepalive `http_client`, but preserve OpenAI SDK default timeout / limits
- add regression tests proving:
  - `_client_kwargs` is not mutated
  - request clients do not reuse the shared client's `http_client`
  - keepalive client keeps OpenAI default timeout instead of `httpx`'s 5s default
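The first two regression checks could look roughly like the following. This is a sketch using hypothetical stand-ins (`FakeHttpClient`, `create_client`) rather than the real `_create_openai_client()`; the timeout check is left out because it needs the actual OpenAI SDK and `httpx`.

```python
class FakeHttpClient:
    """Hypothetical stand-in for httpx.Client; one instance represents one connection pool."""


def create_client(client_kwargs: dict) -> dict:
    # Proposed fix: work on a shallow copy so the caller-owned dict is never touched.
    kwargs = dict(client_kwargs)
    if "http_client" not in kwargs:
        kwargs["http_client"] = FakeHttpClient()
    return kwargs


def test_client_kwargs_not_mutated():
    retained = {"api_key": "placeholder"}
    create_client(retained)
    # The retained dict must look exactly as it did before the call.
    assert retained == {"api_key": "placeholder"}


def test_request_clients_get_fresh_http_client():
    retained = {"api_key": "placeholder"}
    shared = create_client(retained)
    request = create_client(retained)
    # Each call builds its own pool instead of reusing the shared one.
    assert request["http_client"] is not shared["http_client"]
```

A shallow copy is enough here because the only change is adding a new key; nested values in the caller's dict are read, not modified.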

## Validation
Targeted tests:
- `pytest tests/run_agent/test_openai_client_lifecycle.py -q`

Manual checks performed on macOS:
- direct `hermes chat -q` works for both default and named profile
- threaded / gateway-like probes work
- long-running gateways recover after restart and stop showing stale `CLOSE_WAIT` buildup

