fix(agent): honor HERMES_TLS_MAX_VERSION to cap provider TLS handshakes #44392

Open
AIalliAI wants to merge 4 commits into NousResearch:main from AIalliAI:fix/44365-tls-max-version

Conversation

AIalliAI commented Jun 11, 2026

Contributor

Summary

#44365 reports intermittent DeepSeek connection failures on Windows desktop: the bundled Python/OpenSSL dies with [SSL: UNEXPECTED_EOF_WHILE_READING] ~15s into every TLS 1.3 handshake against api.deepseek.com, while a TLS 1.2-only handshake succeeds in 0.33s (and curl/PowerShell work because they use the OS TLS stack, not OpenSSL). This is a known class of CDN-edge/middlebox behavior — the server (or something in the path) accepts TLS 1.2 ClientHellos but kills TLS 1.3 ones.
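The comparison described above can be reproduced with the standard library alone. The following is a minimal diagnostic sketch, not code from this PR: `make_context` and `time_handshake` are hypothetical helper names, and the timing covers only the TLS handshake, not the TCP connect.

```python
import socket
import ssl
import time


def make_context(max_version=None):
    """Build a default client context, optionally capping the TLS version."""
    ctx = ssl.create_default_context()
    if max_version is not None:
        ctx.maximum_version = max_version
    return ctx


def time_handshake(host, ctx, port=443, timeout=30.0):
    """Connect to host:port and return (negotiated TLS version, handshake seconds)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        start = time.monotonic()
        # wrap_socket performs the handshake immediately on a connected socket.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), time.monotonic() - start
```

Calling `time_handshake("api.deepseek.com", make_context())` and then `time_handshake("api.deepseek.com", make_context(ssl.TLSVersion.TLSv1_2))` on an affected machine should show the difference; on the failing path the first call is expected to raise `ssl.SSLError` rather than return.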

This adds the escape hatch the issue asks for, as a config.yaml setting:

network:
  tls_max_version: "1.2"

Implementation

  • hermes_cli/config.py: new network.tls_max_version key (default "" = OpenSSL default), sibling of network.force_ipv4 in the existing "connectivity workarounds" section. Per the AGENTS.md contribution rubric, the user-facing surface is config.yaml — the env var below is an internal bridge.
  • hermes_constants.py: apply_tls_max_version() bridges the config value onto the internal HERMES_TLS_MAX_VERSION env var (same pattern as gateway.strict -> HERMES_MEDIA_DELIVERY_STRICT). The env-var hop is needed because agent/process_bootstrap.py has no config access at HTTP-client build time, and spawned agent subprocesses must inherit the cap. An explicitly exported env var wins over config.yaml, so one-off shell overrides keep working.
  • Bridged at both entrypoints that already apply network.force_ipv4: the hermes_cli/main.py early raw-yaml block (covers CLI, desktop dashboard spawns, TUI gateway — no extra config.yaml read) and the gateway/run.py bootstrap.
  • agent/process_bootstrap.py: _get_tls_ssl_context() parses the value (accepts 1.2/1.3, optional tls/tlsv prefix, case-insensitive; yaml floats fine) and builds an ssl.SSLContext with maximum_version capped. It honors the CLI's existing CA-bundle override convention (HERMES_CA_BUNDLE > REQUESTS_CA_BUNDLE > SSL_CERT_FILE, same precedence as _resolve_requests_verify in agent/model_metadata.py). Unset/invalid values fall back to httpx defaults with a logged warning; 1.0/1.1 are deliberately rejected — the knob exists to dodge broken TLS 1.3 paths, not to enable deprecated protocols.
  • run_agent.py _build_keepalive_http_client: passes the context to both the keepalive HTTPTransport (httpx ignores client-level verify when an explicit transport is passed) and the httpx.Client (so the proxy mount built internally from proxy= inherits the same cap — otherwise proxied users would silently keep the broken default).
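The parsing and context-building rules in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual `_get_tls_ssl_context()`: the names `parse_tls_max_version` and `build_capped_context` are hypothetical, and the real function reads the value from the environment rather than taking it as an argument.

```python
import logging
import os
import ssl

logger = logging.getLogger(__name__)

# Only 1.2 and 1.3 are accepted; 1.0/1.1 are deliberately absent.
_ALLOWED_CAPS = {"1.2": ssl.TLSVersion.TLSv1_2, "1.3": ssl.TLSVersion.TLSv1_3}
# CA-bundle override precedence described in the PR, highest first.
_CA_BUNDLE_VARS = ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE")


def parse_tls_max_version(raw):
    """Return the ssl.TLSVersion cap for raw, or None when unset or invalid."""
    text = "" if raw is None else str(raw).strip().lower()
    if not text:
        return None  # unset or blank: keep library defaults, no warning
    for prefix in ("tlsv", "tls"):  # check the longer prefix first
        if text.startswith(prefix):
            text = text[len(prefix):].strip()
            break
    version = _ALLOWED_CAPS.get(text)
    if version is None:
        logger.warning("Ignoring unsupported TLS max version %r", raw)
    return version


def build_capped_context(raw, environ=None):
    """Return an SSLContext capped at the parsed version, or None for defaults."""
    environ = os.environ if environ is None else environ
    version = parse_tls_max_version(raw)
    if version is None:
        return None
    # First non-empty override wins; None falls back to the system trust store.
    cafile = next((environ[n] for n in _CA_BUNDLE_VARS if environ.get(n)), None)
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.maximum_version = version
    return ctx
```

Returning `None` for unset or invalid input lets the caller skip the `verify=` argument entirely, so the default path stays untouched.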

Tests

tests/run_agent/test_keepalive_tls_max_version.py (12 tests): parsing (unset/blank, prefixes, invalid + 1.0/1.1 rejection), integration pins that the capped context lands on the transport pool and on the HTTPProxy mount when HTTPS_PROXY is set, that the default path keeps MAXIMUM_SUPPORTED, and the config bridge (sets the env var, never overrides an explicitly exported one, no-op on empty, handles yaml-float 1.2 and whitespace). Adjacent suites (test_create_openai_client_proxy_env, test_openai_client_lifecycle, test_ipv4_preference, hermes_cli/test_config.py) all pass.
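The bridge behaviors pinned by these tests (sets the env var, never overrides an exported one, no-op on empty, tolerates a yaml float and whitespace) might look like the sketch below. It is a hypothetical stand-in for `hermes_constants.apply_tls_max_version()`: the function name, signature, and return value are assumptions, and treating a blank exported variable as "not exported" is also an assumption.

```python
import os

ENV_VAR = "HERMES_TLS_MAX_VERSION"


def bridge_tls_max_version(config, environ=None):
    """Copy network.tls_max_version onto ENV_VAR; return the effective value or None."""
    environ = os.environ if environ is None else environ
    exported = environ.get(ENV_VAR, "").strip()
    if exported:
        return exported  # an explicitly exported value wins over config.yaml
    network = config.get("network") if isinstance(config, dict) else None
    raw = network.get("tls_max_version") if isinstance(network, dict) else None
    # str() turns a yaml float such as 1.2 into "1.2"; strip() drops whitespace.
    value = "" if raw is None else str(raw).strip()
    if not value:
        return None  # empty or missing: leave the environment untouched
    environ[ENV_VAR] = value
    return value
```

Passing a plain dict as `environ` keeps the sketch testable without mutating the real process environment.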

Fixes #44365

🤖 Generated with Claude Code

Some CDN edges and middleboxes accept TLS 1.2 handshakes but kill TLS 1.3
ClientHellos, surfacing as [SSL: UNEXPECTED_EOF_WHILE_READING] ~15s into
every request while curl (OS TLS stack) works fine. NousResearch#44365 hit this on
Windows desktop against api.deepseek.com: TLS1.2-only connects in 0.33s,
TLS1.3 dies after 15s.

Setting HERMES_TLS_MAX_VERSION=1.2 now caps the handshake on the primary
chat client. The ssl context is applied to the keepalive HTTPTransport
directly (httpx ignores client-level verify when an explicit transport is
passed) and to the Client so internally-built proxy mounts inherit the
same cap. The context honors the existing CA-bundle overrides
(HERMES_CA_BUNDLE > REQUESTS_CA_BUNDLE > SSL_CERT_FILE). 1.0/1.1 are
deliberately rejected; invalid values log a warning and fall back to
defaults.

Fixes NousResearch#44365

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
alt-glitch added labels on Jun 11, 2026: type/bug (Something isn't working), comp/agent (Core agent loop, run_agent.py, prompt builder), area/config (Config system, migrations, profiles), provider/deepseek (DeepSeek API), P2 (Medium, degraded but workaround exists)
@liuhao1024

Contributor

✅ Code Review — Clean

Reviewed the full diff (3 files: process_bootstrap.py, run_agent.py, test_keepalive_tls_max_version.py).

What I verified:

  • _get_tls_ssl_context() correctly rejects TLS 1.0/1.1 (only 1.2/1.3 accepted), handles prefix normalization (tls/tlsv/bare), and falls back gracefully on invalid input
  • CA bundle resolution follows the correct precedence: HERMES_CA_BUNDLE > REQUESTS_CA_BUNDLE > SSL_CERT_FILE
  • The SSL context is applied to both the HTTPTransport pool and the httpx.Client verify kwarg — this is necessary because httpx ignores client-level verify when an explicit transport is passed (well-documented in the PR body)
  • Proxy mount inherits the capped context (tested in test_proxy_mount_inherits_capped_context) — a capped direct transport next to an uncapped proxy mount would silently reintroduce the bug for proxied users
  • 7 tests cover: unset env, TLS 1.2 cap, prefix normalization, invalid values, keepalive transport, proxy mount, and default-TLS sanity

No issues found. LGTM.

…onfig.yaml

Per the AGENTS.md contribution rubric, behavioral settings belong in
config.yaml, not new HERMES_* env vars. The user-facing knob is now
network.tls_max_version (sibling of network.force_ipv4 — same
"connectivity workarounds" section), bridged onto the internal
HERMES_TLS_MAX_VERSION env var at process startup by
hermes_constants.apply_tls_max_version(), following the established
gateway.strict -> HERMES_MEDIA_DELIVERY_STRICT bridge pattern.

The env var remains the mechanism (agent/process_bootstrap has no
config access at client-build time, and spawned agent subprocesses
must inherit the cap) and an explicitly exported value still wins
over config.yaml for one-off shell overrides.

Bridged in both entrypoints that already apply network.force_ipv4:
the hermes_cli/main.py early raw-yaml block (covers CLI, desktop
dashboard spawns, TUI gateway) and the gateway/run.py bootstrap.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@AIalliAI

Contributor Author

Reworked in 1cb8df1 per the AGENTS.md contribution rubric ("behavioral settings go in config.yaml, not new HERMES_* env vars"): the user-facing knob is now network.tls_max_version in config.yaml (sibling of network.force_ipv4), bridged onto the internal HERMES_TLS_MAX_VERSION env var at startup via hermes_constants.apply_tls_max_version(), the same pattern as gateway.strict -> HERMES_MEDIA_DELIVERY_STRICT. The mechanism @liuhao1024 reviewed is unchanged (_get_tls_ssl_context parsing, transport + proxy-mount application); an explicitly exported env var still wins for one-off shell overrides. 5 new tests cover the bridge (12 total).

AIalliAI and others added 2 commits June 12, 2026 08:57
check-attribution flagged the email introduced by this branch's
2026-06-12 follow-up commit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
# Conflicts:
#	scripts/release.py
@AIalliAI

Contributor Author

Requesting maintainer review — this is ready to land from my side. Just merge-synced with current main (the only conflict was a trivial AUTHOR_MAP keep-both in scripts/release.py); the PR's touched test files pass locally on the merged head. Standalone fork CI is pending first-run approval here; the rollup branch in #44061 carrying this session's batch is fully green on upstream CI.

Development

Successfully merging this pull request may close these issues.

Windows desktop: DeepSeek intermittent connection failure with Python/OpenSSL TLS 1.3

3 participants