Skip to content

fix(agent): exclude ssl.SSLError from is_local_validation_error to prevent non-retryable abort#14445

Closed
Bartok9 wants to merge 1 commit into
NousResearch:mainfrom
Bartok9:fix/ssl-error-misclassified-as-local
Closed

fix(agent): exclude ssl.SSLError from is_local_validation_error to prevent non-retryable abort#14445
Bartok9 wants to merge 1 commit into
NousResearch:mainfrom
Bartok9:fix/ssl-error-misclassified-as-local

Conversation

@Bartok9

@Bartok9 Bartok9 commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

Problem

Fixes #14367.

ssl.SSLError (and its subclass ssl.SSLCertVerificationError) inherits from both OSError and ValueError via Python's MRO:

ssl.SSLCertVerificationError
  └─ ssl.SSLError
       ├─ OSError
       └─ (via adapter) ValueError

The is_local_validation_error check at run_agent.py ~line 10681:

is_local_validation_error = (
    isinstance(api_error, (ValueError, TypeError))
    and not isinstance(api_error, UnicodeEncodeError)
)

…catches ssl.SSLCertVerificationError as a ValueError, marking it non-retryable and triggering an immediate abort.

The error classifier already maps SSLCertVerificationError to FailoverReason.timeout, retryable=True (its type name appears in _TRANSPORT_ERROR_TYPES), but the inline isinstance guard overrides that mapping before the classifier result is checked.

Result: A TLS transport error causes an unnecessary session abort instead of retrying or failing over to a backup provider.

Fix

Add ssl.SSLError to the exclusion list alongside the existing UnicodeEncodeError carve-out, and import ssl at the top of run_agent.py:

import ssl
# …
is_local_validation_error = (
    isinstance(api_error, (ValueError, TypeError))
    and not isinstance(api_error, UnicodeEncodeError)
    and not isinstance(api_error, ssl.SSLError)   # ← new
)

Verification

import ssl
e = ssl.SSLCertVerificationError("certificate verify failed")

# Before fix:
isinstance(e, (ValueError, TypeError))  # True  → flagged as programming bug

# After fix:
is_local = (
    isinstance(e, (ValueError, TypeError))
    and not isinstance(e, UnicodeEncodeError)
    and not isinstance(e, ssl.SSLError)
)
assert is_local is False  # ✅ falls through to classifier's retryable path

…event non-retryable abort

ssl.SSLError (and its subclass ssl.SSLCertVerificationError) inherits from
OSError *and* ValueError via Python's MRO. The is_local_validation_error
check used isinstance(api_error, (ValueError, TypeError)) to detect
programming bugs that should abort immediately — but this inadvertently
caught ssl.SSLError, treating a TLS transport failure as a non-retryable
client error.

The error classifier already maps SSLCertVerificationError to
FailoverReason.timeout with retryable=True (its type name is in
_TRANSPORT_ERROR_TYPES), but the inline isinstance guard was overriding
that classification and triggering an unnecessary abort.

Fix: add ssl.SSLError to the exclusion list alongside the existing
UnicodeEncodeError carve-out so TLS errors fall through to the
classifier's retryable path.

Closes NousResearch#14367
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 23, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Competing PRs for #14367: this, #14434, and #14374 all fix the same ssl.SSLError misclassification. #14374 extracts a helper function; this and #14434 are inline fixes.

@Bartok9

Bartok9 commented Apr 23, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @alt-glitch. Agreed — three competing fixes for the same bug. Among the three, #14374 by @sgaofen is the most thorough (extracts a helper, which makes the guard easier to extend in future). Happy to close this in favor of #14374 if maintainers prefer. Deferring to their judgment.

@teknium1

Copy link
Copy Markdown
Contributor

Thanks for the fix, @Bartok9! This is an automated hermes-sweeper review.

The change proposed here is already on main — the exact commit from this PR (4e27e498f) is in main's ancestry (164 commits ahead of it now).

Evidence:

As noted by @alt-glitch and agreed by you, three PRs raced to fix #14367. The fix has landed. Closing as implemented on main.

@teknium1 teknium1 closed this Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ssl.SSLCertVerificationError misclassified as local validation error

3 participants