fix(agent): exclude ssl.SSLError from is_local_validation_error to prevent non-retryable abort#14445
Closed
Bartok9 wants to merge 1 commit into
Closed
Conversation
…event non-retryable abort ssl.SSLError (and its subclass ssl.SSLCertVerificationError) inherits from OSError *and* ValueError via Python's MRO. The is_local_validation_error check used isinstance(api_error, (ValueError, TypeError)) to detect programming bugs that should abort immediately — but this inadvertently caught ssl.SSLError, treating a TLS transport failure as a non-retryable client error. The error classifier already maps SSLCertVerificationError to FailoverReason.timeout with retryable=True (its type name is in _TRANSPORT_ERROR_TYPES), but the inline isinstance guard was overriding that classification and triggering an unnecessary abort. Fix: add ssl.SSLError to the exclusion list alongside the existing UnicodeEncodeError carve-out so TLS errors fall through to the classifier's retryable path. Closes NousResearch#14367
Collaborator
19 tasks
Contributor
Author
|
Thanks @alt-glitch. Agreed — three competing fixes for the same bug. Among the three, #14374 by @sgaofen is the most thorough (extracts a helper, which makes the guard easier to extend in future). Happy to close this in favor of #14374 if maintainers prefer. Deferring to their judgment. |
Contributor
|
Thanks for the fix, @Bartok9! This is an automated hermes-sweeper review. The change proposed here is already on Evidence:
As noted by @alt-glitch and agreed by you, three PRs raced to fix #14367. The fix has landed. Closing as implemented on main. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Fixes #14367.
ssl.SSLError(and its subclassssl.SSLCertVerificationError) inherits from bothOSErrorandValueErrorvia Python's MRO:The
is_local_validation_errorcheck at run_agent.py ~line 10681:…catches
ssl.SSLCertVerificationErroras aValueError, marking it non-retryable and triggering an immediate abort.The error classifier already maps
SSLCertVerificationErrortoFailoverReason.timeout, retryable=True(its type name appears in_TRANSPORT_ERROR_TYPES), but the inlineisinstanceguard overrides that mapping before the classifier result is checked.Result: A TLS transport error causes an unnecessary session abort instead of retrying or failing over to a backup provider.
Fix
Add
ssl.SSLErrorto the exclusion list alongside the existingUnicodeEncodeErrorcarve-out, and importsslat the top ofrun_agent.py:Verification