Summary
classify_error() walks __cause__ / __context__ to extract a nested status code, but _extract_error_body() only inspects the top-level exception. When an SDK/API error is wrapped, Hermes can keep the nested 402 status code but lose the nested body message that distinguishes transient rate limits from billing failures.
Affected files
agent/error_classifier.py:263-266
agent/error_classifier.py:774-788
Why this is a bug
At classification time:
- status_code = _extract_status_code(error) walks the cause chain
- body = _extract_error_body(error) does not walk the cause chain
So a wrapped error like this:
- outer exception: Exception("outer")
- __cause__: provider/SDK exception with status_code=402 and body message "Usage limit reached, try again in 5 minutes"
gets classified using:
- status code: 402 (nested cause found)
- message/body: outer / {} (nested body lost)
Minimal reproduction
Wrap a mock API error with:
status_code = 402
body = {"error": {"message": "Usage limit reached, try again in 5 minutes"}}
inside an outer exception and pass the outer exception to classify_error().
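The reproduction steps above can be sketched roughly as follows. MockAPIError and make_wrapped_error are hypothetical names invented for this illustration, not part of any SDK or of Hermes, and the commented-out import path and call shape for classify_error() are assumptions based on the affected file listed in this report.

```python
class MockAPIError(Exception):
    """Hypothetical stand-in for a provider/SDK error (not a real SDK class)."""

    def __init__(self, message, status_code, body):
        super().__init__(message)
        self.status_code = status_code
        self.body = body


def make_wrapped_error():
    """Return an outer exception whose __cause__ carries the 402 and the body."""
    inner = MockAPIError(
        "payment required",
        status_code=402,
        body={"error": {"message": "Usage limit reached, try again in 5 minutes"}},
    )
    try:
        # Explicit chaining sets outer.__cause__ to the mock API error.
        raise Exception("outer") from inner
    except Exception as outer:
        return outer


wrapped = make_wrapped_error()

# The decisive fields exist only on the nested cause, not on the outer exception.
print(getattr(wrapped, "status_code", None))    # None
print(getattr(wrapped, "body", None))           # None
print(wrapped.__cause__.status_code)            # 402
print(wrapped.__cause__.body["error"]["message"])

# Assumed import path and call shape; adjust to the real module layout:
# from agent.error_classifier import classify_error
# result = classify_error(wrapped)
```

Passing wrapped to classify_error() is the step that should exercise the mismatch described in this report.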
Expected
Classify as a transient/rate-limit condition, because the nested body explicitly says to retry later.
Actual
Classifies as billing, because the nested body is ignored and only the status code survives.
Suggested investigation
Make _extract_error_body() traverse __cause__ / __context__ the same way _extract_status_code() already does.
A regression test should cover wrapped 402/400 cases where only the nested body contains the decisive message.
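One possible shape for the traversal is sketched below. It is not the existing Hermes code. The helper names (iter_exception_chain, extract_error_body_from_chain, FakeProviderError), the depth limit, and the assumption that the body is a dict stored on a `body` attribute are all illustrative, and the real _extract_error_body() may read other attributes.

```python
def iter_exception_chain(error, max_depth=10):
    """Yield error, then each __cause__ / __context__ link, guarding against cycles."""
    seen = set()
    current = error
    depth = 0
    while current is not None and id(current) not in seen and depth < max_depth:
        seen.add(id(current))
        yield current
        # Prefer the explicit cause; fall back to the implicit context.
        if current.__cause__ is not None:
            current = current.__cause__
        else:
            current = current.__context__
        depth += 1


def extract_error_body_from_chain(error):
    """Return the first non-empty dict body found on the chain, else {}."""
    for exc in iter_exception_chain(error):
        body = getattr(exc, "body", None)  # assumed attribute name
        if isinstance(body, dict) and body:
            return body
    return {}


class FakeProviderError(Exception):
    """Hypothetical SDK-style error used only for the checks below."""

    def __init__(self, status_code, body):
        super().__init__("provider error")
        self.status_code = status_code
        self.body = body


def test_wrapped_402_body_is_found():
    """Regression-style check: only the nested cause carries the decisive message."""
    inner = FakeProviderError(
        402, {"error": {"message": "Usage limit reached, try again in 5 minutes"}}
    )
    try:
        raise Exception("outer") from inner
    except Exception as outer:
        body = extract_error_body_from_chain(outer)
    assert body["error"]["message"].startswith("Usage limit reached")


test_wrapped_402_body_is_found()
```

Returning the first non-empty body keeps the outermost information when it exists and falls back to nested causes only when the outer exception has nothing useful. A real regression test would call classify_error() on the wrapped exception and assert on the resulting classification rather than on the helper.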