fix: distinguish network timeouts from context window errors #681
yinwm merged 1 commit into sipeed:main
Branch updated from c22853b to 5e3874e
HTTP timeouts (context deadline exceeded, Client.Timeout) were
incorrectly classified as context window errors, triggering useless
history compression. Replace broad substring checks ("context",
"token", "length") with specific patterns for real context limit
errors and explicitly exclude timeout errors from that path.
Additionally, timeout errors were not retried at all: the retry
loop only handled context window errors. Now timeouts are retried
up to 2 times with increasing backoff (5s, then 10s).
Branch updated from 5e3874e to a4b6cea
PixelTux left a comment
Clear error messages for debugging and observability. LGTM.
nikolasdehor left a comment
Good fix for a real problem. Network timeouts being misclassified as context window errors would cause unnecessary history compression and confusing user messages.
The approach is correct: check for timeouts first, then check for context window errors with a `!isTimeoutError` guard.
Minor observations:
- `strings.Contains(errMsg, "max_tokens")` could false-positive on API responses that echo back the `max_tokens` parameter in error messages unrelated to context limits (e.g., "max_tokens must be positive"). This is an edge case but worth noting.
- The backoff `time.Duration(retry+1) * 5 * time.Second` gives 5s/10s, which is reasonable but blocks the goroutine. For high-concurrency scenarios, consider using a timer with context cancellation. Not a blocker for this PR.
- Note that PR #699 (decompose AgentLoop) extracts the same error detection into `IsContextWindowError()` in `error_classifier.go`. These two PRs will conflict. Whichever lands second should incorporate the timeout guard from this PR.
LGTM.
Thanks for the PR.
@dimensi Good catch on the timeout vs. context window misclassification. Broad substring matching on "context" and "token" that silently treats network timeouts as context exhaustion is a frustrating bug to track down. The retry with backoff for actual timeouts is a nice addition too. We have a PicoClaw Dev Group on Discord where contributors share ideas and collaborate. If you're interested, send an email to
Thanks, but I realized I don't know how to use openclaw, picoclaw, and similar bots in real life, so I'll pass for now.
fix: distinguish network timeouts from context window errors
Fixes #683
Summary
- Network timeouts (`context deadline exceeded`, `Client.Timeout exceeded`) were incorrectly classified as context window errors due to overly broad substring matching ("context", "token", "length")
- Context window detection now uses specific patterns for real context limit errors (`context_length_exceeded`, `token limit`, `too many tokens`, `invalidparameter`, etc.)
Test plan
- `TestAgentLoop_ContextExhaustionRetry` passes: real context window errors still trigger compression
- … (`go test ./...`)