
fix: distinguish network timeouts from context window errors#681

Merged
yinwm merged 1 commit into sipeed:main from dimensi:bugfix/falsy-context-deadline
Feb 28, 2026

Contributor

@dimensi dimensi commented Feb 23, 2026

Fixes #683

Summary

  • HTTP timeouts (context deadline exceeded, Client.Timeout exceeded) were incorrectly classified as context window errors due to overly broad substring matching ("context", "token", "length")
  • This caused useless history compression and a misleading "Context window exceeded" message to users when the real problem was a network timeout (e.g. slow z.ai API response)
  • Replaced broad checks with explicit timeout detection first, then specific context window error patterns (context_length_exceeded, token limit, too many tokens, invalidparameter, etc.)
  • Timeout errors were not retried at all — now they retry up to 2 times with exponential backoff (5s, 10s)

Test plan

  • Existing TestAgentLoop_ContextExhaustionRetry passes — real context window errors still trigger compression
  • Full test suite passes (go test ./...)
  • Manual test: simulate z.ai timeout and verify no false "Context window exceeded" message
  • Manual test: verify timeout retries with backoff in logs

@dimensi dimensi force-pushed the bugfix/falsy-context-deadline branch 2 times, most recently from c22853b to 5e3874e on February 23, 2026 13:13
HTTP timeouts (context deadline exceeded, Client.Timeout) were
incorrectly classified as context window errors, triggering useless
history compression. Replace broad substring checks ("context",
"token", "length") with specific patterns for real context limit
errors and explicitly exclude timeout errors from that path.

Additionally, timeout errors were not retried at all — the retry
loop only handled context window errors. Now timeouts are retried
up to 2 times with exponential backoff (5s, 10s).
@dimensi dimensi force-pushed the bugfix/falsy-context-deadline branch from 5e3874e to a4b6cea on February 24, 2026 18:54
Collaborator

@PixelTux PixelTux left a comment

Clear error messages for debugging or observability. LGTM

@nikolasdehor nikolasdehor left a comment

Good fix for a real problem. Network timeouts being misclassified as context window errors would cause unnecessary history compression and confusing user messages.

The approach is correct: check for timeout first, then check for context window errors with !isTimeoutError guard.

Minor observations:

  1. strings.Contains(errMsg, "max_tokens") could false-positive on API responses that echo back the max_tokens parameter in error messages unrelated to context limits (e.g., "max_tokens must be positive"). This is an edge case but worth noting.

  2. The backoff time.Duration(retry+1) * 5 * time.Second gives 5s/10s, which is reasonable but blocks the goroutine. For high-concurrency scenarios, consider using a timer with context cancellation. Not a blocker for this PR.

  3. Note that PR #699 (decompose AgentLoop) extracts the same error detection into IsContextWindowError() in error_classifier.go. These two PRs will conflict. Whichever lands second should incorporate the timeout guard from this PR.

LGTM.

Collaborator

@yinwm yinwm left a comment

LGTM

@yinwm yinwm merged commit 8529abb into sipeed:main Feb 28, 2026
2 checks passed
Collaborator

yinwm commented Feb 28, 2026

Thanks for the PR.

Contributor

Orgmar commented Feb 28, 2026

@dimensi Good catch on the timeout vs context window misclassification. Broad substring matching on "context" and "token" silently treating network timeouts as context exhaustion is a frustrating bug to track down. The retry with exponential backoff for actual timeouts is a nice addition too.

We have a PicoClaw Dev Group on Discord where contributors share ideas and collaborate. If you're interested, send an email to support@sipeed.com with the subject [Join PicoClaw Dev Group] dimensi and we'll send you the invite link.

Contributor Author

dimensi commented Feb 28, 2026

Thanks, but I realized I don't know how I would use openclaw, picoclaw, and similar bots in real life, so I'll pass for now :)

hyperwd pushed a commit to hyperwd/picoclaw that referenced this pull request Mar 5, 2026
fix: distinguish network timeouts from context window errors
Pluckypan pushed a commit to Pluckypan/picoclaw that referenced this pull request Mar 6, 2026
fix: distinguish network timeouts from context window errors
Development

Successfully merging this pull request may close these issues.

[BUG] Network timeouts misclassified as context window errors

6 participants