fix(agents): suppress unrecognized errors from user surface #324

Open
BingqingLyu wants to merge 3 commits into main from fork-pr-41803-fix-suppress-unrecognized-error-fallback


BingqingLyu (Owner) commented Apr 27, 2026

Summary

  • Problem: formatAssistantErrorText falls through to a final branch that returns raw, unrecognized error strings (truncated at 600 chars) directly to the chat surface. This leaks API internals, confuses users, and spams channels.
  • Why it matters: Orphaned tool_result errors, rate limits, and other unexpected API responses become visible to end users in Telegram/Slack/etc.
  • What changed: The final fallback now logs the full error for debugging and returns a safe generic message: "Something went wrong. Please try again, or use /new to start a fresh session."
  • What did NOT change: All recognized error branches (overloaded, rate_limit, too_many_tokens, etc.) are untouched — only the unrecognized catch-all is affected.
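The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in errors.ts: the function name `formatUnrecognizedError`, the `WarnLogger` type, and the injected logger are assumptions made so the example is self-contained. Only the generic message text is taken from the summary.

```typescript
// Hypothetical sketch of the catch-all branch; every name here except the
// message text is illustrative and not taken from errors.ts.
type WarnLogger = (message: string) => void;

const GENERIC_ERROR_MESSAGE =
  "Something went wrong. Please try again, or use /new to start a fresh session.";

function formatUnrecognizedError(rawError: string, warn: WarnLogger): string {
  // Keep the detail for operators, but never send it to the chat surface.
  warn(`Unrecognized assistant error: ${rawError}`);
  return GENERIC_ERROR_MESSAGE;
}

// Usage: the caller sees only the generic text; the log keeps the detail.
const logged: string[] = [];
const shown = formatUnrecognizedError(
  "messages.3.content.0: unexpected tool_use_id found in tool_result blocks",
  (line) => {
    logged.push(line);
  },
);
```

Injecting the logger keeps the sketch testable without a logging framework; the real code presumably uses the project's own logger.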

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Unrecognized errors that previously leaked raw API messages to the chat surface now show a generic user-friendly message. The raw error is still logged at warn level for debugging.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux (Amazon Linux 2023, aarch64)
  • Runtime/container: Node.js
  • Model/provider: claude-opus-4-5 via Anthropic
  • Integration/channel: Telegram group
  • Relevant config: Default

Steps

  1. Trigger a session with a corrupted transcript (orphaned tool_result without matching tool_use)
  2. Send any message to the agent
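For readers unfamiliar with the term, an orphaned tool_result is a result block whose tool_use_id matches no earlier tool_use block. A minimal sketch of detecting one, assuming Anthropic-style content blocks; the `Block` and `Message` types and the `findOrphanedToolResults` helper are illustrative and not part of this repository:

```typescript
// Illustrative types modeled on Anthropic-style content blocks.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; content: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Returns the tool_use_id of every tool_result that has no earlier tool_use.
function findOrphanedToolResults(transcript: Message[]): string[] {
  const seenToolUseIds = new Set<string>();
  const orphans: string[] = [];
  for (const message of transcript) {
    for (const block of message.content) {
      if (block.type === "tool_use") {
        seenToolUseIds.add(block.id);
      } else if (block.type === "tool_result" && !seenToolUseIds.has(block.tool_use_id)) {
        orphans.push(block.tool_use_id);
      }
    }
  }
  return orphans;
}

// A corrupted transcript: the assistant's tool_use block is missing,
// so the result that follows has nothing to match.
const corrupted: Message[] = [
  { role: "user", content: [{ type: "text", text: "What is the weather?" }] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_01", content: "sunny" }] },
];
const orphanIds = findOrphanedToolResults(corrupted);
```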

Expected

  • User sees a generic error message, not raw API internals.

Actual

  • Before: Raw error string (up to 600 chars) sent to chat surface.
  • After: Generic message shown; full error logged at warn level.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

New test in formatassistanterrortext.test.ts verifies the unrecognized-error branch returns the generic message. Lifecycle test expectations updated.

Human Verification (required)

  • Verified scenarios: Tested on a fork with Telegram and Slack. Orphaned tool_result errors now show the generic message instead of the raw API error.
  • Edge cases checked: All recognized error patterns (overloaded, rate_limit, too_many_tokens, etc.) still return their specific messages — only the catch-all is changed.
  • What you did not verify: Exhaustive list of all possible unrecognized error strings.

Review Conversations

N/A — fresh PR, no review conversations yet.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert: Revert changes in errors.ts to restore the raw-truncation fallback.
  • Files/config to restore: src/agents/pi-embedded-helpers/errors.ts
  • Known bad symptoms: If upstream introduces a new error pattern that deserves a specific message, the generic fallback will catch it until a dedicated branch is added. This is safe but suboptimal; the fix is to add the specific branch.

Risks and Mitigations

  • Risk: Masking errors that operators need to see.
    • Mitigation: Full error text is logged at warn level. Operators can see it in logs.

✍️ Author: Claude Code with @carrotRakko (AI-written, human-approved)

The final fallback in formatAssistantErrorText returned raw error text
to the user when no known error pattern matched. This leaked internal
API error details (e.g. orphaned tool call errors, provider-specific
diagnostics) to messaging surfaces.

Replace the fallback with a generic user-safe message and log the
original error for debugging.

Related openclaw#11038, openclaw#16948

✍️ Author: Claude Code with @carrotRakko (AI-written, human-approved)

Address review feedback:
- Add auth and model-not-found checks before the generic catch-all
  so actionable guidance survives retry exhaustion (Codex P2)
- Redact sensitive text (API keys, tokens) in log preview using
  existing redactSensitiveText utility (Aisle High CWE-532)
- Strip CR/LF from log preview to prevent log forging (Aisle Low CWE-117)

✍️ Author: Claude Code with @carrotRakko (AI-written, human-approved)
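The log-preview hardening from the review feedback (redaction for CWE-532, CR/LF stripping for CWE-117) might look roughly like this. It is a sketch under stated assumptions: `redactForLog` is a hypothetical stand-in for the repository's redactSensitiveText utility, whose real signature and patterns are not shown in this PR, and the preview length is an illustrative value.

```typescript
// Hypothetical stand-in for the repository's redactSensitiveText utility;
// the two patterns below are examples, not the real rule set.
function redactForLog(text: string): string {
  return text
    .replace(/\bsk-[A-Za-z0-9_-]{8,}/g, "[REDACTED]")
    .replace(/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, "Bearer [REDACTED]");
}

// Collapse CR/LF so attacker-influenced text cannot start a fake log line.
function stripLineBreaks(text: string): string {
  return text.replace(/[\r\n]+/g, " ");
}

// Illustrative preview limit; the actual value is an assumption.
const LOG_PREVIEW_MAX_CHARS = 600;

function buildLogPreview(rawError: string): string {
  // Redact before truncating so a secret is never cut in half and left
  // partially visible at the end of the preview.
  return stripLineBreaks(redactForLog(rawError)).slice(0, LOG_PREVIEW_MAX_CHARS);
}

const preview = buildLogPreview(
  "401 invalid x-api-key sk-ant-abc123def456\r\nWARN forged entry",
);
```

In this example the key is replaced and the injected line break becomes a single space, so the forged "WARN" text stays on the same log line as the real entry.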
- Check isCliSessionExpiredErrorMessage before auth (bare "expired"
  in auth patterns would misclassify "session expired" as auth failure)
- Check isImageDimensionErrorMessage before catch-all to preserve
  actionable resize guidance instead of generic "Something went wrong"

✍️ Author: Claude Code with @carrotRakko (AI-written, human-approved)
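The ordering concern in this commit can be illustrated with a first-match-wins rule list. This is a sketch only: the predicates below are hypothetical stand-ins for isCliSessionExpiredErrorMessage, the auth check, and isImageDimensionErrorMessage (their real patterns are not shown in this PR), and all reply strings except the generic fallback are placeholders.

```typescript
// Hypothetical stand-ins for the repository's predicates.
const isSessionExpired = (msg: string): boolean => /session expired/i.test(msg);
// The bare "expired" alternative is what makes ordering matter.
const isAuthError = (msg: string): boolean =>
  /unauthorized|invalid api key|expired/i.test(msg);
const isImageDimensionError = (msg: string): boolean =>
  /image dimensions? exceed/i.test(msg);

interface Rule {
  matches: (msg: string) => boolean;
  reply: string; // Placeholder wording, not the project's actual messages.
}

// The first matching rule wins, so narrow patterns must come before
// broader ones that would also match the same text.
const RULES: Rule[] = [
  { matches: isSessionExpired, reply: "Your session expired. Use /new to start a fresh session." },
  { matches: isAuthError, reply: "Authentication failed. Check the configured credentials." },
  { matches: isImageDimensionError, reply: "The image is too large. Resize it and try again." },
];

const FALLBACK_REPLY =
  "Something went wrong. Please try again, or use /new to start a fresh session.";

function classify(msg: string): string {
  const rule = RULES.find((candidate) => candidate.matches(msg));
  return rule ? rule.reply : FALLBACK_REPLY;
}
```

If the first two rules were swapped, "session expired" would be reported as an authentication failure, which is the misclassification this commit avoids.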

Development

Successfully merging this pull request may close these issues.

  • Tool call result delivery fails after session compaction — error leaks to user chat
  • Context corruption exposes raw API errors to chat surface
