fix: suppress raw provider errors in channel delivery#88610
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 31, 2026, 2:46 PM ET / 18:46 UTC. Summary PR surface: Source +41, Tests +59. Total +100 across 6 files. Reproducibility: yes. source-level. Current main can still let unclassified formatter pass-through become lifecycle error text and an isError reply payload, and the PR adds focused canary tests for that path. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the boundary-level suppression if maintainers accept generic fallback copy for unclassified raw errors, while preserving classified friendly errors, current-main formatter classifications, and internal raw diagnostics. Do we have a high-confidence way to reproduce the issue? Yes, source-level. Current main can still let unclassified formatter pass-through become lifecycle error text and an isError reply payload, and the PR adds focused canary tests for that path. Is this the best way to solve the issue? Yes, if maintainers accept the generic fallback policy. A delivery-boundary wrapper is narrower than changing formatAssistantErrorText globally because it keeps internal formatter behavior and raw observation fields available. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against ecbd97e9682d. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +41, Tests +59. Total +100 across 6 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
dfb619e to
3b293ab
Compare
3b293ab to
f4ec3b0
Compare
f4ec3b0 to
3733fe9
Compare
3733fe9 to
ceef03f
Compare
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. |
d68fd4d to
b46e197
Compare
|
Maintainer pass complete. I pushed follow-up commits on top of this PR to keep user-facing channel delivery from exposing raw or raw-derived provider error text, including structured, escaped, and aborted-turn cases, while preserving safe schema/rate-limit guidance. Proof on exact head
Known proof gap: I did not run a new live Telegram/Slack send; the PR now has direct lifecycle/payload regression coverage plus the existing real-behavior proof path. |
Fixes #69737.
Summary
This change blocks raw unclassified assistant
errorMessagetext at the user-facing delivery boundary.It adds a shared
formatUserFacingAssistantErrorText(...)wrapper around the existing assistant error formatter. Classified friendly errors still use the existing sanitizer/classification behavior, but formatter output that is only raw upstream pass-through now falls back to:Terminal lifecycle events and reply payload construction now use the safe wrapper for actual assistant error turns. Raw diagnostic details are not globally deleted, provider-side error objects are not changed, and internal observation fields such as
rawErrorPreviewremain available for maintainers.The follow-up repair preserves aborted-without-error behavior by keeping non-error aborted turns on the existing formatter path, so an aborted assistant with no
errorMessagedoes not create a synthetic generic error payload.Evidence
SECRET_CANARY_697375bb04e6997717541b9c0ede4f0abf7784f49f06fstopReason="error".Real behavior proof
Behavior addressed: Terminal assistant errors with unclassified raw upstream
errorMessagemust not send that raw text through user-facing lifecycle or reply payload delivery text. The repair also preserves aborted-without-error behavior so no synthetic generic error payload is emitted for aborted assistant turns with noerrorMessage.Real environment tested: Local OpenClaw checkout at HEAD
5bb04e6997717541b9c0ede4f0abf7784f49f06f, running the actualhandleAgentEndlifecycle function and actualbuildEmbeddedRunPayloadsreply payload builder throughnode --import tsx, plus a live Telegram DM through my existing OpenClaw Telegram bot.Exact steps or command run after this patch: Direct Node/tsx proof imported
src/agents/embedded-agent-subscribe.handlers.lifecycle.tsandsrc/agents/embedded-agent-runner/run/payloads.ts, passed a terminal assistant payload withstopReason: "error",content: [], anderrorMessage: "SECRET_CANARY_69737", then captured user-facing lifecycle event text and reply payload text. For Telegram, the PR checkout was started locally,openai/gpt-5.5was temporarily routed to a local OpenAI Responses-compatible endpoint at127.0.0.1:8787/v1, and a live Telegram DM triggered a provider 500 containingSECRET_CANARY_69737.Evidence after fix: Terminal output from the direct proof command and live Telegram gateway log excerpt:
{ "command": "node --import tsx direct handleAgentEnd plus buildEmbeddedRunPayloads canary proof", "head": "5bb04e6997717541b9c0ede4f0abf7784f49f06f", "containsCanaryInUserFacingLifecycleText": false, "containsGenericFallbackInUserFacingLifecycleText": true, "containsCanaryInUserFacingReplyPayloadText": false, "containsGenericFallbackInUserFacingReplyPayloadText": true, "replyPayloads": [ { "text": "LLM request failed.", "isError": true } ], "internalRawPreviewRetainedForMaintainers": true }Observed result after fix:
SECRET_CANARY_69737is absent from both user-facing lifecycle text and user-facing reply payload text. The live Telegram transcript displays a generic failure response and does not showSECRET_CANARY_69737. Internal diagnostic preview retention still reports true for maintainers. The aborted-without-error regression is covered by the added payload regression test and preserves the no-payload behavior.What was not tested: Slack delivery was not exercised. The direct lifecycle/payload behavior and live Telegram DM delivery path were exercised for this PR scope.
Testing
Result:
Result: