Skip to content

fix: suppress raw provider errors in channel delivery#88610

Merged
steipete merged 8 commits into
openclaw:mainfrom
jason-allen-oneal:fix/69737-suppress-raw-error-message
May 31, 2026
Merged

fix: suppress raw provider errors in channel delivery#88610
steipete merged 8 commits into
openclaw:mainfrom
jason-allen-oneal:fix/69737-suppress-raw-error-message

Conversation

@jason-allen-oneal

@jason-allen-oneal jason-allen-oneal commented May 31, 2026

Copy link
Copy Markdown
Contributor

Fixes #69737.

Summary

This change blocks raw unclassified assistant errorMessage text at the user-facing delivery boundary.

It adds a shared formatUserFacingAssistantErrorText(...) wrapper around the existing assistant error formatter. Classified friendly errors still use the existing sanitizer/classification behavior, but formatter output that is only raw upstream pass-through now falls back to:

LLM request failed.

Terminal lifecycle events and reply payload construction now use the safe wrapper for actual assistant error turns. Raw diagnostic details are not globally deleted, provider-side error objects are not changed, and internal observation fields such as rawErrorPreview remain available for maintainers.

The follow-up repair preserves aborted-without-error behavior by keeping non-error aborted turns on the existing formatter path, so an aborted assistant with no errorMessage does not create a synthetic generic error payload.

Evidence

  • Synthetic canary: SECRET_CANARY_69737
  • Current HEAD tested: 5bb04e6997717541b9c0ede4f0abf7784f49f06f
  • Raw provider-error suppression remains scoped to stopReason="error".
  • Aborted-without-error payload behavior is preserved.

Real behavior proof

  • Behavior addressed: Terminal assistant errors with unclassified raw upstream errorMessage must not send that raw text through user-facing lifecycle or reply payload delivery text. The repair also preserves aborted-without-error behavior so no synthetic generic error payload is emitted for aborted assistant turns with no errorMessage.

  • Real environment tested: Local OpenClaw checkout at HEAD 5bb04e6997717541b9c0ede4f0abf7784f49f06f, running the actual handleAgentEnd lifecycle function and actual buildEmbeddedRunPayloads reply payload builder through node --import tsx, plus a live Telegram DM through my existing OpenClaw Telegram bot.

  • Exact steps or command run after this patch: Direct Node/tsx proof imported src/agents/embedded-agent-subscribe.handlers.lifecycle.ts and src/agents/embedded-agent-runner/run/payloads.ts, passed a terminal assistant payload with stopReason: "error", content: [], and errorMessage: "SECRET_CANARY_69737", then captured user-facing lifecycle event text and reply payload text. For Telegram, the PR checkout was started locally, openai/gpt-5.5 was temporarily routed to a local OpenAI Responses-compatible endpoint at 127.0.0.1:8787/v1, and a live Telegram DM triggered a provider 500 containing SECRET_CANARY_69737.

  • Evidence after fix: Terminal output from the direct proof command and live Telegram gateway log excerpt:

{
  "command": "node --import tsx direct handleAgentEnd plus buildEmbeddedRunPayloads canary proof",
  "head": "5bb04e6997717541b9c0ede4f0abf7784f49f06f",
  "containsCanaryInUserFacingLifecycleText": false,
  "containsGenericFallbackInUserFacingLifecycleText": true,
  "containsCanaryInUserFacingReplyPayloadText": false,
  "containsGenericFallbackInUserFacingReplyPayloadText": true,
  "replyPayloads": [
    {
      "text": "LLM request failed.",
      "isError": true
    }
  ],
  "internalRawPreviewRetainedForMaintainers": true
}
[openai-transport] [responses] error provider=openai api=openai-responses model=gpt-5.5 status=500 code=telegram_live_proof_canary type=server_error message=500 SECRET_CANARY_69737
[telegram] embedded run agent end: isError=true model=gpt-5.5 provider=openai error=LLM request failed. rawError=500 SECRET_CANARY_69737

Live Telegram proof screenshot

  • Observed result after fix: SECRET_CANARY_69737 is absent from both user-facing lifecycle text and user-facing reply payload text. The live Telegram transcript displays a generic failure response and does not show SECRET_CANARY_69737. Internal diagnostic preview retention still reports true for maintainers. The aborted-without-error regression is covered by the added payload regression test and preserves the no-payload behavior.

  • What was not tested: Slack delivery was not exercised. The direct lifecycle/payload behavior and live Telegram DM delivery path were exercised for this PR scope.

Testing

OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs run src/agents/embedded-agent-helpers.formatassistanterrortext.test.ts src/agents/embedded-agent-runner/run/payloads.errors.test.ts src/agents/embedded-agent-subscribe.handlers.lifecycle.test.ts src/agents/embedded-agent-helpers/errors.test.ts

Result:

Test Files  4 passed (4)
Tests  156 passed (156)
git diff --check upstream/main...HEAD
node_modules/.bin/oxfmt --check --threads=1 --config .oxfmtrc.jsonc src/agents/embedded-agent-helpers/errors.ts src/agents/embedded-agent-helpers.ts src/agents/embedded-agent-subscribe.handlers.lifecycle.ts src/agents/embedded-agent-subscribe.handlers.lifecycle.test.ts src/agents/embedded-agent-runner/run/payloads.ts src/agents/embedded-agent-runner/run/payloads.errors.test.ts

Result:

All matched files use the correct format.

Copilot AI review requested due to automatic review settings May 31, 2026 12:16
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 31, 2026
@clawsweeper

clawsweeper Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 31, 2026, 2:46 PM ET / 18:46 UTC.

Summary
The PR adds a shared user-facing assistant-error wrapper and routes terminal assistant lifecycle errors and reply payloads through it, with regression tests for raw canary suppression and aborted-run behavior.

PR surface: Source +41, Tests +59. Total +100 across 6 files.

Reproducibility: yes. source-level. Current main can still let unclassified formatter pass-through become lifecycle error text and an isError reply payload, and the PR adds focused canary tests for that path.

Review metrics: 1 noteworthy metric.

  • User-facing error surfaces: 2 changed. Lifecycle error events and reply payloads now share the safe wrapper, so maintainers should explicitly accept the cross-transport wording change.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Maintainers should explicitly accept the generic fallback policy across supported channels before merge.

Mantis proof suggestion
A maintainer-controlled Telegram Desktop proof would independently show the changed visible chat error text, even though the contributor proof is already sufficient. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

telegram desktop proof: verify a forced raw provider error canary is not shown in Telegram and the user sees only generic failure copy.

Risk before merge

  • [P1] Existing users and operators lose previously visible unclassified provider diagnostic text in channel replies and lifecycle errors; raw details remain in internal observation fields, but the compatibility tradeoff needs maintainer acceptance.
  • [P1] The change applies at shared lifecycle and reply-payload boundaries, so it changes visible terminal provider-error text across transports; Telegram proof is strong, but Slack and other channels were not live exercised.

Maintainer options:

  1. Accept Generic Fallback (recommended)
    Land this if maintainers prefer never exposing unclassified upstream diagnostics to users and are comfortable relying on internal raw previews for debugging.
  2. Require Explicit Diagnostic Policy
    Ask for a narrower allowlist or channel policy if some unclassified provider messages should remain visible to end users.
  3. Defer To Broader Customization Work
    Pause this PR if maintainers want the remaining answer to be configurable per channel or audience instead of one generic fallback.

Next step before merge

  • No automated repair is needed; the remaining action is maintainer judgment on the diagnostics compatibility tradeoff and final merge review.

Security
Cleared: No supply-chain or new security-boundary concern was found; the patch reduces user-facing leakage of raw provider errors while retaining internal diagnostics.

Review details

Best possible solution:

Land the boundary-level suppression if maintainers accept generic fallback copy for unclassified raw errors, while preserving classified friendly errors, current-main formatter classifications, and internal raw diagnostics.

Do we have a high-confidence way to reproduce the issue?

Yes, source-level. Current main can still let unclassified formatter pass-through become lifecycle error text and an isError reply payload, and the PR adds focused canary tests for that path.

Is this the best way to solve the issue?

Yes, if maintainers accept the generic fallback policy. A delivery-boundary wrapper is narrower than changing formatAssistantErrorText globally because it keeps internal formatter behavior and raw observation fields available.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against ecbd97e9682d.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body provides after-fix direct Node/tsx output plus live Telegram screenshot/log evidence showing the raw canary is absent from user-facing text and retained internally.

Label justifications:

  • P2: This is a bounded user-facing security/privacy hardening bug in shared agent channel delivery with normal maintainer priority.
  • merge-risk: 🚨 compatibility: The PR intentionally replaces previously visible unclassified provider diagnostics with generic copy, which can affect existing support workflows.
  • merge-risk: 🚨 message-delivery: The diff changes the actual terminal error text delivered through lifecycle events and channel reply payloads.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (logs): The PR body provides after-fix direct Node/tsx output plus live Telegram screenshot/log evidence showing the raw canary is absent from user-facing text and retained internally.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body provides after-fix direct Node/tsx output plus live Telegram screenshot/log evidence showing the raw canary is absent from user-facing text and retained internally.
  • mantis: telegram-visible-proof: Mantis should capture Telegram visible proof. The PR changes Telegram-visible chat error text, so a short Telegram Desktop proof can directly demonstrate that the raw canary is hidden.
Evidence reviewed

PR surface:

Source +41, Tests +59. Total +100 across 6 files.

View PR surface stats
Area Files Added Removed Net
Source 4 57 16 +41
Tests 2 61 2 +59
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 6 118 18 +100

What I checked:

Likely related people:

  • steipete: Current shallow/grafted blame for the central lifecycle, payload, and formatter lines points to the repository snapshot commit, so this is weak but useful routing context. (role: current checkout snapshot contributor; confidence: low; commits: fbc611ab4c65; files: src/agents/embedded-agent-subscribe.handlers.lifecycle.ts, src/agents/embedded-agent-runner/run/payloads.ts, src/agents/embedded-agent-helpers/errors.ts)
  • vignesh07: History shows the older pi lifecycle error logging path was introduced in commit 68b92e8 before the current embedded-agent path was carried forward. (role: introduced adjacent lifecycle error behavior; confidence: medium; commits: 68b92e80f72d; files: src/agents/pi-embedded-subscribe.handlers.lifecycle.ts)
  • vincentkoc: History ties Vincent Koc to user-facing assistant error formatting and the shared formatter seam that this PR wraps at the delivery boundary. (role: formatter seam contributor; confidence: medium; commits: 478af8170689, 397b0d85f571; files: src/agents/pi-embedded-helpers/errors.ts, src/shared/assistant-error-format.ts)
  • jeffrey701: Commit 01ef169 added current-main HTTP 401 provider-error sanitization in the same formatter and tests that this PR must preserve on merge. (role: recent adjacent formatter contributor; confidence: high; commits: 01ef169004bc; files: src/agents/embedded-agent-helpers/errors.ts, src/agents/embedded-agent-helpers.formatassistanterrortext.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 31, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. mantis: telegram-visible-proof Mantis should capture Telegram visible proof. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. labels May 31, 2026
@jason-allen-oneal jason-allen-oneal force-pushed the fix/69737-suppress-raw-error-message branch from dfb619e to 3b293ab Compare May 31, 2026 12:33
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026
@jason-allen-oneal jason-allen-oneal force-pushed the fix/69737-suppress-raw-error-message branch from 3b293ab to f4ec3b0 Compare May 31, 2026 12:41
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026
@jason-allen-oneal jason-allen-oneal force-pushed the fix/69737-suppress-raw-error-message branch from f4ec3b0 to 3733fe9 Compare May 31, 2026 12:56
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 31, 2026
@jason-allen-oneal jason-allen-oneal force-pushed the fix/69737-suppress-raw-error-message branch from 3733fe9 to ceef03f Compare May 31, 2026 13:19
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed proof: supplied External PR includes structured after-fix real behavior proof. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 31, 2026
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. labels May 31, 2026
@jason-allen-oneal

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 31, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: sufficient ClawSweeper judged the real behavior proof convincing. labels May 31, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026
@steipete steipete self-assigned this May 31, 2026
@steipete steipete force-pushed the fix/69737-suppress-raw-error-message branch from d68fd4d to b46e197 Compare May 31, 2026 22:06
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026
@steipete

Copy link
Copy Markdown
Contributor

Maintainer pass complete. I pushed follow-up commits on top of this PR to keep user-facing channel delivery from exposing raw or raw-derived provider error text, including structured, escaped, and aborted-turn cases, while preserving safe schema/rate-limit guidance.

Proof on exact head b46e197f624cac968622163d9818187195c03a0d:

  • OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs run src/agents/embedded-agent-helpers.formatassistanterrortext.test.ts src/agents/embedded-agent-runner/run/payloads.errors.test.ts src/agents/embedded-agent-subscribe.handlers.lifecycle.test.ts src/agents/embedded-agent-helpers/errors.test.ts passed: 2 Vitest shards, 169 tests.
  • oxfmt --check passed on the 7 touched files.
  • git diff --check origin/main...HEAD passed.
  • autoreview --mode branch --base origin/main finished clean: no accepted/actionable findings.
  • GitHub exact-head checks are green: 134 success, 26 skipped, 1 neutral, 0 blockers.

Known proof gap: I did not run a new live Telegram/Slack send; the PR now has direct lifecycle/payload regression coverage plus the existing real-behavior proof path.

@steipete steipete merged commit 01f6ad6 into openclaw:main May 31, 2026
161 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling mantis: telegram-visible-proof Mantis should capture Telegram visible proof. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Delivery layer: posts raw errorMessage verbatim when assistant message has stopReason=error

2 participants