fix(agents): surface exec failures after claimed success by jbetala7 · Pull Request #80003 · openclaw/openclaw

jbetala7 · 2026-05-09T22:50:26Z

Summary

Surface default-visible concise warnings for failed exec/bash tool calls instead of suppressing them before mutating-tool handling.
Keep raw exec stderr hidden unless verbose details are enabled, preserving the existing low-noise behavior.
Add regression coverage for default exec/bash failures, cron non-timeout exec failures, and the reported "assistant claimed success after exec failed" path.

Real behavior proof

Behavior or issue addressed: Failed exec/bash tool calls were hidden at default verbosity, so a channel could receive only the assistant's claimed-success text even when the last tool call failed.
Real environment tested: Local OpenClaw checkout on macOS, rebased on current origin/main (c58e01569b), Node v24.11.1, pnpm 10.33.2.
Exact steps or command run after this patch: Ran the OpenClaw payload policy path directly with node --import tsx, using an assistant success message plus lastToolError for exec with /bin/bash: line 1: python: command not found.

Evidence after fix: Terminal transcript from the local OpenClaw checkout.

$ node --import tsx -e "import('./src/agents/pi-embedded-runner/run/payloads.ts').then(({buildEmbeddedRunPayloads})=>console.log(JSON.stringify(buildEmbeddedRunPayloads({assistantTexts:['The script is ready to use and saved in your workspace.'],toolMetas:[],lastAssistant:{role:'assistant',stopReason:'end_turn',content:[]},lastToolError:{toolName:'exec',error:'/bin/bash: line 1: python: command not found'},sessionKey:'session:telegram',inlineToolResultsAllowed:false,verboseLevel:'off',reasoningLevel:'off',toolResultFormat:'plain'}),null,2)))"
[
  {
    "text": "The script is ready to use and saved in your workspace.",
    "replyToTag": false,
    "audioAsVoice": false
  },
  {
    "text": "⚠️ 🛠️ Exec failed",
    "isError": true,
    "audioAsVoice": false
  }
]

Observed result after fix: The user-facing payload preserves the assistant text and also appends a concise Exec failed warning, while omitting raw python: command not found stderr at default verbosity.
What was not tested: The original NemoClaw/NVIDIA live provider setup from the issue. The fix is in the deterministic OpenClaw payload policy path that decides whether failed exec tool state becomes user-visible.

Verification

pnpm test src/agents/pi-embedded-runner/run/payloads.test.ts src/agents/pi-embedded-runner/run/payloads.errors.test.ts src/agents/bash-tools.exec-runtime.test.ts src/agents/pi-embedded-subscribe.handlers.tools.test.ts src/agents/tool-mutation.test.ts
pnpm check:changed
git diff --check

AI assistance

This PR is AI-assisted. I used Codex to inspect the issue, make the targeted change, and run the verification above. I reviewed the final diff and understand the change: the exec/bash suppression fallback now runs after mutating-tool warnings have a chance to add a concise user-visible failure notice.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bd61cb5f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-09T22:53:38Z

@@ -160,6 +154,9 @@ function resolveToolErrorWarningPolicy(params: {
      includeDetails,


Avoid warning when assistant already reports exec failure

This branch now routes exec/bash failures through the mutating-tool policy, but suppression depends on hasUserFacingFailureAcknowledgement, which only matches write/edit-style failure phrasing. In practice, acknowledgements like “I couldn't run the command …” are not recognized, so the payload adds an extra ⚠️ … Exec failed warning even when the assistant already clearly reported the failure. That creates duplicate/conflicting user-visible error messaging in the exact success/failure reconciliation path this change targets.

Useful? React with 👍 / 👎.

Addressed in 2987980. The failure-acknowledgement detector now includes command/run/exec vocabulary, and there is a regression test covering "I couldn't run the command..." so the assistant acknowledgement is delivered without a duplicate "Exec failed" fallback.

Verified with the focused payload test suite, pnpm check:changed, git diff --check, and direct payload proofs for both the claimed-success warning case and the acknowledged-failure no-duplicate case.

clawsweeper · 2026-05-09T22:53:39Z

Codex review: needs maintainer review before merge.

Summary
The PR updates embedded-run payload warning policy so default-verbosity exec/bash failures emit concise user-visible warnings while raw stderr stays hidden, and adds payload regression tests.

Reproducibility: yes. Current main has a source-level reproduction through buildEmbeddedRunPayloads with assistant success text plus lastToolError for exec: the policy suppresses the warning before mutating-tool handling, and current tests assert that old suppression behavior.

Real behavior proof
Sufficient (terminal): The PR body includes copied terminal output from a local checkout after the patch showing the changed payload result for the reported exec failure path.

Next step before merge
No repair job is needed because the latest PR head addresses the prior narrow finding; remaining action is normal maintainer review and merge gating.

Security
Cleared: The diff only changes agent payload warning policy and colocated tests, with no dependency, CI, secret, release, or new code-execution surface expansion.

Review details

Best possible solution:

Land the targeted payload-policy fix after normal maintainer review and required checks, preserving concise default warnings, raw stderr suppression, and explicit assistant acknowledgement suppression.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main has a source-level reproduction through buildEmbeddedRunPayloads with assistant success text plus lastToolError for exec: the policy suppresses the warning before mutating-tool handling, and current tests assert that old suppression behavior.

Is this the best way to solve the issue?

Yes. Moving exec/bash non-verbose failures through the existing mutating-warning policy, adding command/run acknowledgement vocabulary, and covering both claimed-success and acknowledged-failure cases is the narrowest maintainable fix for this symptom.

What I checked:

Current-main suppression path: Current main returns showWarning: false for exec-like tool errors when details are not included before mutating-tool warning handling, which matches the linked bug's silent default-verbosity exec failure path. (src/agents/pi-embedded-runner/run/payloads.ts:146, 390762962f6b)
Current tests encode the old behavior: Current main still has tests expecting default exec errors and non-timeout cron exec errors to produce no payload, showing the behavior this PR changes. (src/agents/pi-embedded-runner/run/payloads.test.ts:91, 390762962f6b)
PR diff implements the narrow policy change: The PR removes the early exec/bash suppression, routes exec/bash through the mutating-tool warning branch first, keeps raw details hidden when includeDetails is false, expands acknowledgement vocabulary, and adds regression tests for claimed success and acknowledged command failure. (src/agents/pi-embedded-runner/run/payloads.ts:45, 29879808eb14)
Prior review finding was addressed: The first bot review flagged duplicate warnings when the assistant already said it could not run a command; the latest diff now includes command/run/exec vocabulary and a regression test for "I couldn't run the command...". (src/agents/pi-embedded-runner/run/payloads.errors.test.ts:452, 29879808eb14)
Real behavior proof supplied: The PR body includes copied terminal output from a local checkout after the patch showing buildEmbeddedRunPayloads preserving the assistant success text and appending a concise Exec failed payload without raw stderr. (29879808eb14)
CI status summary: Public check-runs for the current head show 24 successes, 5 skips, and one neutral CodeQL result, with no failed check run in the fetched page. (29879808eb14)

Likely related people:

@steipete: Current blame for the payload warning policy and acknowledgement matcher in the main snapshot points to Peter Steinberger, making this the clearest local routing signal for the touched file. (role: recent maintainer; confidence: medium; commits: 7bc2deb0d658; files: src/agents/pi-embedded-runner/run/payloads.ts)
@obviyus: Merged PR fix: gate Telegram exec tool warnings behind verbose mode #20560 intentionally gated exec/bash tool-failure warnings behind verbose mode, which is the policy this PR partially revises for mutating exec failures. (role: adjacent policy owner; confidence: medium; commits: 6b05916c1445, 7ce94931f016; files: src/agents/pi-embedded-runner/run/payloads.ts)
@Swader: Merged PR fix: preserve bootstrap paths and expose failed mutations #16131 introduced the broader mutating-tool failure visibility behavior that this PR reuses for exec/bash failures. (role: adjacent behavior owner; confidence: medium; commits: dbdcbe03e711, 385dcbd8a9d3; files: src/agents/pi-embedded-runner/run/payloads.ts, src/agents/pi-embedded-subscribe.handlers.tools.ts, src/agents/tool-mutation.ts)

Remaining risk / open question:

The original NemoClaw/NVIDIA provider chain from the linked bug was not replayed; this review relies on deterministic payload-policy inspection, tests, CI metadata, and the contributor's terminal proof.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 390762962f6b.

TurboTheTurtle

Reviewed the current head (2987980) after the acknowledgement-pattern update. The earlier duplicate-warning concern appears addressed: command/run/exec vocabulary is now included in the failure-acknowledgement matcher, and there is a regression case for “I couldn't run the command…”.

I did not find additional blockers in this pass. The focused payload tests listed in the PR body are the right merge gate for this change.

steipete · 2026-05-10T01:57:53Z

Landed via rebase merge onto main.

Local proof: pnpm test src/agents/pi-embedded-runner/run/payloads.test.ts src/agents/pi-embedded-runner/run/payloads.errors.test.ts src/agents/bash-tools.exec-runtime.test.ts src/agents/pi-embedded-subscribe.handlers.tools.test.ts src/agents/tool-mutation.test.ts (140 tests passed across 2 shards); OPENCLAW_TESTBOX=0 pnpm check:changed; git diff --check origin/main...HEAD before push.
PR head before merge: 023a4fb
Landed commits: 658a30b, b1f3a71
Related issue: [Bug]: Agent fabricates successful output after exec tool failure instead of reporting error #60497 auto-closed by the merge.

Note: CI had an unrelated check:architecture failure from the existing src/config/types.channel-messaging-common.ts -> src/config/types.channels.ts -> ... cycle, outside this PR diff. The same pnpm check:architecture failure reproduced locally on the rebased branch; the PR diff only touched CHANGELOG.md and src/agents/pi-embedded-runner/run/payloads*.

Thanks @jbetala7!

openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 9, 2026

jbetala7 mentioned this pull request May 9, 2026

[Bug]: Agent fabricates successful output after exec tool failure instead of reporting error #60497

Closed

chatgpt-codex-connector Bot reviewed May 9, 2026

View reviewed changes

openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 9, 2026

jbetala7 force-pushed the fix/60497-exec-error-warning branch from 3bd61cb to 1cd6270 Compare May 9, 2026 22:58

jbetala7 changed the title ~~[AI-assisted] fix(agents): surface exec failures after claimed success~~ fix(agents): surface exec failures after claimed success May 9, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026

jbetala7 force-pushed the fix/60497-exec-error-warning branch from 1cd6270 to 2987980 Compare May 9, 2026 23:11

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026

TurboTheTurtle reviewed May 9, 2026

View reviewed changes

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026

jbetala7 and others added 2 commits May 10, 2026 02:49

fix(agents): surface exec failures after claimed success

18e7b63

docs: update changelog for exec failure warning (openclaw#80003)

023a4fb

steipete force-pushed the fix/60497-exec-error-warning branch from 2987980 to 023a4fb Compare May 10, 2026 01:51

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026

steipete merged commit b1f3a71 into openclaw:main May 10, 2026
111 of 113 checks passed

github-actions Bot mentioned this pull request May 10, 2026

📡 Upstream Digest — 2026-05-10 02:08 UTC curtismercier/openclaw-mods#816

Open

steipete mentioned this pull request May 10, 2026

fix: canonicalize embedded reply payloads #80075

Merged

longstoryscott pushed a commit to longstoryscott/openclaw that referenced this pull request May 13, 2026

docs: update changelog for exec failure warning (openclaw#80003)

9d459b9

This was referenced May 14, 2026

fix(cron): make recovered tool errors order agnostic #81585

Draft

Feature request: config option to suppress transient tool error warnings #39406

Open

fix(agent): respect suppressToolErrors for all tools including mutating ones #51091

Closed

github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026

docs: update changelog for exec failure warning (openclaw#80003)

c50a980

sjf mentioned this pull request Jun 2, 2026

[Bug]: Telegram shows internal "⚠️ 🛠️ ... failed" for normal grep exit(1) after upgrade to 2026.5.20 #87610

Open

clawsweeper Bot mentioned this pull request Jun 2, 2026

fix(channels): hide tool failure details in public channels #89285

Closed

jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026

docs: update changelog for exec failure warning (openclaw#80003)

f35134f

sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026

docs: update changelog for exec failure warning (openclaw#80003)

2a20209

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(agents): surface exec failures after claimed success#80003

fix(agents): surface exec failures after claimed success#80003
steipete merged 2 commits into
openclaw:mainfrom
jbetala7:fix/60497-exec-error-warning

jbetala7 commented May 9, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 9, 2026

Uh oh!

jbetala7 May 9, 2026

Uh oh!

clawsweeper Bot commented May 9, 2026 •

edited

Loading

Uh oh!

TurboTheTurtle left a comment

Uh oh!

Uh oh!

steipete commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -160,6 +154,9 @@ function resolveToolErrorWarningPolicy(params: {
		includeDetails,

Uh oh!

Conversation

jbetala7 commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Real behavior proof

Verification

AI assistance

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

jbetala7 May 9, 2026

Choose a reason for hiding this comment

Uh oh!

clawsweeper Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TurboTheTurtle left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

steipete commented May 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jbetala7 commented May 9, 2026 •

edited

Loading

clawsweeper Bot commented May 9, 2026 •

edited

Loading