Skip to content

fix(agents): surface exec failures after claimed success#80003

Merged
steipete merged 2 commits into
openclaw:mainfrom
jbetala7:fix/60497-exec-error-warning
May 10, 2026
Merged

fix(agents): surface exec failures after claimed success#80003
steipete merged 2 commits into
openclaw:mainfrom
jbetala7:fix/60497-exec-error-warning

Conversation

@jbetala7

@jbetala7 jbetala7 commented May 9, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Surface default-visible concise warnings for failed exec/bash tool calls instead of suppressing them before mutating-tool handling.
  • Keep raw exec stderr hidden unless verbose details are enabled, preserving the existing low-noise behavior.
  • Add regression coverage for default exec/bash failures, cron non-timeout exec failures, and the reported "assistant claimed success after exec failed" path.

Fixes #60497.

Real behavior proof

  • Behavior or issue addressed: Failed exec/bash tool calls were hidden at default verbosity, so a channel could receive only the assistant's claimed-success text even when the last tool call failed.

  • Real environment tested: Local OpenClaw checkout on macOS, rebased on current origin/main (c58e01569b), Node v24.11.1, pnpm 10.33.2.

  • Exact steps or command run after this patch: Ran the OpenClaw payload policy path directly with node --import tsx, using an assistant success message plus lastToolError for exec with /bin/bash: line 1: python: command not found.

  • Evidence after fix: Terminal transcript from the local OpenClaw checkout.

    $ node --import tsx -e "import('./src/agents/pi-embedded-runner/run/payloads.ts').then(({buildEmbeddedRunPayloads})=>console.log(JSON.stringify(buildEmbeddedRunPayloads({assistantTexts:['The script is ready to use and saved in your workspace.'],toolMetas:[],lastAssistant:{role:'assistant',stopReason:'end_turn',content:[]},lastToolError:{toolName:'exec',error:'/bin/bash: line 1: python: command not found'},sessionKey:'session:telegram',inlineToolResultsAllowed:false,verboseLevel:'off',reasoningLevel:'off',toolResultFormat:'plain'}),null,2)))"
    [
      {
        "text": "The script is ready to use and saved in your workspace.",
        "replyToTag": false,
        "audioAsVoice": false
      },
      {
        "text": "⚠️ 🛠️ Exec failed",
        "isError": true,
        "audioAsVoice": false
      }
    ]
    
  • Observed result after fix: The user-facing payload preserves the assistant text and also appends a concise Exec failed warning, while omitting raw python: command not found stderr at default verbosity.

  • What was not tested: The original NemoClaw/NVIDIA live provider setup from the issue. The fix is in the deterministic OpenClaw payload policy path that decides whether failed exec tool state becomes user-visible.

Verification

pnpm test src/agents/pi-embedded-runner/run/payloads.test.ts src/agents/pi-embedded-runner/run/payloads.errors.test.ts src/agents/bash-tools.exec-runtime.test.ts src/agents/pi-embedded-subscribe.handlers.tools.test.ts src/agents/tool-mutation.test.ts
pnpm check:changed
git diff --check

AI assistance

This PR is AI-assisted. I used Codex to inspect the issue, make the targeted change, and run the verification above. I reviewed the final diff and understand the change: the exec/bash suppression fallback now runs after mutating-tool warnings have a chance to add a concise user-visible failure notice.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 9, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bd61cb5f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@@ -160,6 +154,9 @@ function resolveToolErrorWarningPolicy(params: {
includeDetails,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid warning when assistant already reports exec failure

This branch now routes exec/bash failures through the mutating-tool policy, but suppression depends on hasUserFacingFailureAcknowledgement, which only matches write/edit-style failure phrasing. In practice, acknowledgements like “I couldn't run the command …” are not recognized, so the payload adds an extra ⚠️ … Exec failed warning even when the assistant already clearly reported the failure. That creates duplicate/conflicting user-visible error messaging in the exact success/failure reconciliation path this change targets.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 2987980. The failure-acknowledgement detector now includes command/run/exec vocabulary, and there is a regression test covering "I couldn't run the command..." so the assistant acknowledgement is delivered without a duplicate "Exec failed" fallback.

Verified with the focused payload test suite, pnpm check:changed, git diff --check, and direct payload proofs for both the claimed-success warning case and the acknowledged-failure no-duplicate case.

@clawsweeper

clawsweeper Bot commented May 9, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge.

Summary
The PR updates embedded-run payload warning policy so default-verbosity exec/bash failures emit concise user-visible warnings while raw stderr stays hidden, and adds payload regression tests.

Reproducibility: yes. Current main has a source-level reproduction through buildEmbeddedRunPayloads with assistant success text plus lastToolError for exec: the policy suppresses the warning before mutating-tool handling, and current tests assert that old suppression behavior.

Real behavior proof
Sufficient (terminal): The PR body includes copied terminal output from a local checkout after the patch showing the changed payload result for the reported exec failure path.

Next step before merge
No repair job is needed because the latest PR head addresses the prior narrow finding; remaining action is normal maintainer review and merge gating.

Security
Cleared: The diff only changes agent payload warning policy and colocated tests, with no dependency, CI, secret, release, or new code-execution surface expansion.

Review details

Best possible solution:

Land the targeted payload-policy fix after normal maintainer review and required checks, preserving concise default warnings, raw stderr suppression, and explicit assistant acknowledgement suppression.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main has a source-level reproduction through buildEmbeddedRunPayloads with assistant success text plus lastToolError for exec: the policy suppresses the warning before mutating-tool handling, and current tests assert that old suppression behavior.

Is this the best way to solve the issue?

Yes. Moving exec/bash non-verbose failures through the existing mutating-warning policy, adding command/run acknowledgement vocabulary, and covering both claimed-success and acknowledged-failure cases is the narrowest maintainable fix for this symptom.

What I checked:

  • Current-main suppression path: Current main returns showWarning: false for exec-like tool errors when details are not included before mutating-tool warning handling, which matches the linked bug's silent default-verbosity exec failure path. (src/agents/pi-embedded-runner/run/payloads.ts:146, 390762962f6b)
  • Current tests encode the old behavior: Current main still has tests expecting default exec errors and non-timeout cron exec errors to produce no payload, showing the behavior this PR changes. (src/agents/pi-embedded-runner/run/payloads.test.ts:91, 390762962f6b)
  • PR diff implements the narrow policy change: The PR removes the early exec/bash suppression, routes exec/bash through the mutating-tool warning branch first, keeps raw details hidden when includeDetails is false, expands acknowledgement vocabulary, and adds regression tests for claimed success and acknowledged command failure. (src/agents/pi-embedded-runner/run/payloads.ts:45, 29879808eb14)
  • Prior review finding was addressed: The first bot review flagged duplicate warnings when the assistant already said it could not run a command; the latest diff now includes command/run/exec vocabulary and a regression test for "I couldn't run the command...". (src/agents/pi-embedded-runner/run/payloads.errors.test.ts:452, 29879808eb14)
  • Real behavior proof supplied: The PR body includes copied terminal output from a local checkout after the patch showing buildEmbeddedRunPayloads preserving the assistant success text and appending a concise Exec failed payload without raw stderr. (29879808eb14)
  • CI status summary: Public check-runs for the current head show 24 successes, 5 skips, and one neutral CodeQL result, with no failed check run in the fetched page. (29879808eb14)

Likely related people:

  • @steipete: Current blame for the payload warning policy and acknowledgement matcher in the main snapshot points to Peter Steinberger, making this the clearest local routing signal for the touched file. (role: recent maintainer; confidence: medium; commits: 7bc2deb0d658; files: src/agents/pi-embedded-runner/run/payloads.ts)
  • @obviyus: Merged PR fix: gate Telegram exec tool warnings behind verbose mode #20560 intentionally gated exec/bash tool-failure warnings behind verbose mode, which is the policy this PR partially revises for mutating exec failures. (role: adjacent policy owner; confidence: medium; commits: 6b05916c1445, 7ce94931f016; files: src/agents/pi-embedded-runner/run/payloads.ts)
  • @Swader: Merged PR fix: preserve bootstrap paths and expose failed mutations #16131 introduced the broader mutating-tool failure visibility behavior that this PR reuses for exec/bash failures. (role: adjacent behavior owner; confidence: medium; commits: dbdcbe03e711, 385dcbd8a9d3; files: src/agents/pi-embedded-runner/run/payloads.ts, src/agents/pi-embedded-subscribe.handlers.tools.ts, src/agents/tool-mutation.ts)

Remaining risk / open question:

  • The original NemoClaw/NVIDIA provider chain from the linked bug was not replayed; this review relies on deterministic payload-policy inspection, tests, CI metadata, and the contributor's terminal proof.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 390762962f6b.

@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 9, 2026
@jbetala7 jbetala7 force-pushed the fix/60497-exec-error-warning branch from 3bd61cb to 1cd6270 Compare May 9, 2026 22:58
@jbetala7 jbetala7 changed the title [AI-assisted] fix(agents): surface exec failures after claimed success fix(agents): surface exec failures after claimed success May 9, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026
@jbetala7 jbetala7 force-pushed the fix/60497-exec-error-warning branch from 1cd6270 to 2987980 Compare May 9, 2026 23:11
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026

@TurboTheTurtle TurboTheTurtle left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the current head (2987980) after the acknowledgement-pattern update. The earlier duplicate-warning concern appears addressed: command/run/exec vocabulary is now included in the failure-acknowledgement matcher, and there is a regression case for “I couldn't run the command…”.

I did not find additional blockers in this pass. The focused payload tests listed in the PR body are the right merge gate for this change.

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 9, 2026
@steipete steipete force-pushed the fix/60497-exec-error-warning branch from 2987980 to 023a4fb Compare May 10, 2026 01:51
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026
@steipete steipete merged commit b1f3a71 into openclaw:main May 10, 2026
111 of 113 checks passed
@steipete

Copy link
Copy Markdown
Contributor

Landed via rebase merge onto main.

Note: CI had an unrelated check:architecture failure from the existing src/config/types.channel-messaging-common.ts -> src/config/types.channels.ts -> ... cycle, outside this PR diff. The same pnpm check:architecture failure reproduced locally on the rebased branch; the PR diff only touched CHANGELOG.md and src/agents/pi-embedded-runner/run/payloads*.

Thanks @jbetala7!

jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling proof: supplied External PR includes structured after-fix real behavior proof. size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Agent fabricates successful output after exec tool failure instead of reporting error

3 participants