fix(agents): surface exec failures after claimed success#80003
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3bd61cb5f3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -160,6 +154,9 @@ function resolveToolErrorWarningPolicy(params: { | |||
| includeDetails, | |||
There was a problem hiding this comment.
Avoid warning when assistant already reports exec failure
This branch now routes exec/bash failures through the mutating-tool policy, but suppression depends on hasUserFacingFailureAcknowledgement, which only matches write/edit-style failure phrasing. In practice, acknowledgements like “I couldn't run the command …” are not recognized, so the payload adds an extra ⚠️ … Exec failed warning even when the assistant already clearly reported the failure. That creates duplicate/conflicting user-visible error messaging in the exact success/failure reconciliation path this change targets.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in 2987980. The failure-acknowledgement detector now includes command/run/exec vocabulary, and there is a regression test covering "I couldn't run the command..." so the assistant acknowledgement is delivered without a duplicate "Exec failed" fallback.
Verified with the focused payload test suite, pnpm check:changed, git diff --check, and direct payload proofs for both the claimed-success warning case and the acknowledged-failure no-duplicate case.
|
Codex review: needs maintainer review before merge. Summary Reproducibility: yes. Current main has a source-level reproduction through buildEmbeddedRunPayloads with assistant success text plus lastToolError for exec: the policy suppresses the warning before mutating-tool handling, and current tests assert that old suppression behavior. Real behavior proof Next step before merge Security Review detailsBest possible solution: Land the targeted payload-policy fix after normal maintainer review and required checks, preserving concise default warnings, raw stderr suppression, and explicit assistant acknowledgement suppression. Do we have a high-confidence way to reproduce the issue? Yes. Current main has a source-level reproduction through buildEmbeddedRunPayloads with assistant success text plus lastToolError for exec: the policy suppresses the warning before mutating-tool handling, and current tests assert that old suppression behavior. Is this the best way to solve the issue? Yes. Moving exec/bash non-verbose failures through the existing mutating-warning policy, adding command/run acknowledgement vocabulary, and covering both claimed-success and acknowledged-failure cases is the narrowest maintainable fix for this symptom. What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 390762962f6b. |
3bd61cb to
1cd6270
Compare
1cd6270 to
2987980
Compare
TurboTheTurtle
left a comment
There was a problem hiding this comment.
Reviewed the current head (2987980) after the acknowledgement-pattern update. The earlier duplicate-warning concern appears addressed: command/run/exec vocabulary is now included in the failure-acknowledgement matcher, and there is a regression case for “I couldn't run the command…”.
I did not find additional blockers in this pass. The focused payload tests listed in the PR body are the right merge gate for this change.
2987980 to
023a4fb
Compare
|
Landed via rebase merge onto
Note: CI had an unrelated Thanks @jbetala7! |
Summary
exec/bashtool calls instead of suppressing them before mutating-tool handling.Fixes #60497.
Real behavior proof
Behavior or issue addressed: Failed
exec/bashtool calls were hidden at default verbosity, so a channel could receive only the assistant's claimed-success text even when the last tool call failed.Real environment tested: Local OpenClaw checkout on macOS, rebased on current
origin/main(c58e01569b), Nodev24.11.1, pnpm10.33.2.Exact steps or command run after this patch: Ran the OpenClaw payload policy path directly with
node --import tsx, using an assistant success message pluslastToolErrorforexecwith/bin/bash: line 1: python: command not found.Evidence after fix: Terminal transcript from the local OpenClaw checkout.
Observed result after fix: The user-facing payload preserves the assistant text and also appends a concise
Exec failedwarning, while omitting rawpython: command not foundstderr at default verbosity.What was not tested: The original NemoClaw/NVIDIA live provider setup from the issue. The fix is in the deterministic OpenClaw payload policy path that decides whether failed exec tool state becomes user-visible.
Verification
pnpm test src/agents/pi-embedded-runner/run/payloads.test.ts src/agents/pi-embedded-runner/run/payloads.errors.test.ts src/agents/bash-tools.exec-runtime.test.ts src/agents/pi-embedded-subscribe.handlers.tools.test.ts src/agents/tool-mutation.test.ts pnpm check:changed git diff --checkAI assistance
This PR is AI-assisted. I used Codex to inspect the issue, make the targeted change, and run the verification above. I reviewed the final diff and understand the change: the exec/bash suppression fallback now runs after mutating-tool warnings have a chance to add a concise user-visible failure notice.