Skip to content

fix #90452: Regression: Heartbeat exec completion still shows generic fallback text instead of actual output#90897

Merged
clawsweeper[bot] merged 2 commits into
openclaw:mainfrom
mushuiyu886:feat/issue-90452
Jun 8, 2026
Merged

fix #90452: Regression: Heartbeat exec completion still shows generic fallback text instead of actual output#90897
clawsweeper[bot] merged 2 commits into
openclaw:mainfrom
mushuiyu886:feat/issue-90452

Conversation

@mushuiyu886

Copy link
Copy Markdown
Contributor

Summary

  • Problem: heartbeat-triggered exec task failures could still be summarized as a generic 🛠️ ... failed fallback instead of showing the command's captured error/output.
  • Solution: mark embedded runner payload construction when the trigger is heartbeat and allow exec-like heartbeat tool failures to include the collected error details.
  • What changed: buildEmbeddedRunPayloads now receives isHeartbeatTrigger, threads it into the tool-error warning policy, and has a regression test for the HEARTBEAT.md tail failure shape from the issue.
  • What did NOT change: cron timeout handling, non-heartbeat compact tool-error behavior, channel delivery, and heartbeat scheduling are unchanged.

Fixes #90452

Real behavior proof

  • Behavior or issue addressed: Heartbeat exec task failures now surface the actual exec error details in the generated payload instead of only emitting the generic fallback text.
  • Real environment tested: Local Linux OpenClaw worktree at /media/vdc/code/ai/aispace/openclaw-worktrees/issue-90452, calling the production embedded-runner payload builder with node --import tsx.
  • Exact steps or command run after this patch:
    env -C "/media/vdc/code/ai/aispace/openclaw-worktrees/issue-90452" node --import tsx -e 'import { buildEmbeddedRunPayloads } from "./src/agents/embedded-agent-runner/run/payloads.ts"; const payloads = buildEmbeddedRunPayloads({ assistantTexts: [], toolMetas: [], lastAssistant: undefined, currentAssistant: undefined, lastToolError: { toolName: "exec", meta: "show last 20 lines of ~/.openclaw/workspace/memory/2026-06-04.md", error: "tail: cannot open '\''/home/user/.openclaw/workspace/memory/2026-06-04.md'\'' for reading: No such file or directory" }, isCronTrigger: false, isHeartbeatTrigger: true, sessionKey: "session:heartbeat", inlineToolResultsAllowed: false, verboseLevel: "off", reasoningLevel: "off", toolResultFormat: "plain" }); console.log(JSON.stringify(payloads, null, 2)); if (!payloads[0]?.text?.includes("No such file or directory")) process.exit(1);'
  • Evidence after fix:
    [
      {
        "text": "⚠️ 🛠️ show last 20 lines of ~/.openclaw/workspace/memory/2026-06-04.md failed: tail: cannot open '/home/user/.openclaw/workspace/memory/2026-06-04.md' for reading: No such file or directory",
        "isError": true
      }
    ]
  • Observed result after fix: The heartbeat exec failure payload includes the real tail error and No such file or directory detail that the previous generic fallback omitted.
  • What was not tested: I did not wait for a wall-clock HEARTBEAT.md interval or deliver through an external messaging channel; the proof exercises the local embedded-runner heartbeat payload path that formats the exec failure before channel delivery.

Regression Test Plan

  • Coverage level: focused regression test plus related heartbeat/agent shards.
  • Target test file: src/agents/embedded-agent-runner/run/payloads.test.ts.
  • Scenario locked in: heartbeat-triggered exec failure for show last 20 lines of ~/.openclaw/workspace/memory/2026-06-04.md includes No such file or directory in the payload.
  • Why this is the smallest reliable guardrail: the bug was in the embedded-runner payload warning policy, so this directly covers the formatting decision that previously dropped the exec error details.

Commands run:

env -C "/media/vdc/code/ai/aispace/openclaw-worktrees/issue-90452" node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/payloads.test.ts src/infra/heartbeat-events-filter.test.ts src/infra/heartbeat-runner.ghost-reminder.test.ts src/agents/bash-tools.test.ts
env -C "/media/vdc/code/ai/aispace/openclaw-worktrees/issue-90452" ./node_modules/.bin/oxfmt --check --threads=1 src/agents/embedded-agent-runner/run.ts src/agents/embedded-agent-runner/run/payloads.ts src/agents/embedded-agent-runner/run/payloads.test.ts
env -C "/media/vdc/code/ai/aispace/openclaw-worktrees/issue-90452" node scripts/run-tsgo.mjs -p tsconfig.core.json --incremental --tsBuildInfoFile .artifacts/tsgo-cache/core.tsbuildinfo
git -C "/media/vdc/code/ai/aispace/openclaw-worktrees/issue-90452" diff --check

Observed test output:

[test] passed 3 Vitest shards in 34.97s

Checking formatting...

All matched files use the correct format.

tsgo:core and git diff --check completed with no output and exit code 0.

Merge risk

  • Why it matters / User impact: HEARTBEAT.md exec tasks that fail should tell the user what the command reported, especially for filesystem or shell errors where the command output is the actionable part.
  • Risk labels considered: merge-risk:runtime-behavior, merge-risk:low.
  • Risk explanation: The change only affects embedded-runner tool-error payload formatting when the active trigger is heartbeat; cron timeout handling and ordinary compact tool-error behavior keep their existing conditions.
  • Why acceptable: The output was already collected in lastToolError.error; this only makes heartbeat exec-like failures eligible to surface that existing detail instead of dropping it.
  • Fix classification: Root-cause regression fix with a focused regression test.
  • Maintainer-ready confidence: High for the local payload-formatting path: production function proof, focused regression test, related heartbeat tests, formatting, typecheck, and diff whitespace checks all pass.
  • Patch quality notes: Small 3-file patch, no public API, config, schema, dependency, channel delivery, or scheduler changes.
  • Why this is root-cause fix: The missing trigger classification in the payload warning policy caused the real exec error to be omitted; threading isHeartbeatTrigger through that policy fixes the decision point directly.

Root Cause

  • Root cause: embedded runner payloads only included exec tool error details for verbose mode or cron timeout cases; heartbeat-triggered exec failures were not marked as a detail-eligible trigger, so lastToolError.error was dropped.
  • Missing detection / guardrail: there was no regression test covering heartbeat-triggered exec-like tool failures with captured stderr/stdout details.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XS proof: supplied External PR includes structured after-fix real behavior proof. labels Jun 6, 2026
@clawsweeper

clawsweeper Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Codex review: passed. Reviewed June 8, 2026, 1:47 AM ET / 05:47 UTC.

Summary
The PR threads heartbeat trigger state into embedded-runner payload formatting so heartbeat exec-like failures include captured error details, with a focused regression test.

PR surface: Source +12, Tests +18. Total +30 across 3 files.

Reproducibility: yes. Source inspection shows current main and v2026.6.1 only include raw exec details for verbose mode or timed-out cron cases, so a non-timeout heartbeat exec failure follows the generic warning path; I did not run the wall-clock heartbeat scenario in this read-only review.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Next step before merge

  • [P2] No repair lane is needed because the automerge-opted PR has no actionable review findings and sufficient proof.

Security
Cleared: The diff only threads a trigger flag through payload formatting and adds a test; it does not change dependencies, secrets, permissions, or command execution.

Review details

Best possible solution:

Land the narrow payload-builder fix after the usual automerge gates, keeping scheduler and channel delivery behavior unchanged.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection shows current main and v2026.6.1 only include raw exec details for verbose mode or timed-out cron cases, so a non-timeout heartbeat exec failure follows the generic warning path; I did not run the wall-clock heartbeat scenario in this read-only review.

Is this the best way to solve the issue?

Yes. The payload builder is the narrow decision point for whether collected tool-error details are surfaced, and threading the existing trigger into that boundary is cleaner than changing scheduler or channel delivery code.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against a4f0e508dfcb.

Label changes

Label justifications:

  • P1: The linked regression hides actionable heartbeat exec failure output from users in an agent automation workflow.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (live_output): The PR body supplies after-fix live output from the production payload builder showing the heartbeat exec failure includes the captured error detail.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body supplies after-fix live output from the production payload builder showing the heartbeat exec failure includes the captured error detail.
Evidence reviewed

PR surface:

Source +12, Tests +18. Total +30 across 3 files.

View PR surface stats
Area Files Added Removed Net
Source 2 13 1 +12
Tests 1 18 0 +18
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 3 31 1 +30

What I checked:

Likely related people:

  • vincentkoc: Current main blame and file history for the payload builder, tool-error summary, mutation classifier, and runEmbeddedAgent caller point to recent commits by Vincent Koc. (role: recent area contributor; confidence: high; commits: 5d7e0b73a7b0, 2e08f0f4221f; files: src/agents/embedded-agent-runner/run/payloads.ts, src/agents/embedded-agent-runner/run.ts, src/agents/tool-error-summary.ts)
  • Michael Appel: Commit 19a2e9d changed heartbeat exec completion detection and tests, which is adjacent to the reported heartbeat exec output path. (role: adjacent exec-completion contributor; confidence: medium; commits: 19a2e9ddb5a8; files: src/infra/heartbeat-events-filter.ts, src/infra/heartbeat-events-filter.test.ts)
  • Chinar Amrutkar: Commit 103bebd added HEARTBEAT.md task batching, the user-facing automation surface involved in the linked regression report. (role: heartbeat task feature contributor; confidence: medium; commits: 103bebd651a5; files: src/auto-reply/heartbeat.ts, src/infra/heartbeat-runner.ts)
  • steipete: Recent heartbeat event-filter and ghost-reminder history includes repeated commits by Peter Steinberger around heartbeat routing and prompt filtering. (role: adjacent heartbeat history contributor; confidence: medium; commits: c1b3a49320, e2362d352d, 77876bd05c; files: src/infra/heartbeat-events-filter.ts, src/infra/heartbeat-runner.ghost-reminder.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. labels Jun 6, 2026
@Takhoffman

Copy link
Copy Markdown
Contributor

@clawsweeper automerge

@clawsweeper

clawsweeper Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=85c5d6fb9fe7f9ba0c224ea8d65ac02f33a08840)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-06-08T05:47:54Z
Merge commit: b2c1de77acc7

What merged:

  • The PR threads heartbeat trigger state into embedded-runner payload formatting so heartbeat exec-like failures include captured error details, with a focused regression test.
  • PR surface: Source +12, Tests +18. Total +30 across 3 files.
  • Reproducibility: yes. Source inspection shows current main and v2026.6.1 only include raw exec details for v ... follows the generic warning path; I did not run the wall-clock heartbeat scenario in this read-only review.

Automerge notes:

The automerge loop is complete.

Automerge progress:

  • 2026-06-08 01:55:16 UTC review queued 0e7a0e4cc94a (queued)
  • 2026-06-08 02:03:08 UTC review passed 0e7a0e4cc94a (structured ClawSweeper verdict: pass (sha=0e7a0e4cc94ad1bfc04b873901960fca98509...)
  • 2026-06-08 02:13:15 UTC review queued 85c5d6fb9fe7 (after repair)
  • 2026-06-08 05:47:41 UTC review passed 85c5d6fb9fe7 (structured ClawSweeper verdict: pass (sha=85c5d6fb9fe7f9ba0c224ea8d65ac02f33a08...)
  • 2026-06-08 05:38:19 UTC review queued 85c5d6fb9fe7 (queued)
  • 2026-06-08 05:47:56 UTC merged 85c5d6fb9fe7 (merged by ClawSweeper automerge)

@clawsweeper clawsweeper Bot added clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 8, 2026
@clawsweeper clawsweeper Bot force-pushed the feat/issue-90452 branch from 0e7a0e4 to 85c5d6f Compare June 8, 2026 02:13
@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels Jun 8, 2026
@clawsweeper clawsweeper Bot merged commit b2c1de7 into openclaw:main Jun 8, 2026
167 of 169 checks passed
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 9, 2026
… generic fallback text instead of actual output (openclaw#90897)

Summary:
- The PR threads heartbeat trigger state into embedded-runner payload formatting so heartbeat exec-like failures include captured error details, with a focused regression test.
- PR surface: Source +12, Tests +18. Total +30 across 3 files.
- Reproducibility: yes. Source inspection shows current main and v2026.6.1 only include raw exec details for v ... follows the generic warning path; I did not run the wall-clock heartbeat scenario in this read-only review.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix openclaw#90452: Regression: Heartbeat exec completion still shows generic…

Validation:
- ClawSweeper review passed for head 85c5d6f.
- Required merge gates passed before the squash merge.

Prepared head SHA: 85c5d6f
Review: openclaw#90897 (comment)

Co-authored-by: 杨浩宇0668001029 <yang.haoyu@xydigit.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: XS status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression: Heartbeat exec completion still shows generic fallback text instead of actual output

2 participants