Skip to content

fix(heartbeat): align response tool prompts#76458

Merged
pashpashpash merged 3 commits intomainfrom
fix/heartbeat-tool-main-regressions
May 3, 2026
Merged

fix(heartbeat): align response tool prompts#76458
pashpashpash merged 3 commits intomainfrom
fix/heartbeat-tool-main-regressions

Conversation

@vincentkoc
Copy link
Copy Markdown
Member

@vincentkoc vincentkoc commented May 3, 2026

Summary

  • Problem: Stop heartbeat tool turns from asking for HEARTBEAT_OK #76338 can put heartbeat runs into heartbeat_respond prompt mode even when the runtime has not enabled the heartbeat_respond tool.
  • Why it matters: Codex/provider-Codex heartbeats and commitment-only check-ins can be told to call a tool that is unavailable, so quiet notify=false outcomes stop being reliable.
  • What changed: response-tool heartbeat prompts now explicitly enable/force the heartbeat tool through reply options, and commitment-only heartbeats stay on the legacy HEARTBEAT_OK ack path because that path intentionally disables tools.
  • What did NOT change: due heartbeat tasks remain tool-capable, and the retired agentRuntime.fallback config key remains retired.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause (if applicable)

  • Root cause: Stop heartbeat tool turns from asking for HEARTBEAT_OK #76338 changed the heartbeat prompt contract without carrying the matching heartbeat-tool availability through GetReplyOptions into embedded PI tool construction, and without accounting for commitment-only runs that deliberately set disableTools: true.
  • Missing detection / guardrail: tests checked prompt text but not the reply option/tool availability contract, and live Codex fixtures still wrote the removed agentRuntime.fallback key.
  • Contributing context: Codex app-server heartbeat runs already force the tool internally, but provider-Codex/PI paths need the explicit option wiring.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/infra/heartbeat-runner.tool-response.test.ts, src/infra/heartbeat-runner.commitments.test.ts, extensions/qa-lab/src/scenario-catalog.test.ts
  • Scenario the test should lock in: structured heartbeat prompts carry enableHeartbeatTool/forceHeartbeatTool; tool-disabled commitment prompts do not mention heartbeat_respond; QA/live Codex fixtures no longer write removed fallback config.
  • Why this is the smallest reliable guardrail: it covers the prompt/options seam at the heartbeat runner boundary and keeps live scenario config parseable without running a full live Codex harness.
  • Existing test that already covers this (if any): N/A
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

Heartbeat quiet/no-notify paths are more reliable for Codex/provider-Codex runs. Live Codex QA fixtures use the current agentRuntime: { id: "codex" } shape.

Diagram (if applicable)

Before:
heartbeat prompt -> asks for heartbeat_respond -> tool may be unavailable
commitment prompt -> asks for heartbeat_respond -> tools disabled

After:
structured heartbeat prompt -> enables heartbeat_respond -> structured notify result
commitment-only prompt -> HEARTBEAT_OK -> tools disabled remains safe

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? Yes
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: heartbeat runs that already use the structured heartbeat prompt now explicitly receive only the matching heartbeat response tool availability. Commitment-only runs keep tools disabled and use the text ack path.

Repro + Verification

Environment

  • OS: macOS local; Linux Blacksmith Testbox
  • Runtime/container: Node 22 / pnpm
  • Model/provider: N/A for unit coverage
  • Integration/channel (if any): heartbeat runner, Telegram test harness, QA scenario catalog
  • Relevant config (redacted): messages.visibleReplies: "message_tool", agentRuntime.id: "codex"

Steps

  1. Run focused heartbeat tests.
  2. Run QA scenario catalog test.
  3. Run changed-file gate in Blacksmith Testbox.

Expected

  • Structured heartbeat prompts receive the heartbeat tool option plumbing.
  • Commitment-only prompts do not request unavailable tools.
  • QA Codex fixtures do not write removed fallback config.

Actual

  • Fixed by this PR.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:
    • pnpm test:serial src/infra/heartbeat-runner.tool-response.test.ts src/infra/heartbeat-runner.commitments.test.ts -- --reporter=verbose
    • pnpm test:serial extensions/qa-lab/src/scenario-catalog.test.ts -- --reporter=verbose
    • OPENCLAW_TESTBOX=1 pnpm check:changed via Testbox tbx_01kqp3gh7gxvw8dv8qp2z2yqr6
  • Edge cases checked: message-tool mode, Codex harness session, auto-selected Codex model, env-forced Codex runtime, due heartbeat tasks, commitment-only heartbeat with tools disabled, QA scenario config parsing.
  • What you did not verify: live Codex app-server run against a real account.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Risks and Mitigations

  • Risk: allowing the heartbeat tool in structured heartbeat runs could widen tools for commitment content.
    • Mitigation: commitment-only runs keep disableTools: true and use HEARTBEAT_OK; only structured heartbeat prompts enable the heartbeat response tool.

@openclaw-barnacle openclaw-barnacle Bot added scripts Repository scripts agents Agent runtime and tooling extensions: qa-lab labels May 3, 2026
@vincentkoc vincentkoc requested a review from pashpashpash May 3, 2026 05:06
@vincentkoc vincentkoc self-assigned this May 3, 2026
@openclaw-barnacle openclaw-barnacle Bot added size: S maintainer Maintainer-authored PR labels May 3, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 3, 2026

Codex review: needs maintainer review before merge.

Summary
The PR threads heartbeat response-tool enable/force options through embedded PI runs, keeps due-commitment heartbeats on the legacy text ack path, removes retired agentRuntime.fallback from QA/live fixtures, and adds regression coverage plus a changelog entry.

Reproducibility: yes. Source inspection on current main shows heartbeat runs can be prompted to call heartbeat_respond without the embedded PI tool path receiving matching availability options, and due-commitment runs can get that prompt while tools are disabled.

Next step before merge
A protected maintainer-labeled MEMBER PR with no automated repair finding needs maintainer review and exact-head validation, not a replacement fix lane.

Security
Cleared: The diff exposes the existing heartbeat result-recording tool only to heartbeat runs whose prompt requests it, keeps commitment-only runs tool-disabled, and does not touch dependencies, workflows, secrets, or package resolution.

Review details

Best possible solution:

Land this PR or an equivalent narrow fix after maintainer review and exact-head validation, preserving structured heartbeat prompts only when heartbeat_respond is available and keeping tool-disabled commitment check-ins on HEARTBEAT_OK.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows heartbeat runs can be prompted to call heartbeat_respond without the embedded PI tool path receiving matching availability options, and due-commitment runs can get that prompt while tools are disabled.

Is this the best way to solve the issue?

Yes. The proposed fix is the narrowest maintainable path I found: carry the prompt/tool decision through reply options into tool construction, and force commitment-only runs back to the legacy text ack contract.

Acceptance criteria:

  • pnpm test:serial src/infra/heartbeat-runner.tool-response.test.ts src/infra/heartbeat-runner.commitments.test.ts -- --reporter=verbose
  • pnpm test:serial extensions/qa-lab/src/scenario-catalog.test.ts -- --reporter=verbose
  • OPENCLAW_TESTBOX=1 pnpm check:changed

What I checked:

  • Protected PR state: Provided GitHub context shows authorAssociation MEMBER, label maintainer, draft false, mergeable true, and head SHA 774dea171b564bdf32e0c4d3f9968bb6dcf4eb46, so conservative cleanup should not close it. (774dea171b56)
  • Current-main prompt/tool mismatch: Current main selects the structured heartbeat_respond prompt for messages.visibleReplies: "message_tool" and Codex-style heartbeat runs, but the heartbeat reply options at the run boundary do not include enableHeartbeatTool or forceHeartbeatTool. (src/infra/heartbeat-runner.ts:406, f696be950bdd)
  • Commitment-only conflict on main: Current main passes useHeartbeatResponseTool into commitment prompt construction while later setting disableTools: true for due commitments, which can ask for heartbeat_respond in a tool-disabled turn. (src/infra/heartbeat-runner.ts:1048, f696be950bdd)
  • Tool construction contract: The PI tool builder only keeps heartbeat_respond available when explicitly enabled or when a heartbeat run has visibleReplies: "message_tool"; current embedded runner plumbing does not forward the explicit heartbeat options. (src/agents/pi-tools.ts:426, f696be950bdd)
  • Patch addresses the seam: The PR diff adds heartbeat-tool fields to GetReplyOptions and RunEmbeddedPiAgentParams, forwards them through runAgentTurnWithFallback, runEmbeddedPiAgent, and runEmbeddedAttempt, and adds tests that assert structured heartbeat prompts receive matching options while commitment-only prompts do not mention heartbeat_respond. (src/infra/heartbeat-runner.ts:1584, 774dea171b56)
  • Fallback fixture cleanup is consistent: Current main's public agent runtime policy schema is strict agentRuntime: { id }, while QA/live fixture files still write or assert agentRuntime.fallback; the PR removes those stale fixture fields. (src/config/zod-schema.agent-runtime.ts:820, f696be950bdd)

Likely related people:

Remaining risk / open question:

  • No tests were independently run in this read-only review; the PR body reports targeted heartbeat/QA tests and a Testbox changed-file gate.

Codex review notes: model gpt-5.5, reasoning high; reviewed against f696be950bdd.

@vincentkoc vincentkoc marked this pull request as ready for review May 3, 2026 05:14
Copy link
Copy Markdown
Contributor

@pashpashpash pashpashpash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vincentkoc. This is the right follow-up to #76338: the prompt contract and tool-availability contract need to move together, and keeping tool-disabled commitment check-ins on the legacy ack path is the correct boundary.

@pashpashpash pashpashpash merged commit 877eb1c into main May 3, 2026
103 checks passed
@pashpashpash pashpashpash deleted the fix/heartbeat-tool-main-regressions branch May 3, 2026 14:19
lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
* fix(heartbeat): align response tool prompts

* docs(changelog): credit heartbeat prompt fix
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
* fix(heartbeat): align response tool prompts

* docs(changelog): credit heartbeat prompt fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling extensions: qa-lab maintainer Maintainer-authored PR scripts Repository scripts size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants