Skip to content

fix: honor exec approval security and clean up stale tasks#60310

Merged
steipete merged 6 commits into
openclaw:mainfrom
lml2468:fix/issue-60268
Apr 4, 2026
Merged

fix: honor exec approval security and clean up stale tasks#60310
steipete merged 6 commits into
openclaw:mainfrom
lml2468:fix/issue-60268

Conversation

@lml2468

@lml2468 lml2468 commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR now contains two verified fixes:

  1. Gateway exec now honors exec-approvals.json agent security directly, matching the node-host path.
  2. Task maintenance now correctly marks stale cron runs and CLI tasks backed only by long-lived chat sessions as lost.

The earlier cron wake fallback experiment was removed from this PR.

Changes

Exec approvals (#60268)

src/agents/bash-tools.exec-host-shared.ts

  • change gateway hostSecurity to use approvals.agent.security directly instead of intersecting it with the tool default via minSecurity(...)
  • keep existing ask precedence logic unchanged
  • add a focused regression test for the broader-security case

Why this is correct:

  • exec-approvals.json is the explicit user security policy
  • node-host already treated the resolved approval security as authoritative
  • when no agent/default override exists, approvals.agent.security falls back to params.security, so behavior stays backward-compatible

Task maintenance (#60299)

src/tasks/task-registry.maintenance.ts

  • treat runtime="cron" tasks as having no persistent backing session after the run exits
  • treat runtime="cli" tasks with channel/group/direct child session keys as stale even if that long-lived chat session still exists in the session store
  • keep subagent and acp behavior intact

src/tasks/task-registry.maintenance.issue-60299.test.ts

  • add regression coverage for stale cron tasks, stale CLI tasks backed by persistent chat sessions, and the healthy subagent case

Verification

  • pnpm check
  • pnpm test src/tasks/task-registry.maintenance.issue-60299.test.ts

I also added a targeted exec regression test in-tree. A standalone local runner for that file was sticky in this environment, so the landed proof for the exec path is the code change plus that focused unit coverage.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: M labels Apr 3, 2026
@greptile-apps

greptile-apps Bot commented Apr 3, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR bundles three bug fixes across different subsystems: (1) the main fix aligns gateway exec security with node-host by making exec-approvals.json authoritative for hostSecurity instead of intersecting it with the tool-default via minSecurity; (2) a cron timer fix ensures requestHeartbeatNow is called when runHeartbeatOnce is skipped for a non-busy reason, preventing enqueued system events from silently going unconsumed; (3) hasBackingSession in task maintenance is fixed to correctly handle cron tasks (always stale) and cli tasks whose childSessionKey points to a persistent channel/group/direct session.

The security fix is backward-compatible: when no explicit entry exists in exec-approvals.json, approvals.agent.security resolves through resolveExecApprovalsFromFile back to params.security (the tool/runtime default), matching pre-fix behavior. The maintenance fixes are well-covered by the new regression test file.

Confidence Score: 5/5

Safe to merge — all three fixes are backward-compatible, well-commented, and the maintenance path has direct regression test coverage.

No P0 or P1 findings. The exec security fix is provably backward-compatible (verified via resolveExecApprovalsFromFile fallback chain). The timer change follows the same requestHeartbeatNow pattern already used elsewhere in the same function. The hasBackingSession refactor correctly handles all four TaskRuntime values. Tests cover the three newly fixed stale-task cases plus a healthy subagent non-regression case.

No files require special attention.

Reviews (1): Last reviewed commit: "fix: subagents now inherit exec approval..." | Re-trigger Greptile

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 73e9a2b1c9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +83 to +85
if (task.runtime === "cron") {
return false;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Stop treating every active cron task as orphaned

Returning false for all runtime === "cron" tasks makes shouldMarkLost() reconcile any running cron task to lost once the 5-minute grace expires, even when the job is still executing normally. This is a real runtime path because cron agent turns are allowed to run far longer (default safety timeout is up to 60 minutes), so long jobs will be mislabeled as lost mid-run and can emit incorrect terminal state/notifications before they actually finish.

Useful? React with 👍 / 👎.

@steipete steipete changed the title fix: subagents inherit exec approval security context from parent (gateway path parity with node-host) fix: honor exec approval security and clean up stale tasks Apr 4, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c4b1bcdff7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +100 to +103
const chatType = deriveSessionChatType(childSessionKey);
if (chatType === "channel" || chatType === "group" || chatType === "direct") {
return false;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve liveness checks for active CLI chat runs

Returning false for CLI tasks whose childSessionKey parses as channel/group/direct makes shouldMarkLost() flip them to lost after the 5-minute grace even when the run is still healthy. This path is reachable for gateway-ingress tasks (runtime: "cli") that use chat-scoped session keys, and agent runs are allowed to exceed 5 minutes (default timeout is far longer), so long-running turns will be mislabeled as terminal mid-execution and can be swept/inspected as dead work before they actually finish.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 258e5e7d33

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +352 to +356
function resolveBaselineFloor(
provider: ProviderKey,
lane: BaselineLane,
): LiveCacheFloor | undefined {
return (LIVE_CACHE_REGRESSION_BASELINE[provider] as Record<string, LiveCacheFloor>)[lane];

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove duplicate resolveBaselineFloor implementation

This introduces a second resolveBaselineFloor implementation while the original one earlier in this file is still present. TypeScript treats this as duplicate function implementations, which can break pnpm tsgo and prevent importing this module (for example through scripts/check-live-cache.ts and the live cache regression test) before any runtime logic executes. Keep a single helper and change its parameter type in place instead of redefining it.

Useful? React with 👍 / 👎.

@openclaw-barnacle openclaw-barnacle Bot added the commands Command implementations label Apr 4, 2026
@steipete steipete merged commit 19036ef into openclaw:main Apr 4, 2026
9 checks passed
@steipete

steipete commented Apr 4, 2026

Copy link
Copy Markdown
Contributor

Landed via temp rebase onto main.

  • Gate: pnpm check, pnpm build, pnpm test src/tasks/task-registry.maintenance.issue-60299.test.ts, pnpm test src/channels/plugins/config-schema.test.ts, plus CI
  • Land commit: 4e424d9
  • Merge commit: 19036ef

Thanks @lml2468!

lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
Feelw00 added a commit to Feelw00/openclaw that referenced this pull request May 6, 2026
…lost marker

Upstream 7d1575b (openclaw#60310, 2026-04-04) introduced activeJobIds plus
markCronJobActive/clearCronJobActive so task-registry maintenance has a
backing-session signal for runtime='cron' tasks
(task-registry.maintenance.ts hasBackingSession). That patch wired the
pair into runDueJob and executeJob but left the manual-run path
(prepareManualRun + finishPreparedManualRun in src/cron/service/ops.ts)
without mark/clear.

Symptom: when a user triggers `openclaw cron run <id> --force` (or any
manual-run RPC) and the run exceeds TASK_RECONCILE_GRACE_MS (5 min) —
common for force-mode `agentTurn` runs which can reach
AGENT_TURN_SAFETY_TIMEOUT_MS (60 min) — task-registry sweeper marks the
active task `lost` and emits a `Background task lost` system message to
the session, even though the run is still progressing normally.

The merged commit 1fae716 (resolveDurableCronTaskRecovery, 2026-04-26)
reconciles terminal status retroactively from cron run-log +
store.lastRunStatus, but only after the run finishes. This patch
suppresses the transient `lost` marker by adding the producer-side
mark/clear pair, restoring symmetry with runDueJob/executeJob:

  * prepareManualRun: markCronJobActive(job.id) after tryCreateManualTaskRun.
  * finishPreparedManualRun: wrap body in try/finally with
    clearCronJobActive(jobId).

Scope intentionally narrower than openclaw#71040 (closed):
  * No change to runStartupCatchupCandidate — the `deferAgentTurnJobs:true`
    policy added in 7877182 reroutes long-running startup catchups to
    runDueJob/executeJob (already wired). Non-agentTurn startup catchups
    are theoretical hot-path per pre-PR cross-review.
  * No change to active-jobs.ts API.

Regression test (src/cron/active-jobs-manual-run.test.ts): two cases
exercising production hot-path via cron.run("<id>", "force") — success
and inner-throw — assert isCronJobActive transitions true→false around
the manual run.

Fixes openclaw#78233.

[AI-assisted, fully tested]
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
Feelw00 added a commit to Feelw00/openclaw that referenced this pull request May 11, 2026
…lost marker

Upstream 7d1575b (openclaw#60310, 2026-04-04) introduced activeJobIds plus
markCronJobActive/clearCronJobActive so task-registry maintenance has a
backing-session signal for runtime='cron' tasks
(task-registry.maintenance.ts hasBackingSession). That patch wired the
pair into runDueJob and executeJob but left the manual-run path
(prepareManualRun + finishPreparedManualRun in src/cron/service/ops.ts)
without mark/clear.

Symptom: when a user triggers `openclaw cron run <id> --force` (or any
manual-run RPC) and the run exceeds TASK_RECONCILE_GRACE_MS (5 min) —
common for force-mode `agentTurn` runs which can reach
AGENT_TURN_SAFETY_TIMEOUT_MS (60 min) — task-registry sweeper marks the
active task `lost` and emits a `Background task lost` system message to
the session, even though the run is still progressing normally.

The merged commit 1fae716 (resolveDurableCronTaskRecovery, 2026-04-26)
reconciles terminal status retroactively from cron run-log +
store.lastRunStatus, but only after the run finishes. This patch
suppresses the transient `lost` marker by adding the producer-side
mark/clear pair, restoring symmetry with runDueJob/executeJob:

  * prepareManualRun: markCronJobActive(job.id) after tryCreateManualTaskRun.
  * finishPreparedManualRun: wrap body in try/finally with
    clearCronJobActive(jobId).

Scope intentionally narrower than openclaw#71040 (closed):
  * No change to runStartupCatchupCandidate — the `deferAgentTurnJobs:true`
    policy added in 7877182 reroutes long-running startup catchups to
    runDueJob/executeJob (already wired). Non-agentTurn startup catchups
    are theoretical hot-path per pre-PR cross-review.
  * No change to active-jobs.ts API.

Regression test (src/cron/active-jobs-manual-run.test.ts): two cases
exercising production hot-path via cron.run("<id>", "force") — success
and inner-throw — assert isCronJobActive transitions true→false around
the manual run.

Fixes openclaw#78233.

[AI-assisted, fully tested]
steipete pushed a commit to Feelw00/openclaw that referenced this pull request May 11, 2026
…lost marker

Upstream 7d1575b (openclaw#60310, 2026-04-04) introduced activeJobIds plus
markCronJobActive/clearCronJobActive so task-registry maintenance has a
backing-session signal for runtime='cron' tasks
(task-registry.maintenance.ts hasBackingSession). That patch wired the
pair into runDueJob and executeJob but left the manual-run path
(prepareManualRun + finishPreparedManualRun in src/cron/service/ops.ts)
without mark/clear.

Symptom: when a user triggers `openclaw cron run <id> --force` (or any
manual-run RPC) and the run exceeds TASK_RECONCILE_GRACE_MS (5 min) —
common for force-mode `agentTurn` runs which can reach
AGENT_TURN_SAFETY_TIMEOUT_MS (60 min) — task-registry sweeper marks the
active task `lost` and emits a `Background task lost` system message to
the session, even though the run is still progressing normally.

The merged commit 1fae716 (resolveDurableCronTaskRecovery, 2026-04-26)
reconciles terminal status retroactively from cron run-log +
store.lastRunStatus, but only after the run finishes. This patch
suppresses the transient `lost` marker by adding the producer-side
mark/clear pair, restoring symmetry with runDueJob/executeJob:

  * prepareManualRun: markCronJobActive(job.id) after tryCreateManualTaskRun.
  * finishPreparedManualRun: wrap body in try/finally with
    clearCronJobActive(jobId).

Scope intentionally narrower than openclaw#71040 (closed):
  * No change to runStartupCatchupCandidate — the `deferAgentTurnJobs:true`
    policy added in 7877182 reroutes long-running startup catchups to
    runDueJob/executeJob (already wired). Non-agentTurn startup catchups
    are theoretical hot-path per pre-PR cross-review.
  * No change to active-jobs.ts API.

Regression test (src/cron/active-jobs-manual-run.test.ts): two cases
exercising production hot-path via cron.run("<id>", "force") — success
and inner-throw — assert isCronJobActive transitions true→false around
the manual run.

Fixes openclaw#78233.

[AI-assisted, fully tested]
steipete pushed a commit to Feelw00/openclaw that referenced this pull request May 11, 2026
…lost marker

Upstream 7d1575b (openclaw#60310, 2026-04-04) introduced activeJobIds plus
markCronJobActive/clearCronJobActive so task-registry maintenance has a
backing-session signal for runtime='cron' tasks
(task-registry.maintenance.ts hasBackingSession). That patch wired the
pair into runDueJob and executeJob but left the manual-run path
(prepareManualRun + finishPreparedManualRun in src/cron/service/ops.ts)
without mark/clear.

Symptom: when a user triggers `openclaw cron run <id> --force` (or any
manual-run RPC) and the run exceeds TASK_RECONCILE_GRACE_MS (5 min) —
common for force-mode `agentTurn` runs which can reach
AGENT_TURN_SAFETY_TIMEOUT_MS (60 min) — task-registry sweeper marks the
active task `lost` and emits a `Background task lost` system message to
the session, even though the run is still progressing normally.

The merged commit 1fae716 (resolveDurableCronTaskRecovery, 2026-04-26)
reconciles terminal status retroactively from cron run-log +
store.lastRunStatus, but only after the run finishes. This patch
suppresses the transient `lost` marker by adding the producer-side
mark/clear pair, restoring symmetry with runDueJob/executeJob:

  * prepareManualRun: markCronJobActive(job.id) after tryCreateManualTaskRun.
  * finishPreparedManualRun: wrap body in try/finally with
    clearCronJobActive(jobId).

Scope intentionally narrower than openclaw#71040 (closed):
  * No change to runStartupCatchupCandidate — the `deferAgentTurnJobs:true`
    policy added in 7877182 reroutes long-running startup catchups to
    runDueJob/executeJob (already wired). Non-agentTurn startup catchups
    are theoretical hot-path per pre-PR cross-review.
  * No change to active-jobs.ts API.

Regression test (src/cron/active-jobs-manual-run.test.ts): two cases
exercising production hot-path via cron.run("<id>", "force") — success
and inner-throw — assert isCronJobActive transitions true→false around
the manual run.

Fixes openclaw#78233.

[AI-assisted, fully tested]
steipete pushed a commit to Feelw00/openclaw that referenced this pull request May 11, 2026
…lost marker

Upstream 7d1575b (openclaw#60310, 2026-04-04) introduced activeJobIds plus
markCronJobActive/clearCronJobActive so task-registry maintenance has a
backing-session signal for runtime='cron' tasks
(task-registry.maintenance.ts hasBackingSession). That patch wired the
pair into runDueJob and executeJob but left the manual-run path
(prepareManualRun + finishPreparedManualRun in src/cron/service/ops.ts)
without mark/clear.

Symptom: when a user triggers `openclaw cron run <id> --force` (or any
manual-run RPC) and the run exceeds TASK_RECONCILE_GRACE_MS (5 min) —
common for force-mode `agentTurn` runs which can reach
AGENT_TURN_SAFETY_TIMEOUT_MS (60 min) — task-registry sweeper marks the
active task `lost` and emits a `Background task lost` system message to
the session, even though the run is still progressing normally.

The merged commit 1fae716 (resolveDurableCronTaskRecovery, 2026-04-26)
reconciles terminal status retroactively from cron run-log +
store.lastRunStatus, but only after the run finishes. This patch
suppresses the transient `lost` marker by adding the producer-side
mark/clear pair, restoring symmetry with runDueJob/executeJob:

  * prepareManualRun: markCronJobActive(job.id) after tryCreateManualTaskRun.
  * finishPreparedManualRun: wrap body in try/finally with
    clearCronJobActive(jobId).

Scope intentionally narrower than openclaw#71040 (closed):
  * No change to runStartupCatchupCandidate — the `deferAgentTurnJobs:true`
    policy added in 7877182 reroutes long-running startup catchups to
    runDueJob/executeJob (already wired). Non-agentTurn startup catchups
    are theoretical hot-path per pre-PR cross-review.
  * No change to active-jobs.ts API.

Regression test (src/cron/active-jobs-manual-run.test.ts): two cases
exercising production hot-path via cron.run("<id>", "force") — success
and inner-throw — assert isCronJobActive transitions true→false around
the manual run.

Fixes openclaw#78233.

[AI-assisted, fully tested]
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling commands Command implementations size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants