
fix: clean up session records and run logs on cron.remove (#46369) #46506

Open
anup00900 wants to merge 6 commits into openclaw:main from anup00900:fix/cron-remove-session-cleanup-46369

Conversation

@anup00900

Summary

Fixes #46369 — when deleting a cron job via cron.remove, only the job config was removed from jobs.json. Session records in sessions.json and the run log file remained, causing "ghost sessions" to appear in the Control UI dropdown.

The fix: After removing the job config, asynchronously clean up:

  1. Session entries matching cron:{jobId}:* from sessions.json via updateSessionStore
  2. Run log file at {cronStoreDir}/runs/{jobId}.jsonl via fs.unlink

Cleanup runs outside the critical path — failures are logged but don't block the removal response.
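The pattern above, respond first and clean up in the background, can be sketched as follows. `cleanupRemovedJobArtifacts` matches the PR's function name; the surrounding wiring is illustrative, not the actual code:

```typescript
// Sketch of the fire-and-forget cleanup described above.
async function cleanupRemovedJobArtifacts(jobId: string): Promise<void> {
  // ...remove session entries and unlink {cronStoreDir}/runs/{jobId}.jsonl...
}

function removeJob(jobId: string): { ok: true } {
  // The job config is removed from jobs.json first (the critical path).
  // Cleanup is then started without being awaited: a rejection is logged
  // via .catch() and never propagates to the removal response.
  cleanupRemovedJobArtifacts(jobId).catch((err) => {
    console.warn(`cron: cleanup after remove failed: ${String(err)}`);
  });
  return { ok: true };
}
```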

Changes

  • src/cron/service/ops.ts: Added cleanupRemovedJobArtifacts() called after successful job removal
  • src/cron/service.remove-cleanup.test.ts: 2 new tests verifying session and run log cleanup

Test plan

  • All 523 cron tests pass (521 existing + 2 new)
  • New test: session entries for deleted job are removed, other jobs' sessions preserved
  • New test: run log .jsonl file is deleted after job removal
  • Manual: Create cron job in Control UI → delete it → verify no ghost session in dropdown

…6369)

When a cron job is deleted, only the job config was removed from
jobs.json. The session records in sessions.json and the run log file
remained, causing ghost sessions to appear in the Control UI.

After removing the job config, asynchronously:
1. Delete matching session entries (cron:{jobId}:*) from sessions.json
2. Delete the run log file at {cronStoreDir}/runs/{jobId}.jsonl

Cleanup failures are logged but do not block the removal response.
@anup00900
Author

@ngutman @vincentkoc — this fixes the ghost sessions bug reported in #46369. Small, self-contained change: session records and run log files are now cleaned up when a cron job is deleted. All 523 cron tests pass. Happy to squash if preferred.

@greptile-apps
Contributor

greptile-apps Bot commented Mar 14, 2026

Greptile Summary

This PR fixes a ghost-session bug (#46369) where deleting a cron job via cron.remove left stale entries in sessions.json and an orphaned run-log file on disk. The fix adds cleanupRemovedJobArtifacts, which fires asynchronously after a successful removal to (1) delete all session store entries whose key contains cron:{jobId}: and (2) unlink the per-job JSONL run-log file. Two integration tests covering both paths are included.

Key points:

  • The cleanup is correctly fire-and-forget and does not block the removal response; all errors are caught and logged.
  • ENOENT on the run-log unlink is silently swallowed, which is the right behaviour for a file that may not yet exist.
  • The key-fragment match (key.includes("cron:{jobId}:")) is intentionally broad and will correctly match the full compound session keys used in practice (e.g. agent:main:cron:{jobId}:run:{runId}).
  • Minor: the two cleanup steps run sequentially without independent error isolation. If step 1 (updateSessionStore) throws, step 2 (fs.unlink) is silently skipped. Because these operations are independent, wrapping each in its own error boundary (or running via Promise.allSettled) would ensure both are always attempted.
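The error-isolation point in the last bullet can be sketched with `Promise.allSettled`; the step names below are placeholders for the PR's two cleanup operations, not its actual code:

```typescript
// Run independent cleanup steps so a failure in one never skips the other,
// and collect every individual error for the caller to log.
async function runCleanupSteps(
  steps: Record<string, () => Promise<void>>,
): Promise<string[]> {
  const names = Object.keys(steps);
  const results = await Promise.allSettled(names.map((name) => steps[name]()));
  return results.flatMap((result, i) =>
    result.status === "rejected" ? [`${names[i]}: ${String(result.reason)}`] : [],
  );
}
```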

Confidence Score: 4/5

  • Safe to merge — the fix is additive, runs off the critical path, and all errors are caught. The only concern is a minor robustness gap where a step-1 failure prevents step-2 from running.
  • The change is well-scoped and the happy path is thoroughly tested. The fire-and-forget pattern with a top-level .catch() correctly prevents cleanup failures from affecting the removal response. The one issue — sequential steps without independent isolation — is a style concern and not a correctness blocker in production; it would only cause the run-log file to linger if the session store update fails (an already-unusual error path).
  • src/cron/service/ops.ts — review the sequential step isolation in cleanupRemovedJobArtifacts
Last reviewed commit: 33d11b1

Comment thread src/cron/service/ops.ts Outdated
Comment on lines +357 to +384

```
async function cleanupRemovedJobArtifacts(state: CronServiceState, jobId: string): Promise<void> {
  // 1. Remove matching session entries from sessions.json.
  const sessionStorePath = state.deps.sessionStorePath ?? state.deps.resolveSessionStorePath?.();
  if (sessionStorePath) {
    const cronKeyFragment = `cron:${jobId}:`;
    await updateSessionStore(sessionStorePath, (store) => {
      for (const key of Object.keys(store)) {
        if (key.includes(cronKeyFragment)) {
          delete store[key];
        }
      }
    });
  }

  // 2. Delete the run log file ({cronStoreDir}/runs/{jobId}.jsonl).
  try {
    const logPath = resolveCronRunLogPath({
      storePath: state.deps.storePath,
      jobId,
    });
    await fs.unlink(logPath);
  } catch (err: unknown) {
    // ENOENT is fine — the log may not exist yet.
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") {
      throw err;
    }
  }
}
```
Contributor


Step 2 silently skipped when step 1 throws

The two cleanup operations are independent, but they run sequentially without isolation. If updateSessionStore throws (e.g. a lock timeout or I/O error), the function rejects immediately and fs.unlink is never attempted. The outer .catch() handler logs one message, but there is no indication that the run-log deletion was skipped entirely.

Since the caller already swallows all errors with .catch(), each step should guard itself independently so both are always attempted:

```
async function cleanupRemovedJobArtifacts(state: CronServiceState, jobId: string): Promise<void> {
  // 1. Remove matching session entries from sessions.json.
  const sessionStorePath = state.deps.sessionStorePath ?? state.deps.resolveSessionStorePath?.();
  if (sessionStorePath) {
    const cronKeyFragment = `cron:${jobId}:`;
    await updateSessionStore(sessionStorePath, (store) => {
      for (const key of Object.keys(store)) {
        if (key.includes(cronKeyFragment)) {
          delete store[key];
        }
      }
    });
  }

  // 2. Delete the run log file ({cronStoreDir}/runs/{jobId}.jsonl).
  const logPath = resolveCronRunLogPath({
    storePath: state.deps.storePath,
    jobId,
  });
  await fs.unlink(logPath).catch((err: unknown) => {
    // ENOENT is fine — the log may not exist yet.
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") {
      throw err;
    }
  });
}
```

Or alternatively, run both operations in parallel via Promise.allSettled and surface both individual errors in the catch handler.



@chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33d11b18e3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/cron/service/ops.ts Outdated
```
 */
async function cleanupRemovedJobArtifacts(state: CronServiceState, jobId: string): Promise<void> {
  // 1. Remove matching session entries from sessions.json.
  const sessionStorePath = state.deps.sessionStorePath ?? state.deps.resolveSessionStorePath?.();
```


[P1] Resolve session store path for removed job's agent

cleanupRemovedJobArtifacts looks up sessionStorePath via resolveSessionStorePath?.() with no agentId, which falls back to the default agent store in the gateway wiring; that means removing a cron job owned by a non-default agent in per-agent session-store setups cleans the wrong sessions.json and leaves the deleted job’s cron session entries behind. Because the job is filtered out before cleanup, this path has no way to recover the removed job’s agentId, so ghost sessions still persist for non-default-agent jobs.


… path

- Use Promise.allSettled so session cleanup and run log deletion are
  independent; a failure in one does not skip the other
- Capture the removed job's agentId before filtering and pass it to
  resolveSessionStorePath so non-default-agent jobs clean the correct
  sessions.json
Copy link
Copy Markdown

@chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7f5411a15


Comment thread src/cron/service/ops.ts Outdated
Comment on lines +381 to +382
```
const sessionStorePath =
  state.deps.sessionStorePath ?? state.deps.resolveSessionStorePath?.(agentId);
```


[P1] Use removed job's agent store for session cleanup

cleanupSessionEntries always prefers state.deps.sessionStorePath, but the cron gateway wiring provides both sessionStorePath and resolveSessionStorePath, with sessionStorePath pinned to the default agent store (src/gateway/server-cron.ts:224-237). In multi-agent deployments, deleting a non-default-agent cron job therefore updates the wrong sessions.json, leaving that job’s cron sessions behind in its real per-agent store.


Comment thread src/cron/service/ops.ts Outdated
Comment on lines +386 to +390
```
const cronKeyFragment = `cron:${jobId}:`;
await updateSessionStore(sessionStorePath, (store) => {
  for (const key of Object.keys(store)) {
    if (key.includes(cronKeyFragment)) {
      delete store[key];
```


[P1] Remove canonical cron session key when deleting a job

The cleanup matcher cron:${jobId}: skips the canonical cron key agent:<agent>:cron:<jobId> and only catches keys with another segment after the job id (for example :run: aliases). Isolated cron runs persist both keys (src/cron/isolated-agent/run.ts:356-363), and session listing already hides run aliases while keeping the canonical one (src/gateway/session-utils.ts:870), so a deleted job can still leave a visible ghost session entry.

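One way to address this finding is a matcher that accepts both key shapes quoted in the review: the canonical key ending in `cron:<jobId>` and the run-alias keys with a trailing `:run:<runId>` segment. A minimal sketch (`matchesCronJob` is a hypothetical helper, not code from the PR):

```typescript
// Match both the canonical session key (…:cron:<jobId>, nothing after the
// job id) and run-alias keys (…:cron:<jobId>:run:<runId>).
function matchesCronJob(key: string, jobId: string): boolean {
  const base = `cron:${jobId}`;
  const idx = key.indexOf(base);
  if (idx === -1) return false;
  const rest = key.slice(idx + base.length);
  // Either the key ends right after the job id, or a ":"-delimited segment
  // follows. This also avoids matching a longer job id that merely starts
  // with jobId (e.g. "cron:job-12" when cleaning up "job-1").
  return rest === "" || rest.startsWith(":");
}
```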

- Prefer resolveSessionStorePath(agentId) over static sessionStorePath
  so multi-agent deployments clean the correct sessions.json
- Match both canonical key (cron:{jobId}) and run keys (cron:{jobId}:run:*)
  to prevent ghost sessions from either key type
- Fix pre-existing oxfmt formatting in message-handler.ts
@openclaw-barnacle Bot added the gateway (Gateway runtime) and size: M labels and removed the size: S label Mar 14, 2026
Introduced in bb06dc7 on main but never called, causing oxlint failure.
@openclaw-barnacle Bot added the agents (Agent runtime and tooling) label Mar 14, 2026

@chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fdacae676c


Comment thread src/cron/service/ops.ts
Comment on lines +348 to +349
```
cleanupRemovedJobArtifacts(state, id, removedJob?.agentId).catch((err) => {
  state.deps.log.warn({ err: String(err) }, "cron: cleanup after remove failed");
```


[P1] Re-clean removed cron sessions after active run finishes

cron.remove launches cleanupRemovedJobArtifacts immediately and does not coordinate with in-flight isolated runs, so deleting a job during execution is still prone to ghost sessions: runCronIsolatedAgentTurn persists the canonical cron session key again at run completion (src/cron/isolated-agent/run.ts writes both agentSessionKey and runSessionKey in persistSessionEntry), which can happen after this one-time cleanup and recreate the deleted job's visible session entry.


@anup00900
Author

Re: race with in-flight runs

Good catch — if a cron job is actively running when cron.remove is called, persistSessionEntry in run.ts can re-persist the session keys after our cleanup.

This is a known edge case that the existing session reaper already handles: it sweeps expired cron sessions after the retention period (24h by default, configurable via cron.sessionRetention). The immediate cleanup covers the common case where the job is not running at deletion time; the reaper serves as the safety net for the race condition.

Coordinating with in-flight runs would require adding cancellation/abort signaling to the isolated agent runner, which is significantly more complex and better suited for a separate PR.


@chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bad2d0e544


Comment thread src/cron/service/ops.ts
Comment on lines +402 to +406
```
const logPath = resolveCronRunLogPath({
  storePath: state.deps.storePath,
  jobId,
});
await fs.unlink(logPath).catch((err: unknown) => {
```

[P2] Wait for pending cron run-log writes before unlinking

cleanupRunLog unlinks {runs}/{jobId}.jsonl immediately, but cron run logs are appended asynchronously via a fire-and-forget appendCronRunLog call in src/gateway/server-cron.ts (lines 478-505). If cron.remove is called while a run is finishing (or while a previous append is still queued), the later append recreates the file after this unlink, so deleted jobs can still leave stale run-log artifacts. This makes cleanup nondeterministic for in-flight/recent runs and should coordinate with pending writes or run completion before deleting.

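One way to make the unlink deterministic is to serialize per-job run-log appends through a promise chain and drain it before deleting the file. A sketch under that assumption; `enqueueWrite`, `drainWrites`, and the map are hypothetical, not the repo's API:

```typescript
// Serialize per-job run-log writes so cleanup can await all queued appends
// before unlinking {runs}/{jobId}.jsonl.
const pendingWrites = new Map<string, Promise<void>>();

function enqueueWrite(jobId: string, write: () => Promise<void>): Promise<void> {
  const prev = pendingWrites.get(jobId) ?? Promise.resolve();
  const next = prev.then(write, write); // run even if an earlier write failed
  pendingWrites.set(jobId, next.catch(() => {})); // keep the chain unbroken
  return next;
}

async function drainWrites(jobId: string): Promise<void> {
  // Await whatever is queued, then drop the chain. A real implementation
  // would also have to reject appends that arrive after removal.
  await (pendingWrites.get(jobId) ?? Promise.resolve());
  pendingWrites.delete(jobId);
}
```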

@clawsweeper
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: needs changes before merge.

Summary
The PR adds asynchronous cron.remove cleanup for cron session-store entries and per-job run logs, adds cleanup tests, and includes small unrelated test/formatting edits.

Reproducibility: yes. Current main source shows cron.remove removes only the job config, while session persistence and fire-and-forget run-log writes can recreate artifacts after a one-time immediate cleanup.

Next step before merge
The remaining blockers are concrete file-level repairs suitable for an automated repair pass on the PR branch or a narrow replacement branch.

Security
Cleared: The diff does not add dependencies, workflows, network access, secret handling, package metadata, or new code execution paths.

Review findings

  • [P2] Keep the removed job in cron events — src/cron/service/ops.ts:349
  • [P2] Re-run cleanup after active cron jobs finish — src/cron/service/ops.ts:353-355
  • [P2] Drain pending run-log writes before unlinking — src/cron/service/ops.ts:416
Review details

Best possible solution:

Revise the cron removal path so artifact cleanup is final after active runs and pending log writes settle, while preserving removed-event job snapshots and adding regression coverage plus a changelog entry.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main source shows cron.remove removes only the job config, while session persistence and fire-and-forget run-log writes can recreate artifacts after a one-time immediate cleanup.

Is this the best way to solve the issue?

No, not as-is. The cleanup direction is narrow and maintainable, but it needs to preserve the removed-event contract and coordinate with active cron completion plus queued run-log writes.

Full review comments:

  • [P2] Keep the removed job in cron events — src/cron/service/ops.ts:349
    Current main emits the deleted job snapshot on removed events, and cron_changed hooks/tests rely on evt.job for deletion reconciliation. This branch emits removed without job, so merging it would regress plugin hook payloads for job deletion.
    Confidence: 0.91
  • [P2] Re-run cleanup after active cron jobs finish — src/cron/service/ops.ts:353-355
    The cleanup runs once immediately after removal. If the job is already executing, isolated cron completion can persist the canonical cron session key afterward, and the existing reaper keeps that base key, so the visible ghost session can come back.
    Confidence: 0.87
  • [P2] Drain pending run-log writes before unlinking — src/cron/service/ops.ts:416
    cleanupRunLog unlinks the file without coordinating with queued appendCronRunLog writes. Since finished events append fire-and-forget, a pending or later append can recreate the deleted job's run-log file after cleanup.
    Confidence: 0.84
  • [P3] Add the required changelog entry — src/cron/service/ops.ts:353-355
    This is a user-facing fix for ghost cron sessions, but the PR does not update CHANGELOG.md. Repo policy requires user-facing fixes to include a changelog entry before landing.
    Confidence: 0.86

Overall correctness: patch is incorrect
Overall confidence: 0.9

Acceptance criteria:

  • pnpm test src/cron/service.remove-cleanup.test.ts src/cron/service/timer.regression.test.ts src/cron/run-log.test.ts src/gateway/server-cron.test.ts
  • pnpm exec oxfmt --check --threads=1 src/cron/service/ops.ts src/cron/service/timer.ts src/cron/run-log.ts src/gateway/server-cron.ts src/cron/service.remove-cleanup.test.ts src/gateway/server-cron.test.ts CHANGELOG.md
  • pnpm check:changed

What I checked:

Likely related people:

  • vincentkoc: Local blame for the central cron service, timer, run-log, session-reaper, and gateway cron files points to Vincent Koc in the available boundary history, and the PR author explicitly requested review from @vincentkoc. (role: likely follow-up owner; confidence: medium; commits: 3546a5400317; files: src/cron/service/ops.ts, src/cron/service/timer.ts, src/gateway/server-cron.ts)
  • ngutman: The PR author explicitly called out @ngutman for this cron ghost-session fix; local file history did not provide stronger ownership evidence in the shallow checkout. (role: review-route candidate; confidence: low; files: src/cron/service/ops.ts, src/gateway/server-cron.ts)

Remaining risk / open question:

  • A repair should keep cron.remove responsive without making deletion wait on long-running agents.
  • The cleanup must stay scoped to the removed job's per-agent cron session keys and per-job run-log file.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 9c37cfcbdbf7.


Labels

agents (Agent runtime and tooling) · gateway (Gateway runtime) · size: M


Development

Successfully merging this pull request may close these issues.

[Bug]: cron.remove does not clean up session records in sessions.json

1 participant