fix: clean up session records and run logs on cron.remove (#46369) #46506

anup00900 wants to merge 6 commits into openclaw:main
Conversation
When a cron job is deleted, only the job config was removed from `jobs.json`. The session records in `sessions.json` and the run log file remained, causing ghost sessions to appear in the Control UI. After removing the job config, the fix asynchronously:

1. Deletes matching session entries (`cron:{jobId}:*`) from `sessions.json`
2. Deletes the run log file at `{cronStoreDir}/runs/{jobId}.jsonl`

Cleanup failures are logged but do not block the removal response.
@ngutman @vincentkoc — this fixes the ghost sessions bug reported in #46369. Small, self-contained change: session records and run log files are now cleaned up when a cron job is deleted. All 523 cron tests pass. Happy to squash if preferred.
**Greptile Summary**

This PR fixes a ghost-session bug (#46369) where deleting a cron job via `cron.remove` left its session records and run log file behind.

Key points:
Confidence Score: 4/5
Last reviewed commit: 33d11b1
```ts
async function cleanupRemovedJobArtifacts(state: CronServiceState, jobId: string): Promise<void> {
  // 1. Remove matching session entries from sessions.json.
  const sessionStorePath = state.deps.sessionStorePath ?? state.deps.resolveSessionStorePath?.();
  if (sessionStorePath) {
    const cronKeyFragment = `cron:${jobId}:`;
    await updateSessionStore(sessionStorePath, (store) => {
      for (const key of Object.keys(store)) {
        if (key.includes(cronKeyFragment)) {
          delete store[key];
        }
      }
    });
  }

  // 2. Delete the run log file ({cronStoreDir}/runs/{jobId}.jsonl).
  try {
    const logPath = resolveCronRunLogPath({
      storePath: state.deps.storePath,
      jobId,
    });
    await fs.unlink(logPath);
  } catch (err: unknown) {
    // ENOENT is fine — the log may not exist yet.
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") {
      throw err;
    }
  }
}
```
**Step 2 silently skipped when step 1 throws**

The two cleanup operations are independent, but they run sequentially without isolation. If `updateSessionStore` throws (e.g. a lock timeout or I/O error), the function rejects immediately and `fs.unlink` is never attempted. The outer `.catch()` handler logs one message, but there is no indication that the run-log deletion was skipped entirely.

Since the caller already swallows all errors with `.catch()`, each step should guard itself independently so both are always attempted:
```ts
async function cleanupRemovedJobArtifacts(state: CronServiceState, jobId: string): Promise<void> {
  // 1. Remove matching session entries from sessions.json.
  const sessionStorePath = state.deps.sessionStorePath ?? state.deps.resolveSessionStorePath?.();
  if (sessionStorePath) {
    const cronKeyFragment = `cron:${jobId}:`;
    await updateSessionStore(sessionStorePath, (store) => {
      for (const key of Object.keys(store)) {
        if (key.includes(cronKeyFragment)) {
          delete store[key];
        }
      }
    });
  }
  // 2. Delete the run log file ({cronStoreDir}/runs/{jobId}.jsonl).
  const logPath = resolveCronRunLogPath({
    storePath: state.deps.storePath,
    jobId,
  });
  await fs.unlink(logPath).catch((err: unknown) => {
    // ENOENT is fine — the log may not exist yet.
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") {
      throw err;
    }
  });
}
```
Or alternatively, run both operations in parallel via `Promise.allSettled` and surface both individual errors in the catch handler.
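A minimal sketch of that alternative, assuming the two steps are factored into standalone callbacks — `cleanupSessionEntries` and `cleanupRunLog` here are hypothetical stand-ins, not the PR's actual helpers:

```typescript
// Sketch: run both cleanup steps in parallel and surface each failure
// individually. Because Promise.allSettled never rejects, both steps
// always run to completion regardless of the other's outcome.
async function cleanupRemovedJobArtifactsSettled(
  cleanupSessionEntries: () => Promise<void>,
  cleanupRunLog: () => Promise<void>,
): Promise<string[]> {
  const results = await Promise.allSettled([cleanupSessionEntries(), cleanupRunLog()]);
  const labels = ["session cleanup", "run-log cleanup"];
  const errors: string[] = [];
  results.forEach((result, i) => {
    if (result.status === "rejected") {
      // Each failure gets its own labeled message for the caller's logger.
      errors.push(`${labels[i]} failed: ${String(result.reason)}`);
    }
  });
  return errors;
}
```

The caller's existing `.catch()` logging could then be replaced by logging each entry of the returned array, so a session-store failure no longer hides a skipped run-log deletion.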
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 33d11b18e3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```ts
 */
async function cleanupRemovedJobArtifacts(state: CronServiceState, jobId: string): Promise<void> {
  // 1. Remove matching session entries from sessions.json.
  const sessionStorePath = state.deps.sessionStorePath ?? state.deps.resolveSessionStorePath?.();
```
**Resolve session store path for removed job's agent**

`cleanupRemovedJobArtifacts` looks up `sessionStorePath` via `resolveSessionStorePath?.()` with no `agentId`, which falls back to the default agent store in the gateway wiring; that means removing a cron job owned by a non-default agent in per-agent session-store setups cleans the wrong `sessions.json` and leaves the deleted job's cron session entries behind. Because the job is filtered out before cleanup, this path has no way to recover the removed job's `agentId`, so ghost sessions still persist for non-default-agent jobs.
… path

- Use `Promise.allSettled` so session cleanup and run log deletion are independent; a failure in one does not skip the other
- Capture the removed job's `agentId` before filtering and pass it to `resolveSessionStorePath` so non-default-agent jobs clean the correct `sessions.json`
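The capture-before-filter idea from this commit can be sketched as follows; the `CronJob` shape and `removeJob` helper are illustrative, not the PR's exact code:

```typescript
// Sketch: remember the removed job's agentId before it is filtered out of
// the job list, so cleanup can later resolve the correct per-agent store.
interface CronJob {
  id: string;
  agentId?: string;
}

function removeJob(jobs: CronJob[], id: string): { jobs: CronJob[]; removed?: CronJob } {
  // Capture the job before filtering; after the filter it is unrecoverable.
  const removed = jobs.find((job) => job.id === id);
  return { jobs: jobs.filter((job) => job.id !== id), removed };
}
```

The captured `removed?.agentId` would then be passed through to `resolveSessionStorePath(agentId)` during cleanup instead of calling it with no argument.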
…sion-cleanup-46369
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e7f5411a15
```ts
const sessionStorePath =
  state.deps.sessionStorePath ?? state.deps.resolveSessionStorePath?.(agentId);
```
**Use removed job agent store for session cleanup**

`cleanupSessionEntries` always prefers `state.deps.sessionStorePath`, but the cron gateway wiring provides both `sessionStorePath` and `resolveSessionStorePath`, with `sessionStorePath` pinned to the default agent store (`src/gateway/server-cron.ts:224-237`). In multi-agent deployments, deleting a non-default-agent cron job therefore updates the wrong `sessions.json`, leaving that job's cron sessions behind in its real per-agent store.
```ts
const cronKeyFragment = `cron:${jobId}:`;
await updateSessionStore(sessionStorePath, (store) => {
  for (const key of Object.keys(store)) {
    if (key.includes(cronKeyFragment)) {
      delete store[key];
```
**Remove canonical cron session key when deleting a job**

The cleanup matcher `cron:${jobId}:` skips the canonical cron key `agent:<agent>:cron:<jobId>` and only catches keys with another segment after the job id (for example `:run:` aliases). Isolated cron runs persist both keys (`src/cron/isolated-agent/run.ts:356-363`), and session listing already hides run aliases while keeping the canonical one (`src/gateway/session-utils.ts:870`), so a deleted job can still leave a visible ghost session entry.
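A sketch of a matcher that covers both key shapes the review describes — the key formats are taken from the review comment, and the helper name is hypothetical:

```typescript
// Sketch: match both the canonical cron session key and its run aliases.
//   canonical: agent:<agent>:cron:<jobId>
//   run alias: agent:<agent>:cron:<jobId>:run:<runId>
function isSessionKeyForJob(key: string, jobId: string): boolean {
  const canonical = `cron:${jobId}`;
  // endsWith catches the canonical key; the trailing-colon check catches
  // run aliases without also matching a longer job id (e.g. job1 vs job10).
  return key.endsWith(canonical) || key.includes(`${canonical}:`);
}
```

A matcher like this would remove the visible canonical entry as well as the hidden run aliases, instead of only the latter.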
- Prefer `resolveSessionStorePath(agentId)` over static `sessionStorePath` so multi-agent deployments clean the correct `sessions.json`
- Match both the canonical key (`cron:{jobId}`) and run keys (`cron:{jobId}:run:*`) to prevent ghost sessions from either key type
- Fix pre-existing oxfmt formatting in `message-handler.ts`
Introduced in bb06dc7 on main but never called, causing oxlint failure.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fdacae676c
```ts
cleanupRemovedJobArtifacts(state, id, removedJob?.agentId).catch((err) => {
  state.deps.log.warn({ err: String(err) }, "cron: cleanup after remove failed");
```
**Re-clean removed cron sessions after active run finishes**

`cron.remove` launches `cleanupRemovedJobArtifacts` immediately and does not coordinate with in-flight isolated runs, so deleting a job during execution is still prone to ghost sessions: `runCronIsolatedAgentTurn` persists the canonical cron session key again at run completion (`src/cron/isolated-agent/run.ts` writes both `agentSessionKey` and `runSessionKey` in `persistSessionEntry`), which can happen after this one-time cleanup and recreate the deleted job's visible session entry.
**Re: race with in-flight runs**

Good catch — if a cron job is actively running when it is deleted, the run can persist its session entry again after the one-time cleanup. This is a known edge case that the existing session reaper already handles — it sweeps expired cron sessions after the retention period (24h by default, configurable). Coordinating with in-flight runs would require adding cancellation/abort signaling to the isolated agent runner, which is significantly more complex and better suited for a separate PR.
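As a rough illustration of the reaper behavior this reply relies on — the `updatedAt` field, the `:cron:` key test, and the 24h default are assumptions for this sketch, not the project's actual schema:

```typescript
// Sketch: sweep cron session entries older than the retention period.
// Entries recreated by an in-flight run after job deletion would be
// removed on a later sweep once they age past retention.
function sweepExpiredCronSessions(
  store: Record<string, { updatedAt: number }>,
  retentionMs: number = 24 * 60 * 60 * 1000, // assumed 24h default
  now: number = Date.now(),
): void {
  for (const key of Object.keys(store)) {
    if (key.includes(":cron:") && now - store[key].updatedAt > retentionMs) {
      delete store[key];
    }
  }
}
```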
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bad2d0e544
```ts
const logPath = resolveCronRunLogPath({
  storePath: state.deps.storePath,
  jobId,
});
await fs.unlink(logPath).catch((err: unknown) => {
```
**Wait for pending cron run-log writes before unlinking**

`cleanupRunLog` unlinks `{runs}/{jobId}.jsonl` immediately, but cron run logs are appended asynchronously via a fire-and-forget `appendCronRunLog` call in `src/gateway/server-cron.ts` (lines 478-505). If `cron.remove` is called while a run is finishing (or while a previous append is still queued), the later append recreates the file after this unlink, so deleted jobs can still leave stale run-log artifacts. This makes cleanup nondeterministic for in-flight/recent runs and should coordinate with pending writes or run completion before deleting.
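One possible way to make run-log cleanup deterministic, sketched under the assumption that every append and the final unlink could be funneled through a single per-job queue — `enqueueRunLogOp` is hypothetical and not part of the PR:

```typescript
// Sketch: serialize run-log appends and the cleanup unlink per job, so a
// queued append can never recreate the file after cleanup deletes it.
const pendingByJob = new Map<string, Promise<unknown>>();

function enqueueRunLogOp<T>(jobId: string, op: () => Promise<T>): Promise<T> {
  const prev = pendingByJob.get(jobId) ?? Promise.resolve();
  // Chain onto the previous operation whether it succeeded or failed,
  // so one failed append does not stall the queue.
  const next = prev.then(op, op);
  pendingByJob.set(jobId, next);
  return next;
}
```

With something like this in place, `cron.remove` would enqueue the unlink behind any in-flight appends for the same `jobId` instead of racing them.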
**Codex review: needs changes before merge**

Review details:

- Best possible solution: revise the cron removal path so artifact cleanup is final after active runs and pending log writes settle, while preserving removed-event job snapshots and adding regression coverage plus a changelog entry.
- Do we have a high-confidence way to reproduce the issue? Yes.
- Is this the best way to solve the issue? No, not as-is. The cleanup direction is narrow and maintainable, but it needs to preserve the removed-event contract and coordinate with active cron completion plus queued run-log writes.
- Overall correctness: the patch is incorrect as written.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 9c37cfcbdbf7.
Summary
Fixes #46369 — when deleting a cron job via `cron.remove`, only the job config was removed from `jobs.json`. Session records in `sessions.json` and the run log file remained, causing "ghost sessions" to appear in the Control UI dropdown.

The fix: after removing the job config, asynchronously clean up:

- Session entries matching `cron:{jobId}:*` from `sessions.json` via `updateSessionStore`
- The run log file at `{cronStoreDir}/runs/{jobId}.jsonl` via `fs.unlink`

Cleanup runs outside the critical path — failures are logged but don't block the removal response.
Changes
- `src/cron/service/ops.ts`: Added `cleanupRemovedJobArtifacts()`, called after successful job removal
- `src/cron/service.remove-cleanup.test.ts`: 2 new tests verifying session and run log cleanup

Test plan
- `.jsonl` file is deleted after job removal