fix(subagents): honor archiveAfterMinutes for session-mode reaping by arniesaha · Pull Request #78263 · openclaw/openclaw

arniesaha · 2026-05-06T04:19:16Z

Summary

Reap completed session-mode subagent registry rows on agents.defaults.subagents.archiveAfterMinutes, the same retention knob run-mode already uses for archiveAtMs, instead of on a separate hardcoded 5-minute TTL.

Background

Session-mode and run-mode subagent registry rows had two different retention horizons:

Run-mode (spawnMode: "run"): the row carries archiveAtMs = now + archiveAfterMs, where archiveAfterMs is derived from agents.defaults.subagents.archiveAfterMinutes (default 60 minutes). At sweep time the row is removed and the child session is deleted via sessions.delete.
Session-mode (spawnMode: "session"): the row carries no archiveAtMs (the child session is retained independently), and was instead reaped on a hardcoded SESSION_RUN_TTL_MS = 5 minutes — completely ignoring the configured archiveAfterMinutes window.
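The two horizons described above can be sketched as follows. This is a simplified illustration, not the registry's actual code: the `RegistryRow` shape and the `registerRow` and `shouldSweepBeforeFix` helpers are hypothetical names, and the exact comparison used for run-mode rows is an assumption.

```typescript
// Hypothetical, simplified registry row; the real type carries more fields.
type SpawnMode = "run" | "session";

interface RegistryRow {
  spawnMode: SpawnMode;
  archiveAtMs?: number; // set only for run-mode rows
  cleanupCompletedAt?: number; // set once cleanup has finished
}

// The hardcoded constant described above: five minutes.
const SESSION_RUN_TTL_MS = 5 * 60_000;

// Registration: only run-mode rows receive an absolute archive deadline.
function registerRow(
  spawnMode: SpawnMode,
  now: number,
  archiveAfterMs: number | undefined,
): RegistryRow {
  if (spawnMode === "run" && typeof archiveAfterMs === "number") {
    return { spawnMode, archiveAtMs: now + archiveAfterMs };
  }
  return { spawnMode };
}

// Pre-fix sweep decision: two unrelated horizons.
function shouldSweepBeforeFix(row: RegistryRow, now: number): boolean {
  if (row.archiveAtMs) {
    // Run-mode: config-driven deadline (comparison operator is an assumption).
    return now >= row.archiveAtMs;
  }
  // Session-mode: fixed TTL, regardless of configuration.
  return (
    typeof row.cleanupCompletedAt === "number" &&
    now - row.cleanupCompletedAt > SESSION_RUN_TTL_MS
  );
}
```

With a 60-minute window, a run-mode row registered at time 0 survives a sweep at the 6-minute mark, while a session-mode row whose cleanup completed at time 0 does not.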

This asymmetry was a bug:

The 5-minute window was too short in practice. On slower messaging surfaces (Telegram, Discord, etc.) and when tools.agentToAgent is disabled (so the cross-agent history/send/status fallback isn't available), an operator asking "what happened to the sub-agents I just delegated to?" got an empty subagents list while the child sessions were still alive on disk — completed sub-agents appeared to silently disappear.
There was no way for users to tune session-mode retention. The hardcoded constant ignored config entirely.
The two spawn modes drifted apart for no architectural reason. The operator's mental model is "completed sub-agents stick around for X minutes," not "X for one mode and 5 for the other."

Fix

Drop SESSION_RUN_TTL_MS. In the sweep loop, resolve sessionRetentionMs = resolveArchiveAfterMs(cfg) once per tick and use it as the absolute TTL for session-mode rows after cleanupCompletedAt. Run-mode behavior is unchanged.

const sessionRetentionMs = resolveArchiveAfterMs(subagentRegistryDeps.getRuntimeConfig());
…
if (!entry.archiveAtMs) {
  if (
    typeof sessionRetentionMs === "number" &&
    typeof entry.cleanupCompletedAt === "number" &&
    now - entry.cleanupCompletedAt > sessionRetentionMs
  ) { … sweep … }
}
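For readers who want to exercise the condition in isolation, here is a self-contained sketch of the same check. The `SweepEntry` type and the `shouldSweepSessionRow` function are hypothetical wrappers introduced for illustration; only the three-part condition comes from the excerpt above.

```typescript
// Hypothetical minimal entry shape; only the fields the check reads.
interface SweepEntry {
  archiveAtMs?: number;
  cleanupCompletedAt?: number;
}

// Returns true when a row without archiveAtMs should be swept.
// sessionRetentionMs is undefined when reaping is disabled.
function shouldSweepSessionRow(
  entry: SweepEntry,
  now: number,
  sessionRetentionMs: number | undefined,
): boolean {
  if (entry.archiveAtMs) {
    // Rows with archiveAtMs are handled by the run-mode path, not here.
    return false;
  }
  return (
    typeof sessionRetentionMs === "number" &&
    typeof entry.cleanupCompletedAt === "number" &&
    now - entry.cleanupCompletedAt > sessionRetentionMs
  );
}

const MINUTE_MS = 60_000;
const entry: SweepEntry = { cleanupCompletedAt: 0 };

// 30 minutes after cleanup with a 60-minute window: kept.
console.log(shouldSweepSessionRow(entry, 30 * MINUTE_MS, 60 * MINUTE_MS)); // false
// 70 minutes after cleanup with a 60-minute window: swept.
console.log(shouldSweepSessionRow(entry, 70 * MINUTE_MS, 60 * MINUTE_MS)); // true
// Retention disabled (undefined): never swept.
console.log(shouldSweepSessionRow(entry, 70 * MINUTE_MS, undefined)); // false
```

Because the retention value is passed in, it can be resolved once per sweep tick and reused for every row.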

Defaults stay the same as run-mode: 60 minutes. archiveAfterMinutes: 0 now disables session-mode reaping (registry row kept indefinitely) just like it already disables run-mode sessions.delete.
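A rough sketch of the resolver semantics described here (default 60 minutes, 0 disables) might look like the following. The real resolveArchiveAfterMs lives in the registry helpers and is not reproduced in this PR body, so the `ConfigSketch` type, the `resolveArchiveAfterMsSketch` name, the `undefined` return for the disabled case (inferred from the `typeof sessionRetentionMs === "number"` guard), and the handling of negative or non-finite values are all assumptions.

```typescript
// Hypothetical config shape covering only the key discussed here.
interface ConfigSketch {
  agents?: {
    defaults?: {
      subagents?: { archiveAfterMinutes?: number };
    };
  };
}

const DEFAULT_ARCHIVE_AFTER_MINUTES = 60;

// Returns the retention window in milliseconds, or undefined when disabled.
function resolveArchiveAfterMsSketch(cfg: ConfigSketch): number | undefined {
  const minutes =
    cfg.agents?.defaults?.subagents?.archiveAfterMinutes ??
    DEFAULT_ARCHIVE_AFTER_MINUTES;
  // Assumption: zero, negative, and non-finite values all mean "disabled".
  if (!Number.isFinite(minutes) || minutes <= 0) {
    return undefined;
  }
  return minutes * 60_000;
}

console.log(resolveArchiveAfterMsSketch({})); // 3600000
console.log(
  resolveArchiveAfterMsSketch({
    agents: { defaults: { subagents: { archiveAfterMinutes: 0 } } },
  }),
); // undefined
```

Returning `undefined` rather than `0` for the disabled case keeps the sweep guard simple: a single `typeof ... === "number"` check separates "reap after this long" from "never reap".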

Behavior change for users

| Setting | Before this PR | After this PR |
| --- | --- | --- |
| `archiveAfterMinutes` unset (default) | run-mode 60 min, session-mode 5 min | run-mode 60 min, session-mode 60 min |
| `archiveAfterMinutes: 30` | run-mode 30 min, session-mode 5 min | run-mode 30 min, session-mode 30 min |
| `archiveAfterMinutes: 0` | run-mode never swept, session-mode 5 min | run-mode never swept, session-mode never swept |

Default-configured installs see the operator-visible retention extend from 5 → 60 minutes for session-mode runs. Users who explicitly set archiveAfterMinutes now get that value applied uniformly to both spawn modes.
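The before/after matrix can also be expressed as a small table-driven sketch. The function names and the `"never"` sentinel are hypothetical and chosen only to mirror the wording of the table; the values come from the table itself.

```typescript
type SpawnMode = "run" | "session";
type Retention = number | "never"; // minutes, or "never" when sweeping is disabled

const DEFAULT_MINUTES = 60;
const LEGACY_SESSION_TTL_MINUTES = 5;

// Shared interpretation of the archiveAfterMinutes setting.
function configuredRetention(setting: number | undefined): Retention {
  const minutes = setting ?? DEFAULT_MINUTES;
  return minutes === 0 ? "never" : minutes;
}

// Before the PR: session-mode ignored the setting entirely.
function retentionBefore(setting: number | undefined, mode: SpawnMode): Retention {
  return mode === "session" ? LEGACY_SESSION_TTL_MINUTES : configuredRetention(setting);
}

// After the PR: the spawn mode no longer matters, so it is not a parameter.
function retentionAfter(setting: number | undefined): Retention {
  return configuredRetention(setting);
}

for (const setting of [undefined, 30, 0]) {
  console.log({
    setting,
    beforeRun: retentionBefore(setting, "run"),
    beforeSession: retentionBefore(setting, "session"),
    after: retentionAfter(setting),
  });
}
```

The absence of a mode parameter on `retentionAfter` reflects that both spawn modes now share one horizon.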

Real behavior proof

Behavior or issue addressed: Completed session-mode subagent runs disappeared from subagents list about five minutes after the child session finished, even though the child session itself was still alive on disk and the operator's configured archiveAfterMinutes was 60. Operators on slower messaging channels saw their delegated sub-agents as if they had silently exited.
Real environment tested: Local OpenClaw build from this branch on Linux (Node 22), parent agent running in a terminal session, two session-mode sub-agents spawned from the parent and allowed to complete normally. Reproduced first against main (5-minute hardcoded TTL) and then against this branch (config-driven, default 60 minutes) using the same setup. Also verified archiveAfterMinutes: 0 keeps session-mode rows indefinitely after this change.
Exact steps or command run after this patch: Started the parent agent, used the spawn flow to start two session-mode sub-agents, waited for them to reach completion (endedAt set, cleanupCompletedAt set), then waited about 30 minutes wall-clock and ran openclaw subagents list from the parent operator surface. Repeated at the ~70-minute mark to confirm sweep still fires under the default. Repeated with archiveAfterMinutes: 0 configured to confirm rows are kept indefinitely.

Evidence after fix: Terminal output captured below.

Before fix, against main, ~6 minutes after both sub-agents finished (default config):

$ openclaw subagents list
(empty)

After fix, same setup, ~30 minutes after the sub-agents finished:

$ openclaw subagents list
- run-…  session  agent:alt:session:child-…  ok
- run-…  session  agent:alt:session:child-…  ok

After fix, ~70 minutes after the sub-agents finished (past the default 60-minute window):

$ openclaw subagents list
(empty)   # sweeper deleted the rows once cleanupCompletedAt was older than archiveAfterMinutes

After fix with agents.defaults.subagents.archiveAfterMinutes: 0, several hours after the sub-agents finished:

$ openclaw subagents list
- run-…  session  agent:alt:session:child-…  ok
- run-…  session  agent:alt:session:child-…  ok

Observed result after fix: Completed session-mode runs stayed visible in subagents list for the full configured archiveAfterMinutes window after cleanupCompletedAt, then were swept exactly as before. Run-mode entries (still driven by archiveAtMs) were unchanged. archiveAfterMinutes: 0 disabled session-mode reaping consistent with the existing run-mode semantic. No leftover state in subagent-runs.json after sweep.
What was not tested: Cross-host behavior with multiple operators sharing a gateway; very large registries (hundreds of completed runs simultaneously held under the longer default — sweep cost per entry is unchanged but holding more entries in memory was not separately benchmarked).

Testing

pnpm test src/agents/subagent-registry.test.ts
pnpm test src/agents/subagent-registry.persistence.test.ts

Notes for reviewers

Supersedes "fix: retain completed session-mode subagents longer" (#78238), which proposed bumping the hardcoded constant from 5 to 60 minutes. This is the smaller, more principled version: the existing config knob now actually controls retention for both spawn modes.
The unit test "passes stored agentDir through swept context-engine cleanup paths" scopes a positive archiveAfterMinutes value for that case, since the suite-wide mock config sets archiveAfterMinutes: 0 (which under the new config-driven sweep means "never reap"). The run-mode part of the same test continues to use a directly pinned archiveAtMs and is unaffected.
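The fixture adjustment can be illustrated with a framework-free sketch. The actual test presumably relies on the repository's own test runner and config mock, so `mockConfig`, `withArchiveAfterMinutes`, and `wouldSweep` are hypothetical stand-ins that show only the scoping idea: override the suite-wide 0 for one case, then restore it.

```typescript
// Suite-wide mock config: 0 now means "never reap" for session-mode rows.
let mockConfig = { archiveAfterMinutes: 0 };

// Temporarily override the setting for one test case, restoring it afterwards.
function withArchiveAfterMinutes<T>(minutes: number, run: () => T): T {
  const previous = mockConfig;
  mockConfig = { ...previous, archiveAfterMinutes: minutes };
  try {
    return run();
  } finally {
    mockConfig = previous;
  }
}

// Simplified stand-in for the config-driven sweep decision.
function wouldSweep(cleanupCompletedAt: number, now: number): boolean {
  const minutes = mockConfig.archiveAfterMinutes;
  if (minutes <= 0) {
    return false; // reaping disabled
  }
  return now - cleanupCompletedAt > minutes * 60_000;
}

const twoMinutesLater = 2 * 60_000;

// Under the suite-wide default the deletion path never fires.
console.log(wouldSweep(0, twoMinutesLater)); // false
// Scoped to a positive value, the same row is swept.
console.log(withArchiveAfterMinutes(1, () => wouldSweep(0, twoMinutesLater))); // true
```

Restoring the previous config in `finally` keeps the override from leaking into other cases that depend on the suite-wide value.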

clawsweeper · 2026-05-06T04:22:12Z

Codex review: needs maintainer review before merge.

Summary
This PR replaces completed no-archiveAtMs subagent registry reaping's hardcoded five-minute TTL with the existing archiveAfterMinutes resolver, adjusts one sweep fixture, and adds a changelog entry.

Reproducibility: yes. Current main's sweep path deletes completed no-archiveAtMs rows after the five-minute SESSION_RUN_TTL_MS, and the PR body supplies terminal before/after output for the subagents list symptom.

Real behavior proof
Sufficient (terminal): The PR body includes terminal output from a real Linux OpenClaw run showing current-main failure, after-fix retention, eventual sweep, and disabled reaping with archiveAfterMinutes: 0.

Next step before merge
No repair lane is needed because the latest patch has no blocking findings and exact-head proof/CI are sufficient; the remaining action is maintainer merge judgment.

Security
Cleared: The diff is limited to retention logic, one test fixture, and changelog text, with no dependency, workflow, secret, permission, or package-resolution changes.

Review details

Best possible solution:

Land the config-driven retention fix after maintainer review, keeping archiveAfterMinutes as the single retention control for completed subagent registry rows.

Do we have a high-confidence way to reproduce the issue?

Yes. Current main's sweep path deletes completed no-archiveAtMs rows after the five-minute SESSION_RUN_TTL_MS, and the PR body supplies terminal before/after output for the subagents list symptom.

Is this the best way to solve the issue?

Yes. Reusing resolveArchiveAfterMs in the sweep loop is the narrowest maintainable fix because run-mode already uses that helper and the public docs already expose archiveAfterMinutes as the subagent retention knob.

What I checked:

Current main hardcoded TTL: Current main defines SESSION_RUN_TTL_MS as five minutes and uses it to delete completed entries without archiveAtMs after cleanupCompletedAt. (src/agents/subagent-registry.ts:199, 69d446d1784c)
Configured retention contract: The subagents docs describe agents.defaults.subagents.archiveAfterMinutes as the auto-archive knob with default 60 minutes. Public docs: docs/tools/subagents.md. (docs/tools/subagents.md:281, 69d446d1784c)
Shared helper and run-mode behavior: resolveArchiveAfterMs implements default 60-minute retention and treats 0 as disabled; run-mode registration already derives archiveAtMs from that helper while session-mode rows skip archiveAtMs. (src/agents/subagent-registry-helpers.ts:307, 69d446d1784c)
PR implementation: The PR imports resolveArchiveAfterMs, computes sessionRetentionMs once per sweep, requires it before deleting no-archiveAtMs rows, scopes the affected sweep fixture to archiveAfterMinutes: 1, and adds a changelog entry. (src/agents/subagent-registry.ts:750, 0bcf95e9d906)
Real behavior proof: The PR body supplies copied terminal output from a Linux OpenClaw run showing current-main failure at about six minutes, after-fix retention at about 30 minutes, sweep after about 70 minutes, and disabled reaping with archiveAfterMinutes: 0. (0bcf95e9d906)
Exact-head checks: The latest head check runs are completed with the relevant check, build, lint, docs, security, and test lanes succeeding; skipped/neutral runs are non-applicable lanes such as CodeQL neutral and platform skips. (0bcf95e9d906)

Likely related people:

steipete: GitHub history links this handle to the original subagent archive setting/docs and repeated recent work in the registry helper, run-manager, and subagents docs paths. (role: original feature author and recent subagent maintainer; confidence: medium; commits: 75c66acfd828, 770c462c4730, cfbef8035dd1; files: docs/tools/subagents.md, src/agents/subagent-registry-helpers.ts, src/agents/subagent-registry-run-manager.ts)
vincentkoc: Current checkout blame and recent GitHub history show this handle on the current subagent registry snapshot and adjacent subagent behavior/docs work. (role: recent adjacent maintainer; confidence: medium; commits: 78b252682b0b, e80de466e5e1, 1427c3a78d80; files: src/agents/subagent-registry.ts, src/agents/subagent-registry-helpers.ts, src/agents/subagent-registry-run-manager.ts)
jalehman: This PR is assigned to this maintainer, and history links the handle to adjacent context-engine cleanup work overlapping the test fixture touched here. (role: assigned reviewer and adjacent owner; confidence: medium; commits: fee91fefceb4, bc2373fecc49, 0bcf95e9d906; files: src/agents/subagent-registry.ts, src/agents/subagent-registry.test.ts, CHANGELOG.md)

Remaining risk / open question:

Longer or disabled retention can leave more completed registry rows resident for high-volume subagent use; the PR body notes that large registries were not separately benchmarked.
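As a back-of-the-envelope aid for this open question, the steady-state number of resident completed rows is roughly the completion rate multiplied by the retention window. The sketch below uses a hypothetical `estimateResidentRows` helper and made-up workload numbers; it is not a benchmark and says nothing about the per-row memory cost.

```typescript
// Steady-state estimate: rows resident ≈ completion rate × retention window.
// retentionMinutes is undefined when reaping is disabled (rows accumulate without bound).
function estimateResidentRows(
  completionsPerHour: number,
  retentionMinutes: number | undefined,
): number {
  if (retentionMinutes === undefined) {
    return Number.POSITIVE_INFINITY;
  }
  return (completionsPerHour * retentionMinutes) / 60;
}

// Hypothetical workload: 10 completed sub-agent runs per hour.
console.log(estimateResidentRows(10, 5)); // old 5-minute TTL: under one row on average
console.log(estimateResidentRows(10, 60)); // 10
console.log(estimateResidentRows(10, undefined)); // Infinity
```

Under this assumption, moving from 5 to 60 minutes raises the resident count by a factor of 12 at any fixed completion rate.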

Codex review notes: model gpt-5.5, reasoning high; reviewed against 69d446d1784c.

Session-mode subagent registry rows were reaped on a hardcoded 5-minute TTL instead of the configured `agents.defaults.subagents.archiveAfterMinutes` window (default 60 minutes) that run-mode already honors for `archiveAtMs`. That asymmetry meant `subagents list` and other registry-backed status surfaces lost completed runs five minutes after cleanup, even when the operator's configured retention was longer, and gave operators no way to tune session-mode retention at all. On slower messaging surfaces and when agent-to-agent transcript access is disabled, completed sub-agents appeared to silently disappear. Drop `SESSION_RUN_TTL_MS` and have the sweep loop call `resolveArchiveAfterMs` so both spawn modes reap on the same configured horizon. Setting `archiveAfterMinutes: 0` now disables session-mode reaping just like it disables run-mode `sessions.delete`. Tests scope a positive `archiveAfterMinutes` for the swept-context-engine fixture so the deletion path still fires under the new config-driven sweep.

jalehman · 2026-05-07T02:24:14Z

Merged via squash.

Prepared head SHA: b4154670087dae4a00b6ea3030e3a4f4bc6f0c73
Merge commit: 1c331a814a1db0eb0670378cb0edda398d664cd0

Thanks @arniesaha!

@jalehman

…penclaw#78263) Merged via squash. Prepared head SHA: b415467 Co-authored-by: arniesaha <3646287+arniesaha@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman

@jalehman

…78263) Merged via squash. Prepared head SHA: b415467 Co-authored-by: arniesaha <3646287+arniesaha@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman (cherry picked from commit 1c331a8)


openclaw-barnacle Bot added the agents Agent runtime and tooling label May 6, 2026

arniesaha mentioned this pull request May 6, 2026

fix: retain completed session-mode subagents longer #78238

Closed

openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. size: XS labels May 6, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026

arniesaha force-pushed the fix/subagent-session-honor-archive-after-minutes branch from 85ce768 to f78d7ce Compare May 6, 2026 04:30

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 6, 2026

jalehman self-assigned this May 6, 2026

arniesaha force-pushed the fix/subagent-session-honor-archive-after-minutes branch from 52dbafc to ccc2ce3 Compare May 7, 2026 00:59

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026

openclaw-barnacle Bot added channel: telegram Channel integration: telegram size: S and removed size: XS labels May 7, 2026

jalehman force-pushed the fix/subagent-session-honor-archive-after-minutes branch from e57c04b to 0bcf95e Compare May 7, 2026 01:17

openclaw-barnacle Bot added size: XS and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. channel: telegram Channel integration: telegram size: S labels May 7, 2026

clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026

arniesaha and others added 3 commits May 6, 2026 19:19

changelog: credit subagent retention fix

87ddb8a

changelog: place subagent retention fix

b415467

jalehman force-pushed the fix/subagent-session-honor-archive-after-minutes branch from 0bcf95e to b415467 Compare May 7, 2026 02:23

openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 7, 2026

jalehman merged commit 1c331a8 into openclaw:main May 7, 2026
89 of 90 checks passed

github-actions Bot mentioned this pull request May 7, 2026

🦞 OpenClaw Ecosystem Daily 2026-05-07 ivanweng2077/big_model_radar#6

Open
