fix: unify BackgroundManager terminal-state lifecycle to prevent task record loss by guazi04 · Pull Request #2293 · code-yeongyu/oh-my-openagent

guazi04 · 2026-03-04T10:27:10Z

Note: This PR now also includes the isUnstableAgent fix from #2297 (Bug 1 only).

Why Bug 1 is included here: isUnstableAgent was checking the category config default model (categoryConfigModel) instead of the resolved actual model (actualModel). When a user overrides e.g. visual-engineering to use claude-sonnet-4-6, the old code still detected gemini from the category default and incorrectly flagged it as unstable.

Why Bug 2 from #2297 is NOT needed: Bug 2 added a !currentTask null guard in the polling loop (unstable-agent-task.ts). This is unnecessary because this PR's scheduleDelayedCleanup() ensures all terminal-state tasks remain queryable for 10 minutes — so getTask() will never return undefined for a task that just completed/errored. The root cause (immediate task deletion) is fixed here.

See acamq's analysis for background.

Summary

Fixes a cluster of 5 bugs in BackgroundManager that cause background tasks to silently disappear during runtime, making results irretrievable. Also fixes isUnstableAgent model resolution (#2297 Bug 1).

Relates to #2292, #2287, #1582, #1769

Bugs Fixed

#	Severity	Bug	Root Cause	Fix
1	P1	Message sorting produces wrong order	`String()` + `localeCompare` on timestamps → lexicographic ordering	New `getTimeTimestamp()` with numeric comparison
2	P0	`session.error` immediately deletes task record	`tasks.delete()` called with no notification or delay	Mark status → notify parent → `scheduleDelayedCleanup()`
3	P0	Prune kills completed/errored tasks	`pruneStaleTasksAndNotifications` checks `startedAt` without checking task state	Skip terminal-state tasks in prune loop
4	P1	Single task completion triggers batch cleanup	`pendingByParent` Map entry missing → `allComplete` defaults to `true`	Fallback scan of all tasks for same parent
5	P1	`session.deleted` immediately deletes tasks	No query window for result retrieval	Delayed cleanup via `scheduleDelayedCleanup()`
6	P1	`isUnstableAgent` checks category default model instead of resolved model	`categoryConfigModel` included in unstable check alongside `actualModel`	Remove `categoryConfigModel` — only evaluate `actualModel` (from #2297)

Bonus fix: question: true tool ordering — was placed BEFORE ...getAgentToolRestrictions() spread, so agents like explore (which sets question: false) were overriding it. Moved to after the spread in both manager.ts and spawner.ts.

Design

All terminal paths (error, prune, deleted, completed) now follow the same pattern:

mark terminal status → notify parent session → scheduleDelayedCleanup()

scheduleDelayedCleanup() is the single unified cleanup method that:

Guards against duplicate scheduling via completionTimers.has(taskId)
Preserves a query window (TASK_CLEANUP_DELAY_MS) for users to retrieve results
Cleans up: notifications, toast, subagentSessions, SessionCategoryRegistry, task record

Changes

File	Lines	What
`manager.ts`	+131/-98	Core lifecycle fixes, `scheduleDelayedCleanup()`, question tool ordering, `connectedProvidersOverride`
`manager.test.ts`	+281/-20	Updated existing tests + 4 new regression tests
`task-poller.ts`	+3	Skip terminal-state tasks in prune loop
`spawner.ts`	+4/-4	Question tool ordering fix
`constants.ts`	+1/-1	`DEFAULT_STALE_TIMEOUT_MS` 180s→600s (align with message staleness)
`fallback-retry-handler.ts`	+8/-3	`connectedProviders` param for test isolation
`full-session-format.ts`	+4/-4	Numeric timestamp sorting
`task-result-format.ts`	+5/-8	Numeric timestamp sorting, remove unused `getTimeString`
`time-format.ts`	+12	New `getTimeTimestamp()` helper
`category-resolver.ts`	+1/-2	Remove `categoryConfigModel` from isUnstableAgent check (from #2297)
`category-resolver.test.ts`	+155	6 new tests for isUnstableAgent override scenarios (from #2297)

Testing

340 tests pass, 0 fail (332 background-agent + 8 category-resolver)
Typecheck clean
4 new regression tests covering:
- Error task remains queryable with scheduled cleanup timer
- Prune skips terminal-state tasks
- allComplete fallback scan detects running siblings when pendingByParent is missing
- session.deleted tasks have query window via delayed cleanup
- Delayed cleanup actually removes task + session registries after delay
6 new isUnstableAgent tests covering:
- User override to stable model → isUnstableAgent=false
- Default gemini/kimi model → isUnstableAgent=true
- is_unstable_agent=true config forces flag
- sisyphusJuniorModel override propagation

github-actions · 2026-03-04T10:27:22Z

All contributors have signed the CLA. Thank you! ✅
_{Posted by the CLA Assistant Lite bot.}

guazi04 · 2026-03-04T10:31:44Z

I have read the CLA Document and I hereby sign the CLA

cubic-dev-ai

2 issues found across 9 files

Confidence score: 2/5

There is a high-confidence compatibility bug in src/tools/background-task/full-session-format.ts: getTimeTimestamp does not handle Opencode’s { created: number } time shape, so timestamps become 0 and message sorting breaks.
src/features/background-agent/spawner.ts still places question: true before ...getAgentToolRestrictions(), so the intended Opencode restriction fix is not actually applied and behavior may remain incorrect.
Given the 9/10 and 7/10 severity issues with direct runtime impact, this is higher-risk to merge until these fixes are corrected.
Pay close attention to src/tools/background-task/full-session-format.ts, src/features/background-agent/spawner.ts - sorting correctness and tool restriction precedence need to be fixed for Opencode compatibility.

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/features/background-agent/spawner.ts">

<violation number="1" location="src/features/background-agent/spawner.ts:144">
P1: Custom agent: **Opencode Compatibility**

The `question: true` property is still placed before the `...getAgentToolRestrictions()` spread, failing to implement the intended fix. It must be moved after the spread so that agent-specific restrictions cannot override it.</violation>
</file>

<file name="src/tools/background-task/full-session-format.ts">

<violation number="1" location="src/tools/background-task/full-session-format.ts:63">
P0: Custom agent: **Opencode Compatibility**

Opencode SDK returns `time` as an object (`{ created: number }`), which `getTimeTimestamp` does not handle. It will return `0` for all messages, breaking the sort completely.</violation>
</file>

Since this is your first cubic review, here's how it works:

cubic automatically reviews your code and comments on bugs and improvements
Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
Ask questions if you need clarification on any suggestion

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-03-04T10:48:26Z

+        question: true,
        ...getAgentToolRestrictions(input.agent),


P1: Custom agent: Opencode Compatibility

The question: true property is still placed before the ...getAgentToolRestrictions() spread, failing to implement the intended fix. It must be moved after the spread so that agent-specific restrictions cannot override it.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/features/background-agent/spawner.ts, line 144: <comment>The `question: true` property is still placed before the `...getAgentToolRestrictions()` spread, failing to implement the intended fix. It must be moved after the spread so that agent-specific restrictions cannot override it.</comment> <file context> @@ -141,7 +141,7 @@ export async function startTask( task: false, call_omo_agent: true, - question: false, + question: true, ...getAgentToolRestrictions(input.agent), }, </file context>

Suggested change

question: true,

...getAgentToolRestrictions(input.agent),

...getAgentToolRestrictions(task.agent),

question: true,

This is a false positive. Look at the code again:

tools: { task: shouldAllowTaskTool(input.parentAgent, input.agent), call_omo_agent: true, ...getAgentToolRestrictions(input.agent), // line 146 — spread first ...getMemoryToolRestrictions(input.agent), // line 147 question: true, // line 148 — AFTER spread ✅ },

question: true is placed after ...getAgentToolRestrictions(), which is exactly the intended fix. Later properties in an object literal override earlier ones from spread, so question: true correctly overrides any question: false from agent restrictions.

This was the bug we fixed — the original code had question: true before the spread. Our fix moved it after.

This is wrong, when you set question: true it means question tool is not restricted for background agents which would cause background agents to stall @cubic-dev-ai confirm this claim

Thanks—I'll verify whether the question tool should remain restricted for background agents to prevent stalls and follow up.

Good catch — let me clarify the context.

In my setup, I built a custom dispatcher agent on top of the plugin that decides which execution agent to use and spawns them as background tasks. In that scenario, when a background execution agent needed user input but had question: false, it would output questions as plain text, the continuation enforcer misread it as incomplete work, causing an infinite loop.

In the default upstream architecture, sisyphus/hephaestus/atlas are primary agents (not background), so this mostly affects sisyphus-junior when spawned via category delegation. The fix is still valid there — if a sisyphus-junior background task needs clarification, the same dead loop happens.

The safety net: read-only agents (explore, librarian, oracle, etc.) already have question: false via getAgentToolRestrictions(), so they are never affected. Only execution agents spawned as background tasks benefit from question: true, and the question tool triggers a parent session notification rather than actually blocking.

None of the primary agents have notion of answering subagents' questions. Also sisyphus-junior and all other agents know that they're running in non-interactive mode, so they MUSTN'T ask questions anyway. Setting question: true would cause wrong behaviour in case any lesser model don't listen to the instructions and ask the question.

cubic-dev-ai

No issues found across 10 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

_{Requires human review: The PR introduces a significant change to the task lifecycle and state management. The move to asynchronous delayed cleanup carries inherent risk of side effects or state collisions.}

…), complements code-yeongyu#2293 lifecycle fix

cubic-dev-ai

2 issues found across 12 files

Confidence score: 2/5

High-confidence, user-impacting behavior risk remains in src/features/background-agent/spawner.ts and src/features/background-agent/manager.ts: question: true is placed before ...getAgentToolRestrictions(...), so it can be overridden.
This directly conflicts with the intended Opencode compatibility behavior in startTask/resumeTask, making a regression likely for custom agent tool restriction handling.
Given the severity (8/10) and confidence (10/10), this is not a minor uncertainty and should be corrected before merge.
Pay close attention to src/features/background-agent/spawner.ts and src/features/background-agent/manager.ts - ordering of question: true vs spread is likely causing the override bug.

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/features/background-agent/spawner.ts">

<violation number="1" location="src/features/background-agent/spawner.ts:144">
P1: Custom agent: **Opencode Compatibility**

The `question: true` property is still placed before the `...getAgentToolRestrictions(...)` spread in both `startTask` and `resumeTask`. This contradicts the PR description's stated fix. To prevent agent restrictions from incorrectly overriding the Opencode SDK tools configuration, you must move `question: true` after the spread in both locations.</violation>
</file>

<file name="src/features/background-agent/manager.ts">

<violation number="1" location="src/features/background-agent/manager.ts:363">
P1: `question: true` must be placed AFTER the spread operator `...getAgentToolRestrictions(...)` to prevent agent-specific tool restrictions from overriding it. The current code has `question: true` before the spread, which means agents like 'explore' that set `question: false` will still override it.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-03-08T02:54:11Z

        task: false,
        call_omo_agent: true,
-        question: false,
+        question: true,


P1: Custom agent: Opencode Compatibility

The question: true property is still placed before the ...getAgentToolRestrictions(...) spread in both startTask and resumeTask. This contradicts the PR description's stated fix. To prevent agent restrictions from incorrectly overriding the Opencode SDK tools configuration, you must move question: true after the spread in both locations.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/features/background-agent/spawner.ts, line 144: <comment>The `question: true` property is still placed before the `...getAgentToolRestrictions(...)` spread in both `startTask` and `resumeTask`. This contradicts the PR description's stated fix. To prevent agent restrictions from incorrectly overriding the Opencode SDK tools configuration, you must move `question: true` after the spread in both locations.</comment> <file context> @@ -141,7 +141,7 @@ export async function startTask( task: false, call_omo_agent: true, - question: false, + question: true, ...getAgentToolRestrictions(input.agent), }, </file context>

This has been reverted — question is back to false in the latest commit (aa1d650). We removed the question tool change from this PR scope entirely to keep it focused on the lifecycle bugs. Will submit the question tool change as a separate PR with proper context.

cubic-dev-ai

2 issues found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/features/background-agent/spawner.ts">

<violation number="1">
P1: Custom agent: **Opencode Compatibility**

The `question` tool configuration was mistakenly toggled to `false` instead of being moved after the spread as described in the PR. This fails to implement the intended SDK tool configuration approach.

Note: The exact same issue exists in `resumeTask` below (around line 228) and must also be fixed using `...getAgentToolRestrictions(task.agent)`.</violation>
</file>

<file name="src/features/background-agent/manager.ts">

<violation number="1">
P1: Custom agent: **Opencode Compatibility**

Move `question: false` after the `...getAgentToolRestrictions()` spread to ensure it properly overrides any agent-specific settings. Note: This needs to be fixed in both the `launch` and `resume` methods.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

acamq · 2026-03-08T22:26:04Z

@cubic-dev-ai review

cubic-dev-ai · 2026-03-08T22:26:15Z

@cubic-dev-ai review

@acamq I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

No issues found across 11 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

_{Requires human review: Complex lifecycle and cleanup logic changes across 11 files carry inherent risks of race conditions or state inconsistencies, failing the '100% sure no regressions' criterion.}

guazi04

Agreed — question tool is fully reverted in this PR (all 4 locations set to question: false, commit aa1d650). The question tool discussion will be a separate PR with proper context and rationale.

This PR is purely about the lifecycle bugs.

…), complements code-yeongyu#2293 lifecycle fix

acamq · 2026-03-09T04:07:58Z

@cubic-dev-ai review

cubic-dev-ai · 2026-03-09T04:08:04Z

@cubic-dev-ai review

@acamq I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

No issues found across 11 files

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

_{Requires human review: Large architectural change to task lifecycle and state management with global constant adjustments; cannot guarantee 100% no regressions without manual verification of concurrency edge cases.}

…), complements code-yeongyu#2293 lifecycle fix

code-yeongyu · 2026-05-16T16:38:42Z

[sisyphus-bot]

PR sweep first-pass triage on dev.

Author: guazi04
Title: fix: unify BackgroundManager terminal-state lifecycle to prevent task record loss
Closing issues: none detected
Review decision: none yet
Mergeable state: CONFLICTING (rebase needed)
CI status: green / no failing checks

Needs rebase + review. Please rebase onto current dev first; then a review can land cleanly.

Assigning code-yeongyu so the maintainer can prioritize a focused review of this PR. If the linked issue above has been triaged, the verdict there is the authoritative direction for this change.

code-yeongyu · 2026-05-21T07:02:12Z

[sisyphus-bot] Hi guazi04. 🙏 Thanks for the BackgroundManager terminal-state lifecycle unification.

Picking this back up from the 3/1 triage. The PR shows CONFLICTING against current dev. The background-agent layer has been touched substantially on dev (see in-flight #4223 runtime-fallback hardening and the parent-wake-notifier extraction), so the 11-file 474-LOC diff doesn't apply cleanly.

The intent (unify the terminal-state lifecycle so background-task records aren't lost when one path transitions while another path resets) is still valid; the symptom (BackgroundManager dropping task records on simultaneous state transitions) is a real race. The 474-line scope across 11 files is large enough that the rebase will need careful conflict resolution especially around the manager + parent-wake-notifier seam.

Could you rebase against current dev and force-push? A maintainer eye will be needed for the final review pass on the lifecycle invariants. If 474 LOC of rebase is too much, splitting into "state-machine consolidation" core + "test coverage" follow-ups would let the substantive change land independently. Really appreciate the careful diagnosis; concurrent state transitions are hard to test and harder to fix.

… record loss - Fix 5 bugs causing background tasks to silently disappear during runtime - Unify all terminal paths (error/prune/deleted/completed) through scheduleDelayedCleanup() - Fix timestamp sorting (lexicographic → numeric) in session/result formatters - Fix question tool ordering to correctly override agent restrictions - Add fallback scan for missing pendingByParent entries - Skip terminal-state tasks in prune loop - Increase stale timeout from 3min to 10min to match message staleness - Add connectedProvidersOverride for test isolation from environment Refs: code-yeongyu#2292

… sorting

…), complements code-yeongyu#2293 lifecycle fix

…parately with proper context

…ack)

code-yeongyu · 2026-06-12T03:48:16Z

[sisyphus-bot] Thank you @guazi04 for the careful lifecycle analysis. Closing because most of the described fix was already absorbed into dev by the time the branch was last rebased, and the surviving delta now edits the removed src/ tree (everything moved to packages/omo-opencode/src/) while its last commit introduces a type error ("FAILED WITH ERROR" is not in the BackgroundTaskNotificationStatus union, red CI). If any terminal-state gap still reproduces on current dev, a fresh minimal PR against packages/omo-opencode/src/features/background-agent/ would be appreciated.

github-actions Bot added a commit that referenced this pull request Mar 4, 2026

@guazi04 has signed the CLA in #2293

2b7524b

cubic-dev-ai Bot reviewed Mar 4, 2026

View reviewed changes

acamq self-assigned this Mar 7, 2026

acamq mentioned this pull request Mar 7, 2026

fix: isUnstableAgent checks default model instead of resolved model, crashing visual-engineering and writing categories #2297

Closed

guazi04 pushed a commit to guazi04/oh-my-opencode that referenced this pull request Mar 8, 2026

fix: isUnstableAgent checks resolved model only (from code-yeongyu#2297…

d8bce0e

…), complements code-yeongyu#2293 lifecycle fix

guazi04 added a commit to guazi04/oh-my-opencode that referenced this pull request Mar 8, 2026

fix: isUnstableAgent checks resolved model only (from code-yeongyu#2297…

91340f1

…), complements code-yeongyu#2293 lifecycle fix

guazi04 force-pushed the fix/background-task-lifecycle branch from d8bce0e to 91340f1 Compare March 8, 2026 02:47

cubic-dev-ai Bot reviewed Mar 8, 2026

View reviewed changes

Comment thread src/features/background-agent/manager.ts

cubic-dev-ai Bot reviewed Mar 8, 2026

View reviewed changes

guazi04 commented Mar 9, 2026

View reviewed changes

guazi04 force-pushed the fix/background-task-lifecycle branch from 34953b6 to 14f4463 Compare March 9, 2026 04:05

guazi04 added a commit to guazi04/oh-my-opencode that referenced this pull request Mar 9, 2026

fix: isUnstableAgent checks resolved model only (from code-yeongyu#2297…

145ca09

…), complements code-yeongyu#2293 lifecycle fix

cubic-dev-ai Bot reviewed Mar 9, 2026

View reviewed changes

acamq requested a review from code-yeongyu March 9, 2026 04:26

guazi04 added a commit to guazi04/oh-my-opencode that referenced this pull request Mar 10, 2026

fix: isUnstableAgent checks resolved model only (from code-yeongyu#2297…

b20a6eb

…), complements code-yeongyu#2293 lifecycle fix

guazi04 force-pushed the fix/background-task-lifecycle branch 2 times, most recently from e9fee74 to 0a0fab5 Compare March 10, 2026 03:03

code-yeongyu added the triage:bug Confirmed bug with repro steps label Mar 24, 2026

acamq removed their assignment Apr 17, 2026

code-yeongyu self-assigned this May 16, 2026

This was referenced May 21, 2026

feat: Task ID semantics and lifecycle integrity (#3404, #3405) #3409

Open

feat(background): optional tool-output completion notifications for delegated tasks #1916

Open

code-yeongyu force-pushed the dev branch 2 times, most recently from eb25d29 to 2bfad49 Compare May 23, 2026 16:00

guazi04 added 5 commits May 28, 2026 16:08

fix: handle object time shape in getTimeTimestamp for correct message…

8772b6f

… sorting

fix: isUnstableAgent checks resolved model only (from code-yeongyu#2297…

0c31303

…), complements code-yeongyu#2293 lifecycle fix

revert: remove question tool bonus fix from PR scope — will submit se…

8beb982

…parately with proper context

fix: cancelTask cleanup leak + error notification label (review feedb…

ff3f70c

…ack)

guazi04 force-pushed the fix/background-task-lifecycle branch from 0a0fab5 to ff3f70c Compare May 28, 2026 08:10

code-yeongyu closed this Jun 12, 2026

Conversation

guazi04 commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Bugs Fixed

Design

Changes

Testing

Uh oh!

github-actions Bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

guazi04 commented Mar 4, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai Bot Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

guazi04 Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

ualtinok Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

guazi04 Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

ualtinok Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

guazi04 Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

acamq commented Mar 8, 2026

Uh oh!

cubic-dev-ai Bot commented Mar 8, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

guazi04 left a comment

Choose a reason for hiding this comment

Uh oh!

acamq commented Mar 9, 2026

Uh oh!

cubic-dev-ai Bot commented Mar 9, 2026

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

code-yeongyu commented May 16, 2026

Uh oh!

code-yeongyu commented May 21, 2026

Uh oh!

code-yeongyu commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

guazi04 commented Mar 4, 2026 •

edited

Loading

github-actions Bot commented Mar 4, 2026 •

edited

Loading

cubic-dev-ai Bot Mar 4, 2026 •

edited

Loading

cubic-dev-ai Bot Mar 8, 2026 •

edited

Loading