Skip to content

fix: unify BackgroundManager terminal-state lifecycle to prevent task record loss#2293

Closed
guazi04 wants to merge 5 commits into
code-yeongyu:devfrom
guazi04:fix/background-task-lifecycle
Closed

fix: unify BackgroundManager terminal-state lifecycle to prevent task record loss#2293
guazi04 wants to merge 5 commits into
code-yeongyu:devfrom
guazi04:fix/background-task-lifecycle

Conversation

@guazi04

@guazi04 guazi04 commented Mar 4, 2026

Copy link
Copy Markdown
Contributor

Note: This PR now also includes the isUnstableAgent fix from #2297 (Bug 1 only).

Why Bug 1 is included here: isUnstableAgent was checking the category config default model (categoryConfigModel) instead of the resolved actual model (actualModel). When a user overrides e.g. visual-engineering to use claude-sonnet-4-6, the old code still detected gemini from the category default and incorrectly flagged it as unstable.

Why Bug 2 from #2297 is NOT needed: Bug 2 added a !currentTask null guard in the polling loop (unstable-agent-task.ts). This is unnecessary because this PR's scheduleDelayedCleanup() ensures all terminal-state tasks remain queryable for 10 minutes — so getTask() will never return undefined for a task that just completed/errored. The root cause (immediate task deletion) is fixed here.

See acamq's analysis for background.


Summary

Fixes a cluster of 5 bugs in BackgroundManager that cause background tasks to silently disappear during runtime, making results irretrievable. Also fixes isUnstableAgent model resolution (#2297 Bug 1).

Relates to #2292, #2287, #1582, #1769

Bugs Fixed

# Severity Bug Root Cause Fix
1 P1 Message sorting produces wrong order String() + localeCompare on timestamps → lexicographic ordering New getTimeTimestamp() with numeric comparison
2 P0 session.error immediately deletes task record tasks.delete() called with no notification or delay Mark status → notify parent → scheduleDelayedCleanup()
3 P0 Prune kills completed/errored tasks pruneStaleTasksAndNotifications checks startedAt without checking task state Skip terminal-state tasks in prune loop
4 P1 Single task completion triggers batch cleanup pendingByParent Map entry missing → allComplete defaults to true Fallback scan of all tasks for same parent
5 P1 session.deleted immediately deletes tasks No query window for result retrieval Delayed cleanup via scheduleDelayedCleanup()
6 P1 isUnstableAgent checks category default model instead of resolved model categoryConfigModel included in unstable check alongside actualModel Remove categoryConfigModel — only evaluate actualModel (from #2297)

Bonus fix: question: true tool ordering — was placed BEFORE ...getAgentToolRestrictions() spread, so agents like explore (which sets question: false) were overriding it. Moved to after the spread in both manager.ts and spawner.ts.

Design

All terminal paths (error, prune, deleted, completed) now follow the same pattern:

mark terminal status → notify parent session → scheduleDelayedCleanup()

scheduleDelayedCleanup() is the single unified cleanup method that:

  • Guards against duplicate scheduling via completionTimers.has(taskId)
  • Preserves a query window (TASK_CLEANUP_DELAY_MS) for users to retrieve results
  • Cleans up: notifications, toast, subagentSessions, SessionCategoryRegistry, task record

Changes

File Lines What
manager.ts +131/-98 Core lifecycle fixes, scheduleDelayedCleanup(), question tool ordering, connectedProvidersOverride
manager.test.ts +281/-20 Updated existing tests + 4 new regression tests
task-poller.ts +3 Skip terminal-state tasks in prune loop
spawner.ts +4/-4 Question tool ordering fix
constants.ts +1/-1 DEFAULT_STALE_TIMEOUT_MS 180s→600s (align with message staleness)
fallback-retry-handler.ts +8/-3 connectedProviders param for test isolation
full-session-format.ts +4/-4 Numeric timestamp sorting
task-result-format.ts +5/-8 Numeric timestamp sorting, remove unused getTimeString
time-format.ts +12 New getTimeTimestamp() helper
category-resolver.ts +1/-2 Remove categoryConfigModel from isUnstableAgent check (from #2297)
category-resolver.test.ts +155 6 new tests for isUnstableAgent override scenarios (from #2297)

Testing

  • 340 tests pass, 0 fail (332 background-agent + 8 category-resolver)
  • Typecheck clean
  • 4 new regression tests covering:
    • Error task remains queryable with scheduled cleanup timer
    • Prune skips terminal-state tasks
    • allComplete fallback scan detects running siblings when pendingByParent is missing
    • session.deleted tasks have query window via delayed cleanup
    • Delayed cleanup actually removes task + session registries after delay
  • 6 new isUnstableAgent tests covering:
    • User override to stable model → isUnstableAgent=false
    • Default gemini/kimi model → isUnstableAgent=true
    • is_unstable_agent=true config forces flag
    • sisyphusJuniorModel override propagation

@github-actions

github-actions Bot commented Mar 4, 2026

Copy link
Copy Markdown
Contributor

All contributors have signed the CLA. Thank you! ✅
Posted by the CLA Assistant Lite bot.

@guazi04

guazi04 commented Mar 4, 2026

Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request Mar 4, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 9 files

Confidence score: 2/5

  • There is a high-confidence compatibility bug in src/tools/background-task/full-session-format.ts: getTimeTimestamp does not handle Opencode’s { created: number } time shape, so timestamps become 0 and message sorting breaks.
  • src/features/background-agent/spawner.ts still places question: true before ...getAgentToolRestrictions(), so the intended Opencode restriction fix is not actually applied and behavior may remain incorrect.
  • Given the 9/10 and 7/10 severity issues with direct runtime impact, this is higher-risk to merge until these fixes are corrected.
  • Pay close attention to src/tools/background-task/full-session-format.ts, src/features/background-agent/spawner.ts - sorting correctness and tool restriction precedence need to be fixed for Opencode compatibility.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/features/background-agent/spawner.ts">

<violation number="1" location="src/features/background-agent/spawner.ts:144">
P1: Custom agent: **Opencode Compatibility**

The `question: true` property is still placed before the `...getAgentToolRestrictions()` spread, failing to implement the intended fix. It must be moved after the spread so that agent-specific restrictions cannot override it.</violation>
</file>

<file name="src/tools/background-task/full-session-format.ts">

<violation number="1" location="src/tools/background-task/full-session-format.ts:63">
P0: Custom agent: **Opencode Compatibility**

Opencode SDK returns `time` as an object (`{ created: number }`), which `getTimeTimestamp` does not handle. It will return `0` for all messages, breaking the sort completely.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src/tools/background-task/full-session-format.ts
Comment on lines 144 to 145
question: true,
...getAgentToolRestrictions(input.agent),

@cubic-dev-ai cubic-dev-ai Bot Mar 4, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Custom agent: Opencode Compatibility

The question: true property is still placed before the ...getAgentToolRestrictions() spread, failing to implement the intended fix. It must be moved after the spread so that agent-specific restrictions cannot override it.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/features/background-agent/spawner.ts, line 144:

<comment>The `question: true` property is still placed before the `...getAgentToolRestrictions()` spread, failing to implement the intended fix. It must be moved after the spread so that agent-specific restrictions cannot override it.</comment>

<file context>
@@ -141,7 +141,7 @@ export async function startTask(
         task: false,
         call_omo_agent: true,
-        question: false,
+        question: true,
         ...getAgentToolRestrictions(input.agent),
       },
</file context>
Suggested change
question: true,
...getAgentToolRestrictions(input.agent),
...getAgentToolRestrictions(task.agent),
question: true,
Fix with Cubic

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a false positive. Look at the code again:

tools: {
  task: shouldAllowTaskTool(input.parentAgent, input.agent),
  call_omo_agent: true,
  ...getAgentToolRestrictions(input.agent),   // line 146 — spread first
  ...getMemoryToolRestrictions(input.agent),   // line 147
  question: true,                              // line 148 — AFTER spread ✅
},

question: true is placed after ...getAgentToolRestrictions(), which is exactly the intended fix. Later properties in an object literal override earlier ones from spread, so question: true correctly overrides any question: false from agent restrictions.

This was the bug we fixed — the original code had question: true before the spread. Our fix moved it after.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong, when you set question: true it means question tool is not restricted for background agents which would cause background agents to stall @cubic-dev-ai confirm this claim

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks—I'll verify whether the question tool should remain restricted for background agents to prevent stalls and follow up.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — let me clarify the context.

In my setup, I built a custom dispatcher agent on top of the plugin that decides which execution agent to use and spawns them as background tasks. In that scenario, when a background execution agent needed user input but had question: false, it would output questions as plain text, the continuation enforcer misread it as incomplete work, causing an infinite loop.

In the default upstream architecture, sisyphus/hephaestus/atlas are primary agents (not background), so this mostly affects sisyphus-junior when spawned via category delegation. The fix is still valid there — if a sisyphus-junior background task needs clarification, the same dead loop happens.

The safety net: read-only agents (explore, librarian, oracle, etc.) already have question: false via getAgentToolRestrictions(), so they are never affected. Only execution agents spawned as background tasks benefit from question: true, and the question tool triggers a parent session notification rather than actually blocking.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the primary agents have notion of answering subagents' questions. Also sisyphus-junior and all other agents know that they're running in non-interactive mode, so they MUSTN'T ask questions anyway. Setting question: true would cause wrong behaviour in case any lesser model don't listen to the instructions and ask the question.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 10 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Requires human review: The PR introduces a significant change to the task lifecycle and state management. The move to asynchronous delayed cleanup carries inherent risk of side effects or state collisions.

@acamq acamq self-assigned this Mar 7, 2026
guazi04 pushed a commit to guazi04/oh-my-opencode that referenced this pull request Mar 8, 2026
guazi04 added a commit to guazi04/oh-my-opencode that referenced this pull request Mar 8, 2026
@guazi04 guazi04 force-pushed the fix/background-task-lifecycle branch from d8bce0e to 91340f1 Compare March 8, 2026 02:47

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 12 files

Confidence score: 2/5

  • High-confidence, user-impacting behavior risk remains in src/features/background-agent/spawner.ts and src/features/background-agent/manager.ts: question: true is placed before ...getAgentToolRestrictions(...), so it can be overridden.
  • This directly conflicts with the intended Opencode compatibility behavior in startTask/resumeTask, making a regression likely for custom agent tool restriction handling.
  • Given the severity (8/10) and confidence (10/10), this is not a minor uncertainty and should be corrected before merge.
  • Pay close attention to src/features/background-agent/spawner.ts and src/features/background-agent/manager.ts - ordering of question: true vs spread is likely causing the override bug.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/features/background-agent/spawner.ts">

<violation number="1" location="src/features/background-agent/spawner.ts:144">
P1: Custom agent: **Opencode Compatibility**

The `question: true` property is still placed before the `...getAgentToolRestrictions(...)` spread in both `startTask` and `resumeTask`. This contradicts the PR description's stated fix. To prevent agent restrictions from incorrectly overriding the Opencode SDK tools configuration, you must move `question: true` after the spread in both locations.</violation>
</file>

<file name="src/features/background-agent/manager.ts">

<violation number="1" location="src/features/background-agent/manager.ts:363">
P1: `question: true` must be placed AFTER the spread operator `...getAgentToolRestrictions(...)` to prevent agent-specific tool restrictions from overriding it. The current code has `question: true` before the spread, which means agents like 'explore' that set `question: false` will still override it.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

task: false,
call_omo_agent: true,
question: false,
question: true,

@cubic-dev-ai cubic-dev-ai Bot Mar 8, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Custom agent: Opencode Compatibility

The question: true property is still placed before the ...getAgentToolRestrictions(...) spread in both startTask and resumeTask. This contradicts the PR description's stated fix. To prevent agent restrictions from incorrectly overriding the Opencode SDK tools configuration, you must move question: true after the spread in both locations.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/features/background-agent/spawner.ts, line 144:

<comment>The `question: true` property is still placed before the `...getAgentToolRestrictions(...)` spread in both `startTask` and `resumeTask`. This contradicts the PR description's stated fix. To prevent agent restrictions from incorrectly overriding the Opencode SDK tools configuration, you must move `question: true` after the spread in both locations.</comment>

<file context>
@@ -141,7 +141,7 @@ export async function startTask(
         task: false,
         call_omo_agent: true,
-        question: false,
+        question: true,
         ...getAgentToolRestrictions(input.agent),
       },
</file context>
Fix with Cubic

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been reverted — question is back to false in the latest commit (aa1d650). We removed the question tool change from this PR scope entirely to keep it focused on the lifecycle bugs. Will submit the question tool change as a separate PR with proper context.

Comment thread src/features/background-agent/manager.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/features/background-agent/spawner.ts">

<violation number="1">
P1: Custom agent: **Opencode Compatibility**

The `question` tool configuration was mistakenly toggled to `false` instead of being moved after the spread as described in the PR. This fails to implement the intended SDK tool configuration approach.

Note: The exact same issue exists in `resumeTask` below (around line 228) and must also be fixed using `...getAgentToolRestrictions(task.agent)`.</violation>
</file>

<file name="src/features/background-agent/manager.ts">

<violation number="1">
P1: Custom agent: **Opencode Compatibility**

Move `question: false` after the `...getAgentToolRestrictions()` spread to ensure it properly overrides any agent-specific settings. Note: This needs to be fixed in both the `launch` and `resume` methods.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src/features/background-agent/manager.ts
@acamq

acamq commented Mar 8, 2026

Copy link
Copy Markdown
Collaborator

@cubic-dev-ai review

@cubic-dev-ai

cubic-dev-ai Bot commented Mar 8, 2026

Copy link
Copy Markdown

@cubic-dev-ai review

@acamq I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 11 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Requires human review: Complex lifecycle and cleanup logic changes across 11 files carry inherent risks of race conditions or state inconsistencies, failing the '100% sure no regressions' criterion.

@guazi04 guazi04 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — question tool is fully reverted in this PR (all 4 locations set to question: false, commit aa1d650). The question tool discussion will be a separate PR with proper context and rationale.

This PR is purely about the lifecycle bugs.

@guazi04 guazi04 force-pushed the fix/background-task-lifecycle branch from 34953b6 to 14f4463 Compare March 9, 2026 04:05
guazi04 added a commit to guazi04/oh-my-opencode that referenced this pull request Mar 9, 2026
@acamq

acamq commented Mar 9, 2026

Copy link
Copy Markdown
Collaborator

@cubic-dev-ai review

@cubic-dev-ai

cubic-dev-ai Bot commented Mar 9, 2026

Copy link
Copy Markdown

@cubic-dev-ai review

@acamq I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 11 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Requires human review: Large architectural change to task lifecycle and state management with global constant adjustments; cannot guarantee 100% no regressions without manual verification of concurrency edge cases.

@acamq acamq requested a review from code-yeongyu March 9, 2026 04:26
guazi04 added a commit to guazi04/oh-my-opencode that referenced this pull request Mar 10, 2026
@guazi04 guazi04 force-pushed the fix/background-task-lifecycle branch 2 times, most recently from e9fee74 to 0a0fab5 Compare March 10, 2026 03:03
@code-yeongyu code-yeongyu added the triage:bug Confirmed bug with repro steps label Mar 24, 2026
@acamq acamq removed their assignment Apr 17, 2026
@code-yeongyu

Copy link
Copy Markdown
Owner

[sisyphus-bot]

PR sweep first-pass triage on dev.

  • Author: guazi04
  • Title: fix: unify BackgroundManager terminal-state lifecycle to prevent task record loss
  • Closing issues: none detected
  • Review decision: none yet
  • Mergeable state: CONFLICTING (rebase needed)
  • CI status: green / no failing checks

Needs rebase + review. Please rebase onto current dev first; then a review can land cleanly.

Assigning code-yeongyu so the maintainer can prioritize a focused review of this PR. If the linked issue above has been triaged, the verdict there is the authoritative direction for this change.

@code-yeongyu code-yeongyu self-assigned this May 16, 2026
@code-yeongyu

Copy link
Copy Markdown
Owner

[sisyphus-bot] Hi guazi04. 🙏 Thanks for the BackgroundManager terminal-state lifecycle unification.

Picking this back up from the 3/1 triage. The PR shows CONFLICTING against current dev. The background-agent layer has been touched substantially on dev (see in-flight #4223 runtime-fallback hardening and the parent-wake-notifier extraction), so the 11-file 474-LOC diff doesn't apply cleanly.

The intent (unify the terminal-state lifecycle so background-task records aren't lost when one path transitions while another path resets) is still valid; the symptom (BackgroundManager dropping task records on simultaneous state transitions) is a real race. The 474-line scope across 11 files is large enough that the rebase will need careful conflict resolution especially around the manager + parent-wake-notifier seam.

Could you rebase against current dev and force-push? A maintainer eye will be needed for the final review pass on the lifecycle invariants. If 474 LOC of rebase is too much, splitting into "state-machine consolidation" core + "test coverage" follow-ups would let the substantive change land independently. Really appreciate the careful diagnosis; concurrent state transitions are hard to test and harder to fix.

guazi04 added 5 commits May 28, 2026 16:08
… record loss

- Fix 5 bugs causing background tasks to silently disappear during runtime
- Unify all terminal paths (error/prune/deleted/completed) through scheduleDelayedCleanup()
- Fix timestamp sorting (lexicographic → numeric) in session/result formatters
- Fix question tool ordering to correctly override agent restrictions
- Add fallback scan for missing pendingByParent entries
- Skip terminal-state tasks in prune loop
- Increase stale timeout from 3min to 10min to match message staleness
- Add connectedProvidersOverride for test isolation from environment

Refs: code-yeongyu#2292
@guazi04 guazi04 force-pushed the fix/background-task-lifecycle branch from 0a0fab5 to ff3f70c Compare May 28, 2026 08:10
@code-yeongyu

Copy link
Copy Markdown
Owner

[sisyphus-bot] Thank you @guazi04 for the careful lifecycle analysis. Closing because most of the described fix was already absorbed into dev by the time the branch was last rebased, and the surviving delta now edits the removed src/ tree (everything moved to packages/omo-opencode/src/) while its last commit introduces a type error ("FAILED WITH ERROR" is not in the BackgroundTaskNotificationStatus union, red CI). If any terminal-state gap still reproduces on current dev, a fresh minimal PR against packages/omo-opencode/src/features/background-agent/ would be appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

triage:bug Confirmed bug with repro steps

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants