fix: unify BackgroundManager terminal-state lifecycle to prevent task record loss#2293
fix: unify BackgroundManager terminal-state lifecycle to prevent task record loss#2293guazi04 wants to merge 5 commits into
Conversation
|
All contributors have signed the CLA. Thank you! ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
There was a problem hiding this comment.
2 issues found across 9 files
Confidence score: 2/5
- There is a high-confidence compatibility bug in
src/tools/background-task/full-session-format.ts:getTimeTimestampdoes not handle Opencode’s{ created: number }time shape, so timestamps become0and message sorting breaks. src/features/background-agent/spawner.tsstill placesquestion: truebefore...getAgentToolRestrictions(), so the intended Opencode restriction fix is not actually applied and behavior may remain incorrect.- Given the 9/10 and 7/10 severity issues with direct runtime impact, this is higher-risk to merge until these fixes are corrected.
- Pay close attention to
src/tools/background-task/full-session-format.ts,src/features/background-agent/spawner.ts- sorting correctness and tool restriction precedence need to be fixed for Opencode compatibility.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/features/background-agent/spawner.ts">
<violation number="1" location="src/features/background-agent/spawner.ts:144">
P1: Custom agent: **Opencode Compatibility**
The `question: true` property is still placed before the `...getAgentToolRestrictions()` spread, failing to implement the intended fix. It must be moved after the spread so that agent-specific restrictions cannot override it.</violation>
</file>
<file name="src/tools/background-task/full-session-format.ts">
<violation number="1" location="src/tools/background-task/full-session-format.ts:63">
P0: Custom agent: **Opencode Compatibility**
Opencode SDK returns `time` as an object (`{ created: number }`), which `getTimeTimestamp` does not handle. It will return `0` for all messages, breaking the sort completely.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Add one-off context when rerunning by tagging
@cubic-dev-aiwith guidance or docs links (includingllms.txt) - Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| question: true, | ||
| ...getAgentToolRestrictions(input.agent), |
There was a problem hiding this comment.
P1: Custom agent: Opencode Compatibility
The question: true property is still placed before the ...getAgentToolRestrictions() spread, failing to implement the intended fix. It must be moved after the spread so that agent-specific restrictions cannot override it.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/features/background-agent/spawner.ts, line 144:
<comment>The `question: true` property is still placed before the `...getAgentToolRestrictions()` spread, failing to implement the intended fix. It must be moved after the spread so that agent-specific restrictions cannot override it.</comment>
<file context>
@@ -141,7 +141,7 @@ export async function startTask(
task: false,
call_omo_agent: true,
- question: false,
+ question: true,
...getAgentToolRestrictions(input.agent),
},
</file context>
| question: true, | |
| ...getAgentToolRestrictions(input.agent), | |
| ...getAgentToolRestrictions(task.agent), | |
| question: true, |
There was a problem hiding this comment.
This is a false positive. Look at the code again:
tools: {
task: shouldAllowTaskTool(input.parentAgent, input.agent),
call_omo_agent: true,
...getAgentToolRestrictions(input.agent), // line 146 — spread first
...getMemoryToolRestrictions(input.agent), // line 147
question: true, // line 148 — AFTER spread ✅
},question: true is placed after ...getAgentToolRestrictions(), which is exactly the intended fix. Later properties in an object literal override earlier ones from spread, so question: true correctly overrides any question: false from agent restrictions.
This was the bug we fixed — the original code had question: true before the spread. Our fix moved it after.
There was a problem hiding this comment.
This is wrong, when you set question: true it means question tool is not restricted for background agents which would cause background agents to stall @cubic-dev-ai confirm this claim
There was a problem hiding this comment.
Thanks—I'll verify whether the question tool should remain restricted for background agents to prevent stalls and follow up.
There was a problem hiding this comment.
Good catch — let me clarify the context.
In my setup, I built a custom dispatcher agent on top of the plugin that decides which execution agent to use and spawns them as background tasks. In that scenario, when a background execution agent needed user input but had question: false, it would output questions as plain text, the continuation enforcer misread it as incomplete work, causing an infinite loop.
In the default upstream architecture, sisyphus/hephaestus/atlas are primary agents (not background), so this mostly affects sisyphus-junior when spawned via category delegation. The fix is still valid there — if a sisyphus-junior background task needs clarification, the same dead loop happens.
The safety net: read-only agents (explore, librarian, oracle, etc.) already have question: false via getAgentToolRestrictions(), so they are never affected. Only execution agents spawned as background tasks benefit from question: true, and the question tool triggers a parent session notification rather than actually blocking.
There was a problem hiding this comment.
None of the primary agents have notion of answering subagents' questions. Also sisyphus-junior and all other agents know that they're running in non-interactive mode, so they MUSTN'T ask questions anyway. Setting question: true would cause wrong behaviour in case any lesser model don't listen to the instructions and ask the question.
There was a problem hiding this comment.
No issues found across 10 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Requires human review: The PR introduces a significant change to the task lifecycle and state management. The move to asynchronous delayed cleanup carries inherent risk of side effects or state collisions.
…), complements code-yeongyu#2293 lifecycle fix
…), complements code-yeongyu#2293 lifecycle fix
d8bce0e to
91340f1
Compare
There was a problem hiding this comment.
2 issues found across 12 files
Confidence score: 2/5
- High-confidence, user-impacting behavior risk remains in
src/features/background-agent/spawner.tsandsrc/features/background-agent/manager.ts:question: trueis placed before...getAgentToolRestrictions(...), so it can be overridden. - This directly conflicts with the intended Opencode compatibility behavior in
startTask/resumeTask, making a regression likely for custom agent tool restriction handling. - Given the severity (8/10) and confidence (10/10), this is not a minor uncertainty and should be corrected before merge.
- Pay close attention to
src/features/background-agent/spawner.tsandsrc/features/background-agent/manager.ts- ordering ofquestion: truevs spread is likely causing the override bug.
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/features/background-agent/spawner.ts">
<violation number="1" location="src/features/background-agent/spawner.ts:144">
P1: Custom agent: **Opencode Compatibility**
The `question: true` property is still placed before the `...getAgentToolRestrictions(...)` spread in both `startTask` and `resumeTask`. This contradicts the PR description's stated fix. To prevent agent restrictions from incorrectly overriding the Opencode SDK tools configuration, you must move `question: true` after the spread in both locations.</violation>
</file>
<file name="src/features/background-agent/manager.ts">
<violation number="1" location="src/features/background-agent/manager.ts:363">
P1: `question: true` must be placed AFTER the spread operator `...getAgentToolRestrictions(...)` to prevent agent-specific tool restrictions from overriding it. The current code has `question: true` before the spread, which means agents like 'explore' that set `question: false` will still override it.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| task: false, | ||
| call_omo_agent: true, | ||
| question: false, | ||
| question: true, |
There was a problem hiding this comment.
P1: Custom agent: Opencode Compatibility
The question: true property is still placed before the ...getAgentToolRestrictions(...) spread in both startTask and resumeTask. This contradicts the PR description's stated fix. To prevent agent restrictions from incorrectly overriding the Opencode SDK tools configuration, you must move question: true after the spread in both locations.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/features/background-agent/spawner.ts, line 144:
<comment>The `question: true` property is still placed before the `...getAgentToolRestrictions(...)` spread in both `startTask` and `resumeTask`. This contradicts the PR description's stated fix. To prevent agent restrictions from incorrectly overriding the Opencode SDK tools configuration, you must move `question: true` after the spread in both locations.</comment>
<file context>
@@ -141,7 +141,7 @@ export async function startTask(
task: false,
call_omo_agent: true,
- question: false,
+ question: true,
...getAgentToolRestrictions(input.agent),
},
</file context>
There was a problem hiding this comment.
This has been reverted — question is back to false in the latest commit (aa1d650). We removed the question tool change from this PR scope entirely to keep it focused on the lifecycle bugs. Will submit the question tool change as a separate PR with proper context.
There was a problem hiding this comment.
2 issues found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/features/background-agent/spawner.ts">
<violation number="1">
P1: Custom agent: **Opencode Compatibility**
The `question` tool configuration was mistakenly toggled to `false` instead of being moved after the spread as described in the PR. This fails to implement the intended SDK tool configuration approach.
Note: The exact same issue exists in `resumeTask` below (around line 228) and must also be fixed using `...getAgentToolRestrictions(task.agent)`.</violation>
</file>
<file name="src/features/background-agent/manager.ts">
<violation number="1">
P1: Custom agent: **Opencode Compatibility**
Move `question: false` after the `...getAgentToolRestrictions()` spread to ensure it properly overrides any agent-specific settings. Note: This needs to be fixed in both the `launch` and `resume` methods.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
|
@cubic-dev-ai review |
@acamq I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
No issues found across 11 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Requires human review: Complex lifecycle and cleanup logic changes across 11 files carry inherent risks of race conditions or state inconsistencies, failing the '100% sure no regressions' criterion.
guazi04
left a comment
There was a problem hiding this comment.
Agreed — question tool is fully reverted in this PR (all 4 locations set to question: false, commit aa1d650). The question tool discussion will be a separate PR with proper context and rationale.
This PR is purely about the lifecycle bugs.
34953b6 to
14f4463
Compare
…), complements code-yeongyu#2293 lifecycle fix
|
@cubic-dev-ai review |
@acamq I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
No issues found across 11 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Requires human review: Large architectural change to task lifecycle and state management with global constant adjustments; cannot guarantee 100% no regressions without manual verification of concurrency edge cases.
…), complements code-yeongyu#2293 lifecycle fix
e9fee74 to
0a0fab5
Compare
|
[sisyphus-bot] PR sweep first-pass triage on
Needs rebase + review. Please rebase onto current Assigning |
|
[sisyphus-bot] Hi guazi04. 🙏 Thanks for the BackgroundManager terminal-state lifecycle unification. Picking this back up from the 3/1 triage. The PR shows CONFLICTING against current The intent (unify the terminal-state lifecycle so background-task records aren't lost when one path transitions while another path resets) is still valid; the symptom (BackgroundManager dropping task records on simultaneous state transitions) is a real race. The 474-line scope across 11 files is large enough that the rebase will need careful conflict resolution especially around the manager + parent-wake-notifier seam. Could you rebase against current |
eb25d29 to
2bfad49
Compare
… record loss - Fix 5 bugs causing background tasks to silently disappear during runtime - Unify all terminal paths (error/prune/deleted/completed) through scheduleDelayedCleanup() - Fix timestamp sorting (lexicographic → numeric) in session/result formatters - Fix question tool ordering to correctly override agent restrictions - Add fallback scan for missing pendingByParent entries - Skip terminal-state tasks in prune loop - Increase stale timeout from 3min to 10min to match message staleness - Add connectedProvidersOverride for test isolation from environment Refs: code-yeongyu#2292
…), complements code-yeongyu#2293 lifecycle fix
…parately with proper context
0a0fab5 to
ff3f70c
Compare
|
[sisyphus-bot] Thank you @guazi04 for the careful lifecycle analysis. Closing because most of the described fix was already absorbed into dev by the time the branch was last rebased, and the surviving delta now edits the removed |
Summary
Fixes a cluster of 5 bugs in BackgroundManager that cause background tasks to silently disappear during runtime, making results irretrievable. Also fixes
isUnstableAgentmodel resolution (#2297 Bug 1).Relates to #2292, #2287, #1582, #1769
Bugs Fixed
String()+localeCompareon timestamps → lexicographic orderinggetTimeTimestamp()with numeric comparisonsession.errorimmediately deletes task recordtasks.delete()called with no notification or delayscheduleDelayedCleanup()pruneStaleTasksAndNotificationschecksstartedAtwithout checking task statependingByParentMap entry missing →allCompletedefaults totruesession.deletedimmediately deletes tasksscheduleDelayedCleanup()isUnstableAgentchecks category default model instead of resolved modelcategoryConfigModelincluded in unstable check alongsideactualModelcategoryConfigModel— only evaluateactualModel(from #2297)Bonus fix:
question: truetool ordering — was placed BEFORE...getAgentToolRestrictions()spread, so agents likeexplore(which setsquestion: false) were overriding it. Moved to after the spread in bothmanager.tsandspawner.ts.Design
All terminal paths (error, prune, deleted, completed) now follow the same pattern:
scheduleDelayedCleanup()is the single unified cleanup method that:completionTimers.has(taskId)TASK_CLEANUP_DELAY_MS) for users to retrieve resultsChanges
manager.tsscheduleDelayedCleanup(), question tool ordering,connectedProvidersOverridemanager.test.tstask-poller.tsspawner.tsconstants.tsDEFAULT_STALE_TIMEOUT_MS180s→600s (align with message staleness)fallback-retry-handler.tsconnectedProvidersparam for test isolationfull-session-format.tstask-result-format.tsgetTimeStringtime-format.tsgetTimeTimestamp()helpercategory-resolver.tscategoryConfigModelfrom isUnstableAgent check (from #2297)category-resolver.test.tsTesting
allCompletefallback scan detects running siblings whenpendingByParentis missingsession.deletedtasks have query window via delayed cleanupisUnstableAgenttests covering:isUnstableAgent=falseisUnstableAgent=trueis_unstable_agent=trueconfig forces flagsisyphusJuniorModeloverride propagation