feat(core): limit background agent concurrency#4324
Conversation
wenshao
left a comment
There was a problem hiding this comment.
Critical regression in resume path: The new cap check in register() also fires on the resume path in background-agent-resume.ts (unmodified by this PR). That file calls register() twice for the same agentId — first at line 490 (promoting paused → running, outside any try/catch) and again at line 658. The second call's cap check counts the just-promoted entry toward the cap, so the last concurrency slot can never be filled by a resume operation. The first call's throw (when the cap is already full) becomes an unhandled rejection. Fix: skip the cap check in register() when replacing an existing entry with the same agentId that is already running + isBackgrounded.
|
|
||
| register(registration: AgentTaskRegistration): AgentTask { | ||
| if (registration.isBackgrounded && registration.status === 'running') { | ||
| this.assertCanStartBackgroundAgent(); |
There was a problem hiding this comment.
[Critical] The cap check added here applies to ALL callers of register(), including the resume path in background-agent-resume.ts. That file calls register() twice for the same agentId (line 490 outside try/catch, line 658 inside try/catch). The first call promotes paused → running, incrementing the cap count. The second call then sees count >= cap and throws — even though it's replacing the same slot, not adding a new agent. Net effect: the last concurrency slot can never be filled by a resume operation, and the first call's throw becomes an unhandled rejection.
| this.assertCanStartBackgroundAgent(); | |
| register(registration: AgentTaskRegistration): AgentTask { | |
| if (registration.isBackgrounded && registration.status === 'running') { | |
| const existing = this.agents.get(registration.agentId); | |
| const isReplacingRunning = existing?.isBackgrounded && existing?.status === 'running'; | |
| if (!isReplacingRunning) { | |
| this.assertCanStartBackgroundAgent(); | |
| } | |
| } |
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
| outputFile: jsonlPath, | ||
| metaPath, | ||
| }); | ||
| } catch (error) { |
There was a problem hiding this comment.
[Critical] The SubagentStart hook fires at line 1622 before this register() call. If register() throws (cap reached in the TOCTOU window), this catch block does NOT fire a matching SubagentStop hook. Hook consumers (metrics, resource allocators, billing) see orphaned "started" events with no corresponding "stopped."
Add a SubagentStop fire in this catch block:
// After bgAbortController.abort():
const hookSystem = this.config.getHookSystem();
try {
await this.runSubagentStopHookLoop(
hookOpts.agentId, hookOpts.agentType,
jsonlPath, undefined, false, resolvedMode, signal, updateOutput,
);
} catch (hookError) {
debugLogger.warn(`[Agent] SubagentStop hook after register failure: ${hookError}`);
}— qwen-latest-series-invite-beta-v34 via Qwen Code /review
| export const BACKGROUND_AGENT_CONCURRENCY_ENV = | ||
| 'QWEN_CODE_MAX_BACKGROUND_AGENTS'; | ||
|
|
||
| export function resolveMaxConcurrentBackgroundAgents( |
There was a problem hiding this comment.
[Suggestion] Invalid env var values (NaN, negative, float, non-numeric strings) silently fall back to the default (10) with no log output. An operator who sets QWEN_CODE_MAX_BACKGROUND_AGENTS=3.5 or =foo gets no indication that their value was ignored.
| export function resolveMaxConcurrentBackgroundAgents( | |
| export function resolveMaxConcurrentBackgroundAgents( | |
| env: Record<string, string | undefined> = process.env, | |
| ): number { | |
| const raw = env[BACKGROUND_AGENT_CONCURRENCY_ENV]; | |
| if (raw === undefined || raw.trim() === '') { | |
| return DEFAULT_MAX_CONCURRENT_BACKGROUND_AGENTS; | |
| } | |
| const parsed = Number(raw); | |
| if (!Number.isInteger(parsed) || parsed < 1) { | |
| debugLogger.warn( | |
| `[BackgroundTasks] Invalid ${BACKGROUND_AGENT_CONCURRENCY_ENV}=${JSON.stringify(raw)}, using default (${DEFAULT_MAX_CONCURRENT_BACKGROUND_AGENTS})`, | |
| ); | |
| return DEFAULT_MAX_CONCURRENT_BACKGROUND_AGENTS; | |
| } | |
| return parsed; | |
| } |
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
| : MAX_CONCURRENT_BACKGROUND_AGENTS; | ||
| } | ||
|
|
||
| assertCanStartBackgroundAgent(): void { |
There was a problem hiding this comment.
[Suggestion] No debugLogger call when the cap is hit. Operators investigating why background agents aren't starting have no log evidence — the only signal is in the model's conversation context.
| assertCanStartBackgroundAgent(): void { | |
| assertCanStartBackgroundAgent(): void { | |
| const running = this.getRunningBackgroundCount(); | |
| if (running >= this.maxConcurrentBackgroundAgents) { | |
| debugLogger.warn( | |
| `[BackgroundTasks] Concurrency cap reached: ${running}/${this.maxConcurrentBackgroundAgents}. Refusing new background agent.`, | |
| ); | |
| throw new Error( |
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
| this.params.run_in_background === true || | ||
| subagentConfig.background === true; | ||
|
|
||
| if (shouldRunInBackground) { |
There was a problem hiding this comment.
[Suggestion] The preflight check here and the authoritative check inside register() serve different purposes but neither documents the relationship. A future maintainer may remove one as "redundant." Add a comment:
| if (shouldRunInBackground) { | |
| // Preflight: fast-fail before expensive worktree/subagent setup. | |
| // NOT redundant with the check inside registry.register() — that | |
| // one is the authoritative guard, but by then we've already | |
| // provisioned a worktree and created the subagent. | |
| if (shouldRunInBackground) { |
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
| }, | ||
| updateOutput, | ||
| ); | ||
| void agentConfig |
There was a problem hiding this comment.
[Suggestion] agentConfig.getToolRegistry().stop() uses fire-and-forget .catch(() => {}). If it rejects, tool processes (spawned shells, MCP connections) may outlive the failed agent with no log evidence. Replace with:
| void agentConfig | |
| void agentConfig | |
| .getToolRegistry() | |
| .stop() | |
| .catch((e) => { | |
| debugLogger.warn(`[Agent] ToolRegistry stop after register failure: ${e}`); | |
| }); |
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
|
Thanks for the careful review. I pushed Changes made:
Validation rerun: cd packages/core && npx vitest run src/agents/background-tasks.test.ts src/agents/background-agent-resume.test.ts src/tools/agent/agent.test.ts
cd packages/core && npm run typecheck
cd packages/core && npm run lint
cd packages/core && npm run build
git diff --checkAll passed locally. |
wenshao
left a comment
There was a problem hiding this comment.
Round 2 review: all round 1 findings addressed. The isReplacingRunning escape hatch correctly fixes the resume-path cap self-collision, and the subagentStartHookCompleted flag properly guards hook symmetry. New tests cover the key paths (resume at cap, resume when full, register error handling, stop hook after register failure). 164 tests pass, tsc + eslint clean. LGTM! ✅ — qwen-latest-series-invite-beta-v34 via Qwen Code /review
pomelo-nwu
left a comment
There was a problem hiding this comment.
LGTM. Add configurable concurrency cap (default 10) for background agents to prevent resource exhaustion, with clear rejection message when limit is reached.
Maintainer verification report — PR 4324Local merge-prep verification by @wenshao on a worktree of this PR merged with Scope reminderTwo commits, six files, all in
Adds a configurable cap on running, backgrounded agents
Environment
Results
Compiled-output smoke testA small Node script imports the compiled registry from Real TUI test in tmuxLaunched the bundled CLI under Captured pane (excerpt): Matches the PR description verbatim:
When Probe A completed (after 2m 11s · 20k tokens), the counter dropped to Things to be aware of as the maintainer
Reproductiongit fetch origin pull/4324/head:pr-4324
git worktree add /tmp/pr-4324 pr-4324
cd /tmp/pr-4324
git merge main --no-edit # clean
npm ci
cd packages/core
npx vitest run \
src/agents/background-tasks.test.ts \
src/tools/agent/agent.test.ts \
src/agents/background-agent-resume.test.ts
npm run typecheck && npm run lint
cd /tmp/pr-4324
npm run build && npm run bundle
QWEN_CODE_MAX_BACKGROUND_AGENTS=1 \
node /tmp/pr-4324/dist/cli.js --yolo
# Then prompt the model to call the `agent` tool twice with run_in_background: true.VerdictAll behavioral claims in the PR description reproduced on a tree merged with |
…ask module Folds PR #4324's per-instance BackgroundTaskRegistry cap into the collapsed tasks/agent-task.ts module as kind-local module state. The new architecture's per-kind module is the natural home for an agent-only behavior — the generic TaskRegistry no longer needs to carry it. What moves into tasks/agent-task.ts: - MAX_CONCURRENT_BACKGROUND_AGENTS and the env-var resolver - agentAssertCanStartBackground(registry) free function - The race-guard inside agentRegister() (skips re-registers of already-running entries so resume can recover) - setAgentBackgroundCapForTest() for test override (mirrors the existing setAgentNotificationCallback pattern) What moves out of agent.ts: - The early preflight calls agentAssertCanStartBackground instead of registry.assertCanStartBackgroundAgent(). - The post-register error path is unchanged structurally — it still aborts the bgAbortController, fires SubagentStop, cleans up the worktree, and returns the cap message to the model. Resume path (background-agent-resume.ts) now wraps agentRegister in the same try/catch #4324 added, but routes to the new free function. Also addresses wenshao's review on #3982: 1. config.ts — moves the four registerTaskKind(...) calls below all imports, instead of dangling between two import groups. 2. dispatcher.ts — comment updated to describe actual init order (module-load side effect in config.ts), removes the fictional registerAllTaskKinds() / Config.initialize() references. 3. monitor.ts — fixes a stale comment that still referred to monitorRegister(registry, ) (dangling comma). 4. monitor-task.ts — monitorAbortAll now clears the module-level owner-routed callbacks so a daemon process recycling sessions doesn't leak handlers to dead owner agents. 5. useBackgroundTaskView.ts — moves the registry shape probe to getAll() BEFORE buildMerged() so activity bursts and monitor event bumps don't pay the listDreamTasks cost. Tests added: - tasks/agent-task.test.ts — covers env resolution, cap enforcement (running-only, foreground exclusion, paused exclusion, resume race exception), and module-level cap reset. Tests fixed (rebase fallout): - nonInteractiveCli.test.ts — six structured-output assertions migrated from mockBackgroundTaskRegistry.abortAll to the module-level mockAgentAbortAll mock. - shell.test.ts — 13 assertions migrated from registry.{cancel,complete,fail} to shellTaskModule.shell{Cancel,Complete,Fail} (called with (registry, ...)). - useBackgroundTaskView.test.ts — sort assertions updated to the new descending-startTime active-bucket order. - agent.test.ts — adds the getStopHookBlockingCap mock that was lost in the auto-merge with PR #4208. - clearCommand.test.ts — drops the SessionStart assertion that was lost in the auto-merge with PR #4115. - background-agent-resume.test.ts — monitorRegistry is now a set of vi.spyOn against the new module-level free functions. - InlineParallelAgentsDisplay.test.tsx — makeRegistryConfig exposes getTaskRegistry (matching the renamed Config method).
…ask module Folds PR #4324's per-instance BackgroundTaskRegistry cap into the collapsed tasks/agent-task.ts module as kind-local module state. The new architecture's per-kind module is the natural home for an agent-only behavior — the generic TaskRegistry no longer needs to carry it. What moves into tasks/agent-task.ts: - MAX_CONCURRENT_BACKGROUND_AGENTS and the env-var resolver - agentAssertCanStartBackground(registry) free function - The race-guard inside agentRegister() (skips re-registers of already-running entries so resume can recover) - setAgentBackgroundCapForTest() for test override (mirrors the existing setAgentNotificationCallback pattern) What moves out of agent.ts: - The early preflight calls agentAssertCanStartBackground instead of registry.assertCanStartBackgroundAgent(). - The post-register error path is unchanged structurally — it still aborts the bgAbortController, fires SubagentStop, cleans up the worktree, and returns the cap message to the model. Resume path (background-agent-resume.ts) now wraps agentRegister in the same try/catch #4324 added, but routes to the new free function. Also addresses wenshao's review on #3982: 1. config.ts — moves the four registerTaskKind(...) calls below all imports, instead of dangling between two import groups. 2. dispatcher.ts — comment updated to describe actual init order (module-load side effect in config.ts), removes the fictional registerAllTaskKinds() / Config.initialize() references. 3. monitor.ts — fixes a stale comment that still referred to monitorRegister(registry, ) (dangling comma). 4. monitor-task.ts — monitorAbortAll now clears the module-level owner-routed callbacks so a daemon process recycling sessions doesn't leak handlers to dead owner agents. 5. useBackgroundTaskView.ts — moves the registry shape probe to getAll() BEFORE buildMerged() so activity bursts and monitor event bumps don't pay the listDreamTasks cost. Tests added: - tasks/agent-task.test.ts — covers env resolution, cap enforcement (running-only, foreground exclusion, paused exclusion, resume race exception), and module-level cap reset. Tests fixed (rebase fallout): - nonInteractiveCli.test.ts — six structured-output assertions migrated from mockBackgroundTaskRegistry.abortAll to the module-level mockAgentAbortAll mock. - shell.test.ts — 13 assertions migrated from registry.{cancel,complete,fail} to shellTaskModule.shell{Cancel,Complete,Fail} (called with (registry, ...)). - useBackgroundTaskView.test.ts — sort assertions updated to the new descending-startTime active-bucket order. - agent.test.ts — adds the getStopHookBlockingCap mock that was lost in the auto-merge with PR #4208. - clearCommand.test.ts — drops the SessionStart assertion that was lost in the auto-merge with PR #4115. - background-agent-resume.test.ts — monitorRegistry is now a set of vi.spyOn against the new module-level free functions. - InlineParallelAgentsDisplay.test.tsx — makeRegistryConfig exposes getTaskRegistry (matching the renamed Config method).
…ask module Folds PR #4324's per-instance BackgroundTaskRegistry cap into the collapsed tasks/agent-task.ts module as kind-local module state. The new architecture's per-kind module is the natural home for an agent-only behavior — the generic TaskRegistry no longer needs to carry it. What moves into tasks/agent-task.ts: - MAX_CONCURRENT_BACKGROUND_AGENTS and the env-var resolver - agentAssertCanStartBackground(registry) free function - The race-guard inside agentRegister() (skips re-registers of already-running entries so resume can recover) - setAgentBackgroundCapForTest() for test override (mirrors the existing setAgentNotificationCallback pattern) What moves out of agent.ts: - The early preflight calls agentAssertCanStartBackground instead of registry.assertCanStartBackgroundAgent(). - The post-register error path is unchanged structurally — it still aborts the bgAbortController, fires SubagentStop, cleans up the worktree, and returns the cap message to the model. Resume path (background-agent-resume.ts) now wraps agentRegister in the same try/catch #4324 added, but routes to the new free function. Also addresses wenshao's review on #3982: 1. config.ts — moves the four registerTaskKind(...) calls below all imports, instead of dangling between two import groups. 2. dispatcher.ts — comment updated to describe actual init order (module-load side effect in config.ts), removes the fictional registerAllTaskKinds() / Config.initialize() references. 3. monitor.ts — fixes a stale comment that still referred to monitorRegister(registry, ) (dangling comma). 4. monitor-task.ts — monitorAbortAll now clears the module-level owner-routed callbacks so a daemon process recycling sessions doesn't leak handlers to dead owner agents. 5. useBackgroundTaskView.ts — moves the registry shape probe to getAll() BEFORE buildMerged() so activity bursts and monitor event bumps don't pay the listDreamTasks cost. Tests added: - tasks/agent-task.test.ts — covers env resolution, cap enforcement (running-only, foreground exclusion, paused exclusion, resume race exception), and module-level cap reset. Tests fixed (rebase fallout): - nonInteractiveCli.test.ts — six structured-output assertions migrated from mockBackgroundTaskRegistry.abortAll to the module-level mockAgentAbortAll mock. - shell.test.ts — 13 assertions migrated from registry.{cancel,complete,fail} to shellTaskModule.shell{Cancel,Complete,Fail} (called with (registry, ...)). - useBackgroundTaskView.test.ts — sort assertions updated to the new descending-startTime active-bucket order. - agent.test.ts — adds the getStopHookBlockingCap mock that was lost in the auto-merge with PR #4208. - clearCommand.test.ts — drops the SessionStart assertion that was lost in the auto-merge with PR #4115. - background-agent-resume.test.ts — monitorRegistry is now a set of vi.spyOn against the new module-level free functions. - InlineParallelAgentsDisplay.test.tsx — makeRegistryConfig exposes getTaskRegistry (matching the renamed Config method).

Summary
run_in_backgroundlaunches can consume API quota and local resources without backpressure.Validation
QWEN_CODE_MAX_BACKGROUND_AGENTS=1 node dist/cli.js.packages/core/src/tools/agent/agent.ts.packages/core/src/agents/background-tasks.tsbefore the first completed.Cannot start background agent: maximum concurrent background agents (1) reached. Stop an existing agent first.QWEN_CODE_MAX_BACKGROUND_AGENTS=1 node dist/cli.js.agenttool calls withrun_in_background: truebefore the first completes.Scope / Risk
QWEN_CODE_MAX_BACKGROUND_AGENTS.npm run buildis currently blocked in this checkout by existingpackages/cli/src/serve/*ACP bridge/status type errors unrelated to this change.QWEN_CODE_MAX_BACKGROUND_AGENTS=<n>.Testing Matrix
Testing matrix notes:
npm run build,npm run typecheck, andnpm run lintpassed inpackages/core.npx vitest run src/agents/background-tasks.test.ts src/tools/agent/agent.test.tspassed inpackages/core.Linked Issues / Bugs