fix(core): serialize team task claims per agent and add mailbox lock parity#4981
Conversation
…parity The auto-claim busy-check had a TOCTOU: claimTask read isAgentBusy before taking the per-task lock, and the racing claims (scanIdleAgentsForTasks vs a message flush, for the same idle agent) held different task locks — so moving the check inside the per-task lock couldn't close it. Both passed the stale read and the agent ended up owning two in_progress tasks, breaking the one-task-per-agent invariant the UI and auto-claim rely on. Add a per-agent claim mutex (keyed by agentId) around the busy-check + claim so the second claim observes the first's committed write and bails. Distinct agents never block each other; the loser refuses on its next iteration. Separately, bring tasks.ts to parity with mailbox.ts's locking: an in-process per-file mutex (withTaskFileLock) wraps every task-file lock site so same-process writers queue in memory instead of stampeding the OS lockfile (the Windows ELOCKED cause — most acute when up to MAX_TEAMMATES claimants race the same first-pending task), plus randomize:true jitter on the retry backoff. Acquisition order is always agent-mutex → file-mutex → OS lock; only claimTask nests the two, and dependency cycles (which the reverse order would need) are rejected, so no deadlock. Also give the reciprocal edge-mirror writes a RECIPROCAL_CALLER sentinel instead of an empty callerName, so the intentional ownership-guard bypass is greppable; it can't collide with a sanitized [a-z0-9-] agent name. Regression tests cover the per-agent serialization (mutation-verified: fails with double-ownership when reverted), cross-agent same-task contention, and the sentinel bypass.
Test ReportUnit / integration (vitest)
Mutation check on the serialization fix: reverting the per-agent lock makes the regression test fail with the exact defect — the agent ends up owning 2 in-progress tasks ( Live smoke (no-regression)Bundled build, real model (glm-5.1), team flag on, yolo mode. Leader created a team with 3 teammates and 6 unowned tasks, left them to auto-claim, and reported the result:
Even 2 / 2 / 2 distribution, every task completed by exactly one teammate, all six files written with the expected content, team deleted cleanly. The concurrent auto-claim path (multiple idle teammates drawing from one pending pool) behaves correctly under the new locking. Scope note: the race itself is timing-dependent and not deterministically reproducible end-to-end, so this smoke run is a no-regression confirmation — the fix is proven by the deterministic unit + mutation tests above. |
|
Thanks for the PR! Template looks good ✓ — all required sections present, bilingual, with a clear test plan and evidence table. On direction: this is a straightforward correctness fix for a real TOCTOU race in the experimental Agent Team feature. The busy-check → claim gap letting the same agent end up owning two in-progress tasks is a genuine bug, not a theoretical one — On approach: the scope feels right. The three pieces each have a clear job:
The regression test driving the race directly with Moving on to code review. 🔍 中文说明感谢贡献! 模板完整 ✓ —— 所有必需章节齐全,双语,测试方案和证据表清晰。 方向:这是对实验性 Agent Team 功能中一个真实 TOCTOU 竞争的正确性修复。busy-check → claim 之间的时间窗口让同一 agent 可能同时拥有两个 in-progress 任务—— 方案:范围合理。三部分各有明确职责:
用 进入代码审查 🔍 — Qwen Code · qwen3.7-max |
Background — deferred task-concurrency items from the #4844 reviewThis PR is a fast-follow on #4844 (the Agent Team feature, squash-merged as
Explicitly out of scope (still deferred): the This is one of a series of follow-up PRs cut from the #4844 review backlog (companion: #4979, which fixed the teammate-identity loss on the approval-resume path). Each is independently reviewable and behind the experimental flag. 中文说明本 PR 是 #4844(Agent Team 特性,已 squash 合并为
明确不在本 PR 范围(仍延后): 本 PR 是从 #4844 评审待办中切出的系列后续之一(同系列:#4979,修复审批恢复路径上的队友身份丢失)。每个 PR 均可独立评审,且处于实验性 flag 之后。 |
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Code ReviewCode review is clean — no critical blockers or AGENTS.md violations. The PR does exactly what it says in three well-separated pieces:
The Test ResultsUnit tests (worktree, PR branch)Build + typecheck clean. Live smoke test (tmux, PR code via
|
ReflectionThis is a clean, well-motivated concurrency fix. The TOCTOU between The fix matches my independent proposal exactly: serialize the full busy-check → claim sequence per agent with an in-process mutex, and bring the task store's file locking to parity with the already-proven What I appreciate about this PR:
The live smoke test confirms the concurrent auto-claim path works correctly with the new locking — 6 tasks, 3 teammates, all files written, team cleaned up, no orphans. No concerns. Approving. 中文说明总结这是一个干净、动机充分的并发修复。 修复与我的独立方案完全一致:用进程内 mutex 为每个 agent 串行化完整的 busy-check → claim 序列,并将任务存储的文件锁对齐已验证的 回归测试直接用 实时冒烟测试确认并发自动认领路径在新锁下工作正常。无顾虑,批准。 — Qwen Code · qwen3.7-max |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
LGTM, looks ready to ship. ✅
| await unblockDependents(teamName, taskId, task.blocks); | ||
| // When a task completes, unblock any tasks that depend on it. | ||
| if (updates.status === 'completed' && task.blocks.length > 0) { | ||
| await unblockDependents(teamName, taskId, task.blocks); |
There was a problem hiding this comment.
[Suggestion] Defense-in-depth against self-edge deadlock: unblockDependents is called here while holding this task's in-process mutex. If task.blocks ever contained taskId itself, async-mutex's non-reentrant runExclusive would hang permanently. The tool layer (task-update.ts findDependencyCycle) rejects self-edges today, but updateTask is a public function with no self-edge guard — future callers could bypass the tool layer.
A one-line filter provides cheap insurance:
| await unblockDependents(teamName, taskId, task.blocks); | |
| await unblockDependents(teamName, taskId, task.blocks.filter((id) => id !== taskId)); |
— qwen3.7-max via Qwen Code /review
DragonnZhang
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI still running. — claude-sonnet-4-20250514 via Qwen Code /review
| // rather than an empty callerName so the intentional bypass is | ||
| // greppable in logs; it can never collide with a real teammate | ||
| // identity (agent names are sanitized to [a-z0-9-]). | ||
| const reciprocalUpdates: Array<Promise<unknown>> = []; |
There was a problem hiding this comment.
[Suggestion] The RECIPROCAL_CALLER sentinel was explicitly designed for the cross-owner edge-mirror case (a teammate editing its own task's edges touches a neighbor it may not own), but no test verifies this end-to-end through the tool layer.
The existing tasks.test.ts sentinel test exercises updateTask directly with addBlockedBy, and the task-update.test.ts mirror test uses unowned tasks — neither path triggers the ownership guard. The critical scenario (alice adds addBlocks:[bob's task], the reciprocal must write addBlockedBy into bob's task via the sentinel) is untested.
Consider adding a test in task-update.test.ts that creates two owned tasks and verifies the reciprocal mirror succeeds across ownership boundaries — this would catch any future refactor that breaks the sentinel passthrough.
— qwen3.7-max via Qwen Code /review
Local real-run verification report (maintainer, macOS)This is a concurrency-hardening PR with no deterministic UI symptom, so I verified it the way a race fix should be verified: prove the bug is real, prove the test catches it, then prove the invariant holds under heavy load — in-process, cross-process, and end-to-end with a real model. Summary: the per-agent claim fix is solid and well-proven — the bug reproduces on revert (mutation), the shipped test catches it, and the one-task-per-agent invariant held across 225+ stress scenarios in-process, a 6-process × 8-round cross-process race, and a real-model 3-teammate / 6-task team run that distributed cleanly 2/2/2 with zero double-ownership. No issues found. Default behavior is untouched (flag-gated). Environment
Static & unit checks
1. Mutation test — the fix is load-bearing and the test catches the bugReverted just the per-agent serialization in Restoring the fix → green again. So the bug is real (the busy-check TOCTOU genuinely produces double-ownership), and the shipped test is a true regression guard, not a tautology. 2. Concurrency stress (in-process) — invariant holds far beyond the 2-claim unit testWrote a throwaway stress test driving the real
I also re-applied the mutation under the stress test: the per-agent scenario failed immediately ( 3. Cross-process race — the
|
| Live smoke (3 teammates, 6 unowned tasks, auto-claim) | Result |
|---|---|
| Distribution across teammates | 2 / 2 / 2 (worker-1: 1,4 · worker-2: 3,6 · worker-3: 2,5) |
| Tasks owned by more than one teammate | none (every task file has exactly one owner) |
| Tasks left unclaimed / orphaned | none (all 6 completed) |
| Output files written & contents match owners | yes (out_1=worker-1, … all 6 consistent) |
Leftover .lock directories after run |
0 (clean lock release — important; a stale lock would wedge future ops) |
This reproduces the PR's reported smoke result exactly, and confirms the new in-process-mutex + jitter layer doesn't deadlock or break the normal team flow.
On the lock-ordering / deadlock claim
The PR states the fixed acquisition order (per-agent → per-file → OS lock) cannot deadlock because the reverse would need a dependency cycle the task system rejects. I didn't find a counter-example: in all stress/cross-process runs nothing hung or timed out, and lock files were always released (0 leftover). The nesting I can see in the code is consistent with the stated order (getAgentClaimLock(...).runExclusive(claimTaskLocked) → withTaskFileLock → lockfile.lock), and withTaskFileLock is never entered while already holding the same file's mutex.
Conclusion
For a race fix, this is about as well-evidenced as it gets: the defect reproduces on revert, the shipped test catches it, and the one-task-per-agent and one-owner-per-task invariants held under in-process stress (225+ scenarios), a real multi-process race, and a real-model team run — with clean lock release throughout. Types are clean, the change is flag-gated, and I found no functional issue. LGTM. (The CI yaml-parser failures are, as the PR notes, pre-existing on main and unrelated to this task-layer change.)
中文版(Chinese version)
本地真实运行验证报告(维护者,macOS)
这是一个并发加固 PR、没有确定性的 UI 表现,所以我按"竞争修复该有的方式"来验证:先证明 bug 真实、再证明测试能抓到、最后在重压下证明不变量成立——in-process、跨进程、以及真实模型端到端。结论:per-agent 认领修复扎实且证据充分——回退即复现(变异),随 PR 的测试能抓到,且"一个 agent 一个任务"不变量在 in-process 225+ 个压测场景、6 进程 × 8 轮跨进程竞争、以及真实模型 3 队友 / 6 任务的团队运行(干净 2/2/2、零双重拥有)中均成立。 未发现问题。默认行为不变(flag 门控)。
环境
- macOS 26.5 (arm64), Node v22.22.2, tmux 3.6a
- PR head
1f31666f(已 merge main),npm run bundle→node dist/cli.js - 真实模型
qwen3.7-max;agent team 经QWEN_CODE_ENABLE_AGENT_TEAM=1;隔离HOME
静态与单元检查
| 检查项 | 结果 |
|---|---|
tasks.test.ts(含两个新竞争测试) |
✅ 53/53 |
core 的 tsc --noEmit |
✅ 干净 |
1. 变异测试 —— 修复是承重的,且测试能抓到 bug
只回退 claimTask 里的 per-agent 序列化(把 getAgentClaimLock(agentId).runExclusive(...) 换成直接调用),重跑随 PR 的竞争测试:
✗ serializes concurrent busy-checked claims for the same agent (no double-ownership)
expected 2 to be 1 ← 两个认领都成功;agent 拥有了两个 in_progress 任务
恢复修复后即恢复绿。所以 bug 真实(busy-check 的 TOCTOU 确实造成双重拥有),且随 PR 的测试是真正的回归护栏,不是空过。
2. 并发压测(in-process)—— 不变量远超 2-认领单测仍成立
写了一个一次性压测,直接驱动真实 tasks.ts 打真实文件系统,验证后已删除。所有场景通过:
| 压测场景 | 规模 | 结果 |
|---|---|---|
| 同一 agent、N 个并发 busy-check 认领、N 个任务 | 宽度 2/3/5/8/12 × 25 轮(125 场景,最多 12 并发) | ✅ 该 agent 始终恰好 1 个 in_progress |
| N 个 agent 抢同一任务 | 宽度 2/4/8/16 × 25 轮(100 场景,最多 16 并发) | ✅ 始终恰好 1 个 owner,且磁盘 owner == 唯一返回的赢家 |
| 5 agent × 4 任务,100 个认领一齐发出 | 20 轮 | ✅ 每个 agent ≤ 1 任务;无孤儿 in_progress |
我还在压测下重新注入变异:per-agent 场景立即失败(第一个宽度就 expected 2 to be 1),而 per-task 场景仍通过——证明变异只破坏 per-agent 不变量,正是修复的靶点(特异性好)。
3. 跨进程竞争 —— proper-lockfile 层在真实 OS 进程间成立
in-process mutex 无法覆盖独立进程,所以我直接驱动文件锁层:6 个独立 node 进程,各为不同 agent,争抢同一任务文件(经 QWEN_HOME 共享目录),带小幅随机错峰,重复 8 轮:
round 1: winners=1 | ondisk=[in_progress agent5]
round 2: winners=1 | ondisk=[in_progress agent2]
…(赢家在 agent5/2/3/1 间轮换——真随机竞争)…
RESULT: PASS (每轮:恰好 1 赢家、1 磁盘 owner)
4. 真实冒烟(真实模型,tmux)—— 无回归,干净 2/2/2
跑了 PR 自己的冒烟:QWEN_CODE_ENABLE_AGENT_TEAM=1 … --approval-mode yolo,提示 = 建 3 队友团队、6 个无主任务(各写 out_N.txt)、自动认领、汇报分工。Ground-truth 取自磁盘任务存储与输出文件:
| 真实冒烟(3 队友、6 无主任务、自动认领) | 结果 |
|---|---|
| 队友间分配 | 2 / 2 / 2(worker-1: 1,4 · worker-2: 3,6 · worker-3: 2,5) |
| 被多名队友拥有的任务 | 无(每个任务文件恰好一个 owner) |
| 未认领 / 孤儿任务 | 无(全部 6 个 completed) |
| 输出文件写入且内容与 owner 一致 | 是(out_1=worker-1 … 全部 6 个一致) |
运行后残留 .lock 目录 |
0(锁干净释放——很重要;残留锁会卡住后续操作) |
完全复现了 PR 报告的冒烟结果,并确认新的 in-process-mutex + 抖动层不会死锁或破坏正常团队流程。
关于锁顺序 / 死锁的说法
PR 称固定获取顺序(按 agent → 按文件 → OS 锁)不会死锁,因为反向需要任务系统会拒绝的依赖环。我没找到反例:所有压测/跨进程运行中都没有挂起或超时,锁文件始终被释放(0 残留)。代码里可见的嵌套与所述顺序一致(getAgentClaimLock(...).runExclusive(claimTaskLocked) → withTaskFileLock → lockfile.lock),且 withTaskFileLock 不会在已持有同一文件 mutex 时再次进入。
结论
对一个竞争修复而言,这已是相当充分的证据:回退即复现、随 PR 的测试能抓到,且"一个 agent 一个任务""一个任务一个 owner"两个不变量在 in-process 压测(225+ 场景)、真实多进程竞争、真实模型团队运行中都成立——全程锁干净释放。类型干净、改动 flag 门控、未发现功能问题。LGTM。(CI 的 yaml-parser 失败正如 PR 所述是 main 上既有的、与本任务层改动无关。)
What this PR does
Hardens how an experimental team hands pending tasks to teammates so that, under concurrent auto-claim, the same teammate can never be assigned two tasks at the same moment. Also brings the task store's file locking in line with the team mailbox's — in-process serialization plus jittered retries — so many teammates racing for the same task don't stampede the file lock.
Why it's needed
When several tasks are waiting and a teammate goes idle, two internal triggers can try to give that teammate work at the same instant. A timing gap between the "is this agent already busy?" check and the actual claim let both succeed, leaving one teammate owning two in-progress tasks at once. That breaks the one-task-per-agent model the task view and auto-claim assume, and transiently misreports the agent's load. Separately, when many teammates race to grab the same first pending task, they could all hammer the task file lock in lockstep — the same contention pattern that caused Windows lock flakiness in the mailbox before it got an in-process queue and backoff jitter.
This is behind the
experimental.agentTeamflag; default behavior is unchanged.Reviewer Test Plan
How to verify
The defect is an internal, timing-dependent race with no deterministic UI symptom, so the primary proof is a unit test that drives the race directly, backed by a live smoke run for no-regression.
QWEN_CODE_ENABLE_AGENT_TEAM=1 node dist/cli.js --approval-mode yoloEvidence (Before & After)
This is internal concurrency hardening — no visual change. Smoke run below confirms normal distribution is intact; the race itself is covered by the unit + mutation tests.
Tested on
Environment (optional)
Local bundled build (
node dist/cli.js), real model (glm-5.1 via OpenAI-compatible auth). Unit + integration suites run under vitest.Risk & Scope
Linked Issues
Fast-follow on #4844 (Agent Team). Lands deferred task-concurrency items from that review: the
claimTaskbusy-check TOCTOU (r3375723230 → Critical r3385688175), mailbox lock parity (round-5 review), and the reciprocal-edge sentinel (r3385145436). Companion follow-up: #4979. Full background in the comment below.中文说明
这个 PR 做了什么
加固实验性团队把待办任务分配给队友的方式:在并发自动认领下,同一名队友绝不会在同一时刻被分到两个任务。同时让任务存储的文件锁与团队收件箱保持一致——进程内串行 + 带抖动的重试——这样多名队友争抢同一个任务时不会一起冲击文件锁。
为什么需要
当有多个任务待办且某队友空闲时,两个内部触发点可能在同一瞬间尝试给该队友派活。"该 agent 是否已忙"的检查与真正认领之间存在时间窗口,使两者都成功,导致一名队友同时拥有两个 in-progress 任务。这破坏了任务视图与自动认领所假定的"一个 agent 一个任务"模型,并会暂时误报该 agent 的负载。另外,当多名队友争抢同一个首个待办任务时,可能齐步冲击文件锁——正是收件箱在引入进程内队列与退避抖动之前导致 Windows 锁抖动的那种竞争模式。
本改动位于
experimental.agentTeamflag 之后;默认行为不变。复核测试方案
该缺陷是内部的、依赖时序的竞争,没有确定性的 UI 表现,因此主要证据是直接触发竞争的单元测试,辅以真实冒烟运行做无回归确认。
QWEN_CODE_ENABLE_AGENT_TEAM=1 node dist/cli.js --approval-mode yolo证据(冒烟结果)
属内部并发加固,无视觉变化。下方冒烟运行确认正常分配未受影响;竞争本身由单元 + 变异测试覆盖。
风险与范围