fix: preserve subagent delivery after lock stalls#86540
Conversation
|
Codex review: needs real behavior proof before merge. Reviewed May 29, 2026, 1:20 AM ET / 05:20 UTC. Summary PR surface: Source +46, Tests +15. Total +61 across 5 files. Reproducibility: yes. from source and supplied before-proof: current main still clears completion fallback on suspended delivery discard and reports only pid-level lock timeout ownership. I did not run the repro locally in this read-only review. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land a rebased branch after maintainers accept or improve the live proof, coordinate the overlapping session-lock work, and get required CI green or explicitly waived. Do we have a high-confidence way to reproduce the issue? Yes, from source and supplied before-proof: current main still clears completion fallback on suspended delivery discard and reports only pid-level lock timeout ownership. I did not run the repro locally in this read-only review. Is this the best way to solve the issue? Mostly yes: the code path is narrow and reuses existing completion fallback and lock-inspection state rather than adding a new scheduler or config surface. The unresolved part is merge readiness, especially live proof, CI, and lock-path coordination. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 13cb9f82777e. Label changesLabel justifications:
Evidence reviewedPR surface: Source +46, Tests +15. Total +61 across 5 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
|
Thanks for the PR. Before this can move forward, please add live proof from the affected surface, not just unit tests, mocked tests, or source inspection. A useful proof update should include:
Please redact secrets, tokens, phone numbers, and private message content from logs or screenshots. |
martingarramon
left a comment
There was a problem hiding this comment.
Verified at f9c6fc7d.
Two distinct changes in this PR:
Lock timeout diagnostics (session-write-lock.ts): formatSessionLockTimeoutOwner assembles pid + liveness + starttime + age into the timeout error. The pinCurrentProcessStartTimeForTest addition in the test is needed to make starttime assertions deterministic. ✓
Durable fallback preservation (subagent-registry.ts): before expiring a suspended delivery, checks frozenResultText ?? fallbackFrozenResultText and persists it into completion.fallbackResultText if present. discardReason distinguishes "expired-after-durable-fallback" from "expired". The type in subagent-registry.types.ts is updated to match. ✓
Three CI failures need resolution before merge:
check-prod-typesandcheck-additional-extension-package-boundary— likely the newfallbackResultText/fallbackCapturedAtfields on thecompletiontype are not exported through the package boundary; verifysrc/agents/subagent-registry.types.tsre-exports reach the extension bundle entry.check-test-types— if a new field is referenced in a test that the type declaration doesn't yet include.
LGTM pending CI.
|
Maintainer prep update for Evidence collected
Review result Current blocker I would not call this fully land-ready while required CI is red. The PR-specific code/test evidence is clean; remaining decision is whether to fix/override the unrelated qa-lab CI blocker and whether the targeted Azure before/after proof is enough versus requiring a live gateway/session replay. |
Summary
Linked context
Which issue does this close?
Closes #86537
Closes #86538
Which issues, PRs, or discussions are related?
Related #86537
Related #86538
Was this requested by a maintainer or owner?
Local bug evidence under
/mnt/c/OpenClaw/bugs/BUG-035-subagent-terminal-reconciliation-times-out-then-discards-deliveryand/mnt/c/OpenClaw/bugs/BUG-036-session-write-lock-timeouts-block-main-and-subagent-lanes.Real behavior proof (required for external PRs)
silver-prawn/cbx_e06ef86ae274; after proof used Azure Crabboxviolet-lobster/cbx_3eda7a42d715.node scripts/run-vitest.mjs src/agents/subagent-registry.test.ts src/agents/session-write-lock.test.ts;git diff --check;.agents/skills/autoreview/scripts/autoreview --mode local; Azure Crabbox before/after focused regression proof.RUN v4.1.7 /home/galini/GitHub/worktrees/bug-035-036-subagent-delivery-locks Test Files 4 passed (4) Tests 144 passed (144) Start at 15:01:23 Duration 16.11s autoreview clean: no accepted/actionable findings reported overall: patch is correct (0.84)Azure Crabbox after-fix evidence log:
discardReason: "expired-after-durable-fallback"pluscompletion.fallbackResultText; the session lock regression observes timeout owner text withpid=,alive=true,starttime=, andcurrentStarttime=. Azure Crabbox after-fix proof passed the focused regression set at headf9c6fc7de291905233e7d5ca65201f6d299e00ab.Redacted original symptom evidence:
Azure Crabbox before-fix evidence log:
Tests and validation
Which commands did you run?
node scripts/run-vitest.mjs src/agents/subagent-registry.test.ts src/agents/session-write-lock.test.tsgit diff --check.agents/skills/autoreview/scripts/autoreview --mode localsilver-prawn/cbx_e06ef86ae274, basec4bce00727ec5f6a5b9c78e0ad34524f5dc1db54, observedBEFORE_FIX_TEST_EXIT=1andBEFORE_FIX_EXPECTED_REGRESSION=1.violet-lobster/cbx_3eda7a42d715, headf9c6fc7de291905233e7d5ca65201f6d299e00ab, observedAFTER_FIX_TEST_EXIT=0andAFTER_FIX_REGRESSION_PASSED=1.What regression coverage was added or updated?
src/agents/subagent-registry.test.tsnow verifies durable fallback preservation on suspended final-delivery expiry.src/agents/session-write-lock.test.tsnow verifies timeout owner diagnostics include liveness/starttime details.What failed before this fix, if known?
c4bce00727ec5f6a5b9c78e0ad34524f5dc1db54and observed the expected regression failure.If no test was added, why not?
N/A
Risk checklist
Did user-visible behavior change? (
Yes/No)Yes
Did config, environment, or migration behavior change? (
Yes/No)No
Did security, auth, secrets, network, or tool execution behavior change? (
Yes/No)No
What is the highest-risk area?
Subagent final-delivery cleanup state.
How is that risk mitigated?
The delivery payload is still compacted on expiry, but final text is copied into the existing completion fallback field before the payload is removed; pressure-pruned deliveries remain unchanged.
Current review state
What is the next action?
Maintainer review and CI.
What is still waiting on author, maintainer, CI, or external proof?
Live replay of the original private sessions was not run. Required CI is currently blocked by an unrelated type error in
extensions/qa-lab/src/live-transports/whatsapp/whatsapp-live.runtime.ts:872.Which bot or reviewer comments were addressed?
Autoreview reported no accepted/actionable findings. Azure Crabbox before/after focused evidence was added to the PR description.