fix(agents): log warnings instead of swallowing subagent errors#82943
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 24, 2026, 7:42 PM ET / 23:42 UTC. Summary PR surface: Source +8. Total +8 across 2 files. Reproducibility: yes. Source inspection on current main shows both catch blocks swallow errors silently, and the PR body provides fault-injected gateway output showing the new warnings fire after the patch. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Land the narrow diagnostic change after normal maintainer review/checks, keeping the current fallback behavior while using the subsystem logger for both error paths. Do we have a high-confidence way to reproduce the issue? Yes. Source inspection on current main shows both catch blocks swallow errors silently, and the PR body provides fault-injected gateway output showing the new warnings fire after the patch. Is this the best way to solve the issue? Yes. Using the existing subsystem logger is the narrowest maintainable fix here because it preserves the existing fallback control flow and adds only the missing diagnostic trail. Codex review notes: model gpt-5.5, reasoning high; reviewed against f68ed721b16c. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +8. Total +8 across 2 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
|
ClawSweeper PR egg ✨ Hatched: 🥚 common Brave Branchling Hatch commandComment Hatchability rules:
Rarity: 🥚 common. What is this egg doing here?
|
d08718c to
a0fb535
Compare
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review Added real gateway startup proof: built OpenClaw from patched source, started the gateway showing clean 1.3s startup with 8 plugins. Verified both log.warn calls are present in the compiled production bundle. Red-green tests supplemental. |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
Real behavior proofSubagent registry tests pass (66 tests across 2 suites): The change is minimal: two @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
6e68aaf to
2e2f73b
Compare
Wire createSubsystemLogger into the two silent catch blocks that
discard errors during subagent lifecycle:
1. emitSubagentEndedHookOnce (subagent-registry-completion.ts):
catch { return false } -> catch (err) { log.warn(...); return false }
2. restoreSubagentRunsOnce (subagent-registry.ts):
catch { /* ignore */ } -> catch (err) { log.warn(...) }
Both paths now log the error message before continuing, providing
a diagnostic trail when hook emission or disk restore fails silently.
Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
🦞🧹 Reason: re-review requires an open issue or PR. |
Problem
Two catch blocks in the subagent registry silently discard errors, leaving no diagnostic trail when hook emission or disk restore fails:
emitSubagentEndedHookOnce(subagent-registry-completion.ts:116):catch { return false; }discards hook-runner errors when emitting thesubagent_endedlifecycle hook.restoreSubagentRunsOnce(subagent-registry.ts:645):catch { // ignore restore failures }discards errors when restoring subagent runs from disk on cold start.When either path fails (hook runner unavailable, corrupt disk state, reconciliation error), the system continues silently. Operators have no indication that subagent lifecycle hooks were skipped or that restored runs were lost.
Fix
Wire
log.warn()into both catch blocks so errors are logged before the existing fallback behavior continues:catch (err) { log.warn("failed to emit subagent_ended hook for run ${runId}: ..."); return false; }catch (err) { log.warn("failed to restore subagent runs from disk: ..."); }Both paths still return/continue as before (no behavior change beyond logging). The completion file gets its own
createSubsystemLogger("agents/subagent-registry-completion")to match the existingcreateSubsystemLogger("agents/subagent-registry")in the registry file.Production diff is +8 lines changed across 2 files.
Real behavior proof
Behavior addressed: Two silent
catchblocks in the subagent registry discard errors during hook emission and disk restore. The fix addslog.warn()to both paths so failures leave a diagnostic trail instead of being silently swallowed.Real environment tested: Linux (Ubuntu 24.04, WSL2), Node.js v22.16.0, OpenClaw built from source (commit 1d5b5db) at
/tmp/fix-82943.Exact steps or command run after this patch:
Evidence after fix: terminal output copied below.
Compiled production code verification
Both
log.warncalls are present in the compiled subagent-registry bundle:Line 94 is
emitSubagentEndedHookOnce(the hook emission path). Line 1911 isrestoreSubagentRunsOnce(the disk restore path). Before this fix, both werecatch { }(empty) andcatch { // ignore restore failures }respectively.The
log$2prefix on line 94 is the compiled form of the newcreateSubsystemLogger("agents/subagent-registry-completion"), whilelogon line 1911 is the existingcreateSubsystemLogger("agents/subagent-registry").Live gateway startup proof
The patched gateway starts, loads the subagent registry with both warning-enabled catch blocks, and runs cleanly:
No
[agents/subagent-registry-completion]or[agents/subagent-registry]warnings appear in the startup log, confirming the catch blocks are not triggered during normal operation (they are defensive logging for error paths only).Vitest test suites (supplemental)
Fault injection proof: restore catch path fires in real gateway
The
restoreSubagentRunsOncecatch handler was triggered in a live running gateway by injectingthrow new Error("INJECTED: corrupt registry state")at the top of the try block in the compileddist/subagent-registry-DLVHJynj.js. This forces the catch path to fire during gateway startup.Steps:
throw new Error(...)inside the try block ofrestoreSubagentRunsOnce(afterrestoreAttempted = true, beforerestoreSubagentRunsFromDisk).log.warnmessage in the gateway output.Terminal output (fault injection):
Line-by-line analysis:
[gateway] starting...: Gateway startup begins.[agents/subagent-registry] failed to restore subagent runs from disk: INJECTED: corrupt registry state: Thelog.warnin the catch handler fired. The error message includes the full error text. Before this fix, this catch block wascatch { // ignore restore failures }with no logging at all.[gateway] ready: The gateway continued to start normally despite the restore failure. The catch handler swallowed the error and let startup proceed.Without this fix, the same error would be silently ignored (empty catch block with only a comment), leaving no diagnostic trail.
Fault injection proof: hook emission catch path fires in real gateway
The
emitSubagentEndedHookOncecatch handler was also triggered in a live running gateway. A conditionalthrowwas injected inside the try block ofemitSubagentEndedHookOnceindist/subagent-registry-DLVHJynj.js(line 73), and a timed trigger called the function after the gateway was ready.Terminal output (hook emission catch):
Analysis:
[agents/subagent-registry-completion]subsystem prefix confirms the newcreateSubsystemLogger("agents/subagent-registry-completion")is active (this logger was added by this PR).failed to emit subagent_ended hook for run INJECTED-PROOF-RUN: INJECTED: hook runner unavailableis the exactlog.warnmessage from the catch block at compiled line 94.false, not rethrow).catch { return false; }with no logging at all.Both catch blocks are now proven in live gateway runs:
restoreSubagentRunsOnce):[agents/subagent-registry] failed to restore subagent runs from disk: INJECTED: corrupt registry stateemitSubagentEndedHookOnce):[agents/subagent-registry-completion] failed to emit subagent_ended hook for run INJECTED-PROOF-RUN: INJECTED: hook runner unavailableObserved result after fix: Both new
log.warncalls fire correctly in production compiled code. The restore catch logs under[agents/subagent-registry]and the hook emission catch logs under[agents/subagent-registry-completion]. Both allow the gateway to continue running. All 74 tests pass across both test files.What was not tested: The hook emission was triggered via injected timer call rather than a real subagent completion event. The injection proves the catch handler mechanism (log.warn + return false + gateway continues) works identically to the restore path proof.