fix(cli): drain runExitCleanup before process.exit in error handlers#3602
Conversation
handleError / handleCancellationError / handleMaxTurnsExceededError all called process.exit synchronously, bypassing the caller's runExitCleanup -> Config.shutdown -> chat-recording flush() chain on SIGINT, max-turn, and fatal-error paths. Same family as the EPIPE bypass fixed in bf24fff, just on a different code path. Makes the three handlers async and routes the exit through a shared exitAfterCleanup() helper that awaits runExitCleanup() before the actual process.exit. The helper carries an exit-once latch so a SIGINT racing a stream rejection (handleCancellationError + handleError fired concurrently) doesn't end up running cleanup twice or interleaving exit calls — only the first caller drains and exits, the second parks on a never-resolving promise that's killed when process.exit fires. Text-mode handleError still throws to the caller (unchanged behavior), but now drains the queue first so the unhandled-rejection path doesn't lose chat-recording records. Five call sites in nonInteractiveCli.ts updated to await. Existing 11 errors.test.ts cases adapted to async + rejects.toThrow; added 5 new regression guards covering cleanup-before-exit ordering for each handler plus the concurrent-handler race.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
zhangxy-zju
left a comment
There was a problem hiding this comment.
LGTM. The fix closes the last process.exit-bypasses-runExitCleanup path that wasn't covered by #3581 — same root cause as the EPIPE fix in bf24fff1f, just on the SIGINT / max-turn / fatal-error legs. Verified the call surface: grep confirms the 5 sites in nonInteractiveCli.ts are the only callers of these three handlers in the CLI package, all converted to await.
A few things I checked while reviewing:
- Concurrency latch. Module-level
exiting+ park-forever promise is a clean way to handle the SIGINT-races-stream-rejection case. The "second caller suspends in an unresolved promise" pattern is fine because the first caller'sprocess.exitwill tear down the process. Test isolation via_resetExitLatchForTestinbeforeEachis necessary — without it the latch leaks across cases. - Text-mode behavior change.
handleErrornowawait runExitCleanup()before re-throwing. SincerunExitCleanupclears the cleanup array on completion, the top-levelrunExitCleanupingemini.tsxbecomes a no-op on the re-throw path. Cleanup runs once, just earlier than before — no functional regression. return process.exit(code)inexitAfterCleanup. The TS comment is correct:awaitdoesn't propagatenever, so the explicitreturnis needed for the function to type-check asPromise<never>.- Naming convention.
_resetExitLatchForTestmatches the_reset*ForTestpattern established ind6485964c(_resetCleanupFunctionsForTest,_resetValidatePathCacheForTest, etc.). - CI. 13/13 green across macOS / Ubuntu / Windows × Node 20/22/24, plus Lint and CodeQL.
After this lands, the invariant "every terminating path drains runExitCleanup first" holds across all 4 known exit shapes (normal return, EPIPE, SIGINT, fatal error / max turns). Worth a follow-up note in CONTRIBUTING.md or a comment on process.exit itself so future contributors know to route through exitAfterCleanup instead of calling process.exit directly.
…wenLM#3602) handleError / handleCancellationError / handleMaxTurnsExceededError all called process.exit synchronously, bypassing the caller's runExitCleanup -> Config.shutdown -> chat-recording flush() chain on SIGINT, max-turn, and fatal-error paths. Same family as the EPIPE bypass fixed in 3dc263fcd, just on a different code path. Makes the three handlers async and routes the exit through a shared exitAfterCleanup() helper that awaits runExitCleanup() before the actual process.exit. The helper carries an exit-once latch so a SIGINT racing a stream rejection (handleCancellationError + handleError fired concurrently) doesn't end up running cleanup twice or interleaving exit calls — only the first caller drains and exits, the second parks on a never-resolving promise that's killed when process.exit fires. Text-mode handleError still throws to the caller (unchanged behavior), but now drains the queue first so the unhandled-rejection path doesn't lose chat-recording records. Five call sites in nonInteractiveCli.ts updated to await. Existing 11 errors.test.ts cases adapted to async + rejects.toThrow; added 5 new regression guards covering cleanup-before-exit ordering for each handler plus the concurrent-handler race. Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
Summary
Closes the last
process.exit-bypasses-runExitCleanuppath that didn't make it into #3581 (the merged PR addressed the EPIPE + cleanup-failsafe pieces of the same family).handleError/handleCancellationError/handleMaxTurnsExceededErrorare called from the SIGINT, max-turn, and fatal-error paths innonInteractiveCli.ts. They were synchronous and calledprocess.exitdirectly — which bypassesrunExitCleanup→Config.shutdown→chatRecording.flush(), silently dropping queued JSONL writes from the most recent turn.After #3581 the EPIPE path was fixed by destroying stdout instead of exiting, but the SIGINT / fatal-error paths still terminated through the same shape. This PR closes that.
What changed
handleError/handleCancellationError/handleMaxTurnsExceededErrorare nowasync ...: Promise<never>and route their exit through a sharedexitAfterCleanup(code)helper thatawait runExitCleanup()first.handleErrorstill re-throws to the caller (unchanged behaviour) but drains the queue first so the unhandled-rejection path doesn't lose records.handleCancellationError+handleErrorfiring in parallel) don't run cleanup twice or double-exit. Only the first caller drains and exits; the second parks in a never-resolving promise and dies whenprocess.exitfires._resetExitLatchForTestfollows the underscore-prefix convention from d648596 (_resetCleanupFunctionsForTest,_resetValidatePathCacheForTest, etc.).nonInteractiveCli.tsupdated toawait. Existing 11errors.test.tscases adapted toasync + rejects.toThrow. Added 5 new regression guards covering cleanup-before-exit ordering for each handler plus the concurrent-handler race.Test plan
npx vitest run packages/cli/src/utils/errors.test.ts packages/cli/src/utils/cleanup.test.ts packages/cli/src/nonInteractiveCli.test.ts— 75 passing locallynpm run typecheckclean (thereturn process.exit(code)inexitAfterCleanupis needed so thenevernarrows past theawait)npx eslintcleanContext
This work was originally folded into #3581 as commit 718ebb3 but landed ~1h after that PR was squash-merged, so it didn't make it into main. Same theme, same fix shape — putting it in its own PR for clean review. Re-pinging @zhangxy-zju who reviewed #3581 and would be the natural reviewer here.