Skip to content

fix: fatal hard-crash on loop detection via unhandled AbortError#20108

Merged
scidomino merged 3 commits intogoogle-gemini:mainfrom
hsm207:fix/loop-abort-crash
Apr 23, 2026
Merged

fix: fatal hard-crash on loop detection via unhandled AbortError#20108
scidomino merged 3 commits intogoogle-gemini:mainfrom
hsm207:fix/loop-abort-crash

Conversation

@hsm207
Copy link
Copy Markdown
Contributor

@hsm207 hsm207 commented Feb 24, 2026

Summary

This PR fixes a fatal hard-crash in the GeminiClient that occurs when the loop detection service identifies repetitive content and attempts to stop the stream. It resolves a synchronous AbortError race condition that can crash the Node.js process by bypassing high-level error handling.

Details

The crash is caused by a race condition within the @google/genai SDK's async iterator when an abort is triggered during yield suspensions. To resolve this, the GeminiClient now immediately breaks the streaming loop upon detection of a loop or reaching a turn limit. This ensures the system exits the unmonitored SDK state safely before any abort-related cleanup occurs.

Key Changes:

  • Loop Termination: Refactored processTurn to immediately break the for await loop when a loop is detected.
  • Cleanup: Removed unused AbortController and linkedSignal declarations to simplify the logic.
  • Graceful Exit: Ensured the CLI exits with Code 0 in headless mode when a loop is detected.
  • Regression Tests: Verified the fix using the updated regression tests in packages/core/src/services/loopDetectionService.test.ts.

Related Issues

Fixes #20106.
Related to:

  • #10952: "Reason: AbortError: The operation was aborted." (Same underlying race condition).
  • #18028: "Prevent hard UI crashes on render errors" (Addressing the hard-crash aspect of this bug).

How to Validate

  1. Install dependencies: npm install
  2. Perform an initial build: npm run build
  3. Execute the reproduction command:
npm start -- -p "Repeat the following sequence exactly 50 times: la li lu le lo"

Expected Result: The loop is detected, the generation stops gracefully, and the application exits with Code 0 instead of crashing with a stack trace.

Pre-Merge Checklist

  • Updated relevant documentation and README (Finalized code comments)
  • Added/updated tests (Verified in loopDetectionService.test.ts)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • Linux
      • npm run
      • npx

@hsm207 hsm207 requested a review from a team as a code owner February 24, 2026 01:16
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @hsm207, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical hard-crash issue within the GeminiClient that occurred when the loop detection service attempted to stop a streaming response. The crash stemmed from a race condition involving AbortError and the @google/genai SDK's async iterator, leading to an unhandled exception. The changes implement a more robust abort signal handling strategy and refine the loop detection termination process, ensuring that the application gracefully exits without crashing when repetitive content is identified.

Highlights

  • Fatal Crash Fix: Resolved a hard-crash in the GeminiClient caused by a synchronous AbortError during loop detection, which previously bypassed high-level error handling in the @google/genai SDK's async iterator.
  • Abort Signal Handling Refinement: Replaced AbortSignal.any with manual signal linking and a guarded try-catch block for controller.abort() to prevent uncaught exceptions when a parent signal aborts.
  • Loop Detection Logic Update: Refactored the loop detection mechanism to immediately break the streaming loop upon detection, ensuring a safe exit before any abort-related cleanup occurs.
  • Regression Test Addition: Introduced a new comprehensive test case (loopCrashReproduction.test.ts) to reliably reproduce the synchronous emission crash and validate the fix.
Changelog
  • packages/core/src/core/client.ts
    • Replaced AbortSignal.any with a manual addEventListener for the parent signal to trigger the internal controller.abort(), wrapped in a try-catch block for safety.
    • Modified the for await loop to break immediately when a loop is detected, setting a detectedLoop flag.
    • Introduced a try-catch block around the for await loop to gracefully handle AbortError as a valid termination signal.
    • Ensured the abort event listener is removed in a finally block.
    • Updated the turn.run call to use controller.signal directly.
  • packages/core/src/core/loopCrashReproduction.test.ts
    • Added a new test file to specifically reproduce the AbortError crash scenario during loop detection and verify the fix.
Activity
  • Updated code comments for relevant documentation.
  • Added a new test file packages/core/src/core/loopCrashReproduction.test.ts to cover the fix.
  • Validated the changes on Linux using npm run and npx.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a critical crash caused by a race condition in the Gemini SDK's signal handling by transitioning to manual signal linking and a more controlled exit from the streaming loop. However, the manual signal linking in GeminiClient.processTurn has several critical flaws. It fails to properly handle already-aborted signals, leading to resource and memory leaks, and the fix for the fatal crash is incomplete, as parent signal aborts can still trigger the crash during stream suspensions. Additionally, there's a risk of leaking event listeners if errors occur during turn initialization. The review comments provide detailed feedback on ensuring the internal controller is aborted for already-aborted signals, improving listener cleanup robustness, and fully mitigating the crash.

Comment thread packages/core/src/core/client.ts Outdated
Comment on lines +638 to +649
const controller = new AbortController();
const linkedSignal = AbortSignal.any([signal, controller.signal]);
const onAbort = () => {
try {
controller.abort();
} catch (e) {
debugLogger.debug(
'Error aborting internal controller from parent signal',
e,
);
}
};
signal.addEventListener('abort', onAbort, { once: true });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The manual linking of AbortSignal fails to check if the parent signal is already aborted, causing the internal AbortController to never abort. This leads to unnecessary LLM requests (turn.run), resource consumption, and potential session hangs. Crucially, the fix for the fatal crash is incomplete; if the parent signal is aborted during a yield suspension, controller.abort() can still trigger the uncatchable fatal crash this PR aims to resolve. Additionally, there is a significant risk of leaking the onAbort listener on the parent signal object if an error occurs during the initialization phase (e.g., in this.loopDetector.turnStarted, router.route, or this.setTools) before the try block at line 706 is reached.

    const controller = new AbortController();
    const onAbort = () => {
      try {
        controller.abort();
      } catch (e) {
        debugLogger.debug(
          'Error aborting internal controller from parent signal',
          e,
        );
      }
    };

    if (signal.aborted) {
      onAbort();
    } else {
      signal.addEventListener('abort', onAbort, { once: true });
    }
References
  1. Asynchronous operations that can be cancelled by the user should accept and propagate an AbortSignal to ensure cancellability and prevent dangling processes or network requests. The current implementation's failure to properly link and handle AbortSignal leads to resource consumption and potential session hangs, directly contradicting the goal of proper cancellability.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I've refined the implementation to ensure signal linking happens within the safety of the try-catch block and added explicit pre-abort checks to prevent any potential leaks or unhandled exceptions from the underlying SDK's custom iterator.

@hsm207 hsm207 force-pushed the fix/loop-abort-crash branch 3 times, most recently from 47454ed to c76e6b7 Compare February 24, 2026 04:38
@hsm207 hsm207 force-pushed the fix/loop-abort-crash branch 7 times, most recently from 48bf197 to 4d4e86a Compare February 24, 2026 13:32
@gemini-cli gemini-cli Bot added area/agent Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality help wanted We will accept PRs from all issues marked as "help wanted". Thanks for your support! labels Feb 24, 2026
@hsm207
Copy link
Copy Markdown
Contributor Author

hsm207 commented Mar 1, 2026

@bdmorgan can i get some human feedback about this pr, please?

@hsm207 hsm207 force-pushed the fix/loop-abort-crash branch 3 times, most recently from 23e8820 to 300d1a1 Compare March 13, 2026 17:01
@hsm207 hsm207 force-pushed the fix/loop-abort-crash branch 4 times, most recently from bac0406 to fa68c03 Compare March 20, 2026 05:21
private readonly connection: acp.AgentSideConnection,
private readonly sessionId: string,
private readonly capabilities: acp.FileSystemCapabilities,
private readonly capabilities: acp.FileSystemCapability,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended? how is this related to the fix?

@@ -0,0 +1,43 @@
/**
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of adding a single new test file. Can this not fit into an existing test file somewhere?

Comment thread packages/core/src/core/client.ts Outdated
isInvalidStreamRetry: boolean,
displayContent?: PartListUnion,
controllerToAbort?: AbortController,
_controllerToAbort?: AbortController,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the controller is no longer needed, it should not be an argument at all.

@hsm207
Copy link
Copy Markdown
Contributor Author

hsm207 commented Apr 21, 2026

@scidomino Thanks for the review. The upstream has evolved significantly since the original PR was opened, rendering many of the previous implementation details (and related comments) obsolete.

I have redone the fix against the latest main branch. The new implementation is much simpler.

Current Behavior:

  • Headless: The CLI now halts gracefully (Exit Code 0) instead of crashing when a persistent loop is detected.
  • Interactive: The circuit-breaker prompt now functions correctly. Selecting "Keep" safely halts the turn, while "Disable" allows the model to continue (until hit by the Gemini API's recitation policy).

I've cleaned up the code and removed the unused parameters as suggested.

@hsm207 hsm207 force-pushed the fix/loop-abort-crash branch from c44a95a to 8dde7e7 Compare April 21, 2026 20:24
@hsm207 hsm207 force-pushed the fix/loop-abort-crash branch from 8dde7e7 to 35c6578 Compare April 21, 2026 20:28
@scidomino
Copy link
Copy Markdown
Collaborator

ok. you only have two issues:

  1. update the PR description since much of it is no longer accurate.
  2. Fix the leftover unused variables (controller and linkedSignal). If we don't need them, don't declare them.

Make sure npm run preflight passes

@hsm207 hsm207 force-pushed the fix/loop-abort-crash branch from 35c6578 to c24dad4 Compare April 22, 2026 17:19
@hsm207 hsm207 requested a review from a team as a code owner April 22, 2026 20:54
@hsm207
Copy link
Copy Markdown
Contributor Author

hsm207 commented Apr 22, 2026

Done! addressed both points:

  1. PR Description: Updated to accurately reflect the refactored implementation and the new technical rationale.
  2. Unused Variables: Removed the redundant controller and linkedSignal variables in GeminiClient.

Verified that npm run preflight passes clean. Ready for another look! @scidomino

@scidomino scidomino force-pushed the fix/loop-abort-crash branch from 7ae5207 to 38fc039 Compare April 23, 2026 16:05
@scidomino
Copy link
Copy Markdown
Collaborator

You had changes to package-lock that I reverted. I will merge now. Please don't update the code as that will require rerunning the tests.

@scidomino scidomino enabled auto-merge April 23, 2026 16:06
@scidomino scidomino added this pull request to the merge queue Apr 23, 2026
Merged via the queue into google-gemini:main with commit ff28d55 Apr 23, 2026
52 of 67 checks passed
kimjune01 pushed a commit to kimjune01/gemini-cli-claude that referenced this pull request May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/agent Issues related to Core Agent, Tools, Memory, Sub-Agents, Hooks, Agent Quality help wanted We will accept PRs from all issues marked as "help wanted". Thanks for your support!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Fatal CLI Crash on Loop Detection via Unhandled AbortError

2 participants