Skip to content

[Telemetry] Add an atomic flag to ensure telemetry event is not recurrently called#5705

Merged
kadupoornima merged 3 commits into
GoogleCloudPlatform:developfrom
kadupoornima:atomic-flag
May 27, 2026
Merged

[Telemetry] Add an atomic flag to ensure telemetry event is not recurrently called#5705
kadupoornima merged 3 commits into
GoogleCloudPlatform:developfrom
kadupoornima:atomic-flag

Conversation

@kadupoornima

@kadupoornima kadupoornima commented May 25, 2026

Copy link
Copy Markdown
Contributor

Problem
Currently, the logging.FatalHook unconditionally triggers telemetry collection whenever the CLI encounters a fatal error. This introduces two significant risks:

  1. Re-entrancy/Deadlocks: If the telemetry collector itself encounters an error while gathering metrics (for instance, failing to rename or read a packer module), it can trigger another logging.Fatal() call. This recursive loop can lead to deadlocks or redundant executions.
  2. Concurrent Duplicates: If multiple goroutines experience a fatal error simultaneously, they will both trigger the FatalHook at the same time, leading to duplicate payload generation and concurrent uploads to Clearcut.

Solution
This PR introduces a thread-safe atomic flag to guarantee that telemetry is flushed exactly once per CLI execution, regardless of concurrency or re-entrant fatal calls.

Changes

  • cmd/root.go: Added telemetryFlushed atomic.Bool to serve as a 2-state flag for whether the telemetry event is flushed or not.
  • cmd/root.go: Wrapped the telemetry execution inside the logging.FatalHook with if telemetryFlushed.CompareAndSwap(false, true).
  • cmd/root_test.go: Added the TestFatalHook_ConcurrencyAndReentrancy unit test. This test spawns multiple concurrent goroutines to simulate race conditions and verifies that the telemetry hook behaves safely under load.

Rationale
Using an atomic.Bool with CompareAndSwap is superior to using standard locking mechanisms (like sync.Mutex or sync.Once) for this specific issue. Atomic operations are non-blocking; if a recursive call or a concurrent goroutine reaches the hook, it immediately evaluates to false and skips the block without waiting, safely proceeding to termination.

Advantages

  • Prevents Duplicate Events: Ensures Clearcut analytics only receive one payload per execution.
  • Safe CLI Termination: Eliminates the risk of deadlocks from infinite recursive logging.Fatal loops during failure handling.
  • Ergonomic Types: Utilizing atomic.Bool expresses the intent clearly and avoids the error-prone usage of generic integer-based atomics.

@kadupoornima kadupoornima requested a review from a team as a code owner May 25, 2026 13:13
@kadupoornima kadupoornima added release-improvements Added to release notes under the "Improvements" heading. release-bugfix Added to release notes under the "Bug fixes" heading. labels May 25, 2026
@kadupoornima kadupoornima enabled auto-merge (squash) May 25, 2026 13:13
@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the reliability of the telemetry system by preventing redundant event reporting during application crashes. By implementing an atomic guard, the system now guarantees that telemetry flushing logic is executed only once, even if multiple fatal events occur simultaneously or recursively.

Highlights

  • Telemetry Guard: Introduced an atomic boolean flag, telemetryFlushed, to ensure that telemetry events are processed exactly once during fatal errors.
  • Concurrency Safety: Updated the logging.FatalHook to use CompareAndSwap on the atomic flag, preventing re-entrancy and duplicate telemetry submissions in concurrent or recursive scenarios.
  • Test Coverage: Added a new test case, TestFatalHook_ConcurrencyAndReentrancy, to verify that multiple concurrent calls to the fatal hook do not trigger redundant telemetry execution.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an atomic guard using sync/atomic to ensure telemetry is executed exactly once during fatal errors, preventing re-entrancy and duplicate events. A new test case, TestFatalHook_ConcurrencyAndReentrancy, was added to verify the thread safety of this implementation. Feedback suggests extending this atomic protection to the main Execute() function to handle normal program exits consistently and implementing a 10-second timeout for telemetry calls to prevent blocking the CLI.

Comment thread cmd/root.go
@kadupoornima kadupoornima merged commit 74710aa into GoogleCloudPlatform:develop May 27, 2026
12 of 74 checks passed
@kadupoornima kadupoornima deleted the atomic-flag branch May 27, 2026 06:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-bugfix Added to release notes under the "Bug fixes" heading. release-improvements Added to release notes under the "Improvements" heading.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants