[Telemetry] Add an atomic flag to ensure telemetry event is not recurrently called#5705
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request improves the reliability of the telemetry system by preventing redundant event reporting during application crashes. By implementing an atomic guard, the system now guarantees that telemetry flushing logic is executed only once, even if multiple fatal events occur simultaneously or recursively. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces an atomic guard using sync/atomic to ensure telemetry is executed exactly once during fatal errors, preventing re-entrancy and duplicate events. A new test case, TestFatalHook_ConcurrencyAndReentrancy, was added to verify the thread safety of this implementation. Feedback suggests extending this atomic protection to the main Execute() function to handle normal program exits consistently and implementing a 10-second timeout for telemetry calls to prevent blocking the CLI.
74710aa
into
GoogleCloudPlatform:develop
Problem
Currently, the
logging.FatalHookunconditionally triggers telemetry collection whenever the CLI encounters a fatal error. This introduces two significant risks:logging.Fatal()call. This recursive loop can lead to deadlocks or redundant executions.FatalHookat the same time, leading to duplicate payload generation and concurrent uploads to Clearcut.Solution
This PR introduces a thread-safe atomic flag to guarantee that telemetry is flushed exactly once per CLI execution, regardless of concurrency or re-entrant fatal calls.
Changes
cmd/root.go: AddedtelemetryFlushed atomic.Boolto serve as a 2-state flag for whether the telemetry event is flushed or not.cmd/root.go: Wrapped the telemetry execution inside thelogging.FatalHookwithif telemetryFlushed.CompareAndSwap(false, true).cmd/root_test.go: Added theTestFatalHook_ConcurrencyAndReentrancyunit test. This test spawns multiple concurrent goroutines to simulate race conditions and verifies that the telemetry hook behaves safely under load.Rationale
Using an
atomic.BoolwithCompareAndSwapis superior to using standard locking mechanisms (likesync.Mutexorsync.Once) for this specific issue. Atomic operations are non-blocking; if a recursive call or a concurrent goroutine reaches the hook, it immediately evaluates tofalseand skips the block without waiting, safely proceeding to termination.Advantages
logging.Fatalloops during failure handling.atomic.Boolexpresses the intent clearly and avoids the error-prone usage of generic integer-based atomics.