[Telemetry] Set up a base skeleton framework #5455
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request establishes a foundational telemetry framework for the Cluster Toolkit. The goal is to enable the collection of non-sensitive usage metrics and deployment outcomes to inform future product decisions. The implementation includes a modular structure for collecting metrics, building event payloads, and uploading them via HTTP POST requests. Note that while the infrastructure is now in place, data collection is currently disabled in production.
Code Review
This pull request introduces a telemetry collection system for the HPC Toolkit, adding the pkg/telemetry package to capture and upload command execution metrics such as exit codes and latency. It also centralizes version management within pkg/config and includes comprehensive unit tests for the new components. Review feedback identifies a potential runtime panic in argument parsing, suggests using non-intrusive logging for telemetry errors to avoid user confusion, recommends making the upload process non-blocking to improve CLI performance, and points out redundant constants.
SwarnaBharathiMantena left a comment
I assume support for the --telemetry flag will be added in the followup PRs?
Yes, once support for user config for new users is added.
SwarnaBharathiMantena left a comment
LGTM for a base skeleton.
Merged commit 97e0767 into GoogleCloudPlatform:develop
This reverts commit 97e0767.
Cluster Toolkit Telemetry
The objective of this effort is to implement a robust telemetry system for Cluster Toolkit that captures usage data. This proposed enhancement will help the team understand how modules and blueprints are used across different environments. When enabled, the system will automatically collect non-sensitive metrics and deployment outcomes, so that product decisions and roadmap planning can be based on real usage. Clients will be able to opt out.
This PR - Setting up a base skeleton framework
Data will be collected from each CLI run and then sent as an HTTP POST request to an internal service for analytics.
Introduced a new pkg/telemetry package:
- `collector.go`: The required metrics are collected here. Support for `command name`, `IS_TEST_DATA`, `latency`, and `EXIT_CODE` has been added in this PR.
- `telemetry.go`: Contains methods to construct the payload and handle the complete telemetry flow.
- `uploader.go`: Includes a `Flush()` method to send the event payload to the internal server for future analysis.

Added unit tests for the new code and performed local testing. Coverage is above 80%.
This PR has no effect on production. DATA COLLECTION HAS NOT STARTED YET.