
[Telemetry] Use GitHub API and local caching for metadata retrieval #5589

Merged
kadupoornima merged 9 commits into GoogleCloudPlatform:develop from kadupoornima:telemetry-18 on May 5, 2026

@kadupoornima (Contributor) commented May 1, 2026

Overview

This pull request refactors the metadata retrieval process used for telemetry and centralizes the logic in the pkg/config package. It removes the dependency on the external Firestore-based caching tool. Instead, it fetches data directly from the GitHub API and stores it in a local file cache in the user's cache directory.

Process of Getting Standard Data (In Detail)

The toolkit retrieves the standard modules, example files and blueprint names dynamically, in the following steps:

  1. Local Caching (getOrFetchCachedList):
    Before making any network requests, the system tries to read the required data from a local JSON cache file in the user's cache directory (e.g., standard_modules_<version>.json). On a cache hit, if the file parses successfully, the data is returned immediately.
  2. GitHub Git Trees API:
    If there is a cache miss, the system falls back to fetching the full file tree for the current toolkit version directly from the GitHub Git Trees API (https://api.github.com/repos/GoogleCloudPlatform/cluster-toolkit/git/trees/...).
  3. Filtering for Modules and Examples:
    The retrieved repository tree is then parsed to identify standard paths:
    • Modules: It looks for .tf and .pkr.hcl files within the modules/ and community/modules/ directories to build the list of standard modules.
    • Examples: It identifies .yaml and .yml files within the examples/ and community/examples/ directories.
  4. Fetching Blueprint Names via Worker Pool:
    To extract blueprint names efficiently, the system sets up a bounded concurrent worker pool (capped at 10 workers). These workers concurrently fetch the raw content of the identified example YAML files (https://raw.githubusercontent.com/...) and unmarshal only the blueprint_name field.
  5. Cache Population:
    Once the standard modules, examples, or blueprint names are successfully fetched, the results are marshaled into JSON format and saved to the local user cache directory to speed up subsequent CLI executions.

Summary of Changes

  • Added Local Caching and API Fetching: Implemented config.GetPredefinedModules(), config.GetPredefinedExampleFiles(), and config.GetStandardBlueprintNames() to handle local file-based caching and GitHub API fallbacks.
  • Concurrency Improvements: Added a bounded concurrent worker pool (at most 10 workers) that fetches and parses raw YAML files in parallel to extract blueprint names. Capping the number of concurrent requests to GitHub reduces the risk of hitting rate limits.
  • Telemetry Collector Updates: Updated pkg/telemetry/collector.go to use the new native config functions (such as config.GetPredefinedModules()) instead of Viper.
  • Cleanup: Deleted tools/cache_metadata/main.go and .github/workflows/cache-metadata.yml as this functionality is now handled natively within the toolkit.
  • Testing: Added comprehensive unit tests in pkg/config/config_test.go using a mock HTTP transport (mockTransport) to simulate cache hits, network failures, and GitHub API responses.

@kadupoornima kadupoornima added the release-chore To not include into release notes label May 1, 2026
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors how standard metadata is retrieved for the toolkit. By moving the logic that fetches module and example information from GitHub directly into the configuration package, the system no longer relies on an external caching tool. This change simplifies the build and deployment process while ensuring that telemetry data remains accurate and up-to-date.
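The tree-filtering step described in the PR can be sketched as follows. This is an illustration, not the PR's code:

- filterTree and hasAnyPrefix are hypothetical names.
- Treating each directory that contains a .tf or .pkr.hcl file as one module is an assumption.
- Only the path and type fields of the Git Trees API response are decoded.

A real implementation would also need to check the response's truncated flag, since a truncated tree is incomplete.

```go
package main

import (
	"encoding/json"
	"path"
	"sort"
	"strings"
)

// gitTree decodes only the fields of a Git Trees API response used here.
type gitTree struct {
	Tree []struct {
		Path string `json:"path"`
		Type string `json:"type"` // "blob" for files, "tree" for directories
	} `json:"tree"`
}

// hasAnyPrefix reports whether s starts with any of the given prefixes.
func hasAnyPrefix(s string, prefixes ...string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(s, p) {
			return true
		}
	}
	return false
}

// filterTree returns sorted module directories and sorted example file paths.
func filterTree(raw []byte) ([]string, []string, error) {
	var t gitTree
	if err := json.Unmarshal(raw, &t); err != nil {
		return nil, nil, err
	}
	seen := make(map[string]bool)
	var modules, examples []string
	for _, e := range t.Tree {
		if e.Type != "blob" {
			continue // skip directory entries
		}
		p := e.Path
		switch {
		case hasAnyPrefix(p, "modules/", "community/modules/") &&
			(strings.HasSuffix(p, ".tf") || strings.HasSuffix(p, ".pkr.hcl")):
			// Assumption: a module is the directory holding its .tf/.pkr.hcl files.
			if dir := path.Dir(p); !seen[dir] {
				seen[dir] = true
				modules = append(modules, dir)
			}
		case hasAnyPrefix(p, "examples/", "community/examples/") &&
			(strings.HasSuffix(p, ".yaml") || strings.HasSuffix(p, ".yml")):
			examples = append(examples, p)
		}
	}
	sort.Strings(modules)
	sort.Strings(examples)
	return modules, examples, nil
}
```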

Highlights

  • Refactoring Metadata Retrieval: Moved standard module and example file retrieval logic from a standalone tool into the core configuration package to enable direct GitHub API usage.
  • Cleanup: Removed the redundant tools/cache_metadata/main.go utility as its functionality is now integrated into the main codebase.
  • Improved Telemetry: Updated the telemetry collector to use the new centralized configuration functions, improving maintainability and reliability.

@gemini-code-assist (bot) left a comment

Code Review

This pull request integrates metadata fetching logic directly into the pkg/config package to support telemetry. It adds functions to retrieve git trees, modules and example files from GitHub. Review feedback highlights several critical issues:

- log.Fatalf is used in library code, which could crash the CLI on network errors.
- Synchronous network requests run during package initialization, which hurts performance.
- Each example file gets its own HTTP request, which is inefficient and risks hitting GitHub API rate limits.

It is recommended to refactor these into non-fatal, asynchronous or hook-based operations.
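On the per-file request concern, the PR's answer (step 4 of the description) is a worker pool capped at 10 workers. Below is a stdlib-only sketch of such a bounded pool. Its assumptions:

- fetchBlueprintNames and blueprintName are hypothetical names.
- fetch stands in for an HTTP GET against raw.githubusercontent.com.
- A naive line scan replaces the YAML unmarshalling the PR describes.

Bounding concurrency limits how many requests are in flight at once. It does not reduce the total number of requests.

```go
package main

import (
	"bufio"
	"sort"
	"strings"
	"sync"
)

// blueprintName is a naive stand-in for YAML unmarshalling. It returns the
// value of the first unindented "blueprint_name:" line, or "" if none exists.
func blueprintName(doc string) string {
	sc := bufio.NewScanner(strings.NewReader(doc))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "blueprint_name:") {
			v := strings.TrimSpace(strings.TrimPrefix(line, "blueprint_name:"))
			return strings.Trim(v, `"'`)
		}
	}
	return ""
}

// fetchBlueprintNames fetches each example file using at most maxWorkers
// goroutines and returns the sorted blueprint names found. Files that fail to
// fetch or have no blueprint_name are skipped rather than treated as fatal.
func fetchBlueprintNames(paths []string, maxWorkers int, fetch func(string) (string, error)) []string {
	if maxWorkers < 1 {
		maxWorkers = 1
	}
	jobs := make(chan string)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < maxWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				doc, err := fetch(p)
				if err != nil {
					continue
				}
				if name := blueprintName(doc); name != "" {
					results <- name
				}
			}
		}()
	}

	// Feed the jobs, then close results once every worker has exited.
	go func() {
		for _, p := range paths {
			jobs <- p
		}
		close(jobs)
	}()
	go func() {
		wg.Wait()
		close(results)
	}()

	var names []string
	for name := range results {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```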

Comment thread pkg/config/config.go Outdated
Comment thread pkg/config/config.go Outdated
Comment thread pkg/telemetry/collector.go Outdated
Comment thread pkg/config/config.go Outdated
@kadupoornima (Contributor, Author)

/gemini summary

@gemini-code-assist (Contributor)

Summary of Changes

This pull request refactors the metadata retrieval process for the toolkit by centralizing the logic within the configuration package. By transitioning from an external Firestore-based caching tool to direct GitHub API calls, the system simplifies its dependency chain and build process. The changes also include robust local caching and improved error handling to ensure telemetry collection remains non-intrusive and resilient to network failures.
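A common way to address the log.Fatalf and package-initialization concerns has three parts:

- Load the metadata lazily, on first use rather than at package init.
- Fetch it at most once.
- Degrade to an empty result on failure instead of exiting.

Below is a minimal sketch using sync.Once. The lazyList type is hypothetical and not necessarily how the PR was refactored.

```go
package main

import (
	"log"
	"sync"
)

// lazyList defers an expensive fetch until the first Get call, runs it at
// most once, and logs failures instead of terminating the process.
type lazyList struct {
	once  sync.Once
	fetch func() ([]string, error)
	items []string
	err   error
}

// Get returns the fetched items. On failure it returns nil items plus the
// error, so telemetry can continue with an empty list.
func (l *lazyList) Get() ([]string, error) {
	l.once.Do(func() {
		l.items, l.err = l.fetch()
		if l.err != nil {
			// Best-effort: warn and continue, rather than log.Fatalf.
			log.Printf("warning: could not load standard metadata: %v", l.err)
			l.items = nil
		}
	})
	return l.items, l.err
}
```

Nothing touches the network until telemetry first needs the list, so commands that never collect telemetry pay no startup cost.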

Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/cache-metadata.yml

Activity
  • Gemini Code Assist provided initial feedback on error handling and initialization patterns.
  • The author acknowledged the feedback regarding network efficiency and confirmed the chosen approach.
  • The author addressed the initialization concerns by refactoring the code.

@kadupoornima (Contributor, Author)

/gemini review

@gemini-code-assist (bot) left a comment

Code Review

This pull request refactors the metadata fetching logic to retrieve standard modules, example files, and blueprint names directly from GitHub instead of Firestore. It introduces a local file-based caching mechanism and a concurrent worker pool for efficient data retrieval. Feedback was provided regarding critical error handling in the GitHub API fetch function, specifically addressing a potential nil pointer dereference and a resource leak when the response status is not OK.

Comment thread pkg/config/config.go Outdated
@kadupoornima kadupoornima changed the title [Telemetry] Refactor getting standard metadata to use Github API [Telemetry] Refactor metadata retrieval to use GitHub API and local caching May 1, 2026
@kadupoornima kadupoornima marked this pull request as ready for review May 1, 2026 06:56
@kadupoornima kadupoornima requested a review from a team as a code owner May 1, 2026 06:56
@kadupoornima kadupoornima enabled auto-merge (squash) May 1, 2026 06:56
@kadupoornima kadupoornima disabled auto-merge May 1, 2026 06:56
@kadupoornima kadupoornima enabled auto-merge (squash) May 1, 2026 06:57
@kadupoornima kadupoornima changed the title [Telemetry] Refactor metadata retrieval to use GitHub API and local caching [Telemetry] Use GitHub API and local caching for metadata retrieval May 1, 2026
@kadupoornima kadupoornima added release-improvements Added to release notes under the "Improvements" heading. and removed release-chore To not include into release notes labels May 1, 2026
Comment thread pkg/config/config.go
kadupoornima and others added 8 commits May 5, 2026 09:44
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@SwarnaBharathiMantena (Contributor) left a comment

LGTM!

@kadupoornima kadupoornima merged commit daea497 into GoogleCloudPlatform:develop May 5, 2026
14 of 80 checks passed
@kadupoornima kadupoornima deleted the telemetry-18 branch May 5, 2026 10:20

Labels

release-improvements Added to release notes under the "Improvements" heading.


2 participants