fix(slurm): allow hyphens in slurm_cluster_name for power managed nodes #5437

Merged
kvenkatachala333 merged 3 commits into GoogleCloudPlatform:develop from kvenkatachala333:handling_hypens on Apr 23, 2026

Conversation

@kvenkatachala333

Member

This PR fixes a regression introduced in PR #4316 where the validation for slurm_cluster_name was relaxed to allow hyphens, but the runtime Python scripts (util.py) were not updated to handle them.

The previous implementation used a regular expression (node_desc_regex) that broke when parsing node names if the cluster name contained hyphens (e.g., a3mega2-new). This caused nodes to be incorrectly ignored, leading to cluster creation failures where worker nodes were stuck in a CONFIGURING state.
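To make the failure mode concrete, here is a minimal sketch. The pattern below is hypothetical and is not the actual node_desc_regex from util.py; it only assumes, as the old parser effectively did, that the cluster name never contains a hyphen. The nodeset name a3nodeset is also an illustrative assumption.

```python
import re

# Hypothetical pattern (NOT the real node_desc_regex): the cluster group
# stops at the first hyphen, so it cannot hold a hyphenated cluster name.
NODE_DESC_REGEX = re.compile(
    r"^(?P<cluster>[^\s\-]+)-(?P<nodeset>\S+)-(?P<node>\d+|\[[\d,\-]+\])$"
)


def parse(node_name):
    """Return (cluster, nodeset, node) or None if the name does not match."""
    m = NODE_DESC_REGEX.match(node_name)
    return (m["cluster"], m["nodeset"], m["node"]) if m else None


# Cluster name without a hyphen: parsed as intended.
ok = parse("a3mega2-a3nodeset-0")

# Cluster name "a3mega2-new": the tail of the cluster name leaks into the
# nodeset group, so a lookup of the nodeset by name would find nothing.
bad = parse("a3mega2-new-a3nodeset-0")

print(ok)
print(bad)
```

Under this assumed pattern the second name still matches, but it yields a nodeset that does not exist in the configuration, which is consistent with the nodes being silently ignored.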

Replaced the fragile node_desc_regex with a robust token-based parser in _node_desc. Instead of relying on a regex split (which becomes ambiguous if both the cluster name and the nodeset contain hyphens), the new parser:

  1. Splits from the right to isolate the index (or range) from the prefix.
  2. Searches for known nodeset_name keys in the prefix to find the exact boundary between the cluster name and the nodeset name.
  3. Extracts the cluster name from the left part.

This approach removes the ambiguity that the regex-based split had when the cluster name contains hyphens.
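The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: the function name node_desc, its parameters, the returned dictionary keys, and the longest-name-first ordering are all hypothetical, and the real _node_desc in util.py may differ.

```python
def node_desc(node_name, nodeset_names):
    """Split '<cluster>-<nodeset>-<index or [range]>' into its parts.

    Hypothetical helper for illustration only. Returns None when the name
    cannot be matched against any known nodeset.
    """
    # Step 1: split from the right to isolate the index (or range).
    if node_name.endswith("]"):
        # A range such as "[0-3]" contains hyphens, so cut at the last "[".
        head, bracket, rest = node_name.rpartition("[")
        if not bracket or not head.endswith("-"):
            return None
        prefix, suffix = head[:-1], bracket + rest
    else:
        prefix, sep, suffix = node_name.rpartition("-")
        if not sep:
            return None

    # Step 2: look for a known nodeset name at the end of the prefix.
    # Trying longer names first is an assumption made for this sketch.
    for nodeset in sorted(nodeset_names, key=len, reverse=True):
        if prefix.endswith("-" + nodeset):
            # Step 3: whatever is left of the boundary is the cluster name.
            cluster = prefix[: -(len(nodeset) + 1)]
            if cluster:
                return {"cluster": cluster, "nodeset": nodeset, "suffix": suffix}
    return None


single = node_desc("a3-megafix-a3nodeset-0", ["a3nodeset"])
ranged = node_desc("a3-megafix-a3nodeset-[0-3]", ["a3nodeset"])
unknown = node_desc("a3-megafix-other-0", ["a3nodeset"])
```

Requiring the hyphen in front of the nodeset name keeps a short nodeset name from matching inside a longer one, but the sketch cannot resolve every case where both the cluster name and nodeset names contain hyphens.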

Verification Results:

  1. slurm_cluster_name: a3-megafix >> passes with the fix, fails without it
  2. slurm_cluster_name: a3-megafix-v1234567 >> passes with the fix, fails without it
  3. slurm_cluster_name: a3-megafix-v123456789abc >> fails with the fix, as expected: the validator rejects the name for exceeding 20 characters

@kvenkatachala333 kvenkatachala333 added the release-chore To not include into release notes label Mar 31, 2026
@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a regression where Slurm cluster names containing hyphens caused node parsing failures. By moving away from a regex-based approach to a more deterministic, token-based parsing strategy, the system can now correctly identify cluster and nodeset components even when hyphens are present in the cluster name, preventing nodes from entering an incorrect state.

Highlights

  • Robust Node Description Parsing: Replaced the fragile regular expression in _node_desc with a token-based parser that correctly handles cluster names containing hyphens.
  • Improved Cluster Name Resolution: Implemented logic that splits node names from the right and validates against known nodeset keys to accurately isolate the cluster name from the nodeset name.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request refactors the _node_desc method in util.py to replace regex-based parsing with manual string manipulation for extracting node metadata. A significant issue was identified in the new logic for identifying nodesets, which is susceptible to incorrect matches when nodeset names share suffixes. It is recommended to use the existing cluster name configuration to reliably determine the boundary between the cluster and nodeset names, as provided in the suggested code block.
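The suggested code block from the review is not reproduced here. As a rough illustration of the concern, the sketch below contrasts a suffix-only lookup with a lookup that strips the configured cluster name. Both helper names and the nodeset names are hypothetical, and neither function is taken from util.py.

```python
def nodeset_by_suffix(prefix, nodeset_names):
    """Hypothetical lookup: first known nodeset name the prefix ends with."""
    for name in nodeset_names:
        if prefix.endswith(name):
            return name
    return None


def nodeset_by_cluster(prefix, cluster_name):
    """Hypothetical lookup: strip the configured cluster name and a hyphen."""
    head = cluster_name + "-"
    return prefix[len(head):] if prefix.startswith(head) else None


# Two nodesets that share a suffix: "nodeset" and "bignodeset".
prefix = "a3-megafix-bignodeset"

# The suffix-only lookup can pick the shorter name, depending on order.
by_suffix = nodeset_by_suffix(prefix, ["nodeset", "bignodeset"])

# With the cluster name known up front, the boundary is unambiguous.
by_cluster = nodeset_by_cluster(prefix, "a3-megafix")

print(by_suffix, by_cluster)
```

Because the cluster name is already part of the configuration, anchoring on it avoids guessing where the nodeset name begins.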

LAVEEN previously approved these changes Apr 20, 2026

@LAVEEN LAVEEN left a comment

Collaborator

LGTM, please make sure all Slurm tests pass successfully before merging.

@kvenkatachala333 kvenkatachala333 marked this pull request as ready for review April 23, 2026 05:24
@kvenkatachala333 kvenkatachala333 requested a review from a team as a code owner April 23, 2026 05:24
@kvenkatachala333 kvenkatachala333 merged commit 12286d8 into GoogleCloudPlatform:develop Apr 23, 2026
23 of 82 checks passed
@kvenkatachala333 kvenkatachala333 deleted the handling_hypens branch April 23, 2026 05:25