Skip to content

fix: correct PCSG topology constraint handling for scaled PodGangs#357

Merged
unmarshall merged 9 commits into
ai-dynamo:mainfrom
Ronkahn21:fix/revert-pcsg-topology-constraints
Jan 20, 2026
Merged

fix: correct PCSG topology constraint handling for scaled PodGangs#357
unmarshall merged 9 commits into
ai-dynamo:mainfrom
Ronkahn21:fix/revert-pcsg-topology-constraints

Conversation

@Ronkahn21

@Ronkahn21 Ronkahn21 commented Jan 20, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Fixes incorrect PCSG topology constraint handling for scaled PodGangs:

  1. Reverts commit a08e5af (fix: add PCS topology constraints to scaled PodGangs #347): Removes incorrect implementation that applied PCSG config-level topology constraints to scaled PodGangs
  2. Implements correct behavior: Scaled PodGangs now properly use two levels:
    • PCSG-level constraint as top-level ( if PCSG constraint does not exists use PCS as fall back) topology constraint
    • Per-PCLQ constraints for individual pod groups

Which issue(s) this PR fixes:

Fixes #339

Special notes for your reviewer:

  • Base PodGangs: Use PCS top-level + PCSG constraint groups (when PCSG TopologyConstraint is defined) + PCLQ levels
  • Scaled PodGangs: Use PCSG top-level (if PCSG does not have constraints fall back to PCS constraints + per-PCLQ constraints (TopologyConstraintGroupConfig NOT applied)
  • Test expectations updated to match correct behavior

Does this PR introduce a API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:

NONE

…amo#347)"

This reverts commit a08e5af.

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Add nil check before creating PCSG topology constraint groups in
base PodGang building. This prevents creating empty topology
constraints when PCSG has no TopologyConstraint specified,
ensuring scaled PodGangs properly fall back to PCS-level
constraints.

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
… pcs

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…_test.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
…t.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Signed-off-by: Ron Kahn <rkahn@nvidia.com>
@unmarshall unmarshall merged commit a5b8e0d into ai-dynamo:main Jan 20, 2026
6 checks passed
danbar2 pushed a commit to danbar2/grove that referenced this pull request Jan 21, 2026
…i-dynamo#357)

* Revert "fix: add PCSG topology constraints to scaled PodGangs (ai-dynamo#347)"

This reverts commit a08e5af.

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

* fix: update topology constraint handling for scaled PodGangs

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

* fix: only create PCSG topology constraints when defined

Add nil check before creating PCSG topology constraint groups in
base PodGang building. This prevents creating empty topology
constraints when PCSG has no TopologyConstraint specified,
ensuring scaled PodGangs properly fall back to PCS-level
constraints.

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

* test: add case for PCS with nil PCSG topology constraints fallback to pcs

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

* test: clean up formatting in syncflow_test.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

* test: clean up formatting in syncflow_test.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

* test: remove redundant comments regarding pcsgConstraints in syncflow_test.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

* test: improve formatting and remove redundant comment in syncflow_test.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

* test: clean up formatting in syncflow_test.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

---------

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Ronkahn21 added a commit to Ronkahn21/grove that referenced this pull request Jan 21, 2026
- Rename test functions to simplified format (TAS1-TAS7)
- Remove scenario labels from test comments
- Replace hardcoded strings with setup package constants
- Update log messages to use new test names
- Fix PR ai-dynamo#357 semantic change: PCSG inherits PCS constraint when nil
- Remove duplicate topology functions from k8s_clusters.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Ronkahn21 added a commit to Ronkahn21/grove that referenced this pull request Jan 21, 2026
When PCSG has nil TopologyConstraint, PCSG parent groups are not
created in KAI PodGroup. PCLQ children are placed directly under
PCS-level constraint instead.

- Test_TAS3: Changed from 5 to 3 SubGroups (removed PCSG parents)
- Test_TAS7: Changed from 4 to 2 SubGroups (removed PCSG parents)
- Updated PCLQ children to have Parent: nil instead of PCSG parents

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Ronkahn21 added a commit to Ronkahn21/grove that referenced this pull request Jan 24, 2026
- Rename test functions to simplified format (TAS1-TAS7)
- Remove scenario labels from test comments
- Replace hardcoded strings with setup package constants
- Update log messages to use new test names
- Fix PR ai-dynamo#357 semantic change: PCSG inherits PCS constraint when nil
- Remove duplicate topology functions from k8s_clusters.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Ronkahn21 added a commit to Ronkahn21/grove that referenced this pull request Jan 24, 2026
When PCSG has nil TopologyConstraint, PCSG parent groups are not
created in KAI PodGroup. PCLQ children are placed directly under
PCS-level constraint instead.

- Test_TAS3: Changed from 5 to 3 SubGroups (removed PCSG parents)
- Test_TAS7: Changed from 4 to 2 SubGroups (removed PCSG parents)
- Updated PCLQ children to have Parent: nil instead of PCSG parents

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TopologyConstraint for a scaled PodGang should have the PCSG topology constraint and not the PCS topology constraint

3 participants