test: add TAS e2e test infrastructure and basic tests#348
Merged
Ronkahn21
added a commit
to Ronkahn21/grove
that referenced
this pull request
Jan 18, 2026
Add 5 tests for simple topology constraint scenarios:
- SL1: PCS-only constraint (inherited by children)
- SL2: PCSG-only constraint
- SL3: No topology constraints (baseline)
- PC1: Host-level constraint (strictest packing)
- ZL1: Zone-level constraint

These tests verify constraint behavior at different resource levels (PCS, PCSG, PCLQ) and topology domains (zone, rack, host, none).

Builds on PR ai-dynamo#348 (infrastructure).

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
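The kind of check these constraint tests rely on can be sketched in a few lines of Go. This is a minimal illustration, not the helper from this PR: the function name `verifySameDomain`, the label keys, and the in-memory maps standing in for the Kubernetes API are all assumptions made for the example. It confirms that every pod landed on a node sharing the same value for one topology label.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical label keys; the real keys used by the e2e suite are not shown in this PR page.
const (
	rackLabelKey = "example.com/rack"
	hostLabelKey = "example.com/host"
)

// verifySameDomain returns nil when every pod in podToNode runs on a node that
// carries the same value for labelKey, and a descriptive error otherwise.
func verifySameDomain(podToNode map[string]string, nodeLabels map[string]map[string]string, labelKey string) error {
	// Sort pod names so the first reported mismatch is deterministic.
	pods := make([]string, 0, len(podToNode))
	for pod := range podToNode {
		pods = append(pods, pod)
	}
	sort.Strings(pods)

	var want string
	for i, pod := range pods {
		node := podToNode[pod]
		got, ok := nodeLabels[node][labelKey]
		if !ok {
			return fmt.Errorf("node %q (pod %q) has no label %q", node, pod, labelKey)
		}
		if i == 0 {
			want = got // the first pod defines the expected domain
			continue
		}
		if got != want {
			return fmt.Errorf("pod %q is in %s=%q, expected %q", pod, labelKey, got, want)
		}
	}
	return nil
}

func main() {
	nodeLabels := map[string]map[string]string{
		"node-a": {rackLabelKey: "rack-0", hostLabelKey: "node-a"},
		"node-b": {rackLabelKey: "rack-0", hostLabelKey: "node-b"},
	}
	podToNode := map[string]string{"pod-0": "node-a", "pod-1": "node-b"}

	// Both pods share a rack, so a rack-level constraint holds.
	fmt.Println("rack:", verifySameDomain(podToNode, nodeLabels, rackLabelKey))
	// They sit on different hosts, so a host-level constraint does not.
	fmt.Println("host:", verifySameDomain(podToNode, nodeLabels, hostLabelKey))
}
```

A baseline scenario with no constraint (such as SL3) would simply skip this check, since any placement is acceptable there.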
This was referenced Jan 18, 2026
shayasoolin
reviewed
Jan 19, 2026
Ronkahn21
added a commit
to Ronkahn21/grove
that referenced
this pull request
Jan 19, 2026
- Add topology node configuration constants
- Restore cleanup failure marking
- Refactor label verification to use loop and label selector
- Remove redundant conversion wrapper
- Rename BP1 to TAS1 following convention
- Increase node count to 28 to strengthen test

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
gflarity
reviewed
Jan 19, 2026
gflarity
reviewed
Jan 19, 2026
gflarity
reviewed
Jan 19, 2026
gflarity
reviewed
Jan 19, 2026
gflarity
reviewed
Jan 19, 2026
Contributor
Just a few comments; the duplicate is probably the most important one to fix. I'm just leaving comments to avoid blocking.
Ronkahn21
added a commit
to Ronkahn21/grove
that referenced
this pull request
Jan 20, 2026
- Remove duplicate WaitForPodsReady function from topology.go
- Update topology_test.go to use canonical WaitForPods
- Add debug logging to filterEnv in skaffold.go
- Extract GetWorkerNodeLabelSelector helper function
- Remove unused time import from topology.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
shayasoolin
reviewed
Jan 20, 2026
gflarity
previously approved these changes
Jan 20, 2026
shayasoolin
previously approved these changes
Jan 20, 2026
- Add 4-level topology hierarchy setup (zone/block/rack/host)
- Add KAI Topology verification utilities
- Add topology constraint verification helpers
- Include 2 foundational tests:
  * TI1: Topology infrastructure verification
  * BP1: Multiple cliques with different constraints
- Update dependencies to KAI Scheduler v0.13.0-rc1
- Add Makefile target for selective test execution
- Add topology-test skaffold profile

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

- Add topology node configuration constants
- Restore cleanup failure marking
- Refactor label verification to use loop and label selector
- Remove redundant conversion wrapper
- Rename BP1 to TAS1 following convention
- Increase node count to 28 to strengthen test

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

- Move topology constants and functions to dedicated topology.go
- Add GetZoneForNodeIndex() to complete helper function set
- Replace hard-coded topology label strings with constants
- Use label selector constants for worker node filtering

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

- Remove duplicate WaitForPodsReady function from topology.go
- Update topology_test.go to use canonical WaitForPods
- Add debug logging to filterEnv in skaffold.go
- Extract GetWorkerNodeLabelSelector helper function
- Remove unused time import from topology.go

Signed-off-by: Ron Kahn <rkahn@nvidia.com>

- Change zone/block/rack indices from 1-based to 0-based
- Remove unused scenario names (TI-1, TAS-2) from test comments
- Update log messages to use correct test names (BP-1 → TAS2)
- Update documentation to reflect 0-based indexing

This ensures zone-0, block-0, rack-0 labels with no 1-based indexing.

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
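To illustrate the 0-based zone/block/rack indexing these commits describe, here is a small Go sketch that derives topology labels from a worker node index. Everything in it is an assumption made for illustration: the fan-out constants, the label keys, the globally numbered (rather than per-parent) indices, and the helper names. The PR's own `GetZoneForNodeIndex()` is not shown on this page, so its signature and arithmetic may differ.

```go
package main

import "fmt"

// Hypothetical fan-out values; the real constants live in topology.go and may differ.
const (
	nodesPerRack  = 4
	racksPerBlock = 2
	blocksPerZone = 2
	workerNodes   = 28 // node count mentioned in the commit message
)

// Hypothetical label keys, used only for this example.
const (
	zoneLabelKey  = "example.com/zone"
	blockLabelKey = "example.com/block"
	rackLabelKey  = "example.com/rack"
)

// rackForNodeIndex returns the 0-based rack index for a 0-based node index.
func rackForNodeIndex(i int) int { return i / nodesPerRack }

// blockForNodeIndex returns the 0-based block index for a 0-based node index.
func blockForNodeIndex(i int) int { return rackForNodeIndex(i) / racksPerBlock }

// zoneForNodeIndex returns the 0-based zone index for a 0-based node index.
func zoneForNodeIndex(i int) int { return blockForNodeIndex(i) / blocksPerZone }

// topologyLabels builds the label set for one node, so node 0 gets zone-0, block-0, rack-0.
func topologyLabels(i int) map[string]string {
	return map[string]string{
		zoneLabelKey:  fmt.Sprintf("zone-%d", zoneForNodeIndex(i)),
		blockLabelKey: fmt.Sprintf("block-%d", blockForNodeIndex(i)),
		rackLabelKey:  fmt.Sprintf("rack-%d", rackForNodeIndex(i)),
	}
}

func main() {
	// Print the labels for the first and last worker node.
	for _, i := range []int{0, workerNodes - 1} {
		l := topologyLabels(i)
		fmt.Printf("node %d: %s %s %s\n", i, l[zoneLabelKey], l[blockLabelKey], l[rackLabelKey])
	}
}
```

Integer division keeps the mapping consistent across levels: two nodes in the same rack are always in the same block and zone, which is the property a hierarchy-aware test needs.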
c81bd1e to
1b43b26
Compare
gflarity
approved these changes
Jan 20, 2026
danbar2
pushed a commit
to danbar2/grove
that referenced
this pull request
Jan 21, 2026
* test: add TAS e2e test infrastructure and basic tests

- Add 4-level topology hierarchy setup (zone/block/rack/host)
- Add KAI Topology verification utilities
- Add topology constraint verification helpers
- Include 2 foundational tests:
  * Topology infrastructure verification
  * Multiple cliques with different constraints
- Update dependencies to KAI Scheduler v0.13.0-rc1
- Add Makefile target for selective test execution
- Add topology-test skaffold profile

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Ronkahn21
added a commit
to Ronkahn21/grove
that referenced
this pull request
Jan 21, 2026
Add 5 tests for simple topology constraint scenarios:
- SL1: PCS-only constraint (inherited by children)
- SL2: PCSG-only constraint
- SL3: No topology constraints (baseline)
- PC1: Host-level constraint (strictest packing)
- ZL1: Zone-level constraint

These tests verify constraint behavior at different resource levels (PCS, PCSG, PCLQ) and topology domains (zone, rack, host, none).

Builds on PR ai-dynamo#348 (infrastructure).

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
Ronkahn21
added a commit
to Ronkahn21/grove
that referenced
this pull request
Jan 24, 2026
Add 5 tests for simple topology constraint scenarios:
- SL1: PCS-only constraint (inherited by children)
- SL2: PCSG-only constraint
- SL3: No topology constraints (baseline)
- PC1: Host-level constraint (strictest packing)
- ZL1: Zone-level constraint

These tests verify constraint behavior at different resource levels (PCS, PCSG, PCLQ) and topology domains (zone, rack, host, none).

Builds on PR ai-dynamo#348 (infrastructure).

Signed-off-by: Ron Kahn <rkahn@nvidia.com>
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR establishes the foundational infrastructure for Topology Aware Scheduling (TAS) e2e tests and includes two basic test scenarios.
Infrastructure:
- `topology-test` skaffold profile with TAS configuration
- Makefile target for selective test execution (`TEST_PATTERN` support)

Tests:
This PR is part 1 of 4 in the TAS e2e test suite. Additional test scenarios will be added in follow-up PRs.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Dependencies:
Test Verification:
`-tags e2e`
What's Next:
File Summary:
Does this PR introduce an API change?
Additional documentation e.g., enhancement proposals, usage docs, etc.: