Refactor/e2e reuse k8s clients #502

Merged
oleg-kushniriov merged 18 commits into ai-dynamo:main from oleg-kushniriov:refactor/e2e-reuse-k8s-clients
Apr 12, 2026

@oleg-kushniriov oleg-kushniriov commented Mar 30, 2026

Contributor

What type of PR is this?

/kind cleanup
/kind e2e

What this PR does / why we need it:

Refactors the e2e test framework from flat utility functions and a monolithic
TestContext struct into domain-specific manager structs with bound methods.

New packages:

  • e2e/k8s/: Clients, PodManager, NodeManager, ResourceManager, polling, conversions
  • e2e/grove/: WorkloadManager, TopologyVerifier, PodGroupVerifier, OperatorConfig
  • e2e/diagnostics/: DiagCollector
  • e2e/tests/suite.go: TestSuite composing all managers with functional options
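
For readers unfamiliar with the pattern, here is a minimal sketch of a suite composed from managers via functional options. The type shapes, the `WithNamespace` option, and the `NewTestSuite` constructor are illustrative assumptions, not the actual definitions in `e2e/tests/suite.go`.

```go
package main

import "fmt"

// Clients is a hypothetical stand-in for k8s.Clients.
type Clients struct{ Name string }

// PodManager and WorkloadManager are hypothetical stand-ins for the
// managers listed above; each is bound to one suite's clients and namespace.
type PodManager struct {
	clients   *Clients
	namespace string
}

type WorkloadManager struct {
	clients   *Clients
	namespace string
}

// TestSuite composes the managers (assumed shape, for illustration only).
type TestSuite struct {
	Namespace string
	Clients   *Clients
	Pods      *PodManager
	Workloads *WorkloadManager
}

// Option mutates a TestSuite during construction.
type Option func(*TestSuite)

// WithNamespace is a hypothetical option that overrides the default namespace.
func WithNamespace(ns string) Option {
	return func(s *TestSuite) { s.Namespace = ns }
}

// NewTestSuite applies the options first, then builds the managers so that
// they observe the final configuration.
func NewTestSuite(c *Clients, opts ...Option) *TestSuite {
	s := &TestSuite{Namespace: "default", Clients: c}
	for _, opt := range opts {
		opt(s)
	}
	s.Pods = &PodManager{clients: c, namespace: s.Namespace}
	s.Workloads = &WorkloadManager{clients: c, namespace: s.Namespace}
	return s
}

func main() {
	shared := &Clients{Name: "shared"}
	suite := NewTestSuite(shared, WithNamespace("e2e-scale"))
	fmt.Println(suite.Namespace, suite.Pods.namespace)
}
```

Building the managers after the options run keeps each option a plain field assignment, with no need to rewire managers afterwards.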

Key fixes:

  • REST mapper is now created once in SharedClusterManager instead of being
    recreated per-test via CreateKubernetesClients
  • Pre-existing build errors in rolling update tests resolved (bare function
    calls converted to TestSuite method calls)
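
The "create once" fix can be sketched as below, assuming a lazily initialized field guarded by `sync.Once`. The `restMapper` type, `newRESTMapper`, and the `Mapper` accessor are hypothetical placeholders; the real SharedClusterManager may build the mapper eagerly or by other means.

```go
package main

import (
	"fmt"
	"sync"
)

// restMapper is a hypothetical stand-in for an expensive-to-build REST mapper.
type restMapper struct{ id int }

// buildCount records how many times the constructor ran (for demonstration).
var buildCount int

func newRESTMapper() *restMapper {
	buildCount++
	return &restMapper{id: buildCount}
}

// SharedClusterManager owns the single mapper instance (assumed shape).
type SharedClusterManager struct {
	once   sync.Once
	mapper *restMapper
}

// Mapper builds the mapper on first use and returns the same instance on
// every later call, including calls from concurrent goroutines.
func (m *SharedClusterManager) Mapper() *restMapper {
	m.once.Do(func() { m.mapper = newRESTMapper() })
	return m.mapper
}

func main() {
	mgr := &SharedClusterManager{}
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = mgr.Mapper() // every "test" reuses the shared mapper
		}()
	}
	wg.Wait()
	fmt.Println("mapper constructions:", buildCount)
}
```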

Design for future parallel suites:

  • TestSuite.Namespace is per-suite (can be unique for parallel runs)
  • k8s.Clients is goroutine-safe (shared across suites)
  • Managers are scoped to their suite instance, no global state
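
A rough sketch of how these three properties could combine in a future parallel run is shown here. `Clients`, `Suite`, `NewSuite`, and `RunParallel` are hypothetical and use only the standard library; the mutex merely models the goroutine safety that the real `k8s.Clients` is stated to provide.

```go
package main

import (
	"fmt"
	"sync"
)

// Clients is a hypothetical shared client set; the mutex models the
// goroutine safety claimed for k8s.Clients.
type Clients struct {
	mu    sync.Mutex
	calls map[string]int
}

func NewClients() *Clients { return &Clients{calls: map[string]int{}} }

// Record notes one API call made on behalf of a namespace.
func (c *Clients) Record(ns string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.calls[ns]++
}

// Calls reports how many calls were recorded for a namespace.
func (c *Clients) Calls(ns string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.calls[ns]
}

// Suite holds per-suite state only; nothing lives in package-level globals.
type Suite struct {
	Namespace string
	clients   *Clients
}

// NewSuite derives a unique namespace from a base name and an index.
func NewSuite(c *Clients, base string, index int) *Suite {
	return &Suite{Namespace: fmt.Sprintf("%s-%d", base, index), clients: c}
}

// Run simulates a suite issuing a number of calls through the shared clients.
func (s *Suite) Run(steps int) {
	for i := 0; i < steps; i++ {
		s.clients.Record(s.Namespace)
	}
}

// RunParallel starts n suites concurrently against one shared Clients value.
func RunParallel(c *Clients, base string, n, steps int) []*Suite {
	suites := make([]*Suite, n)
	var wg sync.WaitGroup
	for i := range suites {
		suites[i] = NewSuite(c, base, i)
		wg.Add(1)
		go func(s *Suite) {
			defer wg.Done()
			s.Run(steps)
		}(suites[i])
	}
	wg.Wait()
	return suites
}

func main() {
	c := NewClients()
	for _, s := range RunParallel(c, "e2e", 3, 10) {
		fmt.Println(s.Namespace, c.Calls(s.Namespace))
	}
}
```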

Which issue(s) this PR fixes:

Fixes #513

Special notes for your reviewer:

  • TestContext and its methods in setup.go/debug_utils.go are marked
    deprecated but kept — they are still used by the diagnostics path and will
    be removed in a follow-up once debug_utils.go consumers are fully migrated.
  • The auto-mnnvl/ tests use their own lowercase testContext struct and were
    intentionally not migrated in this PR to keep scope manageable.
  • The utils/ package files remain as deprecated wrappers. Only logger.go
    and measurement/ are long-term residents; the rest will be deleted in a
    follow-up PR.
  • All e2e test files (startup ordering, scale, gang scheduling, topology,
    cert management, rolling updates) have been migrated to use TestSuite.
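
For context on the "deprecated wrappers" mentioned above, the sketch below shows one way a flat helper can stay in place while delegating to a manager method, marked with Go's standard `// Deprecated:` doc-comment convention. `PodManager.PodName` and the flat `PodName` helper are made-up examples, not functions from the `utils/` package.

```go
package main

import "fmt"

// PodManager is a hypothetical manager bound to one namespace.
type PodManager struct{ namespace string }

// PodName is the new bound method that callers should migrate to.
func (m *PodManager) PodName(workload string, index int) string {
	return fmt.Sprintf("%s/%s-%d", m.namespace, workload, index)
}

// PodName is the old flat helper, kept so existing callers still compile.
//
// Deprecated: use (*PodManager).PodName instead.
func PodName(namespace, workload string, index int) string {
	// Delegate to the manager so both paths share one implementation.
	return (&PodManager{namespace: namespace}).PodName(workload, index)
}

func main() {
	fmt.Println(PodName("e2e", "worker", 0))
}
```

Tools such as staticcheck and gopls recognize the `Deprecated:` paragraph and flag remaining call sites, which makes the follow-up deletion easier to track.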

Does this PR introduce an API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:

NONE

copy-pr-bot Bot commented Mar 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@oleg-kushniriov oleg-kushniriov marked this pull request as draft March 31, 2026 07:28
@oleg-kushniriov oleg-kushniriov force-pushed the refactor/e2e-reuse-k8s-clients branch 2 times, most recently from b4a1638 to 35f5f3b Compare April 5, 2026 09:41
@oleg-kushniriov oleg-kushniriov self-assigned this Apr 5, 2026
@oleg-kushniriov oleg-kushniriov marked this pull request as ready for review April 5, 2026 10:28
Comment thread operator/e2e/k8s/resources/resources.go
Comment thread operator/e2e/grove/config/config.go
Comment thread operator/e2e/grove/workload/workload.go Outdated
Comment thread operator/e2e/grove/workload/workload.go Outdated
Comment thread operator/e2e/k8s/pods/pods.go Outdated
Comment thread operator/e2e/k8s/polling.go
Comment thread operator/e2e/grove/podgroup/podgroup.go Outdated
Comment thread operator/e2e/grove/topology/topology.go
Comment thread operator/e2e/tests/context.go Outdated
Comment thread operator/e2e/diagnostics/collector.go Outdated
@oleg-kushniriov oleg-kushniriov requested a review from gflarity April 7, 2026 10:19
Comment thread operator/e2e/architecture.md
danbar2 previously approved these changes Apr 9, 2026
@oleg-kushniriov oleg-kushniriov requested a review from danbar2 April 12, 2026 04:05
danbar2 previously approved these changes Apr 12, 2026
Comment thread operator/e2e/tests/debug_utils.go Outdated
Comment thread operator/e2e/testctx/context.go
Comment thread operator/e2e/setup/constants.go
@oleg-kushniriov oleg-kushniriov merged commit 94b2e7d into ai-dynamo:main Apr 12, 2026
14 checks passed

Development

Successfully merging this pull request may close these issues.

Refactor e2e test infrastructure — introduce domain packages, shared clients, and new TestContext

4 participants