Skip to content

Refactor/e2e unify k8s clients#529

Merged
oleg-kushniriov merged 10 commits into
ai-dynamo:mainfrom
oleg-kushniriov:refactor/e2e-unify-k8s-clients
Apr 20, 2026
Merged

Refactor/e2e unify k8s clients#529
oleg-kushniriov merged 10 commits into
ai-dynamo:mainfrom
oleg-kushniriov:refactor/e2e-unify-k8s-clients

Conversation

@oleg-kushniriov

@oleg-kushniriov oleg-kushniriov commented Apr 14, 2026

Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind cleanup
/kind feature

What this PR does / why we need it:

Replaces the 6-field Clients struct (Clientset, DynamicClient, GroveClient, CRClient, RestMapper, RestConfig) with a unified K8s struct that embeds controller-runtime client.Client. This eliminates the "which client do I use?" question from e2e test code — every Kubernetes operation goes through a single client with all types registered in one scheme.

Key changes:

  • New e2e/k8s/k8s.go — K8s struct embedding client.Client, with RestConfig for raw HTTP and private clientset for GetLogs/WatchPods (capabilities client.Client lacks). Single scheme registers core K8s, Grove CRDs, KAI scheduling, and KAI topology types.
  • All managers migrated — PodManager, NodeManager, ResourceManager, WorkloadManager, PodGroupVerifier, DiagCollector, OperatorConfig, TopologyVerifier now accept client.Client or *K8s instead of *Clients.
  • SharedClusterManager simplified — 4 redundant client fields removed, single k8s *K8s field, one GetK8s() accessor replaces GetClients()/GetAllClients()/GetCRClient().
  • ResourceManager YAML apply simplified — eliminated RestMapper usage, GVR-based scope checks, and namespace branching. client.Client handles GVK-to-GVR mapping and scoping internally.
  • Typed access for KAI resources — PodGroupVerifier and TopologyVerifier now use typed client.Client operations instead of DynamicClient + ConvertUnstructuredToTyped.
  • WaitAndGetReadyNode standalone function removed — callers use NodeManager.WaitAndGetReady instead.
  • e2e/k8s/clients/ package deleted entirely.

Also includes:

  • k8s.Getter[T] helper bridging client.Client.Get to waiter.GetFunc[T]
  • computeDomainGVR → computeDomainGVK (removes hardcoded Kind strings)
  • suite_test.go fix for nil testctx.Logger in auto-mnnvl tests

Which issue(s) this PR fixes:

Fixes #519

Special notes for reviewer:

  • client.Client does not support Watch() or GetLogs() — these require the private kubernetes.Clientset kept inside the K8s struct. Only 3 call sites use these (diagnostics log streaming, update tracker pod watching, unsupported_but_enabled log checking).
  • The ResourceManager.ScaleCRD signature changed from GroupVersionResource to GroupVersionKind since client.Client works with GVK, not GVR.
  • WorkloadManager.WaitForPodClique no longer takes a groveClient parameter — it uses the embedded client from *K8s.
  • ConvertUnstructuredToTyped is still used in setup/grove.go (ConvertTypedToUnstructured) so e2e/k8s/conversions.go is kept.

Does this PR introduce a API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:

The e2e test framework now uses a single K8s struct for all Kubernetes operations:

  // Create a K8s client
  k8sClient, err := k8s.New(restConfig)

  // Typed operations (Get, List, Create, Update, Delete, Patch)
  k8sClient.Get(ctx, types.NamespacedName{Name: "my-pod", Namespace: "default"}, &pod)
  k8sClient.List(ctx, &podList, client.InNamespace("default"))
  k8sClient.Create(ctx, &pcs)

  // Log streaming and pod watching
  k8sClient.GetLogs(namespace, podName, opts)
  k8sClient.WatchPods(ctx, namespace, opts)

  // Bridge to waiter package
  waiter.FetchByName(name, k8s.Getter[*v1.Node](k8sClient, ""))

@copy-pr-bot

copy-pr-bot Bot commented Apr 14, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@oleg-kushniriov oleg-kushniriov force-pushed the refactor/e2e-unify-k8s-clients branch 2 times, most recently from 72dabb7 to f75c8d1 Compare April 19, 2026 12:22
@oleg-kushniriov oleg-kushniriov marked this pull request as ready for review April 19, 2026 13:23
Comment thread operator/e2e/k8s/k8s.go Outdated
danbar2
danbar2 previously approved these changes Apr 20, 2026
Comment thread operator/e2e/setup/grove.go Outdated
@oleg-kushniriov oleg-kushniriov force-pushed the refactor/e2e-unify-k8s-clients branch from 0bf7ac4 to 0f8f36a Compare April 20, 2026 15:49
@oleg-kushniriov oleg-kushniriov merged commit 46b51c3 into ai-dynamo:main Apr 20, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unify e2e Kubernetes clients into single K8s struct backed by controller-runtime client.Client

3 participants