improved documentations and increased test coverage for init container by gflarity · Pull Request #193 · ai-dynamo/grove

gflarity · 2025-09-17T19:43:49Z

What type of PR is this?

/kind documentation
/kind tests

What this PR does / why we need it:

As discussed during our sync, I've used Cursor/Claude to:

improve documentation
improve test coverage

I'll include the prompts below for those who are curious. Please feel free to suggest changes to anything that doesn't look right (ideally using GitHub's suggested-changes UI).

I realize the documentation is a bit verbose, but I think it's ultimately a win as long as it's accurate. It helps new users get up to speed with the codebase, and well-documented code also helps the models generate better tests and code.

If you'd like me to take another stab at a particular test or test case, just let me know. I've read through them but I'm just getting acquainted with the code base so it's best to get more eyes on it.

Special notes for your reviewer:

Prompts used:

Documentation:
Improve the inline documentation for all golang files under this directory/package. Use concise idiomatic go documentation principles. Ensure correctness. Please add some minimal inline documentation above code blocks to explain the intent of the code. Don't change the code or variable names. All functions should have concise accurate documentation. Blocks of code should have a comment describing them if practical.

Tests:
Please create idiomatic golang unit tests for this specific directory/package. Please be sure to document the fields in the anonymous test structs and how they'll be used below. Each Test* function should also have some concise documentation as well. Each test case should have a brief explanation and expectation explained. Rather than having a description field, the description should just be in comments above the name in the struct declaration. Don't forget to run the tests and confirm they're working, but focus on just testing this package as others might have issues. Please avoid creating skipped tests. Feel free to refactor some code to make it more testable. Specifically, hardcoded constants etc. are good candidates to be refactored out of testable code, as are interfaces that can be mocked in a straightforward way. Please ensure backwards compatibility and add new functions rather than changing the signature of public functions. Keep these refactors to a minimum though, so only do it where there's a good return on investment.

Test Quality And Coverage Check:

Please review all the tests for this package/directory. Do they make sense? Is coverage good? Are all tests and test cases well documented?

Does this PR introduce an API change?

NONE

Additional documentation e.g., enhancement proposals, usage docs, etc.:

Signed-off-by: Geoff Flarity <gflarity@nvidia.com>

gflarity · 2025-09-17T20:20:56Z

Tagged @renormalize, as it looks like you wrote the init container directory.

unmarshall · 2025-09-19T03:58:32Z

 	"github.com/NVIDIA/grove/operator/internal/version"
 )

+// log is the global logger instance configured for the grove init container.


This comment is not required.

unmarshall · 2025-09-19T04:00:46Z

+// TestSetupSignalHandler tests the signal handling setup and context cancellation behavior.
+// It validates proper signal registration, context cancellation, and graceful shutdown behavior.
+func TestSetupSignalHandler(t *testing.T) {
+	tests := []struct {


Let's avoid creating table-driven tests when the test function has only a single test case.

unmarshall · 2025-09-19T04:03:24Z

+		{},
+	}
+
+	for i := range tests {


this test is not really useful as it does not really test the signal handler.

Perhaps the signature of setupSignalHandler could be changed to accept the channel as a parameter, which would let us unit test this function properly. Right now, the test only establishes things like no error having occurred when a context is created and its channel has not been closed, which will always be the case.

However, I think we can delegate testing graceful termination of the init-container (a consequence of the signal handler) to an end-to-end test.

Not exactly sure how to adjust the prompt here. Please let me know if you have any suggestions.

unmarshall · 2025-09-19T04:03:30Z

+
+// TestSetupSignalHandlerContextProperties tests the properties of the returned context.
+// It validates that the context behaves correctly before any signals are sent.
+func TestSetupSignalHandlerContextProperties(t *testing.T) {


this test is not really useful as it does not really test the signal handler.

unmarshall · 2025-09-19T04:04:38Z

 )

 // CLIOptions defines the configuration that is passed to the init container.
+// It contains the PodClique dependencies with their minimum available replica requirements.


this line is not required as it is also repeated at the field level

unmarshall · 2025-09-19T05:01:22Z

+
+// WaitForReadyWithClient waits for all parent PodCliques to reach their minimum ready replica count
+// using the provided Kubernetes client. This enables testing by allowing client injection.
+func (c *ParentPodCliqueDependencies) WaitForReadyWithClient(ctx context.Context, client kubernetes.Interface, log logr.Logger) error {


instead of creating this function, create the client when constructing ParentPodCliqueDependencies and make it part of the struct itself.

unmarshall · 2025-09-19T05:02:12Z

 }

+// createClient creates and returns a Kubernetes clientset using the in-cluster configuration.
+// This function is designed to be called from within a pod running in a Kubernetes cluster.


Redundant comment line, as in-cluster configuration automatically means it's going to be running within a Pod. Also, no code runs outside a Pod.

unmarshall · 2025-09-19T05:03:20Z

+// createClient creates and returns a Kubernetes clientset using the in-cluster configuration.
+// This function is designed to be called from within a pod running in a Kubernetes cluster.
 func createClient() (*kubernetes.Clientset, error) {
+	// Get the in-cluster REST configuration


No need for this comment. When code is simple and self-documenting, additional documentation on top is just noise.

unmarshall · 2025-09-19T05:03:38Z

 		)
 	}
+
+	// Create the Kubernetes clientset from the REST configuration


Same here: remove such comments, as they serve no purpose.

unmarshall · 2025-09-19T05:04:01Z

+// registerEventHandler registers pod lifecycle event handlers with the shared informer factory.
+// It handles pod addition, updates, and deletion events to track readiness state changes.
 func (c *ParentPodCliqueDependencies) registerEventHandler(factory informers.SharedInformerFactory, log logr.Logger) error {
+	// Get the pod informer from the factory


remove comment.

renormalize

Thank you for raising a much needed PR, @gflarity!

Reading through the test cases helped me find a couple of hidden parsing bugs, since the test cases were written in a way that treated these bugs as correct behavior.

I've added comments which correct the test cases, and suggest the fixes in the code.

I've also found a few tests to be redundant, and have suggested removing them. We can keep them if you wish to.

renormalize · 2025-09-19T06:29:32Z

 	log = logger.MustNewLogger(false, configv1alpha1.InfoLevel, configv1alpha1.LogFormatJSON).WithName("grove-initc")
 )

+// main is the entry point for the grove init container.


Suggested change

// main is the entry point for the grove init container.

renormalize · 2025-09-19T06:30:05Z

 )

+// main is the entry point for the grove init container.
+// It parses CLI options, sets up signal handling, and waits for parent PodCliques to be ready.


Suggested change

- // It parses CLI options, sets up signal handling, and waits for parent PodCliques to be ready.
+ // Parse CLI options, set up signal handling, and wait for parent PodCliques to be ready.

renormalize · 2025-09-19T06:51:25Z

+		{},
+	}
+
+	for i := range tests {


Perhaps the signature of setupSignalHandler could be changed to accept the channel as a parameter, which would let us unit test this function properly. Right now, the test only establishes things like no error having occurred when a context is created and its channel has not been closed, which will always be the case.

However, I think we can delegate testing graceful termination of the init-container (a consequence of the signal handler) to an end-to-end test.

renormalize · 2025-09-19T08:19:37Z

+		},
+		// Valid with whitespace around values
+		{
+			input:       []string{"podclique-whitespace:2"},


I don't think the whitespace was ever added to this test input.

Suggested change

- input: []string{"podclique-whitespace:2"},
+ input: []string{"podclique-whitespace : 2"},

Correcting this test case has caught one potential bug:

replicas, err := strconv.Atoi(nameAndMinAvailable[1])

does not trim whitespace from the substring containing the replica count. Trimming it first fixes this:

replicas, err := strconv.Atoi(strings.TrimSpace(nameAndMinAvailable[1]))

renormalize · 2025-09-19T08:22:26Z

 		}

+		// Parse the replica count as an integer
 		replicas, err := strconv.Atoi(nameAndMinAvailable[1])


Context: #193 (comment)

Suggested change

- replicas, err := strconv.Atoi(nameAndMinAvailable[1])
+ replicas, err := strconv.Atoi(strings.TrimSpace(nameAndMinAvailable[1]))

This hasn't been a problem so far (and may never become one), since the arguments for the initc are generated in the operator. Still, it is always better to make the code more robust.

renormalize · 2025-09-19T13:51:19Z

+			if tt.expectCreateClientErr {
+				// Should error during client creation since we're not in a K8s environment
+				assert.Error(t, err)
+				assert.Contains(t, err.Error(), "unable to load in-cluster configuration")


Can we use the ErrNotInCluster error exported by the rest package instead? This would avoid test breakage if the message string is changed in the future.

renormalize · 2025-09-19T13:56:37Z

+		{
+			name:                  "wrapper with empty dependencies",
+			podCliqueDependencies: map[string]int{},
+			namespaceContent:      "empty-namespace",
+			podGangContent:        "empty-podgang",
+			expectError:           false,
+		},


This can be removed, since empty dependencies will not occur in practice, as discussed.

renormalize · 2025-09-19T14:01:18Z

+
+// TestNewPodCliqueState tests the wrapper function with real file operations.
+// It validates that the wrapper correctly calls the testable version with default paths.
+func TestNewPodCliqueState(t *testing.T) {


Not so big on tests like this, which just check the creation of a struct instance from the passed fields to the constructing function. We can keep this if you prefer checking this in.

renormalize · 2025-09-19T14:03:20Z

+		// Zero requirements should always be ready
+		{
+			dependencies:  map[string]int{"zero-clique": 0},
+			readyCounts:   map[string]int{"zero-clique": 0},
+			expectedReady: true,
+		},
+		// Empty dependencies should be ready
+		{
+			dependencies:  map[string]int{},
+			readyCounts:   map[string]int{},
+			expectedReady: true,
+		},


These test cases are not needed, as discussed in my previous comments.

renormalize · 2025-09-19T14:05:52Z

+			for cliqueName, readyCount := range tt.readyCounts {
+				podSet := sets.New[string]()
+				// Add dummy pod names to reach the desired count
+				for i := 0; i < readyCount; i++ {


Range loops are simpler and are the convention for [0, n) iterations.

Suggested change

- for i := 0; i < readyCount; i++ {
+ for i := range readyCount {

gflarity · 2025-09-23T19:42:58Z

I'm going to close this PR and open separate PRs for documentation and testing using updated prompts.

gflarity requested review from sanjaychatterjee and unmarshall as code owners September 17, 2025 19:43

improved documentations and increased test coverage for init container

ce9847c

Signed-off-by: Geoff Flarity <gflarity@nvidia.com>

gflarity force-pushed the gflarity/docs_n_tests_initc branch from 0c69575 to ce9847c Compare September 17, 2025 20:09

lint fixes

375085e

Signed-off-by: Geoff Flarity <gflarity@nvidia.com>

gflarity requested a review from renormalize September 17, 2025 20:20

unmarshall requested changes Sep 19, 2025

renormalize suggested changes Sep 19, 2025

gflarity closed this Sep 23, 2025

gflarity mentioned this pull request Sep 23, 2025

Add unit-tests for initc and improve in-line doc strings #204

Merged
