Skip to content

improved documentations and increased test coverage for init container#193

Closed
gflarity wants to merge 2 commits into
ai-dynamo:mainfrom
gflarity:gflarity/docs_n_tests_initc
Closed

improved documentations and increased test coverage for init container#193
gflarity wants to merge 2 commits into
ai-dynamo:mainfrom
gflarity:gflarity/docs_n_tests_initc

Conversation

@gflarity

@gflarity gflarity commented Sep 17, 2025

Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind documentation
/kind tests

What this PR does / why we need it:

As discussed during our sync, I've used Cursor/Claude to:

  • improve documentation
  • improve test coverage

I'll include the prompts below for those who are curious. Please feel free to just suggest changes to anything that doesn't look right (ideally just make your changes using the suggest changes github UI).

I realize the documentation is a bit verbose but I think it's ultimately a win as long as it's accurate. It helps new users get up to speed with the codebase, improving documentation also helps the models generate better tests/code when it's well documented.

If you'd like me to take another stab at a particular test or test case, just let me know. I've read through them but I'm just getting acquainted with the code base so it's best to get more eyes on it.

Special notes for your reviewer:

Prompts used:

Documentation:
Improve the inline documentation for for all golang files under this directory/package. Use concise idiomatic go documentation principles. Ensure correctness. Please add some minimal inline documentation above code blocks when to explain the intent of the code. Don't change the code or variable names. All functions to should have concise accurate documentation. Blocks of code should have a comment describing them if practical.

Tests:
Please create an idiomatic golang unit tests for this specific directory/package. Please be sure to document the fields in the anonymous test structs and how they'll be used below. Each Test* function should also have some concise documentation as well. Each test case should have a brief explanation and expectation explained. Rather than having a description field, the description should just be in comments above the name in the struct declaration. Don't forget to run the tests and confirm they're working, but focus on just testing this package as others might have issues. Please avoid creating skipped tests. Feel free to refactor some code to make it more testable. Specifically hardcoded constants etc are good candidates to be refactor out of testable code. As are interface that can be mocked in a straight forward way. Please ensure backwards compatibility and new functions rather than changing the signature of public functions. Keep these refactors to a minimum though, so only do it where there's a good return on investment.

Test Quality And Coverage Check:

Please review all the tests for this package/directory. Do they make sense? Is coverage good? Are all tests and test cases well documented?

Does this PR introduce a API change?

NONE


Additional documentation e.g., enhancement proposals, usage docs, etc.:


Signed-off-by: Geoff Flarity <gflarity@nvidia.com>
@gflarity gflarity force-pushed the gflarity/docs_n_tests_initc branch from 0c69575 to ce9847c Compare September 17, 2025 20:09
Signed-off-by: Geoff Flarity <gflarity@nvidia.com>
@gflarity

Copy link
Copy Markdown
Contributor Author

Tagged @renormalize as looks like you coded the init container dir.

"github.com/NVIDIA/grove/operator/internal/version"
)

// log is the global logger instance configured for the grove init container.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not required.

// TestSetupSignalHandler tests the signal handling setup and context cancellation behavior.
// It validates proper signal registration, context cancellation, and graceful shutdown behavior.
func TestSetupSignalHandler(t *testing.T) {
tests := []struct {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets avoid creating table driven tests when the test function just has a single test.

{},
}

for i := range tests {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is not really useful as it does not really test the signal handler.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the signature of setupSignalHandler could be changed to accept the channel as a parameter, which would enable us in unit testing this function correctly. Right now, the test just establishes things like an error hasn't occured when a context is created without its channel being closed, which will always be the case.

However, I think we can delegate testing graceful termination of the init-container (a consquence of the signal handler) to an end-to-end test.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly sure how to adjust the prompt here. Please let me know if you have any suggestions.


// TestSetupSignalHandlerContextProperties tests the properties of the returned context.
// It validates that the context behaves correctly before any signals are sent.
func TestSetupSignalHandlerContextProperties(t *testing.T) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is not really useful as it does not really test the signal handler.

)

// CLIOptions defines the configuration that is passed to the init container.
// It contains the PodClique dependencies with their minimum available replica requirements.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is not required as it is also repeated at the field level


// WaitForReadyWithClient waits for all parent PodCliques to reach their minimum ready replica count
// using the provided Kubernetes client. This enables testing by allowing client injection.
func (c *ParentPodCliqueDependencies) WaitForReadyWithClient(ctx context.Context, client kubernetes.Interface, log logr.Logger) error {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of creating this function, create the client when constructing ParentPodCliqueDependencies and make it part of the struct itself.

}

// createClient creates and returns a Kubernetes clientset using the in-cluster configuration.
// This function is designed to be called from within a pod running in a Kubernetes cluster.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant comment line as in-cluster configuration automatically means its going to be running withing a Pod. Also no code runs outside a Pod.

// createClient creates and returns a Kubernetes clientset using the in-cluster configuration.
// This function is designed to be called from within a pod running in a Kubernetes cluster.
func createClient() (*kubernetes.Clientset, error) {
// Get the in-cluster REST configuration

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for this comment. When code is simple and self-documentation additional documentation on top is just noise.

)
}

// Create the Kubernetes clientset from the REST configuration

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same remove such comments as they serve no purpose.

// registerEventHandler registers pod lifecycle event handlers with the shared informer factory.
// It handles pod addition, updates, and deletion events to track readiness state changes.
func (c *ParentPodCliqueDependencies) registerEventHandler(factory informers.SharedInformerFactory, log logr.Logger) error {
// Get the pod informer from the factory

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comment.

@renormalize renormalize left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for raising a much needed PR, @gflarity!

Reading through the test cases made me find a couple hidden parsing bugs, since the test cases were written such that these bugs were considered as correct behavior.

I've added comments which correct the test cases, and suggest the fixes in the code.

I've also found a few tests to be redundant, which I've suggested the removal of. We can keep them if you wish to.

log = logger.MustNewLogger(false, configv1alpha1.InfoLevel, configv1alpha1.LogFormatJSON).WithName("grove-initc")
)

// main is the entry point for the grove init container.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// main is the entry point for the grove init container.

)

// main is the entry point for the grove init container.
// It parses CLI options, sets up signal handling, and waits for parent PodCliques to be ready.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// It parses CLI options, sets up signal handling, and waits for parent PodCliques to be ready.
// Parse CLI options, set up signal handling, and wait for parent PodCliques to be ready.

{},
}

for i := range tests {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the signature of setupSignalHandler could be changed to accept the channel as a parameter, which would enable us in unit testing this function correctly. Right now, the test just establishes things like an error hasn't occured when a context is created without its channel being closed, which will always be the case.

However, I think we can delegate testing graceful termination of the init-container (a consquence of the signal handler) to an end-to-end test.

},
// Valid with whitespace around values
{
input: []string{"podclique-whitespace:2"},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the white space has been added to this test input?

Suggested change
input: []string{"podclique-whitespace:2"},
input: []string{"podclique-whitespace : 2"},

Correcting this test case has caught one potential bug:

		replicas, err := strconv.Atoi(nameAndMinAvailable[1])

does not trim the whitespace for the subtring containing the replicas.

		replicas, err := strconv.Atoi(strings.TrimSpace(nameAndMinAvailable[1]))

}

// Parse the replica count as an integer
replicas, err := strconv.Atoi(nameAndMinAvailable[1])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context: #193 (comment)

Suggested change
replicas, err := strconv.Atoi(nameAndMinAvailable[1])
replicas, err := strconv.Atoi(strings.TrimSpace(nameAndMinAvailable[1]))

This hasn't been a problem till now (and potentially won't be either), since the arguments for the initc are generated in the operator. Still, it is always better to make the code more robust.

if tt.expectCreateClientErr {
// Should error during client creation since we're not in a K8s environment
assert.Error(t, err)
assert.Contains(t, err.Error(), "unable to load in-cluster configuration")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the ErrNotInCluster constant in the rest package instead? This would avoid test breakage if the string is changed in the future.

Comment on lines +641 to +647
{
name: "wrapper with empty dependencies",
podCliqueDependencies: map[string]int{},
namespaceContent: "empty-namespace",
podGangContent: "empty-podgang",
expectError: false,
},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed since empty dependencies will not be a case as discussed.


// TestNewPodCliqueState tests the wrapper function with real file operations.
// It validates that the wrapper correctly calls the testable version with default paths.
func TestNewPodCliqueState(t *testing.T) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so big on tests like this, which just check the creation of a struct instance from the passed fields to the constructing function. We can keep this if you prefer checking this in.

Comment on lines +883 to +894
// Zero requirements should always be ready
{
dependencies: map[string]int{"zero-clique": 0},
readyCounts: map[string]int{"zero-clique": 0},
expectedReady: true,
},
// Empty dependencies should be ready
{
dependencies: map[string]int{},
readyCounts: map[string]int{},
expectedReady: true,
},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testcases not needed since, as discussed in my previous comments.

for cliqueName, readyCount := range tt.readyCounts {
podSet := sets.New[string]()
// Add dummy pod names to reach the desired count
for i := 0; i < readyCount; i++ {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Range loops are simpler and are the convention for [0, n) iterations.

Suggested change
for i := 0; i < readyCount; i++ {
for i := range readyCount {

@gflarity

Copy link
Copy Markdown
Contributor Author

I'm going to close this PR and open separate PR for documentation and testing using updated prompts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants