Normalize retry/timeouts for kubernetes installation by ArangoGutierrez · Pull Request #385 · NVIDIA/holodeck

ArangoGutierrez · 2025-06-02T09:52:19Z

No description provided.

Copilot

Pull Request Overview

This PR standardizes retry counts and timeouts across Kubernetes provisioning steps to normalize total wait durations and improve transient failure handling. Key changes include:

Increased delay parameters for with_retry calls in the Kubernetes template
Refactored with_retry implementation to capture output and detect transient errors
Aligned retry count and interval in Go’s waitForNodeReboot to match total timeout
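The retry behavior in the list above can be sketched in Go. This is not the PR's code: the real `with_retry` is a shell function inside the provisioner templates. The names `withRetry`, `isTransient`, and `transientMarkers`, the marker strings, and the choice to stop immediately on a non-transient failure are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// transientMarkers is a hypothetical list of output substrings that are
// treated as retryable; the PR's actual patterns are not shown in this page.
var transientMarkers = []string{"connection refused", "i/o timeout", "tls handshake timeout"}

// isTransient reports whether captured command output looks like a
// temporary failure (case-insensitive substring match).
func isTransient(output string) bool {
	lower := strings.ToLower(output)
	for _, marker := range transientMarkers {
		if strings.Contains(lower, marker) {
			return true
		}
	}
	return false
}

// withRetry calls fn up to maxAttempts times, sleeping delay between
// attempts. fn returns the captured output and an error. A failure whose
// output is not transient ends the loop early (an assumption of this sketch).
// It returns the number of attempts made and the final error, if any.
func withRetry(maxAttempts int, delay time.Duration, fn func() (string, error)) (int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		out, err := fn()
		if err == nil {
			return attempt, nil
		}
		lastErr = err
		if !isTransient(out) {
			return attempt, fmt.Errorf("non-transient failure on attempt %d: %w", attempt, err)
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
		}
	}
	return maxAttempts, fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	// Simulated command: fails twice with a transient message, then succeeds.
	attempts, err := withRetry(5, 10*time.Millisecond, func() (string, error) {
		calls++
		if calls < 3 {
			return "dial tcp: connection refused", errors.New("exit status 1")
		}
		return "ok", nil
	})
	fmt.Println(attempts, err) // prints "3 <nil>"
}
```

Passing the command as a function keeps the retry policy separate from how the command is run, so the same loop could wrap a `kubectl` call or a test double.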

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
pkg/provisioner/templates/kubernetes.go	Updated all `with_retry` calls from `20s` to `30s`, added `
pkg/provisioner/templates/common.go	Rewrote `with_retry` loop: renamed counter, captured command output, and added transient error detection
pkg/provisioner/provisioner.go	Changed `waitForNodeReboot` from 30×10s retries to 10×30s
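The change from 30×10s to 10×30s in `waitForNodeReboot` keeps the worst-case wait unchanged while polling less often. A small sketch of that arithmetic follows; `totalWait` is a hypothetical helper, and it assumes the loop sleeps one full interval per retry.

```go
package main

import (
	"fmt"
	"time"
)

// totalWait returns the worst-case time spent sleeping by a polling loop,
// assuming one sleep of retryInterval per retry (an assumption of this sketch).
func totalWait(maxRetries int, retryInterval time.Duration) time.Duration {
	return time.Duration(maxRetries) * retryInterval
}

func main() {
	before := totalWait(30, 10*time.Second) // old parameters from the table above
	after := totalWait(10, 30*time.Second)  // new parameters from the table above
	fmt.Println(before, after, before == after) // prints "5m0s 5m0s true"
}
```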

Comments suppressed due to low confidence (2)

pkg/provisioner/templates/common.go:85

Always returning 1 when retries are exhausted loses the original exit code, which can be useful for diagnostics. Consider returning the last command’s exit code ($rc) instead.

return 1
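The suggestion above, returning the last command's exit code instead of a fixed 1, can be illustrated with a minimal Go sketch. The template code under review is shell; `retryKeepingExitCode` and the simulated exit codes are hypothetical.

```go
package main

import "fmt"

// retryKeepingExitCode calls run up to maxAttempts times. It returns 0 as
// soon as an attempt succeeds; otherwise it returns the exit code of the
// last attempt rather than a fixed value, so callers can tell failures apart.
// With maxAttempts <= 0 nothing runs and 0 is returned.
func retryKeepingExitCode(maxAttempts int, run func() int) int {
	rc := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		rc = run()
		if rc == 0 {
			return 0
		}
	}
	return rc
}

func main() {
	// Simulated exit codes for three failing attempts.
	codes := []int{2, 2, 7}
	i := 0
	rc := retryKeepingExitCode(len(codes), func() int {
		code := codes[i]
		i++
		return code
	})
	fmt.Println("final exit code:", rc) // prints "final exit code: 7"
}
```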

pkg/provisioner/provisioner.go:100

[nitpick] The loop variable i is generic; consider renaming to attempt or retryCount to clarify its purpose in retry logic.

for i := 0; i < maxRetries; i++ {

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

Copilot

Pull Request Overview

This PR standardizes retry counts and timeouts for Kubernetes installation steps, adds an explicit API server availability check, and aligns node-reboot retry logic.

Unifies retry intervals to 30s (up from 20s) across Calico installation and CRD waits
Adds explicit kubectl version retry to ensure API server readiness
Wraps node taint/label commands in with_retry and suppresses errors; adjusts reboot retry loop

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
pkg/provisioner/templates/kubernetes.go	Increased retry intervals, added API server wait, and applied retries with error suppression for taint/label steps
pkg/provisioner/provisioner.go	Reduced `maxRetries` to 10 and increased `retryInterval` to 30s for node-reboot polling

Copilot

Pull Request Overview

This PR standardizes retry counts and timeouts for Kubernetes installation commands and improves version normalization logic.

Increased retry intervals from 20s to 30s for kubectl commands and node reboot waits
Added explicit wait for kube-apiserver availability
Normalized Kubernetes and MicroK8s version handling, stripping leading “v” and defaulting when empty
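The version normalization described in the last item (strip a leading "v", fall back to a default when empty) might look like the following Go sketch. The function name `normalizeVersion` and the fallback values are assumptions; the PR performs this inside the templates, and its exact logic is not shown on this page.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeVersion trims surrounding whitespace, substitutes fallback when
// the version is empty, and strips a single leading "v".
// The fallback is supplied by the caller; no real default is implied.
func normalizeVersion(version, fallback string) string {
	v := strings.TrimSpace(version)
	if v == "" {
		v = fallback
	}
	return strings.TrimPrefix(v, "v")
}

func main() {
	fmt.Println(normalizeVersion("v1.31.1", "1.30.0")) // prints "1.31.1"
	fmt.Println(normalizeVersion("", "v1.30.0"))       // prints "1.30.0"
}
```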

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
pkg/provisioner/templates/kubernetes.go	Updated `with_retry` timeouts, added kube-apiserver wait, enhanced version templating for microk8s
pkg/provisioner/provisioner.go	Adjusted node reboot retry parameters (interval and count)

shivakunv

LGTM, except for some Copilot comments.

coveralls · 2025-06-03T17:41:45Z

Pull Request Test Coverage Report for Build 15389641778

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

7 of 11 (63.64%) changed or added relevant lines in 2 files are covered.
23 unchanged lines in 1 file lost coverage.
Overall coverage remained the same at 14.98%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
pkg/provisioner/provisioner.go	0	2	0.0%
pkg/provisioner/templates/kubernetes.go	7	9	77.78%

Files with Coverage Reduction	New Missed Lines	%
pkg/provisioner/templates/kubernetes.go	23	52.69%

Totals
Change from base Build 15363939553:	0.0%
Covered Lines:	262
Relevant Lines:	1749

💛 - Coveralls

ArangoGutierrez requested review from Copilot and shivakunv June 2, 2025 09:52

ArangoGutierrez self-assigned this Jun 2, 2025

ArangoGutierrez requested review from cdesiniotis, elezar and tariq1890 as code owners June 2, 2025 09:52

Copilot AI reviewed Jun 2, 2025

ArangoGutierrez force-pushed the v0212 branch from 470ddcd to bd405c4 Compare June 2, 2025 10:11

ArangoGutierrez requested a review from Copilot June 2, 2025 10:11

Normalize retry/timeouts for kubernetes installation

f9103a7

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

ArangoGutierrez force-pushed the v0212 branch from bd405c4 to f9103a7 Compare June 2, 2025 10:12

Copilot AI reviewed Jun 2, 2025

fix kubernetes version setting on kubernetes template

4190ef6

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

ArangoGutierrez force-pushed the v0212 branch from dd90892 to 4190ef6 Compare June 2, 2025 11:24

elezar reviewed Jun 2, 2025

ArangoGutierrez requested review from Copilot and elezar June 2, 2025 12:00

Copilot AI reviewed Jun 2, 2025

Move microK8s version normalization to it's template

13fa2be

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>

ArangoGutierrez force-pushed the v0212 branch from 567ac25 to 13fa2be Compare June 2, 2025 12:06

shivakunv approved these changes Jun 2, 2025

ArangoGutierrez merged commit c1c5e8c into NVIDIA:main Jun 2, 2025
19 checks passed

ArangoGutierrez merged 3 commits into NVIDIA:main from ArangoGutierrez:v0212

5 participants
