Bug fixes after v0.2.8 release #368
Merged
ArangoGutierrez merged 6 commits into NVIDIA:main on May 26, 2025
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Pull Request Overview
This pull request fixes several issues from the previous release by updating error handling in provisioning, simplifying the CLI, and improving provisioning and testing logic. Key changes include enhanced degraded condition handling in provisioning, updated installation and validation logic for NVIDIA drivers and Kubernetes resources, and removal of legacy flag usage in the delete command.
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/e2e_test.go | Added SSH key loading and validation for end-to-end tests |
| tests/data/*.yml | Updated ingress IP ranges in AWS test configuration files |
| tests/aws_test.go | Added end-to-end test enhancements for AWS environments |
| pkg/provisioner/templates/nv-driver*.go | Updated NVIDIA driver installation to use nvidia-driver syntax and added module checks |
| pkg/provisioner/templates/kubernetes_test.go | Adjusted Kubernetes version comparisons and legacy initialization flags |
| pkg/provisioner/templates/kubernetes.go | Increased retry counts for Calico resource creation and updated legacy version check logic |
| pkg/provisioner/templates/kernel.go | Removed extraneous output after initiating reboot during kernel upgrade |
| internal/instances/instances.go & cmd/cli/list/list.go | Removed warnings for cache files without instance IDs |
| cmd/cli/delete/delete.go | Removed the envFile flag and associated processing |
| cmd/cli/create/create.go | Enhanced error handling by setting a degraded condition and updating the cache file on provisioning failure |
| .github/workflows/e2e.yaml | Modified E2E job to securely handle SSH keys via a temporary file |
Comments suppressed due to low confidence (2)
cmd/cli/create/create.go:204
- [nitpick] After updating the cache file with the degraded status, adding an informational log entry could improve traceability for provisioning failures.
`if err = p.Run(opts.cfg); err != nil {`
pkg/provisioner/templates/kubernetes_test.go:84
- The test now sets `UseLegacyInit` to true whereas it was previously false. Please confirm that this change in default behavior is intended and update documentation if necessary.
`UseLegacyInit: true,`
This pull request introduces several changes across multiple areas of the codebase. They fix issues introduced during the last release cut.
Error Handling and Status Updates:
- Enhanced error handling in `runProvision`, which now sets a degraded condition and updates the cache file when provisioning fails (cmd/cli/create/create.go).
- Updated the `getProviderStatus` function to include the reason for a degraded status in the status message (internal/instances/instances.go).
CLI Simplification:
- Removed the `envFile` flag and its associated logic from the `delete` command, simplifying the deletion process to rely solely on the `instance-id` flag (cmd/cli/delete/delete.go).
Provisioning Templates:
- [...] (`nvidia-persistenced`) and improved driver validation (pkg/provisioner/templates/nv-driver.go).
- Increased retry counts for Calico resource creation and updated the legacy version check logic (pkg/provisioner/templates/kubernetes.go).
Testing Enhancements:
- Added end-to-end test enhancements for AWS environments (tests/aws_test.go).
These changes collectively improve the robustness, usability, and maintainability of the codebase while enhancing the test coverage and reliability of the provisioning process.
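The status-message change described under Error Handling and Status Updates can be pictured with a small sketch. This is not the project's `getProviderStatus`; the `Condition` type and the `statusMessage` function below are hypothetical, and the "first true condition wins" rule and the `Unknown` fallback are assumptions made only to keep the example self-contained.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified status condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// statusMessage returns the type of the first condition whose status is
// "True". For a Degraded condition with a reason, the reason is appended in
// parentheses so the cause is visible in the status output.
func statusMessage(conds []Condition) string {
	for _, c := range conds {
		if c.Status != "True" {
			continue
		}
		if c.Type == "Degraded" && c.Reason != "" {
			return fmt.Sprintf("%s (%s)", c.Type, c.Reason)
		}
		return c.Type
	}
	return "Unknown"
}

func main() {
	fmt.Println(statusMessage([]Condition{
		{Type: "Available", Status: "True"},
	}))
	fmt.Println(statusMessage([]Condition{
		{Type: "Degraded", Status: "True", Reason: "ProvisioningFailed"},
	}))
}
```

Surfacing the reason next to the status means a user listing instances can see why an environment is degraded without opening its cache file.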