Skip to content

Remove the broken auto_delete_disk system, and replace it with a working snapshot-based alternative, in the NFS Server module#3887

Merged
tpdownes merged 4 commits into
GoogleCloudPlatform:developfrom
ndebuhr:fix/nfs-disk-auto-deletion
Apr 24, 2025
Merged

Remove the broken auto_delete_disk system, and replace it with a working snapshot-based alternative, in the NFS Server module#3887
tpdownes merged 4 commits into
GoogleCloudPlatform:developfrom
ndebuhr:fix/nfs-disk-auto-deletion

Conversation

@ndebuhr

@ndebuhr ndebuhr commented Apr 5, 2025

Copy link
Copy Markdown
Contributor

The auto_delete parameter (and therefore auto_delete_disk variable) within the NFS instance's boot_disk block is only effective when the instance implicitly creates the boot disk (i.e., disk properties are defined directly within the instance block using initialize_params). Even with auto_delete=false, the separate google_compute_disk resource goes through it's own destroy lifecycle and the disk is deleted - so ultimately, the variable currently does nothing.

This PR addresses reproducible issue #3566, which was introduced as a regression in bb47067

This PR addresses the auto_delete issue, and more broadly expands the data retention options by:

  1. Removing the non-functioning auto-delete parameterization
  2. Implementing an alternative data protection system, for both the boot disk and the data disk, via create_snapshot_before_destroy and create_boot_snapshot_before_destroy

Design notes:

  1. Retained the existing, heavily-customized boot disk system (i.e., the null_resource, custom lifecycle management, and dedicated boot compute_disk specification), which blocks the ability to use the boot disk auto_delete control.
  2. While there's no working functionality to deprecate, the auto_delete_disk variable is nevertheless deprecated, rather than simply removed, to inform users of the snapshot-before-destroy alternative (encouraging proactive config changes to avoid accidental data deletion events). Terraform updates to add snapshot-before-destroy are quick and in-place.
  3. While the auto_delete=false does not impact deletion behavior due to the google_compute_disk's separate destroy lifecycle, it is important to enable the snapshot-before-destroy feature. If the disk is deleted in connection with the VM instance (i.e., auto_delete=true), then the snapshot does not trigger. If the disk is deleted during it's own dedicated destroy lifecycle (i.e., auto_delete=false), then the snapshot does trigger. So the auto_delete has no impact on module deletion behavior, but it needs to be hard-coded as auto_delete=false for snapshot enablement.

Manually tested a variety of scenarios with Terraform v1.11.3 and Packer v1.12.0

  • ✅ Fork your PR branch from the Toolkit "develop" branch (not main)
  • ✅ Test all changes with pre-commit in a local branch #
  • ✅ Confirm that "make tests" passes all tests
  • ✅ Add or modify unit tests to cover code changes
  • ✅ Ensure that unit test coverage remains above 80%
  • ✅ Update all applicable documentation
  • ✅ Follow Cluster Toolkit Contribution guidelines #

@ndebuhr ndebuhr requested review from a team and samskillman as code owners April 5, 2025 04:21
@SwarnaBharathiMantena SwarnaBharathiMantena added the release-module-improvements Added to release notes under the "Module Improvements" heading. label Apr 17, 2025

@SwarnaBharathiMantena SwarnaBharathiMantena left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SwarnaBharathiMantena

Copy link
Copy Markdown
Contributor

/gcbrun

@tpdownes tpdownes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ndebuhr due to a bug in the provider, this would be a breaking change for all Terraform Google provider plugins below 6.14.0. Please modify versions.tf accordingly.

My preference would be moved block that puts the boot disk management entirely within the google_compute_instance resource but Terraform does not support re-factoring of a resource into an attribute within another resource.

…ute instance recreation during auto_delete boot disk changes (versions <6.14)

@tpdownes tpdownes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for the submission and response to change requests!

@tpdownes

Copy link
Copy Markdown
Contributor

/gcbrun

@tpdownes tpdownes enabled auto-merge April 23, 2025 16:34
@tpdownes

Copy link
Copy Markdown
Contributor

/gcbrun

@tpdownes tpdownes merged commit 22c3b57 into GoogleCloudPlatform:develop Apr 24, 2025
@ndebuhr ndebuhr deleted the fix/nfs-disk-auto-deletion branch April 24, 2025 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-module-improvements Added to release notes under the "Module Improvements" heading.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants