Skip to content

Forces VMs to be replaced when placement policy is destroyed#1849

Merged
nick-stroud merged 2 commits into
GoogleCloudPlatform:developfrom
nick-stroud:vm-instance-replace-for-placement
Nov 17, 2023
Merged

Forces VMs to be replaced when placement policy is destroyed#1849
nick-stroud merged 2 commits into
GoogleCloudPlatform:developfrom
nick-stroud:vm-instance-replace-for-placement

Conversation

@nick-stroud

@nick-stroud nick-stroud commented Oct 18, 2023

Copy link
Copy Markdown
Collaborator

Tested:

  • Deploy VM with no placement
  • Add placement with max_distance: 1; re-deploy
  • Edit max_distance from 1 -> 2; re-deploy
  • remove placement; re-deploy

Each time it recognizes that VM must be destroyed.

Note: If one does not turn auto_delete_boot_disk: false you will still get an error on second apply as it will not find disk. This can be worked around by doing another apply. This disk issue is a known pain point that is unrelated to the placement policy error.

Submission Checklist

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch #
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cloud HPC Toolkit Contribution guidelines #

@nick-stroud nick-stroud added the release-chore To not include into release notes label Oct 18, 2023
@nick-stroud nick-stroud requested a review from tpdownes October 18, 2023 05:00

@tpdownes tpdownes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I trust that this works but I think google_compute_resource_policy.placement_policy might be a list/array all by itself and replace_triggered_by takes a list. So I think the solution might be a bit fragile. Can you wrap the array with flatten and repeat test?

@tpdownes tpdownes assigned nick-stroud and unassigned tpdownes Oct 19, 2023
@nick-stroud

Copy link
Copy Markdown
Collaborator Author

Can you wrap the array with flatten and repeat test?

I tried but I get Unexpected expression found in replace_triggered_by. or A static list expression is required..

From poking around online it looks like replace_triggered_by requires "static references" and cannot accept an expression.

Additionally the documentation for replace_triggered_by states:

If the reference is to a resource with multiple instances, a plan to update or replace any instance will trigger replacement.

So I think they have accounted for it to take a reference to a array resource, as is in this case.

I propose to leave it as is.

@nick-stroud nick-stroud assigned tpdownes and unassigned nick-stroud Oct 19, 2023
@tpdownes tpdownes self-requested a review October 19, 2023 18:17
tpdownes
tpdownes previously approved these changes Oct 19, 2023

@tpdownes tpdownes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not entirely sure how to read

If the reference is to a resource with multiple instances, a plan to update or replace any instance will trigger replacement.

but reading it again, it does feel like yours is the correct interpretation.

@tpdownes tpdownes assigned nick-stroud and unassigned tpdownes Oct 19, 2023
@nick-stroud

Copy link
Copy Markdown
Collaborator Author

Spot check test failed in way that makes it seem that standard no placement policy case does not work, although this was tested manually.

I am dismissing approval so this doesn't get merged until I have a chance to debug.

@nick-stroud nick-stroud dismissed tpdownes’s stale review October 23, 2023 16:54

See last comment.

@nick-stroud nick-stroud force-pushed the vm-instance-replace-for-placement branch from 4319310 to 3cffbeb Compare November 17, 2023 00:07
@nick-stroud

Copy link
Copy Markdown
Collaborator Author

While it looks like replace_triggered_by can handle an array of instances, it does not seem to handle the case when count=0. I have updated to rely on a null_resource as a proxy for a trigger. I am dubious of this methodology but wanted to put something forward that is at least working as expected.

@nick-stroud nick-stroud assigned tpdownes and unassigned nick-stroud Nov 17, 2023
@tpdownes tpdownes self-requested a review November 17, 2023 17:44

@tpdownes tpdownes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you are doing here. I think the most direct route would be this:

  1. Create a random suffix and have its lifecycle managed via the same null_resource you've created.
  2. Apply that random suffix to the name of the resource policy
  3. Remove the lifecycle from the VM

You can consider this at nit level but I think the basic problem is that the self_link of the resource policy isn't changing and you probably should change it if you're changing its properties.

@tpdownes tpdownes assigned nick-stroud and unassigned tpdownes Nov 17, 2023
@nick-stroud

Copy link
Copy Markdown
Collaborator Author

I cannot get rid of the lifecycle on the VM. I get the following error:

Error: Error adding resource policies: googleapi: Error 400: Invalid resource usage: 'Instance must be in stopped state'., invalidResourceUsage

It looks like it still tries to modify the resource policy on the live VM instead of destroying the VM.

@nick-stroud nick-stroud merged commit 5939b60 into GoogleCloudPlatform:develop Nov 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-chore To not include into release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants