Add retry logic to fix the server terminate failed issue#94

Merged
nywilken merged 4 commits into hashicorp:main from zhenggu:main
Apr 13, 2023


zhenggu commented Feb 22, 2023

In our OpenStack environment, we randomly get an error while terminating the server at the end of a Packer build.
This leaves many useless VMs behind in our OpenStack environment.
The error looks like this:
[image: error]
The following is the stack trace from the OCP service side when the issue happened.

2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions [req-965cda1d-a06d-42f6-ade4-4067bcf951ad f2540ed3657741ad9485a440c24aa45a f361dec2c1bd43789b6f5a96f3f252ba - default default] Unexpected exception in API method
2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions Traceback (most recent call last):
2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/api/openstack/extensions.py", line 338, in wrapped
2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions     return f(*args, **kwargs)
2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/api/openstack/compute/deferred_delete.py", line 64, in _force_delete
2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions     self.compute_api.force_delete(context, instance)
2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 166, in inner
2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions     return function(self, context, instance, *args, **kwargs)
2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions   File "/usr/lib/python2.7/site-packages/nova/compute/api.py", line 139, in inner
2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions     method=f.__name__)
2023-02-08 03:07:36.912 5493 ERROR nova.api.openstack.extensions InstanceInvalidState: Instance 983e2da3-fcaa-45cb-ba2d-1aa21bf9ec37 in task_state image_uploading. Cannot force_delete while the instance is in this state. 

According to the logic of packer-plugin-openstack, once the status of the image becomes active, Packer immediately tries to remove the temporary VM. If the removal fails, the build reports the error above.

The stack trace shows that when the image status changed to active, the image uploading task of the VM had not finished yet.

The VM delete action therefore fails, and the OCP service responds with HTTP 500 (Internal Server Error).

This PR adds retry logic that detects an Internal Server Error response to the delete call and, if one occurs, retries for up to 20 seconds.

zhenggu requested a review from a team as a code owner February 22, 2023 06:50
hashicorp-cla commented Feb 22, 2023

CLA assistant check
All committers have signed the CLA.


zhenggu commented Mar 31, 2023

Could anyone help review this? A month has passed.


nywilken left a comment


@zhenggu apologies for the delayed review here. I had time to look over the changes and left a small suggestion to the code changes. Otherwise I think this looks good to go. Please let me know your thoughts and if there are any issues with the suggested code.

If all works as expected I will merge the changes once updated and cut an immediate release.


zhenggu commented Apr 13, 2023

Hi @nywilken, I have refactored the code as you suggested, and it works fine. The following is the Packer log for this case. Please help approve it.

==> openstack: Terminating the source server: d74fcb01-fd63-4f4c-b0ff-3fe50eef8a1e ...
2023/04/13 14:44:10 packer-builder-openstack plugin: Error terminating server on (1) time(s): Internal Server Error, retrying ...
2023/04/13 14:44:23 packer-builder-openstack plugin: Waiting for state to become: [DELETED]
2023/04/13 14:44:25 packer-builder-openstack plugin: Waiting for state to become: [DELETED] currently SHUTOFF (0%)
2023/04/13 14:44:30 packer-builder-openstack plugin: Waiting for state to become: [DELETED] currently SHUTOFF (0%)
2023/04/13 14:44:34 packer-builder-openstack plugin: Waiting for state to become: [DELETED] currently SHUTOFF (0%)
2023/04/13 14:44:38 packer-builder-openstack plugin: Waiting for state to become: [DELETED] currently SHUTOFF (0%)
2023/04/13 14:44:42 packer-builder-openstack plugin: Waiting for state to become: [DELETED] currently SHUTOFF (0%)
2023/04/13 14:44:46 packer-builder-openstack plugin: Waiting for state to become: [DELETED] currently SHUTOFF (0%)
2023/04/13 14:44:50 packer-builder-openstack plugin: Waiting for state to become: [DELETED] currently SHUTOFF (0%)
2023/04/13 14:44:55 packer-builder-openstack plugin: Waiting for state to become: [DELETED] currently SHUTOFF (0%)
2023/04/13 14:44:59 packer-builder-openstack plugin: [INFO] 404 on ServerStateRefresh, returning DELETED
==> openstack: Deleting temporary keypair: packer_643799ab-4e31-f28e-a141-35125c5e0453 ...
Build 'openstack' finished after 48 minutes 26 seconds.


nywilken left a comment


Thank you @zhenggu. I'll merge and release once all goes green. Cheers!

nywilken added the bug label Apr 13, 2023
nywilken changed the title from "[bug fix] Add retry logic to fix the server terminate failed issue" to "Add retry logic to fix the server terminate failed issue" Apr 13, 2023
nywilken merged commit 5caf967 into hashicorp:main Apr 13, 2023
nywilken commented

I fixed the goimports lint check and merged directly. Thank you again.


zhenggu commented May 15, 2023

@nywilken https://github.com/hashicorp/packer still uses the old version. Should I open a PR for that, or will you include this fix in the next release?
