
Mount disabled storage pool on host reboot#6164

Merged
nvazquez merged 4 commits into apache:main from ravening:mount-disabled-pool
Apr 2, 2022

Conversation

@ravening
Member

Description

Add a global setting so that disabled pools will be mounted
again on host reboot

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

  1. Disable the storage pool.
  2. Set the global setting mount.disabled.storage.pool to true.
  3. The pool will still be mounted on the host.
  4. Reboot the host.
  5. The pool will still be mounted.
  6. Change the setting to false.
  7. Reboot the host.
  8. The disabled pool will not be mounted.
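The behavior exercised by the steps above can be sketched as a small decision function. This is a hypothetical illustration (method and class names are invented); in the actual PR the setting is wired through CloudStack's global configuration framework rather than a plain boolean parameter:

```java
public class MountDecision {
    // Sketch of the mount-on-reboot decision introduced by this PR.
    // poolDisabled: whether the storage pool is in the Disabled state.
    // mountDisabledPools: value of the new mount.disabled.storage.pool global setting.
    static boolean shouldMountOnHostReboot(boolean poolDisabled, boolean mountDisabledPools) {
        // Enabled pools are always mounted; disabled pools only when the setting allows it.
        return !poolDisabled || mountDisabledPools;
    }

    public static void main(String[] args) {
        // Steps 1-5: disabled pool, setting true -> still mounted after reboot
        System.out.println(shouldMountOnHostReboot(true, true));   // true
        // Steps 6-8: disabled pool, setting false -> not mounted after reboot
        System.out.println(shouldMountOnHostReboot(true, false));  // false
    }
}
```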

@nvazquez
Contributor

@blueorangutan package

@blueorangutan

@nvazquez a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 2988

@nvazquez
Contributor

@blueorangutan package

@nvazquez nvazquez added this to the 4.17.0.0 milestone Mar 28, 2022
Contributor

@nvazquez nvazquez left a comment


Thanks @ravening, I have left some minor comments after the refactor.

Rakesh Venkatesh added 4 commits March 30, 2022 09:56
Add a global setting so that disabled pools will be mounted
again on host reboot
@ravening ravening force-pushed the mount-disabled-pool branch from d113178 to ac531a0 on March 30, 2022 07:59
@weizhouapache
Member

@blueorangutan package

@blueorangutan

@weizhouapache a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 3023

@nvazquez
Contributor

@blueorangutan test

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Contributor

@nvazquez nvazquez left a comment


Code LGTM

@blueorangutan

Trillian test result (tid-3763)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 29588 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6164-t3763-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py

@weizhouapache
Member

@blueorangutan test

@blueorangutan

@weizhouapache a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Member

@weizhouapache weizhouapache left a comment


code lgtm

@nvazquez
Contributor

nvazquez commented Apr 1, 2022

@blueorangutan test

@blueorangutan

@nvazquez a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-3792)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 38073 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr6164-t3792-kvm-centos7.zip
Smoke tests completed. 90 look OK, 0 have errors
Only failed test results are shown below:

Test Result Time (s) Test File

@nvazquez nvazquez merged commit b88cfc2 into apache:main Apr 2, 2022
@wido
Contributor

wido commented Apr 5, 2022

I noticed this PR a bit late. Great PR and good catch, however.... (There is always a however!)

The API: https://cloudstack.apache.org/api/apidocs-4.16/apis/updateStoragePool.html

enabled | false to disable the pool for allocation of new volumes, true to enable it back.

The enabled flag only tells whether the storage pool should be considered for the allocation of new volumes. It is not intended to control whether the pool is mounted on hosts. That's what the maintenance mode of a storage pool is for.

The global setting (mount.disabled.storage.pool, defaulting to false) which we introduce here actually keeps the bug intact; we can merely influence it by manually changing a config setting.

I would say that we should not add this config value and always 'mount' disabled storage pools as that's how it was designed.

@nvazquez
Contributor

nvazquez commented Apr 5, 2022

Hi @wido thanks for your input - @ravening @weizhouapache what's your opinion?

@ravening
Member Author

ravening commented Apr 5, 2022

@wido @nvazquez Yes, we always wanted the disabled pools to be mounted by default, even after a node reboot. I didn't want to change the existing behaviour as it might break functionality for others. That's the reason I added the global setting.

If everyone agrees that disabled pools should be mounted by default, then I can remove the global setting and always try to mount them.

@wido
Contributor

wido commented Apr 6, 2022

@ravening and @nvazquez The API docs are clear. Disabling a pool only avoids it for allocation of new volumes. If you don't want the pool to be mounted you can put it in maintenance mode.

The bug right now is that a disabled storage pool is not mounted and thus prevents VMs from being started on a hypervisor/host after it was rebooted. That's a true bug.

I would say we need to get rid of the global setting and stick to the intended behavior, as documented in the API.

@ravening
Member Author

ravening commented Apr 6, 2022

> @ravening and @nvazquez The API docs are clear. Disabling a pool only avoids it for allocation of new volumes. If you don't want the pool to be mounted you can put it in maintenance mode.
>
> The bug right now is that a disabled storage pool is not mounted and thus prevents VMs from being started on a hypervisor/host after it was rebooted. That's a true bug.
>
> I would say we need to get rid of the global setting and stick to the workings intended and also documented in the API.

@wido we learned the hard way that API docs are not always reliable, so I didn't want to alter the existing behavior in master.

I know it's actually a bug, but as mentioned earlier, if I get a majority vote to remove the global setting then I will do so.

@wido
Contributor

wido commented Apr 6, 2022

> > @ravening and @nvazquez The API docs are clear. Disabling a pool only avoids it for allocation of new volumes. If you don't want the pool to be mounted you can put it in maintenance mode.
> > The bug right now is that a disabled storage pool is not mounted and thus prevents VMs from being started on a hypervisor/host after it was rebooted. That's a true bug.
> > I would say we need to get rid of the global setting and stick to the workings intended and also documented in the API.
>
> @wido we learnt in a hard way that api docs are always not reliable. So I didn't want to change any code without altering the behavior existing in master
>
> I know it's actually a bug but as mentioned earlier, if I get a majority vote to remove global setting then I will do so

We actually learned this the hard way earlier this week. An operator disabled a (Ceph) storage pool because we wanted new volumes to be allocated on a different Ceph cluster. After a hypervisor reboot, VMs were stuck in the 'Starting' state and it took a while to figure out what was going on.

If we keep the current behavior then what's the difference between Disabled and Maintenance?
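The Disabled-versus-Maintenance distinction discussed here can be sketched as follows. This is a hypothetical illustration (the enum and method names are invented, not CloudStack's actual StoragePoolStatus API), summarizing the semantics described in the API docs quoted above:

```java
public class PoolSemantics {
    enum PoolState { ENABLED, DISABLED, MAINTENANCE }

    // Disabled only removes the pool from new-volume allocation.
    static boolean allocatesNewVolumes(PoolState s) {
        return s == PoolState.ENABLED;
    }

    // Maintenance is the state meant to keep a pool unmounted from hosts;
    // by the documented design, disabled pools stay mounted.
    static boolean mountedOnHosts(PoolState s) {
        return s != PoolState.MAINTENANCE;
    }

    public static void main(String[] args) {
        System.out.println(mountedOnHosts(PoolState.DISABLED));      // true
        System.out.println(allocatesNewVolumes(PoolState.DISABLED)); // false
        System.out.println(mountedOnHosts(PoolState.MAINTENANCE));   // false
    }
}
```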

@ravening
Member Author

ravening commented Apr 6, 2022

@weizhouapache @nvazquez @rohityadavcloud @DaanHoogland are you guys ok with removing the global setting and making the default behaviour to always mount disabled pools?

@yadvr
Member

yadvr commented Apr 6, 2022

I would want to keep the global setting but keep the default behaviour as what is either generally accepted or what was previously happening (i.e. for backward compatibility). Since the PR has been merged, @ravening do you mind raising a new PR (if necessary?) or at least starting a thread on dev@, as many people may miss your message.

6 participants