Skip to content

[VMware] Make disk controller selection on volume attachment consistent with VM creation and start#9636

Merged
JoaoJandre merged 4 commits intoapache:4.19from
scclouds:fix-volume-attachment-inconsistency
Sep 19, 2024
Merged

[VMware] Make disk controller selection on volume attachment consistent with VM creation and start#9636
JoaoJandre merged 4 commits intoapache:4.19from
scclouds:fix-volume-attachment-inconsistency

Conversation

@winterhazel
Copy link
Copy Markdown
Member

@winterhazel winterhazel commented Sep 4, 2024

Description

When both rootDiskController and dataDiskController details are set to SCSI controllers, the VM creation and start processes prioritize creating and using the controllers specified by rootDiskController for all volumes. However, the volume attachment process does not consider this and always tries to attach volumes using the controllers specified for the data disks (even if a root disk is being attached).

This way, in a scenario where rootDiskController is set to pvscsi and dataDiskController is set to lsilogic (one example, as this problem also happens for other combinations) we ignore the dataDiskController setting defined by the operator and choose to use only PVSCSI controllers for both the root and data disks on VM creation/start. Then, when a user tries to attach a new data disk, we only consider dataDiskController and try to attach with LSI Logic instead. However, the process fails, as there are no LSI Logic controllers because we ignored the data disk controller chosen by the operator and opted to create PVSCSI controllers for both root and data disks when creating/starting the VM. Also, if we detach the root disk and try to attach it again, the process also fails because CloudStack tries to attach it using the controller specified for the data disks.

To solve this issue, this PR makes the volume attachment disk controller selection logic consistent with VM creation/start. Tests will be added in another PR alongside an extension for the disk controllers that I'm currently working on, as it will require me to modify this code again.

Fixes #9400

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

For each of the following rootDiskController/dataDiskController combinations:

  1. I deployed a VM that had 1 root disk and 1 data disk, and verified which disk controllers were being used for them;
  2. I attached a data disk, and verified which disk controller was used;
  3. I detached the root disk and attached it again, and verified which disk controller was used.

Below are my results.

Configuration Initial root disk Initial data disk (re)Attached root disk Attached data disk
pvscsi/osdefault PVSCSI PVSCSI PVSCSI PVSCSI
osdefault/pvscsi LSI Logic LSI Logic LSI Logic LSI Logic
osdefault/osdefault LSI Logic LSI Logic LSI Logic LSI Logic
lsilogic/pvscsi LSI Logic LSI Logic LSI Logic LSI Logic
pvscsi/lsilogic PVSCSI PVSCSI PVSCSI PVSCSI
ide/ide IDE IDE IDE IDE
ide/pvscsi IDE PVSCSI IDE, but see [1] PVSCSI
lsilogic/lsilogic LSI Logic LSI Logic LSI Logic LSI Logic
ide/osdefault IDE LSI Logic IDE, but see [1] LSI Logic
osdefault/ide LSI Logic IDE LSI Logic IDE, but see [1]

[1] CloudStack tried to attach using an IDE controller. However, an exception happened because it ignored the CDRom when calculating the next available unit number, and tried to attach the disk on an unit number that was already being used. I will fix this issue separately in another PR.

@winterhazel
Copy link
Copy Markdown
Member Author

@blueorangutan package

@codecov
Copy link
Copy Markdown

codecov bot commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 0% with 58 lines in your changes missing coverage. Please review.

Project coverage is 15.08%. Comparing base (0204cb7) to head (e9d3682).
Report is 25 commits behind head on 4.19.

Files with missing lines Patch % Lines
...com/cloud/hypervisor/vmware/util/VmwareHelper.java 0.00% 37 Missing ⚠️
...oud/hypervisor/vmware/resource/VmwareResource.java 0.00% 9 Missing ⚠️
...cloud/storage/resource/VmwareStorageProcessor.java 0.00% 8 Missing ⚠️
...oud/hypervisor/vmware/mo/HypervisorHostHelper.java 0.00% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.19    #9636      +/-   ##
============================================
- Coverage     15.08%   15.08%   -0.01%     
- Complexity    11185    11188       +3     
============================================
  Files          5406     5406              
  Lines        473123   473229     +106     
  Branches      58163    60938    +2775     
============================================
+ Hits          71371    71376       +5     
- Misses       393810   393907      +97     
- Partials       7942     7946       +4     
Flag Coverage Δ
uitests 4.30% <ø> (-0.01%) ⬇️
unittests 15.79% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10968

@yadvr yadvr added this to the 4.19.2.0 milestone Sep 5, 2024
@yadvr
Copy link
Copy Markdown
Member

yadvr commented Sep 5, 2024

@blueorangutan test ol8 vmware-70u3

@blueorangutan
Copy link
Copy Markdown

@rohityadavcloud a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-70u3) has been kicked to run smoke tests

Copy link
Copy Markdown
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clgtm

@blueorangutan
Copy link
Copy Markdown

[SF] Trillian test result (tid-11353)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 51739 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9636-t11353-vmware-70u3.zip
Smoke tests completed. 125 look OK, 8 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 131.67 test_events_resource.py
test_attach_and_distribute_multiple_volumes Error 21.19 test_attach_multiple_volumes.py
test_attach_multiple_volumes Failure 8.87 test_attach_multiple_volumes.py
test_01_vm_with_thin_disk_offering Error 127.60 test_disk_provisioning_types.py
test_02_vm_with_fat_disk_offering Error 194.53 test_disk_provisioning_types.py
test_03_vm_with_sparse_disk_offering Error 134.86 test_disk_provisioning_types.py
ContextSuite context=TestListVolumes>:setup Error 0.00 test_list_volumes.py
test_01_deploy_vms_storage_tags Error 11.17 test_primary_storage.py
test_01_volume_usage Error 88.39 test_usage.py
test_02_offline_migrate_VM_with_two_data_disks Error 54.90 test_vm_life_cycle.py
test_03_live_migrate_VM_with_two_data_disks Error 58.85 test_vm_life_cycle.py
test_04_migrate_detached_volume Error 54.66 test_vm_life_cycle.py
test_11_destroy_vm_and_volumes Error 47.22 test_vm_life_cycle.py
test_12_start_vm_multiple_volumes_allocated Error 80.16 test_vm_life_cycle.py
test_12_start_vm_multiple_volumes_allocated Error 80.17 test_vm_life_cycle.py
test_01_create_volume Error 4.47 test_volumes.py
test_10_list_volumes Failure 371.35 test_volumes.py
test_13_migrate_volume_and_change_offering Error 18.85 test_volumes.py
ContextSuite context=TestVolumes>:teardown Error 28.28 test_volumes.py

…areHelper.java

Co-authored-by: dahn <daan.hoogland@gmail.com>
@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11031

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan test ol8 vmware-70u3

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-70u3) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

[SF] Trillian test result (tid-11410)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 94986 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9636-t11410-vmware-70u3.zip
Smoke tests completed. 121 look OK, 12 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_01_events_resource Error 138.67 test_events_resource.py
test_attach_and_distribute_multiple_volumes Error 22.31 test_attach_multiple_volumes.py
test_attach_multiple_volumes Failure 16.01 test_attach_multiple_volumes.py
test_01_vm_with_thin_disk_offering Error 138.96 test_disk_provisioning_types.py
test_02_vm_with_fat_disk_offering Error 157.73 test_disk_provisioning_types.py
test_03_vm_with_sparse_disk_offering Error 140.81 test_disk_provisioning_types.py
ContextSuite context=TestListVolumes>:setup Error 0.00 test_list_volumes.py
test_list_vms_metrics_admin Error 3622.91 test_metrics_api.py
test_list_vms_metrics_history Error 7.74 test_metrics_api.py
test_list_volumes_metrics_history Error 3615.50 test_metrics_api.py
test_01_deploy_vms_storage_tags Error 9.12 test_primary_storage.py
test_01_restore_vm Error 3614.75 test_restore_vm.py
test_02_restore_vm_allocated_root Error 20.20 test_restore_vm.py
ContextSuite context=TestRestoreVM>:teardown Error 32.71 test_restore_vm.py
ContextSuite context=TestVMWareStoragePolicies>:setup Error 0.00 test_storage_policy.py
test_01_volume_usage Error 88.20 test_usage.py
test_01_deploy_vm_on_specific_host Error 17.77 test_vm_deployment_planner.py
test_02_deploy_vm_on_specific_cluster Error 3602.91 test_vm_deployment_planner.py
test_03_deploy_vm_on_specific_pod Error 1.34 test_vm_deployment_planner.py
test_04_deploy_vm_on_host_override_pod_and_cluster Error 2.38 test_vm_deployment_planner.py
test_05_deploy_vm_on_cluster_override_pod Error 2.37 test_vm_deployment_planner.py
test_01_offline_migrate_VM_and_root_volume Error 5921.32 test_vm_life_cycle.py
test_02_offline_migrate_VM_with_two_data_disks Error 62.46 test_vm_life_cycle.py
test_03_live_migrate_VM_with_two_data_disks Error 51.60 test_vm_life_cycle.py
test_04_migrate_detached_volume Error 53.48 test_vm_life_cycle.py
test_11_destroy_vm_and_volumes Error 46.96 test_vm_life_cycle.py
test_12_start_vm_multiple_volumes_allocated Error 83.21 test_vm_life_cycle.py
test_12_start_vm_multiple_volumes_allocated Error 83.22 test_vm_life_cycle.py
test_01_create_volume Error 5.43 test_volumes.py
test_10_list_volumes Failure 371.35 test_volumes.py

@winterhazel
Copy link
Copy Markdown
Member Author

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11061

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan test ol8 vmware-70u3

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + vmware-70u3) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

[SF] Trillian test result (tid-11440)
Environment: vmware-70u3 (x2), Advanced Networking with Mgmt server ol8
Total time taken: 48115 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9636-t11440-vmware-70u3.zip
Smoke tests completed. 133 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@DaanHoogland
Copy link
Copy Markdown
Contributor

@winterhazel maybe some test code could be added? integration or unit. @JoaoJandre at your descretion.

Copy link
Copy Markdown
Contributor

@JoaoJandre JoaoJandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JoaoJandre
Copy link
Copy Markdown
Contributor

@winterhazel maybe some test code could be added? integration or unit. @JoaoJandre at your descretion.

Some unit testing would be nice. @winterhazel

@winterhazel
Copy link
Copy Markdown
Member Author

Hey @DaanHoogland @JoaoJandre

As I stated in the description, I am planning to add tests in another PR alongside an extension for the disk controllers that I'm working on, as it will require me to make changes to both this code and to the tests again. However, if you guys prefer it, I can try adding them here already next week.

@JoaoJandre
Copy link
Copy Markdown
Contributor

Hey @DaanHoogland @JoaoJandre

As I stated in the description, I am planning to add tests in another PR alongside an extension for the disk controllers that I'm working on, as it will require me to make changes to both this code and to the tests again. However, if you guys prefer it, I can try adding them here already next week.

Sorry @winterhazel, I totally missed that part of the description.

@JoaoJandre JoaoJandre merged commit 075f981 into apache:4.19 Sep 19, 2024
dhslove pushed a commit to ablecloud-team/ablestack-cloud that referenced this pull request Sep 23, 2024
…nt with VM creation and start (apache#9636)

* Make volume attachment disk controller selection consistent with VM creation and start

* Update vmware-base/src/main/java/com/cloud/hypervisor/vmware/util/VmwareHelper.java

Co-authored-by: dahn <daan.hoogland@gmail.com>

* Choose disk controllers after converting osdefault

* Rename function

---------

Co-authored-by: dahn <daan.hoogland@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants