[vm_set] Reduce the testbed-cli.sh start-vms time from 3 hours to 20 minutes #962
liat-grozovik merged 2 commits into sonic-net:master from wangxin:start-vms
The original approach starts and configures the VMs sequentially. It takes more than 3 hours to start 32 virtual machines. This change starts all the VMs first, then configures them one by one. With this change, starting 32 VMs takes around 30-40 minutes.
Another change in this commit is to configure 'autostart' for the VMs so that they automatically start running after the host server is rebooted.
Signed-off-by: Xin Wang <xinw@mellanox.com>
pavel-shirshov left a comment
Thank you for your contribution
- Can you please add a batch size for starting VMs? When I tested this some years ago, starting more than 4 VMs in parallel made the whole host hang because of IO contention. So let's allow others to start VMs one by one, two at a time, and so on.
- Please make the autostart option optional. Anyone who needs it can enable it; otherwise the old behavior is kept.
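To make the batch-size request concrete, here is a minimal Python sketch of splitting a list of VMs into groups of a configurable size. It only illustrates the idea; it is not the Ansible code in this PR, and the function name `batches` and the VM names are hypothetical.

```python
from typing import Iterator, List, Sequence


def batches(vm_names: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size VM names."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(vm_names), batch_size):
        yield list(vm_names[i:i + batch_size])


# Hypothetical VM names, for illustration only.
vms = ["VM0100", "VM0101", "VM0102", "VM0103", "VM0104"]
print(list(batches(vms, 1)))  # one by one
print(list(batches(vms, 2)))  # two at a time
```

With a batch size of 1 the VMs are handled strictly one at a time, so a host that suffers from IO contention can fall back to fully sequential starts.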
@pavel-shirshov Thanks for your suggestions. I'll update and push a new commit.
I added two enhancements in the new commit:
A better solution would have been to trigger the start of a batch of VMs, wait until that batch is up, do the basic configuration on those VMs (kickstart them), and then move on to the next batch. That solution needs a nested loop: the outer loop iterates over the batches of VMs and the inner loop over each VM in a batch. However, the Ansible version (v2.0) used in sonic-mgmt has an issue that blocks us from using nested loops: ansible/ansible#14146. So I used a simple workaround: trigger the start of a batch of VMs, then pause for some time. After the start of all VMs has been triggered, kickstart them.
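The workaround can be sketched in plain Python as two phases. This is an illustration of the control flow only, not the Ansible tasks from the PR: the `start_vm` and `kickstart_vm` callables are hypothetical placeholders, and pausing after every batch (including the last one) is an assumption.

```python
import time
from typing import Callable, Sequence


def start_then_kickstart(
    vm_names: Sequence[str],
    batch_size: int,
    pause_seconds: float,
    start_vm: Callable[[str], None],      # hypothetical: triggers a VM start
    kickstart_vm: Callable[[str], None],  # hypothetical: basic configuration
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Phase 1: trigger the start of each batch, then pause to limit IO load.
    for i in range(0, len(vm_names), batch_size):
        for name in vm_names[i:i + batch_size]:
            start_vm(name)
        sleep(pause_seconds)  # assumption: pause after every batch
    # Phase 2: once every start has been triggered, kickstart one by one.
    for name in vm_names:
        kickstart_vm(name)


# Record the order of operations instead of touching real VMs.
events = []
start_then_kickstart(
    ["VM0100", "VM0101", "VM0102"],
    batch_size=2,
    pause_seconds=0,
    start_vm=lambda n: events.append(("start", n)),
    kickstart_vm=lambda n: events.append(("kickstart", n)),
    sleep=lambda s: events.append(("pause", s)),
)
print(events)
```

Unlike the "better solution" described above, this flow never waits for a batch to finish booting before starting the next one; the fixed pause is the only throttle.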
…minutes (#962)
* [vm_set] Improve the start-vms performance
The original approach starts and configures the VMs sequentially. It takes more than 3 hours to start 32 virtual machines. This change is to start all the VMs, then configure them one by one. With this change, starting 32 VMs needs around 30-40 minutes. Another change in this commit is to configure 'autostart' for VMs so that the VMs will automatically start running after host server is rebooted.
Signed-off-by: Xin Wang <xinw@mellanox.com>
* Add batch_size support, make autostart optional
Description of PR
Summary:
Fixes # (issue)
The original approach starts and configures the VMs sequentially.
It takes more than 3 hours to start 32 virtual machines. This
change starts all the VMs first, then configures them one by one.
With this change, starting 32 VMs takes around 20 minutes.
Another change in this commit is to configure 'autostart' for
the VMs so that they automatically start running after the host
server is rebooted.
Type of change
Approach
How did you do it?
Instead of starting and configuring the VMs one by one, start all the VMs in batch, then configure them one by one.
After a VM is started, run "virsh autostart <VM_name>" to set it to autostart.
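As a rough sketch of the autostart step, the command can be built and run from Python as shown here. The helper names `autostart_command` and `set_autostart` and the VM name are hypothetical; the PR itself drives this from Ansible. The `--disable` flag of `virsh autostart` is used here to show how the setting could be switched off.

```python
import subprocess
from typing import List


def autostart_command(vm_name: str, enable: bool = True) -> List[str]:
    """Build the virsh argv that enables or disables autostart for a domain."""
    cmd = ["virsh", "autostart", vm_name]
    if not enable:
        cmd.append("--disable")
    return cmd


def set_autostart(vm_name: str, enable: bool = True) -> None:
    """Run virsh on the host; raises CalledProcessError if virsh fails."""
    subprocess.run(autostart_command(vm_name, enable), check=True)


# Only print the command here; set_autostart() needs a libvirt host.
print(" ".join(autostart_command("VM0100")))
```

Separating command construction from execution keeps the optional behavior easy to check without a running libvirt daemon.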
How did you verify/test it?
Verified in Mellanox lab.
Any platform specific information?
NA
Supported testbed topology if it's a new test case?
Documentation