
[202311][dhcp_relay] Cherry-pick dhcp_relay changes #14740

Closed
yaqiangz wants to merge 586 commits into sonic-net:master from yaqiangz:azure-202311_cherry_pick

@yaqiangz
Contributor

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

Manually cherry-picked these two PRs because of merge conflicts:
#14012
#14641

How did you do it?

Manually cherry-picked these two PRs because of merge conflicts:
#14012
#14641

How did you verify/test it?

PR tests

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

liuh-80 and others added 30 commits May 20, 2024 12:41
…12819)

Improve handling of TACACS command failures over IPv6.

#### Why I did it
The tacplus server crashes when it receives an authorization request from an IPv6 address.

#### How I did it
Check the TACACS server status after running a command over an IPv6 address.

#### How to verify it
Pass all test case.

#### Description for the changelog
Improve handling of TACACS command failures over IPv6.
Log message wrapper:
By default, all messages are written to the PTF log, and a flag can be set to also write a specific message to stderr on the PTF console.
This avoids flooding the console and the "test summary" with messages, so failures are easy to identify during triage,
while the PTF log still contains every message for root-cause analysis.
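The log wrapper described above can be sketched with Python's standard `logging` module. This is a minimal illustration under stated assumptions, not the PR's implementation: the logger name `ptf_qos_sketch`, the `log_message` helper, and its `to_stderr` flag are hypothetical.

```python
import logging
import sys

# Hypothetical logger standing in for the PTF log file. A real run would
# attach a FileHandler; a NullHandler keeps this sketch self-contained.
logger = logging.getLogger("ptf_qos_sketch")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.NullHandler())
logger.propagate = False


def log_message(message, level=logging.INFO, to_stderr=False):
    """Always write message to the PTF log; echo it to stderr only on request.

    Returns True if the message was also written to the console.
    """
    logger.log(level, message)
    if to_stderr:
        # Console output is opt-in, so the test summary is not flooded.
        sys.stderr.write(message + "\n")
        sys.stderr.flush()
    return to_stderr
```

Routine diagnostics stay in the log file, and only the messages a triager needs (for example, an assertion reason) are echoed to the console.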

CounterCollector class
Provides a general interface for collecting, comparing, and displaying counters.
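A minimal sketch of the collect/compare/display interface, assuming each counter source can be read as a `{counter_name: int}` dict. The class shape and the `read_fn` callable are hypothetical; the real CounterCollector queries the DUT and PTF, which this sketch does not.

```python
class CounterCollector:
    """Collect named counter snapshots and report per-counter deltas.

    read_fn is any callable returning {counter_name: int}.
    """

    def __init__(self, read_fn):
        self.read_fn = read_fn
        self.snapshots = {}  # step name -> {counter: value}

    def collect(self, step):
        """Take a snapshot and store it under the given step name."""
        self.snapshots[step] = dict(self.read_fn())
        return self.snapshots[step]

    def compare(self, first, last):
        """Return {counter: last - first} for counters present in both steps."""
        before, after = self.snapshots[first], self.snapshots[last]
        return {k: after[k] - before[k] for k in before if k in after}

    def display(self, first, last):
        """Format the non-zero deltas as aligned text lines."""
        diff = self.compare(first, last)
        return "\n".join(f"{k:<20}{v:>10}" for k, v in sorted(diff.items()) if v)
```

For example, snapshots of `{"tx_ok": 10, "rx_drop": 0}` and `{"tx_ok": 25, "rx_drop": 3}` compare to `{"tx_ok": 15, "rx_drop": 3}`.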

Diagnostic counter wrapper
So far, we can read 8 kinds of counters:
port_counter, queue_counter_counter, queue_share_wm_counter, pg_share_wm_counter, pg_headroom_wm_counter, pg_counter_couner, pg_drop_counter and ptf_tx_rx_counter

Although CounterCollector provides a common API to collect, compare, and display these counters, using CounterCollector directly still clutters the test case code, because each counter needs at least one line of code.
If more counter types are added later, even more code unrelated to the test steps would be exposed in the test case.

Therefore, the diag counter wrapper encapsulates all counter-related activity, so that the test case code reflects the test steps and logic rather than diagnostic code.

Assert wrapper
By default, we display the counter difference between the first and last steps of the case on both normal and abnormal exits.
However, Python's built-in assert statement makes it difficult to show the counter diff,
so we implemented an assert wrapper that shows the counter diff when an assertion fails.
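A sketch of such an assert wrapper, assuming the first and last snapshots are plain `{counter_name: int}` dicts. The name `qos_assert` and its parameters are hypothetical, not the PR's API.

```python
def qos_assert(condition, message, before, after):
    """Like `assert condition, message`, but append a counter diff on failure.

    before/after are {counter_name: int} snapshots from the first and last
    test steps; only counters whose value changed are listed.
    """
    if condition:
        return
    lines = [message, "counter diff (after - before):"]
    for name in sorted(before):
        if name in after and after[name] != before[name]:
            lines.append(f"  {name}: {before[name]} -> {after[name]} "
                         f"({after[name] - before[name]:+d})")
    raise AssertionError("\n".join(lines))
```

Raising `AssertionError` keeps pytest's normal failure reporting, while the message now carries the counters that changed during the case.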

TextTable class
This is not a newly added class: it already outputs counters in table format, similar to the well-known Python library prettytable.
This PR adds a new static method, merge_table(), to merge two tables whose differences need to be shown.
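The real `TextTable.merge_table()` signature is not shown in this PR description, so the standalone functions below are a hypothetical sketch of the idea: rows from two tables are matched by their first column and shown side by side with a delta column.

```python
def merge_table(header, before_rows, after_rows):
    """Merge two [key, value] tables into one [key, before, after, delta] table.

    Rows are matched by key; a key missing from one table shows "-".
    """
    before = {row[0]: row[1] for row in before_rows}
    after = {row[0]: row[1] for row in after_rows}
    merged = [[header, "before", "after", "delta"]]
    for key in sorted(set(before) | set(after)):
        b, a = before.get(key), after.get(key)
        delta = a - b if a is not None and b is not None else "-"
        merged.append([key, "-" if b is None else b, "-" if a is None else a, delta])
    return merged


def render(rows):
    """Left-align each column to its widest cell, prettytable-style."""
    widths = [max(len(str(r[i])) for r in rows) for i in range(len(rows[0]))]
    return "\n".join("  ".join(str(c).ljust(w) for c, w in zip(r, widths))
                     for r in rows)
```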

Example case:
This feature is not yet applied to all QoS test cases.
As a first example, the above changes are only applied to the xoff, xon, and lossy queue cases. We will monitor them for a long time to collect feedback, and then enhance them.

This already covers various SKUs and topologies;
see the test record table below.

Skip chassis devices
Since the tests have not covered chassis devices yet, chassis support is skipped for now.

How did you verify/test it?
Passed verification on a lab testbed.
Fix the ro disk test case generating garbled syslog content that breaks loganalyzer.

#### Why I did it
Log rotation during the ro disk test may leave garbled characters in the syslog file.
These characters break loganalyzer; to fix this issue, rotate the logs again to clean up the syslog file.

#### How I did it
Rotate the logs again after the ro disk test case finishes.

#### How to verify it
Pass all test case.

#### Description for the changelog
Fix the ro disk test case generating garbled syslog content that breaks loganalyzer.
…-net#12851)

Explicit zeros in the prefix_v6 value cause issues in route_check, so the simplified (compressed) form must be used instead.
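One way to obtain the simplified form is the standard-library `ipaddress` module, shown here as a sketch; the helper name `simplify_prefix_v6` is hypothetical and not taken from the PR.

```python
import ipaddress


def simplify_prefix_v6(prefix):
    """Return the canonical compressed form of an IPv6 prefix.

    "fc00:0000:0000:0000::/64" and "fc00::/64" are the same network, but a
    plain string comparison (as in a route check) treats them as different.
    """
    return ipaddress.ip_network(prefix).compressed
```

`ip_network()` raises `ValueError` if host bits are set, which surfaces a malformed prefix instead of silently rewriting it.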
…ds (sonic-net#12784)



Co-authored-by: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com>
What is the motivation for this PR?
Many cases failed during teardown because loganalyzer caught the following FRR-related error messages.
The failure only happens on devices running the slim image.

bgp#zebra[36]: [SHWNK-NWT5S][EC 100663304] No such command on config line 10: no fpm use-next-hop-groups
bgp#zebra[36]: [SHWNK-NWT5S][EC 100663304] No such command on config line 12: fpm address 127.0.0.1
ERR bgp#bgpd[50]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[50]: [MVZKX-EG443][EC 33554452] bgp_process_packet: BGP OPEN receipt failed for peer: XXX
ERR bgp#bgpd[50]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[50]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[50]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[50]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[50]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[50]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[50]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[50]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[49]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[49]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#zebra[36]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: No route to host, type=RTM_NEWROUTE(24), seq=95412, pid=3576249171
ERR bgp#bgpd[65]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#zebra[36]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: No route to host, type=RTM_NEWROUTE(24), seq=95495, pid=3576249171
ERR bgp#bgpd[65]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[65]: [XETTR-D5MR0][EC 100663316] Attempting to process an I/O event but for fd: 66(8) no thread to handle this!
ERR bgp#bgpd[53]: [H4B4J-DCW2R][EC 33554455] XXX [Error] bgp_read_packet error: Connection reset by peer
ERR bgp#zebra[62]: [SHWNK-NWT5S][EC 100663304] No such command on config line 10: no fpm use-next-hop-groups
ERR bgp#zebra[62]: [SHWNK-NWT5S][EC 100663304] No such command on config line 12: fpm address 127.0.0.1
ERR bgp#zebra[62]: [RFREB-PAV4B][EC 100663299] vty_read: read error on vty client fd 26, closing: Connection reset by peer
ERR bgp#zebra[36]: [SHWNK-NWT5S][EC 100663304] No such command on config line 10: no fpm use-next-hop-groups
ERR bgp#zebra[36]: [SHWNK-NWT5S][EC 100663304] No such command on config line 12: fpm address 127.0.0.1
ERR bgp#bgpd[51]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[49]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#bgpd[51]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF PortChannel999 in VRF 0
ERR bgp#bgpd[51]: [VCGF0-X62M1][EC 100663301] INTERFACE_STATE: Cannot find IF PortChannel999 in VRF 0
ERR bgp#bgpd[48]: [P3GYW-PBKQG][EC 33554466] XXX [FSM] unexpected packet received in state OpenSent
ERR bgp#zebra[36]: [SHWNK-NWT5S][EC 100663304] No such command on config line 10: no fpm use-next-hop-groups
ERR bgp#zebra[36]: [SHWNK-NWT5S][EC 100663304] No such command on config line 12: fpm address 127.0.0.1
How did you do it?
The slim image's rsyslog config was mistakenly modified by sonic-net/sonic-buildimage#17905,
so FRR error messages are written to syslog and cause teardown failures.
Created GitHub issue sonic-net/sonic-buildimage#19047 to track the problem, and ignored these error messages until it is fixed to make the nightly build more stable:

r".* ERR bgp#bgpd.* unexpected packet received in state OpenSent"
r".* ERR bgp#bgpd.* INTERFACE_STATE: Cannot find IF .*"
r".* ERR bgp#bgpd.* bgp_process_packet: BGP OPEN receipt failed for peer.*"
r".* ERR bgp#bgpd.* bgp_read_packet error: Connection reset by peer.*"
r".* ERR bgp#bgpd.* Attempting to process an I/O event but for fd.*"
r".* ERR bgp#zebra.* No such command on config line .*"
r".* ERR bgp#zebra.* error: No route to host.*"
r".* ERR bgp#zebra.* read error on vty client fd .*"
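To sanity-check the ignore patterns above, they can be matched against sample syslog lines with Python's `re` module. This sketch uses `re.match` and assumes full syslog lines carry a timestamp/hostname prefix before `ERR` (the prefix used in the examples is made up); the loganalyzer's own matching logic may differ.

```python
import re

# The ignore patterns listed above.
IGNORE_REGEX = [
    r".* ERR bgp#bgpd.* unexpected packet received in state OpenSent",
    r".* ERR bgp#bgpd.* INTERFACE_STATE: Cannot find IF .*",
    r".* ERR bgp#bgpd.* bgp_process_packet: BGP OPEN receipt failed for peer.*",
    r".* ERR bgp#bgpd.* bgp_read_packet error: Connection reset by peer.*",
    r".* ERR bgp#bgpd.* Attempting to process an I/O event but for fd.*",
    r".* ERR bgp#zebra.* No such command on config line .*",
    r".* ERR bgp#zebra.* error: No route to host.*",
    r".* ERR bgp#zebra.* read error on vty client fd .*",
]


def is_ignored(line, patterns=IGNORE_REGEX):
    """Return True if the syslog line matches any ignore pattern."""
    return any(re.match(p, line) for p in patterns)
```

Note that each pattern requires a space before `ERR`, so a bare line starting with `ERR` (without the syslog prefix) does not match.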
How did you verify/test it?
NA

Any platform specific information?
What is the motivation for this PR?
The new test is a poorly written IPv6 variant of testQosSaiDscpQueueMapping. It is expected to fail across all platforms/topologies because qos/test_qos_sai.py cannot support an IPv6 variant of any of its test cases, since IPv6 is disabled on the DUT by qos_sai_base.py: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/qos/qos_sai_base.py#L1811-L1826
Also, the changes made for this test touch class-scoped fixtures, which causes all the other test cases to error out as well.

How did you do it?
Reverted sonic-net#10941, as well as the following fixes that were made to work around the issue:

Fix qos/test_qos_sai.py sonic-net#12334
Skip IPV6 variant of testQosSaiDscpQueueMapping if IPV6 is not config… sonic-net#12834

How did you verify/test it?
Verified on T0, T1, and T0-dualTor; sonic-net#126 was not seen.
…ic-net#12952)

checks whether MAC resolution happens properly.
The same functionality is tested by test_neighbor_mac_noptf.py.
This test fails on T0s because on a T0, Eth0 may be part of a
port channel; as a result, the test fails to assign an IP to the interface.
Disabling this test on T0 is the right approach, as the same functionality
is being tested on T0 by test_neighbor_mac_noptf.py.
ACL test support for t1-28-lag
In the t1-28-lag topology, the downstream links are physical ports and the upstream links are portchannels.
This change is needed to add both kinds of links to acl_table_ports.
What is the motivation for this PR?
Apply the fix from sonic-net#11779 to all tests.

How did you do it?
How did you verify/test it?
…12941)

What is the motivation for this PR?
Surface more information in our test reporting infrastructure to aid in debugging

How did you do it?
Added more information to a pytest assertion.

How did you verify/test it?
Artificially added an exception to the try-except block and saw that it was printed out in the pytest assertions.
What is the motivation for this PR?
Add useful information about the failure reason.

How did you do it?
Add the message parameter to pytest_assert.

How did you verify/test it?
Run ssh.test_ssh_limit.
…atus with systemctl take 45 seconds. (sonic-net#12914)

Fix test_update_forced_mgmt failing because checking the interfaces-config status with systemctl takes too much time.

#### Why I did it
Checking the interfaces-config status with systemctl sometimes takes 45 seconds, but the wait timeout is 10 seconds, which causes the test case to fail:

13/05/2024 23:01:03 base._run                                L0071 DEBUG  | /var/src/sonic-mgmt_vms21-t0-2700_646f1402735219c3e5444094/tests/common/devices/multi_asic.py::_run_on_asics#128: [] AnsibleModule::command, args=["sudo systemctl show --no-pager interfaces-config -p ExecMainExitTimestamp --value"], kwargs={}

13/05/2024 23:01:45 base._run                                L0108 DEBUG  | /var/src/sonic-mgmt_vms21-t0-2700_646f1402735219c3e5444094/tests/common/devices/multi_asic.py::_run_on_asics#128: [] AnsibleModule::command Result => {"changed": true, "stdout": "", "stderr": "", "rc": 0, "cmd": ["sudo", "systemctl", "show", "--no-pager", "interfaces-config", "-p", "ExecMainExitTimestamp", "--value"], "start": "2024-05-13 23:01:00.605392", "end": "2024-05-13 23:01:00.682419", "delta": "0:00:00.077027", "msg": "", "invocation": {"module_args": {"_raw_params": "sudo systemctl show --no-pager interfaces-config -p ExecMainExitTimestamp --value", "_uses_shell": false, "warn": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}, "stdout_lines": [], "stderr_lines": [], "_ansible_no_log": null, "failed": false}

##### Work item tracking
- Microsoft ADO: 27683903

#### How I did it
Increase the wait timeout to 60 seconds, because the status check command sometimes takes 45 seconds.

#### How to verify it
Pass all test case.

#### Description for the changelog
Fix test_update_forced_mgmt failing because checking the interfaces-config status with systemctl takes too much time.
What is the motivation for this PR?
The show_techsupport test failed because of an ACL issue.
If the po2vlan test runs before show_techsupport, the ACL rule configuration gets broken.

How did you do it?
"config replace" is not reliable, so I used config_reload to restore the configuration.

How did you verify/test it?
Ran the po2vlan and show_techsupport end-to-end tests.
…s_plus server crash issue (sonic-net#13026)

Description of PR
Summary:
Fixes TACACS failures in test_mgmt_ipv6_only module.

Approach
What is the motivation for this PR?
It was observed that the tacacs_plus process may crash when receiving IPv6 TACACS requests, so a workaround is needed.

How did you do it?
A similar issue in tacacs/test_ro_user was fixed in sonic-net#12819.
Using the same method, restart the tacacs_plus process if it fails.

How did you verify/test it?
Verified in both virtual testbed and physical testbed. Not seeing tacacs failures anymore.

co-authorized by: jianquanye@microsoft.com
* Per a recent release note from Nvidia, CRM available entries for a specific
   object reflect momentary availability and may change even if
   no entries of that object are created or removed, due to
   background computation by internal processes. Therefore, skip
   the available route counter checks, which cause crm test failures.

Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
* Disable LogAnalyzer in test_loopback_action_reload

* Fix flake8 check
…t#8363)

* fix for failures in orchagent_standby_tor_downstream script

* Update test_orchagent_standby_tor_downstream.py

* fix for mocked T0 DToR TC failures due to config push delta
In qos/tunnel_qos_remap_base.py::check_queue_counter, we try to
convert a string counter value to an integer, but the string can contain
a comma (for numbers > 999, e.g. "1,000"), which leads to the error
"invalid literal for int() with base 10".
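A minimal sketch of the fix: strip the thousands separators before converting. The helper name `parse_counter` is hypothetical; the PR's actual change inside check_queue_counter may be written differently.

```python
def parse_counter(value):
    """Convert a CLI counter string such as "1,000" to an int.

    int("1,000") raises "invalid literal for int() with base 10",
    so the thousands separators are removed first.
    """
    return int(value.strip().replace(",", ""))
```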
What is the motivation for this PR?
Regression due to: sonic-net#10079

This has broken the following test cases for non-dualtor-aa in 202305, 202311, and master:

1. qos/test_tunnel_qos_remap.py::test_encap_dscp_rewrite[active-standby]
2. qos/test_tunnel_qos_remap.py::test_bounced_back_traffic_in_expected_queue[active-standby]
3. qos/test_tunnel_qos_remap.py::test_tunnel_decap_dscp_to_queue_mapping

    @pytest.fixture
    def toggle_all_aa_ports_to_lower_tor(config_active_active_dualtor_active_standby,
                                         lower_tor_host, upper_tor_host, active_active_ports):  # noqa F811
>       config_active_active_dualtor_active_standby(lower_tor_host, upper_tor_host, active_active_ports)
E       TypeError: 'NoneType' object is not callable
    @pytest.fixture
    def toggle_all_aa_ports_to_rand_selected_tor(config_active_active_dualtor_active_standby,
                                                 rand_selected_dut, rand_unselected_dut, active_active_ports):  # noqa F811
>       config_active_active_dualtor_active_standby(rand_selected_dut, rand_unselected_dut, active_active_ports)
E       TypeError: 'NoneType' object is not callable

This is because the check for active-active ports is missing similar to this: https://github.com/sonic-net/sonic-mgmt/blob/master/tests/common/dualtor/dual_tor_utils.py#L1807.

How did you do it?
The proposed fix is to add a check for the existence of active-active ports, so that the following fixtures don't run on an active-standby dualtor:

toggle_all_aa_ports_to_lower_tor
toggle_all_aa_ports_to_rand_unselected_tor
toggle_all_aa_ports_to_rand_selected_tor
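The guard can be sketched as a plain function rather than as the pytest fixtures listed above. The name `toggle_all_aa_ports` and its arguments are hypothetical; in sonic-mgmt the equivalent check would sit inside each fixture.

```python
def toggle_all_aa_ports(config_fn, active_tor, standby_tor, active_active_ports):
    """Toggle active-active ports only when the testbed actually has them.

    On an active-standby dualtor, active_active_ports is empty and config_fn
    may be None, so calling it unconditionally raises
    "TypeError: 'NoneType' object is not callable".
    Returns True if a toggle was performed.
    """
    if not active_active_ports:
        return False  # active-standby topology: nothing to toggle
    config_fn(active_tor, standby_tor, active_active_ports)
    return True
```

Checking the port list first means `config_fn` is never called on an active-standby testbed, which avoids the `TypeError` shown in the traceback above.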
How did you verify/test it?
Tested on Arista-7260 platform with active-standby dualtor topology with 202305 and 202311 SONiC images.
What is the motivation for this PR?
After announcing the routes, it takes some time for the kernel routes (IP-in-IP tunnel routes from the standby ToR to the active ToR) to be populated in ASIC_DB. The current sleep_interval is not always long enough for the kernel routes to be populated, so the test fails because of a mismatch between pre_test_route_snapshot and post_test_route_snapshot (leading to an AssertionError).

How did you do it?
Increased the sleep_interval (to ~100 secs) for time.sleep after announcing the routes (announce or withdraw) to give more time for kernel routes to be populated in ASIC_DB.

How did you verify/test it?
Test is consistently passing with the fix (verified on Arista-7260CX3-C64).
…S360… (sonic-net#13011)

Signed-off-by: Chun'ang Li <chunangli@microsoft.com>
What is the motivation for this PR?
Allow ssh traffic between DUT and VM

How did you do it?
Allow ssh traffic between DUT and VM

How did you verify/test it?
Capture packets to verify
What is the motivation for this PR?
Improve GNMI fixture for NTP check.

How did you do it?
Remove assert, and generate warning instead.
andywongarista and others added 25 commits September 12, 2024 12:44
* Fix lag flap check for sonic peer

If a SONiC peer is used, any LAG flap currently results in a failure. LAG flaps are expected during fast reboot, so fix the check so that it only applies to warm reboot.

* Fail fast-reboot if more than one flap
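The resulting policy can be sketched as follows. The thresholds come from the two commit messages above (no flap allowed for warm reboot, at most one for fast reboot); leaving other reboot types unchecked, and the `check_lag_flaps` name, are assumptions of this sketch.

```python
def check_lag_flaps(reboot_type, flap_count):
    """Return (ok, reason) for the LAG flaps observed on a SONiC peer.

    Assumed policy: warm-reboot allows no flap, fast-reboot allows at most
    one; any other reboot type is not checked.
    """
    limits = {"warm-reboot": 0, "fast-reboot": 1}
    if reboot_type not in limits:
        return True, "not checked"
    limit = limits[reboot_type]
    if flap_count > limit:
        return False, f"{flap_count} LAG flap(s) during {reboot_type}, limit is {limit}"
    return True, "ok"
```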
What is the motivation for this PR?
TestbedProcessing.py doesn't populate all the fanout- and IPv6-related fields for the devices.

How did you do it?
Made changes to fill the above-mentioned gaps.
…ploy-mg (sonic-net#14553)

What is the motivation for this PR?
Currently, we generate golden_config_db for mx containing only the FEATURE table, and load this golden_config_db in the deploy-mg stage. However, a recent GCU change requires all dependent tables to be present.
This PR adds all dependent tables (the PORT table) into golden_config_db.

How did you do it?
Added all dependent tables (the PORT table) into golden_config_db.
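A sketch of building such a golden config from a config_db-style dict. The function name is hypothetical, and taking the tables from a running config dict is an assumption; the PR only states that the PORT table is added alongside FEATURE.

```python
import copy


def build_golden_config(running_config, tables=("FEATURE", "PORT")):
    """Return a golden_config_db dict holding only the listed tables.

    Tables are deep-copied so later edits to the golden config do not
    touch running_config. Tables absent from running_config are skipped.
    """
    return {name: copy.deepcopy(running_config[name])
            for name in tables if name in running_config}
```

The result can then be serialized with `json.dumps(..., indent=4)` for loading during deploy-mg.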

How did you verify/test it?
Deploy minigraph
…nic-net#14546)

Description of PR
ADO: 29416611
Summary: Skip comparison of unwanted new fields that were introduced in swss
Fixes # (issue)


Approach
What is the motivation for this PR?
The swss update introduced the last flap time. Those values are expected to differ, so skip them.

How did you do it?
Added the two fields to the list of values to skip.
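A sketch of a comparison that skips such fields. The commit text only says "last flap time" and "two fields", so the field names in `SKIP_FIELDS` and the helper name `diff_ignoring` are hypothetical.

```python
# Hypothetical names for the two new swss fields.
SKIP_FIELDS = {"last_up_time", "last_down_time"}


def diff_ignoring(expected, actual, skip=SKIP_FIELDS):
    """Return {field: (expected, actual)} for mismatched fields outside `skip`."""
    keys = (set(expected) | set(actual)) - skip
    return {k: (expected.get(k), actual.get(k))
            for k in sorted(keys) if expected.get(k) != actual.get(k)}
```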

How did you verify/test it?
E2E test

co-authorized by: jianquanye@microsoft.com
…et#14511)

What is the motivation for this PR?
dut_console/test_idle_timeout was failing because the console port was blocked (on a specific device), but the error raised was a generic OSError.

How did you do it?
Added a more descriptive error for ease of diagnosing and resolving the issue.

How did you verify/test it?
Ran on the affected testbed, resulting in the new error being thrown when the device's console port is blocked.
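A sketch of the more descriptive error. The marker string comes from the PR text; the exception class and the `check_console_output` helper are hypothetical.

```python
PORT_IN_USE_MARKER = "Port is in use. Closing connection..."


class ConsolePortBlockedError(OSError):
    """Raised when the console server reports the port is already in use."""


def check_console_output(output, port):
    """Return output unchanged, or raise a descriptive error if the port is blocked."""
    if PORT_IN_USE_MARKER in output:
        raise ConsolePortBlockedError(
            f"Console port {port} is already in use by another session "
            f"(console server replied {PORT_IN_USE_MARKER!r})")
    return output
```

Subclassing `OSError` keeps any existing `except OSError` handlers working while giving a clearer message.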

Any platform specific information?
There may be platforms that do not use the same "Port is in use. Closing connection..." message; however, if such a case is encountered, it would be trivial to enhance the check to account for platform differences.

Documentation
This is a minor enhancement, so I have not updated any documentation.
…opo (sonic-net#14397)

What is the motivation for this PR?
Fix failed tests in test_snmp_default_route.py, test_snmp_loopback.py, and test_lldp.py

How did you do it?
Use conditional mark plugin to skip certain testcases.
Skip SNMP testcase as t0-standalone-32 topology has no neighbor VMs, and SNMP queries will be executed from the neighbor VM.
Skip LLDP test as no LLDP neighbors detected in the standalone topo.

How did you verify/test it?
Validate it in internal setup

Supported testbed topology if it's a new test case?
t0-standalone-32

Signed-off-by: Janetxxx <janet970527@gmail.com>
…Non-Mellanox leaf fanout (sonic-net#14562) (sonic-net#14570)

Co-authored-by: bingwang-ms <66248323+bingwang-ms@users.noreply.github.com>
Signed-off-by: Janetxxx <janet970527@gmail.com>
…le (sonic-net#13564)

Description of PR
Summary: Add pcbb_xoff parameters for dualtor-aa-56 topo and 50000_40m profile for Arista-7260CX3.
Fixes sonic-net#165
…_accuracy (sonic-net#14589)

What is the motivation for this PR?
test_pfcwd_timer_accuracy is flaky on the Arista platform: sometimes the detect time is larger than the configured detect time.
Both the configured detect time and the polling interval are 400 ms, and most of the real detect times range between 800 and 1000 ms.
Based on the Lua script log, in the failing iterations (where the detect time is larger than the configured detect time), it took 3 polling durations to trigger the PFC storm, and in most of these cases there was only a little traffic in the first polling duration. Presumably the script started sending PFC frames near the end of the first polling duration, so not enough PFC frames were received in time and the PFC storm was only triggered in the third polling loop.

How did you do it?
Add half of the polling time to the configured detect time as compensation.

How did you verify/test it?
Run the case
… traffic (sonic-net#14594)

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Standalone topologies have no upstream switch or any peer switches, meaning any test
that requires a peer switch shouldn't be run on this topology.

Co-authored-by: David Meggy <davidm@arista.com>
What is the motivation for this PR?
Add wait until for route check in dhcp_relay test

There is a small probability that this case fails the route check during testing. I cannot reproduce it manually, so I suspect that the route is not ready yet. I don't expect much performance degradation, because in most cases the test passes even without this wait_until, and once the check passes, wait_until returns early.

This issue has only been seen on the master and 202305 branches so far, so I don't think it needs to be backported to other branches for now.

How did you do it?
Add wait until for route check in dhcp_relay test
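The early-exit behavior described above can be illustrated with a small polling helper. This is a stand-in written for illustration, not sonic-mgmt's own `wait_until` from its test utilities, whose exact signature may differ.

```python
import time


def poll_until(timeout, interval, condition, *args, **kwargs):
    """Call condition() every `interval` seconds until it is truthy or
    `timeout` seconds have elapsed.

    Returns True on success and False on timeout. It returns immediately on
    the first success, so an already-ready route costs no extra wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args, **kwargs):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the test, the condition would be the route check itself, and the test fails only if it is still not satisfied when the timeout expires.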

How did you verify/test it?
Run tests
* adjust reduced_pause_thr to hardware value

Signed-off-by: Zhixin Zhu <zhixzhu@cisco.com>

* fix pre-commit check failure

Signed-off-by: Zhixin Zhu <zhixzhu@cisco.com>

---------

Signed-off-by: Zhixin Zhu <zhixzhu@cisco.com>
What is the motivation for this PR?
Fix failed test cases when sending packets to portchannel interfaces in a T0 standalone topology.

How did you do it?
Added a common helper function to check if ansible_facts contain non-empty portchannel interfaces and portchannels. The test is skipped if no portchannel interfaces are detected. For example, when sending packets to an egress port with PVID 0 on a portchannel interface, the absence of portchannel interfaces means the expected destination does not exist.
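A sketch of such a helper. The fact keys `minigraph_portchannels` and `minigraph_portchannel_interfaces` are assumed to follow the minigraph facts naming; the helper name is hypothetical.

```python
def has_portchannels(facts):
    """True only if facts list both portchannels and portchannel interfaces."""
    return (bool(facts.get("minigraph_portchannels"))
            and bool(facts.get("minigraph_portchannel_interfaces")))
```

A test would call it first and skip early, e.g. `pytest.skip("No portchannels detected when sending untagged packets")`, which matches the skip messages in the summary below.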

How did you verify/test it?
Validate it in internal setup
In tests/vlan/test_vlan.py:

========================================================================================= short test summary info =========================================================================================
SKIPPED [1] vlan/test_vlan.py:168: Test skipped: No portchannels detected when sending untagged packets
SKIPPED [1] vlan/test_vlan.py:213: Test skipped: No portchannels detected when sending tagged packets
SKIPPED [1] vlan/test_vlan.py:385: Test skipped: No portchannels detected when sending untagged packets
SKIPPED [1] vlan/test_vlan.py:444: Unsupported platform.
========================================================================== 3 passed, 4 skipped, 2 warnings in 300.77s (0:0

Signed-off-by: Janetxxx <janet970527@gmail.com>
…4483)

What is the motivation for this PR?
Some route tests are not applicable to t0 topologies and should be skipped rather than failing unnecessarily, which negatively impacts the pass rate.

How did you do it?
Applied skip conditions to the applicable tests in the tests mark conditions yaml file in sonic-mgmt.

How did you verify/test it?
Ran the route test suite and confirmed that the desired tests are now skipped.

Any platform specific information?
Verified on Arista-7060X6-64PE-256x200G.
…et#14080)

Description of PR
Summary:
Fixes # (issue)
Address sonic-net#13937
Test gap implementation: sonic-net#14216

ADO: 28941599

Approach
What is the motivation for this PR?
Test plan for verifying LLDP_ENTRY_TABLE in SONiC APPL_DB

co-authorized by: jianquanye@microsoft.com
What is the motivation for this PR?
Current processing involves iterating through sonic logs (which can be large) from the beginning, which is unnecessary since only log lines starting from a particular timestamp are relevant.

How did you do it?
Optimize this processing by doing it in reverse and stopping after the last relevant timestamp.
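A sketch of the reverse scan, assuming log lines are in chronological order and start with a `YYYY-MM-DD HH:MM:SS` timestamp (both assumptions; real SONiC log formats vary). The function names are hypothetical.

```python
from datetime import datetime


def parse_iso_prefix(line):
    """Parse a leading 'YYYY-MM-DD HH:MM:SS' timestamp, or return None."""
    try:
        return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None


def lines_since(lines, start, parse_ts=parse_iso_prefix):
    """Return lines with timestamp >= start, in original order.

    Walks from the end and stops at the first line older than `start`,
    so earlier lines are never parsed. Unparsable lines are kept.
    """
    relevant = []
    for line in reversed(lines):
        ts = parse_ts(line)
        if ts is not None and ts < start:
            break
        relevant.append(line)
    relevant.reverse()
    return relevant
```

For very large files, a real implementation could also read the file backwards in blocks instead of loading every line.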

How did you verify/test it?
Ran test_upgrade_path with SONiC neighbors, verified in warm-reboot.log that SSH threads no longer hang unnecessarily long due to log processing
* [dualtor] Fix flakiness of route/test_static_route.py

Fixes:
1) Adding "setup_standby_ports_on_rand_unselected_tor" fixture to setup
   ports in standby mode in case of active-active topology. This is
   needed for packets not to go out of unexpected tor and cause test
   failures.
2) The test performs "config_reload", which can cause a switchover (active
   to standby and vice versa). But rand_selected_dut must be in the active
   state for traffic verification to pass, so after config_reload we
   need to toggle the ports to rand_selected_dut.

* Addressing review comments.

* Reverting minor unintended change.
…et#14654)

Nokia-7215 has low performance. After a device reboot, its critical processes may not be fully up by the time SSH is reachable. For this platform, I added wait_critical_processes at the teardown stage to improve test stability.

What is the motivation for this PR?
Improve the test_reboot stability on Nokia-7215 platform.

How did you do it?
Add wait_critical_processes at teardown stage.

How did you verify/test it?
Verified on Nokia-7215 M0 testbed.
What is the motivation for this PR?
The parameter testbed_mode in test_dhcp_relay is unused. Also, skipping the test inside the test case is insufficient.

How did you do it?
Removed the unused parameter testbed_mode
Used a conditional mark to skip the test

How did you verify/test it?
Run test in T0 / Dualtor testbeds, test passed
What is the motivation for this PR?
Address test gap for this change: sonic-net/sonic-buildimage#20021
Previously, dhcrelay would not relay any packets if packets arrived while it was starting up. This issue has been fixed on the image side by sonic-net/sonic-buildimage#20021. This PR adds a test for it.

How did you do it?
Add a stress test with a dhcp_relay restart:
  • Keep sending DHCP packets
  • Restart dhcp_relay
  • Check the socket buffer
  • Run the general DHCP relay test
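The first two steps (keep sending DHCP packets while dhcp_relay restarts) can be sketched with a background thread. `send_packet` and `restart_service` are hypothetical callables. In the real test they would send DHCP packets from PTF and restart dhcp_relay on the DUT. The socket-buffer check and the general relay test are not covered here.

```python
import threading
import time


def restart_under_load(send_packet, restart_service, duration=1.0, interval=0.01):
    """Call send_packet() repeatedly in a background thread while
    restart_service() runs, then stop the sender.

    Returns the number of packets sent.
    """
    stop = threading.Event()
    sent = [0]

    def sender():
        while not stop.is_set():
            send_packet()
            sent[0] += 1
            time.sleep(interval)

    thread = threading.Thread(target=sender, daemon=True)
    thread.start()
    try:
        restart_service()
        time.sleep(duration)  # keep traffic flowing while the service comes back
    finally:
        stop.set()
        thread.join()
    return sent[0]
```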

How did you verify/test it?
Run test on m0/t0/dualtor topos, all passed