[dualtor] switchover active and standby mux eths needs more time to complete the code flow#7891

Merged
kevinskwang merged 4 commits into sonic-net:master from yenlu-keith:master
Apr 17, 2023

@yenlu-keith
Contributor

@yenlu-keith yenlu-keith commented Mar 28, 2023

Description
In our lab environment, the SONiC code flow that handles dualtor switchover between active and standby on 36 mux Ethernet ports needs more time to complete correctly than the tests currently allow.

TC1: dualtor/test_change_mux_state
TC2: dualtor/test_orch_stress.py::test_flap_neighbor_entry_active
TC3: dualtor/test_orch_stress.py::test_flap_neighbor_entry_standby

These 3 test cases make sure the CRM (Critical Resource Monitoring) counters stay the same
when the MUX state switches between active and standby via the config files
(covering 36 eth sets with their associated IPv4/IPv6 addresses).
After analysis, we believe the test cases need extra delay in 2 places.

1. Pre-setup needs more time to set up/remove the default tunnels and complete the corresponding reactions.
For example: https://github.com/sonic-net/sonic-mgmt/pull/7809/files

PROPOSED SOLUTION:
https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dualtor/test_orch_stress.py#L157

# Apply mux active state
+wait(20, 'wait for presetup to finish')
load_swss_config(dut, _swss_path(SWSS_MUX_STATE_ACTIVE_CONFIG_FILE))
load_swss_config(dut, _swss_path(SWSS_MUX_STATE_STANDBY_CONFIG_FILE))
load_swss_config(dut, _swss_path(SWSS_MUX_STATE_ACTIVE_CONFIG_FILE))

When the first active config file is loaded, presetup has not yet finished:
19:55:10.882205 mth-t0-64 INFO python[27334]: ansible-command Invoked with _uses_shell=True _raw_params=docker exec swss sh -c "swssconfig /swss_mux_state_active_config.json
…… still preparing
19:55:15.530161 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: 192.168.0.1
19:55:15.539559 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: 10.0.0.4
19:55:15.549257 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: 10.0.0.8
19:55:15.559039 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: 10.0.0.12
19:55:15.568778 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: 10.1.0.32
19:55:15.569695 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnel: Create overlay loopback router interface oid:60000000009e2
19:55:15.588244 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: fc00::1
19:55:15.597428 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: fc00::9
19:55:15.606482 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: fc00::11
19:55:15.615648 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: fc00::19
19:55:15.629263 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: fc00:1::32
19:55:15.642086 mth-t0-64 NOTICE swss#orchagent: :- addDecapTunnelTermEntries: Created tunnel entry for ip: fc02:1000::1
…… still preparing
19:55:26.760563 mth-t0-64 NOTICE swss#orchagent: :- handleMuxCfg: Mux entry for port 'Ethernet60' was added, cable type 0
19:55:26.760563 mth-t0-64 NOTICE swss#orchagent: :- MuxAclHandler: Binding port 1000000000018
19:55:27.372346 mth-t0-64 NOTICE swss#orchagent: :- handleMuxCfg: Mux entry for port 'Ethernet88' was added, cable type 0
19:55:27.372346 mth-t0-64 NOTICE swss#orchagent: :- MuxAclHandler: Binding port 1000000000019
19:55:27.674147 mth-t0-64 NOTICE syncd#syncd: :- threadFunction: time span 300 ms for 'set:SAI_OBJECT_TYPE_ACL_ENTRY:oid:0x8000000000acb'
19:55:27.966945 mth-t0-64 NOTICE swss#orchagent: :- handleMuxCfg: Mux entry for port 'Ethernet92' was added, cable type 0
…… test script starts...
19:55:28.649649 mth-t0-64 NOTICE swss#orchagent: :- setState: [Ethernet100] Set MUX state from standby to active

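The first config load is invoked at 19:55:10, presetup is still adding mux entries at 19:55:27, and the first setState only appears at 19:55:28, so the wait needs to be at least 28 - 10 = 18 seconds.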
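
The 18-second figure can be checked directly from the two log timestamps above. The snippet below is only a worked version of that arithmetic; the helper name `gap_seconds` is made up for illustration and is not part of sonic-mgmt.

```python
from datetime import datetime

# Timestamps copied from the log excerpt above.
CONFIG_INVOKED = "19:55:10.882205"   # swssconfig invoked with the active config
FIRST_SET_STATE = "19:55:28.649649"  # first "Set MUX state" line


def gap_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two same-day HH:MM:SS.ffffff stamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


gap = gap_seconds(CONFIG_INVOKED, FIRST_SET_STATE)
print(f"{gap:.1f} s")  # 17.8 s
```

With sub-second precision the gap is just under 18 seconds, which is why the proposed `wait(20, ...)` leaves a small margin.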

2. After each switchover between the standby and active config files,
the SONiC code flow still needs more time to run enableNeighbor/disableNeighbor and to add/remove next hops and tunnel routes.
With an insufficient wait, a failure can occur at any point in these 3 test cases,
although they sometimes still PASS.
Such a pass is not meaningful, however, because the CRM counters may not have changed the way the test case expects.
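
The kind of check these test cases rely on can be sketched roughly as follows. This is a minimal illustration, not the code in test_orch_stress.py: `counters_unchanged_after`, `read_counters`, and the counter names are hypothetical stand-ins. It shows why the settle time matters: the second snapshot is only meaningful if the device has finished processing by the time it is taken.

```python
import time
from typing import Callable, Dict


def counters_unchanged_after(
    action: Callable[[], None],
    read_counters: Callable[[], Dict[str, int]],
    settle_secs: float,
) -> bool:
    """Snapshot counters, run an action, wait, then compare with a new snapshot."""
    before = read_counters()
    action()
    # Give the device time to finish processing before reading again.
    time.sleep(settle_secs)
    after = read_counters()
    return before == after


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for the DUT's CRM counters.
    fake_crm = {"ipv4_route_used": 120, "ipv6_route_used": 118}

    def fake_switchover() -> None:
        # A real test would load the active/standby swss config files here.
        pass

    print(counters_unchanged_after(fake_switchover, lambda: dict(fake_crm), 0.0))
```

If `settle_secs` is too short, the second snapshot can be taken before the switchover has had any effect, so the comparison passes without having verified anything.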

PROPOSED SOLUTION:
https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dualtor/test_orch_stress.py#L93
-wait(2, 'for CRMs to be updated')
+wait(10, 'for CRMs to be updated and corresponding codeflow done')
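
This PR uses a longer fixed wait. For comparison only, an alternative would be to poll until the counters stop changing. The sketch below is an assumption-laden illustration of that idea, not something this PR implements: `wait_until_stable` and its parameters are hypothetical, and `read` stands in for whatever call returns the CRM counters.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_stable(
    read: Callable[[], T],
    timeout: float,
    interval: float,
    stable_reads: int = 3,
) -> bool:
    """Poll read() until it returns the same value stable_reads times in a row.

    Returns True once the value has settled, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    last = read()
    streak = 1
    while streak < stable_reads:
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
        current = read()
        if current == last:
            streak += 1
        else:
            # Value changed, so start counting identical reads again.
            last = current
            streak = 1
    return True
```

A fixed `wait(10, ...)` is simpler and matches the existing test style; polling trades that simplicity for shorter runs on fast setups and more headroom on slow ones.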

GOOD sequence: 7 config-file switchovers should map to 7 create_route/remove_route events for each MUX; the old flow misses some of them.
Taking one of the 36 MUXes as an example: 192.168.0.9 and fc02:1000::9

TC1: def test_change_mux_state
https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dualtor/test_orch_stress.py#L149

21:44:11.074926 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_active_config.json… 1st
21:44:20.703878 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to 192.168.0.9/32
21:44:20.706369 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to fc02:1000::9/128
21:44:20.861929 mth-t0-64 NOTICE swss#orchagent: :- setState: [Ethernet96] Set MUX state from standby to active — the other MUX
….
21:44:21.884119 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_standby_config.json… 2nd
21:44:29.925530 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.9/32
21:44:29.940835 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to fc02:1000::9/128
21:44:32.863372 mth-t0-64 NOTICE swss#orchagent: :- setState: [Ethernet96] Set MUX state from active to standby — the other MUX
….
21:44:33.127238 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_active_config.json… 3rd
21:44:45.178279 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to 192.168.0.9/32
21:44:45.182649 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to fc02:1000::9/128
21:44:45.248834 mth-t0-64 NOTICE swss#orchagent: :- setState: [Ethernet96] Set MUX state from standby to active — the other MUX
….
21:44:56.450584 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_standby_config.json… 4th
21:44:58.029024 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.9/32
21:44:58.050502 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to fc02:1000::9/128
21:44:59.309795 mth-t0-64 NOTICE swss#orchagent: :- setState: [Ethernet96] Set MUX state from active to standby...— the other MUX
….
21:45:07.260043 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_active_config.json… 5th
21:45:16.729873 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to 192.168.0.9/32
21:45:16.733936 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to fc02:1000::9/128
21:45:16.803974 mth-t0-64 NOTICE swss#orchagent: :- setState: [Ethernet96] Set MUX state from standby to active…..— the other MUX
….
21:45:17.994252 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_standby_config.json… 6th
21:45:19.047491 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.9/32
21:45:19.050984 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to fc02:1000::9/128
21:45:20.037662 mth-t0-64 NOTICE swss#orchagent: :- setState: [Ethernet96] Set MUX state from active to standby — the other MUX
….
21:45:28.750839 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_active_config.json… 7th
21:45:38.343358 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to 192.168.0.9/32
21:45:38.347389 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to fc02:1000::9/128
21:45:38.413630 mth-t0-64 NOTICE swss#orchagent: :- setState: [Ethernet96] Set MUX state from standby to active — the other MUX
….
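
The "one route event per config load" property in the log above can be checked mechanically. The following is a small sketch under stated assumptions: the regular expressions are written against the log lines shown in this description, `route_events_per_load` is a hypothetical helper, and the sample contains only the first two loads from the log.

```python
import re
from typing import Iterable, List

LOAD_RE = re.compile(r"Loading config from JSON file")
ROUTE_RE = re.compile(r"(create_route|remove_route): \w+ tunnel route to (\S+)")


def route_events_per_load(lines: Iterable[str], prefix: str) -> List[List[str]]:
    """Group create_route/remove_route events for one prefix by config load."""
    events: List[List[str]] = []
    for line in lines:
        if LOAD_RE.search(line):
            # Each config load opens a new group of events.
            events.append([])
            continue
        match = ROUTE_RE.search(line)
        if match and match.group(2) == prefix and events:
            events[-1].append(match.group(1))
    return events


# First two config loads, copied from the log excerpt above.
SAMPLE = [
    "21:44:11.074926 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_active_config.json",
    "21:44:20.703878 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to 192.168.0.9/32",
    "21:44:20.706369 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to fc02:1000::9/128",
    "21:44:21.884119 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_standby_config.json",
    "21:44:29.925530 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.9/32",
    "21:44:29.940835 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to fc02:1000::9/128",
]

groups = route_events_per_load(SAMPLE, "192.168.0.9/32")
print(groups)  # [['remove_route'], ['create_route']]
print(all(len(group) == 1 for group in groups))  # True
```

A load whose group comes back empty would correspond to a switchover the old flow missed.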

Taking one of the 36 MUXes as an example: 192.168.0.33 (the last neighbor to be removed/added)
TC2: def test_flap_neighbor_entry_active
https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dualtor/test_orch_stress.py#L217
21:45:52.490395 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_active_config.json...

21:46:08.750270 mth-t0-64 INFO python[28316]: ansible-shell_cmds Invoked with cmds=['ip -4 neigh del 192.168.0.36 ..
21:46:08.951647 mth-t0-64 NOTICE swss#orchagent: :- removeNeighbor: Removed next hop 192.168.0.33 on Vlan1000
21:46:11.706040 mth-t0-64 NOTICE swss#orchagent: :- addNextHop: Created next hop 192.168.0.33 on Vlan1000

21:46:14.338040 mth-t0-64 INFO python[28499]: ansible-shell_cmds Invoked with cmds=['ip -4 neigh del 192.168.0.36 ..
21:46:14.495752 mth-t0-64 NOTICE swss#orchagent: :- removeNeighbor: Removed next hop 192.168.0.33 on Vlan1000
21:46:17.294670 mth-t0-64 NOTICE swss#orchagent: :- addNextHop: Created next hop 192.168.0.33 on Vlan1000

TC3: def test_flap_neighbor_entry_standby
https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dualtor/test_orch_stress.py#L251
21:46:32.782288 mth-t0-64 NOTICE swss#swssconfig: :- main: Loading config from JSON file:/swss_mux_state_standby_config.json...
21:46:39.123942 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.33/32
….
21:46:49.158120 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to 192.168.0.33/32
21:46:51.970661 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.33/32
21:46:54.728997 mth-t0-64 NOTICE swss#orchagent: :- remove_route: Removed tunnel route to 192.168.0.33/32
21:46:57.597063 mth-t0-64 NOTICE swss#orchagent: :- create_route: Created tunnel route to 192.168.0.33/32

Steps to reproduce the issue
"dualtor/test_orch_stress"

Describe the results you expected
The test cases pass.

Additional information you deem important
Some dualtor test cases need to set up a mock tunnel,
for example: #7809

@kevinskwang
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kevinskwang
Contributor

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kevinskwang kevinskwang merged commit 2184149 into sonic-net:master Apr 17, 2023
wangxin pushed a commit that referenced this pull request Apr 20, 2023
…omplete the code flow (#7891)

* create dualtor tun during presetup based on asic type

* switchover active and standby dualtor mux eths needs more time to complete the correct code flow