Fix the mux active/standby change issue for dualtor FIB testing #3502

Merged
wangxin merged 1 commit into sonic-net:master from wangxin:fix-dualtor-fib-pr
May 25, 2021

Conversation

wangxin (Collaborator) commented May 19, 2021

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

On a dualtor testbed, the mux active/standby status may change while the PTF script is running.
Consequently, the PTF script that tests FIB may fail.

The status changes because no ICMP responder is running in PTF to simulate the servers.
The muxes are then reported as unhealthy, and the link manager may try to recover them
by switching the active/standby side.

The fix is to start the ICMP responder before the FIB testing so that the mux active/standby state does not flap.
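
To make the role of the responder concrete, here is a minimal sketch of what answering an ICMP echo request involves. It is not the responder used by the test framework; the helper names (`icmp_checksum`, `build_echo`, `echo_reply_for`) are hypothetical, and the sketch only handles the ICMP portion of a packet, not the Ethernet or IP headers.

```python
import struct

ICMP_ECHO_REQUEST = 8
ICMP_ECHO_REPLY = 0


def icmp_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071) over the ICMP header and payload."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo(icmp_type: int, ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo message (request or reply) with a valid checksum."""
    header = struct.pack("!BBHHH", icmp_type, 0, 0, ident, seq)
    csum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", icmp_type, 0, csum, ident, seq) + payload


def echo_reply_for(request: bytes):
    """Return the echo reply for an echo request, or None for other messages."""
    icmp_type, code, _csum, ident, seq = struct.unpack("!BBHHH", request[:8])
    if icmp_type != ICMP_ECHO_REQUEST or code != 0:
        return None
    # A reply echoes the identifier, sequence number and payload unchanged.
    return build_echo(ICMP_ECHO_REPLY, ident, seq, request[8:])


request = build_echo(ICMP_ECHO_REQUEST, ident=0x1234, seq=1, payload=b"heartbeat")
reply = echo_reply_for(request)
```

Without something on the server side producing replies like this, the heartbeats go unanswered, which is the condition the description above identifies as the cause of the mux being treated as unhealthy.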

Another logic issue in the FIB testing is how the index of the active-side DUT is determined.
An index of 0 is falsy in Python, so a check based on truthiness treats it as unset. The issue is fixed
by changing:

    if not target_dut_index:

to:

    if target_dut_index is None:

Initializing target_mac has a similar issue and is fixed in the same way.
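
The difference between the two checks can be shown with a small sketch. The function names and the fallback value are hypothetical and only illustrate the truthiness pitfall; they are not taken from the test code.

```python
def pick_dut_index_buggy(target_dut_index, default_index):
    # `not 0` is True, so a valid index of 0 is replaced by the default.
    if not target_dut_index:
        return default_index
    return target_dut_index


def pick_dut_index_fixed(target_dut_index, default_index):
    # Only a missing value (None) falls back to the default.
    if target_dut_index is None:
        return default_index
    return target_dut_index


# Assume the DUT at index 0 is the active side and index 1 is the fallback.
buggy = pick_dut_index_buggy(0, default_index=1)
fixed = pick_dut_index_fixed(0, default_index=1)
```

Here `buggy` ends up as the fallback index while `fixed` keeps index 0. The same applies to any value where 0 or an empty string is legitimate, which is why `is None` is the safer test for "not set yet".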

The third fix is to change ptf_test_port_map and get_mux_status from fixtures
to plain functions. Both depend on fixtures such as set_mux_random
and set_mux_same_side, so the execution sequence matters. If they are all
fixtures, changing the order in which the fixtures appear in the test function's
argument list may affect the test logic and cause issues that are hard to debug.
Calling these two as functions avoids this potential pitfall.
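
The ordering hazard can be sketched without pytest. The code below is a simplified simulation, assuming that independent fixtures of the same scope are set up in the order they appear in the argument list. `mux_state` and `setup_in_argument_order` are hypothetical stand-ins, and the bodies of `set_mux_same_side` and `get_mux_status` are placeholders rather than the real implementations.

```python
# Hypothetical stand-in for the real mux state on the testbed.
mux_state = {"Ethernet0": "standby", "Ethernet4": "standby"}


def reset_mux_state():
    for port in mux_state:
        mux_state[port] = "standby"


def set_mux_same_side():
    """Placeholder for the fixture that moves every mux to one side."""
    for port in mux_state:
        mux_state[port] = "active"


def get_mux_status():
    """Return a snapshot of the mux state at the moment of the call."""
    return dict(mux_state)


def setup_in_argument_order(names):
    """Simulate fixture setup: run each provider in the listed order."""
    providers = {
        "set_mux_same_side": set_mux_same_side,
        "get_mux_status": get_mux_status,
    }
    return {name: providers[name]() for name in names}


# Both as fixtures, mux set first: the snapshot is correct.
reset_mux_state()
good = setup_in_argument_order(["set_mux_same_side", "get_mux_status"])["get_mux_status"]

# Both as fixtures, order swapped: the snapshot is taken too early.
reset_mux_state()
stale = setup_in_argument_order(["get_mux_status", "set_mux_same_side"])["get_mux_status"]

# Status read by a plain function call inside the test body, after setup.
reset_mux_state()
setup_in_argument_order(["set_mux_same_side"])
fresh = get_mux_status()
```

In the swapped case `stale` still reports every port as standby even though the mux was switched afterwards. Reading the status through a function call in the test body happens after all fixture setup has finished, so the argument order no longer matters.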

How did you do it?

  • Start ICMP responder before FIB testing
  • Fix the logic issue of getting active side DUT index
  • Remove the fragile dependency on fixture sequence

How did you verify/test it?

Ran the script on both dual ToR and single ToR testbeds.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

On a dualtor testbed, the mux active/standby status may change while the PTF script is running.
Consequently, the PTF script that tests FIB may fail.

The status changes because no ICMP responder is running in PTF to simulate the servers.
The muxes are then reported as unhealthy, and the link manager may try to recover them
by switching the active/standby side.

The fix is to start the ICMP responder before the FIB testing so that the mux active/standby state does not flap.

Another logic issue in the FIB testing is how the index of the active-side DUT is determined.
The issue is fixed by changing:
    if not target_dut_index:
to:
    if target_dut_index is None:
Initializing `target_mac` has a similar issue and is fixed in the same way.

The third fix is to change `ptf_test_port_map` and `get_mux_status` from fixtures
to plain functions. Both depend on fixtures such as `set_mux_random`
and `set_mux_same_side`, so the execution sequence matters. If they are all
fixtures, changing the order in which the fixtures appear in the test function's
argument list may affect the test logic and cause issues that are hard to debug.
Calling these two as functions avoids this potential pitfall.

Signed-off-by: Xin Wang <xiwang5@microsoft.com>
@wangxin wangxin requested a review from a team May 19, 2021 13:22

wangxin commented May 20, 2021

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@wangxin wangxin merged commit 7be42a3 into sonic-net:master May 25, 2021
@bingwang-ms bingwang-ms mentioned this pull request Jun 8, 2021
4 tasks
@wangxin wangxin deleted the fix-dualtor-fib-pr branch September 9, 2021 08:08
vmittal-msft pushed a commit to vmittal-msft/sonic-mgmt that referenced this pull request Sep 28, 2021
…c-net#3502)
