Skip to content

[fdb] Add FDB learning during config reload test#887

Merged
liat-grozovik merged 5 commits intosonic-net:masterfrom
wangxin:fdb-issue-pr
May 1, 2019
Merged

[fdb] Add FDB learning during config reload test#887
liat-grozovik merged 5 commits intosonic-net:masterfrom
wangxin:fdb-issue-pr

Conversation

@wangxin
Copy link
Copy Markdown
Collaborator

@wangxin wangxin commented Apr 24, 2019

Description of PR

Summary:
Fixes # (issue)

This case is to verify that all the FDB entries learned by SDK are correctly reported to upper layer in the initialization process after config reload.

The added case in this PR is to cover the issue fixed by SAI commit: Mellanox/SAI-Implementation@fc5a8dd

Steps performed by the scripts:

  • Find out the list of ports in VLAN
  • Find out the list of ports in PortChannel
  • Start a PTF script in asynchronous. Keep it running and move on to next ansible task immediately. Details of the PTF script:
    • The PTF script keeps sending packets with unique src MAC address to different vlan ports and lag ports in interleaved way. The unique src MAC addresses have same pre-defined prefix.
    • Pause 0.1 second after sending 8 packets (4 sent to VLAN ports, 4 sent to LAG ports, interleaved)
    • Repeat the above steps for 200 seconds.
  • Immediately execute “sudo config reload -y” after the PTF script is started.
  • After config reload is completed (need around 90 seconds), keep waiting until the PTF script is completed executing.
  • Check the number of learned FDB entries from SDK:
    • docker exec syncd sx_api_fdb_dump.py | grep {{ dummy_mac_prefix }} | wc -l
  • Check the number of learned FDB entries from “show mac”:
    • show mac | grep {{ dummy_mac_prefix }} | wc -l
  • Compare the number of learned FDB entries:
    • If SDK entries < “show mac” entries, issue reproduced
    • If SDK entries == “show mac” entries, verified OK

With the current tuning (send for 200 seconds, 8 packets in a batch, send to VLAN ports and LAG ports in interleaved pattern, 0.1s interval between batches), the issue could be reproduced 4 times in 6 tests.

Type of change

  • [] Bug fix
  • [] Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

How did you do it?

Extended the existing FDB testing scripts. Please refer to the description for detailed steps.

How did you verify/test it?

Verified on Mellanox platform.

Any platform specific information?

The command for checking FDB entries learned from SDK is Mellanox specific. It takes nearly 4 minutes to run this added test case. There is no point wasting time run the case and skip verification step for other vendors. Because of this, I set the whole new test case Mellanox specific for now.

If other vendors consider this test case useful, they need little effort to refactor this test case to make just the step for counting FDB entries learned in SDK vendor specific.

Supported testbed topology if it's a new test case?

['t0', 't0-64', 't0-116']

Documentation

https://github.com/Azure/SONiC/wiki/FDB-Scale-Test-Plan
fdb_test_ansible.log

Xin Wang added 2 commits April 24, 2019 14:22
This case is to ensure that all the FDB entries
learned by SDK are correctly reported to upper
layer in the initialization process after config
reload.

Signed-off-by: Xin Wang <xinw@mellanox.com>
Signed-off-by: Xin Wang <xinw@mellanox.com>
@wangxin
Copy link
Copy Markdown
Collaborator Author

wangxin commented Apr 24, 2019

@qiluo-msft Could you please help review this?

pause: seconds=10

- name: Count FDB entries from SDK
shell: "docker exec syncd sx_api_fdb_dump.py | grep {{ dummy_mac_prefix }} | wc -l"
Copy link
Copy Markdown
Contributor

@stepanblyschak stepanblyschak Apr 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why there is a sx_api_fdb_dump.py call without platform check?
sx_api*.py scripts are Mellanox SDK example/dump scripts and do not exist in other vendors syncd container

UPD: Ok, I see there is a platform check at the bottom but it is better to think of generic approach

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It takes around 4 minutes to run this case. There is no point taking time to run the test and skip the SDK checking step for other vendors for now.

become: yes

- name: Wait until the PTF script completed execution
shell: "ps -ef | grep /usr/bin/ptf | grep -v grep"
Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft Apr 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps -ef | grep /usr/bin/ptf | grep -v grep [](start = 12, length = 41)

Could you try wait pid or tail --pid=$pid -f /dev/null ? #Closed

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, updated and comitted.

pause: seconds=10

- name: Count FDB entries from SDK
shell: "docker exec syncd sx_api_fdb_dump.py | grep {{ dummy_mac_prefix }} | wc -l"
Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft Apr 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grep {{ dummy_mac_prefix }} [](start = 51, length = 27)

In stead of testing only dummy mac, how about testing all AGEABLE(dynamic) mac? #Closed

Copy link
Copy Markdown
Collaborator Author

@wangxin wangxin Apr 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this new test case, PTF script doesn't inject packets with src MAC set to MAC of PTF ports. So, all AGEABLE MAC are the dummy MAC addresses. #Closed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that all AGEABLE MAC are the dummy MAC addresses. So the two option are the effectively the same.

I am wondering the test purpose. Should we test all AGEABLE(dynamic) mac, no matter whether the MAC is generated with dummy mac or not?


In reply to: 278421108 [](ancestors = 278421108)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got your point. Updated the script to count all AGEABLE MAC.


always:
- name: clear FDB table
command: sonic-clear fdb all
Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft Apr 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the test failed, should we keep the scene and not clear FDB table? #Closed

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, updated and comitted.

Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented

* Use "tail --pid=<pid> -f /dev/null" to wait for PTF script to complete
* Clear FDB only when test case pass

Signed-off-by: Xin Wang <xinw@mellanox.com>
@praveen-li
Copy link
Copy Markdown
Member

Fo many PRs, Test results are not available for even single successful run. I think we should make it mandatory to paste test results at least once per PR. [A general comment]. Thanks.

become: yes

- name: Get PID of the PTF script
shell: "ps -ef | grep /usr/bin/ptf | grep -v grep | awk '{print $2}'"
Copy link
Copy Markdown
Contributor

@qiluo-msft qiluo-msft Apr 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps -ef | grep [](start = 12, length = 13)

pgrep is just for this purpose #Closed

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, my method is silly. Updated with - command: pgrep -f '/usr/bin/python /usr/bin/ptf'. Thanks!

Xin Wang added 2 commits April 26, 2019 18:48
Signed-off-by: Xin Wang <xinw@mellanox.com>
Signed-off-by: Xin Wang <xinw@mellanox.com>
@wangxin
Copy link
Copy Markdown
Collaborator Author

wangxin commented Apr 26, 2019

@praveen-li Attached ansible test result of this script.

@praveen-li
Copy link
Copy Markdown
Member

praveen-li commented Apr 26, 2019 via email

@qiluo-msft
Copy link
Copy Markdown
Contributor

@stepanblyschak any more concern? I notice you 'requested changes' on this PR.

@liat-grozovik liat-grozovik merged commit c5e6a93 into sonic-net:master May 1, 2019
@wangxin wangxin deleted the fdb-issue-pr branch May 24, 2019 03:32
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
[Vxlan] : adding show vnet/vxlan cmds (sonic-net#880)
[show][bgp] Use only 'show ip bgp' as the base and use bgp_frr_v4 file
for FRR routing stack (sonic-net#884)
[fast reboot] set a fast-reboot DB flag (sonic-net#887)
[show] Add 'ip/ipv6 bgp network' commands (sonic-net#888)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants