Fix issue: wrong teamd link watch state after warm reboot#135
Fix issue: wrong teamd link watch state after warm reboot#135
Conversation
Signed-off-by: Stephen Sun <stephens@nvidia.com>
f20efeb to
5e4ca30
Compare
|
ci 2310 + rerun 275 passed |
stepanblyschak
left a comment
There was a problem hiding this comment.
Is this is a fix for an issue that hapeens due to warm reboot patch? I suggest changing warm reboot patch instead of adding new one in that case.
I understand the recursion happens because of team_refresh in lacp_port_added? Why do we need team_refresh there?
src/libteam/patch/0013-Fix-issue-TEAM_ATTR_PORT_CHANGED-can-be-lost-if-port.patch
Outdated
Show resolved
Hide resolved
src/libteam/patch/0013-Fix-issue-TEAM_ATTR_PORT_CHANGED-can-be-lost-if-port.patch
Outdated
Show resolved
Hide resolved
Regarding merging it into the warm reboot patch, I'm OK only if it's a common practice to merge multiple patches into one. Regarding whether |
|
@stephenxs I believe so, since it is a fix for a patch that we introduced, it is common to change the original patch (or we can lose track of which patches are backports, which were developed by us) - e.g we already did it in the past - I am Ok with these changes but I would like to better understand why team_refresh is being called from lacp_port_added. I am Ok to approve to discuss it with MSFT. |
Thanks. I'm still checking whether there is a negative effect if |
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Hi @stepanblyschak btw, I merged the patch to the original one. |
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
6267876 to
9fd6ba5
Compare
|
ci 2388 + rerun 313 passed |
Why I did it [Submodule][202211] Advance sonic-restapi pointer The branch 202012 has already updated to commit 47e4b53. 4f6f979 Fix the redis security issue CVE-2023-28858 and CVE-2023-28859 (#139) 47e4b53 Fix adv_pfx len for ipv6 (#135) 44121be Support ipv6 prefix lenght greater than 64 and check for adv_prefix (#134) 99c467d Add API support for adv prefix and custom monitoring (#133) 347684a Use github code scanning instead of LGTM (#132) 86543d0 Updates to route PATCH API (#129) a1af82c Install libyang to azure pipeline (#128) 2007c4c Increase coverage threshold (#126) Work item tracking Microsoft ADO (number only): 17705422 How I did it How to verify it
…lly (sonic-net#15929) #### Why I did it src/sonic-gnmi ``` * fb338d5 - (HEAD -> master, origin/master, origin/HEAD) Merge pull request #135 from liuh-80/dev/liuh/cherry-pick-zmq (3 hours ago) [Hua Liu] * f8d9c7e - Merge branch 'master' into dev/liuh/cherry-pick-zmq (8 hours ago) [Qi Luo] * cbd5185 - Fix PR comments (26 hours ago) [liuh-80] * 226fc31 - Fix PR comments (2 days ago) [liuh-80] * 6579847 - Fix UT (3 days ago) [liuh-80] * 53713c3 - Improve code coverage (3 days ago) [liuh-80] * d8ff562 - Fix UT (3 days ago) [liuh-80] * c3a66bc - Cherry-pick ZMQ change from nvidia repo (3 days ago) [liuh-80] ``` #### How I did it #### How to verify it #### Description for the changelog
…utomatically (sonic-net#19415) #### Why I did it src/sonic-host-services ``` * 02d9b55 - (HEAD -> master, origin/master, origin/HEAD) Added support to render template format of `delayed` flag on Feature Table. (#135) (28 hours ago) [abdosi] * 60fdfea - Fixed determine/process reboot-cause service dependency (sonic-net#17406) (#132) (13 days ago) [anamehra] ``` #### How I did it #### How to verify it #### Description for the changelog
…ically (sonic-net#25237) #### Why I did it src/sonic-dash-ha ``` * 9b3c0bf - (HEAD -> master, origin/master, origin/HEAD) Add bfd rewrite on pmon change. (#136) (33 hours ago) [dypet] * af44396 - [build] Disable debian helper auto install for cargo project. (#135) (2 days ago) [Liu Shilong] * 17e2e0b - Implement bfd pinned state (#134) (8 days ago) [yue-fred-gao] * c04969e - switch to using libboost1.83 (#133) (9 days ago) [yijingyan2] ``` #### How I did it #### How to verify it #### Description for the changelog
Why I did it
Fix issue: wrong teamd link watch state after warm reboot due to TEAM_ATTR_PORT_CHANGED lost
The flag TEAM_ATTR_PORT_CHANGED is maintained by kernel team driver:
In the userspace, the change flag is maintained by libteam in struct team_port.
The team daemon calls port_priv_change_handler_func on receiving port change event.
The logic in port_priv_change_handler_func
In lacp_port_added, it calls team_refresh to refresh ifinfo, port info, and option info from the kernel via RTNL.
In this step, port_priv_change_handler_func is called recursively.
If the port has been up when the port is being added to the team device, the "port up" information is carried in the outer call but will be lost.
In case the flag TEAM_ATTR_PORT_CHANGED is set only in the inner call, function port_priv_change_handler_func can be called in the inner call.
However, it will fail to fetch "enable" options because option_list_init has not be called.
Signed-off-by: Stephen Sun stephens@nvidia.com
How I did it
Fix:
Do not call check_call_change_handlers when parsing RTNL function is called from another check_call_change_handlers recursively.
How to verify it
Which release branch to backport (provide reason below if selected)
Description for the changelog
Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)