Merged EVPN VxLAN MH HLD from Cisco and BCM #1702
|
@adyeung could you please add me to the reviewers list? |
AntonButenkoGL
left a comment
Hello @pbrisset
I represent a team working on the EVPN Multihoming implementation, and we have left some comments on this PR.
Could you please check them? We think they are crucial for the implementation.
Thank you
|
In the current implementation, can all the functionalities already work on the VS (virtual machine) platform? |
Yes, it was tested on the VS platform, and work is in progress to align the ESI type logic with the current requirements. |
Updated kernel, SAI, config, and design sections
Update EVPN_VxLAN_Multihoming.md
|
SAI spec opencomputeproject/SAI#2084 |
Hello @AntonButenkoGL, thank you for reviewing the HLD and posting comments! I see the responses to your comments now. Please confirm whether your questions and concerns are addressed and we are good to go. |
|
@prvattem @gord1306 @AntonButenkoGL @helloanandhi @mikemallin @skumar041 @venkatmahalingam @srj102 @eddyk-nvidia The HLD has been presented and reviewed at the Routing WG and community weekly calls. If there are no further comments on the design, please sign off and mark it approved to conclude the review. |
|
@adyeung Sorry, I don't have the permission to approve. However, the reply messages above are ok with me. |
Kernel 6.1.94 version. Why I did it: Adding support for bridge fdb nhid and syncing the libnl3 header file to Kernel 6.1.94 version. Check HLD: sonic-net/SONiC#1702 Signed-off-by: Kishore Kunal <kishore.kunal@broadcom.com>
…om the Kernel Why I did it Managing NHID in Bridge FDB Updates and Handling NHG Updates from the Kernel in fdbsyncd Check HLD: sonic-net/SONiC#1702 Signed-off-by: Kishore Kunal <kishore.kunal@broadcom.com>
|
no code PR, move to backlog |
| - Multiple Tunnel bridgeports can have the isolation group attribute set. | ||
| [SAI PR 2058] (https://github.com/opencomputeproject/SAI/pull/2058) is raised for the above changes. |
This SONiC HLD refers to the abandoned SAI PR 2058.
The actual EVPN MH SAI additions are in PR 2084, which introduces SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_SET_SWITCHOVER.
The expectation is that NOS attaches a PROTECTION_NEXT_HOP_GROUP_ID on LAG ports where protection is enabled. On the failure of these LAG ports, NOS triggers the failover, i.e., sets SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_SET_SWITCHOVER. The SAI implementation then ensures that the MAC addresses learnt on the failed LAG will now be forwarded on the PROTECTION_NEXT_HOP_GROUP associated with the failed LAG.
Please update the SONiC HLD to describe the sequence of SAI operations when a LAG fails.
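For illustration only, the expected failover behavior described above can be sketched as a toy Python model. This is not SAI code: `BridgePort`, `ToySwitch`, `resolve`, and `on_lag_down` are hypothetical names, and the two fields merely stand in for PROTECTION_NEXT_HOP_GROUP_ID and SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_SET_SWITCHOVER as named in this comment.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BridgePort:
    """Hypothetical stand-in for a LAG bridge port."""
    name: str
    # Stands in for PROTECTION_NEXT_HOP_GROUP_ID (assumed to be attached by the NOS).
    protection_nhg_id: Optional[int] = None
    # Stands in for SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_SET_SWITCHOVER.
    switchover: bool = False


@dataclass
class ToySwitch:
    """Toy model of the forwarding decision; not a real SAI implementation."""
    ports: Dict[str, BridgePort] = field(default_factory=dict)
    fdb: Dict[str, str] = field(default_factory=dict)  # MAC -> bridge port name

    def resolve(self, mac: str) -> str:
        """Return where traffic destined to `mac` is forwarded."""
        port = self.ports[self.fdb[mac]]
        if port.switchover and port.protection_nhg_id is not None:
            # After switchover, MACs learnt on the failed LAG follow the
            # protection next hop group; the FDB entry itself is untouched.
            return f"nhg:{port.protection_nhg_id}"
        return f"port:{port.name}"


def on_lag_down(sw: ToySwitch, lag: str) -> None:
    """NOS reaction to a LAG failure: trigger the switchover on that port."""
    sw.ports[lag].switchover = True
```

In this sketch a single attribute write on the bridge port redirects every MAC learnt on the failed LAG, which is the property the requested HLD sequence should make explicit.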
Addressed in the next HLD version.
| - (a) FRR updates the kernel FDB entry with IN_TIMER flag and starts hold-timer. | ||
| - (c) Fdbsyncd receives notification from kernel with IN_TIMER flag set, and it replaces the VXLAN_FDB_TABLE entry with ageing=enabled, type=none. | ||
| - (d) Fdborch removes the mesh bit from the FDB entry in HW. | ||
| - (e) MAC learn event is received from SAI if the traffic hits after mesh bit is removed. |
After step 5.d, the FDB entry programmed in the hardware should be as below: MAC=H2,Dest=PO1, SAI_FDB_ENTRY_TYPE_DYNAMIC.
This is how any locally learnt MAC entry would have been programmed. The standard SAI behavior is NOT to generate learn/move events when it receives packets with SMAC H2 on ingress port PO1, since these do NOT need any control plane handling and sending these events would waste CPU cycles. Imagine an elephant flow with SMAC H2 being received on PO1.
Given the above, at step 5.e, why should SAI generate a learn/move event when it receives a packet with SMAC as H2 and ingress port as PO1? If we need a special behavior for a specific scenario, then we need new SAI attributes to indicate this behavior.
I believe there is some confusion here about the scenario.
In MH, the MAC is initially learned on Leaf1 and synced to Leaf2, where it is programmed as static. When the MAC ages out on Leaf1, a withdraw is sent to Leaf2. The hold-timer is started and the MAC is programmed as dynamic in HW. However, that entry has NOT been learned yet. Going from static to dynamic is simply to allow the HW to learn. Once that happens, the MAC is punted, the hold-timer is stopped, and RT-2 is advertised. There is no further punt for that MAC.
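As a rough sketch of the lifecycle described above, seen from Leaf2, the states can be modeled in Python as below. The class, state, and method names are hypothetical and chosen only for illustration; hold-timer expiry is deliberately left out because it is not covered in this reply.

```python
from enum import Enum, auto


class MacState(Enum):
    SYNCED_STATIC = auto()  # synced from Leaf1, programmed as static
    HOLD_DYNAMIC = auto()   # withdraw received, hold-timer running, HW may learn
    LOCAL = auto()          # learned locally on Leaf2


class PeerSyncedMac:
    """Toy model of one MAC on Leaf2; not the fdbsyncd/fdborch code."""

    def __init__(self) -> None:
        self.state = MacState.SYNCED_STATIC
        self.hold_timer_running = False
        self.rt2_advertised = False
        self.punts = 0  # number of learn events sent to the control plane

    def on_withdraw(self) -> None:
        """Leaf1 aged the MAC out and withdrew it."""
        if self.state is MacState.SYNCED_STATIC:
            self.state = MacState.HOLD_DYNAMIC
            self.hold_timer_running = True

    def on_packet(self) -> None:
        """A packet with this SMAC arrives on the local port."""
        if self.state is MacState.HOLD_DYNAMIC:
            # The only punt: the HW learns, the timer stops, RT-2 goes out.
            self.punts += 1
            self.hold_timer_running = False
            self.rt2_advertised = True
            self.state = MacState.LOCAL
        # In SYNCED_STATIC or LOCAL, traffic causes no further punts.
```

In this model, any number of packets after the first learn leaves `punts` at 1, which matches the claim that an elephant flow does not keep punting to the CPU.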
|
@pbrisset if you have a PR for sonic-mgmt tests, Nexthop can see if we can pull all these PRs into a local repo and build and test. |
|
@pbrisset, in the list of PRs, the first PR #4036 is in sonic-swss, but it is shown as "sonic-utilities". Is there a sonic-utilities PR that is not listed here?
|
Sorry, my mistake. I just fixed it. |
This is the result of merging Cisco and BCM HLDs.
Cisco HLD
BCM HLD
PRs: