Merged EVPN VxLAN MH HLD from Cisco and BCM #1702

Merged
adyeung merged 14 commits into sonic-net:master from pbrisset:EVPN_VXLAN_MH
Nov 12, 2024

@pbrisset
Contributor

@pbrisset pbrisset commented May 21, 2024

This is the result of merging Cisco and BCM HLDs.

Cisco HLD
BCM HLD

PRs:

| Module | Detail | PR | Status |
|---|---|---|---|
| sonic-utilities | Config | 4247 | Open |
| sonic-swss | Cfgmgr | 4036 | Open |
| sonic-buildimage | Data Models | 23373 | Open / In-progress |
| sonic-swss | evpnmhorch | 3771 | Open |
| sonic-swss | shlorch | 3769 | Open |
| sonic-swss-common | shlorch | 1051 | Open |
| sonic-swss-common | APP_DB | 952 | Open - NHG table - BCM PR |
| sonic-swss | fdborch/neighorch | 3914 | Open |
| sonic-swss | vxlanorch | 3913 | Open |
| sonic-swss | fdbsyncd | 4039 | Open |
| sonic-swss | fpmsyncd | 4038 | Open |
| sonic-swss | Neighsyncd | 4037 | Open |
| sonic-frr | FRR | 19438 | Open / In-progress |
| sonic-linux-kernel | protocol field | PR | Open |
| sonic-linux-kernel | MH peer sync | PR | Nvidia also needs to open a PR for the MH peer sync flag |
| sonic-buildimage | iproute2 | PR | Open / In-progress |

@prvattem
Collaborator

@adyeung could you please add me to the reviewers list?

@adyeung adyeung requested a review from prvattem May 31, 2024 23:31

@AntonButenkoGL AntonButenkoGL left a comment

Hello @pbrisset

I represent a team working on the EVPN Multihoming implementation, and we have some comments on this PR.

Could you please check the comments? We think they are crucial for the implementation.

Thank you

@gord1306

In the current implementation, can all the functionalities already work on the VS (virtual machine) platform?

@AntonButenkoGL

AntonButenkoGL commented Jul 12, 2024

> In the current implementation, can all the functionalities already work on the VS (virtual machine) platform?

Yes, it was tested with VS platform.
Additionally, SAG and WarmReboot use cases testing is in progress.

There is also work in progress to align the ESI types logic with the current requirements.

hasan-brcm and others added 3 commits July 29, 2024 15:35
Updated kernel, SAI, config, and design sections
Update EVPN_VxLAN_Multihoming.md
@adyeung
Collaborator

adyeung commented Oct 28, 2024

SAI spec opencomputeproject/SAI#2084

@hasan-brcm
Contributor

> Hello @pbrisset
>
> I represent a team working on the EVPN Multihoming implementation, and we have some comments on this PR.
>
> Could you please check the comments? We think they are crucial for the implementation.
>
> Thank you

Hello @AntonButenkoGL, thank you for reviewing the HLD and posting comments! I see the responses to your comments now. Please confirm whether your questions and concerns are addressed and we are good to go.

@adyeung
Collaborator

adyeung commented Nov 7, 2024

@prvattem @gord1306 @AntonButenkoGL @helloanandhi @mikemallin @skumar041 @venkatmahalingam @srj102 @eddyk-nvidia The HLD has been presented and reviewed at the Routing WG and community weekly calls. If there are no further comments on the design, please sign off and mark it approved to conclude the review.

@gord1306

@adyeung Sorry, I don't have the permission to approve. However, the reply messages above are ok with me.

@adyeung
Collaborator

adyeung commented Nov 12, 2024

> @adyeung Sorry, I don't have the permission to approve. However, the reply messages above are ok with me.

@gord1306 you can click files changed tab -> click review -> click approve

Addressed review comments.

(cherry picked from commit f9fe305)
Addressed review comments.

(cherry picked from commit 9368707)
Contributor

@hasan-brcm hasan-brcm left a comment

LGTM

@adyeung adyeung merged commit 5039fee into sonic-net:master Nov 12, 2024
kishorekunal01 added a commit to kishorekunal01/sonic-buildimage that referenced this pull request Nov 14, 2024
Kernel 6.1.94 version.

Why I did it
Adding support for bridge fdb nhid and syncing the libnl3 header file to Kernel version 6.1.94
Check HLD: sonic-net/SONiC#1702

Signed-off-by: Kishore Kunal <kishore.kunal@broadcom.com>
kishorekunal01 added a commit to kishorekunal01/sonic-swss that referenced this pull request Nov 15, 2024
…om the Kernel

Why I did it
Managing NHID in Bridge FDB Updates and Handling NHG Updates from the Kernel in fdbsyncd

Check HLD: sonic-net/SONiC#1702

Signed-off-by: Kishore Kunal <kishore.kunal@broadcom.com>
@zhangyanzhao
Collaborator

no code PR, move to backlog

@zhangyanzhao
Collaborator

zhangyanzhao commented May 11, 2025

@pbrisset @adyeung can you please add the code PRs to this HLD by referring to #806? That is required to track feature completeness. Thanks.

- Multiple Tunnel bridgeports can have the isolation group attribute set.


[SAI PR 2058](https://github.com/opencomputeproject/SAI/pull/2058) is raised for the above changes.
Collaborator

This SONiC HLD refers to the abandoned SAI PR 2058.
The actual EVPN MH SAI additions are in PR 2084, which introduces SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_SET_SWITCHOVER.

The expectation is that NOS attaches a PROTECTION_NEXT_HOP_GROUP_ID on LAG ports where protection is enabled. On the failure of these LAG ports, NOS triggers the failover, i.e., sets SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_SET_SWITCHOVER. The SAI implementation then ensures that the MAC addresses learnt on the failed LAG will now be forwarded on the PROTECTION_NEXT_HOP_GROUP associated with the failed LAG.

Please update the SONiC HLD to describe the sequence of SAI operations when a LAG fails.
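
For readers following the thread, the failover sequence described in this comment can be sketched as a small toy model. This is only an illustration under stated assumptions: `BridgePort`, `ToySwitch`, and their methods are hypothetical names, not SAI or SONiC APIs; the two fields merely mirror the PROTECTION_NEXT_HOP_GROUP_ID and SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_SET_SWITCHOVER attributes named above. Clearing the switchover when the LAG recovers is also an assumption, since the comment only describes the failure case.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BridgePort:
    """Toy stand-in for a bridge port on a LAG (hypothetical, not a SAI object)."""
    name: str
    # Mirrors PROTECTION_NEXT_HOP_GROUP_ID; None means protection is not enabled.
    protection_nhg_id: Optional[int] = None
    # Mirrors SAI_BRIDGE_PORT_ATTR_BRIDGE_PORT_SET_SWITCHOVER.
    switchover: bool = False


class ToySwitch:
    """Hypothetical model of the NOS-side sequence and the resulting forwarding."""

    def __init__(self) -> None:
        self.ports: Dict[str, BridgePort] = {}
        self.fdb: Dict[str, str] = {}  # MAC -> bridge port name

    def add_port(self, name: str, protection_nhg_id: Optional[int] = None) -> None:
        # Step 1: the NOS attaches a protection NHG to LAG ports that need it.
        self.ports[name] = BridgePort(name, protection_nhg_id)

    def learn(self, mac: str, port_name: str) -> None:
        self.fdb[mac] = port_name

    def on_lag_oper_change(self, port_name: str, up: bool) -> None:
        # Step 2: on LAG failure the NOS sets the switchover attribute.
        # Clearing it on recovery is an assumption, not stated in the thread.
        port = self.ports[port_name]
        if port.protection_nhg_id is not None:
            port.switchover = not up

    def resolve(self, mac: str) -> str:
        # Step 3: MACs learnt on a failed LAG are forwarded via its protection NHG.
        port = self.ports[self.fdb[mac]]
        if port.switchover and port.protection_nhg_id is not None:
            return f"nhg:{port.protection_nhg_id}"
        return f"port:{port.name}"


sw = ToySwitch()
sw.add_port("PO1", protection_nhg_id=100)
sw.learn("00:00:00:00:00:02", "PO1")
before = sw.resolve("00:00:00:00:00:02")   # forwarded on the LAG itself
sw.on_lag_oper_change("PO1", up=False)
after = sw.resolve("00:00:00:00:00:02")    # forwarded on the protection NHG
```

The point of the sketch is that the FDB entry itself is never rewritten on failure; only the per-port switchover state changes, which is what makes the failover a single attribute set rather than a per-MAC update.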

Contributor Author

addressed in the next HLD version

- (a) FRR updates the kernel FDB entry with IN_TIMER flag and starts hold-timer.
- (c) Fdbsyncd receives notification from kernel with IN_TIMER flag set, and it replaces the VXLAN_FDB_TABLE entry with ageing=enabled, type=none.
- (d) Fdborch removes the mesh bit from the FDB entry in HW.
- (e) MAC learn event is received from SAI if the traffic hits after mesh bit is removed.
Collaborator

After step 5.d, the FDB entry programmed in the hardware should be as follows: MAC=H2, Dest=PO1, SAI_FDB_ENTRY_TYPE_DYNAMIC.
This is how any locally learnt MAC entry would have been programmed. The standard SAI behavior is to NOT generate learn/move events when it receives packets with SMAC H2 and ingress port PO1, since these do NOT need any control-plane handling and sending these events would waste CPU cycles. Imagine an elephant flow with SMAC H2 being received on PO1.

Given the above, at step 5.e, why should SAI generate a learn/move event when it receives a packet with SMAC as H2 and ingress port as PO1? If we need a special behavior for a specific scenario, then we need new SAI attributes to indicate this behavior.

Contributor Author

I believe there is some confusion here about the scenario.

In MH, the MAC is initially learned on Leaf1 and synced to Leaf2, where it is programmed as static. When the MAC ages out on Leaf1, a withdraw is sent to Leaf2. The hold timer is started and the MAC is programmed as dynamic in HW. However, that entry has NOT been learned yet; going from static to dynamic simply allows the HW to learn. Once that happens, the MAC is punted, the hold timer is stopped, and RT-2 is advertised. There is no further punting for that MAC.
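
The lifecycle described in this reply can be sketched as a tiny state machine for the synced MAC on Leaf2. This is a minimal illustration, not the HLD's implementation: `SyncedMac` and its method names are hypothetical, and hold-timer expiry is deliberately left out because the reply does not describe it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SyncedMac:
    """Hypothetical model of a MAC that Leaf2 received via sync from Leaf1."""
    mac: str
    entry_type: str = "static"        # synced entries are programmed as static
    hold_timer_running: bool = False
    learned_locally: bool = False
    events: List[str] = field(default_factory=list)

    def on_peer_withdraw(self) -> None:
        # Leaf1 aged the MAC out and sent a withdraw: start the hold timer and
        # reprogram the entry as dynamic so the hardware is allowed to learn it.
        self.hold_timer_running = True
        self.entry_type = "dynamic"
        self.learned_locally = False
        self.events.append("hold timer started; entry reprogrammed as dynamic")

    def on_traffic(self) -> bool:
        """Return True only when this packet causes a punt (learn event)."""
        if self.entry_type == "dynamic" and not self.learned_locally:
            # First hit after the static-to-dynamic transition: one punt only.
            self.learned_locally = True
            self.hold_timer_running = False
            self.events.append("MAC punted; hold timer stopped; RT-2 advertised")
            return True
        # Static entries and already-learned entries never punt.
        return False


m = SyncedMac("H2")
punt_while_static = m.on_traffic()     # no punt: entry is static
m.on_peer_withdraw()
first_punt = m.on_traffic()            # single punt after the withdraw
second_punt = m.on_traffic()           # no further punts for this MAC
```

In this reading, the concern about an elephant flow generating repeated events does not arise, because the punt happens once per withdraw rather than once per packet.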

@pbrisset pbrisset mentioned this pull request Oct 22, 2025
tahmed-dev pushed a commit to tahmed-dev/sonic-swss that referenced this pull request Jan 6, 2026
…om the Kernel

Why I did it
Managing NHID in Bridge FDB Updates and Handling NHG Updates from the Kernel in fdbsyncd

Check HLD: sonic-net/SONiC#1702

Signed-off-by: Kishore Kunal <kishore.kunal@broadcom.com>
@bhouse-nexthop

@pbrisset if you have a PR for sonic-mgmt tests, Nexthop can see if we can pull all these PRs into a local repo and build and test.

@vganesan-nokia
Contributor

vganesan-nokia commented Jan 21, 2026

@pbrisset, in the list of PRs, the first PR #4036 is in sonic-swss, but it is shown as "sonic-utilities". Is there a sonic-utilities PR that is not listed here?

This is the result of merging Cisco and BCM HLDs.

Cisco HLD BCM HLD

PRs:

| Module | Detail | PR | Status |
|---|---|---|---|
| sonic-utilities | Cfgmgr | 4036 | Open |
| sonic-buildimage | Data Models | 23373 | Open / In-progress |
| sonic-swss | evpnmhorch | 3771 | Open |
| sonic-swss | shlorch | 3769 | Open |
| sonic-swss-common | shlorch | 1051 | Open |
| sonic-swss-common | APP_DB | 952 | Open - NHG table - BCM PR |
| sonic-swss | fdborch/neighorch | 3914 | Open |
| sonic-swss | vxlanorch | 3913 | Open |
| sonic-swss | fdbsyncd | 4039 | Open |
| sonic-swss | fpmsyncd | 4038 | Open |
| sonic-swss | Neighsyncd | 4037 | Open |
| sonic-frr | FRR | 19438 | Open / In-progress |
| sonic-linux-kernel | protocol field | PR | Open |
| sonic-linux-kernel | MH peer sync | PR | Nvidia also needs to open a PR for the MH peer sync flag |
| sonic-buildimage | iproute2 | PR | Open / In-progress |

@pbrisset
Contributor Author

> @pbrisset, in the list of PRs, the first PR #4036 is in sonic-swss, but it is shown as "sonic-utilities". Is there a sonic-utilities PR that is not listed here?

Sorry, my mistake. I just fixed it.

Projects

Status: MovedToBacklog
