Fix up dplane handling of some edge cases#18919
Merged
riw777 merged 3 commits intoFRRouting:masterfrom Jun 5, 2025
Merged
Conversation
10 tasks
ton31337
approved these changes
Jun 2, 2025
mjstapp
reviewed
Jun 2, 2025
Currently the buffer size used for encoding of routes is 8192 bytes. This is not a great choice for when you are using 512 way ECMP with v6 nexthops. You quickly run out of space. Let's just make the size big enough to just work for the forseeable future. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When a dplane install fails in the dplane_fpm_nl code. Let the upper level protocol know that something has gone amiss. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
9637f09 to
32c5c45
Compare
Currently when using the dplane_fpm_nl module and a route installation
fails in that code, zebra is being notified that the route failed
to install but was doing nothing about it. Leaving the routes
in a `queued` state. Which is not a good. Modify the code
to remove the queued state and properly notice that the route
installation has failed for the operator.
New behavior(Note that I've intentionally introduced the problem):
2025-05-30 19:30:29.796 [ERR!] zebra: [X4VXS-XMMJ2] fpm_nl_enqueue: netlink_route_multipath_msg_encode failed
2025-05-30 19:30:29.796 [DEBG] zebra: [JGWSB-SMNVE] dplane: incoming new work counter: 0
2025-05-30 19:30:29.796 [DEBG] zebra: [QP8G8-08P0Q] dplane enqueues 0 new work to provider 'Kernel' curr is 0
2025-05-30 19:30:29.796 [DEBG] zebra: [JVY1P-93VFY] dplane provider 'Kernel': processing
2025-05-30 19:30:29.796 [DEBG] zebra: [VCDW6-A7ZF1] dplane dequeues 0 completed work from provider Kernel
2025-05-30 19:30:29.796 [DEBG] zebra: [QP8G8-08P0Q] dplane enqueues 0 new work to provider 'dplane_fpm_nl' curr is 0
2025-05-30 19:30:29.796 [DEBG] zebra: [VCDW6-A7ZF1] dplane dequeues 3 completed work from provider dplane_fpm_nl
2025-05-30 19:30:29.796 [DEBG] zebra: [JTWAB-1MH4Y] dplane has 3 completed, 0 errors, for zebra main
2025-05-30 19:30:29.796 [DEBG] zebra: [J7K9Z-9M7DT] Nexthop dplane ctx 0x55c0f6258700, op NH_INSTALL, nexthop ID (17), result SUCCESS
2025-05-30 19:30:29.796 [DEBG] zebra: [J7K9Z-9M7DT] Nexthop dplane ctx 0x55c0f624add0, op NH_INSTALL, nexthop ID (16), result SUCCESS
2025-05-30 19:30:29.796 [DEBG] zebra: [S17A6-6B1X2] default(0:254):1::1/128 Processing dplane result ctx 0x55c0f624c000, op ROUTE_UPDATE result FAILURE
2025-05-30 19:30:29.796 [DEBG] zebra: [TFH34-6R9W1] default(0:254):1::1/128 Stale dplane result for old_re 0x55c0f62569d0
2025-05-30 19:30:29.796 [WARN] zebra: [VYKYC-709DP] default(0:254):1::1/128: Route install failed
r1(config)# do show ipv6 route
Codes: K - kernel route, C - connected, L - local, S - static,
R - RIPng, O - OSPFv3, I - IS-IS, B - BGP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric, t - Table-Direct,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
IPv6 unicast VRF default:
S>r 1::1/128 [1/0] via fe80::1, r1-eth0, weight 1, 00:00:03
r via fe80::2, r1-eth0, weight 1, 00:00:03
r via fe80::3, r1-eth0, weight 1, 00:00:03
C>* fe80::/64 is directly connected, r1-eth0, weight 1, 00:00:58
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
32c5c45 to
e4acb14
Compare
r12f
pushed a commit
to Azure/sonic-buildimage-msft
that referenced
this pull request
Jun 13, 2025
<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md
** Make sure all your commits include a signature generated with `git
commit -s` **
If this is a bug fix, make sure your description includes "fixes #xxxx",
or
"closes #xxxx" or "resolves #xxxx"
Please provide the following information:
-->
#### Why I did it
When a route with 512 nexthops is programmed, show ipv6 route shows the
route as queued, even though the route gets programmed in the hardware.
Porting the fix FRRouting/frr#18919 from FRR as
well as aligning dplane_fpm_sponic
##### Work item tracking
- Microsoft ADO **(number only)**:
#### How I did it
Modifying dplane_fpm_sonic and added patches.
#### How to verify it
Install routes with 512 nexthops and confirm if they are not queued.
<!--
If PR needs to be backported, then the PR must be tested against the
base branch and the earliest backport release branch and provide tested
image version on these two branches. For example, if the PR is requested
for master, 202211 and 202012, then the requester needs to provide test
results on master and 202012.
-->
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
#### Tested branch (Please provide the tested image version)
<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->
- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->
#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->
<!--
Ensure to add label/tag for the feature raised. example - PR#2174 under
sonic-utilities repo. where, Generic Config and Update feature has been
labelled as GCU.
-->
#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on
https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->
#### A picture of a cute animal (not mandatory but encouraged)
9 tasks
StormLiangMS
pushed a commit
to sonic-net/sonic-buildimage
that referenced
this pull request
Jun 23, 2025
Why I did it When a route with 512 nexthops is programmed, show ipv6 route shows the route as queued, even though the route gets programmed in the hardware. Porting the fix FRRouting/frr#18919 from FRR as well as aligning dplane_fpm_sponic Work item tracking Microsoft ADO (number only): How I did it Modifying dplane_fpm_sonic and added patches. How to verify it Install routes with 512 nexthops and confirm if they are not queued.
mssonicbld
added a commit
to mssonicbld/sonic-buildimage
that referenced
this pull request
Jun 23, 2025
<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md
** Make sure all your commits include a signature generated with `git commit -s` **
If this is a bug fix, make sure your description includes "fixes #xxxx", or
"closes #xxxx" or "resolves #xxxx"
Please provide the following information:
-->
#### Why I did it
When a route with 512 nexthops is programmed, show ipv6 route shows the route as queued, even though the route gets programmed in the hardware. Porting the fix FRRouting/frr#18919 from FRR as well as aligning dplane_fpm_sponic
##### Work item tracking
- Microsoft ADO **(number only)**:
#### How I did it
Modifying dplane_fpm_sonic and added patches.
#### How to verify it
Install routes with 512 nexthops and confirm if they are not queued.
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 202205
- [ ] 202211
- [ ] 202305
- [ ] 202311
- [ ] 202405
- [ ] 202411
- [ ] 202505
#### Tested branch (Please provide the tested image version)
<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->
- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->
#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->
<!--
Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->
#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->
#### A picture of a cute animal (not mandatory but encouraged)
9 tasks
mssonicbld
added a commit
to sonic-net/sonic-buildimage
that referenced
this pull request
Jun 23, 2025
<!-- Please make sure you've read and understood our contributing guidelines: https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md failure_prs.log Make sure all your commits include a signature generated with `git commit -s` ** If this is a bug fix, make sure your description includes "fixes #xxxx", or "closes #xxxx" or "resolves #xxxx" Please provide the following information: --> #### Why I did it When a route with 512 nexthops is programmed, show ipv6 route shows the route as queued, even though the route gets programmed in the hardware. Porting the fix FRRouting/frr#18919 from FRR as well as aligning dplane_fpm_sponic ##### Work item tracking - Microsoft ADO **(number only)**: #### How I did it Modifying dplane_fpm_sonic and added patches. #### How to verify it Install routes with 512 nexthops and confirm if they are not queued. <!-- If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012. --> #### Which release branch to backport (provide reason below if selected) <!-- - Note we only backport fixes to a release branch, *not* features! - Please also provide a reason for the backporting below. - e.g. - [x] 202006 --> - [ ] 202205 - [ ] 202211 - [ ] 202305 - [ ] 202311 - [ ] 202405 - [ ] 202411 - [ ] 202505 #### Tested branch (Please provide the tested image version) <!-- - Please provide tested image version - e.g. - [x] 20201231.100 --> - [ ] <!-- image version 1 --> - [ ] <!-- image version 2 --> #### Description for the changelog <!-- Write a short (one line) summary that describes the changes in this pull request for inclusion in the changelog: --> <!-- Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU. --> #### Link to config_db schema for YANG module changes <!-- Provide a link to config_db schema for the table for which YANG model is defined Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md --> #### A picture of a cute animal (not mandatory but encouraged)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When using
no fpm use-nexthop-groupand we have 512 way ecmp and ipv6 routes 8k buffer is not enough to actually encode the route to the dplane. Add code to stop this problem from happening. Add code to let the ctx carry the failure back up to zebra. Add code to let zebra set the flags appropriately.@mjstapp @pbrisset you two should put your heads together and make sure you are happy with this change. If so I would recommend that someone port the dplane_fpm_nl.c code changes over to the sonic dplane module.