Skip to content

datapath/neighbor: Add ratelimit to desired neighbor calculation#43928

Merged
aanm merged 1 commit intocilium:mainfrom
dylandreimerink:feature/ratelimit-neighbor-reconciliation
Mar 6, 2026
Merged

datapath/neighbor: Add ratelimit to desired neighbor calculation#43928
aanm merged 1 commit intocilium:mainfrom
dylandreimerink:feature/ratelimit-neighbor-reconciliation

Conversation

@dylandreimerink
Copy link
Copy Markdown
Member

During investigation of a memory leak in v1.18, one of the pprof profiles showed a high amount of memory usage in
netlink/nl.(*NetlinkSocket).Receive.

#41623 (comment)

This is most likely due to a lack of rate limiting in the desired neighbor calculation which does a lot of netlink requests to get next hops.

So this PR limits desired neighbor calculation to once every 15 seconds. In the worst case scenario where the default gateway changes, XDP might not be able to forward traffic for up to 15 seconds. Such a scenario should only happen when configuration changes are made or when the network topology changes, and thus this seems an acceptable tradeoff.

Add rate limiting to neighbor reconciler to reduce CPU usage and memory churn

@dylandreimerink dylandreimerink requested a review from a team as a code owner January 22, 2026 12:52
@dylandreimerink dylandreimerink added kind/enhancement This would improve or streamline existing functionality. release-note/bug This PR fixes an issue in a previous release of Cilium. needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Jan 22, 2026
@dylandreimerink
Copy link
Copy Markdown
Member Author

/test

@dylandreimerink dylandreimerink force-pushed the feature/ratelimit-neighbor-reconciliation branch from 008c660 to 7babedf Compare January 29, 2026 12:49
@dylandreimerink
Copy link
Copy Markdown
Member Author

/test

@aanm aanm enabled auto-merge February 2, 2026 14:42
@dylandreimerink dylandreimerink force-pushed the feature/ratelimit-neighbor-reconciliation branch from 7babedf to 565ca3d Compare February 4, 2026 09:41
@dylandreimerink
Copy link
Copy Markdown
Member Author

/test

@dylandreimerink
Copy link
Copy Markdown
Member Author

@cilium/sig-datapath any chance of getting a review 👋 ?

Copy link
Copy Markdown
Contributor

@smagnani96 smagnani96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes LGTM.
Left a comment on the interval: if you're seeking a feedback on the specific amount chosen (15sec), I'd ask you to pull in other reviewers.

@joestringer
Copy link
Copy Markdown
Member

@ldelossa do you have bandwidth to help with @cilium/sig-datapath reviews? If not, you might consider either marking your profile as "busy" on GitHub or (temporarily) stepping out of the team until you have the bandwidth.

Copy link
Copy Markdown
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nuances and nits discussed above, but I'll leave it up to you if any of the ideas discussed in the threads are useful to integrate (either in this PR or a subsequent one).

Copy link
Copy Markdown
Contributor

@ldelossa ldelossa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sorry for delay.

@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Mar 3, 2026
@joestringer joestringer removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 3, 2026
@joestringer
Copy link
Copy Markdown
Member

(Ready but deferring to @dylandreimerink to resolve comments and apply the PR)

@dylandreimerink dylandreimerink force-pushed the feature/ratelimit-neighbor-reconciliation branch 2 times, most recently from a2839e2 to 076a22d Compare March 6, 2026 10:56
@dylandreimerink
Copy link
Copy Markdown
Member Author

/test

@dylandreimerink dylandreimerink force-pushed the feature/ratelimit-neighbor-reconciliation branch from 076a22d to 97d1773 Compare March 6, 2026 11:05
@dylandreimerink dylandreimerink requested a review from a team as a code owner March 6, 2026 11:05
@dylandreimerink dylandreimerink requested a review from brlbil March 6, 2026 11:05
@dylandreimerink
Copy link
Copy Markdown
Member Author

/test

During investigation of a memory leak in v1.18, one of the pprof
profiles showed a high amount of memory usage in
`netlink/nl.(*NetlinkSocket).Receive`.

cilium#41623 (comment)

This is most likely due to a lack of rate limiting in the desired
neighbor calculation which does a lot of netlink requests to get next hops.

So this commit limits desired neighbor calculation to once every 15
seconds. In the worst case scenario where the default gateway changes,
XDP might not be able to forward traffic for up to 15 seconds. Such
a scenario should only happen when configuration changes are made or
when the network topology changes, and thus this seems an acceptable
tradeoff.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink dylandreimerink force-pushed the feature/ratelimit-neighbor-reconciliation branch from 97d1773 to b32a781 Compare March 6, 2026 12:13
@dylandreimerink dylandreimerink requested a review from a team as a code owner March 6, 2026 12:13
@dylandreimerink
Copy link
Copy Markdown
Member Author

/test

@aanm aanm added this pull request to the merge queue Mar 6, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 6, 2026
Merged via the queue into cilium:main with commit f38aa1b Mar 6, 2026
83 of 85 checks passed
@tommyp1ckles tommyp1ckles mentioned this pull request Mar 9, 2026
7 tasks
@tommyp1ckles tommyp1ckles added backport-pending/1.19 The backport for Cilium 1.19.x for this PR is in progress. and removed needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Mar 9, 2026
@github-actions github-actions bot added backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. and removed backport-pending/1.19 The backport for Cilium 1.19.x for this PR is in progress. labels Mar 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-done/1.19 The backport for Cilium 1.19.x for this PR is done. kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants