Fix performance bug in L7 policy proxy redirect handling #44613

Merged
fristonio merged 2 commits into main from pr/fristonio/fix/l7-policy-redirect-perf
Mar 11, 2026

Conversation

@fristonio
Member

Sample result from testing (cl2 module coming soon):

  • 32 Endpoints
  • 512 Policies
  • 4000 Redirects per endpoint

Before:
image

After:
image

See commit message for details.

This commit configures custom exponential histogram buckets for the
endpoint regeneration and policy implementation delay metrics. Currently,
the histogram buckets for these metrics are capped at 10s. This makes the
data misleading in high-scale environments, where in some cases we have
seen these operations take more than a minute.

With this change, the metrics are configured with exponential buckets
starting at 1us and growing by a factor of 10 up to 100s.

Signed-off-by: Deepesh Pathak <deepeshpathak09@gmail.com>
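
For readers who want to see the bucket layout described above, here is a minimal sketch using only the Go standard library. The helper names `exponentialBuckets` and `upperBoundFor` are illustrative assumptions, not code from this PR; the actual change presumably relies on the metrics library's own bucket helper. It lists the nine bounds from 1us to 100s and shows why a 60s observation is lost in the overflow bucket when the largest finite bound is 10s.

```go
package main

import (
	"fmt"
	"math"
)

// exponentialBuckets returns count upper bounds, starting at start and
// multiplying by factor at each step. Hypothetical helper for illustration.
func exponentialBuckets(start, factor float64, count int) []float64 {
	buckets := make([]float64, 0, count)
	bound := start
	for i := 0; i < count; i++ {
		buckets = append(buckets, bound)
		bound *= factor
	}
	return buckets
}

// upperBoundFor returns the smallest bucket bound that is >= v, or +Inf when
// v exceeds every finite bound (the implicit overflow bucket).
func upperBoundFor(v float64, buckets []float64) float64 {
	for _, b := range buckets {
		if v <= b {
			return b
		}
	}
	return math.Inf(1)
}

func main() {
	// 1us to 100s in steps of 10x needs nine bounds (values in seconds).
	wide := exponentialBuckets(1e-6, 10, 9)
	for _, b := range wide {
		fmt.Printf("%g\n", b)
	}

	// Dropping the last bound models a histogram capped at 10s.
	capped := wide[:8]
	fmt.Println("60s with 10s cap lands in:", upperBoundFor(60, capped))
	fmt.Println("60s with 100s cap lands in:", upperBoundFor(60, wide))
}
```

Because the bounds are built by repeated floating-point multiplication, the printed values may differ from the exact powers of ten in the last digits.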

This commit fixes a performance bug in cilium-agent endpoint policy
regeneration handling. Currently, for each endpoint proxy redirect (which
loosely corresponds to an L7-dependent L4 element in the policy), we loop
through all existing redirects to compute the total count. This leads to
an O(n^2) loop during policy proxy configuration. In clusters with
large-scale L7 policies, this causes huge delays in endpoint policy
regeneration.

This commit fixes the issue by keeping a running total count of proxy
redirects, making the overall operation O(n). In some cases this improves
endpoint regeneration time by 10x.

Signed-off-by: Deepesh Pathak <deepeshpathak09@gmail.com>
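
The running-count idea from the commit message can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the `proxy` type, its integer redirect IDs, the `kind` strings, and all method names are hypothetical. `countsByScan` models the old behavior (a full pass over every redirect each time a total is needed), while `add` and `remove` keep the totals current in constant time.

```go
package main

import "fmt"

// proxy is a hypothetical stand-in for a node-wide redirect table.
type proxy struct {
	redirects map[int]string // redirect ID -> kind (for example "http")
	counts    map[string]int // kind -> running total, maintained incrementally
}

func newProxy() *proxy {
	return &proxy{
		redirects: make(map[int]string),
		counts:    make(map[string]int),
	}
}

// countsByScan recomputes the totals by walking every redirect: O(n) per
// call, so calling it once per redirect update costs O(n^2) overall.
func (p *proxy) countsByScan() map[string]int {
	out := make(map[string]int)
	for _, kind := range p.redirects {
		out[kind]++
	}
	return out
}

// add inserts or replaces a redirect and adjusts the running totals in O(1).
func (p *proxy) add(id int, kind string) {
	if old, ok := p.redirects[id]; ok {
		p.counts[old]-- // the existing entry is being replaced
	}
	p.redirects[id] = kind
	p.counts[kind]++
}

// remove deletes a redirect, if present, and adjusts the running totals.
func (p *proxy) remove(id int) {
	kind, ok := p.redirects[id]
	if !ok {
		return
	}
	delete(p.redirects, id)
	p.counts[kind]--
}

func main() {
	p := newProxy()
	for i := 0; i < 4000; i++ {
		p.add(i, "http")
	}
	p.remove(0)

	// Both approaches agree; only the cost of obtaining the answer differs.
	fmt.Println(p.counts["http"], p.countsByScan()["http"])
}
```

The trade-off is that every code path that mutates the table has to go through `add` or `remove`, otherwise the running totals drift from the real contents.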
@fristonio added the kind/performance, release-note/minor, sig/policy, affects/v1.17, needs-backport/1.18, and needs-backport/1.19 labels Mar 3, 2026
@fristonio
Member Author

/test

@jrajahalme
Member

This is compounded by the fact that Proxy in general, and the redirects member in particular, is shared across all endpoints in the node, so cardinality grows both with the number of endpoints and with the number of redirects per endpoint.
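
To put rough numbers on this, the following back-of-the-envelope sketch uses the test configuration from the PR description (32 endpoints, 4000 redirects per endpoint). The function names are hypothetical, and the quadratic figure is an upper bound that assumes every redirect update rescans a fully populated node-wide table.

```go
package main

import "fmt"

// nodeRedirects is the size of a redirect table shared by all endpoints.
func nodeRedirects(endpoints, perEndpoint int64) int64 {
	return endpoints * perEndpoint
}

// scanSteps is an upper bound on entries visited when each of the n redirect
// updates walks all n entries of the table.
func scanSteps(n int64) int64 {
	return n * n
}

func main() {
	n := nodeRedirects(32, 4000)
	fmt.Println(n)            // 128000
	fmt.Println(scanSteps(n)) // 16384000000
}
```

With a running count, the same regeneration touches each redirect once, so the work is on the order of 128000 steps instead.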

@fristonio marked this pull request as ready for review March 4, 2026 16:34
@fristonio requested review from a team as code owners March 4, 2026 16:34
@fristonio requested review from chancez and sayboras March 4, 2026 16:34
Member

@sayboras left a comment

:amazing:

@maintainer-s-little-helper bot added the ready-to-merge label Mar 11, 2026
@fristonio added this pull request to the merge queue Mar 11, 2026
Merged via the queue into main with commit e5bf300 Mar 11, 2026
866 of 905 checks passed
@fristonio deleted the pr/fristonio/fix/l7-policy-redirect-perf branch March 11, 2026 15:46
@smagnani96 mentioned this pull request Mar 16, 2026
4 tasks
@smagnani96 added the backport-pending/1.18 label and removed the needs-backport/1.18 label Mar 16, 2026
@smagnani96 mentioned this pull request Mar 16, 2026
10 tasks
@smagnani96 added the backport-pending/1.19 label and removed the needs-backport/1.19 label Mar 16, 2026
@github-actions bot removed the backport-pending/1.18 label Mar 24, 2026
@github-actions bot added the backport-done/1.18 and backport-done/1.19 labels and removed the backport-pending/1.19 label Mar 24, 2026
Labels

affects/v1.17, backport-done/1.18, backport-done/1.19, kind/performance, ready-to-merge, release-note/minor, sig/policy

6 participants