policy: introduce lazy ordered/sorted selections by odinuge · Pull Request #43368 · cilium/cilium

odinuge · 2025-12-16T11:39:06Z

This introduces a separate sorted structure that can be used by the components that need it. Previously all selections were sorted. This very cosly in the hotpath of the selection updates, as sorting 10k+ slices is pretty expensive. This still creates a copy of the selections, but it does not sort it.

This can in theory result in higher memory usage since we now keep a set (thats in practice a map), and the sorted slice if asked for. In practice not many things need the sorted version, noteably the dnsproxy as well as some l7/xds configs. Its also not totally sure that those always need it either - and in the cases they need it, they also end up keeping it in memory - so the diff is pretty small in practice. Most of the datapath is also using updates, so its not relying on the sorted version all the time.

Its a bit hard to benchmark this directly without cherry-picking results, but a simple test-case where a single selector selects say 20k identities (eg. the 'cluster' entity), that are updated say 5 times/sec - results in a lot of cpu cycles spent on sorting that slice.

Fixes: #issue-number

n/a

odinuge · 2025-12-16T12:02:19Z

/test

odinuge · 2025-12-16T12:50:15Z

/test

odinuge · 2025-12-16T15:12:45Z

While testing a set of identities changing concurrently, while having a selector that selects the `cluster` entity we see that a lot of time is spent doing this sorting.

odinuge · 2025-12-17T10:18:18Z

pkg/policy/selectorcache_selector.go

 	id               types.SelectorId
 	users            map[CachedSelectionUser]struct{}
-	cachedSelections map[identity.NumericIdentity]struct{}
+	cachedSelections set.Set[identity.NumericIdentity]


Looking into doing part.Set[identity.NumericIdentity] here as well. That could be useful to reduce the number of allocations - but it will most likely do more allocations that the current version in main, but potentially be faster and allocate less memory.

odinuge · 2025-12-17T12:38:43Z

Benchmarks using #43407;

pkg: github.com/cilium/cilium/pkg/policy
                                │ /tmp/original.yml │             /tmp/pr.yml              │
                                │      sec/op       │    sec/op     vs base                │
ResolveNoMatchingRules-12              1.848m ± 37%   2.126m ± 48%        ~ (p=0.052 n=10)
SelectorCacheIdentityUpdates-12      2037.77µ ±  9%   10.59µ ±  4%  -99.48% (p=0.000 n=10)
geomean                                1.941m         150.0µ        -92.27%

                                │ /tmp/original.yml │             /tmp/pr.yml              │
                                │       B/op        │     B/op      vs base                │
ResolveNoMatchingRules-12              157.3Ki ± 0%   157.3Ki ± 0%        ~ (p=0.584 n=10)
SelectorCacheIdentityUpdates-12       164.80Ki ± 0%   11.52Ki ± 0%  -93.01% (p=0.000 n=10)
geomean                                161.0Ki        42.57Ki       -73.56%

                                │ /tmp/original.yml │              /tmp/pr.yml               │
                                │     allocs/op     │  allocs/op   vs base                   │
ResolveNoMatchingRules-12               20.02k ± 0%   20.02k ± 0%         ~ (p=1.000 n=10) ¹
SelectorCacheIdentityUpdates-12          51.00 ± 2%   116.00 ± 0%  +127.45% (p=0.000 n=10)
geomean                                 1.011k        1.524k        +50.81%

So it seems like this is much faster at identity updates, but it does have a significantly higher allocation count; but even more significant reduction in cpu time and allocation size.

This introduces a separate sorted structure that can be used by the components that need it. Previously all selections were sorted. This very costly in the hotpath of the selection updates, as sorting 10k+ slices is pretty expensive. This still creates a copy of the selections, but it does not sort it. This can in theory result in higher memory usage since we now keep a set (in practice a map), and the sorted slice if asked for. In practice not many things need the sorted version, notably the dnsproxy as well as some l7/xds configs. Its also not totally sure that those always need it either - and in the cases they need it, they also end up keeping it in memory - so the diff is pretty small in practice. Most of the datapath is also using updates, so its not relying on the sorted version all the time. Its a bit hard to benchmark this directly without cherry-picking results, but a simple test-case where a single selector selects say 20k identities (eg. the 'cluster' entity), that are updated say 5 times/sec - results in a lot of cpu cycles spent on sorting that slice. Signed-off-by: Odin Ugedal <odin@ugedal.com> Signed-off-by: Odin Ugedal <ougedal@palantir.com>

Signed-off-by: Odin Ugedal <ougedal@palantir.com>

Signed-off-by: Odin Ugedal <odin@ugedal.com> Signed-off-by: Odin Ugedal <ougedal@palantir.com>

odinuge · 2025-12-17T13:18:58Z

/test

This swaps the set.Set to a part.Set, so that we can avoid cloning the whole datastructure on each write. This does result in more allocations in general, but overall it looks like a win since we reduce the time it takes to clone it. The same applies to the total amount of allocated bytes, that seems to go down drastically. Signed-off-by: Odin Ugedal <odin@ugedal.com> Signed-off-by: Odin Ugedal <ougedal@palantir.com>

odinuge · 2025-12-17T13:54:38Z

/test

odinuge · 2026-01-07T17:33:44Z

Seems like we are having conflicts. I'll wait for #42580 to be worked through and merge before rebasing this. 😄

github-actions · 2026-02-14T02:13:53Z

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

github-actions · 2026-03-01T02:14:26Z

This pull request has not seen any activity since it was marked stale.
Closing.

odinuge · 2026-03-06T09:58:24Z

I can't seem to reopen, so I'll open a new one when I get time. I've been busy mitigating other critical bugs like #44328 the past few weeks, and to get v1.17 working so this hasn't been prioritized. So I'm still planning to do this work!

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 16, 2025

github-actions bot added the sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. label Dec 16, 2025

odinuge force-pushed the odinuge/selector-cache-perf-large-list-of-ids branch from 9e9b35c to 3196140 Compare December 16, 2025 12:00

odinuge force-pushed the odinuge/selector-cache-perf-large-list-of-ids branch 2 times, most recently from 1637272 to 35f4c45 Compare December 16, 2025 12:39

odinuge changed the title ~~policy: introduce lazy ordered selections~~ policy: introduce lazy ordered/sorted selections Dec 16, 2025

odinuge mentioned this pull request Dec 16, 2025

selectorcache: test new index in the selector cache id -> selectors #43376

Closed

odinuge commented Dec 17, 2025

View reviewed changes

odinuge mentioned this pull request Dec 17, 2025

policy: add new benchmarks for identity updates and large policy repository #43407

Merged

odinuge-work added 3 commits December 17, 2025 13:48

policy: rewrite selections API to not expose internal objects

ed044be

Signed-off-by: Odin Ugedal <ougedal@palantir.com>

policy: remove duplicate All iterator func

0d995d8

Signed-off-by: Odin Ugedal <odin@ugedal.com> Signed-off-by: Odin Ugedal <ougedal@palantir.com>

odinuge force-pushed the odinuge/selector-cache-perf-large-list-of-ids branch 2 times, most recently from 0ed7a43 to d39b47d Compare December 17, 2025 13:15

odinuge force-pushed the odinuge/selector-cache-perf-large-list-of-ids branch from d39b47d to 895b305 Compare December 17, 2025 13:54

odinuge mentioned this pull request Dec 18, 2025

policy: split selectorCache in two #42580

Merged

8 tasks

joestringer added the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jan 8, 2026

aanm removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jan 14, 2026

github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Feb 14, 2026

github-actions bot closed this Mar 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

policy: introduce lazy ordered/sorted selections#43368

policy: introduce lazy ordered/sorted selections#43368
odinuge wants to merge 4 commits intocilium:mainfrom
odinuge:odinuge/selector-cache-perf-large-list-of-ids

odinuge commented Dec 16, 2025 •

edited

Loading

Uh oh!

odinuge commented Dec 16, 2025

Uh oh!

odinuge commented Dec 16, 2025

Uh oh!

odinuge commented Dec 16, 2025

Uh oh!

odinuge Dec 17, 2025

Uh oh!

odinuge commented Dec 17, 2025 •

edited

Loading

Uh oh!

odinuge commented Dec 17, 2025

Uh oh!

odinuge commented Dec 17, 2025

Uh oh!

odinuge commented Jan 7, 2026

Uh oh!

github-actions bot commented Feb 14, 2026

Uh oh!

github-actions bot commented Mar 1, 2026

Uh oh!

odinuge commented Mar 6, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

odinuge commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

odinuge commented Dec 16, 2025

Uh oh!

odinuge commented Dec 16, 2025

Uh oh!

odinuge commented Dec 16, 2025

Uh oh!

odinuge Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

odinuge commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

odinuge commented Dec 17, 2025

Uh oh!

odinuge commented Dec 17, 2025

Uh oh!

odinuge commented Jan 7, 2026

Uh oh!

github-actions bot commented Feb 14, 2026

Uh oh!

github-actions bot commented Mar 1, 2026

Uh oh!

odinuge commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

odinuge commented Dec 16, 2025 •

edited

Loading

odinuge commented Dec 17, 2025 •

edited

Loading

odinuge commented Mar 6, 2026 •

edited

Loading