Skip to content

selectorcache: test new index in the selector cache id -> selectors#43376

Closed
odinuge wants to merge 5 commits intocilium:mainfrom
odinuge:odinuge/test-new-index-selector-cache
Closed

selectorcache: test new index in the selector cache id -> selectors#43376
odinuge wants to merge 5 commits intocilium:mainfrom
odinuge:odinuge/test-new-index-selector-cache

Conversation

@odinuge
Copy link
Copy Markdown
Member

@odinuge odinuge commented Dec 16, 2025

This is still TBD waiting on the split between peer and subject selector
caches, but will be useful to avoid having to loop through all the
rules on every single regeneration. If there are say 10k+ policy rules
but only 100 IDs, this is pretty cheap compared to looping through all
of them.

This is still proof-of-concept and will require more benchmarking to
figure out if its workable or not.

Built on top of #43368, and waiting for #42580.

Fixes: #issue-number

<!-- Enter the release note text here if needed or remove this section! -->

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 16, 2025
@github-actions github-actions bot added the sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. label Dec 16, 2025
@odinuge odinuge force-pushed the odinuge/test-new-index-selector-cache branch from 7e8ff46 to 5b76c61 Compare December 17, 2025 07:16
if namespace != "" {
for rKey := range p.rulesByNamespace[namespace] {

if subjectIsNode := securityIdentity.ID == identity.ReservedIdentityHost; subjectIsNode {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on #42580, I think with that in place we will always get the updated node identity in that selector cache - so I'm pretty sure we in that situation are fine to just always rely on the index inside the selector cache without looping through all the rules.

// iterating selectors requires read lock
for key, sel := range sc.selectors.All() {
selections := sel.GetSelectionsAt(version)
selections := sel.GetSortedSelectionsAt(version)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably has no need to get the sorted version, and a set is fine.

This introduces a separate sorted structure that can be used by the
components that need it. Previously all selections were sorted. This
very costly in the hotpath of the selection updates, as sorting 10k+
slices is pretty expensive. This still creates a copy of the selections,
but it does not sort it.

This can in theory result in higher memory usage since we now keep a set
(in practice a map), and the sorted slice if asked for. In
practice not many things need the sorted version, notably the dnsproxy
as well as some l7/xds configs. Its also not totally sure that those
always need it either - and in the cases they need it, they also end up
keeping it in memory - so the diff is pretty small in practice. Most
of the datapath is also using updates, so its not relying on the sorted
version all the time.

Its a bit hard to benchmark this directly without cherry-picking
results, but a simple test-case where a single selector selects say 20k
identities (eg. the 'cluster' entity), that are updated say 5 times/sec
- results in a lot of cpu cycles spent on sorting that slice.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@ugedal.com>
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
This swaps the set.Set to a part.Set, so that we can avoid cloning the
whole datastructure on each write. This does result in more allocations
in general, but overall it looks like a win since we reduce the time it
takes to clone it. The same applies to the total amount of allocated
bytes, that seems to go down drastically.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
This is still TBD waiting on the split between peer and subject selector
caches, but will be useful to avoid having to loop through all the
rules on every single regeneration. If there are say 10k+ policy rules
but only 100 IDs, this is pretty cheap compared to looping through all
of them.

This is still proof-of-concept and will require more benchmarking to
figure out if its workable or not.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 17, 2025

Testing this using the benchmarks in #43407 yields the following results;

benchstat /tmp/original.yml /tmp/pr2.yml
pkg: github.com/cilium/cilium/pkg/policy
                                │ /tmp/original.yml │             /tmp/pr2.yml             │
                                │      sec/op       │    sec/op     vs base                │
ResolveNoMatchingRules-12          1848418.0n ± 37%   910.6n ± 11%  -99.95% (p=0.000 n=10)
SelectorCacheIdentityUpdates-12      2037.77µ ±  9%   10.27µ ±  3%  -99.50% (p=0.000 n=10)
geomean                                1.941m         3.058µ        -99.84%

                                │ /tmp/original.yml │             /tmp/pr2.yml             │
                                │       B/op        │     B/op      vs base                │
ResolveNoMatchingRules-12            157.297Ki ± 0%   1.203Ki ± 0%  -99.24% (p=0.000 n=10)
SelectorCacheIdentityUpdates-12       164.80Ki ± 0%   11.75Ki ± 0%  -92.87% (p=0.000 n=10)
geomean                                161.0Ki        3.760Ki       -97.66%

                                │ /tmp/original.yml │             /tmp/pr2.yml             │
                                │     allocs/op     │  allocs/op   vs base                 │
ResolveNoMatchingRules-12             20025.00 ± 0%    32.00 ± 0%   -99.84% (p=0.000 n=10)
SelectorCacheIdentityUpdates-12          51.00 ± 2%   121.00 ± 0%  +137.25% (p=0.000 n=10)
geomean                                 1.011k         62.23        -93.84%

@odinuge odinuge force-pushed the odinuge/test-new-index-selector-cache branch from 5b76c61 to 9be5a28 Compare December 17, 2025 13:00
@odinuge
Copy link
Copy Markdown
Member Author

odinuge commented Dec 17, 2025

/test

@odinuge odinuge mentioned this pull request Dec 18, 2025
8 tasks
@joestringer joestringer added the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jan 8, 2026
@aanm aanm removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jan 14, 2026
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Feb 14, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 1, 2026

This pull request has not seen any activity since it was marked stale.
Closing.

@github-actions github-actions bot closed this Mar 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants