Fix key rotation race condition by configuring XFRM before advertising new SPI to other nodes#44335
Conversation
@smagnani96 Could you share your thoughts on this? I had to dig into the internals to understand this and may be completely off.

Hey, when you start observing the issue, are the respective XFRM states in the node where the packets are being dropped still using the old SPI or the new one?

From a quick look into the history, it looks like #29319 introduced this.
Thanks folks. Regarding old vs new SPI: unfortunately, we cannot trace down the exact sequence of events to definitively confirm which SPI the dropped packets were using — we don't have packet captures from the incident. However, the timing strongly suggests this is about new SPI states not being ready, rather than old SPI states being cleaned up early.

On another note; it may be worth it to have us add some logging for other steps during rotation. E.g. we're missing a success log for updating XFRM state. The existing "Loading IPsec keyfile" and "Creating or updating CiliumNode resource" are the only success logs, where the latter is async and in a different subsystem.

It's hard / close to impossible for me to reproduce this error on a high-traffic large-scale cluster. How do you suggest we proceed? With @julianwiedmann's mention on the previous PR, it seems this may have been an unintended change/regression. Any suggestions to test this better if we deem necessary?
smagnani96
left a comment
PR LGTM.
Left a couple of comments about cleaning up the history and signing commits, but the change itself is good.
> Let me answer first on --ipsec-key-rotation-duration: ... We already have this set to 10 minutes. From both logs and metrics, we can confirm the errors occur well within this window, specifically within the first ~90 seconds of key rotation, not after the 10-minute cleanup period. This rules out premature old-key cleanup as the cause, since that parameter only controls when old XFRM states are cleaned up, and those states would still be present during this timeframe.
Yep, if that is the case then we're most likely not ready with the new key in the receiving node.
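For context, a minimal example of setting the flag discussed above on the agent. The invocation is illustrative (in practice this is usually configured via Helm/ConfigMap rather than by running the binary directly); the 10m value matches what the reporter describes.

```shell
# Keep old XFRM states around for 10 minutes after a rotation so
# in-flight traffic encrypted with the previous key still decrypts.
cilium-agent --enable-ipsec --ipsec-key-rotation-duration=10m
```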
> On another note; it may be worth it to have us add some logging for other steps during rotation. E.g. we're missing a success log for updating XFRM state. The existing "Loading IPsec keyfile" and "Creating or updating CiliumNode resource" are the only success logs, where the latter is async and in a different subsystem. Wdyt?
Adding Debug logs never hurts, given they give more context and help debugging issues. An Info log (per-state/policy addition/removal) sounds too verbose.
> It's hard / close to impossible for me to reproduce this error on a high-traffic large-scale cluster. How do you suggest we proceed? With @julianwiedmann's mention on the previous PR, it seems this may have been an unintended change/regression. Any suggestions to test this better if we deem necessary?
I'm aligned with Julian. The fix itself makes sense, as it restores the previous behavior. With the fix, the flow in the keyfileWatcher function (for a valid new key) should be the following:
1. `loadIPSecKeysFile`: loads the key, updates `ipSecCurrentKeySPI` and sets the new key in `ipSecKeysGlobal`.
2. `AllNodeValidateImplementation`: this will then call `ipsec.UpsertIPsecEndpoint()`, which will use `getGlobalIPsecKey`. The SPI returned here and used for new states/policies is the new global key in `ipSecKeysGlobal`.
3. `localNode.Update`: now advertise the new SPI.
4. `setIPSecSPI`: update the SPI also in the BPF map.
Awesome, thanks, cleaning up the PR. I added some debug logs, please lmk if you have any additional thoughts.
force-pushed 29e6f8a to ec78a0c
force-pushed ec78a0c to f4fa4ed
@smagnani96 all done!
Thanks. There are some CI flakes which are not related to this PR, but before re-running them I'd kindly ask you to address https://github.com/cilium/cilium/actions/runs/22361850515/job/64876940663?pr=44335#step:5:38

Error: pkg/datapath/linux/ipsec/ipsec_linux.go:1246:4: arguments should be put on separate lines (sloglint)
a.log.Info("Loaded IPsec keyfile", logfields.SPI, spi, logfields.Path, keyfilePath)
force-pushed f4fa4ed to 0213453
Missed that one, done.
smagnani96
left a comment
#44114 got merged, so I think this needs a rebase on top of a fresh main to make CI happy (smoke tests).
Set up XFRM states before updating LocalNode.EncryptionKey to ensure ingress is ready before peers learn about the new key via CiliumNode CRD. Fixes packet drops during rotation under CPU contention where AllNodeValidateImplementation() takes longer than CRD propagation. Signed-off-by: Daan Vinken <daanvinken@tythus.com>
Add Info log when keyfile is loaded, Info logs for XFRM state conflict resolution in xfrmStateReplace, and a Debug log when the BPF encrypt map is updated. Signed-off-by: Daan Vinken <daanvinken@tythus.com>
force-pushed 0213453 to 14edd1c
/test

@smagnani96 what to do with these tests? Getting a lot of emails on failed tests :D
I double-checked and I believe these are CI flakes unrelated to this PR. |
Thanks for the contribution @daanvinken 🎉
Fix IPSec key rotation race by setting up XFRM states before advertising new key
During IPSec key rotation, the current code advertises the new key via the CiliumNode CRD before configuring XFRM states. Under CPU contention on the control plane / worker nodes, completing `AllNodeValidateImplementation()` can take an extended amount of time, during which peers may learn about the new key and start sending traffic encrypted with it. In my understanding, we currently rely on eventual consistency. During this window of inconsistency, our ingress XFRM states aren't ready yet, so these packets are dropped, causing `cilium_ipsec_xfrm_error{error="no_state", type="inbound"}` errors and service disruption.
This was observed in one of our clusters after upgrading to Cilium 1.17, which doubled the number of XFRM states due to encrypted-overlay support (from ~1,620 to ~3,240 keys on a ~1,000-node cluster). The increased CPU load during rotation widened the race window significantly.
According to the IPSec documentation:

> This per-endpoint tracking relies on the BPF datapath selecting min(local_key, peer_key).

However, this assumes XFRM states are configured before peers learn about the new key. When CiliumNode CRD propagation outpaces XFRM setup, peers select the new key but the receiving node cannot decrypt it.
The fix swaps the order of operations in `keyfileWatcher`: by setting up XFRM states before advertising, we ensure ingress is ready before peers start sending traffic with the new key. The `localNode.Update()` call triggers an async observer that updates the CiliumNode CRD, so peers only learn about the new key after XFRM configuration is complete.

The BPF min-key selection logic ensures outbound traffic remains safe during the transition, as the BPF encrypt map is updated last.
Logging Improvements
This PR also adds logging to improve observability during key rotation:
- `keyfileWatcher`
- `xfrmStateReplace`
- `setIPSecSPI`

This helps diagnose key rotation timing by showing the sequence: key load → XFRM setup → BPF update.
Fixes: #29319