Skip to content

ci: skip plain-text TCP RST packets in bpftrace#36962

Merged
aanm merged 1 commit intomainfrom
pr/smagnani96/check-ipsec-leak-tolerate-rst
Jan 17, 2025
Merged

ci: skip plain-text TCP RST packets in bpftrace#36962
aanm merged 1 commit intomainfrom
pr/smagnani96/check-ipsec-leak-tolerate-rst

Conversation

@smagnani96
Copy link
Copy Markdown
Contributor

@smagnani96 smagnani96 commented Jan 13, 2025

In #35485, we identified a leaked TCP RST packet generated from the kernel client in response to a FIN-ACK coming from the envoy proxy at the server-side. We noticed the behavior is due to a set of timers in the proxy that make it attempt closing (already closed) connections after specific intervals, causing the client to reply with a reset packet. The main issue is that such kernel-level packets don't contain the MARK we rely on to forward traffic from/to proxy, therefore letting the plain text packet through the default path.

Let's tolerate such RST packets with this patch in all the cases we'd trace it as plain-text. That means:

  1. $src_is_pod && $dst_is_pod
  2. $pod_to_pod_via_proxy && ($src_is_pod || $dst_is_pod)

Depending on the Cilium config, we observe leaked packets with source address either pod IP or ingress IP. The patch included both the two cases.

Fixes: #35485

Skip tracking unmarked plain-text TCP RST packets generated from proxy timeouts in the CI bpftrace script.

In #35485, we identified a leaked TCP RST packet generated from the
kernel client in response to a FIN-ACK coming from the envoy proxy at the
server-side. We noticed the behavior is due to a set of timers in the
proxy that make it attempt closing (already closed) connections after
specific intervals, causing the client to reply with a reset packet.
The main issue is that such kernel-level packets don't contain the MARK
we rely on to forward traffic from/to proxy, therefore letting the plain
text packet through the default path.

Let's tolerate such RST packets with this patch in all the cases we'd
trace it as plain-text. That means:

1. $src_is_pod && $dst_is_pod
2. $pod_to_pod_via_proxy && ($src_is_pod || $dst_is_pod)

Depending on the Cilium config, we observe leaked packets with source
address either pod IP or ingress IP. The patch included both the two cases.

Fixes: #35485

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 13, 2025
@smagnani96
Copy link
Copy Markdown
Contributor Author

/test

Copy link
Copy Markdown
Member

@jschwinger233 jschwinger233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Since we have a scenario to reproduce: pod-to-pod-with-l7-policy-encryption for ipv6, maybe we can draft a separate test pr, add a commit to run ipv6 pod-to-pod-with-l7-policy-encryption on top of this, to verify the issue is really gone.

#35485 will be closed by merging this PR. I'll open a new issue (#37009) with our current understandings and proposed long term solution.

@smagnani96
Copy link
Copy Markdown
Contributor Author

Plain TCP RST problem in CI is solved, we can merge this.
However, I opened a new issue (#37051) for a new problem while trying to enable, upon this fix, the pod-to-pod-with-l7-policy-encryption test. The new problem is unrelated to this plain TCP RST issue.

@smagnani96 smagnani96 added area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. release-note/ci This PR makes changes to the CI. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jan 17, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Jan 17, 2025
@smagnani96 smagnani96 added kind/enhancement This would improve or streamline existing functionality. feature/ipsec Relates to Cilium's IPsec feature and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Jan 17, 2025
@smagnani96 smagnani96 marked this pull request as ready for review January 17, 2025 17:21
@smagnani96 smagnani96 requested review from a team as code owners January 17, 2025 17:21
@aanm aanm added this pull request to the merge queue Jan 17, 2025
Merged via the queue into main with commit 09551d2 Jan 17, 2025
@aanm aanm deleted the pr/smagnani96/check-ipsec-leak-tolerate-rst branch January 17, 2025 19:00
@julianwiedmann
Copy link
Copy Markdown
Member

I think it would be good to bring this back into all releases to stop the CI bleed. From the looks of it we'll also need #36364 to make this apply cleanly.

@julianwiedmann julianwiedmann added needs-backport/1.14 needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch labels Jan 20, 2025
@smagnani96 smagnani96 added the backport/author The backport will be carried out by the author of the PR. label Jan 20, 2025
@smagnani96
Copy link
Copy Markdown
Contributor Author

I think it would be good to bring this back into all releases to stop the CI bleed. From the looks of it we'll also need #36364 to make this apply cleanly.

Yep, I've got this on my todos, adding backport/author.
Should the backports be in separate PRs though, right?

@julianwiedmann
Copy link
Copy Markdown
Member

I think it would be good to bring this back into all releases to stop the CI bleed. From the looks of it we'll also need #36364 to make this apply cleanly.

Yep, I've got this on my todos, adding backport/author. Should the backports be in separate PRs though, right?

Bundled backport PRs are fine, if you're solving a class of issues or bring dependencies along (for instance #33575).

@smagnani96
Copy link
Copy Markdown
Contributor Author

Bundled backport PRs are fine, if you're solving a class of issues or bring dependencies along (for instance #33575).

That's an exact example of what I was looking for, many thanks.

@smagnani96 smagnani96 removed the backport/author The backport will be carried out by the author of the PR. label Jan 21, 2025
@rastislavs rastislavs mentioned this pull request Jan 22, 2025
19 tasks
@rastislavs rastislavs mentioned this pull request Jan 22, 2025
6 tasks
@rastislavs rastislavs mentioned this pull request Jan 22, 2025
4 tasks
@github-actions github-actions bot added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. and removed backport-pending/1.14 labels Jan 22, 2025
@rastislavs rastislavs mentioned this pull request Jan 24, 2025
14 tasks
@rastislavs rastislavs added backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. and removed needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch labels Jan 24, 2025
@github-actions github-actions bot added backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. and removed backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. labels Jan 24, 2025
smagnani96 added a commit that referenced this pull request Dec 9, 2025
This substantially reverts #41765,
re-applying #36962 (modified code
for readability).

During our CLI tests, we observe some auto-generated TCP RST packets from
the kernel in response to TCP-FIN packets sent by, most likely,
some idle's timeout expiring. Envoy would generate TCP-FIN, but the
kernel replies with a RST, given there's not a socket anymore listening
to that port.

We thought that after VinE we were able to catch such packets in our
to-netdev program, but we kept observing them while adding unrelated
tests to our CLI suite (see https://github.com/cilium/cilium/actions/runs/19833684454).
Thus, we revert the latest change, and keep ignoring TCP RST packets.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
aanm pushed a commit that referenced this pull request Dec 15, 2025
This substantially reverts #41765,
re-applying #36962 (modified code
for readability).

During our CLI tests, we observe some auto-generated TCP RST packets from
the kernel in response to TCP-FIN packets sent by, most likely,
some idle's timeout expiring. Envoy would generate TCP-FIN, but the
kernel replies with a RST, given there's not a socket anymore listening
to that port.

We thought that after VinE we were able to catch such packets in our
to-netdev program, but we kept observing them while adding unrelated
tests to our CLI suite (see https://github.com/cilium/cilium/actions/runs/19833684454).
Thus, we revert the latest change, and keep ignoring TCP RST packets.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Dec 16, 2025
This substantially reverts #41765,
re-applying #36962 (modified code
for readability).

During our CLI tests, we observe some auto-generated TCP RST packets from
the kernel in response to TCP-FIN packets sent by, most likely,
some idle's timeout expiring. Envoy would generate TCP-FIN, but the
kernel replies with a RST, given there's not a socket anymore listening
to that port.

We thought that after VinE we were able to catch such packets in our
to-netdev program, but we kept observing them while adding unrelated
tests to our CLI suite (see https://github.com/cilium/cilium/actions/runs/19833684454).
Thus, we revert the latest change, and keep ignoring TCP RST packets.

Signed-off-by: Simone Magnani <simone.magnani@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/encryption Impacts encryption support such as IPSec, WireGuard, or kTLS. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. backport-done/1.16 The backport for Cilium 1.16.x for this PR is done. backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. feature/ipsec Relates to Cilium's IPsec feature kind/enhancement This would improve or streamline existing functionality. release-note/ci This PR makes changes to the CI.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

plain TCP RST packet in pod-to-pod-with-l7-policy-encryption for IPv6+IPSec+VXLan

5 participants