Skip to content

policy: return ICMP "Destination unreachable" on ipv4 ingress policy denials#42068

Closed
antonipp wants to merge 1 commit intocilium:mainfrom
DataDog:ai/icmp-drop-ingress-ipv4
Closed

policy: return ICMP "Destination unreachable" on ipv4 ingress policy denials#42068
antonipp wants to merge 1 commit intocilium:mainfrom
DataDog:ai/icmp-drop-ingress-ipv4

Conversation

@antonipp
Copy link
Copy Markdown
Contributor

@antonipp antonipp commented Oct 8, 2025

Description

Follow-up to #41406. See #41859 for the full list of TODOs.

This PR adds the ICMP response functionality to ingress policy denials.
The functionality works for local and remote endpoints with/without tunneling and with/without eBPF Host Routing.

Testing

  • Created a new unit test
  • Performed integration tests: backported the PR to Cilium 1.18.2, then created a pod with an ingress policy denying all traffic. Then tried to reach this pod from multiple different places and confirmed ping was returning "Packet filtered":
     # ping 10.130.112.4
     PING 10.130.112.4 (10.130.112.4) 56(84) bytes of data.
     From 10.130.112.4 icmp_seq=1 Packet filtered
     From 10.130.112.4 icmp_seq=2 Packet filtered
     [...]
    
    • Local pod from same node [legacy routing]
    • Pod from a different node in the same cluster [legacy routing]
    • Pod in a different cluster [legacy routing]
    • Local pod from same node [eBPF host routing]
    • Pod from a different node in the same cluster [eBPF host routing]
    • Pod in a different cluster [eBPF host routing]
  • Set up Cilium with tunneling in kind and ran the connectivity test suite:
     $ cilium connectivity test
     [...]
     ✅ [cilium-test-1] All 77 tests (1087 actions) successful, 46 tests skipped, 1 scenarios skipped.
    

policy: add option to return ICMP "Destination unreachable" on ipv4 ingress policy denials

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 8, 2025
@antonipp antonipp force-pushed the ai/icmp-drop-ingress-ipv4 branch from 128984c to 1ba7880 Compare October 8, 2025 10:14
@antonipp antonipp marked this pull request as ready for review October 8, 2025 10:30
@antonipp antonipp requested review from a team as code owners October 8, 2025 10:30
@antonipp
Copy link
Copy Markdown
Contributor Author

antonipp commented Oct 8, 2025

/test

@julianwiedmann julianwiedmann self-requested a review October 8, 2025 11:19
@pchaigno pchaigno added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Oct 9, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 9, 2025
Copy link
Copy Markdown
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Unfortunately I don't have a quick way to set up a real cluster with tunneling rn)

Have you tried using the kind-based development setup?

bpf/bpf_lxc.c Outdated
Comment on lines +2523 to +2564
#if defined(TUNNEL_MODE)
if (from_tunnel) {
struct remote_endpoint_info *info;
struct trace_ctx trace = {
.reason = TRACE_REASON_POLICY,
.monitor = TRACE_PAYLOAD_LEN,
};

info = lookup_ip4_remote_endpoint(ip4->daddr, 0);
if (info && info->flag_has_tunnel_ep) {
ret = encap_and_redirect_lxc(ctx, info, SECLABEL_IPV4,
info->sec_identity, &trace,
bpf_htons(ETH_P_IP));
if (ret == CTX_ACT_REDIRECT) {
update_metrics(ctx_full_len(ctx), metric_dir,
__DROP_REASON(verdict));
return ret;
}
if (IS_ERR(ret))
goto drop_err;
if (!revalidate_data(ctx, &data, &data_end, &ip4)) {
ret = DROP_INVALID;
goto drop_err;
}
}
}
#endif /* TUNNEL_MODE */

if (is_defined(ENABLE_HOST_ROUTING)) {
ret = fib_redirect_v4(ctx, ETH_HLEN, ip4, false, false, &ext_err, &oif);
if (fib_ok(ret)) {
update_metrics(ctx_full_len(ctx), metric_dir,
__DROP_REASON(verdict));
return ret;
}
} else {
ret = redirect_self(ctx);
if (!IS_ERR(ret)) {
update_metrics(ctx_full_len(ctx), metric_dir,
__DROP_REASON(verdict));
return ret;
}
Copy link
Copy Markdown
Member

@julianwiedmann julianwiedmann Oct 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference - I'd expect us to follow this path here 1-to-1, as if the ICMP was genuinely sent out by the pod and needs forwarding to its destination. To the point where this might have to be a shared helper.

Unless there's cases in there which don't apply for an ICMP reply?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right, it was a bit weird re-creating this logic. I refactored the code to extract it to a shared helper function ipv4_forward_to_destination which is now called in both places. I also set up kind and tested the tunneling scenario (updated the PR description accordingly)

@antonipp antonipp force-pushed the ai/icmp-drop-ingress-ipv4 branch 2 times, most recently from 1ccf465 to 84836e6 Compare October 21, 2025 09:16
@antonipp
Copy link
Copy Markdown
Contributor Author

/test

This expands the functionality enabled by --policy-deny-response=icmp from egress-only to
egress and ingress. Now ICMP responses can be sent for ingress policy denials as well.

Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
@antonipp antonipp force-pushed the ai/icmp-drop-ingress-ipv4 branch from 84836e6 to 48d74f0 Compare October 21, 2025 15:34
@antonipp
Copy link
Copy Markdown
Contributor Author

/test

@julianwiedmann julianwiedmann self-requested a review October 23, 2025 07:06
Copy link
Copy Markdown
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Had a quick first look to find my bearings, see some comments below.

Noting down the full(?) matrix of possible packet paths:

  1. arrival via {to-container, tailcall program}
  2. origin is {self, local pod, local host, local proxy, remote pod/host, world}

For the local delivery paths, we'll need to have a good eye on what expectations redirect_ep() has, to determine whether it can use bpf_redirect_peer().

Comment on lines -2381 to +2458
ret = redirect_self(ctx);

if (!IS_ERR(ret)) {
update_metrics(ctx_full_len(ctx), METRIC_EGRESS, __DROP_REASON(verdict));
info = lookup_ip4_remote_endpoint(ip4->daddr, 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, interesting that we can also use ipv4_forward_to_destination() for the egress-deny path! I'd suggest to do that in a follow-up PR though, so we can evaluate it on its own.

__u8 metric_dir = (__u8)ctx_load_meta(ctx, CB_POLICY_DENY_DIR);
__u32 src_label = metric_dir == METRIC_EGRESS ?
SECLABEL_IPV4 : ctx_load_meta(ctx, CB_POLICY_DENY_SRC_ID);
__u32 dst_sec_identity = WORLD_IPV4_ID;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm we previously applied network policy for the packet, so we must have already determined the peer's identity. Ideally we're re-using that identity here, and not try to resolve it again.

That allows us to be consistent, and handles cases where the actual identity is different from what's in the ipcache (relevant for eg. proxy connections).

Comment on lines +2465 to +2467
ret = ipv4_forward_to_destination(ctx, ip4, NULL, &dst_sec_identity,
&ext_err, &ct_state_local, CT_NEW,
info, &trace, false, false, false, 0, 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it's going to be fun figuring out all the little corner-cases from passing "default" values here :)

ipv4_forward_to_destination(struct __ctx_buff *ctx,
struct iphdr *ip4,
struct ipv4_ct_tuple *tuple,
const __u32 *dst_sec_identity,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dst_sec_identity can be a regular __u32 now I think, we're not writing to it from this function.

Comment on lines -884 to +891
static __always_inline int handle_ipv4_from_lxc(struct __ctx_buff *ctx, __u32 *dst_sec_identity,
__s8 *ext_err)
static __always_inline int
ipv4_forward_to_destination(struct __ctx_buff *ctx,
struct iphdr *ip4,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split this big patch into (at least) two parts, so it's easier to review (and partially revert if needed):

  1. pure refactor, only splitting off the ipv4_forward_to_destination() logic.
  2. adding all the new pieces for the ingress-deny ICMP message.

@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Nov 29, 2025
@antonipp antonipp marked this pull request as draft December 2, 2025 10:28
@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Dec 3, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 2, 2026

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 2, 2026
@joestringer joestringer added the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jan 8, 2026
@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 9, 2026
@aanm aanm removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jan 14, 2026
@antonipp
Copy link
Copy Markdown
Contributor Author

Closing this one, since I handed over my work on this to Alex #44184

@antonipp antonipp closed this Feb 11, 2026
@antonipp antonipp deleted the ai/icmp-drop-ingress-ipv4 branch February 11, 2026 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants