Skip to content

Change host iptables rule addition from Append to Insert to ensure Istio's rules take precedence#56414

Merged
istio-testing merged 5 commits intoistio:masterfrom
Xscaperrr:iptables
Jul 12, 2025
Merged

Change host iptables rule addition from Append to Insert to ensure Istio's rules take precedence#56414
istio-testing merged 5 commits intoistio:masterfrom
Xscaperrr:iptables

Conversation

@Xscaperrr
Copy link
Copy Markdown
Contributor

@Xscaperrr Xscaperrr commented May 27, 2025

Please provide a description of this PR:
This PR modifies the existing host iptables rule addition logic from using the Append method to the Insert method. By inserting Istio's rules at the front of the iptables chain, we ensure these rules have higher matching priority compared to existing iptables rules.

Previously, appending rules caused issues in scenarios where existing iptables rules matched and handled targeted traffic before it reached Istio's rules. Consequently, the rules configured by Istio became ineffective.

We use a whitelist with iptables on the host, which only affects pods within the istio ipset. Therefore, matching it first will not cause any issues.

@Xscaperrr Xscaperrr requested a review from a team as a code owner May 27, 2025 03:34
@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla bot commented May 27, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@istio-policy-bot
Copy link
Copy Markdown

😊 Welcome @Xscaperrr! This is either your first contribution to the Istio istio repo, or it's been
a while since you've been here.

You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@istio-testing istio-testing added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. needs-ok-to-test labels May 27, 2025
@istio-testing
Copy link
Copy Markdown
Collaborator

Hi @Xscaperrr. Thanks for your PR.

I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@sridhargaddam
Copy link
Copy Markdown
Contributor

/ok-to-test

@istio-testing istio-testing added ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. and removed needs-ok-to-test labels May 27, 2025
@sridhargaddam
Copy link
Copy Markdown
Contributor

Thank you for the PR. Yeah, this is a common thing with iptables where the order of the rules really matters. If a rule matches earlier in the chain, it can stop the traffic from ever reaching the Istio rules. So inserting the rule at the top makes sense in cases like this, especially when we're using a whitelist setup that only affects certain pods.

Could you please share some details about the platform where you're seeing this issue? Like the cloud platform, CNI, and anything else that might be relevant.

Also, it would be great if you could add/update the unit tests to cover this change.

@sridhargaddam
Copy link
Copy Markdown
Contributor

sridhargaddam commented May 27, 2025

CC: @yxun @leosarra

@istio-testing istio-testing added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 27, 2025
@Xscaperrr
Copy link
Copy Markdown
Contributor Author

/retest

@Xscaperrr
Copy link
Copy Markdown
Contributor Author

/retest-required

@Xscaperrr
Copy link
Copy Markdown
Contributor Author

Thank you for the PR. Yeah, this is a common thing with iptables where the order of the rules really matters. If a rule matches earlier in the chain, it can stop the traffic from ever reaching the Istio rules. So inserting the rule at the top makes sense in cases like this, especially when we're using a whitelist setup that only affects certain pods.

Could you please share some details about the platform where you're seeing this issue? Like the cloud platform, CNI, and anything else that might be relevant.

Also, it would be great if you could add/update the unit tests to cover this change.

In my scenario, the environment employs a custom-developed CNI plugin based on Open vSwitch. The following iptables rules currently exist in this environment:

-A POSTROUTING -m comment --comment "portmap/***-portmap" -j ***-SNAT
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -o docker0 -m addrtype --src-type LOCAL -j MASQUERADE
-A POSTROUTING -j ISTIO_POSTRT
-A ***-SNAT -s 88.88.0.0/16 -d 88.88.0.0/16 -m comment --comment "default/***-service" -j MASQUERADE
-A ***-SNAT ! -s 88.88.0.0/16 -d 88.88.0.0/16 -m comment --comment "default/***-service" -j MASQUERADE
-A ***-SNAT -s 88.88.0.0/16 ! -d 88.88.0.0/16 -m comment --comment "default/***-service" -j MASQUERADE
-A ISTIO_POSTRT -p tcp -m owner --socket-exists -m set --match-set istio-inpod-probes-v4 dst -j SNAT --to-source 169.254.7.127

The health check traffic originating from the host to Istio matches the first POSTROUTING rule, thus bypassing Istio's iptables rules.After my commit, Istio's iptables will be matched first.

@hzxuzhonghu
Copy link
Copy Markdown
Member

@bleggett @howardjohn Does ambient mesh still need to create iptabels on host node?

@sridhargaddam
Copy link
Copy Markdown
Contributor

@bleggett @howardjohn Does ambient mesh still need to create iptabels on host node?

I'll let @howardjohn to confirm, but AFAICT we are programming iptable rules on the host network to support health-check probes from kubelet.
i.e., when kubelet tries to perform health-checks to the podIP, the following rule on the host network modifies the sourceIP to 169.254.7.127 IP.

-A ISTIO_POSTRT -p tcp -m owner --socket-exists -m set --match-set istio-inpod-probes-v4 dst -j SNAT --to-source 169.254.7.127

where the ipset istio-inpod-probes-v4 includes the IP addresses of the pods that are part of the ambient mesh.

And inside the pod network namespace, we have the following rule to accept traffic with sourceIP of 169.254.7.127

-A ISTIO_PRERT -s 169.254.7.127 -p tcp -m tcp -j ACCEPT

@sridhargaddam
Copy link
Copy Markdown
Contributor

The health check traffic originating from the host to Istio matches the first POSTROUTING rule, thus bypassing Istio's iptables rules.After my commit, Istio's iptables will be matched first.

Assuming we have inserted the Istio iptable rule at the beginning of the POSTROUTING chain, could you please verify that your CNI will not overwrite the iptables rule later on, such as by pushing/removing the Istio iptable rule?

@Xscaperrr
Copy link
Copy Markdown
Contributor Author

The health check traffic originating from the host to Istio matches the first POSTROUTING rule, thus bypassing Istio's iptables rules.After my commit, Istio's iptables will be matched first.

Assuming we have inserted the Istio iptable rule at the beginning of the POSTROUTING chain, could you please verify that your CNI will not overwrite the iptables rule later on, such as by pushing/removing the Istio iptable rule?

In my scenario, the CNI plugin needs to insert its iptables rules just before the KUBE-POSTROUTING chain rule. Therefore, I can modify the CNI logic to always insert its rules directly before the KUBE-POSTROUTING rule instead of appending them at the top. However, if another program inserts a similar rule at the very beginning, it could still potentially interfere with the iptables rules set by Istio. This PR can only partially mitigate such issues, but does not completely resolve them.

@sridhargaddam
Copy link
Copy Markdown
Contributor

In my scenario, the CNI plugin needs to insert its iptables rules just before the KUBE-POSTROUTING chain rule. Therefore, I can modify the CNI logic to always insert its rules directly before the KUBE-POSTROUTING rule instead of appending them at the top. However, if another program inserts a similar rule at the very beginning, it could still potentially interfere with the iptables rules set by Istio. This PR can only partially mitigate such issues, but does not completely resolve them.

Yes, that’s a challenge when working with iptables. There needs to be some form of coordination among the different components that program iptables rules on the host network. Nftables (WIP) helps mitigate this issue by supporting priorities when creating base chains, allowing us to use higher priorities if needed and create custom table and chain names.

Co-authored-by: Sridhar Gaddam <sgaddam@redhat.com>
@hzxuzhonghu
Copy link
Copy Markdown
Member

Probably we should remove all these intrusion to the host

@howardjohn
Copy link
Copy Markdown
Member

howardjohn commented May 29, 2025 via email

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jun 29, 2025
@keithmattix
Copy link
Copy Markdown
Contributor

Not stale

@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Jun 29, 2025
@sridhargaddam
Copy link
Copy Markdown
Contributor

sridhargaddam commented Jun 30, 2025

one worry I have is I think in testing some CNIs will revert changes if we are before their. Specifically calico can we test it

@howardjohn I verified the behavior with Calico CNI by deploying a KIND cluster and I can see that Calico is appending its rule (cali-POSTROUTING) at the end of the POSTROUTING by default. So, even if Istio Inserts (instead of append), it should be okay i guess for Calico.

# IPtable rules on the Host Network.
root@kind-worker:/# iptables -L POSTROUTING -n -v --line-numbers -t nat 
Chain POSTROUTING (policy ACCEPT 773 packets, 49584 bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1      801 52897 KUBE-POSTROUTING  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
2        0     0 DOCKER_POSTROUTING  0    --  *      *       0.0.0.0/0            172.18.0.1          
3      527 32254 ISTIO_POSTRT  0    --  *      *       0.0.0.0/0            0.0.0.0/0           
4      437 26306 cali-POSTROUTING  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* cali:0i8pjzKKPyA34aQD */

root@kind-worker:/# iptables -L ISTIO_POSTRT -n -v --line-numbers -t nat 
Chain ISTIO_POSTRT (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1      687 41220 SNAT       6    --  *      *       0.0.0.0/0            0.0.0.0/0            owner socket exists match-set istio-inpod-probes-v4 dst to:169.254.7.127

@sridhargaddam
Copy link
Copy Markdown
Contributor

@howardjohn @keithmattix, we need this fix even for OVN-Kubernetes CNI, which is used in OpenShift deployments. In an Ambient deployment, we observed that pod health checks were failing because OVN-K iptables rules were incorrectly being applied to the health-check traffic instead of the Istio SNAT rule (as its appended at the end of POSTROUTING chain).

FYI, the iptable rules from the host network were as follows in an OVN-K/OpenShift Cluster.

$ iptables -L -n -v --line-numbers -t nat
Chain POSTROUTING (policy ACCEPT 3002 packets, 181K bytes)
num   pkts bytes target     prot opt in     out     source               destination         
1       26  1652 MASQUERADE  all  --  *      *       169.254.0.1          0.0.0.0/0           
2     4385  263K MASQUERADE  all  --  *      *       10.131.0.0/23        0.0.0.0/0           
3     3002  181K OVN-KUBE-UDN-MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0           
4     2571  155K ISTIO_POSTRT  all  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain ISTIO_POSTRT (1 references)
num   pkts bytes target     prot opt in     out     source               destination         
1        0     0 SNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            owner socket exists match-set istio-inpod-probes-v4 dst to:169.254.7.127

I discussed with the OVN-K team whether Istio's iptables rules, if inserted at the beginning of the POSTROUTING chain, would subsequently be pushed to the end of the chain by OVN-K. They confirmed that OVN-K only appends its rules and does not use insert operation.

Copy link
Copy Markdown
Contributor

@keithmattix keithmattix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'm not enough of an expert in this area of the code yet to feel comfortable approving

@sridhargaddam
Copy link
Copy Markdown
Contributor

LGTM, but I'm not enough of an expert in this area of the code yet to feel comfortable approving

Thanks @keithmattix for the review.

Just to add some context. By using Insert instead of Append, we ensure that Istio’s rule is evaluated first which is important for Ambient use-case. The SNAT rule (on the host network) only affects traffic from the host network to pods in the Ambient mesh (based on the ipset), so inserting it at the top ensures it behaves as expected.

This is a common challenge with iptables, where rule order determines precedence. Nftables, on the other hand, supports setting priorities on base chains, which makes rule ordering easier to manage. For now, with iptables, inserting is the best option IMO.

Co-authored-by: Keith Mattix II <keithmattix2@gmail.com>
@sridhargaddam
Copy link
Copy Markdown
Contributor

@howardjohn can you PTAL, thanks.

@sridhargaddam
Copy link
Copy Markdown
Contributor

one worry I have is I think in testing some CNIs will revert changes if we
are before their. Specifically calico can we test it

I looked into the behavior of the Calico CNI and captured my observations above. I also discussed with the OVN-K team, who confirmed that OVN-K appends its rules and does not use insert operation. Even in the worst case where a CNI does revert our rule if it appears before theirs, we'd end up in the same state we’re in today (i.e., our rule being at the end of the chain). So this change wouldn't introduce any new issues compared to the current behavior. @istio/wg-networking-maintainers can you PTAL, thank you.

@keithmattix
Copy link
Copy Markdown
Contributor

/cherry-pick release-1.27

@istio-testing
Copy link
Copy Markdown
Collaborator

@keithmattix: once the present PR merges, I will cherry-pick it on top of release-1.27 in a new PR and assign it to you.

Details

In response to this:

/cherry-pick release-1.27

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@keithmattix
Copy link
Copy Markdown
Contributor

/test integ-ambient-dual

@keithmattix
Copy link
Copy Markdown
Contributor

/test unit-tests-arm64

@keithmattix
Copy link
Copy Markdown
Contributor

/retest

@sridhargaddam
Copy link
Copy Markdown
Contributor

/test unit-tests

@sridhargaddam
Copy link
Copy Markdown
Contributor

It appears like the following testcase is TestAmbientMulticlusterIndex_WaypointForWorkloadTraffic/no_traffic flaky.

@keithmattix
Copy link
Copy Markdown
Contributor

Yeah it's a data race caused by a bug. I've opened #56926 to fix it

@sridhargaddam
Copy link
Copy Markdown
Contributor

/test unit-tests

@sridhargaddam
Copy link
Copy Markdown
Contributor

/retest

@istio-testing istio-testing merged commit b208a07 into istio:master Jul 12, 2025
31 checks passed
@istio-testing
Copy link
Copy Markdown
Collaborator

@keithmattix: new pull request created: #56984

Details

In response to this:

/cherry-pick release-1.27

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

fjglira pushed a commit to fjglira/istio that referenced this pull request Sep 26, 2025
* upstream/master: (21 commits)
  feat: skip queue for status updates on gw (istio#56962)
  Automator: update proxy@master in istio/istio@master (istio#56993)
  Automator: update proxy@master in istio/istio@master (istio#56990)
  Change host iptables rule addition from Append to Insert to ensure Istio's rules take precedence (istio#56414)
  support specifying proxy admin port for describe (istio#56854)
  support reset log level or stack trace level separately for admin log (istio#56642)
  improve example format for istioctl x describe (istio#56951)
  Automator: update ztunnel@master in istio/istio@master (istio#56971)
  Remove flaky test (istio#56919)
  fix: fixes test which fails for distroless (istio#56965)
  Automator: update proxy@master in istio/istio@master (istio#56969)
  Ambient Multicluster SplitHorizon WDS Implementation (istio#56844)
  Fix log message in cni install.go file (istio#56966)
  add env vars for ip auto allocate ipv4/v6 cidr prefixes (istio#56276)
  Update BASE_VERSION to master-2025-07-10T19-01-16 (istio#56967)
  Add AllowCRDsMismatch parameter to gateway conformance options. (istio#56945)
  Revert "feat: represent revision tags using services (istio#56851)" (istio#56941)
  Automator: update proxy@master in istio/istio@master (istio#56954)
  Automator: update istio/client-go@master dependency in istio/istio@master (istio#56911)
  Automator: update common-files@master in istio/istio@master (istio#56952)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/ambient Issues related to ambient mesh area/environments area/networking ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants